bpo-13153: Use OS native encoding for converting between Python and Tcl. by serhiy-storchaka · Pull Request #16545 · python/cpython · GitHub

@serhiy-storchaka
Member

@serhiy-storchaka serhiy-storchaka commented Oct 2, 2019

On Windows, use UTF-16 (or UTF-32 if Tcl_UniChar is 32-bit) with the
"surrogatepass" error handler for converting to/from Tcl Unicode objects.
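A minimal Python sketch of the Windows direction, using only the standard codecs. The `unichar_size` parameter is a hypothetical stand-in for `sizeof(Tcl_UniChar)` (2 selects UTF-16, 4 selects UTF-32), and the helper names are illustrative, not `_tkinter` functions. It shows that "surrogatepass" lets a lone surrogate through instead of raising, while a valid pair still decodes to one astral character.

```python
import sys


def _codec(unichar_size: int) -> str:
    # Hypothetical mapping from sizeof(Tcl_UniChar) to a BOM-less codec
    # in the machine's native byte order.
    bits = {2: "16", 4: "32"}[unichar_size]
    order = "le" if sys.byteorder == "little" else "be"
    return f"utf-{bits}-{order}"


def to_tcl_units(s: str, unichar_size: int = 2) -> bytes:
    # "surrogatepass" encodes lone surrogates such as '\ud800' instead of
    # raising UnicodeEncodeError.
    return s.encode(_codec(unichar_size), "surrogatepass")


def from_tcl_units(data: bytes, unichar_size: int = 2) -> str:
    # A high+low surrogate pair decodes to one astral character; unpaired
    # surrogates come back as lone surrogate code points.
    return data.decode(_codec(unichar_size), "surrogatepass")


print(len(to_tcl_units("\U0001f4bb")) // 2)      # 2 code units (a surrogate pair)
print(len(to_tcl_units("\U0001f4bb", 4)) // 4)   # 1 code unit
print(from_tcl_units(to_tcl_units("[\ud800]")) == "[\ud800]")  # True
```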

On Linux, use UTF-8 with the "surrogateescape" error handler for converting
to/from Tcl String objects.
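On the UTF-8 side, "surrogateescape" (PEP 383) maps each byte that is not valid UTF-8 to a lone surrogate in the range U+DC80..U+DCFF and maps it back to the same byte on encoding. A minimal sketch with hypothetical helper names; it ignores any Tcl-specific variations of UTF-8 and is not the `_tkinter` code.

```python
def from_tcl_string(raw: bytes) -> str:
    # Invalid UTF-8 bytes become lone surrogates U+DC80..U+DCFF
    # instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", "surrogateescape")


def to_tcl_string(text: str) -> bytes:
    # The same handler turns those surrogates back into the original bytes.
    return text.encode("utf-8", "surrogateescape")


raw = b"caf\xe9"                   # Latin-1 'é': not valid UTF-8
text = from_tcl_string(raw)
print(ascii(text))                 # 'caf\udce9'
print(to_tcl_string(text) == raw)  # True
```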

Converting strings from Tcl to Python and back now never fails
(except MemoryError).
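The "never fails" claim for the Tcl to Python to Tcl direction can be spot-checked in pure Python: decoding and re-encoding with the same error handler should give back the original data unchanged. This sketch exercises only the Python codecs, not Tcl itself; the seeded random bytes for UTF-8 and the hand-picked UTF-16 code units are arbitrary test inputs, and the helper names are hypothetical.

```python
import random
import struct


def roundtrip_utf8(raw: bytes) -> bytes:
    # Tcl bytes -> Python str -> Tcl bytes, both ways with "surrogateescape".
    return raw.decode("utf-8", "surrogateescape").encode("utf-8", "surrogateescape")


def roundtrip_utf16(units: bytes) -> bytes:
    # UTF-16 code units -> Python str -> UTF-16 code units with "surrogatepass".
    return units.decode("utf-16-le", "surrogatepass").encode("utf-16-le", "surrogatepass")


rng = random.Random(13153)
samples = [bytes(rng.randrange(256) for _ in range(32)) for _ in range(1000)]
assert all(roundtrip_utf8(b) == b for b in samples)

# 'A', a valid surrogate pair, a lone low surrogate, a lone high surrogate, 'B'
units = struct.pack("<6H", 0x0041, 0xD83D, 0xDCBB, 0xDC00, 0xD800, 0x0042)
assert roundtrip_utf16(units) == units
print("round trips OK")
```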

https://bugs.python.org/issue13153

@serhiy-storchaka serhiy-storchaka changed the title bpo-22214: Use OS native encoding for converting between Python and Tcl. bpo-13153: Use OS native encoding for converting between Python and Tcl. Oct 2, 2019
@taleinat
Contributor

taleinat commented Oct 2, 2019

Works on Windows 10!

Member

@terryjreedy terryjreedy left a comment

This works better than I expected. Pasting an astral char (one outside the Basic Multilingual Plane) no longer raises an exception and closes IDLE; instead, the actual char is displayed if the font and OS support it, and a replacement box otherwise. I see the computer emoji below.

>>> print('💻', '\U0001f4bb')  # 💻 was pasted.
💻 💻
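The pasted emoji and the `\U0001f4bb` escape above are the same single code point. "Astral" here means a character outside the Basic Multilingual Plane (above U+FFFF); in UTF-16, which the Windows conversion uses, such a character takes two code units (a surrogate pair). A small illustration; `is_astral` is a hypothetical helper, not an IDLE function.

```python
def is_astral(ch: str) -> bool:
    # Outside the Basic Multilingual Plane: code point above U+FFFF.
    return ord(ch) > 0xFFFF


pasted = "💻"
escaped = "\U0001f4bb"
print(pasted == escaped)                     # True
print(len(pasted), is_astral(pasted))        # 1 True
print(len(pasted.encode("utf-16-le")) // 2)  # 2 (high + low surrogate)
print(is_astral("A"))                        # False
```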

It also fixes printing of astral chars by user code (bpo 2274), which now shows either the char or a replacement box, and tracebacks containing astral chars (bpo 36698). I presume it will also fix the display of file names and file contents containing astral chars; I will test that later.

There is also a problem that I did not expect: editing code that follows an astral char on the same line is garbled. For me, on Windows, the insert cursor | is displayed two chars to the left of where it should be for each astral char it follows on the same line. For instance, to change the f in '\U0001f4bb' when it follows one astral char, position the | cursor as shown in '\U00|01f4bb', press DEL, and type the replacement; pressing Backspace and then typing the replacement does not work correctly. Chars immediately past an astral char cannot be edited at all. This is better than IDLE closing, but if, as I suspect, we cannot change this, the IDLE doc should mention that astral chars disable proper editing on the remainder of the physical line.
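One plausible contributing factor (an assumption, not a diagnosis of the Tk code) is that Python counts an astral char as one position, while Tk, receiving it as a UTF-16 surrogate pair, may count it as two. The hypothetical helper below only shows how the two column counts drift apart by one per preceding astral char; it does not reproduce the exact two-char offset reported above.

```python
def utf16_column(line: str, py_col: int) -> int:
    """Convert a code-point column to a UTF-16 code-unit column.

    Hypothetical helper: each astral char before py_col adds one extra
    unit, because UTF-16 stores it as a surrogate pair.
    """
    return py_col + sum(1 for ch in line[:py_col] if ord(ch) > 0xFFFF)


line = "x = '💻' + '\\U0001f4bb'"
col = line.index("f")                 # column of the 'f' inside the escape
print(col, utf16_column(line, col))   # 17 18
```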

@taleinat
Contributor

taleinat commented Oct 3, 2019

This does indeed seem to work very well and to solve many issues at once.

The remaining issue, mentioned by @terryjreedy, appears to be entirely internal to Tk. The next version of Tk is supposed to have greatly improved Unicode support, so hopefully that will help.

In the meantime, yes, let's get this in with the added warning in the docs.

@serhiy-storchaka serhiy-storchaka merged commit 06cb94b into python:master Oct 4, 2019
@serhiy-storchaka serhiy-storchaka deleted the tkinter-unicode branch October 4, 2019 10:09
@miss-islington
Contributor

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖

@bedevere-bot

GH-16580 is a backport of this pull request to the 3.8 branch.

@bedevere-bot

GH-16581 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 4, 2019
…cl. (pythonGH-16545)

(cherry picked from commit 06cb94b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington added a commit that referenced this pull request Oct 4, 2019
…cl. (GH-16545)

(cherry picked from commit 06cb94b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington added a commit that referenced this pull request Oct 4, 2019
…cl. (GH-16545)

(cherry picked from commit 06cb94b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
jacobneiltaylor pushed a commit to jacobneiltaylor/cpython that referenced this pull request Dec 5, 2019
…cl. (pythonGH-16545)

Labels

type-bug An unexpected behavior, bug, or error

6 participants