gh-101828: Fix `jisx0213` codecs removing null characters by StanFromIreland · Pull Request #139340 · python/cpython · GitHub
Conversation

@StanFromIreland
Member

@StanFromIreland StanFromIreland commented Sep 25, 2025

@corona10
Member

@StanFromIreland Thank you for the investigation. Let me take a look this weekend :)

@corona10 corona10 added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Sep 28, 2025
Member

@corona10 corona10 left a comment

LGTM!
This patch prevents NULL bytes from being consumed as part of character pair encoding in the string. While this fixes the data loss bug, it does change existing behavior, so backporting needs discussion.

cc @methane

@corona10
Member

corona10 commented Oct 4, 2025

I will wait a week for Naoki-san and then plan to merge this PR.

@methane
Member

methane commented Oct 6, 2025

euc_jis_2004 has the same logic. Would you update it too?

>>> "\u00e6abc".encode('euc_jis_2004')
b'\xa9\xdcabc'
>>> "\u00e6\0abc".encode('euc_jis_2004')
b'\xa9\xdcabc'
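The symptom above can be checked across the codecs named in this thread with a short round-trip test. This is a minimal sketch, not part of the PR: the helper names `round_trips` and `check_all` are made up for illustration, and the result depends on whether the interpreter already includes the fix.

```python
# Sample from the thread: U+00E6 followed by a null character.
SAMPLE = "\u00e6\0abc"

# Codecs mentioned in this conversation.
CODECS = [
    "shift_jisx0213",
    "shift_jis_2004",
    "euc_jis_2004",
    "iso2022_jp_3",
    "iso2022_jp_2004",
]


def round_trips(codec, text=SAMPLE):
    """Return True if encoding then decoding gives back the same text."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        # Treat an encode/decode failure as "does not round-trip".
        return False


def check_all():
    """Map each codec name to its round-trip result."""
    return {codec: round_trips(codec) for codec in CODECS}


if __name__ == "__main__":
    for codec, ok in check_all().items():
        print(f"{codec}: {'ok' if ok else 'null character lost'}")
```

On an interpreter without the fix, the affected codecs should report a lost null character; with the fix, the sample is expected to round-trip.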

@StanFromIreland
Member Author

euc_jis_2004 has the same logic. Would you update it too?

Done.

@methane methane changed the title gh-101828: Fix shift_jisx0213 & shift_jis_2004 codecs removing null characters gh-101828: Fix jisx0213 codecs removing null characters Oct 7, 2025
@methane
Member

methane commented Oct 7, 2025

iso2022_jp_3 and iso2022_jp_2004 have the same issue.
Would you add this patch?

diff --git a/Modules/cjkcodecs/_codecs_iso2022.c b/Modules/cjkcodecs/_codecs_iso2022.c
index ef6faeb7127..83afdd0a1ee 100644
--- a/Modules/cjkcodecs/_codecs_iso2022.c
+++ b/Modules/cjkcodecs/_codecs_iso2022.c
@@ -802,10 +802,12 @@ jisx0213_encoder(const MultibyteCodec *codec, const Py_UCS4 *data,
         return coded;

     case 2: /* second character of unicode pair */
-        coded = find_pairencmap((ucs2_t)data[0], (ucs2_t)data[1],
-                                jisx0213_pair_encmap, JISX0213_ENCPAIRS);
-        if (coded != DBCINV)
-            return coded;
+        if (data[1] != 0) { /* Don't consume null char as part of pair */
+            coded = find_pairencmap((ucs2_t)data[0], (ucs2_t)data[1],
+                                    jisx0213_pair_encmap, JISX0213_ENCPAIRS);
+            if (coded != DBCINV)
+                return coded;
+        }
         _Py_FALLTHROUGH;
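The effect of the added `data[1] != 0` guard can be illustrated with a toy model in Python. This is a simplified sketch and not the actual C implementation: it assumes the pair table stores a standalone character as a pair whose second element is 0, and the table contents, labels, and the `encode_pair` helper are hypothetical.

```python
# Toy pair table: (first, second) -> label.
# Assumption: a second element of 0 stands for "no modifier",
# i.e. the standalone encoding of the first character.
PAIR_TABLE = {
    (0x00E6, 0x0000): "AE_ALONE",
    (0x00E6, 0x0300): "AE_WITH_GRAVE",
}


def encode_pair(data, guard_null):
    """Return (label, characters_consumed) for the start of data.

    With guard_null=False, a literal null in data[1] matches the
    standalone entry and is consumed as if it were part of a pair.
    With guard_null=True, the pair lookup is skipped for a null.
    """
    first = data[0]
    if len(data) >= 2:
        second = data[1]
        if not guard_null or second != 0:
            label = PAIR_TABLE.get((first, second))
            if label is not None:
                return label, 2  # both characters consumed
    # Fall through: encode the first character on its own.
    return PAIR_TABLE[(first, 0)], 1


text = [0x00E6, 0x0000, ord("a")]
print(encode_pair(text, guard_null=False))  # null swallowed: 2 consumed
print(encode_pair(text, guard_null=True))   # null left for the next step
```

In this model, a genuine pair such as U+00E6 followed by U+0300 still consumes two characters with the guard enabled, so only the null case changes.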

@corona10 corona10 merged commit 87eadce into python:main Oct 14, 2025
45 checks passed
@miss-islington-app

Thanks @StanFromIreland for the PR, and @corona10 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 14, 2025
…ongh-139340)

(cherry picked from commit 87eadce)

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
@miss-islington-app

Sorry, @StanFromIreland and @corona10, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 87eadce3e0309d80a95e85d70a00028b5dca9907 3.13

@bedevere-app

bedevere-app bot commented Oct 14, 2025

GH-140110 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Oct 14, 2025
@StanFromIreland StanFromIreland deleted the shift_jis branch October 14, 2025 14:03
@StanFromIreland
Member Author

I can backport.

StanFromIreland added a commit to StanFromIreland/cpython that referenced this pull request Oct 14, 2025
pythongh-139340)

(cherry picked from commit 87eadce)

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
@bedevere-app

bedevere-app bot commented Oct 14, 2025

GH-140112 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Oct 14, 2025
corona10 pushed a commit that referenced this pull request Oct 14, 2025
…139340) (gh-140110)

gh-101828: Fix `jisx0213` codecs removing null characters (gh-139340)
(cherry picked from commit 87eadce)

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
corona10 pushed a commit that referenced this pull request Oct 14, 2025
…139340) (gh-140112)

* [3.13] gh-101828: Fix `jisx0213` codecs removing null characters (gh-139340)
(cherry picked from commit 87eadce)

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>

* Accidentally removed line

3 participants