Description
Bug report
When encoding a null-terminated string in shift_jisx0213, the null terminator sometimes gets truncated. To add a null terminator when encoding, I usually use (string + "\0").encode(encoding), which works with most encodings. However, this doesn't seem to be the case here.
Instead, I'm using string.encode(encoding) + "\0".encode(encoding) as a workaround to create the correct result. However, this won't produce the correct result for utf-16, because the BOM would be included twice.
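One possible way to avoid both problems is to encode the text and the terminator through a single incremental encoder, so that any pending state is flushed before the terminator and a BOM is written only once. The following is a minimal sketch, not a confirmed fix: the helper name `encode_with_null` is hypothetical, and it has not been verified against the affected Python version.

```python
import codecs


def encode_with_null(text: str, encoding: str) -> bytes:
    """Encode text and append an encoded null terminator (hypothetical helper)."""
    # One encoder instance is shared by both calls, so stateful codecs such
    # as utf-16 emit their BOM only on the first call.
    encoder = codecs.getincrementalencoder(encoding)()
    # final=True asks the codec to flush anything it is still holding back
    # before the terminator is encoded.
    body = encoder.encode(text, final=True)
    terminator = encoder.encode("\0", final=True)
    return body + terminator


if __name__ == "__main__":
    for name in ("utf-8", "utf-16", "shift_jisx0213"):
        print(name, encode_with_null("バルーンフルーツ", name).hex())
```

Because the text is finalized before the terminator is passed in, the codec never sees the last character and the null character in the same call.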
Consider the following sample script to check this for yourself.
strings: list[str] = [
    "hello world",
    "バルーンフルーツ",
    "バルーンフィッシュ",
    "ライフアップキノコ"
]
encoding = "shift_jisx0213"
for string in strings:
    encoded_direct_null = (string + "\0").encode(encoding)
    encoded_append_null = string.encode(encoding) + "\0".encode(encoding)
    print(repr(string))
    print(" - encoded_append_null (EXPECTED!):", encoded_append_null.hex())
    print(" - encoded_direct_null: ", encoded_direct_null.hex())
    print()
This generates the following results. As you can see, the two results are not always the same: in the second and fourth examples, the null terminator has been removed for some reason. I've tried this with utf-8 and shift_jis as well, but these yield the correct results.
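The comparison across encodings can be automated with a small check like the one below. This is only a sketch: the function name `find_mismatches` is hypothetical, and the comparison is only meaningful for encodings that do not write a BOM (utf-16 would always be reported, for the reason given above).

```python
def find_mismatches(strings: list[str], encodings: list[str]) -> dict[str, list[str]]:
    """Map each encoding to the strings whose two encoded forms differ."""
    mismatches: dict[str, list[str]] = {}
    for encoding in encodings:
        differing = []
        for string in strings:
            # Terminator encoded together with the text.
            direct = (string + "\0").encode(encoding)
            # Terminator encoded separately and appended.
            appended = string.encode(encoding) + "\0".encode(encoding)
            if direct != appended:
                differing.append(string)
        mismatches[encoding] = differing
    return mismatches


if __name__ == "__main__":
    samples = ["hello world", "バルーンフルーツ", "バルーンフィッシュ", "ライフアップキノコ"]
    for name, differing in find_mismatches(samples, ["utf-8", "shift_jis", "shift_jisx0213"]).items():
        print(name, differing)
```

An empty list for an encoding means both approaches produced identical bytes for every sample string.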
'hello world'
- encoded_append_null (EXPECTED!): 68656c6c6f20776f726c6400
- encoded_direct_null: 68656c6c6f20776f726c6400
'バルーンフルーツ'
- encoded_append_null (EXPECTED!): 836f838b815b83938374838b815b836300
- encoded_direct_null: 836f838b815b83938374838b815b8363
'バルーンフィッシュ'
- encoded_append_null (EXPECTED!): 836f838b815b83938374834283628356838500
- encoded_direct_null: 836f838b815b83938374834283628356838500
'ライフアップキノコ'
- encoded_append_null (EXPECTED!): 838983438374834183628376834c836d835200
- encoded_direct_null: 838983438374834183628376834c836d8352
Your environment
- Python: Python 3.10.7
- OS: Windows 10 Home