-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
Fix bpo-36041: fix folding of quoted string in display_name violates RFC #12054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
For display names that are longer than the policy's maximum line length, the folding process removes the quotes from the display name which may violate the structure of an address header if the content of the display name is specifically formed.
Since the _refold_parse_tree is context unaware, the function is unable to determine if we can directly encode the text into the header. In the language of RFC 5322, _refold_parse_tree is unaware if it is folding a structured header or an unstructured header. Since a quoted string will always be acceptable for both header types, when a quoted string needs to be refolded (because it would make the line overflow) prepare quoted children instead of the usual semantic children of the BareQuotedString.
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
@SwampWalker put "Fix bug 36041: folding of quoted string in display_name violates RFC" in the PR title. It is good to add the issue title also. Not just the bpo number. |
I haven't tested it, but it seems to me the problem with this is that if you have something like |
I was just following the instructions in the PR prefill nanjekyejoannah, you might want to see those instructions updated. I've updated the title. |
Bitdancer, you have misunderstood what the fix does. It does not quote the children, it adds a DQUOTE terminal as the first and last child and escapes the intermediate children. The test I added shows that each individual child is not separately quoted. I had considered refactoring the quote_string() function to extract a separate escape_string() function so I wouldn't have to call quote_string()[1:-1] with the followup slicing to remove the quotes. The reason I did not is because that escaping is somewhat specific to quoted strings and making a escape_string_for_quoted_string() method seemed to add more boilerplate than what I thought the function would deserve. |
Ah, I see. You are right, I did not spend enough time understanding the code (I was trying to give you quick feedback and obviously I was too quick :) I will try to review this completely soon. but that sounds like the right solution. |
Does this pull request also happen to fix 40359? (Which is a ticket I created earlier today, about parsing long attachment file names.) |
@bitdancer is there any chance you can can be bothered do another review round on these changes? 🙃 if so, that would be great! 🙏 |
Fix bug 36041: don't unquote quoted strings during email header folding
The refolding process is unable to determine if it is folding an unstructured header (where unquoting and
unescaping the string would be allowed) or a structured header where the quotes are required. Therefore, do the safe thing and keep the quotes and escapes.
https://bugs.python.org/issue36041