KEMBAR78
gh-126807: pygettext: Do not attempt to extract messages from function definitions. by tomasr8 · Pull Request #126808 · python/cpython · GitHub
Skip to content

Conversation

@tomasr8
Copy link
Member

@tomasr8 tomasr8 commented Nov 13, 2024

Fixes a bug where pygettext would attempt to extract a message from a code like this:

def _(x): pass

This is because pygettext only looks at one token at a time and _(x) looks like a function call.

However, since x is not a string literal, it would erroneously issue a warning.

This PR fixes that by keeping track of the previous token and checking if it's def or class.

Fixes a bug where pygettext would attempt
to extract a message from a code like this:

def _(x): pass

This is because pygettext only looks at one
token at a time and '_(x)' looks like a
function call.

However, since 'x' is not a string literal,
it would erroneously issue a warning.

This commit fixes that by keeping track
of the previous token and checking if it's
'def' or 'class'.
@@ -1,11 +1,10 @@
#! /usr/bin/env python3
# -*- coding: iso-8859-1 -*-
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no other file that uses this encoding, I think it's safe (and more practical) to use utf-8.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related change, so please keep the coding cookie.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I'll revert :) Would you accept a separate (perhaps not backported) PR that removes the coding and the commented-out code or do you think it's not worth it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll accept it if there are pygettext tests for files with non-UTF-8 encoding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I'll add it to my todo list :)

Comment on lines 346 to 349
if (
ttype == tokenize.NAME and tstring in opts.keywords
and (not self.__prev_token or not _is_def_or_class_keyword(self.__prev_token))
):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new logic is, if we see one of the gettext keywords and the previous token is not def or class, only then we transition to __keywordseen.

@tomasr8 tomasr8 added needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Nov 13, 2024
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that no warnings are emitted if option --docstrings is used. I think that we can use a similar approach. We can add

            if ttype == tokenize.NAME and tstring in ('class', 'def'):
                self.__state = self.__ignorenext
                return

where __ignorenext simply sets self.__state = self.__waiting.

@@ -1,11 +1,10 @@
#! /usr/bin/env python3
# -*- coding: iso-8859-1 -*-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related change, so please keep the coding cookie.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍

@@ -1,11 +1,10 @@
#! /usr/bin/env python3
# -*- coding: iso-8859-1 -*-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll accept it if there are pygettext tests for files with non-UTF-8 encoding.

@serhiy-storchaka serhiy-storchaka enabled auto-merge (squash) November 14, 2024 21:47
@serhiy-storchaka serhiy-storchaka merged commit 9a45638 into python:main Nov 14, 2024
40 checks passed
@miss-islington-app
Copy link

Thanks @tomasr8 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 14, 2024
…unction definitions. (pythonGH-126808)

Fixes a bug where pygettext would attempt
to extract a message from a code like this:

def _(x): pass

This is because pygettext only looks at one
token at a time and '_(x)' looks like a
function call.

However, since 'x' is not a string literal,
it would erroneously issue a warning.
(cherry picked from commit 9a45638)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Nov 14, 2024

GH-126846 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Nov 14, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 14, 2024
…unction definitions. (pythonGH-126808)

Fixes a bug where pygettext would attempt
to extract a message from a code like this:

def _(x): pass

This is because pygettext only looks at one
token at a time and '_(x)' looks like a
function call.

However, since 'x' is not a string literal,
it would erroneously issue a warning.
(cherry picked from commit 9a45638)

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Nov 14, 2024

GH-126847 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Nov 14, 2024
@tomasr8 tomasr8 deleted the pygettext-126807 branch November 14, 2024 22:19
serhiy-storchaka pushed a commit that referenced this pull request Nov 14, 2024
…function definitions. (GH-126808) (GH-126847)

Fixes a bug where pygettext would attempt
to extract a message from a code like this:

def _(x): pass

This is because pygettext only looks at one
token at a time and '_(x)' looks like a
function call.

However, since 'x' is not a string literal,
it would erroneously issue a warning.
(cherry picked from commit 9a45638)

Co-authored-by: Tomas R <tomas.roun8@gmail.com>
serhiy-storchaka pushed a commit that referenced this pull request Nov 14, 2024
…function definitions. (GH-126808) (GH-126846)

Fixes a bug where pygettext would attempt
to extract a message from a code like this:

def _(x): pass

This is because pygettext only looks at one
token at a time and '_(x)' looks like a
function call.

However, since 'x' is not a string literal,
it would erroneously issue a warning.
(cherry picked from commit 9a45638)

Co-authored-by: Tomas R <tomas.roun8@gmail.com>
picnixz pushed a commit to picnixz/cpython that referenced this pull request Dec 8, 2024
…unction definitions. (pythonGH-126808)

Fixes a bug where pygettext would attempt
to extract a message from a code like this:

def _(x): pass

This is because pygettext only looks at one
token at a time and '_(x)' looks like a
function call.

However, since 'x' is not a string literal,
it would erroneously issue a warning.
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
…unction definitions. (pythonGH-126808)

Fixes a bug where pygettext would attempt
to extract a message from a code like this:

def _(x): pass

This is because pygettext only looks at one
token at a time and '_(x)' looks like a
function call.

However, since 'x' is not a string literal,
it would erroneously issue a warning.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants