KEMBAR78
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. by ericsnowcurrently · Pull Request #30928 · python/cpython · GitHub
Skip to content

Conversation

@ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented Jan 26, 2022

In this PR we're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

  • (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
  • Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
  • Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
  • Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config.

The remainder of the PR updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

I'm planning on addressing the following separately:

  • stop using _Py_IDENTIFIER() in the stdlib modules
  • (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
  • (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541

@ericsnowcurrently ericsnowcurrently changed the title bpo-XXXXX: Replace _Py_IDENTIFIER() with statically initialized global objects. bpo-46541: Replace _Py_IDENTIFIER() with statically initialized global objects. Jan 26, 2022
ericsnowcurrently added a commit that referenced this pull request Jan 27, 2022
This change is a prerequisite for generating code for other global objects (like strings in gh-30928).

(We borrowed some code from Tools/scripts/deepfreeze.py.)

https://bugs.python.org/issue46541
@ericsnowcurrently ericsnowcurrently marked this pull request as ready for review February 1, 2022 01:06
@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

PyObject *
_PyDict_GetItemWithError(PyObject *dp, PyObject *kv)
{
assert(PyUnicode_CheckExact(kv));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a half-baked idea: would it be possible to initialize all hashes of global strings at interpreter startup, then eliminate some branching with a special

int
_PyDict_GetItemStrWithHashInitialized(PyDictObject *mp, PyUnicodeObject *key)
{
    assert(PyDict_CheckExact(mp));
    assert(PyUnicode_CheckExact(key));
    Py_hash_t hash = ((PyASCIIObject *)key)->hash;
    assert(hash != -1);
    // Inline _PyDict_GetItem_KnownHash --> _Py_dict_lookup
    PyDictKeysObject *dk = mp->ma_keys;
    Py_ssize_t ix = dictkeys_stringlookup(dk, key, hash);
    if (ix == DKIX_EMPTY) {
        return NULL;
    }
    else if (kind == DICT_KEYS_SPLIT) {
        return mp->ma_values->values[ix];
    }
    else {
        return DK_ENTRIES(dk)[ix].me_value;
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericsnowcurrently
Copy link
Member Author

@markshannon approved this offline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants