KEMBAR78
gh-135532: optimize calls to `PyMem_Malloc` in SHAKE digest computation by picnixz · Pull Request #135744 · python/cpython · GitHub
Skip to content

Conversation

@picnixz
Copy link
Member

@picnixz picnixz commented Jun 20, 2025

@picnixz
Copy link
Member Author

picnixz commented Jun 20, 2025

!buildbot FIPS only

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @picnixz for commit de09602 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F135744%2Fmerge

The command will test the builders whose names match following regular expression: FIPS only

The builders matched are:

  • AMD64 RHEL8 FIPS Only Blake2 Builtin Hash PR
  • AMD64 CentOS9 FIPS Only Blake2 Builtin Hash PR

@picnixz picnixz force-pushed the refactor/sha3/digest-computation-135532 branch from de09602 to 7852a46 Compare June 20, 2025 09:27
@picnixz picnixz added the type-refactor Code refactoring (with no changes in behavior) label Jun 20, 2025
@picnixz
Copy link
Member Author

picnixz commented Jun 20, 2025

!buildbot FIPS only

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @picnixz for commit 7852a46 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F135744%2Fmerge

The command will test the builders whose names match following regular expression: FIPS only

The builders matched are:

  • AMD64 RHEL8 FIPS Only Blake2 Builtin Hash PR
  • AMD64 CentOS9 FIPS Only Blake2 Builtin Hash PR

@picnixz picnixz marked this pull request as draft June 20, 2025 12:02
@picnixz
Copy link
Member Author

picnixz commented Jun 20, 2025

I should have thought a bit more but I don't know why we have a hard limit at 2**29. It looks like we're reserving 4 bits for the output length, just in case, but we should be able to use a digest of arbitrary length. There was a possible of overflows in 3.6 but this was fixed, and since then, as we're using OpenSSL, we don't have that issue anymore. And HACL* only requires the length to be an uint32_t.

So, I think we should limit ourselves to uint32_t in both cases, even when using OpenSSL. It doesn't make sense to have a digest of size 2**32. SHAKE-128 is 128-bit secure with 32 bytes while SHAKE-256 is 256-bit secure with 64 bytes for the output length.

Now, for OpenSSL's default provider, we have:

int EVP_DigestFinalXOF(EVP_MD_CTX *ctx, unsigned char *md, size_t size)
{
    int ret = 0;
    OSSL_PARAM params[2];
    size_t i = 0;

    if (ctx->digest == NULL) {
        ERR_raise(ERR_LIB_EVP, EVP_R_INVALID_NULL_ALGORITHM);
        return 0;
    }

    if (ctx->digest->prov == NULL)
        goto legacy;

    if (ctx->digest->dfinal == NULL) {
        ERR_raise(ERR_LIB_EVP, EVP_R_FINAL_ERROR);
        return 0;
    }

    if ((ctx->flags & EVP_MD_CTX_FLAG_FINALISED) != 0) {
        ERR_raise(ERR_LIB_EVP, EVP_R_FINAL_ERROR);
        return 0;
    }

    /*
     * For backward compatibility we pass the XOFLEN via a param here so that
     * older providers can use the supplied value. Ideally we should have just
     * used the size passed into ctx->digest->dfinal().
     */
    params[i++] = OSSL_PARAM_construct_size_t(OSSL_DIGEST_PARAM_XOFLEN, &size);
    params[i++] = OSSL_PARAM_construct_end();

    if (EVP_MD_CTX_set_params(ctx, params) >= 0)
        ret = ctx->digest->dfinal(ctx->algctx, md, &size, size);

    ctx->flags |= EVP_MD_CTX_FLAG_FINALISED;

    return ret;

legacy:
    if (EVP_MD_xof(ctx->digest)
        && size <= INT_MAX
        && ctx->digest->md_ctrl(ctx, EVP_MD_CTRL_XOF_LEN, (int)size, NULL)) {
        ret = ctx->digest->final(ctx, md);
        if (ctx->digest->cleanup != NULL) {
            ctx->digest->cleanup(ctx);
            EVP_MD_CTX_set_flags(ctx, EVP_MD_CTX_FLAG_CLEANED);
        }
        OPENSSL_cleanse(ctx->md_data, ctx->digest->ctx_size);
    } else {
        ERR_raise(ERR_LIB_EVP, EVP_R_NOT_XOF_OR_INVALID_LENGTH);
    }

    return ret;
}

So with a NULL provider, we expect size <= INT_MAX, even if it fits on a SIZE_T. On 32-bit systems, it's (1<<31). So the limit of 2**29 is arbitrary IMO and should be replaced by UINT32_MAX.

  • With openssl legacy provider 16-bit platform: not supported
  • with openssl modern provider 32/64-bit platforms: size_t supported, should be at least a uint32_t for Python.
  • with HACL*: only uint32_t is supported

Thus, we can just require the digest length to fit on an uint32_t.

@picnixz picnixz force-pushed the refactor/sha3/digest-computation-135532 branch 2 times, most recently from b69e8e4 to f885e8c Compare June 20, 2025 13:24
@picnixz picnixz marked this pull request as ready for review June 20, 2025 13:25
@picnixz picnixz force-pushed the refactor/sha3/digest-computation-135532 branch 2 times, most recently from 4c5de6e to b26da82 Compare June 20, 2025 13:40
@picnixz picnixz marked this pull request as draft June 20, 2025 14:08
@picnixz picnixz force-pushed the refactor/sha3/digest-computation-135532 branch 3 times, most recently from 3b49ff6 to 5a83082 Compare June 20, 2025 15:49
@picnixz

This comment was marked as resolved.

@picnixz picnixz force-pushed the refactor/sha3/digest-computation-135532 branch from 5a83082 to fc54897 Compare June 21, 2025 10:44
@picnixz picnixz marked this pull request as ready for review June 21, 2025 11:00
@picnixz
Copy link
Member Author

picnixz commented Jun 21, 2025

!buildbot FIPS only

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @picnixz for commit fc54897 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F135744%2Fmerge

The command will test the builders whose names match following regular expression: FIPS only

The builders matched are:

  • AMD64 RHEL8 FIPS Only Blake2 Builtin Hash PR
  • AMD64 CentOS9 FIPS Only Blake2 Builtin Hash PR

@picnixz picnixz merged commit d08b4b2 into python:main Jun 21, 2025
43 checks passed
@picnixz picnixz deleted the refactor/sha3/digest-computation-135532 branch June 21, 2025 12:32
Pranjal095 pushed a commit to Pranjal095/cpython that referenced this pull request Jul 12, 2025
…putation (python#135744)

- Add a fast path when the digest length is 0 to avoid calling useless functions.
- Directly allocate via `PyBytes_FromStringAndSize(NULL, length)` when possible.
taegyunkim pushed a commit to taegyunkim/cpython that referenced this pull request Aug 4, 2025
…putation (python#135744)

- Add a fast path when the digest length is 0 to avoid calling useless functions.
- Directly allocate via `PyBytes_FromStringAndSize(NULL, length)` when possible.
Agent-Hellboy pushed a commit to Agent-Hellboy/cpython that referenced this pull request Aug 19, 2025
…putation (python#135744)

- Add a fast path when the digest length is 0 to avoid calling useless functions.
- Directly allocate via `PyBytes_FromStringAndSize(NULL, length)` when possible.
}

CHECK_HACL_UINT32_T_LENGTH(length);
PyObject *digest = PyBytes_FromStringAndSize(NULL, length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check for error (NULL) to detect memory allocation failure. I wrote #138923 which fix indirectly the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip news type-refactor Code refactoring (with no changes in behavior)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants