Fix MXFP4 quantizer validation to allow CPU inference with dequantize option by returnL · Pull Request #39953 · huggingface/transformers · GitHub

Conversation

@returnL
Contributor

@returnL returnL commented Aug 6, 2025

What does this PR do?

This PR fixes a bug that prevented MXFP4 models from running on CPU when quantization_config.dequantize=True was set.

Problem

The validation logic in Mxfp4HfQuantizer checked CUDA availability before checking the dequantize flag, so loading failed in CPU-only environments even when dequantization was enabled.

Solution

Reordered the validation checks to prioritize the dequantize configuration (a sketch of the new order follows the list):

  1. Check if dequantize is enabled - if yes, skip GPU validations
  2. Only then check CUDA availability
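In code terms, the reordering looks roughly like the following. This is a minimal sketch written as a standalone helper; the function name and the trailing GPU checks are illustrative, not the verbatim Mxfp4HfQuantizer.validate_environment from transformers:

```python
import torch


def validate_mxfp4_environment(quantization_config):
    """Sketch of the reordered validation (hypothetical helper)."""
    # 1. If dequantize is enabled, weights are upcast at load time,
    #    so a CPU-only environment is fine: skip all GPU validations.
    if getattr(quantization_config, "dequantize", False):
        return
    # 2. Only then require a CUDA device for true MXFP4 execution.
    if not torch.cuda.is_available():
        raise RuntimeError(
            "MXFP4 quantized inference requires a CUDA GPU; "
            "set quantization_config.dequantize=True to run on CPU."
        )
    # ... remaining GPU-specific checks (compute capability, kernels) ...
```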

Changes Made

  • Fix: Moved dequantize check before CUDA validation in quantizer_mxfp4.py
  • Tests: Added test cases to verify CPU inference with dequantize=True (a condensed sketch follows)
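The added tests live in the MXFP4 test file; the condensed, hypothetical version below shows the shape of the check (the actual test names and setup in the PR may differ):

```python
from unittest.mock import patch

from transformers import Mxfp4Config
from transformers.quantizers.quantizer_mxfp4 import Mxfp4HfQuantizer


def test_cpu_validation_passes_with_dequantize():
    # With dequantize=True, validation should succeed without CUDA.
    quantizer = Mxfp4HfQuantizer(Mxfp4Config(dequantize=True))
    with patch("torch.cuda.is_available", return_value=False):
        quantizer.validate_environment()  # expected: no exception raised
```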

Before submitting

  • Did you read the contributor guideline?
  • Did you write any new necessary tests?

Who can review?

@SunMarc @MekkCyber

returnL added 3 commits August 6, 2025 17:05
Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.
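For reference, CPU loading with this fix in place would look roughly like the snippet below. This is a usage sketch, not taken from the PR; the checkpoint id is only an example of an MXFP4-quantized model:

```python
from transformers import AutoModelForCausalLM, Mxfp4Config

# dequantize=True upcasts the MXFP4 weights to BF16 at load time,
# so the model can run on a CPU-only machine.
config = Mxfp4Config(dequantize=True)
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",  # example MXFP4 checkpoint, not named in the PR
    quantization_config=config,
    device_map="cpu",
)
```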
@github-actions
Contributor

github-actions bot commented Aug 6, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: mxfp4

Contributor

@MekkCyber MekkCyber left a comment

LGTM! thanks for fixing this and thanks for adding tests 🤗

Member

@SunMarc SunMarc left a comment

Thanks !

@MekkCyber MekkCyber merged commit dd70a8c into huggingface:main Aug 6, 2025
24 checks passed
@returnL returnL deleted the fix/mxfp4-cpu-dequantize-validation branch August 6, 2025 16:19
@SunMarc SunMarc added the for patch label Aug 6, 2025
ArthurZucker pushed a commit that referenced this pull request Aug 13, 2025
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)

* Fix MXFP4 quantizer validation to enable CPU dequantization

Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.

* Add tests for MXFP4 quantizer CPU dequantization validation

* fix: format mxfp4 test file with ruff