Fix int4 quantized model cannot work with cpu by yuanwu2017 · Pull Request #39724 · huggingface/transformers

Conversation

@yuanwu2017 (Contributor) commented Jul 28, 2025

What does this PR do?

The CPU already supports int4 weight-only quantized models via torchao's Int4CPULayout, so Transformers should not prevent their execution on CPU. The following script quantizes a model on CPU, saves it, reloads it, and runs generation:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig
from torchao.dtypes import Int4CPULayout
from torchao.quantization import Int4WeightOnlyConfig

# Int4CPULayout selects torchao's CPU kernel path for int4 weight-only quantization
quant_config = Int4WeightOnlyConfig(group_size=32, layout=Int4CPULayout())
quantization_config = TorchAoConfig(quant_type=quant_config)

# Load and quantize the model on CPU
quantized_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="cpu",
    quantization_config=quantization_config,
)

# Save the quantized model (torchao checkpoints need safe_serialization=False)
output_dir = "llama-3.1-8b-torchao-int4"
quantized_model.save_pretrained(output_dir, safe_serialization=False)

# Reload the quantized model
reloaded_model = AutoModelForCausalLM.from_pretrained(
    output_dir,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
input_text = "What are we having for dinner?"

# Move the inputs to the same device as the model
inputs = tokenizer(input_text, return_tensors="pt").to(reloaded_model.device)

output = reloaded_model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Fix the following issue:

(screenshot: the error raised when trying to run the int4 quantized model on CPU)
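
For context, here is a minimal sketch of the idea behind the fix. The function name and check below are illustrative assumptions, not the actual transformers diff: instead of rejecting every int4 torchao config when the target device is CPU, the environment check should only reject layouts that genuinely require an accelerator.

# Illustrative sketch only -- not the actual change in this PR.
# Idea: an int4 weight-only config using a CPU layout (Int4CPULayout)
# should pass the device check instead of being rejected outright.
from torchao.dtypes import Int4CPULayout
from torchao.quantization import Int4WeightOnlyConfig

def validate_int4_device(quant_type, device_map):
    """Hypothetical guard: only block int4 configs whose layout needs a GPU."""
    if not isinstance(quant_type, Int4WeightOnlyConfig):
        return  # nothing to check for non-int4 configs
    layout = getattr(quant_type, "layout", None)
    if device_map == "cpu" and not isinstance(layout, Int4CPULayout):
        raise ValueError(
            "This int4 layout requires an accelerator; pass "
            "layout=Int4CPULayout() to run on CPU."
        )

With a check of this shape, the repro script above passes on CPU because it explicitly uses Int4CPULayout.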

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Signed-off-by: yuanwu <yuan.wu@intel.com>
@yuanwu2017 yuanwu2017 marked this pull request as draft July 28, 2025 09:10
Signed-off-by: yuanwu <yuan.wu@intel.com>
@Rocketknight1 (Member) commented:

cc @SunMarc @MekkCyber

@MekkCyber (Contributor) left a comment:

Yes definitely! Thanks for catching this

@yuanwu2017 yuanwu2017 marked this pull request as ready for review July 29, 2025 03:39
@MekkCyber MekkCyber requested a review from SunMarc July 29, 2025 07:47
@SunMarc (Member) left a comment:

LGTM, just a nit

yuanwu2017 and others added 2 commits August 3, 2025 04:56
Signed-off-by: yuanwu <yuan.wu@intel.com>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yuanwu2017 and others added 2 commits August 6, 2025 01:31
Signed-off-by: yuanwu <yuan.wu@intel.com>
@SunMarc SunMarc enabled auto-merge (squash) August 6, 2025 13:02
@SunMarc SunMarc merged commit bf1bd6a into huggingface:main Aug 7, 2025
24 checks passed