Fix int4 quantized model cannot work with cpu by yuanwu2017 · Pull Request #39724 · huggingface/transformers

Conversation

@yuanwu2017 (Contributor) commented Jul 28, 2025

What does this PR do?

The CPU already supports int4 weight-only quantized models via torchao's Int4CPULayout, so Transformers should not prevent their execution on CPU. The following script quantizes a model on CPU, saves it, reloads it, and runs generation:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig
from torchao.dtypes import Int4CPULayout
from torchao.quantization import Int4WeightOnlyConfig

# Int4CPULayout selects torchao's CPU kernel path for int4 weight-only quantization
quant_config = Int4WeightOnlyConfig(group_size=32, layout=Int4CPULayout())
quantization_config = TorchAoConfig(quant_type=quant_config)

# Load and quantize the model on CPU
quantized_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="cpu",
    quantization_config=quantization_config,
)

# Save the quantized model (torchao checkpoints need safe_serialization=False)
output_dir = "llama-3.1-8b-torchao-int4"
quantized_model.save_pretrained(output_dir, safe_serialization=False)

# Reload the quantized model
reloaded_model = AutoModelForCausalLM.from_pretrained(
    output_dir,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
input_text = "What are we having for dinner?"

# Move the inputs to the same device as the model
inputs = tokenizer(input_text, return_tensors="pt").to(reloaded_model.device)

output = reloaded_model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Fix the following issue:

(screenshot: the error raised when trying to run the int4 quantized model on CPU)
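
For context, here is a minimal sketch of the idea behind the fix. The function name and check below are illustrative assumptions, not the actual transformers diff: instead of rejecting every int4 torchao config when the target device is CPU, the environment check should only reject layouts that genuinely require an accelerator.

# Illustrative sketch only -- not the actual change in this PR.
# Idea: an int4 weight-only config using a CPU layout (Int4CPULayout)
# should pass the device check instead of being rejected outright.
from torchao.dtypes import Int4CPULayout
from torchao.quantization import Int4WeightOnlyConfig

def validate_int4_device(quant_type, device_map):
    """Hypothetical guard: only block int4 configs whose layout needs a GPU."""
    if not isinstance(quant_type, Int4WeightOnlyConfig):
        return  # nothing to check for non-int4 configs
    layout = getattr(quant_type, "layout", None)
    if device_map == "cpu" and not isinstance(layout, Int4CPULayout):
        raise ValueError(
            "This int4 layout requires an accelerator; pass "
            "layout=Int4CPULayout() to run on CPU."
        )

With a check of this shape, the repro script above passes on CPU because it explicitly uses Int4CPULayout.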

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Signed-off-by: yuanwu <yuan.wu@intel.com>
@yuanwu2017 yuanwu2017 marked this pull request as draft July 28, 2025 09:10
Signed-off-by: yuanwu <yuan.wu@intel.com>
@Rocketknight1 (Member) commented:

cc @SunMarc @MekkCyber

@MekkCyber (Contributor) left a comment:

Yes definitely! Thanks for catching this

@yuanwu2017 yuanwu2017 marked this pull request as ready for review July 29, 2025 03:39
@MekkCyber MekkCyber requested a review from SunMarc July 29, 2025 07:47
@SunMarc (Member) left a comment:

LGTM, just a nit

yuanwu2017 and others added 2 commits August 3, 2025 04:56
Signed-off-by: yuanwu <yuan.wu@intel.com>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yuanwu2017 and others added 2 commits August 6, 2025 01:31
Signed-off-by: yuanwu <yuan.wu@intel.com>
@SunMarc SunMarc enabled auto-merge (squash) August 6, 2025 13:02
@SunMarc SunMarc merged commit bf1bd6a into huggingface:main Aug 7, 2025
24 checks passed