Fix non FA2 tests after FA2 installed in CI docker image by ydshieh · Pull Request #40430 · huggingface/transformers · GitHub

Conversation

@ydshieh
Collaborator

@ydshieh ydshieh commented Aug 25, 2025

What does this PR do?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh ydshieh marked this pull request as ready for review August 25, 2025 15:07
model = Glm4vForConditionalGeneration.from_pretrained(
    "THUDM/GLM-4.1V-9B-Thinking", dtype=torch.float16, device_map="auto"
)
questions = ["Describe this video."] * 2
Collaborator Author

Don't use batch size 2, otherwise it OOMs. It's fine to simply test batch size 1 here; the goal is to check that video input works.
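For context, a minimal sketch of the batch-size-1 path this comment describes; the message construction and video_url below are illustrative assumptions, not the exact test code:

question = "Describe this video."  # single prompt to avoid the OOM seen with batch size 2
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "video", "url": video_url},  # video_url is a placeholder for the test's video source
            {"type": "text", "text": question},
        ],
    }
]
inputs = processor.apply_chat_template(
    conversation, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=30)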

Contributor

Hmm, we have the same for images at

@slow
def test_small_model_integration_test(self):
    model = Glm4vForConditionalGeneration.from_pretrained(
        "THUDM/GLM-4.1V-9B-Thinking", dtype="auto", device_map="auto"
    )
    inputs = self.processor.apply_chat_template(
        self.message, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
    )

    expected_input_ids = [151331, 151333, 151336, 198, 151339, 151343, 151343, 151343, 151343, 151343, 151343, 151343, 151343, 151343, 151343, 151343, 151343]  # fmt: skip
    assert expected_input_ids == inputs.input_ids[0].tolist()[:17]

    expected_pixel_slice = torch.tensor(
        [
            [-0.0988, -0.0842, -0.0842],
            [-0.5660, -0.5514, -0.4200],
            [-0.0259, -0.0259, -0.0259],
            [-0.1280, -0.0988, -0.2010],
            [-0.4638, -0.5806, -0.6974],
            [-1.2083, -1.2229, -1.2083],
        ],
        dtype=torch.float32,
        device="cpu",
    )
    assert torch.allclose(expected_pixel_slice, inputs.pixel_values[:6, :3], atol=3e-3)

    # verify generation
    inputs = inputs.to(torch_device)
    output = model.generate(**inputs, max_new_tokens=30)
    EXPECTED_DECODED_TEXT = "\nWhat kind of dog is this?\n<think>Got it, let's look at the image. The animal in the picture is not a dog; it's a cat. Specifically, it looks"
    self.assertEqual(
        self.processor.decode(output[0], skip_special_tokens=True),
        EXPECTED_DECODED_TEXT,
    )

@slow
def test_small_model_integration_test_batch(self):
    model = Glm4vForConditionalGeneration.from_pretrained(
        "THUDM/GLM-4.1V-9B-Thinking", dtype="auto", device_map="auto"
    )
    batch_messages = [self.message] * 2
    inputs = self.processor.apply_chat_template(
        batch_messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
    ).to(torch_device)

    # it should not matter whether two images are the same size or not
    output = model.generate(**inputs, max_new_tokens=30)
    EXPECTED_DECODED_TEXT = [
        "\nWhat kind of dog is this?\n<think>Got it, let's look at the image. The animal in the picture is not a dog; it's a cat. Specifically, it looks",
        "\nWhat kind of dog is this?\n<think>Got it, let's look at the image. The animal in the picture is not a dog; it's a cat. Specifically, it looks",
    ]  # fmt: skip
    self.assertEqual(
        self.processor.batch_decode(output, skip_special_tokens=True),
        EXPECTED_DECODED_TEXT,
    )

Would it be possible to have something similar for videos? Fine with this as well though, just not sure if batching x videos could hit different issues than batching x images.
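For reference, a rough sketch of what a batched video variant could look like, modeled on test_small_model_integration_test_batch above; video_message is an assumed fixture holding a conversation with a video entry, not code from this PR:

batch_messages = [video_message] * 2  # two identical video conversations, mirroring the image batch test
inputs = processor.apply_chat_template(
    batch_messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(torch_device)
output = model.generate(**inputs, max_new_tokens=30)
decoded = processor.batch_decode(output, skip_special_tokens=True)
# decoded would then be compared against an EXPECTED_DECODED_TEXT list of length 2;
# the open question is whether two videos fit in memory where two images do.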

Collaborator Author

@ydshieh ydshieh Aug 26, 2025

I will just leave it as it is. I don't have enough bandwidth, and it never passed anyway.

model_id = "mistralai/Mistral-7B-v0.1"
EXPECTED_COMPLETIONS = [
    "This is a nice place. This is a nice place. This is a nice place. This is",
    "scenery, scenery, scenery, scenery, scenery,",
Collaborator Author

Due to the 800 --> 682 change below.

Contributor

Repetition goes wild 😆

Comment on lines 355 to 356
if attn_implementation in ["flex_attention", "eager"]:
    input_text = input_text[:1]
Collaborator Author

eager still OOMs with 682, so just make it batch size 1.

Contributor

Small comment to add ^
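Something along these lines would do as the inline comment (the wording is only a suggestion):

# flex_attention and eager still OOM even with the shorter 682-token prompt,
# so reduce to batch size 1 for these implementations.
if attn_implementation in ["flex_attention", "eager"]:
    input_text = input_text[:1]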

@pytest.mark.flash_attn_test
def test_model_600m_long_prompt(self):
-   EXPECTED_OUTPUT_TOKEN_IDS = [306, 338]
+   EXPECTED_OUTPUT_TOKEN_IDS = [198, 198]
Collaborator Author

Never run before. The sdpa version of this test already uses this 198.

Contributor

Maybe move to the same test and parametrize instead? Looks like another parity check between sdpa and flash that was hidden.
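A rough sketch of the suggested parametrization, using parameterized.expand since these are unittest-style tests; the parameter names and the elided body are illustrative, and as noted below the two existing tests differ in loading (4-bit vs. not), so this would not be a drop-in merge:

from parameterized import parameterized

@slow
@parameterized.expand(
    [
        ("sdpa", [198, 198]),
        ("flash_attention_2", [198, 198]),
    ]
)
def test_model_600m_long_prompt(self, attn_implementation, expected_output_token_ids):
    # Hypothetical merged test: load the model with the given attn_implementation,
    # generate on the long prompt, and compare the last tokens against
    # expected_output_token_ids.
    ...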

Collaborator Author

Looking at the 2 tests, one does more work than the other, and the loading is also different (4-bit or not). I will simply keep them as they are.

Contributor

I see, makes sense, didn't look in detail myself ^^

@ydshieh ydshieh requested a review from vasqu August 25, 2025 15:25
Contributor

@vasqu vasqu left a comment


LGTM overall! Just some smaller things and good to see that we now run these hidden tests instead


@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm4v, mistral, qwen2, qwen3

@ydshieh ydshieh merged commit 922e65b into main Aug 26, 2025
19 checks passed
@ydshieh ydshieh deleted the fix_non_fa2_tests_after_fa2_in_images branch August 26, 2025 08:36