Fixed incorrect normalization by audioXD · Pull Request #40436 · huggingface/transformers · GitHub

Conversation

@audioXD
Contributor

@audioXD audioXD commented Aug 25, 2025

I've noticed a possible typo in src/transformers/image_processing_utils_fast.py#compile_friendly_resize: for uint8 inputs the normalization is slightly off, using 256 instead of 255. It still works because it is applied consistently (the image is normalized and denormalized the same way), but it is incorrect.

image = image.float() / 256
image = image * 256

The explanation is simple:

255 (the max value for a uint8) should map to 1.0, but it doesn't with the current implementation.
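
A quick sketch illustrating the point (not from the PR, just the arithmetic):

import torch

image = torch.tensor([0, 128, 255], dtype=torch.uint8)

# Dividing by 256 never reaches 1.0: 255 / 256 ≈ 0.9961
print(image.float() / 256)  # tensor([0.0000, 0.5000, 0.9961])

# Dividing by 255 maps the uint8 maximum to exactly 1.0
print(image.float() / 255)  # tensor([0.0000, 0.5020, 1.0000])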

@Rocketknight1
Member

cc @yonigozlan @qubvel

Contributor

@qubvel qubvel left a comment

Thanks, makes sense 👍

@qubvel qubvel enabled auto-merge (squash) August 26, 2025 13:42
@qubvel qubvel merged commit 5a8ba87 into huggingface:main Aug 26, 2025
24 checks passed

@yonigozlan
Member

@remi-or I think you had used 256 on purpose? Can you check that this change isn't breaking?

@remi-or
Collaborator

remi-or commented Sep 2, 2025

@yonigozlan I think I tested it against torchvision.resize and the parity test passed with 256. Maybe the snippet is in the PR?
edit: the PR is here: #38540

Comment on lines 310 to 311
image = torch.where(image > 255, 255, image)
image = torch.where(image < 0, 0, image)
Contributor

btw, it might be more optimal to use torch.clamp(image, 0, 255) once instead of torch.where twice
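
In eager mode the two forms are equivalent; a minimal check (variable names are illustrative):

import torch

image = torch.tensor([-10.0, 100.0, 300.0])

# Two torch.where calls, as in the current code
clipped = torch.where(image > 255, 255, image)
clipped = torch.where(clipped < 0, 0, clipped)

# A single clamp
clamped = torch.clamp(image, 0, 255)

assert torch.equal(clipped, clamped)

(The replies below explain why torch.where is kept anyway.)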

Collaborator

Normally I would agree! But this is on purpose: #38540

Contributor

Ok, according to that PR it seems we have to revert this one to use 256 and keep torch.where. To prevent further regressions in this code, we have to either add tests that fail on CI (CUDA) if it is modified, or properly comment the code.
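
A regression test along these lines could work; the call signature of compile_friendly_resize is assumed here (tensor image plus target size), so treat it as a sketch:

import torch
from torchvision.transforms.v2 import functional as F
from transformers.image_processing_utils_fast import compile_friendly_resize

def test_compile_friendly_resize_matches_torchvision():
    image = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8)
    expected = F.resize(image, [32, 32])               # torchvision reference
    actual = compile_friendly_resize(image, (32, 32))  # assumed call form
    torch.testing.assert_close(actual, expected)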

Collaborator

The regression this PR introduced already caused failures on the AMD CI, which is as important as the NVIDIA (or CUDA) CI!
As for properly commenting the code, both code paths where compile_friendly_resize is called are commented. You can check it out by expanding the diff; those lines are right above the function 🙂
If you want, we can add # this is to match torchvision.resize next to the 256 and # We use torch.where instead of torch.clamp to avoid an error with torch.compile as comments to make sure no one introduces the regression again. Wdyt?
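
For reference, the commented lines could look like this (the resize call in the middle is a placeholder; the rest is reconstructed from the snippets quoted in this thread):

image = image.float() / 256  # this is to match torchvision.resize, see #38540
image = resize(image, size)  # placeholder for the actual resize call
image = image * 256
# We use torch.where instead of torch.clamp to avoid an error with torch.compile
image = torch.where(image > 255, 255, image)
image = torch.where(image < 0, 0, image)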

Contributor

@remi-or absolutely agree that the AMD CI is as important as NVIDIA; what I'm trying to say is that we need a test that fails in the PR's CI to prevent a PR like this from being merged. In terms of comments, yeah, it's better to comment non-obvious code right in place; otherwise it looks like a typo, and it's easy to miss a comment located in a different part of the code (which is what happened here).
I'll do a quick fix for this, thanks for jumping in and clarifying 🤗
