Context
What is the purpose of this PR?
The issue first reported in #1874 is that eval fails for Llama 3.2 Vision 11B. The root cause is that VisionCrossAttentionMask was padding the masks to 4 tiles during inference, while the padded_collate_tiled_images_and_mask function assumed the masks were unpadded and therefore inferred incorrect shape information.
The solution is to remove all inference-time padding logic from the mask transform and instead pass pad_max_tiles=4 to the collate function during inference and eval, so that the collate function handles all of the padding in one place.
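As a rough sketch of this division of labor (pad_max_tiles is the real parameter name; the helper, shapes, and list-based masks below are illustrative, not torchtune's actual implementation), the collate step pads every per-image mask row out to a fixed tile count so downstream shape inference is always consistent:

```python
def pad_mask_to_max_tiles(mask_row, pad_max_tiles, tokens_per_tile):
    """Pad one image's cross-attention mask row with zeros so it always
    covers pad_max_tiles tiles, regardless of the image's actual tile count.

    Hypothetical helper for illustration only.
    """
    target_len = pad_max_tiles * tokens_per_tile
    assert len(mask_row) <= target_len, "mask longer than the padded size"
    return mask_row + [0] * (target_len - len(mask_row))

# Example: an image that produced 2 tiles of 3 tokens each, padded to 4 tiles.
row = [1, 1, 1, 1, 1, 1]
padded = pad_mask_to_max_tiles(row, pad_max_tiles=4, tokens_per_tile=3)
# len(padded) == 12; the trailing 6 entries are zero padding.
```

Because only the collate function pads, the mask transform can emit masks sized to each image's true tile count, and the collate function never has to guess whether padding already happened.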
During this investigation I also found that padded_collate_tiled_images_and_mask was using the image_seq_len variable from the last iteration of a loop, so if a sample contained multiple images of different sizes the variable would be wrong after the loop finished. Updated this as well.
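This is a common Python pitfall: a name assigned inside a for loop keeps only the value from the final iteration. A minimal illustration of the bug and one way to fix it (list lengths stand in for the real tensor shapes; the actual fix in padded_collate_tiled_images_and_mask may differ in detail):

```python
def buggy_image_seq_len(image_masks):
    for mask in image_masks:
        image_seq_len = len(mask)  # reassigned on every iteration
    return image_seq_len  # reflects only the last image

def fixed_image_seq_len(image_masks):
    # Consider every image, not just the last one processed.
    return max(len(mask) for mask in image_masks)

masks = [[1] * 5, [1] * 8, [1] * 3]  # three images of different sizes
buggy_image_seq_len(masks)  # 3, the last image's length
fixed_image_seq_len(masks)  # 8, the true maximum
```

With images of identical size the two functions agree, which is why the bug only surfaced with multiple differently-sized images.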
Changelog
What are the changes made in this PR?
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these, just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- Ran pre-commit hooks and linters (installed via `pre-commit install`)
- `pytest tests`
- `pytest tests -m integration_test`

`tune run dev/generate_v2 --config llama3_2_vision/generation_v2`: ran as usual

`tune run full_finetune_single_device --config llama3_2_vision/11B_full_single_device`: ran as usual

`tune run eleuther_eval --config llama3_2_vision/evaluation`: fixed