Qwen2.5 by calvinpelletier · Pull Request #1863 · meta-pytorch/torchtune · GitHub

Conversation

@calvinpelletier
Contributor

@calvinpelletier calvinpelletier commented Oct 18, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Issue #1624

Changelog

  • updated the list of special tokens
  • created a prompt template
  • added model builders
  • added a unit test for Qwen2.5 tokenization
  • moved special token logic from template to tokenizer
  • created separate qwen2/2.5 tokenizers
  • separated base and instruct model builders
  • created various configs

Test plan

Checked tokenizer and chat template parity with the current official Qwen2.5 implementation (Hugging Face).

Ran Qwen2.5 finetunes
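
For illustration, a rough sketch of this kind of parity check is below (not the exact script used; the torchtune builder name, file layout, and some method names/flags are assumptions and may need adjusting):

```python
# Rough parity-check sketch: compare Hugging Face's Qwen2.5 tokenizer and chat
# template against the torchtune Qwen2.5 tokenizer added in this PR.
# The torchtune builder name, file paths, and exact flags are assumptions.
from transformers import AutoTokenizer

from torchtune.data import Message
from torchtune.models.qwen2_5 import qwen2_5_tokenizer  # builder name assumed

hf_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
tt_tok = qwen2_5_tokenizer(
    path="/tmp/Qwen2_5-7B-Instruct/vocab.json",         # file layout assumed
    merges_file="/tmp/Qwen2_5-7B-Instruct/merges.txt",
)

# Plain-text parity (special-token handling may need to be aligned first).
text = "Hello, Qwen2.5!"
print(hf_tok.encode(text) == tt_tok.encode(text, add_bos=False, add_eos=False))

# Chat-template parity on a small conversation.
chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]
hf_ids = hf_tok.apply_chat_template(chat)
tt_ids, _ = tt_tok.tokenize_messages(
    [Message(role=m["role"], content=m["content"]) for m in chat]
)
print(hf_ids == tt_ids)  # may differ in EOS/end-of-turn handling; adjust flags
```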

Checklist

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (see above for links to loss curves)

@pytorch-bot

pytorch-bot bot commented Oct 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1863

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9271c5c with merge base 09c2619:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 18, 2024
Contributor

@pbontrager pbontrager left a comment

I'm interested in getting more opinions here, but I think it would be clearer to have a fully separate 2.5 folder even if a few of the models have exactly the same definitions. For the 0.5B, 1.5B, and 7B models you can have them call the qwen2 builders directly if you want, as in the sketch below. But this way we get separate tokenizers (without introducing the concept of tokenizer versions) and separate recipes.

Additionally, I want to see more information on how the new models were checked for accuracy and more configs for the 2.5 model.
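
For example, the delegation could be as simple as the sketch below (builder names and the return type annotation are illustrative, not a prescription):

```python
# torchtune/models/qwen2_5/_model_builders.py -- sketch of the delegation idea.
# Where a Qwen2.5 size is architecturally identical to its Qwen2 counterpart,
# the new builder can simply call the existing Qwen2 builder, so the qwen2_5
# folder stays fully separate without duplicating model definitions.
from torchtune.models.qwen2 import qwen2_7b
from torchtune.modules import TransformerDecoder


def qwen2_5_7b_base() -> TransformerDecoder:
    """Qwen2.5 7B base model (same architecture as Qwen2 7B)."""
    return qwen2_7b()
```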

@codecov-commenter

codecov-commenter commented Oct 20, 2024

Codecov Report

Attention: Patch coverage is 12.08791% with 80 lines in your changes missing coverage. Please review.

Project coverage is 69.32%. Comparing base (c70ad29) to head (9595a46).
Report is 13 commits behind head on main.

Files with missing lines                     | Patch % | Lines
torchtune/models/qwen2_5/_model_builders.py  | 0.00%   | 41 Missing ⚠️
torchtune/models/qwen2_5/_prompt_template.py | 0.00%   | 36 Missing ⚠️
torchtune/models/qwen2_5/__init__.py         | 0.00%   | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1863      +/-   ##
==========================================
+ Coverage   67.30%   69.32%   +2.02%     
==========================================
  Files         304      311       +7     
  Lines       16000    16227     +227     
==========================================
+ Hits        10768    11250     +482     
+ Misses       5232     4977     -255     

☔ View full report in Codecov by Sentry.

@fzyzcjy

fzyzcjy commented Oct 22, 2024

Hi, thank you for your great work!

I wonder whether it will support qwen2.5-math (instead of qwen2.5)?

@calvinpelletier
Contributor Author

Hi, thank you for your great work!

I wonder whether it will support qwen2.5-math (instead of qwen2.5)?

@fzyzcjy Yep! It's the same model, just with different weights. So, for example, run tune download Qwen/Qwen2.5-Math-7B-Instruct --output-dir /tmp/Qwen2_5-Math-7B-Instruct --ignore-patterns None, then use the qwen2_5/7B_lora_single_device config with the checkpointer.checkpoint_dir updated to /tmp/Qwen2_5-Math-7B-Instruct.

For best results, you should preprocess your dataset to match the expected data format of Qwen2.5-Math. They recommend CoT or TIR reasoning, so use a system prompt like "Please reason step by step, and put your final answer within \boxed{}."
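
For example, the preprocessing could be a small message transform like the sketch below (the "question"/"answer" keys are placeholders for whatever fields your dataset actually has):

```python
# Sketch: prepend the Qwen2.5-Math CoT system prompt while mapping raw samples
# to torchtune Messages. The "question"/"answer" keys are placeholders for the
# actual fields in your dataset.
from typing import Any, List, Mapping

from torchtune.data import Message

MATH_SYSTEM_PROMPT = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)


def to_messages(sample: Mapping[str, Any]) -> List[Message]:
    return [
        Message(role="system", content=MATH_SYSTEM_PROMPT),
        Message(role="user", content=sample["question"]),
        Message(role="assistant", content=sample["answer"]),
    ]
```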

Contributor

@ebsmothers ebsmothers left a comment

Left a bunch of comments, also wanna +1 @pbontrager's comment here. We probably wanna at least have some E2E runs with eval results for a couple of different model sizes

def __call__(
    self,
    messages: List[Message],
    inference: bool = False,
Contributor

I know this is in the interface, but I don't see it used here (or in any of our other prompt templates). Can we remove it? @pbontrager

@calvinpelletier
Contributor Author

Oh @ebsmothers, I just realized what happened: I was looking at the qwen2.5 instruct configs and you were looking at the base model configs. I had just assumed they would be the same models with different params, since instruct should just be a finetune of the base model. I'll create separate model builders for the instruct and base models.

Collaborator

@RdoubleA RdoubleA left a comment

Looks awesome, I just left a few minor comments. Will let @ebsmothers comment more on the model builders.

The extra text will still get tokenized as normal text, not as special tokens.
Default: None
errors (str): Paradigm to follow when decoding bytes to UTF-8. Defaults to "replace".
Collaborator

would make this parameter name a bit more descriptive, like decode_error_handler?
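
For reference, assuming errors ends up being forwarded to Python's standard bytes decoding, it controls behavior like the snippet below, which is why a name that mentions decoding might read better:

```python
# Standard Python decode error handlers that the `errors` argument refers to,
# assuming it is forwarded to bytes.decode when mapping token bytes to text.
raw = b"hello \xff world"                      # contains an invalid UTF-8 byte
print(raw.decode("utf-8", errors="replace"))   # invalid byte becomes U+FFFD
print(raw.decode("utf-8", errors="ignore"))    # invalid byte is dropped
# errors="strict" would raise UnicodeDecodeError here instead.
```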

@joecummings joecummings mentioned this pull request Oct 30, 2024
Member

@joecummings joecummings left a comment

I think you're missing some default configs.

Collaborator

@RdoubleA RdoubleA left a comment

If it's not too difficult, it would be awesome if you could expose packed and compile in all the configs.

I'm also a bit confused about why some model builders have a base and an instruct version and some don't. It would also be great if you could add to the docstring how the instruct differs from the base (it seems like it's just a different seq len?).
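
If it really is just the sequence length, something like the placeholder sketch below (every name and value here is illustrative, not the actual definition), plus a one-line docstring on each public builder, would make the distinction obvious:

```python
# Placeholder sketch of a base/instruct split that differs only in max_seq_len.
# `_qwen2_5_7b` stands in for whatever shared builder the PR actually uses,
# and the sequence-length values are illustrative, not the real ones.
from functools import partial


def _qwen2_5_7b(max_seq_len: int):
    """Hypothetical shared helper that would build the 7B TransformerDecoder."""
    raise NotImplementedError("placeholder for the shared component-builder call")


# Base vs. instruct then differ only in the bound keyword, and each public
# builder's docstring can state exactly that.
qwen2_5_7b_base = partial(_qwen2_5_7b, max_seq_len=4096)       # placeholder value
qwen2_5_7b_instruct = partial(_qwen2_5_7b, max_seq_len=32768)  # placeholder value
```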

Collaborator

@RdoubleA RdoubleA left a comment

Legendary PR, thanks for adding this model and all its variants

By default, we set the cache size equal to the size of the official Qwen2 tokenizer.
Example:
>>> tokenizer = Qwen2Tokenizer(
Collaborator

Could you update this example?
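
Maybe something along these lines (the constructor arguments, import path, and file layout below are guesses, so adjust to whatever the tokenizer actually takes):

```python
# Possible shape for the updated docstring example; the argument names,
# import path, and file layout are assumptions.
from torchtune.models.qwen2 import Qwen2Tokenizer

tokenizer = Qwen2Tokenizer(
    path="/tmp/Qwen2_5-7B-Instruct/vocab.json",
    merges_file="/tmp/Qwen2_5-7B-Instruct/merges.txt",
)
print(tokenizer.encode("Hello world!"))
```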

@RdoubleA RdoubleA merged commit 73d0821 into meta-pytorch:main Oct 31, 2024
17 checks passed
@calvinpelletier calvinpelletier deleted the qwen2.5 branch December 8, 2024 18:19
@ebsmothers ebsmothers mentioned this pull request Apr 30, 2025