Move schedulers to training from modules. by krammnic · Pull Request #1801 · meta-pytorch/torchtune · GitHub

Conversation

@krammnic
Collaborator

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?
#1587
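For context: the main API being moved here is get_cosine_schedule_with_warmup, which applies a linear warmup followed by cosine decay. A self-contained sketch of the multiplier it applies to the base learning rate (illustrative only, not torchtune's exact implementation; parameter names are assumptions):

```python
import math

def cosine_schedule_with_warmup(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """LR multiplier at `step`: linear warmup, then cosine decay (sketch)."""
    if step < num_warmup_steps:
        # Linear warmup from 0 up to the base learning rate.
        return step / max(1, num_warmup_steps)
    # Cosine decay from 1.0 down to 0.0 over the remaining steps.
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))
```

In practice a factor like this is wrapped in a `torch.optim.lr_scheduler.LambdaLR` so the optimizer picks it up every step.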

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any of these, just ask and we will happily help. We also have a contributing page with some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

  • I did not change any public API
  • I have added an example to docs or docstrings

@pytorch-bot

pytorch-bot bot commented Oct 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1801

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a3e2ffe with merge base 5de5001:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 10, 2024
@krammnic
Collaborator Author

@joecummings Requesting review.

@krammnic
Collaborator Author

This should include backward compatibility; do not merge yet!

@krammnic
Collaborator Author

@ebsmothers Fixed backward compatibility.

Contributor

@ebsmothers ebsmothers left a comment


Can you also update the usage here and here?

create_optim_in_bwd_wrapper
register_optim_in_bwd_hooks
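For readers unfamiliar with the optimizer-in-backward pattern these helpers support: the idea is to run an optimizer step for each parameter as soon as its gradient is produced during the backward pass, rather than one global step afterwards, so gradient memory can be freed early. A toy pure-Python sketch of the concept (all names here are hypothetical; the real helpers wrap torch gradient hooks and per-parameter optimizers):

```python
# Toy "parameter": holds a scalar value and, transiently, a gradient.
class Param:
    def __init__(self, value):
        self.value = value
        self.grad = None

def make_in_backward_step(param, lr=0.1):
    """Return a hook that runs an SGD step for `param` and frees its grad."""
    def hook(grad):
        param.value -= lr * grad  # optimizer step fused into the backward pass
        param.grad = None         # gradient memory can be released immediately
    return hook

params = [Param(1.0), Param(2.0)]
hooks = [make_in_backward_step(p) for p in params]

# Simulated backward pass: gradients arrive one parameter at a time,
# and each hook fires as soon as its gradient is ready.
for hook, grad in zip(hooks, [0.5, -0.5]):
    hook(grad)
```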

.. _metric_logging_label:
Contributor


Probably need to update this (otherwise it's duplicated below)

Contributor


@krammnic in response to your question

Suggested change
.. _metric_logging_label:
.. _lr_scheduler_label:

Contributor

@ebsmothers ebsmothers left a comment


Oh actually I think the new version under torchtune/training/lr_schedulers.py is missing from the PR.

Also, separately: if you are running into issues with your local pre-commit not lining up with the CI lint job, you can mimic these steps in our CI job more directly on your local machine to try to get the same setup (though I'd think the usual flow in our CONTRIBUTING.md around pre-commit install should be sufficient).

@krammnic
Collaborator Author

Yeah, I'll fix it soon. I broke something with my commits.

@krammnic
Collaborator Author

Oh actually I think the new version under torchtune/training/lr_schedulers.py is missing from the PR.

Also, separately: if you are running into issues with your local pre-commit not lining up with the CI lint job, you can mimic these steps in our CI job more directly on your local machine to try to get the same setup (though I'd think the usual flow in our CONTRIBUTING.md around pre-commit install should be sufficient).

Fixed. Yeah, I will probably mimic the CI using the instructions you provided, because the flow described in CONTRIBUTING.md is pretty inconvenient for me.

@ebsmothers
Contributor

Oh actually I think the new version under torchtune/training/lr_schedulers.py is missing from the PR.
Also, separately: if you are running into issues with your local pre-commit not lining up with the CI lint job, you can mimic these steps in our CI job more directly on your local machine to try to get the same setup (though I'd think the usual flow in our CONTRIBUTING.md around pre-commit install should be sufficient).

Fixed. Yeah, I will probably mimic the CI using the instructions you provided, because the flow described in CONTRIBUTING.md is pretty inconvenient for me.

Sounds good. By the way if CONTRIBUTING.md is not currently clear or giving the right setup you need, we should update it. We set it up a while back and many devs don't use it much day-to-day. But as someone onboarding to the project recently if there are certain areas where you're feeling friction let us know and we can hopefully make the process smoother.

@krammnic
Collaborator Author

@ebsmothers Which label should I use for the schedulers section in the docs? I'll fix lint now; aside from that, it should be ready to merge.

@ebsmothers
Contributor

@ebsmothers Which label should I use for the schedulers section in the docs? I'll fix lint now; aside from that, it should be ready to merge.

Updated my inline comment here with the suggestion

@krammnic
Collaborator Author

Tag is fixed.

@krammnic
Collaborator Author

Yeah, the docs build now. Let me fix lint and then it will be ready to merge.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.01%. Comparing base (54673b7) to head (0269dd0).
Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1801      +/-   ##
==========================================
+ Coverage   67.05%   69.01%   +1.95%     
==========================================
  Files         305      306       +1     
  Lines       15937    16024      +87     
==========================================
+ Hits        10687    11059     +372     
+ Misses       5250     4965     -285     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@krammnic krammnic changed the title [WIP] Move schedulers to training from modules. Move schedulers to training from modules. Oct 13, 2024
@krammnic
Collaborator Author

Last lint fix. It should be fine now.


get_cosine_schedule_with_warmup

.. lr_scheduler_label:
Collaborator


This should swap with metric_logging_label above, and it needs a leading underscore: .. _lr_scheduler_label:. Otherwise it gets interpreted as a comment (Sphinx, I know).
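To illustrate the Sphinx rule mentioned here: without the leading underscore the line is an ordinary reStructuredText comment, not a cross-reference target. A minimal sketch:

```rst
.. lr_scheduler_label:

   Just a comment; :ref:`lr_scheduler_label` will not resolve.

.. _lr_scheduler_label:

.. With the underscore, this is a target that :ref:`lr_scheduler_label` can link to.
```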

Collaborator Author


Fixed

.. _metric_logging_label:

Schedulers
-----------------
Collaborator


the underline needs to match the length of the "Schedulers" title
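Concretely, a reStructuredText section underline must be at least as long as the title text, and the usual convention is to match it exactly:

```rst
Schedulers
----------
```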

Collaborator Author


Fixed

@krammnic
Collaborator Author

Finally! This can be merged...

@krammnic
Collaborator Author

@RdoubleA Since you already reviewed this, maybe we can merge?


get_cosine_schedule_with_warmup

.. _lr_scheduler_label:
Contributor


Is this right? Seems to me like lr scheduler label and metric logging label need to be swapped

Collaborator Author

@krammnic krammnic Oct 14, 2024


Are you talking about the two sections (swapping them)? I'm not sure why you would want to swap the specific labels.

Contributor


Currently the _lr_scheduler_label is just above the "Metric Logging" section, while the _metric_logging_label is just above the "Schedulers" section. I believe it should be the other way around. (I don't care too much whether we swap labels or section orders, but one of them needs to be swapped. Otherwise labels are pointing to the wrong sections, no?)
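For clarity, the intended final layout pairs each label with the section it names, roughly like this (section contents omitted):

```rst
.. _lr_scheduler_label:

Schedulers
----------

.. _metric_logging_label:

Metric Logging
--------------
```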

Collaborator Author


I see now what you meant, fixed!

@joecummings
Member

@RdoubleA and/or @ebsmothers can one of you sign off on this?

@ebsmothers
Contributor

@RdoubleA and/or @ebsmothers can one of you sign off on this?

@joecummings I have one more comment on the file api_ref_training.rst. After that I am good to merge this

Contributor

@ebsmothers ebsmothers left a comment


Thanks for making this change! Once CI is green I think we're good to merge here

@ebsmothers ebsmothers merged commit 78ceee6 into meta-pytorch:main Oct 14, 2024
17 checks passed
mori360 pushed a commit to mori360/torchtune that referenced this pull request Oct 14, 2024