[DCP][state_dict][doc] Update the distributed state_dict document by fegin · Pull Request #121290 · pytorch/pytorch · GitHub

Conversation

@fegin

@fegin fegin commented Mar 6, 2024

@pytorch-bot

pytorch-bot bot commented Mar 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121290

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ade614a with merge base 34a28f0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

fegin added a commit that referenced this pull request Mar 6, 2024
As title

ghstack-source-id: 3bd9224
Pull Request resolved: #121290

@LucasLLC LucasLLC left a comment


Thanks for adding this, great explanation. One small nit


To tackle these challenges, we offer a collection of APIs for users to easily manage state_dicts. `get_model_state_dict` returns a model state dictionary with keys consistent with those returned by the unparallelized model state dictionary. Similarly, `get_optimizer_state_dict` provides the optimizer state dictionary with keys uniform across all parallelisms applied. To achieve this consistency, `get_optimizer_state_dict` converts parameter IDs to fully qualified names identical to those found in the unparallelized model state dictionary.

Note that results returned by these APIs can be used directly with the `save()` and `load()` methods without requiring any additional conversions.
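
The ID-to-name conversion described above can be illustrated with a minimal pure-Python sketch. It is not PyTorch's implementation: the helper `remap_optimizer_state_to_fqns`, the two-parameter model, and the sample values are hypothetical, and only the `state` / `param_groups` layout mirrors what `torch.optim` optimizers return from `state_dict()`.

```python
from typing import Any, Dict


def remap_optimizer_state_to_fqns(
    optim_sd: Dict[str, Any], id_to_fqn: Dict[int, str]
) -> Dict[str, Any]:
    """Return a copy of optim_sd keyed by fully qualified name (FQN).

    Hypothetical helper for illustration only; the real conversion is done
    inside get_optimizer_state_dict.
    """
    # Re-key the per-parameter state from integer ID to FQN.
    state = {id_to_fqn[pid]: dict(s) for pid, s in optim_sd["state"].items()}

    # Rewrite the "params" list of every group, keeping other settings as-is.
    param_groups = []
    for group in optim_sd["param_groups"]:
        new_group = dict(group)
        new_group["params"] = [id_to_fqn[pid] for pid in group["params"]]
        param_groups.append(new_group)

    return {"state": state, "param_groups": param_groups}


# Assumed example: FQNs in the order the unparallelized model would list them.
fqns = ["layer.weight", "layer.bias"]
id_to_fqn = dict(enumerate(fqns))

# Optimizer state keyed by integer parameter IDs (sample values are made up).
optim_sd = {
    "state": {
        0: {"step": 3, "exp_avg": [0.1, 0.2]},
        1: {"step": 3, "exp_avg": [0.0]},
    },
    "param_groups": [{"lr": 0.01, "params": [0, 1]}],
}

converted = remap_optimizer_state_to_fqns(optim_sd, id_to_fqn)
print(sorted(converted["state"]))
```

Because the keys no longer depend on the order in which a particular wrapper registered its parameters, the same checkpoint keys can be matched regardless of the parallelism applied.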
Should we clarify here that these are `torch.distributed.checkpoint.save`/`load` rather than `torch.save`/`torch.load`?

@fegin fegin changed the title from "[DCP][state_dict] Update the distributed state_dict document" to "[DCP][state_dict][doc] Update the distributed state_dict document" Mar 8, 2024
…ocument"

As title

cc LucasLLC

[ghstack-poisoned]
fegin added a commit that referenced this pull request Mar 8, 2024
As title

ghstack-source-id: 5faa6a1
Pull Request resolved: #121290
@fegin

fegin commented Mar 8, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 8, 2024
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request Mar 12, 2024
Root may not exist due to FSDP lazy initialization.

Pull Request resolved: #121544
Approved by: https://github.com/Skylion007
ghstack dependencies: #121273, #121276, #121290
@github-actions github-actions bot deleted the gh/fegin/216/head branch April 8, 2024 01:51
mvpatel2000 pushed a commit to mvpatel2000/pytorch that referenced this pull request May 17, 2024
Root may not exist due to FSDP lazy initialization.

Pull Request resolved: pytorch#121544
Approved by: https://github.com/Skylion007
ghstack dependencies: pytorch#121273, pytorch#121276, pytorch#121290
atalman pushed a commit that referenced this pull request May 27, 2024
Root may not exist due to FSDP lazy initialization.

Pull Request resolved: #121544
Approved by: https://github.com/Skylion007
ghstack dependencies: #121273, #121276, #121290

Co-authored-by: Chien-Chin Huang <chienchin@fb.com>