-
Notifications
You must be signed in to change notification settings - Fork 25.7k
C++ Frontend: adding two distributed samples (Random and Sequential) #16910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. Particularly interested in why DistributedSampler is not more general
18e07e3 to
37f601e
Compare
|
Fixed all review suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM, but there's one significant flaw in the random sampler. Should be good to land once this is fixed.
|
Fixed the review comments and hopefully the windows build issue. |
450947e to
0b71056
Compare
|
Windows CI is still failing. |
|
Fix the review comments. The checkpointing mechanism is probably need to change once we have a checkpoint mechanism in the dataloader. I don't think we need to finalize it for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Adding two distrbuted samplers, Random and Sequential to the mix. Similar to python counterpart, DistributedSampler introduces a new method
set_epoch(size_t epoch)which can be use to shuffle data determinstically between distributed processes.