Add option in data loader for out of order data #141833
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141833
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit f8a09cc with merge base 30d907c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for adding this feature! After a first pass it looks fine to me, just a couple of small changes/suggestions
Thanks for the quick review @andrewkho! I've made the changes you've suggested.
torch/utils/data/dataloader.py
Outdated
One potential issue here that occurred to me: since work is continuously being distributed to workers in round-robin fashion, we can hit a scenario where one slow or blocked worker can end up holding all the tasks, blocking other tasks from being scheduled.
As a follow-up, when `in_order` is False, we should also distribute work to workers to keep them balanced, since reproducibility is already given up in these scenarios.
Yep! I think that issue currently occurs with `in_order=True` as well (out-of-order tasks are buffered instead of being handed out)?
Would you like this added as a warning and fixed in a later PR, or fixed as part of this PR?
My guess would be to track the number of tasks given to each worker and check here whether it's below the prefetch factor.
I think it'd be fine to do in a separate PR, given this is default-off, although I think we'd like them to land together in the next release. Are you up for doing a separate PR for that? We can brainstorm the best way to get it working reliably.
Yep, I'm very happy to do the follow-up PR!
Please wait for CI to pass and then land
@pytorchbot merge -r

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased 8370dae to f8a09cc
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 3 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
Ah sorry @andrewkho, I completely forgot that running merge with

Hi @andrewkho - small bump on this. Would you be able to approve the workflows/CI?

@michael-diggin sorry for the delay, just kicked off the workflows, thanks for your patience!
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes pytorch#105203

Facing a similar problem to the linked issue, where variable-sized input data can mean that a handful of slow-to-process samples holds up smaller, faster-to-process samples from being used. This also leads to lower GPU utilization. In certain cases, e.g. evaluation epochs, inference pipelines, or other cases where reproducibility isn't important, this can bring significant speed-ups.

This PR adds an `allow_out_of_order` bool input to the `DataLoader` class, defaulting to `False`. When set to `True`, data is returned from workers in whatever order it is ready/processed in, rather than in strict index order. Instead of storing data that was returned out of order, it is passed directly to the main thread and the entry in `_task_info` is deleted. The main changes are to check that an entry in `_task_info` does exist, and to only increase `self._rcvd_idx` when the lowest remaining index gets returned.

Two tests are added to cover this for iterable-type datasets and index-type datasets.

Pull Request resolved: pytorch#141833
Approved by: https://github.com/andrewkho
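The index-tracking idea in the description above can be sketched in plain Python. This is a simulation for illustration only, not the actual `_MultiProcessingDataLoaderIter` internals; the names `task_info` and `rcvd_idx` mirror the description, and `collect` is a hypothetical helper:

```python
# Simulation of the receive-side logic described above (NOT real
# DataLoader code). With in-order delivery, out-of-order results are
# buffered until the lowest outstanding index arrives; with out-of-order
# delivery, results go straight to the main loop, their task_info entry
# is deleted, and rcvd_idx only advances once the lowest index returns.

def collect(results, in_order=True):
    """results: list of (idx, data) pairs in worker completion order.
    Returns data in the order it would be handed to the main loop."""
    task_info = {idx: None for idx, _ in results}  # outstanding tasks
    buffer = {}   # out-of-order results held back when in_order=True
    rcvd_idx = 0  # lowest index not yet accounted for
    out = []
    for idx, data in results:
        del task_info[idx]
        if in_order:
            buffer[idx] = data
            # drain the buffer in strict index order
            while rcvd_idx in buffer:
                out.append(buffer.pop(rcvd_idx))
                rcvd_idx += 1
        else:
            # pass the result straight through; advance rcvd_idx past
            # every index that has already been returned out of order
            out.append(data)
            while rcvd_idx < len(results) and rcvd_idx not in task_info:
                rcvd_idx += 1
    return out
```

For completion order `[(2, "c"), (0, "a"), (1, "b")]`, in-order collection yields `["a", "b", "c"]` while out-of-order collection yields `["c", "a", "b"]`, which is the trade-off the PR describes: lower latency for fast samples at the cost of strict ordering.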
Fixes #105203 and is a follow-up PR to #141833.

When `in_order` is True (the default), tasks are given out to workers in a round-robin fashion. When `in_order` is False this is no longer needed, as we give up guarantees of reproducibility, and tasks should instead be given to workers that are able to perform work.

In this PR I've added tracking of the number of outstanding tasks for each worker (updated when tasks are added to their queue, and when data is returned to the main thread). When finding the next queue to add a task to, if `in_order` is False the task is only added to a worker's queue if that worker has fewer than `_prefetch_factor` tasks outstanding. The current default behaviour is left as is.

Tests are also updated to assert on the worker IDs for each sample of data returned. I've run the following to confirm they aren't flaky:

```bash
for i in {1..20}; do python test/test_dataloader.py TestOutOfOrderDataLoader; done
```

Pull Request resolved: #142324
Approved by: https://github.com/andrewkho
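The worker-selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual DataLoader implementation; `pick_worker` and `tasks_outstanding` are hypothetical names standing in for the per-worker bookkeeping the PR describes:

```python
# Sketch of choosing the next worker when in_order=False (NOT real
# DataLoader code). tasks_outstanding maps worker_id -> number of tasks
# queued but not yet returned; a worker is only eligible for new work
# while it has fewer than prefetch_factor tasks outstanding.

def pick_worker(tasks_outstanding, last_worker, prefetch_factor, num_workers):
    """Return the next eligible worker id, or None if all are saturated.
    Scans starting from the worker after last_worker to keep the
    assignment fair among non-saturated workers."""
    for offset in range(1, num_workers + 1):
        worker = (last_worker + offset) % num_workers
        if tasks_outstanding[worker] < prefetch_factor:
            return worker
    return None  # every worker is at capacity; try again after a result returns
```

With `prefetch_factor=2`, a worker that already holds 2 outstanding tasks (say, because it is stuck on a slow sample) is skipped, so slow workers no longer accumulate tasks that an idle worker could have taken; this is the imbalance the round-robin scheme allowed.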
cc @andrewkho @divyanshk @ssnl @VitalyFedyunin @dzhulgakov