Add default PyTorch seeding and worker_init_fn to DataLoader by ssnl · Pull Request #4018 · pytorch/pytorch · GitHub

Conversation

@ssnl ssnl (Collaborator) commented Dec 4, 2017

Fixes #3880.

@apaszke apaszke (Contributor) left a comment

That looks good, but we should be careful with passing functions around. Lambdas aren't picklable.

    _set_worker_signal_handlers()

    torch.set_num_threads(1)
    torch.manual_seed(torch.initial_seed() + id + 1)
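
To illustrate the picklability concern (a minimal sketch, not code from the PR): with the ``spawn`` start method the hook must be pickled and sent to the worker process, which works for a module-level function but fails for a lambda.

import pickle

def top_level_init(worker_id):
    # Module-level functions pickle by qualified name, so they can be
    # shipped to a worker process started with the "spawn" method.
    pass

pickle.dumps(top_level_init)  # fine

try:
    pickle.dumps(lambda worker_id: None)
except pickle.PicklingError as exc:
    print("lambdas aren't picklable:", exc)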

-def _worker_loop(dataset, index_queue, data_queue, collate_fn):
+def _worker_loop(dataset, index_queue, data_queue, collate_fn, init_fn, id):
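
For context, a simplified sketch (ours, inferred from the hunks above) of how the two new arguments are plausibly used inside the worker loop:

import torch

def _worker_loop(dataset, index_queue, data_queue, collate_fn, init_fn, id):
    # Simplified sketch; the real loop also installs signal handlers and
    # repeatedly pulls index batches from index_queue.
    torch.set_num_threads(1)
    torch.manual_seed(torch.initial_seed() + id + 1)  # per-worker seed
    if init_fn is not None:
        init_fn(id)  # user hook, called with the worker id
    ...  # fetch indices, collate samples via collate_fn, push to data_queue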

-                args=(self.dataset, self.index_queue, self.worker_result_queue, self.collate_fn))
-            for _ in range(self.num_workers)]
+                args=(self.dataset, self.index_queue, self.worker_result_queue, self.collate_fn,
+                      self.worker_init_fn, i))
+            for i in range(self.num_workers)]


@netheril96 commented

Does this fix #3880 completely? As I understand it, at each epoch the worker processes are created fresh, so if the main process never calls random functions, torch.initial_seed() will always be the same in every worker at every epoch. In that case, the randomness is still restricted.
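
A minimal sketch of the failure mode described above (illustrative, not PR code): if workers derive their seeds only from torch.initial_seed(), and the main process never consumes its own RNG state, then the freshly created workers of every epoch are seeded identically.

import torch

def simulated_worker(worker_id, base_seed):
    # Stand-in for a freshly started worker inheriting base_seed.
    # Modulo keeps the value within manual_seed's accepted range.
    torch.manual_seed((base_seed + worker_id + 1) % 2**63)
    return torch.rand(1)

base_seed = torch.initial_seed()  # constant unless the main process
                                  # consumes its own RNG state
epoch1 = simulated_worker(0, base_seed)
epoch2 = simulated_worker(0, base_seed)
assert torch.equal(epoch1, epoch2)  # the "random" stream repeats each epoch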

this value in :attr:`worker_init_fn`, which can be used to set other seeds
(e.g. NumPy) before data loading.
.. warning:: If the ``spawn`` start method is used, :attr:`worker_init_fn` cannot be a lambda.
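
For example, a :attr:`worker_init_fn` along these lines (a sketch; the function name is ours) seeds NumPy from the per-worker PyTorch seed so that NumPy-based augmentations also differ across workers:

import numpy as np
import torch

def seed_numpy_in_worker(worker_id):
    # torch.initial_seed() already differs per worker; reuse it for NumPy.
    # NumPy seeds must fit in 32 bits, hence the modulo.
    np.random.seed(torch.initial_seed() % 2**32)

# loader = DataLoader(dataset, num_workers=4, worker_init_fn=seed_numpy_in_worker)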


@apaszke apaszke (Contributor) left a comment

LGTM, but it would be nice to have one more test.

             self.done_event.set()
             # if worker_manager_thread is waiting to put
-            while not self.data_queue.empty():
+            while self.data_queue is not None and not self.data_queue.empty():
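
For readers unfamiliar with the pattern this hunk touches, a standalone sketch (ours, not PR code) of why the consumer drains the queue before joining: a producer blocked on put() against a full multiprocessing queue cannot exit, so joining it without draining deadlocks.

import multiprocessing
import queue

def producer(q, n):
    for i in range(n):
        q.put(i)  # blocks once the queue's internal buffer is full

if __name__ == "__main__":
    q = multiprocessing.Queue(maxsize=2)
    p = multiprocessing.Process(target=producer, args=(q, 10))
    p.start()
    # Drain so any pending put() in the producer can complete; calling
    # p.join() first could wait forever while the producer waits for space.
    while p.is_alive() or not q.empty():
        try:
            q.get(timeout=0.1)
        except queue.Empty:
            pass
    p.join()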


        worker_init_fn=init_fn)
for batch in dataloader:
    self.assertEqual(12345, batch[0])
    self.assertEqual(12345, batch[1])
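
A plausible shape for the full test (our reconstruction; the dataset name is an assumption, while init_fn appears in the hunk above): init_fn pins the worker seed to 12345, and each dataset item reports torch.initial_seed() as observed in its worker.

import torch
from torch.utils.data import Dataset, DataLoader

class SeedDataset(Dataset):  # hypothetical name
    def __len__(self):
        return 4
    def __getitem__(self, idx):
        return torch.initial_seed()  # the seed of the worker serving idx

def init_fn(worker_id):
    torch.manual_seed(12345)  # override the default per-worker seed

if __name__ == "__main__":  # guard needed on spawn-based platforms
    dataloader = DataLoader(SeedDataset(), batch_size=2, num_workers=2,
                            worker_init_fn=init_fn)
    for batch in dataloader:
        assert batch[0] == 12345 and batch[1] == 12345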


@soumith soumith merged commit 5cc26c0 into pytorch:master Dec 18, 2017
@ssnl ssnl deleted the dl_seed branch December 18, 2017 16:23
@soumith soumith added the 0.3.1 label Feb 7, 2018
soumith pushed a commit that referenced this pull request Feb 7, 2018
* Add default PyTorch seeding and worker_init_fn to DataLoader

* generate seed using current RNG each time

* worker_seed <- main_proc_RNG_generated_seed + worker_id
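
Read together, the commit messages describe the final scheme (our summary, simplified from the merged code): each time workers are created, the main process draws a fresh base seed from its own RNG, and worker i is seeded with base_seed + worker_id, so seeds differ both across workers and across epochs.

import torch

# Main process, each time workers are (re)created:
base_seed = int(torch.LongTensor(1).random_()[0])  # advances the main RNG,
                                                   # so it changes every epoch

# Worker i then does (simplified):
def seed_worker(worker_id):
    torch.manual_seed(base_seed + worker_id)  # distinct per worker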

Development

Successfully merging this pull request may close these issues:

Reinitialize the random generator in worker processes of DataLoader (#3880)