[CUDA graphs] hotfix for test_graph_grad_scaling on windows #64339
Graphed workloads that try to capture a full backward pass must do warmup on a non-default stream. If warmup happens on the default stream, AccumulateGrad functions might tag themselves to run on the default stream, and therefore won't be capturable.
@ngimel and I suspect that on Windows, some test_cuda.py tests run with the default stream as the ambient stream, which breaks `test_graph_grad_scaling` because `test_graph_grad_scaling` does warmup on the ambient stream (it assumes the ambient stream is a non-default stream).
This PR explicitly sets a side stream for the warmup in `test_graph_grad_scaling`, which is what I should have done all along because it's what the new documentation recommends.
I pushed the PR branch straight to the main pytorch repo in case we need to run ci-all on it (I'm not sure what the requirements are these days).
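For reference, the side-stream warmup pattern described above can be sketched roughly as follows. This is an illustration only, not the code changed in this PR: the helper name `warmup_on_side_stream`, the toy `weight` tensor, and the `step` function are all hypothetical, and the real test additionally involves a gradient scaler and graph capture, which are left out here.

```python
import torch


def warmup_on_side_stream(step_fn, iters=3):
    """Run step_fn a few times on a freshly created non-default CUDA stream.

    Hypothetical helper for illustration; not part of PyTorch or this PR.
    """
    side = torch.cuda.Stream()
    # The side stream waits for work already queued on the ambient stream.
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(iters):
            step_fn()
    # The ambient stream waits for the warmup work before continuing.
    torch.cuda.current_stream().wait_stream(side)
    return side


if torch.cuda.is_available():
    # Toy leaf tensor standing in for model parameters (assumption).
    weight = torch.ones((5,), device="cuda", requires_grad=True)

    def step():
        weight.grad = None
        loss = (weight * 2.0).sum()
        # Backward runs while the side stream is current, so the
        # AccumulateGrad function is not first used on the default stream.
        loss.backward()

    side = warmup_on_side_stream(step)
```

Creating the stream explicitly, rather than relying on whatever stream happens to be ambient, means the warmup does not depend on how the test harness was launched.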