KEMBAR78
Fix SVD forward-mode AD multiplication priority by redwrasse · Pull Request #161027 · pytorch/pytorch · GitHub
Skip to content

Conversation

@redwrasse
Copy link
Contributor

Multiplication order priority for the SVD JVP appears to have been the opposite of the optimal one.

Results from a crude CPU benchmark on my laptop for random matrices of various ratios:

  Performance Results Table

  | Test Case                        | Matrix Size | Aspect Ratio | Before JVP (ms) | After JVP (ms) | Change (ms) | % Change | Status              |
  |----------------------------------|-------------|--------------|-----------------|----------------|-------------|----------|---------------------|
  | Tall matrix (10:1 ratio)         | 1000×100    | 10:1 tall    | 3.13            | 3.24           | +0.11       | -3.5%    | ❌ Regression        |
  | Tall matrix (10:1 ratio, larger) | 2000×200    | 10:1 tall    | 15.72           | 14.66          | -1.06       | +6.7%    | ✅ Improvement       |
  | Tall matrix (10:1 ratio, large)  | 5000×500    | 10:1 tall    | 105.97          | 101.84         | -4.13       | +3.9%    | ✅ Improvement       |
  | Wide matrix (1:10 ratio)         | 100×1000    | 1:10 wide    | 5.90            | 4.64           | -1.26       | +21.4%   | ✅ Major Improvement |
  | Wide matrix (1:10 ratio, larger) | 200×2000    | 1:10 wide    | 18.29           | 17.78          | -0.51       | +2.8%    | ✅ Improvement       |
  | Wide matrix (1:10 ratio, large)  | 500×5000    | 1:10 wide    | 137.40          | 128.70         | -8.70       | +6.3%    | ✅ Improvement       |
  | Square matrix (baseline)         | 1000×1000   | 1:1 square   | 116.16          | 106.09         | -10.07      | +8.7%    | ✅ Improvement       |
  | Square matrix (larger baseline)  | 2000×2000   | 1:1 square   | 714.30          | 673.23         | -41.07      | +5.7%    | ✅ Improvement       |

@pytorch-bot
Copy link

pytorch-bot bot commented Aug 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161027

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8b78279 with merge base 3c8c509 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Signed-off-by: redwrasse <mail@redwrasse.io>
@redwrasse redwrasse force-pushed the redwrasse/svd-jvp-mm branch from 0cb10f2 to 8b78279 Compare August 20, 2025 02:07
@soulitzer soulitzer added the release notes: autograd release notes category label Aug 20, 2025
@pytorch pytorch deleted a comment from github-actions bot Aug 20, 2025
Copy link
Contributor

@soulitzer soulitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@soulitzer
Copy link
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 20, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Multiplication order priority for the SVD JVP appears to have been the opposite of the optimal one.

Results from a crude CPU benchmark on my laptop for random matrices of various ratios:

```
  Performance Results Table

  | Test Case                        | Matrix Size | Aspect Ratio | Before JVP (ms) | After JVP (ms) | Change (ms) | % Change | Status              |
  |----------------------------------|-------------|--------------|-----------------|----------------|-------------|----------|---------------------|
  | Tall matrix (10:1 ratio)         | 1000×100    | 10:1 tall    | 3.13            | 3.24           | +0.11       | -3.5%    | ❌ Regression        |
  | Tall matrix (10:1 ratio, larger) | 2000×200    | 10:1 tall    | 15.72           | 14.66          | -1.06       | +6.7%    | ✅ Improvement       |
  | Tall matrix (10:1 ratio, large)  | 5000×500    | 10:1 tall    | 105.97          | 101.84         | -4.13       | +3.9%    | ✅ Improvement       |
  | Wide matrix (1:10 ratio)         | 100×1000    | 1:10 wide    | 5.90            | 4.64           | -1.26       | +21.4%   | ✅ Major Improvement |
  | Wide matrix (1:10 ratio, larger) | 200×2000    | 1:10 wide    | 18.29           | 17.78          | -0.51       | +2.8%    | ✅ Improvement       |
  | Wide matrix (1:10 ratio, large)  | 500×5000    | 1:10 wide    | 137.40          | 128.70         | -8.70       | +6.3%    | ✅ Improvement       |
  | Square matrix (baseline)         | 1000×1000   | 1:1 square   | 116.16          | 106.09         | -10.07      | +8.7%    | ✅ Improvement       |
  | Square matrix (larger baseline)  | 2000×2000   | 1:1 square   | 714.30          | 673.23         | -41.07      | +5.7%    | ✅ Improvement       |

```

Pull Request resolved: pytorch#161027
Approved by: https://github.com/soulitzer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: autograd release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants