KEMBAR78
[PyTorch] Add efficient isnan for NEON half by swolchok · Pull Request #139083 · pytorch/pytorch · GitHub
Skip to content

Conversation

Same as the efficient one for float when f16 hardware support is available.

Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Oct 28, 2024
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139083

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2d1d999 with merge base 86602a6 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65003321

Same as the efficient one for float when f16 hardware support is available.

Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65003321

Same as the efficient one for float when f16 hardware support is available.

Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65003321

Same as the efficient one for float when f16 hardware support is available.

Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65003321

Same as the efficient one for float when f16 hardware support is available.

Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65003321

@swolchok
Copy link
Contributor Author

I have fixes for this but don't want to re-kick CI on ready-to-go diffs below it in the stack...

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 31, 2024
Same as the efficient one for float when f16 hardware support is available.

Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65003321

Same as the efficient one for float when f16 hardware support is available.

Testing: Added exhaustive isnan test coverage

Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65003321

Same as the efficient one for float when f16 hardware support is available.

Testing: Added exhaustive isnan test coverage

Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65003321

@malfet
Copy link
Contributor

malfet commented Nov 1, 2024

@pytorchbot merge -f "Lint + builds + relevant tests are green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Nov 1, 2024
This is the first big milestone we've been building towards!
(Following rev also hooks this up to actual gemv.)
Testing: To check perf, I ran python torchchat.py generate stories110M
--dtype fp16 --device cpu on an x86 machine without AVX512FP16. Observed roughly 5x tokens/sec increase.
Differential Revision: [D64280688](https://our.internmc.facebook.com/intern/diff/D64280688/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D64280688/)!
Pull Request resolved: #137918
Approved by: https://github.com/malfet
ghstack dependencies: #139082, #139083
pytorchmergebot pushed a commit that referenced this pull request Nov 1, 2024
…rchitectures (#138005)

Following up on previous rev to use fp16_gemv_trans in gemv, not just gemm-used-for-gemv.

Differential Revision: [D64351092](https://our.internmc.facebook.com/intern/diff/D64351092/)
Pull Request resolved: #138005
Approved by: https://github.com/malfet
ghstack dependencies: #139082, #139083, #137918
pytorchmergebot pushed a commit that referenced this pull request Nov 1, 2024
No real reason to have the zero-beta restriction, so let's lift it.

Testing: intentionally broke new paths locally to verify test coverage existed

Differential Revision: [D64407752](https://our.internmc.facebook.com/intern/diff/D64407752/)

Pull Request resolved: #138275
Approved by: https://github.com/malfet
ghstack dependencies: #139082, #139083, #137918, #138005
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
Same as the efficient one for float when f16 hardware support is available.

Testing: Added exhaustive isnan test coverage

Differential Revision: [D65003321](https://our.internmc.facebook.com/intern/diff/D65003321/)

Pull Request resolved: pytorch#139083
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#139082
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
This is the first big milestone we've been building towards!
(Following rev also hooks this up to actual gemv.)
Testing: To check perf, I ran python torchchat.py generate stories110M
--dtype fp16 --device cpu on an x86 machine without AVX512FP16. Observed roughly 5x tokens/sec increase.
Differential Revision: [D64280688](https://our.internmc.facebook.com/intern/diff/D64280688/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D64280688/)!
Pull Request resolved: pytorch#137918
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#139082, pytorch#139083
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…rchitectures (pytorch#138005)

Following up on previous rev to use fp16_gemv_trans in gemv, not just gemm-used-for-gemv.

Differential Revision: [D64351092](https://our.internmc.facebook.com/intern/diff/D64351092/)
Pull Request resolved: pytorch#138005
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#139082, pytorch#139083, pytorch#137918
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
No real reason to have the zero-beta restriction, so let's lift it.

Testing: intentionally broke new paths locally to verify test coverage existed

Differential Revision: [D64407752](https://our.internmc.facebook.com/intern/diff/D64407752/)

Pull Request resolved: pytorch#138275
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#139082, pytorch#139083, pytorch#137918, pytorch#138005
@github-actions github-actions bot deleted the gh/swolchok/679/head branch December 2, 2024 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/linux-aarch64 linux aarch64 CI workflow ciflow/trunk Trigger trunk jobs on your pull request fb-exported Merged module: cpu CPU specific problem (e.g., perf, algorithm)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants