add Half support for AdaptiveAvgPool2d and AdaptiveMaxPool2d on CPU by CaoE · Pull Request #102079 · pytorch/pytorch · GitHub

Conversation

@CaoE
Collaborator

@CaoE CaoE commented May 23, 2023

Stack from ghstack (oldest at bottom):

Testing

Single core:

AdaptiveMaxPool2d:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
input size: (2, 56, 264, 264), output size: (100, 100) | 71.5826 | 78.7460 | 85.7195 | 7.3925 | 6.0618 | 6.2596
input size: (2, 56, 264, 264), output size: (50, 50) | 28.122 | 30.8572 | 36.6366 | 6.2645 | 3.4781 | 3.6628
input size: (32, 32, 100, 100), output size: (50, 50) | 109.2978 | 115.0330 | 121.9500 | 13.4329 | 10.2769 | 12.1975
input size: (16, 4, 300, 300), output size: (100, 100) | 34.1849 | 36.5876 | 40.9862 | 4.7719 | 4.3362 | 4.1417

28 cores:

AdaptiveMaxPool2d:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
input size: (2, 56, 264, 264), output size: (100, 100) | 3.1809 | 3.5057 | 3.6728 | 0.6657 | 0.3138 | 0.2934
input size: (2, 56, 264, 264), output size: (50, 50) | 1.2779 | 1.3869 | 1.5238 | 0.4223 | 0.1775 | 0.1825
input size: (32, 32, 100, 100), output size: (50, 50) | 4.7942 | 4.9670 | 5.2330 | 1.7146 | 0.6477 | 0.7001
input size: (16, 4, 300, 300), output size: (100, 100) | 1.9522 | 2.0879 | 2.3155 | 0.4370 | 0.3175 | 0.2828
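For context, adaptive pooling derives each output cell's input window from the input/output sizes rather than a fixed kernel and stride. Below is a minimal, hypothetical C++ sketch of the index math plus a single-channel forward pass; it is written for this discussion (function names are illustrative) and is not the PR's actual vectorized kernels. Comparisons are carried out in float regardless of the storage type, mirroring the "compute in float, store in half/bfloat16" approach:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Index helpers used by adaptive pooling: output cell `o` out of `out_size`
// covers input positions [start, end), so the cells tile the whole input.
inline int64_t start_index(int64_t o, int64_t out_size, int64_t in_size) {
  return (o * in_size) / out_size;
}
inline int64_t end_index(int64_t o, int64_t out_size, int64_t in_size) {
  return ((o + 1) * in_size + out_size - 1) / out_size;
}

// Minimal single-channel adaptive max pooling over a row-major buffer.
// The running maximum is kept as float even when scalar_t is a
// reduced-precision storage type.
template <typename scalar_t>
std::vector<scalar_t> adaptive_max_pool2d(const std::vector<scalar_t>& in,
                                          int64_t ih, int64_t iw,
                                          int64_t oh, int64_t ow) {
  std::vector<scalar_t> out(oh * ow);
  for (int64_t i = 0; i < oh; ++i) {
    for (int64_t j = 0; j < ow; ++j) {
      float best = -INFINITY;
      for (int64_t r = start_index(i, oh, ih); r < end_index(i, oh, ih); ++r) {
        for (int64_t c = start_index(j, ow, iw); c < end_index(j, ow, iw); ++c) {
          best = std::max(best, static_cast<float>(in[r * iw + c]));
        }
      }
      out[i * ow + j] = static_cast<scalar_t>(best);
    }
  }
  return out;
}
```

For example, pooling a 4x4 input down to 2x2 gives each output cell a 2x2 window, so the output holds the maximum of each quadrant.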

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@CaoE CaoE requested review from mruberry and ngimel as code owners May 23, 2023 13:00
@pytorch-bot

pytorch-bot bot commented May 23, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/102079

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c585986 with merge base fb88760:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label May 23, 2023
CaoE added a commit that referenced this pull request May 23, 2023
@CaoE CaoE marked this pull request as draft May 24, 2023 00:36
CaoE added a commit that referenced this pull request May 29, 2023
@CaoE CaoE added module: half Related to float16 half-precision floats ciflow/trunk Trigger trunk jobs on your pull request ciflow/mps Run MPS tests (subset of trunk) labels May 29, 2023
@CaoE CaoE requested review from jgong5 and mingfeima May 31, 2023 01:02
@CaoE CaoE changed the title add Half support for AdaptiveAvgPool and AdaptiveMaxPool on CPU add Half support for AdaptiveAvgPool2d and AdaptiveMaxPool2d on CPU Jun 18, 2023
CaoE added a commit to CaoE/pytorch that referenced this pull request Jun 18, 2023
@CaoE CaoE marked this pull request as ready for review June 19, 2023 03:03
CaoE added a commit that referenced this pull request Jun 19, 2023
CaoE added a commit that referenced this pull request Jul 12, 2023
CaoE added a commit that referenced this pull request Aug 18, 2023
CaoE added a commit that referenced this pull request Aug 29, 2023
CaoE added a commit that referenced this pull request Oct 12, 2023
@CaoE
Collaborator Author

CaoE commented Oct 13, 2023

@malfet Could you please review this PR again? Thanks.

Contributor

@malfet malfet left a comment


Thank you for eliminating the `if`; now please do not make unnecessary changes (like renaming `accscalar_t`, which is used throughout the folder, or defining `param_t`, which is tautologically equal to `float` in the code as it is written right now).


```cpp
namespace {

template <typename scalar_t, typename accscalar_t>
```
Contributor


Sorry, I don't understand the reply. This change is unrelated to PR title/description, please do not make it.

@CaoE CaoE requested a review from malfet October 16, 2023 06:04
@CaoE
Collaborator Author

CaoE commented Oct 17, 2023

@malfet Thanks for your comments, but I still have some doubts. In PyTorch there is code like `using accscalar_t = at::acc_type<scalar_t, false>`; such an `accscalar_t` is `double` when `scalar_t` is `float`, which is different from the opmath type. So I want to make this explicit by using `opmath_type` instead, to indicate that we just use `float` for `half`/`bfloat16`.
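The distinction can be illustrated with simplified stand-ins for the two traits. These are toy definitions written for this thread, not ATen's actual headers (the real traits live in `ATen/AccumulateType.h` and `ATen/OpMathType.h`):

```cpp
#include <type_traits>

// Toy placeholders for at::Half and at::BFloat16, for illustration only.
struct Half {};
struct BFloat16 {};

// acc_type<T, /*is_cuda=*/false> semantics: on CPU, float widens to double.
template <typename T> struct acc_type_cpu { using type = T; };
template <> struct acc_type_cpu<float>    { using type = double; };
template <> struct acc_type_cpu<Half>     { using type = float; };
template <> struct acc_type_cpu<BFloat16> { using type = float; };

// opmath_type<T> semantics: reduced-precision types compute in float,
// but float stays float (no widening to double).
template <typename T> struct opmath_type { using type = T; };
template <> struct opmath_type<Half>     { using type = float; };
template <> struct opmath_type<BFloat16> { using type = float; };

// The difference being discussed: for float inputs the accumulate type
// is double, while the opmath type remains float.
static_assert(std::is_same<acc_type_cpu<float>::type, double>::value, "");
static_assert(std::is_same<opmath_type<float>::type, float>::value, "");
```

So naming the template parameter after `opmath_type` signals "float math for half/bfloat16" without implying double accumulation for float inputs.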

CaoE added a commit that referenced this pull request Oct 23, 2023
@CaoE
Collaborator Author

CaoE commented Oct 23, 2023

@malfet I removed some unnecessary changes. Could you please review this PR? Thank you.

@CaoE
Collaborator Author

CaoE commented Nov 20, 2023

@malfet Could you please review this PR again? Thank you for your help.

Contributor

@malfet malfet left a comment


Thank you for the updates, looks good to me

@malfet
Contributor

malfet commented Nov 20, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a `release notes:` label.
If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@malfet malfet added release notes: nn release notes category topic: improvements topic category labels Nov 20, 2023
@malfet
Contributor

malfet commented Nov 20, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the status of the merge workflow.


Labels

- `ciflow/mps` (Run MPS tests, subset of trunk)
- `ciflow/trunk` (Trigger trunk jobs on your pull request)
- `Merged`
- `module: cpu` (CPU specific problem, e.g. perf, algorithm)
- `module: half` (Related to float16 half-precision floats)
- `open source`
- `release notes: nn` (release notes category)
- `topic: improvements` (topic category)

