Remove dtype check on meta device by ge0405 · Pull Request #136774 · pytorch/pytorch · GitHub

Conversation

@ge0405
Contributor

@ge0405 ge0405 commented Sep 26, 2024

Summary:

Latest Update

This diff is no longer needed: it turns out the check does need to exist so that meta behaves the same as other devices; see D54526190.


Background

T176105639

| case | embedding bag weight | per_sample_weight | fbgemm lookup | forward in meta |
| --- | --- | --- | --- | --- |
| A | fp32 | fp32 | good | good |
| B | fp16 | fp32 | good | failed [check](https://fburl.com/code/k3n3h031) that forces weight dtype == per_sample_weights dtype |
| C | fp16 | fp16 | P1046999270, RuntimeError: "expected scalar type Float but found Half from fbgemm call" | good |
| D | fp32 | fp16 | N/A | N/A |

Currently we are in case A. Users need to add `use_fp32_embedding` in training to force the embedding bag dtype to be fp32. However, users actually want case B, with fp16 as the embedding bag weight. When they delete `use_fp32_embedding`, they fail the [check](https://fburl.com/code/k3n3h031) in meta_registration that forces `weight dtype == per_sample_weights dtype`.

The check is actually not necessary, because the fbgemm backend does support case B. Additionally, later on in `meta_embedding_bag`, `weight` and `per_sample_weights` don't need to have the same dtype for `is_fast_path_index_select` (https://fburl.com/code/q0tho05h; weight is src, per_sample_weights is scale).
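For concreteness, here is a minimal repro sketch of case B on the meta device. The shapes, sizes, and values are made up for illustration; only the dtype combination matters:

```python
import torch
import torch.nn.functional as F

# Case B: fp16 embedding bag weight, fp32 per_sample_weights, on the meta device.
weight = torch.empty(10, 4, dtype=torch.float16, device="meta")
indices = torch.zeros(6, dtype=torch.int64, device="meta")
offsets = torch.tensor([0, 3], dtype=torch.int64, device="meta")
per_sample_weights = torch.empty(6, dtype=torch.float32, device="meta")

# Before this change, the meta registration raised here because
# weight.dtype != per_sample_weights.dtype, even though fbgemm handles this
# combination on real devices. With the check removed, this returns a meta
# tensor with one row per bag (shape (2, 4) in this sketch).
out = F.embedding_bag(
    indices, weight, offsets, mode="sum", per_sample_weights=per_sample_weights
)
print(out.shape, out.dtype)
```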

This diff

Therefore, this diff removes the unnecessary [check](https://fburl.com/code/k3n3h031) to support case B in the meta forward. With this change, users can use fp16 as the embedding bag dtype without forcing per_sample_weights to the same dtype in the meta forward (see Test Plan).
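For reference, the removed assertion is roughly of the following shape. This is an illustrative paraphrase only, not the verbatim PyTorch source:

```python
import torch
from typing import Optional

def _per_sample_weights_dtype_check_sketch(
    weight: torch.Tensor, per_sample_weights: Optional[torch.Tensor]
) -> None:
    # Paraphrase of the check this diff removes: the meta registration insisted
    # that per_sample_weights share the embedding bag weight's dtype, which
    # rejects case B (fp16 weight, fp32 per_sample_weights) on the meta device.
    if per_sample_weights is not None:
        torch._check(
            weight.dtype == per_sample_weights.dtype,
            lambda: f"expected weight ({weight.dtype}) and per_sample_weights "
            f"({per_sample_weights.dtype}) to have the same dtype",
        )
```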

Reference diffs to resolve this issue

Diff 1: D52591217
This passes the embedding bag dtype to feature_processor so that per_sample_weights gets the same dtype as the embedding bag weight. However, `is_meta` also needs to be passed because of case C: fbgemm still does not support per_sample_weights = fp16 (see the table above), so users are forced to make per_sample_weights fp16 only when it is on meta. The solution requires too many hacks.

Diff 2: D53232739
This does basically the same thing as diff 1 (D52591217), except that the hack is added in the TorchRec library: an `if` in EBC and PEA that forces per_sample_weights to fp16 whenever the embedding bag weight is fp16. However, this runs into the fbgemm issue as well and has broken a bunch of prod models.
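For illustration, the Diff 2 approach amounts to something like the following helper. This is a hypothetical sketch of the workaround, not the actual TorchRec code:

```python
import torch
from typing import Optional

def _maybe_cast_per_sample_weights(
    weight: torch.Tensor, per_sample_weights: Optional[torch.Tensor]
) -> Optional[torch.Tensor]:
    # Hypothetical sketch of the Diff 2 workaround: when the embedding bag
    # weight is fp16, cast per_sample_weights to fp16 as well. As noted above,
    # this lands in case C, where fbgemm rejects fp16 per_sample_weights,
    # which is why the workaround broke prod models.
    if per_sample_weights is not None and weight.dtype == torch.float16:
        return per_sample_weights.to(weight.dtype)
    return per_sample_weights
```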

Test Plan:

APS

The following command runs icvr_launcher, which triggers ads_launcher and runs forward on the meta device:

buck2 run mode/opt -c python.package_style=inplace //aps_models/ads/icvr:icvr_launcher_publish -- mode=mast_ig_fm_when_combo0_uhm_publish launcher.fbl_entitlement=ads_global_tc_ads_score launcher.data_project=oncall_ads_model_platform launcher.tags=[ads_ranking_taxonomy_exlarge_fm_prod] stages.train=false

Result:
{F1461463993}

Reviewed By: ezyang

Differential Revision: D54175438

@pytorch-bot

pytorch-bot bot commented Sep 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136774

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit c2a37a2 with merge base 69bcf10:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D54175438

@github-actions
Contributor

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D54175438

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 12, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D54175438

@facebook-github-bot
Contributor

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here


Labels

ciflow/trunk (Trigger trunk jobs on your pull request), fb-exported, Merged
