[AOTI] Use `len(serialized_weights)` when calculating `consts_size` #139054
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139054

Note: Links to docs will display an error until the docs builds have been completed.

✅ No failures as of commit 99f163f with merge base a7479fa.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
class AOTInductorTestABICompatibleCpu(AOTITestCase):
    device = "cpu"
    device_type = "cpu"
```
`device_type` is needed in `skipCUDAIf`
```python
@unittest.skipIf(sys.platform == "darwin", "No CUDA on MacOS")
class AOTInductorTestABICompatibleCuda(AOTITestCase):
    device = "cuda"
    device_type = "cuda"
```
ditto
```python
class AOTInductorTestABICompatibleCpuWithStackAllocation(AOTITestCase):
    device = "cpu"
    device_type = "cpu"
```
ditto
```python
    TestCase
):
    device = "cpu"
    device_type = "cpu"
```
ditto
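The review comments above all make the same point: a device-conditional skip decorator like `skipCUDAIf` decides whether to apply by reading the test class's `device_type` attribute, so `device = "cuda"` alone is not enough. This can be illustrated with a simplified stand-in decorator (hypothetical, not PyTorch's actual `skipCUDAIf` implementation):

```python
import unittest

def skip_cuda_if(condition, reason):
    # Simplified stand-in for skipCUDAIf: the skip fires only when the
    # decorated test class declares device_type == "cuda".
    def decorator(cls):
        if condition and getattr(cls, "device_type", None) == "cuda":
            return unittest.skip(reason)(cls)
        return cls
    return decorator

@skip_cuda_if(True, "simulated: no CUDA available")
class FakeCudaTests:
    device = "cuda"
    device_type = "cuda"  # without this attribute, the skip never applies

@skip_cuda_if(True, "simulated: no CUDA available")
class FakeCpuTests:
    device = "cpu"
    device_type = "cpu"

print(getattr(FakeCudaTests, "__unittest_skip__", False))  # True
print(getattr(FakeCpuTests, "__unittest_skip__", False))   # False
```

If `device_type` were missing from the CUDA test class, the condition would never match and the tests would run (and fail) on machines without CUDA, which is why the reviewer asked for it on each class.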
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
Merge failed. Reason: Command … (raised by workflow job).
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
@chunyuan-w Hi, I noticed that this PR uses …
…ytorch#139054)

Fixes the failure of INT8 DLRM using AOTI.

The previous code calculates `consts_size` directly using `tensor` from `graph.constants`:

```python
consts_size = sum(
    get_nbytes_of_tensor(tensor, all_cuda)
    for (name, tensor) in graph.constants.items()
    if name not in graph.folded_constants
)
```

Meanwhile, the actual bytes to serialize (`serialized_weights`) are produced using `graph.get_original_value_of_constant(name)`:

```python
serialized_weights = b"".join(
    _to_bytes(graph.get_original_value_of_constant(name), all_cuda)
    for name in graph.constants.keys()
    if name not in graph.folded_constants
)
```

`tensor` from `graph.constants` can differ from `graph.get_original_value_of_constant(name)`, making `consts_size` inconsistent with the actual byte size of `serialized_weights` and resulting in the runtime error `weights_offset must be aligned to 16K boundary`, similar to what happened in pytorch#135205.

This PR directly computes `consts_size` using `len(serialized_weights)`, which fixes the inconsistency.

We also added a `reduce_range` argument to the `get_default_x86_inductor_quantization_config` function, which is needed in the unit test to avoid accuracy issues on CI machines (earlier CPUs without VNNI).

Pull Request resolved: pytorch#139054
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/desertfire
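The essence of the fix is to serialize first and derive the size from the serialized bytes, rather than computing the size from a separate (possibly stale) tensor view. A minimal self-contained sketch, using a hypothetical float-packing serializer in place of AOTI's real `_to_bytes`:

```python
import struct

ALIGN = 16 * 1024  # the runtime requires weights_offset aligned to a 16K boundary

def _to_bytes(value):
    # Hypothetical serializer stand-in: pack a list of floats as little-endian f32.
    return b"".join(struct.pack("<f", v) for v in value)

# Two hypothetical constants; the "original value" is what actually gets
# serialized, so its byte length is the ground truth for consts_size.
original_values = {"w1": [1.0, 2.0, 3.0], "w2": [4.0] * 5}

serialized_weights = b"".join(_to_bytes(v) for v in original_values.values())

# The fix: consts_size comes from the serialized bytes themselves, so it can
# never disagree with what is written out; then round up to the alignment.
consts_size = len(serialized_weights)
aligned_size = (consts_size + ALIGN - 1) // ALIGN * ALIGN

print(consts_size, aligned_size)  # 32 16384
```

Computing the size from any other source (as the old code did from `graph.constants`) reintroduces the risk that the declared size and the emitted bytes drift apart, which is exactly what triggered the alignment error.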
Stack from ghstack (oldest at bottom):
- `len(serialized_weights)` when calculating `consts_size` #139054

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
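For context on the `reduce_range` argument added for the unit test: on x86 CPUs without VNNI, int8 matmul accumulation can overflow, and the standard mitigation is to quantize activations to a 7-bit range instead of 8-bit. A small illustrative helper (hypothetical, not the actual PyTorch observer code) showing the effect on the quantization range:

```python
def int8_quant_range(reduce_range: bool, dtype_bits: int = 8):
    # With reduce_range=True the effective bit width is reduced by one,
    # halving the quantization range to leave headroom against
    # intermediate-accumulation overflow on pre-VNNI CPUs.
    bits = dtype_bits - 1 if reduce_range else dtype_bits
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    return qmin, qmax

print(int8_quant_range(False))  # (-128, 127)
print(int8_quant_range(True))   # (-64, 63)
```

The narrower range costs a small amount of precision, which is why it is exposed as an opt-in flag rather than enabled unconditionally; CI machines with older CPUs need it to reproduce the accuracy the test expects.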