Remove dependency on numpy for serialization for XLA/open registration devices without numpy by mikaylagawarecki · Pull Request #137444 · pytorch/pytorch · GitHub

Conversation

@mikaylagawarecki
Contributor

@mikaylagawarecki mikaylagawarecki commented Oct 7, 2024

Related: pytorch/xla#7799 (comment)

Follow ups: Do the same for maia and mtia

Motivation

With the move to weights_only by default, we are making an explicit decision not to allowlist GLOBALs required to deserialize numpy tensors by default. The implication is that backends relying on numpy for serialization will fail loudly when torch.load flips weights_only.

However, this dependency on numpy is legacy and is not actually needed anymore, so we can remove it, which aligns with our `weights_only` strategy.
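For context, the `weights_only` mechanism restricts which pickle GLOBALs may be resolved during `torch.load`. The idea can be sketched with the stdlib alone; the class name and allowlist below are illustrative stand-ins, not PyTorch's actual implementation:

```python
import io
import pickle

class AllowlistUnpickler(pickle.Unpickler):
    """Resolve only globals that appear in an explicit allowlist."""
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(
                f"GLOBAL {module}.{name} is not allowlisted"
            )
        return super().find_class(module, name)

# Plain containers use dedicated pickle opcodes, so no GLOBAL lookup happens:
payload = pickle.dumps({"weights": [1.0, 2.0]})
print(AllowlistUnpickler(io.BytesIO(payload)).load())

# A payload that needs a GLOBAL outside the allowlist fails loudly, which is
# what a backend whose tensors pickle through numpy types would hit:
import fractions
bad = pickle.dumps(fractions.Fraction(1, 2))
try:
    AllowlistUnpickler(io.BytesIO(bad)).load()
except pickle.UnpicklingError as e:
    print("rejected:", e)
```

Removing the numpy dependency means these backends never emit a numpy GLOBAL in the first place, so nothing needs to be allowlisted.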

Why is this ok?

The following comment, which explains why numpy was chosen for serialization, is legacy:

pytorch/torch/_tensor.py

Lines 303 to 312 in c87c9f0

# Note: Numpy array is chosen to be the rebuild component for XLA, MTIA, MAIA Tensors.
# We considered a few options:
# 1. CPU tensor can't be used here.
#    Otherwise in torch.load CPU storage is reconstructed with randomly
#    initialized data, moved onto the backend device, and then the storage is
#    updated to the serialized content. This works perfectly for CPU/CUDA but
#    not for these backends; their tensors are disconnected from storage, so
#    they don't get the update.
# 2. A Python list is not a good fit for performance reasons.
#    `tolist()` converts every single element in the tensor into a Python
#    object and serializes them one by one.
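Point 2 above can be illustrated with the stdlib: pickling a Python list serializes every element as a separate object, whereas a contiguous buffer (here `array.array`, standing in for a tensor's storage) is dumped as raw bytes in one shot. The sizes below are only a proxy; the bigger cost of `tolist()` is creating one Python object per element.

```python
import pickle
from array import array

values = array("d", range(100_000))      # 100k float64s in one contiguous buffer

as_list = pickle.dumps(values.tolist())  # one pickled object per element
as_buffer = pickle.dumps(values)         # the buffer is serialized as raw bytes

print(len(as_list), len(as_buffer))      # the buffer form is more compact
```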

We no longer do the following, though it was the case five years ago in the PR that added this comment:

> CPU storage is reconstructed with randomly initialized data, moved onto backend device, and then storage is updated to the serialized content

Instead, CPU storage is now constructed with the data from the file and then moved onto the backend device.
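The current flow can be sketched with plain `pickle`. Every name below is a hypothetical stand-in (a toy "device tensor"), not PyTorch's actual API, but the shape of the rebuild is the same: serialize a CPU copy of the data, and on load reconstruct it CPU-side before moving it to the device.

```python
import pickle

class DeviceTensor:
    """Illustrative stand-in for a backend (e.g. XLA) tensor."""
    def __init__(self, data, device):
        self.data = list(data)   # pretend this lives on an accelerator
        self.device = device

    def cpu_copy(self):
        return list(self.data)   # materialize the contents on the CPU host

    def __reduce__(self):
        # Serialize the CPU copy plus the device name; no numpy involved.
        return (_rebuild_from_cpu, (self.cpu_copy(), self.device))

def _rebuild_from_cpu(cpu_data, device):
    # CPU data is read straight from the file, then "moved" onto the device.
    return DeviceTensor(cpu_data, device)

t = DeviceTensor([1.0, 2.0, 3.0], device="xla:0")
t2 = pickle.loads(pickle.dumps(t))
print(t2.data, t2.device)  # [1.0, 2.0, 3.0] xla:0
```

This sidesteps the storage-update problem from option 1 in the old comment: the device tensor is built directly from fully initialized CPU data, so nothing needs to be patched after the fact.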

Old behavior (legacy_load): https://github.com/ailzhang/pytorch/blob/67adda891a839691790a0dcd99062430050eff3b/torch/serialization.py#L620

Stack from ghstack (oldest at bottom):

@pytorch-bot

pytorch-bot bot commented Oct 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137444

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9c0b63b with merge base f80ed0b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mikaylagawarecki added a commit that referenced this pull request Oct 7, 2024
…n devices without numpy

ghstack-source-id: d2fd34d
Pull Request resolved: #137444
@mikaylagawarecki mikaylagawarecki added release notes: python_frontend python frontend release notes category topic: not user facing topic category labels Oct 7, 2024
Collaborator

@albanD albanD left a comment


Sounds good!

mikaylagawarecki added a commit that referenced this pull request Oct 9, 2024
…n devices without numpy

ghstack-source-id: 7883a35
Pull Request resolved: #137444
@mikaylagawarecki
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 9, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

mikaylagawarecki added a commit that referenced this pull request Oct 16, 2024
…tion"

See rationale in #137444 description

mikaylagawarecki added a commit that referenced this pull request Oct 16, 2024
See rationale in #137444 description

mikaylagawarecki added a commit that referenced this pull request Oct 22, 2024
…tion"

See rationale in #137444 description

mikaylagawarecki added a commit that referenced this pull request Oct 22, 2024
See rationale in #137444 description

mikaylagawarecki added a commit that referenced this pull request Oct 25, 2024
…tion"

See rationale in #137444 description

mikaylagawarecki added a commit that referenced this pull request Oct 25, 2024
See rationale in #137444 description
pytorchmergebot pushed a commit that referenced this pull request Oct 28, 2024
See rationale in #137444 description

Pull Request resolved: #137600
Approved by: https://github.com/albanD
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Oct 29, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
@github-actions github-actions bot deleted the gh/mikaylagawarecki/267/head branch November 9, 2024 02:02
nautsimon added a commit to nautsimon/pytorch that referenced this pull request Jul 9, 2025
Summary:
NumPy-based tensor rebuilding from serialization has been deprecated by other backends (e.g. [XLA](pytorch#137444)). The new flow constructs CPU storage with data from the file and then moves it to the target backend device.

Furthermore, relying on numpy for serialization will fail loudly when torch.load flips weights_only.

Reviewed By: andyanwang

Differential Revision: D77843238
pytorchmergebot pushed a commit that referenced this pull request Jul 11, 2025
Summary:
NumPy-based tensor rebuilding from serialization has been deprecated by other backends (e.g. [XLA](#137444)). The new flow constructs CPU storage with data from the file and then moves it to the target backend device.

Furthermore, relying on numpy for serialization will fail loudly when torch.load flips weights_only.

Reviewed By: andyanwang

Differential Revision: D77843238

Pull Request resolved: #157884
Approved by: https://github.com/albanD
