Hacky support for meta tensor serialization. #62192

Conversation
This support is hacky because it doesn't preserve meta tensor storage sharing (e.g., if you serialize a model with shared storage, such as a tensor and a view on that tensor, then when I deserialize, the viewing relationship will be broken and these are just different tensors). Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]
💊 CI failures summary and remediations: as of commit 8c1a1a1, 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm (more details on the Dr. CI page).
ghstack-source-id: 9ff1bc9
Pull Request resolved: #62192
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
LGTM
Maybe a little bit more testing for more general serialization?
    f.seek(0)
    state = torch.load(f)
    self.assertEqual(state.weight.size(), big_model.weight.size())
Do you want to check the device?
I think the tensor is plenty big enough :) I would certainly like it if we had more structured serialization tests for different device types, maybe a more involved refactor here is in order.
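The exchange above concerns a round-trip test through torch.save/torch.load and whether it should also verify the device. A minimal stdlib-only sketch of the same round-trip pattern, using pickle and a hypothetical FakeMetaTensor stand-in class (since the actual test depends on PyTorch), might look like:

```python
import io
import pickle
from dataclasses import dataclass

# Hypothetical stand-in for a meta tensor: it records only shape and device,
# carrying no actual data, which mirrors what gets serialized in this PR.
@dataclass
class FakeMetaTensor:
    size: tuple
    device: str = "meta"

big_model_weight = FakeMetaTensor(size=(2, 3))

# Round-trip through an in-memory buffer, as the test above does.
f = io.BytesIO()
pickle.dump(big_model_weight, f)
f.seek(0)
state = pickle.load(f)

# Size survives the round trip, as the existing assertion checks.
assert state.size == big_model_weight.size
# And, per the reviewer's suggestion, the device can be checked as well.
assert state.device == "meta"
```

This is only an illustration of the test shape, not the PR's implementation; the real test operates on an nn.Module's weight rather than a bare dataclass.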
Stack from ghstack:
This support is hacky because it doesn't preserve meta tensor storage
sharing (e.g., if you serialize a model with shared storage, e.g., a
tensor and a view on a tensor, when I deserialize the viewing
relationship will be broken and these are just different tensors.) The
hack is also durable, in the sense that we will be on the hook for
supporting
_rebuild_meta_tensor_no_storage in perpetuity in the future, even if we change our mind about the serialization format.
This unblocks an FB production use case. I didn't add C++ support to minimize
blast area of this patch.
Signed-off-by: Edward Z. Yang ezyang@fb.com
Differential Revision: D29910535
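The storage-sharing loss described in the summary can be illustrated without PyTorch internals. The toy sketch below (all class names are hypothetical stand-ins, not the real torch types) contrasts a sharing-aware restore, which dedups storages by identity, with a per-tensor rebuild like the one this PR adds, which makes a fresh storage each time:

```python
# Toy model of tensors and storages (not PyTorch internals).
class Storage:
    pass

class Tensor:
    def __init__(self, storage):
        self.storage = storage

base = Storage()
t = Tensor(base)
view = Tensor(base)               # a view shares storage with its base
assert t.storage is view.storage

# A sharing-aware deserializer dedups storages by identity, so tensors
# that shared a storage before serialization share one afterward.
dedup = {}
def restore_shared(tensor):
    key = id(tensor.storage)
    if key not in dedup:
        dedup[key] = Storage()
    return Tensor(dedup[key])

# A per-tensor rebuild (in the spirit of _rebuild_meta_tensor_no_storage)
# allocates a fresh storage every time, so the view relationship is lost.
def restore_naive(tensor):
    return Tensor(Storage())

t2, v2 = restore_shared(t), restore_shared(view)
assert t2.storage is v2.storage       # sharing preserved

t3, v3 = restore_naive(t), restore_naive(view)
assert t3.storage is not v3.storage   # sharing broken: now independent tensors
```

This is only a conceptual sketch of the trade-off the PR accepts for meta tensors; the real serialization path goes through torch's pickling machinery.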