-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Fix weights_only for BUILD instructions for user allowlisted objects with __slots__ #138936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix weights_only for BUILD instructions for user allowlisted objects with __slots__ #138936
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138936
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 96123eb with merge base 73fde0d ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Previously BUILD instruction missed handling for `__slots__` see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867 e.g. ```python from dataclasses import dataclass # Define the dataclass dataclass class MyDataClass: __slots__ = ["x", "y"] x: int y: str # Create an instance of the dataclass my_data = MyDataClass(x=2, y=3) # Save the dataclass to a file torch.save(my_data, "my_data.pt") with torch.serialization.safe_globals([MyDataClass]): loaded_my_data = torch.load("my_data.pt", weights_only=True) # AttributeError: 'MyDataClass' object has no attribute '__dict__' ``` [ghstack-poisoned]
Previously `BUILD` instruction missed handling for `__slots__`. ### Background When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L765)] `__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument. Note the return type for [`__getstate__` ](https://docs.python.org/3/library/pickle.html#object.__getstate__) - For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter. - For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet. see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867 Before this PR, this would fail ```python from dataclasses import dataclass # Define the dataclass dataclass class MyDataClass: __slots__ = ["x", "y"] x: int y: str # Create an instance of the dataclass my_data = MyDataClass(x=2, y=3) # Save the dataclass to a file torch.save(my_data, "my_data.pt") with torch.serialization.safe_globals([MyDataClass]): loaded_my_data = torch.load("my_data.pt", weights_only=True) # AttributeError: 'MyDataClass' object has no attribute '__dict__' ``` [ghstack-poisoned]
…ed objects with __slots__" Previously `BUILD` instruction missed handling for `__slots__`. **This only applies for things allowlisted via `add_safe_globals`/`safe_globals` that use slots.** ### Background When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L765)]. In this case, the docs tell us that either `__setstate__` or a `__dict__` update will be performed [[link](https://github.com/python/cpython/blob/3.13/Lib/pickletools.py#L1984)] `__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument. When user doesn't patch `__reduce__` but patches `__setstate__`/`__getstate__`, state will be what is yielded by `__getstate__` Note the return type for [`__getstate__` ](https://docs.python.org/3/library/pickle.html#object.__getstate__) - For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter. - For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet. see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867 Before this PR, we didn't account for the fact that when `__setstate__` is not defined, `state` might be a tuple so this would fail ```python from dataclasses import dataclass # Define the dataclass dataclass class MyDataClass: __slots__ = ["x", "y"] x: int y: str # Create an instance of the dataclass my_data = MyDataClass(x=2, y=3) # Save the dataclass to a file torch.save(my_data, "my_data.pt") with torch.serialization.safe_globals([MyDataClass]): loaded_my_data = torch.load("my_data.pt", weights_only=True) # AttributeError: 'MyDataClass' object has no attribute '__dict__' ``` [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Pull Request resolved: #139433 Approved by: https://github.com/malfet ghstack dependencies: #138936, #139221
…139541) This is tested in PR stacked above in ```python python test/distributed/fsdp/test_fsdp_state_dict.py TestFSDPStateDict.test_torch_save_load ``` We cannot depend on whether `hasattr(..., __slots__)` to know whether a BUILD instruction has slotstate. For example, if a class subclasses ABC `hasattr(__slots__)` will be `True` but there might be no slots (and hence `state` will not be a tuple). So revert #138936 to following the pickle library's code ```python >>> from abc import ABC >>> hasattr(ABC, "__slots__") True ``` So ```python import torch from abc import ABC from dataclasses import dataclass class Foo(ABC): pass class FooWrapper(Foo): def __init__(self, x, y): self.x = x self.y = y f = FooWrapper(1, 2) torch.save(f, "temp.pt") with torch.serialization.safe_globals([FooWrapper]): torch.load("temp.pt") ``` Would fail on the previous code with ``` File "/data/users/mg1998/pytorch/torch/serialization.py", line 1934, in _load result = unpickler.load() File "/data/users/mg1998/pytorch/torch/_weights_only_unpickler.py", line 366, in load for k, v in slotstate.items(): ``` As there is actually no slotstate Pull Request resolved: #139541 Approved by: https://github.com/malfet ghstack dependencies: #138936, #139221, #139433
Pull Request resolved: #137602 Approved by: https://github.com/malfet, https://github.com/albanD ghstack dependencies: #138936, #139221, #139433, #139541
…with __slots__ (pytorch#138936) Previously `BUILD` instruction missed handling for `__slots__`. **This only applies for things allowlisted via `add_safe_globals`/`safe_globals` that use slots.** ### Background When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L765)]. In this case, the docs tell us that either `__setstate__` or a `__dict__` update will be performed [[link](https://github.com/python/cpython/blob/3.13/Lib/pickletools.py#L1984)] `__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument. When user doesn't patch `__reduce__` but patches `__setstate__`/`__getstate__`, state will be what is yielded by `__getstate__` Note the return type for [`__getstate__` ](https://docs.python.org/3/library/pickle.html#object.__getstate__) - For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter. - For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet. see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867 Before this PR, we didn't account for the fact that when `__setstate__` is not defined, `state` might be a tuple so this would fail ```python from dataclasses import dataclass # Define the dataclass @DataClass class MyDataClass: __slots__ = ["x", "y"] x: int y: str # Create an instance of the dataclass my_data = MyDataClass(x=2, y=3) # Save the dataclass to a file torch.save(my_data, "my_data.pt") with torch.serialization.safe_globals([MyDataClass]): loaded_my_data = torch.load("my_data.pt", weights_only=True) # AttributeError: 'MyDataClass' object has no attribute '__dict__' ``` Pull Request resolved: pytorch#138936 Approved by: https://github.com/malfet
…ependency) (pytorch#139221) Fixes pytorch#129698 pytorch#139106 without pickletools Pull Request resolved: pytorch#139221 Approved by: https://github.com/malfet ghstack dependencies: pytorch#138936
Pull Request resolved: pytorch#139433 Approved by: https://github.com/malfet ghstack dependencies: pytorch#138936, pytorch#139221
ghstack-source-id: 9e12002 Pull Request resolved: pytorch/pytorch#138936
Previously
BUILD
instruction missed handling for__slots__
. This only applies for things allowlisted viaadd_safe_globals
/safe_globals
that use slots.Background
When does pickle serialize a
BUILD
instruction? Whenstate
is notNone
andstate_setter
isNone
[link]. In this case, the docs tell us that either__setstate__
or a__dict__
update will be performed [link]__reduce__
/__reduce_ex__
are expected to return tuples of length 2 to 6 wherestate
is the 3rd argument. When user doesn't patch__reduce__
but patches__setstate__
/__getstate__
, state will be what is yielded by__getstate__
Note the return type for
__getstate__
__dict__
and no__slots__
, the default state is None.__dict__
and no__slots__
, the default state isself.__dict__
.__dict__
and__slots__
, the default state is a tuple consisting of two dictionaries:self.__dict__
, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter.__slots__
and no instance__dict__
, the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867
Before this PR, we didn't account for the fact that when
__setstate__
is not defined,state
might be a tuple so this would failStack from ghstack (oldest at bottom):