-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Fix weights_only for BUILD instructions for user allowlisted objects with __slots__ #138936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix weights_only for BUILD instructions for user allowlisted objects with __slots__ #138936
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138936
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 96123eb with merge base 73fde0d ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Previously BUILD instruction missed handling for `__slots__` see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867 e.g. ```python from dataclasses import dataclass # Define the dataclass dataclass class MyDataClass: __slots__ = ["x", "y"] x: int y: str # Create an instance of the dataclass my_data = MyDataClass(x=2, y=3) # Save the dataclass to a file torch.save(my_data, "my_data.pt") with torch.serialization.safe_globals([MyDataClass]): loaded_my_data = torch.load("my_data.pt", weights_only=True) # AttributeError: 'MyDataClass' object has no attribute '__dict__' ``` [ghstack-poisoned]
Previously `BUILD` instruction missed handling for `__slots__`. ### Background When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L765)] `__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument. Note the return type for [`__getstate__` ](https://docs.python.org/3/library/pickle.html#object.__getstate__) - For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter. - For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet. see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867 Before this PR, this would fail ```python from dataclasses import dataclass # Define the dataclass dataclass class MyDataClass: __slots__ = ["x", "y"] x: int y: str # Create an instance of the dataclass my_data = MyDataClass(x=2, y=3) # Save the dataclass to a file torch.save(my_data, "my_data.pt") with torch.serialization.safe_globals([MyDataClass]): loaded_my_data = torch.load("my_data.pt", weights_only=True) # AttributeError: 'MyDataClass' object has no attribute '__dict__' ``` [ghstack-poisoned]
…ed objects with __slots__" Previously `BUILD` instruction missed handling for `__slots__`. **This only applies for things allowlisted via `add_safe_globals`/`safe_globals` that use slots.** ### Background When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L765)]. In this case, the docs tell us that either `__setstate__` or a `__dict__` update will be performed [[link](https://github.com/python/cpython/blob/3.13/Lib/pickletools.py#L1984)] `__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument. When user doesn't patch `__reduce__` but patches `__setstate__`/`__getstate__`, state will be what is yielded by `__getstate__` Note the return type for [`__getstate__` ](https://docs.python.org/3/library/pickle.html#object.__getstate__) - For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter. - For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet. see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867 Before this PR, we didn't account for the fact that when `__setstate__` is not defined, `state` might be a tuple so this would fail ```python from dataclasses import dataclass # Define the dataclass dataclass class MyDataClass: __slots__ = ["x", "y"] x: int y: str # Create an instance of the dataclass my_data = MyDataClass(x=2, y=3) # Save the dataclass to a file torch.save(my_data, "my_data.pt") with torch.serialization.safe_globals([MyDataClass]): loaded_my_data = torch.load("my_data.pt", weights_only=True) # AttributeError: 'MyDataClass' object has no attribute '__dict__' ``` [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Pull Request resolved: #139433 Approved by: https://github.com/malfet ghstack dependencies: #138936, #139221
…139541) This is tested in PR stacked above in ```python python test/distributed/fsdp/test_fsdp_state_dict.py TestFSDPStateDict.test_torch_save_load ``` We cannot depend on whether `hasattr(..., __slots__)` to know whether a BUILD instruction has slotstate. For example, if a class subclasses ABC `hasattr(__slots__)` will be `True` but there might be no slots (and hence `state` will not be a tuple). So revert #138936 to following the pickle library's code ```python >>> from abc import ABC >>> hasattr(ABC, "__slots__") True ``` So ```python import torch from abc import ABC from dataclasses import dataclass class Foo(ABC): pass class FooWrapper(Foo): def __init__(self, x, y): self.x = x self.y = y f = FooWrapper(1, 2) torch.save(f, "temp.pt") with torch.serialization.safe_globals([FooWrapper]): torch.load("temp.pt") ``` Would fail on the previous code with ``` File "/data/users/mg1998/pytorch/torch/serialization.py", line 1934, in _load result = unpickler.load() File "/data/users/mg1998/pytorch/torch/_weights_only_unpickler.py", line 366, in load for k, v in slotstate.items(): ``` As there is actually no slotstate Pull Request resolved: #139541 Approved by: https://github.com/malfet ghstack dependencies: #138936, #139221, #139433
Pull Request resolved: #137602 Approved by: https://github.com/malfet, https://github.com/albanD ghstack dependencies: #138936, #139221, #139433, #139541
…with __slots__ (pytorch#138936) Previously `BUILD` instruction missed handling for `__slots__`. **This only applies for things allowlisted via `add_safe_globals`/`safe_globals` that use slots.** ### Background When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L765)]. In this case, the docs tell us that either `__setstate__` or a `__dict__` update will be performed [[link](https://github.com/python/cpython/blob/3.13/Lib/pickletools.py#L1984)] `__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument. When user doesn't patch `__reduce__` but patches `__setstate__`/`__getstate__`, state will be what is yielded by `__getstate__` Note the return type for [`__getstate__` ](https://docs.python.org/3/library/pickle.html#object.__getstate__) - For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`. - For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter. - For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet. see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867 Before this PR, we didn't account for the fact that when `__setstate__` is not defined, `state` might be a tuple so this would fail ```python from dataclasses import dataclass # Define the dataclass @DataClass class MyDataClass: __slots__ = ["x", "y"] x: int y: str # Create an instance of the dataclass my_data = MyDataClass(x=2, y=3) # Save the dataclass to a file torch.save(my_data, "my_data.pt") with torch.serialization.safe_globals([MyDataClass]): loaded_my_data = torch.load("my_data.pt", weights_only=True) # AttributeError: 'MyDataClass' object has no attribute '__dict__' ``` Pull Request resolved: pytorch#138936 Approved by: https://github.com/malfet
…ependency) (pytorch#139221) Fixes pytorch#129698 pytorch#139106 without pickletools Pull Request resolved: pytorch#139221 Approved by: https://github.com/malfet ghstack dependencies: pytorch#138936
Pull Request resolved: pytorch#139433 Approved by: https://github.com/malfet ghstack dependencies: pytorch#138936, pytorch#139221
ghstack-source-id: 9e12002 Pull Request resolved: pytorch/pytorch#138936
Previously
BUILDinstruction missed handling for__slots__. This only applies for things allowlisted viaadd_safe_globals/safe_globalsthat use slots.Background
When does pickle serialize a
BUILDinstruction? Whenstateis notNoneandstate_setterisNone[link]. In this case, the docs tell us that either__setstate__or a__dict__update will be performed [link]__reduce__/__reduce_ex__are expected to return tuples of length 2 to 6 wherestateis the 3rd argument. When user doesn't patch__reduce__but patches__setstate__/__getstate__, state will be what is yielded by__getstate__Note the return type for
__getstate____dict__and no__slots__, the default state is None.__dict__and no__slots__, the default state isself.__dict__.__dict__and__slots__, the default state is a tuple consisting of two dictionaries:self.__dict__, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter.__slots__and no instance__dict__, the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867
Before this PR, we didn't account for the fact that when
__setstate__is not defined,statemight be a tuple so this would failStack from ghstack (oldest at bottom):