KEMBAR78
Fix weights_only for BUILD instructions for user allowlisted objects with __slots__ by mikaylagawarecki · Pull Request #138936 · pytorch/pytorch · GitHub
Skip to content

Conversation

@mikaylagawarecki
Copy link
Contributor

@mikaylagawarecki mikaylagawarecki commented Oct 25, 2024

Previously BUILD instruction missed handling for __slots__. This only applies for things allowlisted via add_safe_globals/safe_globals that use slots.

Background

When does pickle serialize a BUILD instruction? When state is not None and state_setter is None [link]. In this case, the docs tell us that either __setstate__ or a __dict__ update will be performed [link]

__reduce__/__reduce_ex__ are expected to return tuples of length 2 to 6 where state is the 3rd argument. When user doesn't patch __reduce__ but patches __setstate__/__getstate__, state will be what is yielded by __getstate__

Note the return type for __getstate__

  • For a class that has no instance __dict__ and no __slots__, the default state is None.
  • For a class that has an instance __dict__ and no __slots__, the default state is self.__dict__.
  • For a class that has an instance __dict__ and __slots__, the default state is a tuple consisting of two dictionaries: self.__dict__, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter.
  • For a class that has __slots__ and no instance __dict__, the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.

see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867

Before this PR, we didn't account for the fact that when __setstate__ is not defined, state might be a tuple so this would fail

from dataclasses import dataclass

# Define the dataclass
@dataclass
class MyDataClass:
    __slots__ = ["x", "y"]
    x: int
    y: str
# Create an instance of the dataclass
my_data = MyDataClass(x=2, y=3)
# Save the dataclass to a file
torch.save(my_data, "my_data.pt")
with torch.serialization.safe_globals([MyDataClass]):
    loaded_my_data = torch.load("my_data.pt", weights_only=True)
# AttributeError: 'MyDataClass' object has no attribute '__dict__'

Stack from ghstack (oldest at bottom):

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138936

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 96123eb with merge base 73fde0d (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mikaylagawarecki mikaylagawarecki added release notes: python_frontend python frontend release notes category topic: bug fixes topic category labels Oct 25, 2024
Previously BUILD instruction missed handling for `__slots__`

see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867

e.g.

```python
from dataclasses import dataclass

# Define the dataclass
dataclass
class MyDataClass:
    __slots__ = ["x", "y"]
    x: int
    y: str
# Create an instance of the dataclass
my_data = MyDataClass(x=2, y=3)
# Save the dataclass to a file
torch.save(my_data, "my_data.pt")
with torch.serialization.safe_globals([MyDataClass]):
    loaded_my_data = torch.load("my_data.pt", weights_only=True)
# AttributeError: 'MyDataClass' object has no attribute '__dict__'
```






[ghstack-poisoned]
Previously `BUILD` instruction missed handling for `__slots__`.

### Background
When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L765)]

`__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument.

Note the return type for [`__getstate__` ](https://docs.python.org/3/library/pickle.html#object.__getstate__)

- For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None.
- For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`.
- For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter.
- For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.

see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867

Before this PR, this would fail

```python
from dataclasses import dataclass

# Define the dataclass
dataclass
class MyDataClass:
    __slots__ = ["x", "y"]
    x: int
    y: str
# Create an instance of the dataclass
my_data = MyDataClass(x=2, y=3)
# Save the dataclass to a file
torch.save(my_data, "my_data.pt")
with torch.serialization.safe_globals([MyDataClass]):
    loaded_my_data = torch.load("my_data.pt", weights_only=True)
# AttributeError: 'MyDataClass' object has no attribute '__dict__'
```






[ghstack-poisoned]
@mikaylagawarecki mikaylagawarecki changed the title Fix weights_only for BUILD instructions with __slots__ Fix weights_only for BUILD instructions for user allowlisted objects with __slots__ Oct 29, 2024
…ed objects with __slots__"

Previously `BUILD` instruction missed handling for `__slots__`. **This only applies for things allowlisted via `add_safe_globals`/`safe_globals` that use slots.**

### Background
When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L765)]. In this case, the docs tell us that either `__setstate__` or a `__dict__` update will be performed [[link](https://github.com/python/cpython/blob/3.13/Lib/pickletools.py#L1984)]

`__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument. When user doesn't patch `__reduce__` but patches `__setstate__`/`__getstate__`, state will be what is yielded by `__getstate__`

Note the return type for [`__getstate__` ](https://docs.python.org/3/library/pickle.html#object.__getstate__)

- For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None.
- For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`.
- For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter.
- For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.

see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867

Before this PR, we didn't account for the fact that when `__setstate__` is not defined, `state` might be a tuple so this would fail

```python
from dataclasses import dataclass

# Define the dataclass
dataclass
class MyDataClass:
    __slots__ = ["x", "y"]
    x: int
    y: str
# Create an instance of the dataclass
my_data = MyDataClass(x=2, y=3)
# Save the dataclass to a file
torch.save(my_data, "my_data.pt")
with torch.serialization.safe_globals([MyDataClass]):
    loaded_my_data = torch.load("my_data.pt", weights_only=True)
# AttributeError: 'MyDataClass' object has no attribute '__dict__'
```






[ghstack-poisoned]
Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@malfet
Copy link
Contributor

malfet commented Oct 31, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 31, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Nov 1, 2024
…ependency) (#139221)

Fixes #129698

#139106 without pickletools

Pull Request resolved: #139221
Approved by: https://github.com/malfet
ghstack dependencies: #138936
pytorchmergebot pushed a commit that referenced this pull request Nov 1, 2024
pytorchmergebot pushed a commit that referenced this pull request Nov 4, 2024
…139541)

This is tested in PR stacked above in

```python
python test/distributed/fsdp/test_fsdp_state_dict.py TestFSDPStateDict.test_torch_save_load
```

We cannot depend on whether `hasattr(..., __slots__)` to know whether a BUILD instruction has slotstate. For example, if a class subclasses ABC `hasattr(__slots__)` will be `True` but there might be no slots (and hence `state` will not be a tuple). So revert #138936 to following the pickle library's code

```python

>>> from abc import ABC
>>> hasattr(ABC, "__slots__")
True
```

So

```python
import torch
from abc import ABC
from dataclasses import dataclass

class Foo(ABC):
    pass

class FooWrapper(Foo):
    def __init__(self, x, y):
        self.x = x
        self.y = y

f = FooWrapper(1, 2)
torch.save(f, "temp.pt")
with torch.serialization.safe_globals([FooWrapper]):
    torch.load("temp.pt")
```

Would fail on the previous code with
```
File "/data/users/mg1998/pytorch/torch/serialization.py", line 1934, in _load
    result = unpickler.load()
  File "/data/users/mg1998/pytorch/torch/_weights_only_unpickler.py", line 366, in load
    for k, v in slotstate.items():
```

As there is actually no slotstate

Pull Request resolved: #139541
Approved by: https://github.com/malfet
ghstack dependencies: #138936, #139221, #139433
pytorchmergebot pushed a commit that referenced this pull request Nov 4, 2024
pytorchmergebot pushed a commit that referenced this pull request Nov 4, 2024
Prevent same global from being added multiple times

Pull Request resolved: #139303
Approved by: https://github.com/janeyx99
ghstack dependencies: #138936, #139221, #139433, #139541, #137602
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…with __slots__ (pytorch#138936)

Previously `BUILD` instruction missed handling for `__slots__`. **This only applies for things allowlisted via `add_safe_globals`/`safe_globals` that use slots.**

### Background
When does pickle serialize a `BUILD` instruction? When `state` is not `None` and `state_setter` is `None` [[link](https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L765)]. In this case, the docs tell us that either `__setstate__` or a `__dict__` update will be performed [[link](https://github.com/python/cpython/blob/3.13/Lib/pickletools.py#L1984)]

`__reduce__`/`__reduce_ex__` are expected to return tuples of length 2 to 6 where `state` is the 3rd argument. When user doesn't patch `__reduce__` but patches `__setstate__`/`__getstate__`, state will be what is yielded by `__getstate__`

Note the return type for [`__getstate__` ](https://docs.python.org/3/library/pickle.html#object.__getstate__)

- For a class that has no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is None.
- For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and no [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is `self.__dict__`.
- For a class that has an instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__) and [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__), the default state is a tuple consisting of two dictionaries: `self.__dict__`, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter.
- For a class that has [`__slots__`](https://docs.python.org/3/reference/datamodel.html#object.__slots__) and no instance [`__dict__`](https://docs.python.org/3/reference/datamodel.html#object.__dict__), the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.

see handling in pickle code https://github.com/python/cpython/blob/c5b99f5c2c5347d66b9da362773969c531fb6c85/Lib/pickle.py#L1846-L1867

Before this PR, we didn't account for the fact that when `__setstate__` is not defined, `state` might be a tuple so this would fail

```python
from dataclasses import dataclass

# Define the dataclass
@DataClass
class MyDataClass:
    __slots__ = ["x", "y"]
    x: int
    y: str
# Create an instance of the dataclass
my_data = MyDataClass(x=2, y=3)
# Save the dataclass to a file
torch.save(my_data, "my_data.pt")
with torch.serialization.safe_globals([MyDataClass]):
    loaded_my_data = torch.load("my_data.pt", weights_only=True)
# AttributeError: 'MyDataClass' object has no attribute '__dict__'
```

Pull Request resolved: pytorch#138936
Approved by: https://github.com/malfet
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
@github-actions github-actions bot deleted the gh/mikaylagawarecki/278/head branch December 1, 2024 02:22
Esquains pushed a commit to Esquains/study1 that referenced this pull request Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: python_frontend python frontend release notes category topic: bug fixes topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants