KEMBAR78
Solving pickle error when saving CyclicLR state_dict by ancestor-mithril · Pull Request #110931 · pytorch/pytorch · GitHub
Skip to content

Conversation

@ancestor-mithril
Copy link
Contributor

@ancestor-mithril ancestor-mithril commented Oct 10, 2023

How to reproduce:

import os
import tempfile

import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CyclicLR


model = nn.Linear(100, 100)
opt = SGD(model.parameters(), lr=1.)
scheduler = CyclicLR(opt, base_lr=0.1, max_lr=0.2, scale_fn=lambda x: 0.99)

tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    torch.save(scheduler.state_dict(), tmp.name)
    scheduler.load_state_dict(torch.load(tmp.name))
finally:
    tmp.close()
    os.unlink(tmp.name)

Error:

_pickle.PicklingError: Can't pickle <function <lambda> at 0x000001A51DF67600>: attribute lookup <lambda> on __main__ failed

Fix:

Saving scale_fn to the state dict only if it is a callable object and not if it is a function or lambda.

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 10, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110931

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8f4d1a1 with merge base c77a4a4 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copy link
Contributor

@janeyx99 janeyx99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, could you add a test case to ensure this behavior won't regress in test_lrschedulers.py?

Also, lint! Oh it looks like they've all been canceled. Maybe repushing will fix it.

@janeyx99 janeyx99 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 10, 2023
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be None in both situations? I would expect it to not be none here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be None in both situations, since scale_fn is a lamba function and not a callable object with an internal state that can be pickled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a third case where a custom callable is passed in then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the third case now.

Copy link
Contributor

@janeyx99 janeyx99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one comment!

Copy link
Contributor

@janeyx99 janeyx99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Sorry for the delay--I have just gotten back from traveling. Looks good to merge once CI is green (lint seems to be failing)

@janeyx99
Copy link
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 15, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch fetch origin returned non-zero exit code 1

From https://github.com/pytorch/pytorch
   2164598c405..5d170fce290  main                    -> origin/main
   dbf98c1f602..f4512eaae57  csl/add_gitignore       -> origin/csl/add_gitignore
 * [new branch]              csl/consistent_pytest_cache -> origin/csl/consistent_pytest_cache
 * [new branch]              csl/coverage            -> origin/csl/coverage
 * [new branch]              csl/onnx_no_retry       -> origin/csl/onnx_no_retry
 + fe705c0d795...a81999adc7c csl/test3               -> origin/csl/test3  (forced update)
 * [new branch]              dcp_stateful            -> origin/dcp_stateful
   44ba4e09e82..a09b32b5468  gh/BowenBao/316/base    -> origin/gh/BowenBao/316/base
   6ae6bbbb577..c5f74dd8a83  gh/BowenBao/316/head    -> origin/gh/BowenBao/316/head
 + ca9339d5710...1cde29c9c47 gh/BowenBao/316/orig    -> origin/gh/BowenBao/316/orig  (forced update)
   03a50c87cb7..17f90c18391  gh/BowenBao/317/base    -> origin/gh/BowenBao/317/base
   efc12322824..5ef6e72751d  gh/BowenBao/317/head    -> origin/gh/BowenBao/317/head
 + dc7e6447956...8dfc996f0f5 gh/BowenBao/317/orig    -> origin/gh/BowenBao/317/orig  (forced update)
   efc12322824..1cbce8845f8  gh/BowenBao/318/base    -> origin/gh/BowenBao/318/base
   72ce3e35bfa..33a679b6d55  gh/BowenBao/318/head    -> origin/gh/BowenBao/318/head
 + ede8c9b937d...e6d11e94edf gh/BowenBao/318/orig    -> origin/gh/BowenBao/318/orig  (forced update)
   59f89ee2bcc..1bd184228ba  gh/andrewlee302/2/base  -> origin/gh/andrewlee302/2/base
   a5d2e1f7bbc..787b2b43edd  gh/andrewlee302/2/head  -> origin/gh/andrewlee302/2/head
 + 5efd9596d56...f4f56e87268 gh/andrewlee302/2/orig  -> origin/gh/andrewlee302/2/orig  (forced update)
   3324cedf683..885abd9be93  gh/andrewor14/43/base   -> origin/gh/andrewor14/43/base
   b23eb432edb..89df0e488a9  gh/andrewor14/43/head   -> origin/gh/andrewor14/43/head
 + 43877f7b15d...0eeb1a9caa0 gh/andrewor14/43/orig   -> origin/gh/andrewor14/43/orig  (forced update)
   5dcc2908b70..add4560a427  gh/andrewor14/44/base   -> origin/gh/andrewor14/44/base
   91a2f3c4684..dfa1118c902  gh/andrewor14/44/head   -> origin/gh/andrewor14/44/head
 + 7e205fe1a06...81b631ebe21 gh/andrewor14/44/orig   -> origin/gh/andrewor14/44/orig  (forced update)
   46511a07207..e2016134077  gh/desertfire/267/head  -> origin/gh/desertfire/267/head
 + b72815cb86f...e329d9f83a9 gh/desertfire/267/orig  -> origin/gh/desertfire/267/orig  (forced update)
   e5053b71791..1fe60f729f4  gh/drisspg/11/base      -> origin/gh/drisspg/11/base
   b1660543bd1..613b808b3ed  gh/drisspg/11/head      -> origin/gh/drisspg/11/head
 + 793567b14d7...365e37ff8fb gh/drisspg/11/orig      -> origin/gh/drisspg/11/orig  (forced update)
   b202abb9cc8..77eee3483aa  gh/eellison/543/base    -> origin/gh/eellison/543/base
   a63e40e8c17..ddec675c04a  gh/eellison/543/head    -> origin/gh/eellison/543/head
 + 6af8f59df85...987a030081d gh/eellison/543/orig    -> origin/gh/eellison/543/orig  (forced update)
   a093a6492cc..fce8796c5d3  gh/ezyang/2443/head     -> origin/gh/ezyang/2443/head
 + bef6160b920...c86716323d2 gh/ezyang/2443/orig     -> origin/gh/ezyang/2443/orig  (forced update)
 * [new branch]              gh/ezyang/2447/base     -> origin/gh/ezyang/2447/base
 * [new branch]              gh/ezyang/2447/head     -> origin/gh/ezyang/2447/head
 * [new branch]              gh/ezyang/2447/orig     -> origin/gh/ezyang/2447/orig
   ee484b28275..05b51b3945c  gh/fduwjj/110/base      -> origin/gh/fduwjj/110/base
   590f4537b39..47662a926e4  gh/fduwjj/110/head      -> origin/gh/fduwjj/110/head
 + db504c041ec...4da29ca0aa9 gh/fduwjj/110/orig      -> origin/gh/fduwjj/110/orig  (forced update)
   404dc5d0119..5e31a443151  gh/fduwjj/113/base      -> origin/gh/fduwjj/113/base
   7547f693ec6..f1314660154  gh/fduwjj/113/head      -> origin/gh/fduwjj/113/head
 + b8aa5342d69...55a8fcb7944 gh/fduwjj/113/orig      -> origin/gh/fduwjj/113/orig  (forced update)
   d63789656ae..0c112f844b0  gh/jbschlosser/95/base  -> origin/gh/jbschlosser/95/base
   e5a183ac672..53391651a77  gh/jbschlosser/95/head  -> origin/gh/jbschlosser/95/head
 + e5a13c3ee35...e07dc6c2064 gh/jbschlosser/95/orig  -> origin/gh/jbschlosser/95/orig  (forced update)
   8320f8ca4e6..965e700b54f  gh/jbschlosser/96/base  -> origin/gh/jbschlosser/96/base
   750e5e711dc..577ebf22ff6  gh/jbschlosser/96/head  -> origin/gh/jbschlosser/96/head
 + bfb52cf9dfa...ddaff7b17d3 gh/jbschlosser/96/orig  -> origin/gh/jbschlosser/96/orig  (forced update)
   e5513373242..6e682e1d399  gh/muchulee8/14/base    -> origin/gh/muchulee8/14/base
   d4adce6f54c..a765ea9b8b3  gh/muchulee8/14/head    -> origin/gh/muchulee8/14/head
 + 978d65ffcfd...dc653d65044 gh/muchulee8/14/orig    -> origin/gh/muchulee8/14/orig  (forced update)
 * [new branch]              gh/pearu/134/base       -> origin/gh/pearu/134/base
 * [new branch]              gh/pearu/134/head       -> origin/gh/pearu/134/head
 * [new branch]              gh/pearu/134/orig       -> origin/gh/pearu/134/orig
   1760519357f..254dddf42cb  gh/soulitzer/251/head   -> origin/gh/soulitzer/251/head
 + a23638e8de8...67cc2cf2591 gh/soulitzer/251/orig   -> origin/gh/soulitzer/251/orig  (forced update)
   060f3c7b2dd..76a199c295c  gh/tugsbayasgalan/167/head -> origin/gh/tugsbayasgalan/167/head
 + bedd82dec84...0f655ce4a0b gh/tugsbayasgalan/167/orig -> origin/gh/tugsbayasgalan/167/orig  (forced update)
 * [new branch]              gh/wconstab/216/base    -> origin/gh/wconstab/216/base
 * [new branch]              gh/wconstab/216/head    -> origin/gh/wconstab/216/head
 * [new branch]              gh/ydwu4/42/base        -> origin/gh/ydwu4/42/base
 * [new branch]              gh/ydwu4/42/head        -> origin/gh/ydwu4/42/head
 * [new branch]              gh/ydwu4/42/orig        -> origin/gh/ydwu4/42/orig
error: cannot lock ref 'refs/remotes/origin/hoy/autotune/lanucher': 'refs/remotes/origin/hoy' exists; cannot create 'refs/remotes/origin/hoy/autotune/lanucher'
 ! [new branch]              hoy/autotune/lanucher   -> origin/hoy/autotune/lanucher  (unable to update local ref)
   f9f3cf34ca3..96c8b797d1b  mlazos/mutable          -> origin/mlazos/mutable
 * [new branch]              remove_global_ns        -> origin/remove_global_ns
 + 3963150e74c...a8b33d711bd update_submodule_FBGEMM -> origin/update_submodule_FBGEMM  (forced update)
 + 36317812df4...d43ed6f2b00 update_submodule_kineto -> origin/update_submodule_kineto  (forced update)
   7f9fafed532..cc11c0d11bf  viable/strict           -> origin/viable/strict
 * [new branch]              voz/serde               -> origin/voz/serde
 * [new tag]                 ciflow/inductor/113799  -> ciflow/inductor/113799
 * [new tag]                 ciflow/inductor/113802  -> ciflow/inductor/113802
 * [new tag]                 ciflow/inductor/113814  -> ciflow/inductor/113814
 * [new tag]                 ciflow/inductor/113815  -> ciflow/inductor/113815
 * [new tag]                 ciflow/trunk/113697     -> ciflow/trunk/113697
 * [new tag]                 ciflow/trunk/113774     -> ciflow/trunk/113774
 * [new tag]                 ciflow/trunk/113813     -> ciflow/trunk/113813
error: some local refs could not be updated; try running
 'git remote prune origin' to remove any old, conflicting branches
Details for Dev Infra team Raised by workflow job

@janeyx99
Copy link
Contributor

Argh, I am sorry. Could you locally fix the merge conflicts and kick off a @pytorchbot merge on the rebased copy?

@ancestor-mithril
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: optim triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants