Add the appropriate check on div_value to the cpp frontend by FFFrog · Pull Request #114671 · pytorch/pytorch

Conversation

@FFFrog (Collaborator) commented Nov 28, 2023

Fixes #114334

As the title states.

@pytorch-bot bot added the release notes: cpp release notes category label Nov 28, 2023
@pytorch-bot bot commented Nov 28, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114671

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3c8f5c9 with merge base 373f206:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mikaylagawarecki added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Nov 28, 2023
@mikaylagawarecki (Contributor) commented Nov 29, 2023

Looking at this again, it seems like there is an inconsistency between the Python and C++ APIs here.

In this case,

int64_t hsz = options.in_features() / static_cast<int64_t>(std::pow(options.div_value(), (i + 1)));

becomes, with in_features = 16 and div_value = 0.25,

int64_t hsz = 16 / static_cast<int64_t>(std::pow(0.25, (i + 1)));

For i = 0, the cast truncates std::pow(0.25, 1) == 0.25 down to 0, so this evaluates to 16 / 0.

On the other hand, the corresponding line in the Python API,

hsz = int(self.in_features // (self.div_value ** (i + 1)))

divides in floating point before truncating, so it gives

int(16 // (0.25 ** 1)) = 64
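
For concreteness, a minimal standalone C++ sketch of both computations with these values (plain arithmetic, not PyTorch code):

#include <cmath>
#include <cstdint>
#include <iostream>

int main() {
  const int64_t in_features = 16;
  const double div_value = 0.25;
  // Current C++ frontend order of operations: the cast truncates
  // std::pow(0.25, 1) == 0.25 down to 0 before the integer division.
  const int64_t denom = static_cast<int64_t>(std::pow(div_value, 1));
  std::cout << denom << '\n';  // 0, so in_features / denom divides by zero
  // Python order of operations: divide in floating point first,
  // then truncate, matching int(16 // (0.25 ** 1)) == 64.
  const int64_t hsz = static_cast<int64_t>(
      std::floor(in_features / std::pow(div_value, 1)));
  std::cout << hsz << '\n';  // 64
}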

I had not noticed that the issue used 0 for div_value in Python rather than the 0.25 used for C++; with 0.25, the Python version does not error:

import torch
m = torch.nn.AdaptiveLogSoftmaxWithLoss(1, 3, (1,), div_value=0.25)

Instead of a check on the denominator, maybe we need to do

static_cast<int64_t>(std::floor(options.in_features() / std::pow(options.div_value(), (i + 1))));

and also add a separate TORCH_CHECK that div_value is not 0 (iiuc, that is the only case where std::pow(div_value, i + 1) will be 0).

Apologies for the churn! Let me know if you agree with the above analysis.
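
For reference, a sketch of what the combined change could look like (the TORCH_CHECK would sit before the loop; the exact error message is an assumption, not the final PR wording):

// div_value == 0 is the only input for which std::pow(div_value, i + 1)
// evaluates to exactly 0, so reject it up front.
TORCH_CHECK(
    options.div_value() != 0,
    "div_value should not be equal to 0");

// Inside the loop: divide in floating point first, then truncate,
// matching the Python frontend's int(in_features // div_value ** (i + 1)).
const int64_t hsz = static_cast<int64_t>(std::floor(
    options.in_features() / std::pow(options.div_value(), (i + 1))));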

@FFFrog (Collaborator, Author) commented Nov 29, 2023

Sorry, it was my mistake. I agree with you, and I have fixed it; please take another look.
Thank you.

@mikaylagawarecki (Contributor) left a comment

LGTM!

cc @albanD on whether this is considered a bugfix or BC-breaking

{
  // test div_value
  auto options =
      AdaptiveLogSoftmaxWithLossOptions(16, 20, {4, 10, 15}).div_value(0.);

nit: could we also check that .div_value(0.25) doesn't error here?
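
A sketch of how the test could cover both cases (ASSERT_THROWS_WITH is the macro from the test/cpp/api support header; the expected error message is an assumption):

{
  // test div_value
  // div_value == 0 should be rejected at construction time
  auto options =
      AdaptiveLogSoftmaxWithLossOptions(16, 20, {4, 10, 15}).div_value(0.);
  ASSERT_THROWS_WITH(
      AdaptiveLogSoftmaxWithLoss(options),
      "div_value should not be equal to 0");

  // a fractional div_value such as 0.25 must construct without error
  auto valid_options =
      AdaptiveLogSoftmaxWithLossOptions(16, 20, {4, 10, 15}).div_value(0.25);
  AdaptiveLogSoftmaxWithLoss asfm(valid_options);
}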

@FFFrog (Collaborator, Author) commented Dec 1, 2023

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 1, 2023
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@FFFrog (Collaborator, Author) commented Dec 1, 2023

@pytorchbot merge -i

@FFFrog (Collaborator, Author) commented Dec 4, 2023

@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 0 checks:

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@FFFrog deleted the mrl_fix_div_value branch December 4, 2023 01:34
@mikaylagawarecki added the topic: bug fixes topic category label Dec 7, 2023
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023

Labels

ciflow/trunk Trigger trunk jobs on your pull request
Merged
open source
release notes: cpp release notes category
topic: bug fixes topic category
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module


Development

Successfully merging this pull request may close these issues.

AdaptiveLogSoftmaxWithLoss division by zero

4 participants