Add Doc Tests for Reformer PyTorch #16565
Conversation
The documentation is not available anymore as the PR was closed or merged.
I found two errors in the ReformerForMaskedLM.forward example.
# before
model = ReformerForMaskedLM.from_pretrained("google/reformer-crime-and-punishment")
# AssertionError: If you want to use `ReformerForMaskedLM` make sure `config.is_decoder=False` for bi-directional self-attention.
# after
model = ReformerForMaskedLM.from_pretrained("google/reformer-crime-and-punishment", is_decoder=False)
# It's OK.
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
# TypeError: 'bool' object is not subscriptable
# Because tokenizer.mask_token_id is None.
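For reference, a minimal sketch of how both issues could be worked around (the `"[MASK]"` token string and the `resize_token_embeddings` call are my additions for illustration, not part of the original example):

```python
import torch
from transformers import ReformerTokenizer, ReformerForMaskedLM

tokenizer = ReformerTokenizer.from_pretrained("google/reformer-crime-and-punishment")
# ReformerTokenizer defines no mask token by default, so register one explicitly
tokenizer.add_special_tokens({"mask_token": "[MASK]"})

# The checkpoint is configured as a decoder, so override it for bi-directional MLM
model = ReformerForMaskedLM.from_pretrained("google/reformer-crime-and-punishment", is_decoder=False)
model.resize_token_embeddings(len(tokenizer))  # make room for the added [MASK] token

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
# mask_token_id is now set, so locating the masked position works
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
```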
This issue reports the same problems.
I got AssertionError: If you want to use `ReformerModelWithLMHead` make sure that `is_decoder=True`. with model = ReformerModelWithLMHead.from_pretrained("hf-internal-testing/tiny-random-reformer").
For ReformerModelWithLMHead, we could use the checkpoint google/reformer-crime-and-punishment, and we won't have the issue regarding is_decoder.
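For instance, something along these lines should load without the `is_decoder` assertion (a rough sketch; the prompt text is arbitrary):

```python
import torch
from transformers import ReformerTokenizer, ReformerModelWithLMHead

tokenizer = ReformerTokenizer.from_pretrained("google/reformer-crime-and-punishment")
# This checkpoint is already configured with is_decoder=True, so no override is needed
model = ReformerModelWithLMHead.from_pretrained("google/reformer-crime-and-punishment")

inputs = tokenizer("A few months later", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
```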
I print the loss instead of calling loss.backward() because I got AssertionError: If you want to train `ReformerModel` and its variations, make sure to use `model.train()` to put the model into training mode.
Maybe it is a good idea to set model.train() before loss = model(**inputs, labels=labels).loss?
So the users will know it is required :-)
cc @patrickvonplaten for comment.
Thank you for the suggestion!
I tried setting model.train(), but got ValueError: If training, sequence length 11 has to be a multiple of least common multiple chunk_length 4. Please consider padding the input to a length of 12. at loss = model(**inputs, labels=labels).loss.
Yes indeed that's very Reformer specific, but nice that you caught it :-) Could we use a sequence length of 12 here?
Thank you for the comment.
I tried a sequence length of 12 like below:
import torch
from transformers import ReformerTokenizer, ReformerForSequenceClassification

tokenizer = ReformerTokenizer.from_pretrained("hf-internal-testing/tiny-random-reformer")
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model = ReformerForSequenceClassification.from_pretrained(
    "hf-internal-testing/tiny-random-reformer", problem_type="multi_label_classification"
)
inputs = tokenizer("Hello, my dog is cute", max_length=12, padding="max_length", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]

# To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
num_labels = len(model.config.id2label)
model = ReformerForSequenceClassification.from_pretrained(
    "hf-internal-testing/tiny-random-reformer", num_labels=num_labels
)
model.train()

labels = torch.nn.functional.one_hot(torch.tensor([predicted_class_id]), num_classes=num_labels).to(
    torch.float
)
loss = model(**inputs, labels=labels).loss
loss.backward()

However, this time I got ValueError: If training, make sure that config.axial_pos_shape factors: (4, 25) multiply to sequence length. Got prod((4, 25)) != sequence_length: 12. You might want to consider padding your sequence length to 100 or changing config.axial_pos_shape. at loss = model(**inputs, labels=labels).loss.
Then, I tried a sequence length of 100.
It seems to work fine!
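Roughly, the working version looks like this (a sketch assembled from the snippets above; the exact doctest wording in the PR may differ, and the target class used for the label is arbitrary):

```python
import torch
from transformers import ReformerTokenizer, ReformerForSequenceClassification

tokenizer = ReformerTokenizer.from_pretrained("hf-internal-testing/tiny-random-reformer")
tokenizer.add_special_tokens({"pad_token": "[PAD]"})

model = ReformerForSequenceClassification.from_pretrained(
    "hf-internal-testing/tiny-random-reformer", problem_type="multi_label_classification"
)

# axial_pos_shape is (4, 25) for this checkpoint, so pad the input to 4 * 25 = 100 tokens
inputs = tokenizer("Hello, my dog is cute", max_length=100, padding="max_length", return_tensors="pt")

num_labels = len(model.config.id2label)
labels = torch.nn.functional.one_hot(torch.tensor([0]), num_classes=num_labels).to(torch.float)  # arbitrary target

model.train()  # required before computing a loss that will be backpropagated
loss = model(**inputs, labels=labels).loss
loss.backward()
```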
@ydshieh @patrickvonplaten
I pushed this change. Could you please check that?
I will take a look. However, we will wait for Patrick's review anyway, as he has more knowledge of this model than me (he is currently not available).
Hi, @hiromu166, could you remind me of the reasons why you need to overwrite the code sample in the model file, instead of just using the default one? (Sorry if you already mentioned it before!)
The reason for using replace_return_docstrings here is as follows:
- We need to load the model with is_decoder=True. (line: 2230)
The reason for using replace_return_docstrings here is as follows:
- ReformerTokenizer's mask_token is None, so we need to set it. (line: 2360)
- len(tokenizer.tokenize("The capital of France is [MASK].")) is 16, but len(tokenizer.tokenize("The capital of France is Paris.")) is 17, so we need to match the lengths. (line: 2378)
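A small sketch of the length-matching workaround (the "[MASK]" special-token registration is assumed, as discussed above; -100 positions are ignored by the loss):

```python
import torch
from transformers import ReformerTokenizer

tokenizer = ReformerTokenizer.from_pretrained("hf-internal-testing/tiny-random-reformer")
tokenizer.add_special_tokens({"mask_token": "[MASK]"})  # no mask token by default

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]

# The two sentences tokenize to different lengths (16 vs. 17), so trim the labels to the
# input length and keep label values only at the [MASK] position
labels = torch.where(
    inputs.input_ids == tokenizer.mask_token_id, labels[:, : inputs["input_ids"].shape[-1]], -100
)
```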
OK, Reformer is in fact an auto-regressive generative model, not really intended to be used as a masked LM model.
I saw that you already use hf-internal-testing/tiny-random-reformer for ReformerForMaskedLM, so we don't have the issue regarding is_decoder.
We still have the issue regarding the mask token, so it's good to use replace_return_docstrings! Thank you.
The reason for using replace_return_docstrings here is as follows:
- I got an error when calling loss.backward(), so I replaced it with simply printing the loss. (line: 2530)
Update:
- adding a pad_token so the inputs can be padded. (line: 2509)
- padding to max_length=100 to avoid the error. (line: 2510)
- calling model.train() so that loss.backward() works. (line: 2526)
Hi @ydshieh, I commented above on the reasons for using replace_return_docstrings.
Hi, @hiromu166 I think we can change ReformerModelWithLMHead back and use the google/reformer-crime-and-punishment checkpoint. Let's try to make these 2 working first :-)
Hi @ydshieh, thank you for the suggestion. OK, I'll change it accordingly.
Hi, @hiromu166 Could you try to resolve the conflicts? (This is not necessary for my review, but the conflicts must be fixed before merging.) I will review this PR tomorrow :-)
(force-pushed 265fe36 to b2fef30)
Hi @ydshieh, I tried to resolve the conflict like below. Is this OK?
git fetch upstream
git rebase upstream/main
-- fix conflict --
git rebase --continue
git push -f origin add_doctest_reformer_pt
Looks good! I will review this PR now
Thanks, @hiromu166, especially for the details that require some attention 💯
@patrickvonplaten Could you have a final look 🙏
(run locally -> tests pass)
>>> # add pad_token
>>> tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # doctest: +IGNORE_RESULT
>>> inputs = tokenizer("Hello, my dog is cute", max_length=100, padding="max_length", return_tensors="pt")
This part is required for the next block example, not this current block.
So a good reason to overwrite the code sample here.
>>> model = ReformerForSequenceClassification.from_pretrained(
...     "hf-internal-testing/tiny-random-reformer", num_labels=num_labels
... )
>>> model.train()  # doctest: +IGNORE_RESULT
@patrickvonplaten I am not sure if we should add this in doc.py. (Currently this is not added in doc.py.) Without it, we are not in training mode, which might be a bit misleading.
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
>>> # mask labels of non-[MASK] tokens
>>> labels = torch.where(
...     inputs.input_ids == tokenizer.mask_token_id, labels[:, : inputs["input_ids"].shape[-1]], -100
... )
Confirmed the special treatment here is required to make the example run.
(Better to have a good way to handle this in doc.py in the future)
Looks good to me - great job!
PR merged. Thank you again, @hiromu166 ❤️!
I'm glad it was merged smoothly. Thank you for your cooperation!!
Thanks a mille for your contribution @hiromu166!
* start working
* fix: ReformerForQA doctest
* fix: ReformerModelWithLMHead doctest
* fix: ReformerModelForSC doctest
* fix: ReformerModelForMLM doctest
* add: documentation_tests.txt
* make fixup
* change: ReformerModelForSC doctest
* change: checkpoint
What does this PR do?
#16292
Fixing doc tests in modeling_reformer.py.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@patrickvonplaten
@ydshieh
@patil-suraj