[Doctests] Fix all T5 doc tests #16646
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for fixing those. The examples are still a bit arcane, with special values hard-coded in the middle. Could you explain those a little better?
docs/source/en/model_doc/byt5.mdx
Outdated
>>> input_ids = (
...     torch.tensor([list("Life is like a box of chocolates.".encode("utf-8"))]) + 3
... )  # add 3 for special tokens
>>> labels = (
...     torch.tensor([list("La vie est comme une boîte de chocolat.".encode("utf-8"))]) + 3
... )  # add 3 for special tokens
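For context on the hard-coded offset: ByT5 reserves the first three ids for special tokens (pad = 0, eos = 1, unk = 2), so each raw UTF-8 byte value is shifted up by 3 to form its token id. A minimal sketch that checks this against the tokenizer (assuming the google/byt5-small checkpoint; this snippet is not part of the original example):

```python
import torch
from transformers import AutoTokenizer

# ByT5 reserves ids 0..2 for special tokens (pad=0, eos=1, unk=2),
# so each raw UTF-8 byte value is shifted up by 3 to form its token id.
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")

text = "Life is like a box of chocolates."
manual_ids = torch.tensor([list(text.encode("utf-8"))]) + 3

# The tokenizer yields the same ids, plus a trailing </s> (id 1).
tokenized = torch.tensor([tokenizer(text).input_ids])
assert torch.equal(manual_ids, tokenized[:, :-1])
```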
Can the comment go just once, above, to avoid splitting the formatting across several lines? It could also be more helpful: I have no idea what "add 3 for special tokens" means.
docs/source/en/model_doc/byt5.mdx
Outdated
>>> # Now Mask
>>> # Note that we can add "{extra_id_...}" to the string directly
>>> # as the Byte tokenizer would incorrectly merge the tokens
>>> # We need to work on the character level directly here
>>> # => mask to "The dog [258]a ball [257]park."
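For what it's worth, the 258 and 257 come from ByT5's vocabulary layout: 3 special tokens plus the 256 byte values occupy ids 0 through 258, and the mask sentinels reuse the top of that range, counting down. A sketch of the arithmetic (`sentinel_id` is a hypothetical helper for illustration, not part of the library):

```python
# ByT5 id layout (per the ByT5 paper / model docs):
#   ids 0..2   -> special tokens (pad=0, eos=1, unk=2)
#   ids 3..258 -> the 256 possible byte values, shifted by +3
# Mask sentinels reuse the final byte ids, counting down from 258.
NUM_SPECIAL_TOKENS = 3
NUM_BYTE_VALUES = 256
TOP_ID = NUM_SPECIAL_TOKENS + NUM_BYTE_VALUES - 1  # 258

def sentinel_id(n: int) -> int:
    """Id of the n-th mask sentinel (0-based): 258, 257, 256, ..."""
    return TOP_ID - n

print(sentinel_id(0), sentinel_id(1))  # -> 258 257
```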
We can or we can't? I don't understand this comment and why it results in using 258 and 257.
Good point - Added more explanation!
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…platen/transformers into correct_t5_model_docs
* [Doctests] Fix all T5 doc tests
* make style
* Update docs/source/en/model_doc/t5.mdx
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply Sylvains comments
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
What does this PR do?
Corrects T5 model docs and adds them to doc tests
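As a quick illustration of what "adds them to doc tests" means in practice: the `>>>` examples in the mdx docs can be executed as doctests. A hedged sketch using the standard-library doctest module (the repo's CI runs these through pytest's doctest integration; the exact invocation and flags there are not shown in this thread):

```python
# Hedged sketch: run the `>>>` examples embedded in a docs file with the
# standard-library doctest module. The transformers CI uses pytest's
# doctest support instead; its flags may differ.
import doctest

results = doctest.testfile(
    "docs/source/en/model_doc/t5.mdx",  # path assumed relative to the repo root
    module_relative=False,
    optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE,
)
print(results)  # TestResults(failed=<n>, attempted=<m>)
```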
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.