documentation: some minor clean up #16850
Conversation
The documentation is not available anymore as the PR was closed or merged.
LGTM, thank you @mingboiz!
utils/documentation_tests.txt:
- src/transformers/models/wav2vec2/tokenization_wav2vec2.py
- src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py
- src/transformers/models/wavlm/modeling_wavlm.py
- src/transformers/models/ctrl/modeling_ctrl.py
Is this an intended change?
Yes - during the final merge this line was removed from utils/documentation_tests.txt.
I wasn't able to track down which exact commit in the linked PR caused this change, but I suspect the line was originally added in one of the multiple rebases on main needed to pass CI (the PR was originally started on 4.17.dev0 😅), so it showed up as an additional change unrelated to debertav2.
I removed that line in the final commit of that PR, and I'm restoring it here for parity!
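For context, files listed in utils/documentation_tests.txt have the code examples in their docstrings executed as doctests. A minimal sketch of what such a check amounts to (this is an illustration, not the actual CI command; the restored modeling_ctrl entry is used as the example module):

```python
# Sketch only: assumes the CI does something equivalent to collecting
# doctests from each file listed in utils/documentation_tests.txt.
import doctest

from transformers.models.ctrl import modeling_ctrl

# Execute the usage examples embedded in the module's docstrings.
results = doctest.testmod(modeling_ctrl, verbose=False)
print(f"{results.attempted} examples attempted, {results.failed} failed")
```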
Indeed!
Removed:

    index of the token comprising a given character or the span of characters corresponding to a given token). Currently no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLM-RoBERTa and XLNet models).

Added:

    index of the token comprising a given character or the span of characters corresponding to a given token).
This was also a change in the linked PR unrelated to debertav2 - there are now Fast implementations for the tokenizers of the mentioned models, so I'm making the change here!
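To illustrate (a minimal sketch; the checkpoint name is just an example and loading it requires network access), these SentencePiece-based models now return a fast tokenizer by default:

```python
from transformers import AutoTokenizer

# "xlm-roberta-base" is one of the SentencePiece-based models mentioned in the
# docstring; T5, ALBERT, CamemBERT or XLNet checkpoints would behave the same way.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
print(tokenizer.is_fast)  # True: a Rust-backed "Fast" tokenizer is returned

# Fast tokenizers expose the character/token alignment helpers the docstring describes.
encoding = tokenizer("Hello world")
print(encoding.char_to_token(0))  # index of the token covering the first character
```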
Thank you!
What does this PR do?
This cleans up some minor documentation changes unrelated to debertav2 in PR #15529, so I'm opening this PR just for repo history. Please let me know if this is alright 😄
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  - Linked PR: feat: add debertav2 fast tokenizer #15529
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.