KEMBAR78
Fix `fix_and_overwrite` mode of `utils/check_docstring.py` by manueldeprada · Pull Request #39369 · huggingface/transformers · GitHub
Skip to content

Conversation

@manueldeprada
Copy link
Contributor

@manueldeprada manueldeprada commented Jul 11, 2025

Fixed it @ydshieh :)

The script was failing to apply fixes because of a mismatch in how it reads the docstring for comparison versus how it reads it for replacement.

  • To detect a mismatch, the script reads the docstring from the obj.__doc__ attribute. Python automatically un-indents the string provided by __doc__.
  • To apply a fix, the script reads the code from the source file using inspect.getsourcelines(), which preserves the original indentation.

The fix_docstring function would first try to locate the argument block within the raw source code. To ensure it had found the right place, it would compare the text from the source (with indentation) against the text from __doc__ (without indentation). This comparison would then fail.

Probably the check we talked about is not needed anymore, maybe should we remove it? Or leave it as-is to be extra cautious?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh
Copy link
Collaborator

ydshieh commented Jul 16, 2025

Hi @manueldeprada

I won't be available to review until next Friday.

BTW, do you mean the issue you found for

#38915 (comment)

@manueldeprada
Copy link
Contributor Author

Yes, that one. Enjoy!! Its minor, it can wait for a week :)

@manueldeprada manueldeprada changed the title Bug in fix mode of utils/check_docstring.py Fix fix_and_overwrite mode of utils/check_docstring.py Jul 29, 2025
@manueldeprada
Copy link
Contributor Author

@ydshieh gentle ping on this 🤗 Btw, I have successfully fixed a few docstrings using the new version :)

Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for diving into this!

One question:

obj.doc attribute. Python automatically un-indents the string provided by doc

Is this always? Because if so, we should have troubles at many places (before this PR) no? But we only have seen a few places.

@manueldeprada
Copy link
Contributor Author

manueldeprada commented Aug 5, 2025

Thanks for diving into this!

One question:

obj.doc attribute. Python automatically un-indents the string provided by doc

Is this always? Because if so, we should have troubles at many places (before this PR) no? But we only have seen a few places.

It only affects the fixing mode, not the checks on the docstring. I guess devs have got used to fixing the docs manually (we can ask).

Every time I had to fix a doc issue in my cache refactor this issue happened, and I had to either cherry pick this or fix the docstring manually.

Also, If a docstring is placed at root level indentation, there is no difference. But it is indeed a consistent behaviour:

>>> def f():
...     """
...     Hello
...          World
...     """
...     
>>> repr(f.__doc__)
"'\\nHello\\n     World\\n'"
>>> import inspect
>>> inspect.getsourcelines(f)[0]
['def f():\n', '    """\n', '    Hello\n', '         World\n', '    """\n']

@manueldeprada manueldeprada requested a review from ydshieh August 5, 2025 17:27
Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Makes sense to me. Guess you are right, this is likely not running in fix mode for long time

@manueldeprada manueldeprada merged commit cf243a1 into huggingface:main Aug 6, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants