KEMBAR78
Issue #17514: New ANTLR Grammar for Javadoc Comments by mahfouz72 · Pull Request #17837 · checkstyle/checkstyle · GitHub
Skip to content

Conversation

@mahfouz72
Copy link
Member

@mahfouz72 mahfouz72 commented Sep 27, 2025

Closes #17514
Related to: New ANTLR Grammar for Javadoc Comments Project - GSoC '25

Key Changes

  • Implemented a new Javadoc lexer and parser in ANTLR with full coverage of the Javadoc Specification
  • Added a visitor-based AST construction layer to map ANTLR parse trees into Checkstyle’s internal AST.
  • Integrated the new parser into Checkstyle’s APIs.
  • Updated all AbstractJavadocCheck subclasses and Javadoc-related checks to work with the new AST.
  • Added extensive AST tests and regression tests to ensure correctness and stability.

Impact

  • Improves maintainability and long-term stability of Javadoc parsing.
  • Provides a cleaner and more consistent AST structure for checks.
  • Backward compatibility preserved, existing checks continue to work after migration.
  • Expected minor differences in parsing behavior compared to the old parser (verified through regressions).

Testing

  • Unit tests cover the full grammar and AST conversion layer.
  • Regression testing performed on large projects confirmed correctness and identified only expected differences.

Diff Regression config: https://gist.githubusercontent.com/mahfouz72/5c0b5dccbeb20670915d5f8fe96e9527/raw/4c0abeb793315fd57facd9f2636ba1621e08b1e4/google_checks.xml
Diff Regression projects: https://gist.githubusercontent.com/mahfouz72/e175be6712b9073ad7340fdaa752fc4e/raw/b562141180075cab53efa10a46c8a3d5a1b3050c/projects-to-test-on.properties

@mahfouz72
Copy link
Member Author

Github, generate report for configs in PR description

@github-actions
Copy link
Contributor

Report generation failed. Please check the logs for more details.

Link: https://github.com/checkstyle/checkstyle/actions/runs/18065012961

@romani
Copy link
Member

romani commented Sep 27, 2025

Github, generate report

@github-actions
Copy link
Contributor

Report generation failed on phase "make_report",
step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/18065103825

@mahfouz72
Copy link
Member Author

Github, generate report

1 similar comment
@mahfouz72
Copy link
Member Author

Github, generate report

@github-actions
Copy link
Contributor

Report generation failed on phase "make_report",
step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/18065175662

@github-actions
Copy link
Contributor

@mahfouz72
Copy link
Member Author

Github, generate report

1 similar comment
@mahfouz72
Copy link
Member Author

Github, generate report

@github-actions
Copy link
Contributor

@romani
Copy link
Member

romani commented Oct 1, 2025

Task :postgresql:checkstyleMain
[ant:checkstyle] [ERROR] /home/circleci/project/.ci-temp/pgjdbc/pgjdbc/src/main/java/org/postgresql/jdbc/
ArrayEncoding.java:30:4: <p> tag should not precede HTML block-tag '<ul>', <p> tag should be removed.
 [JavadocParagraph]
[ant:checkstyle] [ERROR] /home/circleci/project/.ci-temp/pgjdbc/pgjdbc/src/main/java/org/postgresql/sspi/
NTDSAPI.java:23:6: <p> tag should not precede HTML block-tag '<pre>', <p> tag should be removed.
 [JavadocParagraph]

please fix false positives

@mahfouz72 mahfouz72 force-pushed the new-javadoc-grammar branch 6 times, most recently from 04a3c99 to c9f2947 Compare October 1, 2025 23:16
Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please apply the same logic to most others in whitelist

Fheader
fht
fieldannotations
Fieldname
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔ ~/github/romani/checkstyle [checkstyle-GSoC25/new-javadoc-grammar L|✔] 
$ ag "Fieldname"
config/jsoref-spellchecker/whitelist.words
432:Fieldname

src/test/java/com/puppycrawl/tools/checkstyle/checks/javadoc/AbstractJavadocCheckTest.java
132:                    29, "token recognition error at: '-'", "Fieldname"),
419:                    29, "token recognition error at: '-'", "Fieldname"),

src/test/resources/com/puppycrawl/tools/checkstyle/checks/javadoc/abstractjavadoc/InputAbstractJavadocInvalidLexing.java
10:     * @serialField Fieldname-fieldtype-fielddescription
13:    // Details: token recognition error at: '-' while parsing Fieldname

src/test/resources/com/puppycrawl/tools/checkstyle/checks/javadoc/abstractjavadoc/InputAbstractJavadocNoWsBeforeDescriptionInJavadocTags.java
27:     * @serialField Fieldname-fieldtype-fielddescription
30:    // Details: token recognition error at: '-' while parsing Fieldname
35:     * @serialField Fieldname fieldtype-fielddescription
40:     * @serialField Fieldname -fieldtype -fielddescription

I think we can easily avoid this new word in while list.
Fieldname --> fieldName

htdocs
html
HTMLJS
htmltags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔ ~/github/romani/checkstyle [checkstyle-GSoC25/new-javadoc-grammar L|✔] 
$ ag -s "htmltags"
config/jsoref-spellchecker/whitelist.words
560:htmltags

src/test/java/com/puppycrawl/tools/checkstyle/grammar/javadoc/JavadocCommentsAstRegressionTest.java
40:        return getPath("htmltags" + File.separator + filename);

can we use htmlTags ? we already have such folder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a package name, we use all lowercase letters for package names across the codebase

Popup
Postgresql
powershell
prc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔ ~/github/romani/checkstyle [checkstyle-GSoC25/new-javadoc-grammar L|✔] 
$ ag -s "prc"
config/jsoref-spellchecker/whitelist.words
1099:prc

src/main/java/com/puppycrawl/tools/checkstyle/JavadocCommentsAstVisitor.java
153:        if (tag instanceof ParserRuleContext prc) {
154:            final Token tagName = (Token) prc.getChild(1).getPayload();
283:        if (tagContent instanceof ParserRuleContext prc) {
284:            final Token tagName = (Token) prc.getChild(0).getPayload();

ruleContext or context

@romani
Copy link
Member

romani commented Oct 3, 2025

GitHub, generate website

@github-actions
Copy link
Contributor

github-actions bot commented Oct 3, 2025

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/index.html

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/annotation/missingdeprecated.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/atclauseorder.html#Examples

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/javadocblocktaglocation.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/javadocleadingasteriskalign.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/javadocmissingleadingasterisk.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/javadocmissingwhitespaceafterasterisk.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/javadocparagraph.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/javadoctagcontinuationindentation.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/nonemptyatclausedescription.html#Properties

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/requireemptylinebeforeblocktaggroup.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/singlelinejavadoc.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/javadoc/summaryjavadoc.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/9f50866_20251003010133/checks/misc/orderedproperties.html#Description

@mahfouz72 mahfouz72 force-pushed the new-javadoc-grammar branch 4 times, most recently from 4c0e4ff to e65052c Compare October 3, 2025 08:24
@mahfouz72 mahfouz72 force-pushed the new-javadoc-grammar branch 3 times, most recently from 6f3e470 to 77c774a Compare October 3, 2025 21:16
@mahfouz72 mahfouz72 force-pushed the new-javadoc-grammar branch from 77c774a to 8dad8c7 Compare October 4, 2025 15:25
@romani
Copy link
Member

romani commented Oct 4, 2025

GitHub, generate website

@romani
Copy link
Member

romani commented Oct 4, 2025

Github, generate report

@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2025

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/index.html

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/annotation/missingdeprecated.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/atclauseorder.html#Examples

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/javadocblocktaglocation.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/javadocleadingasteriskalign.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/javadocmissingleadingasterisk.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/javadocmissingwhitespaceafterasterisk.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/javadocparagraph.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/javadoctagcontinuationindentation.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/nonemptyatclausedescription.html#Properties

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/requireemptylinebeforeblocktaggroup.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/singlelinejavadoc.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/javadoc/summaryjavadoc.html#Violation_Messages

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/8dad8c7_20251004154112/checks/misc/orderedproperties.html#Description

@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2025

Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your hard work!!!!

I can not review all code changes, but it was reviewed by others, so let's move forward

@romani romani merged commit c8c13f4 into checkstyle:master Oct 4, 2025
120 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New ANTLR Grammar for Javadoc Comments

2 participants