KEMBAR78
Refactor interpolation (and string and regex) handling in lexer by lydell · Pull Request #3770 · jashkenas/coffeescript · GitHub
Skip to content

Conversation

lydell
Copy link
Collaborator

@lydell lydell commented Jan 3, 2015

See the commit messages for details.

I’m sorry about the size of the first commit, but I don’t think I could have split it up.

Fixes #3621, #3301, #3394, #3348, #2388, #2321.

@vendethiel
Copy link
Collaborator

Did not look at the code yet, but thanks for the tests, and great work!

src/lexer.coffee Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like leaving TODOs around. This isn't a particularly important one either, so let's remove it.

@michaelficarra
Copy link
Collaborator

LGTM. Very nice job, especially with all the tests. This was one of the most direly needed features. 👍.

lydell added 2 commits January 4, 2015 07:47
- Fix jashkenas#3394: Unclosed single-quoted strings (both regular ones and heredocs)
  used to pass through the lexer, causing a parsing error later, while
  double-quoted strings caused an error already in the lexing phase. Now both
  single and double-quoted unclosed strings error out in the lexer (which is the
  more logical option) with consistent error messages. This also fixes the last
  comment by @satyr in jashkenas#3301.

- Similar to the above, unclosed heregexes also used to pass through the lexer
  and not error until in the parsing phase, which resulted in confusing error
  messages. This has been fixed, too.

- Fix jashkenas#3348, by adding passing tests.

- Fix jashkenas#3529: If a string starts with an interpolation, an empty string is no
  longer emitted before the interpolation (unless it is needed to coerce the
  interpolation into a string).

- Block comments cannot contain `*/`. Now the error message also shows exactly
  where the offending `*/`. This improvement might seem unrelated, but I had to
  touch that code anyway to refactor string and regex related code, and the
  change was very trivial. Moreover, it's consistent with the next two points.

- Regexes cannot start with `*`. Now the error message also shows exactly where
  the offending `*` is. (It might actually not be exatly at the start in
  heregexes.) It is a very minor improvement, but it was trivial to add.

- Octal escapes in strings are forbidden in CoffeeScript (just like in
  JavaScript strict mode). However, this used to be the case only for regular
  strings. Now they are also forbidden in heredocs. Moreover, the errors now
  point at the offending octal escape.

- Invalid regex flags are no longer allowed. This includes repeated modifiers
  and unknown ones. Moreover, invalid modifiers do not stop a heregex from
  being matched, which results in better error messages.

- Fix jashkenas#3621: `///a#{1}///` compiles to `RegExp("a" + 1)`. So does
  `RegExp("a#{1}")`. Still, those two code snippets used to generate different
  tokens, which is a bit weird, but more importantly causes problems for
  coffeelint (see clutchski/coffeelint#340). This required lots of tests in
  test/location.coffee to be updated. Note that some updates to those tests are
  unrelated to this point; some have been updated to be more consistent (I
  discovered this because the refactored code happened to be seemingly more
  correct).

- Regular regex literals used to erraneously allow newlines to be escaped,
  causing invalid JavaScript output. This has been fixed.

- Heregexes may now be completely empty (`//////`), instead of erroring out with
  a confusing message.

- Fix jashkenas#2388: Heredocs and heregexes used to be lexed simply, which meant that
  you couldn't nest a heredoc within a heredoc (double-quoted, that is) or a
  heregex inside a heregex.

- Fix jashkenas#2321: If you used division inside interpolation and then a slash later in
  the string containing that interpolation, the division slash and the latter
  slash was erraneously matched as a regex. This has been fixed.

- Indentation inside interpolations in heredocs no longer affect how much
  indentation is removed from each line of the heredoc (which is more
  intuitive).

- Whitespace is now correctly trimmed from the start and end of strings in a few
  edge cases.

- Last but not least, the lexing of interpolated strings now seems to be more
  efficient. For a regular double-quoted string, we used to use a custom
  function to find the end of it (taking interpolations and interpolations
  within interpolations etc. into account). Then we used to re-find the
  interpolations and recursively lex their contents. In effect, the same string
  was processed twice, or even more in the case of deeper nesting of
  interpolations. Now the same string is processed just once.

- Code duplication between regular strings, heredocs, regular regexes and
  heregexes has been reduced.

- The above two points should result in more easily read code, too.
Previously such errors pointed at the end of the input, which wasn't very
helpful. This is also consistent with unclosed strings, where the errors point
at the opening quote.

Note that this includes unclosed #{ (interpolations).
@lydell
Copy link
Collaborator Author

lydell commented Jan 4, 2015

I have fixed all mentioned issues now. Moreover, now it also fixes #3529. Ready for another round of review.

@michaelficarra
Copy link
Collaborator

LGTM.

@epidemian
Copy link
Contributor

This looks very good. Thanks for putting such work on some of the hairy corners of the compiler, @lydell!

michaelficarra added a commit that referenced this pull request Jan 4, 2015
Refactor interpolation (and string and regex) handling in lexer
@michaelficarra michaelficarra merged commit b70f657 into jashkenas:master Jan 4, 2015
@lydell lydell deleted the interpolations branch January 4, 2015 20:35
@xixixao
Copy link
Contributor

xixixao commented Jan 5, 2015

@lydell 🏆

@jashkenas
Copy link
Owner

And let's tag 'em as we close 'em ;)

@michaelficarra
Copy link
Collaborator

I wasn't sure how to tag a PR that fixes a bug.

@jashkenas
Copy link
Owner

It was a big ol' change/rewrite, so I'm going with "change fixed". If it had just been a single bug or two, then "bug fixed".

@vendethiel
Copy link
Collaborator

I'm not sure we need the "fixed" tag. I mean, a "bug"-tagged PR without the "fixed" tag would mean the PR added a bug on purpose?

@jashkenas
Copy link
Owner

We need it. It's nice to be able to search through Issues and PR's consistently, and that's harder if we only keep things organized half the time. On occasion, a minor, but unfixable-at-the-time bug might be closed without fixing, and we want to be able to find those as well.

lydell added a commit to lydell/coffee-script that referenced this pull request Jan 12, 2015
Instead of compiling to `"" + + (+"-");`, `"#{+}-"'` now gives an appropriate
error message:

    [stdin]:1:5: error: unexpected end of interpolation
    "#{+}-"
        ^

I got tired of updating the tests in test/location.coffee (which I had enough of
in jashkenas#3770), which relies on implementation details (the exact amount of tokens
generated for a given string of code) to do their testing, so I refactored them
to be less fragile.
lydell added a commit to lydell/coffee-script that referenced this pull request Jan 16, 2015
Instead of compiling to `"" + + (+"-");`, `"#{+}-"'` now gives an appropriate
error message:

    [stdin]:1:5: error: unexpected end of interpolation
    "#{+}-"
        ^

This is done by _always_ (instead of just sometimes) wrapping the interpolations
in parentheses in the lexer. Unnecessary parentheses won't be output anyway.

I got tired of updating the tests in test/location.coffee (which I had enough of
in jashkenas#3770), which relies on implementation details (the exact amount of tokens
generated for a given string of code) to do their testing, so I refactored them
to be less fragile.
swang added a commit to swang/coffeelint that referenced this pull request Feb 11, 2015
The new lexer was merged from PR made in jashkenas/coffeescript#3770

Also closes clutchski#372

no_unnecessary_double_quotes

- Removed Block RegExp Fix that was originally patched in PR clutchski#340.
  Thanks to the new PR, this is no longer needed.
- Shortened function name `containsSingleQuote` to `hasSingleQuote`.
- Changed how we detect if we're inside an interpolated string to be
  more like the way we handle interpolation in space_operators.

no_trailing_semicolons rule

- Due to the new way tokens are generated, we now check for 'STRING_END'
  token rather than a 'CALL_END' token

space_operators

- Removed old code that assumed an interpolated string always begins
  with empty quotes. Due to new changes in CoffeeScript lexer, we can
  now use 'STRING_START' to determine the beginning of an interpolated
  string.
- Renamed lintParens, lintCall to trackParens, trackCall respectively
swang added a commit to swang/coffeelint that referenced this pull request Feb 11, 2015
The new lexer was merged from PR made in jashkenas/coffeescript#3770

Also closes clutchski#372

no_unnecessary_double_quotes

- Removed Block RegExp Fix that was originally patched in PR clutchski#340.
  Thanks to the new PR, this is no longer needed
- Shortened function name `containsSingleQuote` to `hasSingleQuote`
- Changed how we detect if we're inside an interpolated string to be
  more like the way we handle interpolation in space_operators

no_trailing_semicolons rule

- Due to the new way tokens are generated, we now check for 'STRING_END'
  token rather than a 'CALL_END' token

space_operators

- Removed old code that assumed an interpolated string always begins
  with empty quotes. Due to new changes in CoffeeScript lexer, we can
  now use 'STRING_START' to determine the beginning of an interpolated
  string
- Renamed lintParens, lintCall to trackParens, trackCall respectively
swang added a commit to swang/coffeelint that referenced this pull request Feb 11, 2015
The new lexer was merged from PR made in jashkenas/coffeescript#3770

Also closes clutchski#372

no_unnecessary_double_quotes

- Removed Block RegExp Fix that was originally patched in PR clutchski#340.
  Thanks to the new PR, this is no longer needed
- Shortened function name `containsSingleQuote` to `hasSingleQuote`
- Changed how we detect if we're inside an interpolated string to be
  more like the way we handle interpolation in space_operators

no_trailing_semicolons rule

- Due to the new way tokens are generated, we now check for 'STRING_END'
  token rather than a 'CALL_END' token

space_operators

- Removed old code that assumed an interpolated string always begins
  with empty quotes. Due to new changes in CoffeeScript lexer, we can
  now use 'STRING_START' to determine the beginning of an interpolated
  string
- Renamed lintParens, lintCall to trackParens, trackCall respectively
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multiline RegExp with string interpolation do not get marked as such.

6 participants