Building parsers in JavaScript
Kenneth Geisshirt
kneth
kgeisshirt
Agenda
● What is parsing?
● Fractions
● Nearley
● Example: Fraction calculator
Rajiv Patel, https://tinyurl.com/yx6rmcdt
What is parsing?
You know the problem
● What is syntax?
● How is the syntax defined?
● How do you check if input matches syntax?
● How can you use the syntax in your applications?
https://xkcd.com/859/
Grammar
● The syntax is defined by a grammar
● Lexical analysis breaks the input down into tokens (terminals)
○ Keywords, literals, identifiers, operators
● A set of rules expands non-terminals into sequences of tokens and other non-terminals
● One non-terminal is the start symbol
● Parsers are programs that use a grammar to verify input
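To make lexical analysis concrete, the sketch below splits the add2 function from this slide into the token kinds listed above. It is a simplified illustration with a hypothetical `tokenize` function and a hand-picked keyword set; a real JavaScript lexer must also handle strings, comments, regular expression literals, and much more.

```javascript
// Minimal regex-based tokenizer (illustrative only; not a complete JS lexer).
// Hypothetical keyword set, just large enough for the add2 example.
const KEYWORDS = new Set(["function", "let", "return"]);

// Each rule pairs a token kind with a sticky (/y) regex that only matches
// at the current position.
const TOKEN_RULES = [
  ["whitespace", /\s+/y],
  ["word", /[A-Za-z_$][A-Za-z0-9_$]*/y],
  ["literal", /\d+/y],
  ["operator", /[+\-*\/=]/y],
  ["punctuation", /[(){};,]/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [kind, re] of TOKEN_RULES) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m) {
        matched = true;
        pos = re.lastIndex;
        if (kind === "whitespace") break; // whitespace is skipped, not emitted
        // A word is either a keyword or an identifier.
        const type =
          kind === "word" ? (KEYWORDS.has(m[0]) ? "keyword" : "identifier") : kind;
        tokens.push({ type, text: m[0] });
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character '${input[pos]}' at ${pos}`);
    }
  }
  return tokens;
}

const src = "function add2(n) { let r = n + 2; return r; }";
for (const t of tokenize(src)) console.log(t.type, t.text);
```

The order of the rules matters: `word` is tried before `literal`, so `add2` stays one identifier, while a bare `2` falls through to `literal`.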
Example
● S → AA
● A → 𝞪
● A → 𝞫
Matches 𝞪𝞪, 𝞪𝞫, 𝞫𝞪, and 𝞫𝞫
function add2(n) {
  let r = n + 2;
  return r;
}
Tokens in the code: keywords (function, let, return), identifiers (add2, n, r), and a literal (2); the body consists of statements.
Ken Whytock, https://tinyurl.com/s9s3eee
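The toy grammar S → AA is small enough to recognize by hand. In this sketch the letters `a` and `b` stand in for 𝞪 and 𝞫, and each non-terminal becomes a function that tries to match at a position and returns the position after the match (or -1 on failure). The function names are hypothetical; this is plain recursive descent, not what a parser generator would emit.

```javascript
// Recursive-descent recognizer for the toy grammar
//   S → A A
//   A → a | b
// where 'a' and 'b' stand in for the Greek letters on the slide.

// A → a | b : consume one terminal; return the next position or -1.
function parseA(input, pos) {
  if (input[pos] === "a" || input[pos] === "b") return pos + 1;
  return -1;
}

// S → A A : two A's in sequence.
function parseS(input, pos) {
  const afterFirst = parseA(input, pos);
  if (afterFirst < 0) return -1;
  return parseA(input, afterFirst);
}

// The input matches only if S (the start symbol) consumes every character.
function matches(input) {
  return parseS(input, 0) === input.length;
}

for (const s of ["aa", "ab", "ba", "bb", "a", "abb", "ac"]) {
  console.log(s, matches(s));
}
```

Exactly the four two-letter strings are accepted; anything shorter, longer, or containing another letter is rejected.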
Parser generators
● Many well-documented algorithms exist
○ Hot research topics in the 1960s and 1970s
● It’s not a trivial task to write a parser
● Parser generators can speed up the development process
○ Yacc (C) - 1975!!
○ ANTLR (mostly Java) - 1989
○ Nearley (JavaScript) - 2014
Erica Zabowski, https://tinyurl.com/uqbaldv
Fractions
Quick recap
● A fraction represents a rational number
○ Numerator and denominator, both natural numbers (denominator non-zero)
○ From Latin fractus, “broken”
Bill Ward, https://tinyurl.com/r3dtp2b
Arithmetic
Greatest Common Divisor
● Original algorithm by Euclid (c. 300 BC)
● Often used to reduce or simplify a fraction
● https://en.wikipedia.org/wiki/Greatest_common_divisor
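Euclid’s algorithm repeatedly replaces the pair (a, b) with (b, a mod b) until the remainder is zero; the last non-zero value is the GCD. Dividing numerator and denominator by it reduces a fraction to lowest terms. A minimal sketch, assuming integer inputs and a non-zero denominator; `gcd` and `reduce` are illustrative names.

```javascript
// Euclid's algorithm: gcd(a, b) = gcd(b, a mod b), and gcd(a, 0) = a.
function gcd(a, b) {
  a = Math.abs(a);
  b = Math.abs(b);
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Reduce n/d to lowest terms, moving any sign onto the numerator.
function reduce(n, d) {
  if (d === 0) throw new RangeError("denominator must be non-zero");
  const g = gcd(n, d);
  const sign = d < 0 ? -1 : 1;
  return [(sign * n) / g, (sign * d) / g];
}

console.log(gcd(84, 36));    // 12
console.log(reduce(84, 36)); // [ 7, 3 ]
console.log(reduce(3, -6));  // [ -1, 2 ]
```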
Nearley
Earley Parsers in JavaScript
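● Nearley implements Earley’s parsing algorithm
○ Handles left-recursive grammars (in fact any context-free grammar)
○ Not restricted to deterministic (LL/LR) grammars; ambiguous grammars yield every possible parse
○ Worst-case performance O(n³), O(n²) for unambiguous grammars, and O(n) for well-behaved (deterministic) grammars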
○ https://en.wikipedia.org/wiki/Earley_parser
● Can generate JavaScript, CoffeeScript, and TypeScript
○ Runs in browsers, Node.js, and probably React Native
● Includes predefined grammars
○ Numbers, whitespace, strings
● No separate lexer required
○ Terminals are written in double quotes; an external lexer (such as moo) can be plugged in
● Rules can have (semantic) actions
○ Plain JavaScript functions
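The core of Earley’s algorithm fits in a few dozen lines. The sketch below is a recognizer only (it answers whether the input matches, without building parse trees) and is not nearley’s implementation. It assumes single-character terminals and no empty (ε) rules, and the grammar format (a list of `{ lhs, rhs }` objects) is made up for the example. The left-recursive rule sum → sum + num needs no special treatment: the predictor simply skips items it has already added.

```javascript
// Minimal Earley recognizer (sketch). Assumptions: single-character terminals,
// no empty (ε) rules, recognition only (no parse trees).

// Any symbol that appears as a lhs is a non-terminal; everything else is a terminal.
const GRAMMAR = [
  { lhs: "sum", rhs: ["sum", "+", "num"] }, // left-recursive
  { lhs: "sum", rhs: ["num"] },
  { lhs: "num", rhs: ["1"] },
  { lhs: "num", rhs: ["2"] },
];

function recognize(rules, start, input) {
  const nonTerminals = new Set(rules.map((r) => r.lhs));
  // chart[i] holds items (rule index, dot position, origin) after reading i characters.
  const chart = Array.from({ length: input.length + 1 }, () => ({
    items: [],
    seen: new Set(),
  }));

  const add = (i, rule, dot, origin) => {
    const key = `${rule}:${dot}:${origin}`;
    if (!chart[i].seen.has(key)) {
      chart[i].seen.add(key);
      chart[i].items.push({ rule, dot, origin });
    }
  };

  rules.forEach((r, idx) => {
    if (r.lhs === start) add(0, idx, 0, 0);
  });

  for (let i = 0; i <= input.length; i++) {
    // chart[i].items may grow while we iterate over it; that is intended.
    for (let j = 0; j < chart[i].items.length; j++) {
      const { rule, dot, origin } = chart[i].items[j];
      const { lhs, rhs } = rules[rule];
      if (dot === rhs.length) {
        // Completer: advance every item in chart[origin] that was waiting for lhs.
        for (const waiting of chart[origin].items) {
          if (rules[waiting.rule].rhs[waiting.dot] === lhs) {
            add(i, waiting.rule, waiting.dot + 1, waiting.origin);
          }
        }
      } else if (nonTerminals.has(rhs[dot])) {
        // Predictor: start every rule for the expected non-terminal at position i.
        rules.forEach((r, idx) => {
          if (r.lhs === rhs[dot]) add(i, idx, 0, i);
        });
      } else if (input[i] === rhs[dot]) {
        // Scanner: the expected terminal matches the next character.
        add(i + 1, rule, dot + 1, origin);
      }
    }
  }

  // Accept if a completed start rule spans the whole input.
  return chart[input.length].items.some(
    (it) =>
      rules[it.rule].lhs === start &&
      it.dot === rules[it.rule].rhs.length &&
      it.origin === 0
  );
}

console.log(recognize(GRAMMAR, "sum", "1+2+1")); // true
console.log(recognize(GRAMMAR, "sum", "1+"));    // false
```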
How to use
● Easy installation: npm install nearley --save-dev
● Generate a parser: npx nearleyc -o parser.js parser.ne
○ Add to scripts in package.json
● The .ne files contain rules, terminals, non-terminals, and actions
# _ (optional whitespace) comes from nearley's builtin whitespace grammar
@builtin "whitespace.ne"
expr -> "(" _ sum _ ")" {% function (d) { return d[2]; } %}
      | value           {% function (d) { return d[0]; } %}
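In the rule: expr, sum, and value are non-terminals; "(" and ")" are terminals; _ is whitespace; each {% ... %} block is an action; d[2] is the return value from the sum rule.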
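An action receives `d`, the array of values produced by the right-hand-side symbols, in order. The sketch below calls the two actions from the rule directly on hand-built `d` arrays to show what `d[2]` and `d[0]` pick out. The data arrays are hypothetical, and the sketch assumes that the whitespace symbol `_` contributes `null` to `d`.

```javascript
// The two actions from the rule above, exactly as written in the .ne file.
const parenAction = function (d) { return d[2]; };
const valueAction = function (d) { return d[0]; };

// Hypothetical data arrays, one entry per right-hand-side symbol:
//   "(" _ sum _ ")"  ->  ["(", null, <result of sum>, null, ")"]
//   value            ->  [<result of value>]
const parenData = ["(", null, 42, null, ")"];
const valueData = [7];

console.log(parenAction(parenData)); // 42, i.e. whatever the sum rule returned
console.log(valueAction(valueData)); // 7

// The same actions written with array destructuring.
const parenActionArrow = ([, , sum]) => sum;
const valueActionArrow = ([value]) => value;
console.log(parenActionArrow(parenData), valueActionArrow(valueData)); // 42 7
```

The second action just passes the first child through; nearley-generated parsers define an `id` helper that does the same, so it is often written `{% id %}`.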
Additional tools
Supported by many editors
● VS Code, Atom, Emacs, Vim, Sublime
Railroad diagrams
Example: Fraction calculator
kalculator
● Simple Fraction class
○ Basic arithmetic and simplification
● Parser
○ Actions to perform calculation
● Little driver to read input and call parser
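A Fraction class along these lines might look like the sketch below. It is not the code from the kalculator repository; the class shape and the method names (`add`, `sub`, `mul`, `div`, `toString`) are assumptions. Every result is reduced with Euclid’s GCD, so values always stay in lowest terms with a positive denominator.

```javascript
// Hypothetical Fraction class: immutable, stored in lowest terms,
// with the sign kept on the numerator.
class Fraction {
  constructor(numerator, denominator = 1) {
    if (!Number.isInteger(numerator) || !Number.isInteger(denominator)) {
      throw new TypeError("numerator and denominator must be integers");
    }
    if (denominator === 0) throw new RangeError("denominator must be non-zero");
    const g = Fraction.gcd(numerator, denominator);
    const sign = denominator < 0 ? -1 : 1;
    this.n = (sign * numerator) / g;
    this.d = (sign * denominator) / g;
  }

  // Euclid's algorithm on absolute values.
  static gcd(a, b) {
    a = Math.abs(a);
    b = Math.abs(b);
    while (b !== 0) [a, b] = [b, a % b];
    return a;
  }

  // a/b + c/d = (ad + cb) / bd
  add(o) { return new Fraction(this.n * o.d + o.n * this.d, this.d * o.d); }
  // a/b - c/d = (ad - cb) / bd
  sub(o) { return new Fraction(this.n * o.d - o.n * this.d, this.d * o.d); }
  // a/b * c/d = ac / bd
  mul(o) { return new Fraction(this.n * o.n, this.d * o.d); }
  // (a/b) / (c/d) = ad / bc; dividing by zero throws from the constructor
  div(o) { return new Fraction(this.n * o.d, this.d * o.n); }

  toString() { return this.d === 1 ? `${this.n}` : `${this.n}/${this.d}`; }
}

const half = new Fraction(1, 2);
const third = new Fraction(1, 3);
console.log(half.add(third).toString());     // 5/6
console.log(half.sub(third).toString());     // 1/6
console.log(half.mul(third).toString());     // 1/6
console.log(half.div(third).toString());     // 3/2
console.log(new Fraction(4, -8).toString()); // -1/2
```

Plain Numbers are exact only up to 2^53, so long chains of operations can overflow; switching the fields to BigInt would avoid that at the cost of slightly noisier code.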
The grammar (no actions)
main → sum
expr → ( sum )
| value
product → product * expr
| product / expr
| expr
sum → sum + product
| sum - product
| product
value → fraction
| int
fraction → int / int
Start symbol: main
Tokens: (, ), +, -, *, /
Positive integer: int
Important take-aways
● Recursive rules
● Operator precedence
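Nearley is happy with left-recursive rules. A hand-written recursive-descent parser is not: a function for a rule like sum → sum + … would call itself before consuming any input and never terminate. A common workaround, sketched below as an alternative to the nearley version (not the kalculator code), is to turn each left-recursive rule into a loop, which also keeps the operators left-associative. Precedence comes from the nesting: sum is built from products and products from expr, so * and / bind tighter than + and -. Fractions are represented as hypothetical [numerator, denominator] pairs, and because int / int has the same value as dividing two integers, every / is simply treated as division.

```javascript
// Recursive-descent evaluator for the calculator grammar (illustrative sketch).
// Left recursion is replaced by loops, keeping + - * / left-associative.

function gcd(a, b) {
  a = Math.abs(a);
  b = Math.abs(b);
  while (b !== 0) [a, b] = [b, a % b];
  return a;
}

// Build a reduced [numerator, denominator] pair with a positive denominator.
function frac(n, d) {
  if (d === 0) throw new RangeError("division by zero");
  const g = gcd(n, d);
  const s = d < 0 ? -1 : 1;
  return [(s * n) / g, (s * d) / g];
}

const OPS = {
  "+": ([a, b], [c, d]) => frac(a * d + c * b, b * d),
  "-": ([a, b], [c, d]) => frac(a * d - c * b, b * d),
  "*": ([a, b], [c, d]) => frac(a * c, b * d),
  "/": ([a, b], [c, d]) => frac(a * d, b * c),
};

function evaluate(input) {
  // Positive integers, the six operator/paren tokens, or any other non-space char.
  const tokens = input.match(/\d+|[()+\-*\/]|\S/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // sum → product (("+" | "-") product)*
  function sum() {
    let left = product();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      left = OPS[op](left, product());
    }
    return left;
  }

  // product → expr (("*" | "/") expr)*
  function product() {
    let left = expr();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      left = OPS[op](left, expr());
    }
    return left;
  }

  // expr → "(" sum ")" | int
  function expr() {
    const tok = next();
    if (tok === "(") {
      const value = sum();
      if (next() !== ")") throw new SyntaxError("expected )");
      return value;
    }
    if (/^\d+$/.test(tok ?? "")) return frac(Number(tok), 1);
    throw new SyntaxError(`unexpected token ${tok}`);
  }

  const result = sum();
  if (pos !== tokens.length) throw new SyntaxError(`unexpected token ${peek()}`);
  return result;
}

console.log(evaluate("1/2 + 1/3"));     // [ 5, 6 ]
console.log(evaluate("1 + 2 * 3"));     // [ 7, 1 ]
console.log(evaluate("(1 + 2) * 3/4")); // [ 9, 4 ]
```

Inputs such as 2 - 3 - 4 and 8 / 2 / 2 are where associativity shows: the loops evaluate them as (2 - 3) - 4 and (8 / 2) / 2.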
Source code:
parser.ne
Source code: kalc.js
Resources
Links
● My example: https://github.com/kneth/kalculator
● Nearley: https://nearley.js.org/
● Earley parsers explained: http://loup-vaillant.fr/tutorials/earley-parsing/
● An Efficient Context-Free Parsing Algorithm. Jay Earley’s Ph.D. thesis from 1968. http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan/CMU-CS-68-earley.pdf
Building parsers for JavaScript is easy - and fun
Ron Mader, https://tinyurl.com/sg5pdwn
