Regexes
and Grammars
   in Perl 6
Preface
Synopsis 5
Regexes and Rules
S05
Damian Conway
Allison Randal
Patrick Michaud
Larry Wall
Moritz Lenz
Created: 24 Jun 2002
Last Modified: 30 Aug 2010
Version: 132
54 pages
Part I
Regexes
Random facts
and terminology
Regular expressions
in Perl 5 were not regular
Regular expressions
in Perl 6 are called regexes
Which means “kinda like
a regular expression”
Match object
contains result of matching




          $/
Capture variable indexes
start with 0




          $0
$0, $1, etc.
are part of $/
my $q = "Hotels in Berlin";
$q ~~ /in\s(.*)/;


say $0;    # Berlin
say $/[0]; # Berlin
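
The link between `$0` and `$/` can be checked directly. A minimal sketch, assuming a current Raku (formerly Perl 6) compiler; the variable names `$city`, `$same` and `$start` are illustrative and not from the slides.

```raku
my $q = "Hotels in Berlin";

# A successful smartmatch stores the Match object in $/.
$q ~~ / in \s (.*) / or die "no match";

# $0 is shorthand for $/[0], the first positional capture.
my $city  = ~$0;       # stringify the captured Match
my $same  = ~$/[0];
my $start = $/.from;   # offset where the overall match begins

say $city;
say $same;
say $start;
```

Whitespace inside the regex is not significant, so the pattern can be spaced out for readability.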
Metacharacters
are everything except
Unicode letters
or numbers
or underscore
Quotes
may be used for creating
atoms


'I will never use PHP again. '*
Repetition


(\d+ \s?) ** 3

(\d+ \s?) ** 5..10

\d+ ** ','
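
The repetition forms can be tried out as follows. This is a sketch for a current Raku compiler, where the separator form shown above is written with `%` instead of `**`; the test strings and variable names are made up for illustration.

```raku
# Exactly three repetitions of "digits plus optional space".
my $three = so "12 34 56" ~~ / ^ [ \d+ \s? ] ** 3 $ /;

# A range: between 2 and 4 repetitions (this string has five groups).
my $range = so "1 2 3 4 5" ~~ / ^ [ \d+ \s? ] ** 2..4 $ /;

# Separator form: one or more digit groups separated by commas.
my $list = so "1,22,333" ~~ / ^ [ \d+ ]+ % ',' $ /;

say $three;
say $range;
say $list;
```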
/x modifier gone


"ab" ~~ / a    b /;
say $/;   # ab
/s, /m modifiers gone


"a1\nb2\nc3" ~~ /\N+/;

"a1\nb2\nc3" ~~ /^^ .2 $$/;
/e modifier gone


$str ~~ s/pattern/{action()}/;
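
A small runnable variant of the same idea, assuming a current Raku compiler. The doubling expression stands in for the slide's `action()` and is purely illustrative.

```raku
my $str = "total: 21";

# The replacement part interpolates like a double-quoted string,
# so a { } block runs code; $0 is the first capture of the match.
$str ~~ s/ (\d+) /{ $0 * 2 }/;

say $str;
```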
Modifier syntax


@names =
 $str ~~ m:i/MiSteR \s (\w+)/;
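
The `:i` modifier goes in front of the pattern, not after it. A minimal sketch, assuming a current Raku compiler; the input string is an invented example.

```raku
my $str = "Dear mister Smith";
my @names;

# :i makes the whole pattern case-insensitive.
if $str ~~ m:i/ MiSteR \s (\w+) / {
    @names.push: ~$0;
}

say @names;
```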
Brackets
Capturing group


       (...)
Non-capturing group


       [...]
Character class


      <[...]>
Embedded closure


      {...}

> "500" ~~ /(\d+) {$0 < 200 or fail}/
===SORRY!===
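
In current Raku, one way to express this kind of check is a code assertion, `<?{ ... }>`, which makes the match fail when the block returns a false value. A minimal sketch under that assumption; the helper name `small-number` is hypothetical.

```raku
# Returns True only if the whole string is a number below 200.
sub small-number(Str $s) {
    # ^ and $ stop the engine from backtracking to a shorter
    # digit run such as "50" inside "500".
    so $s ~~ / ^ (\d+) <?{ $0 < 200 }> $ /;
}

say small-number("500");
say small-number("150");
```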
Named rule or token


       <...>
Part II
Grammars
Keywords
grammar
rule
token
proto
TOP
grammar Grammar {
    rule TOP {...}
    rule some_rule {...}
    token some_token {...}
}
Syntax is similar
to class definition
Grammar.parse($string);
Example.
Step by step
Executed by Rakudo


    rakudo.org
Sometimes it fails
City
grammar SearchQuery {

}
grammar SearchQuery {
  rule TOP {

    }
}
grammar SearchQuery {
  rule TOP {
     ^
     $
  }
}
grammar SearchQuery {
  rule TOP {
     ^
        <query>
     $
  }
}


     Easy, isn't it?
Grammars are part
 of the language
grammar SearchQuery {
  rule TOP {
     ^
        <query>
     $
  }
  rule query {
  }
}
grammar SearchQuery {
  rule TOP {
     ^
        <query>
     $
  }
  rule query {
     <city>
  }
}
grammar SearchQuery {
  rule TOP {
     ^
        <query>
     $
  }
  rule query {
     <city>
  }
  token city {
  }
}
N. B.
TOP and query are rules
city is a token
token
is a "word"
rule
is a "phrase"
grammar SearchQuery {
  rule TOP {
     ^
        <query>
     $
  }
  rule query {
     <city>
  }
  token city {
     <capital>
  }
}
grammar SearchQuery {
  rule TOP {
     ^
        <query>
     $
  }
  rule query {
     <city>
  }
  token city {
     <capital>
  }
  token capital {
  }
}
my $result = SearchQuery.parse("Amsterdam");
say $result.perl;
Match.new(
from => 0,
orig => "Amsterdam",
to => 9,
named => {
 query => Match.new(
  from => 0,
  orig => "Amsterdam",
  to => 9,
  named => {
   city =>    Match.new(
    from => 0,
    orig => "Amsterdam",
    to => 9,
    named => {
     capital =>    Match.new(
      from => 0,
      orig => "Amsterdam",
from => 0, orig => "Amsterdam", to => 9    Matched text
query => Match.new(...)                      rule query { }
city => Match.new(...)                       token city { }
capital => Match.new(...)                    token capital { }
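
The nesting of the Match tree mirrors the nesting of the rules. A minimal runnable sketch, assuming a current Raku compiler; the slides leave the body of `capital` empty, so the two city names below are hypothetical filler.

```raku
grammar SearchQuery {
    rule TOP      { ^ <query> $ }
    rule query    { <city> }
    token city    { <capital> }
    # Hypothetical contents; the slides do not show this list.
    token capital { 'Amsterdam' | 'Tirana' }
}

my $result = SearchQuery.parse("Amsterdam") or die "parse failed";

# Each named subrule becomes a key one level deeper in the tree.
my $capital = ~$result<query><city><capital>;
my $from    = $result.from;
my $to      = $result.to;

say $capital;
say "$from..$to";
```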
Country
rule query {
     <city>
   | <country>
}
rule country {
       'Afghanistan'
     | 'Akrotiri'
     | 'Albania'
     | 'Algeria'
     | 'American Samoa'
     | 'Andorra' . . .
}
my $result = SearchQuery.parse("Amsterdam");
say $result.perl;

$result = SearchQuery.parse("China");
say $result.perl;
rule query {
     <city> ',' <ws>? <country>
   | <city>
   | <country>
}
SearchQuery.parse("Tirana, Albania");
Capturing
and accessing
Everything goes
to Match object

    $/
SearchQuery.parse("Tirana, Albania");
say $<query><city>;
say $<query><country>;


Tirana
Albania
say $<query><city>;            # shortcut
say $<query><country>;

say $/<query><city>;           # full syntax
say $/<query><country>;
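
Put together, the three-way alternative and the capture access might look like this. A sketch for a current Raku compiler; the city and country lists are shortened, hypothetical samples, and the explicit `<ws>?` is left out because a rule already allows whitespace between atoms.

```raku
grammar SearchQuery {
    rule TOP { ^ <query> $ }
    rule query {
          <city> ',' <country>
        | <city>
        | <country>
    }
    # Hypothetical sample data; the slides elide the real lists.
    token city    { 'Amsterdam' | 'Tirana' }
    token country { 'Albania' | 'China' }
}

my $m = SearchQuery.parse("Tirana, Albania") or die "parse failed";

my $city    = ~$m<query><city>;
my $country = ~$m<query><country>;
say $city;
say $country;

# .parse also sets $/, so $<query> and $/<query> reach the same subtree.
my $shortcut = ~$<query><city>;

# A lone country takes the third branch; no <city> is captured.
my $lone     = SearchQuery.parse("China");
my $has-city = $lone<query><city>.defined;
```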
rule query {
   'Hotels in'?
   [
       <city> ',' <ws>? <country>
     | <city>
     | <country>
   ]
}
SearchQuery.parse("Tirana, Albania");
say $<query><city>;
say $<query><country>;


SearchQuery.parse("Hotels in Tirana, Albania");
say $<query><city>;
say $<query><country>;
rule date {
   <day>
   <month>
}
token day {
   \d+
   ['st' | 'nd' | 'th']?
}
token month {
     'January'
   | 'February'
   | 'March'
   | 'April' . . .
}
SearchQuery.parse("Hotels in Tirana, Albania from 25th December");


SearchQuery.parse("Hotels in Tirana, Albania from 25 December");
What will
$<query><date>
    print?

 25th December
       or
  25 December
How to check days



token day {
  (\d+) {$0 <= 31 or fail}
}
rule query {
    'Hotels in'?
    [
          <city> ',' <ws>? <country>
        | <city>
        | <country>
    ]
    [
         'from' <date>
         'to' <date>
    ]?
    [
         'for' <guest_number>
    ]?
}
token guest_number {
    \d
  | 'one'
  | 'two'
  | 'three'
  | 'four'
  | 'five'
}
"Hotels in Tirana, Albania from 25 December to 7 January for two"
rule date {
     'today'
   | 'tomorrow'
   |[
       <day>
       <month>
     ]
}
$ perl6 10-all.pl
Hotels in Amsterdam, Netherlands from 1 January to 5 February for three
   City:    Amsterdam
   Country: Netherlands
   From:    1 January
   To:      5 February
   Guests: three
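
The script 10-all.pl itself is not shown in the slides. The following is a reconstruction that could produce a similar report, assuming a current Raku compiler; the shortened city, country and month lists are hypothetical, and the day check uses a `<?{ ... }>` code assertion in place of `fail`.

```raku
grammar SearchQuery {
    rule TOP { ^ <query> $ }
    rule query {
        'Hotels in'?
        [
            <city> ',' <country>
          | <city>
          | <country>
        ]
        [ 'from' <date> 'to' <date> ]?
        [ 'for' <guest_number> ]?
    }
    rule date {
          'today'
        | 'tomorrow'
        | <day> <month>
    }
    token day          { (\d+) <?{ $0 <= 31 }> [ 'st' | 'nd' | 'th' ]? }
    token guest_number { \d | 'one' | 'two' | 'three' | 'four' | 'five' }
    # Hypothetical, shortened lists.
    token month   { 'January' | 'February' | 'December' }
    token city    { 'Amsterdam' | 'Tirana' }
    token country { 'Netherlands' | 'Albania' }
}

my $input = "Hotels in Amsterdam, Netherlands from 1 January to 5 February for three";
my $m = SearchQuery.parse($input) or die "parse failed";
my $q = $m<query>;

my $city    = ~$q<city>;
my $country = ~$q<country>;
# <date> occurs twice in one branch, so it is captured as a list.
# A rule also consumes trailing whitespace, hence the trim.
my $from    = $q<date>[0].Str.trim;
my $to      = $q<date>[1].Str.trim;
my $guests  = ~$q<guest_number>;

say $input;
say "   City:    $city";
say "   Country: $country";
say "   From:    $from";
say "   To:      $to";
say "   Guests:  $guests";
```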
__END__

           Andrew Shitov
talks.shitov.ru | andy@shitov.ru
