Regular Expression (Regex) Fundamentals
REGEX
Regular Expression
Mesut Güneş
www.testrisk.com
Regular expressions are patterns used to match character combinations in strings [1].
What is it?
1956: Stephen Cole Kleene, regular languages
1968: Ken Thompson, pattern matching in a text editor
1970s: Bell Labs, in Unix
1980s: Henry Spencer's regex library, Perl
1992: POSIX.2 (Unix shell), many languages [2]
History
Hi,
I called Jon on Tuesday, March 25th at 7pm
and expressed a concern about my slow times
accessing www.cnn.com. He said he would fix
it, but I never heard back. Can someone contact
me at Kellie.Booth@if.com ASAP? What does
Ctrl-F5 mean, by the way?
Thanks
Kellie
Human Brain VS Text Processing
Hi, …., thanks
I called …
March 27th
www.blabla.com
Patterns?
(Hi|Hello),\w{1,}(Regards|Thanks)
I\s(verb|auxiliary)(*)
March\s\d{1,2}(st|nd|rd|th)
www\.\w{1,}\.(com|net|edu|…)
Patterns?
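The patterns sketched above can be tried against the sample e-mail with Python's `re` module. This is only a minimal sketch: the variable names are illustrative, the e-mail text is shortened, and the date pattern assumes `\d{1,2}` so that two-digit days such as 25 match.

```python
import re

email = (
    "Hi,\n"
    "I called Jon on Tuesday, March 25th at 7pm and expressed a concern "
    "about my slow times accessing www.cnn.com. He said he would fix it, "
    "but I never heard back. Can someone contact me at "
    "Kellie.Booth@if.com ASAP?\n"
    "Thanks\n"
    "Kellie"
)

# Date such as "March 25th": month name, whitespace, 1-2 digits, ordinal suffix.
date_match = re.search(r"March\s\d{1,2}(st|nd|rd|th)", email)

# Web address: literal "www.", a run of word characters, then a known suffix.
site_match = re.search(r"www\.\w{1,}\.(com|net|edu)", email)

print(date_match.group(0))  # March 25th
print(site_match.group(0))  # www.cnn.com
```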
/pattern/options
Regex syntax
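The slash notation is used by languages such as JavaScript, Perl, and Ruby. Python has no regex literal, so a rough equivalent (a sketch, with an illustrative sample string) passes the pattern as a raw string and the options as flags:

```python
import re

# /regex/i in slash notation roughly corresponds to this in Python:
pattern = re.compile(r"regex", re.IGNORECASE)

found = pattern.search("Regular Expression (REGEX) Fundamentals")
print(found.group(0))  # REGEX
```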
^ $ . | { } [ ] ( ) * + ? \
Literal Characters
(metacharacters: escape with \ to match them literally)
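A short sketch of matching metacharacters literally, assuming Python's `re` module; the sample text is made up for illustration. `re.escape` is a standard-library helper that adds the backslashes automatically.

```python
import re

text = "Total: $5.00 (approx.)"

# Escaped "$" and "." match the literal characters instead of acting as
# end-of-string and any-character.
price = re.search(r"\$\d+\.\d{2}", text)

# re.escape() adds the backslashes when the literal text comes from data.
literal = re.escape("(approx.)")
note = re.search(literal, text)

print(price.group(0))  # $5.00
print(note.group(0))   # (approx.)
```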
provide a list of potential matching characters at a position in the search text
Square Brackets
7[Pp][Mm]
more examples
Square Brackets
7[Pp][Mm]
[123456789][aApP][Mm]
[1-9][aApP][Mm]
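The bracket patterns above can be checked quickly in Python. This is a sketch with made-up sample strings; `fullmatch` is used so the whole string has to fit the pattern.

```python
import re

time_pattern = re.compile(r"[1-9][aApP][Mm]")

samples = ["7pm", "7PM", "9Am", "0pm", "7xm"]

# One character is taken from each bracket list, position by position.
results = {s: bool(time_pattern.fullmatch(s)) for s in samples}

print(results)
# {'7pm': True, '7PM': True, '9Am': True, '0pm': False, '7xm': False}
```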
represent characters that cannot be typed directly into a regex
Non-Printable Characters
\n - Matches a new line; Windows uses \r\n.
\t - Matches a tab character.
\b - Matches a backspace (when used between brackets).
\a - Matches the bell character.
\r - Matches a carriage return.
\f - Matches a form feed.
\v - Matches a vertical tab.
Euro € - \u20AC
British pound £ - \u00A3
Yen ¥ - \u00A5
Dollar sign $ - \$ or \u0024 or \x24
\cX - Matches an ASCII control character; for example, \cC is Ctrl-C.
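A minimal sketch of a few of these escapes in Python, using a made-up two-line text with Windows line endings. Not every escape in the list is supported by every regex engine, so check the flavor in use.

```python
import re

text = "name\tprice\r\nbook\t\u20ac12"

# \r\n is the Windows line ending; \n alone is the Unix one.
lines = re.split(r"\r?\n", text)

# \t matches the tab that separates the columns.
columns = [re.split(r"\t", line) for line in lines]

# \u20AC inside the pattern matches the euro sign.
has_euro = re.search(r"\u20AC\d+", text) is not None

print(columns)   # [['name', 'price'], ['book', '€12']]
print(has_euro)  # True
```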
provides a list of excluded characters
Negation
[^0-9A-F]
[^a-zA-Z0-9_] : negation of \w (same as \W)
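A small Python sketch of negated character lists, with made-up inputs. Note that in Python 3 `\w` is Unicode-aware by default, so `\W` equals `[^a-zA-Z0-9_]` only for ASCII text.

```python
import re

# [^0-9A-F] matches any character that is NOT an uppercase hex digit.
invalid_hex = re.findall(r"[^0-9A-F]", "1A2G-FZ")
print(invalid_hex)  # ['G', '-', 'Z']

# For ASCII text, \W behaves like [^a-zA-Z0-9_].
non_word = re.findall(r"\W", "user_name@example.com")
print(non_word)  # ['@', '.']
```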
repetition of characters
Curly Brackets
{n} : Exactly “n” times.
{n,} : At least “n” times, but no upper limit.
{n,m} : Between “n” and “m” times.
repetition of characters
Quantifier Symbols
Quantifier   Matches                    Same as
?            Match zero or one time     {0,1}
*            Match zero or more times   {0,}
+            Match one or more times    {1,}
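The curly-bracket forms and their shorthand symbols can be compared side by side. This is a sketch with made-up sample words, assuming Python's `re` module.

```python
import re

exactly_four = re.fullmatch(r"\d{4}", "2024") is not None   # {n}
at_least_two = re.fullmatch(r"a{2,}", "aaaa") is not None   # {n,}
two_to_three = re.fullmatch(r"a{2,3}", "aaaa") is not None  # {n,m}: four is too many

optional_u = re.findall(r"colou?r", "color colour")         # ? is {0,1}
zero_or_more = re.findall(r"go*gle", "ggle gogle google")   # * is {0,}
one_or_more = re.findall(r"go+gle", "ggle gogle google")    # + is {1,}

print(exactly_four, at_least_two, two_to_three)  # True True False
print(optional_u)    # ['color', 'colour']
print(zero_or_more)  # ['ggle', 'gogle', 'google']
print(one_or_more)   # ['gogle', 'google']
```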
define the string boundaries
Starting and Ending Pattern
^ : start of string (when not inside [])
$ : end of string
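A brief Python sketch of the two anchors, with made-up strings. The last line shows the other meaning of `^`, which applies only inside square brackets.

```python
import re

starts_with_hi = re.search(r"^Hi", "Hi, Kellie") is not None         # True
not_at_start = re.search(r"^Hi", "Oh, Hi") is not None               # False
ends_with_thanks = re.search(r"Thanks$", "Many Thanks") is not None  # True

# Inside square brackets, ^ means negation instead of start of string.
first_non_digit = re.search(r"[^0-9]", "2024-03-25").group(0)        # "-"
```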
provides
alternatives
Alternation
(x|y|z)
(www|ftp)
www\.\w{1,}\.(net|com|org|edu)
(x|y|z) vs [xyz]
Alternation
(x|y|z) : can match whole strings
[xyz], [a-zA-Z0-9] : one character from a list or range of characters
(Regex|ReGex) is equivalent to Re[gG]ex
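A sketch comparing alternation with a character list in Python; the host name is a made-up example. Alternation chooses between whole strings, while brackets choose a single character.

```python
import re

# Alternation between whole strings, at two positions in the pattern.
protocol = re.compile(r"(www|ftp)\.\w+\.(net|com|org|edu)")
m = protocol.search("Mirror: ftp.example.org")
print(m.group(0), m.group(1), m.group(2))  # ftp.example.org ftp org

# The two spellings below accept exactly the same words.
alt = re.compile(r"(Regex|ReGex)")
cls = re.compile(r"Re[gG]ex")
words = ["Regex", "ReGex", "REGEX"]
same = [bool(alt.fullmatch(w)) == bool(cls.fullmatch(w)) for w in words]
print(same)  # [True, True, True]
```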
. : Any single character
[abc] : A single character: a, b, or c
[^abc] : Any single character except a, b, or c
[a-z] : Any single character in the range a-z
[a-zA-Z] : Any single character in the range a-z or A-Z
^ : Start of line
$ : End of line
\A : Start of string
\z : End of string
\s : Any whitespace character
\S : Any non-whitespace character
\d : Any digit
\D : Any non-digit
\w : Any word character (letter, number, underscore)
\W : Any non-word character
\b : A word boundary
(...) : Capture everything enclosed
(a|b) : a or b
i : Case-insensitive option
x : Ignore whitespace in the regex
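Several entries from this card can be combined in one Python sketch. The assumptions here: the `i` and `x` options map to the flags `re.IGNORECASE` and `re.VERBOSE`, Python's `re` spells the end-of-string anchor `\Z`, and the sample sentences are made up.

```python
import re

pattern = re.compile(
    r"""
    \b          # word boundary
    \d{1,2}     # one or two digits
    \s?         # optional whitespace
    [ap]m       # am or pm
    \b          # word boundary
    """,
    re.IGNORECASE | re.VERBOSE,
)

times = pattern.findall("Call at 7pm or 11 AM, not at 1234pm.")
print(times)  # ['7pm', '11 AM']

# \A and \Z anchor to the whole string, even when it spans several lines.
starts = re.search(r"\AThanks", "Thanks\nKellie") is not None  # True
ends = re.search(r"Kellie\Z", "Thanks\nKellie") is not None    # True
```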
(?<name><pattern>)
Named Grouping
(?:<pattern>)
Non-Capturing Group
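A sketch of both group types in Python. Note that Python's `re` module writes a named group as `(?P<name>...)` rather than `(?<name>...)`; the group names `user` and `tld` are illustrative.

```python
import re

# (?P<name>...) captures under a name; (?:...) groups without capturing.
pattern = re.compile(r"(?P<user>[\w.]+)@(?:\w+\.)+(?P<tld>\w+)")

m = pattern.search("Can someone contact me at Kellie.Booth@if.com ASAP?")

print(m.group("user"))  # Kellie.Booth
print(m.group("tld"))   # com
print(m.groups())       # ('Kellie.Booth', 'com')
```

Only the two named groups appear in `groups()`; the non-capturing group that matched `if.` is not stored.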
checks whether the pattern is followed by another
Look Ahead
(?=<pattern>) : positive look ahead
(?!<pattern>) : negative look ahead
(?<city>\w+)[, ]+(?=NJ|PA|DE)
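The city example can be tried in Python with the named group written as `(?P<city>...)`. This is a sketch; the sample sentences are made up.

```python
import re

# Positive look ahead: the state must follow, but it is not part of the match.
pattern = re.compile(r"(?P<city>\w+)[, ]+(?=NJ|PA|DE)")
m = pattern.search("Shipped to Newark, NJ yesterday")

print(m.group("city"))     # Newark
print(repr(m.group(0)))    # 'Newark, '

# Negative look ahead: a "q" that is NOT followed by "u".
q_without_u = re.findall(r"q(?!u)", "quit Iraq")
print(q_without_u)         # ['q']
```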
checks whether the pattern is preceded by another
Look Behind
(?<=<pattern>) : positive look behind
(?<!<pattern>) : negative look behind
(?<="state":)[ ].*(?<state>PA|Pennsylvania)
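A simplified Python sketch of look behind, using a made-up JSON-like string. Python's `re` module requires the look-behind pattern to have a fixed width, so the variable-length `.*` sits outside it here.

```python
import re

record = '{"city": "Philadelphia", "state": "PA"}'

# Positive look behind: match the state only right after the "state" key.
state = re.search(r'(?<="state": ")(?P<state>PA|Pennsylvania)', record)
print(state.group("state"))  # PA

# Negative look behind: digits NOT preceded by "$" or by another digit.
plain_numbers = re.findall(r"(?<![$\d])\d+", "$15 for 3 items")
print(plain_numbers)         # ['3']
```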
EXAMPLES
^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$
password
must be 6 to 20 characters long
and
must not include whitespace or any of the following:
< > & ' " % ; - + ( )
Let's Dig into the Pattern
English Rule                                          Regex Pattern
BEGINNING of the string                               ^
Start of NEGATIVE LOOKAHEAD                           (?!
Any characters except newline, with QUANTIFIER        .*
Start of NON-CAPTURING group                          (?:
Single CHARACTER with ALTERNATION                     <|
More single CHARACTERS with ALTERNATION               >| &| '| "| %| ;| -| \+| \(| \)| \s
End of NON-CAPTURING group and NEGATIVE LOOKAHEAD     ))
Any character, repetition with boundaries             .{6,20}
END of the string                                     $
^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$
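The password rule can be exercised in Python as a quick check. This is a sketch: the function name `is_valid_password` and the sample passwords are hypothetical, and `+`, `(`, `)` and `s` are written with backslashes so they are treated as literals and whitespace.

```python
import re

PASSWORD = re.compile(r"""^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$""")


def is_valid_password(candidate: str) -> bool:
    """Return True when the candidate satisfies the length and character rules."""
    return PASSWORD.match(candidate) is not None


print(is_valid_password("secret1"))     # True
print(is_valid_password("abc"))         # False: shorter than 6 characters
print(is_valid_password("pass word1"))  # False: contains whitespace
print(is_valid_password("semi;colon"))  # False: contains ;
print(is_valid_password("a" * 21))      # False: longer than 20 characters
```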
ack '(?<="GET")[,]"/nike.*'
unix shell
Find all “GET” requests to “nike” in all .csv files:
~/Downloads ls *.csv | wc -l
109
~/Downloads ack '(?<="GET")[,]"/nike.*' | wc -l
88
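The look-behind pattern passed to ack can also be tried in Python. This is a sketch: the first three rows are taken from the grep output in this example, and the POST row is hypothetical, added only to show the effect of the look behind.

```python
import re

rows = [
    '"GET","/arama/nike",7,0,140,665,101,3797,196168,0.09',
    '"GET","/kampanya/arama/nike",8,0,270,678,229,2641,164205,0.11',
    '"GET","/nike/295/morhipo-ozel",2,0,81,88,81,95,121609,0.03',
    '"POST","/nike/sepet",1,0,50,50,50,50,100,0.01',  # hypothetical row
]

# "GET" must come right before the comma, and the path must start with /nike.
pattern = re.compile(r'(?<="GET")[,]"/nike.*')

hits = [row for row in rows if pattern.search(row)]
print(len(hits))  # 1
print(hits[0])    # the /nike/295/morhipo-ozel row
```

Unlike the plain `grep '/nike.*'`, this pattern skips rows where `nike` appears later in the path and rows whose method is not GET.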
~/Downloads cat web_3000:25.csv | grep '/nike.*'
"GET","/arama/nike",7,0,140,665,101,3797,196168,0.09
"GET","/kampanya/arama/nike",8,0,270,678,229,2641,164205,0.11
"GET","/nike/295/morhipo-ozel",2,0,81,88,81,95,121609,0.03
"GET","/nike/markalar/503/32026/marka?fh=discount_rate_catalog01]
BDD - Cucumber
^/(Questions|Sorular|پرسش)*/$
Thanks
References:
[1] https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
[2] https://en.wikipedia.org/wiki/Regular_expression
[3] Regular Expressions Succinctly, Syncfusion, by Joe Booth
[4] http://www.slideshare.net/adamlowe/regex-cards-powerpoint-format
[5] https://regex101.com