Regular Expression

The document provides information about regular expressions (regex). It describes various elements of regex like repeaters, wildcard characters, character classes, grouping, alternation, anchors etc. It also gives examples of regex to validate a mobile number, email address, string with first uppercase character and lowercase letters with an optional digit.

Uploaded by

SIDDHI

Module 1: NLP

Aarti Dharmani
Estimate bigram probabilities
• <s> I am Sam </s>
• <s> Sam I am </s>
• <s> I do not like green eggs and ham </s>

P(I|<s>) =
P(Sam|<s>) =
P(am|I) =
P(</s>|Sam) =
P(Sam|am) =
P(do|I) =
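
The blanks above can be filled in with the maximum-likelihood estimate P(w | prev) = C(prev w) / C(prev). The following is a minimal Python sketch of that calculation, assuming whitespace tokenization and case-sensitive tokens; the name `bigram_prob` is illustrative and does not come from the slides.

```python
from collections import Counter
from fractions import Fraction

# The three training sentences from the slide, with boundary markers.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()  # assumption: whitespace tokenization
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(word, prev):
    """Maximum-likelihood estimate of P(word | prev) = C(prev word) / C(prev)."""
    return Fraction(bigrams[(prev, word)], unigrams[prev])

queries = [("I", "<s>"), ("Sam", "<s>"), ("am", "I"),
           ("</s>", "Sam"), ("Sam", "am"), ("do", "I")]
for word, prev in queries:
    print(f"P({word}|{prev}) = {bigram_prob(word, prev)}")
# P(I|<s>) = 2/3
# P(Sam|<s>) = 1/3
# P(am|I) = 2/3
# P(</s>|Sam) = 1/2
# P(Sam|am) = 1/2
# P(do|I) = 1/3
```

Exact fractions are used here so the results can be compared directly with a hand calculation.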
Given the bigram and unigram counts of a dataset

Bigram counts (row: first word, column: following word)

          i     want  to    eat   chinese  food  lunch  spend
i         5     827   0     9     0        0     0      2
want      2     0     608   1     6        6     5      1
to        2     0     4     686   2        0     6      211
eat       0     0     2     0     16       2     42     0
chinese   1     0     0     0     0        82    1      0
food      15    0     15    0     1        4     0      0
lunch     2     0     0     0     0        1     0      0
spend     1     0     1     0     0        0     0      0

Unigram counts

i     want  to    eat   chinese  food  lunch  spend
2533  927   2417  746   158      1093  341    278
Calculate the probability of a sentence
• P(I want chinese food to eat) = ?

• P(I) x P(want|I) x P(chinese|want) x P(food|chinese) x P(to|food) x P(eat|to) = ?
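
As a rough sketch of the arithmetic, the five conditional terms can be computed from the counts given on the slide using P(next|prev) = C(prev next) / C(prev). The Python below assumes the rows of the bigram table are the first word and the columns the following word. P(I) cannot be derived from the table alone, so it is left out of the product and has to be supplied separately. The function name `conditional_product` is illustrative.

```python
# Bigram counts C(prev, next) read from the slide's table (only the pairs needed here).
bigram_counts = {
    ("i", "want"): 827,
    ("want", "chinese"): 6,
    ("chinese", "food"): 82,
    ("food", "to"): 15,
    ("to", "eat"): 686,
}
# Unigram counts C(prev) from the same slide.
unigram_counts = {"i": 2533, "want": 927, "to": 2417, "eat": 746,
                  "chinese": 158, "food": 1093, "lunch": 341, "spend": 278}

def conditional_product(words):
    """Multiply P(next | prev) = C(prev next) / C(prev) over consecutive word pairs."""
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        prob *= bigram_counts[(prev, nxt)] / unigram_counts[prev]
    return prob

sentence = "i want chinese food to eat".split()
product = conditional_product(sentence)
print(f"product of the five conditional terms = {product:.3e}")
# Multiply this value by P(I) to obtain P(I want chinese food to eat).
```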
Regular Expressions
Regular expressions provide a powerful, flexible, and efficient method
for processing text.
The extensive pattern-matching notation of regular expressions
enables you to quickly parse large amounts of text to:
• Find specific character patterns.
• Validate text to ensure that it matches a predefined pattern (such as
an email address).

^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
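
To see how the three parenthesized groups in the pattern above behave, here is a small Python sketch using the standard `re` module. The helper name `split_email` and the sample strings are illustrative assumptions, not part of the slides.

```python
import re

# Pattern copied from the slide; the groups capture local part, domain, and top-level domain.
EMAIL_RE = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")

def split_email(text):
    """Return (local, domain, tld) if text matches the pattern, else None."""
    m = EMAIL_RE.match(text)
    return m.groups() if m else None

for candidate in ["nlp123@gmail.com", "first.last@mail.example.org", "no-at-sign.com"]:
    print(candidate, "->", split_email(candidate))
# nlp123@gmail.com -> ('nlp123', 'gmail', 'com')
# first.last@mail.example.org -> ('first.last', 'mail.example', 'org')
# no-at-sign.com -> None
```

Note that the `{2,5}` bound rejects top-level domains longer than five letters, so this pattern is a teaching example rather than a complete email validator.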
Elements of Regular Expressions
1. Repeaters ( *, +, and { } )
These symbols act as repeaters: they tell the computer how many times the preceding
character (or set of characters) may occur.

2. The asterisk symbol ( * )

It tells the computer to match the preceding character (or set of characters) zero
or more times, with no upper limit.

3. The plus symbol ( + )

It tells the computer to match the preceding character (or set of characters) one
or more times, with no upper limit.
4. The curly braces { … }
They tell the computer to match the preceding character (or set of characters) as
many times as the value inside the braces specifies: {n} means exactly n times, and
{m,n} means at least m and at most n times.
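
The three repeaters can be compared side by side with a short Python sketch. The helper `matches` is a hypothetical convenience wrapper around `re.fullmatch`, which succeeds only when the whole string fits the pattern.

```python
import re

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# '*': zero or more of the preceding character
print(matches(r"ab*c", "ac"), matches(r"ab*c", "abbbc"))        # True True
# '+': one or more of the preceding character
print(matches(r"ab+c", "ac"), matches(r"ab+c", "abc"))          # False True
# '{n}': exactly n repetitions
print(matches(r"a{3}", "aaa"), matches(r"a{3}", "aa"))          # True False
# '{m,n}': between m and n repetitions
print(matches(r"a{2,4}", "aaaa"), matches(r"a{2,4}", "aaaaa"))  # True False
```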

5. Wildcard ( . )
The dot symbol matches any single character (by default, any character except a newline),
which is why it is called the wildcard character.
6. Optional character ( ? )
This symbol tells the computer that the preceding character may or may not be present in the string to be
matched.

7. The caret ( ^ ) symbol ( Setting position for the match )

The caret symbol tells the computer that the match must start at the beginning of the string or line.

8. The dollar ( $ ) symbol

It tells the computer that the match must occur at the end of the string or before \n at the end of the line or
string.
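
The wildcard, the optional marker, and the two anchors can be tried out with Python's `re` module. This is only an illustrative sketch; the sample strings are made up for the demonstration.

```python
import re

# '.' matches any single character except a newline (Python's default behaviour).
dot_hits = re.findall(r"c.t", "cat cot c\nt cut")
print(dot_hits)          # ['cat', 'cot', 'cut']

# '?' makes the preceding character optional.
optional_hits = re.findall(r"colou?r", "color colour colouur")
print(optional_hits)     # ['color', 'colour']

# '^' anchors the match at the start, '$' at the end.
print(bool(re.search(r"^Sam", "Sam I am")))     # True
print(bool(re.search(r"^Sam", "I am Sam")))     # False
print(bool(re.search(r"Sam$", "I am Sam")))     # True
# '$' also matches just before a newline at the very end of the string.
print(bool(re.search(r"Sam$", "I am Sam\n")))   # True
```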
9. Character Classes
A character class matches any one of a set of characters. It is used to match the
most basic element of a language like a letter, a digit, a space, a symbol, etc.
10. [^set_of_characters] Negation:
Matches any single character that is not in set_of_characters. By default, the
match is case-sensitive.

11. [first-last] Character range:

• Matches any single character in the range from first to last.

12. The Escape Symbol ( \ )

If you want to match the actual ‘+’, ‘.’, etc. characters, add a backslash ( \ ) before
that character. This tells the computer to treat the following character as a
literal character to search for, rather than as a special symbol.
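
Character classes, negation, ranges, and escaping can be illustrated together. The following Python sketch uses `re.findall` on made-up sample strings; the variable names are illustrative.

```python
import re

text = "Room B-12, price 4.50+tax"

# Character class: any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")
print(vowels)        # ['e', 'e']

# Negation: any single character NOT in the set (case-sensitive by default).
others = re.findall(r"[^a-z ]", "ab c1D!")
print(others)        # ['1', 'D', '!']

# Range: any single character from first to last.
digits = re.findall(r"[0-9]", text)
print(digits)        # ['1', '2', '4', '5', '0']

# Escape: '\.' and '\+' match a literal dot and a literal plus sign.
decimals = re.findall(r"\d\.\d", text)
pluses = re.findall(r"\+", text)
print(decimals, pluses)   # ['4.5'] ['+']
```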
13. Grouping Characters ( )
A set of different symbols of a regular expression can be grouped together to act
as a single unit and behave as a block. To do this, wrap that part of the regular
expression in parentheses ( ).

14. Vertical Bar ( | )

Matches any one element separated by the vertical bar (|) character.
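
Grouping and alternation interact, so a small Python sketch may help: a quantifier after a group applies to the whole group, and parentheses limit how far an alternation reaches. The sample patterns and strings are assumptions chosen for illustration.

```python
import re

# Grouping: '+' after the parentheses repeats the whole block 'ab'.
print(bool(re.fullmatch(r"(ab)+", "ababab")))   # True
# Without grouping, '+' applies to 'b' only.
print(bool(re.fullmatch(r"ab+", "ababab")))     # False

# Alternation: match any one of the alternatives.
animals = re.findall(r"cat|dog", "cat, dog, bird")
print(animals)                                  # ['cat', 'dog']

# Parentheses limit the scope of '|'.
print(bool(re.fullmatch(r"gr(a|e)y", "grey")))  # True
# Ungrouped, the alternatives are 'gra' and 'ey', so 'grey' does not match.
print(bool(re.fullmatch(r"gra|ey", "grey")))    # False
```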
Write Regular Expressions for the following cases
1. Mobile number: should start with 8 or 9, and the total number of digits should be 10

• If you're looking for a regular expression for a mobile number that should start with 8 or 9 and have a total of 10
digits, you can use the following:

• ^[89]\d{9}$

• Explanation:
• ^[89]: The caret (^) asserts the start of the string. [89] means the first digit should be 8 or 9.
• \d{9}: \d represents any digit, and {9} specifies that there should be exactly 9 digits following the first one.
• $: The dollar sign asserts the end of the string.
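
One way to apply this pattern in Python is sketched below. Two choices here are assumptions beyond the slide: `re.ASCII` restricts `\d` to the digits 0-9 (by default Python's `\d` also accepts other Unicode digits), and `fullmatch` is used so that a trailing newline is rejected. The name `is_valid_mobile` is illustrative.

```python
import re

# Pattern from the slide; re.ASCII limits \d to 0-9 (an added assumption).
MOBILE_RE = re.compile(r"^[89]\d{9}$", re.ASCII)

def is_valid_mobile(text):
    """True if text is exactly 10 digits and starts with 8 or 9."""
    return MOBILE_RE.fullmatch(text) is not None

for number in ["9876543210", "8123456789", "7123456789", "98765", "98765432101"]:
    print(number, is_valid_mobile(number))
# 9876543210 True
# 8123456789 True
# 7123456789 False
# 98765 False
# 98765432101 False
```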
2. Email ID: should have the format "nlp123@gmail.com"

• ^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$
• Explanation:
• ^[a-zA-Z0-9]+: Starts with one or more alphanumeric characters.
• @: Contains the "@" symbol.
• [a-zA-Z0-9]+: Followed by one or more alphanumeric characters for the
domain name.
• \.: Contains a dot before the top-level domain.
• [a-zA-Z]{2,}$: Ends with at least two alphabetic characters for the top-level
domain.
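
The explanation above implies some limits that are easy to check in code. The Python sketch below, with the illustrative helper name `is_simple_email`, shows that this pattern accepts the target format but rejects a dot in the local part and rejects subdomains, because only alphanumeric characters are allowed on either side of the "@".

```python
import re

# Pattern from the slide: alphanumerics, '@', alphanumerics, '.', at least two letters.
SIMPLE_EMAIL_RE = re.compile(r"^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$")

def is_simple_email(text):
    """True if text fits the slide's simple email format."""
    return SIMPLE_EMAIL_RE.match(text) is not None

for address in ["nlp123@gmail.com", "nlp.123@gmail.com",
                "nlp123@gmail", "nlp123@mail.gmail.com"]:
    print(address, is_simple_email(address))
# nlp123@gmail.com True
# nlp.123@gmail.com False
# nlp123@gmail False
# nlp123@mail.gmail.com False
```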
3. First character uppercase, contains lowercase letters, only one digit allowed in between
• ^[A-Z][a-z]*\d?[a-z]*$
• Explanation:
• ^[A-Z]: The caret (^) asserts the start of the string. [A-Z] means the first
character should be an uppercase letter.
• [a-z]*: Matches zero or more lowercase letters.
• \d?: Optionally matches one digit.
• [a-z]*$: Matches zero or more lowercase letters until the end of the string.
• This regular expression ensures that the first character is uppercase, and
the string can contain lowercase letters with at most one digit in between
them.
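
A quick Python check of this pattern is sketched below; the helper name `is_valid_name` and the sample strings are illustrative. Because both `[a-z]*` parts may match zero letters, the pattern as written also accepts a digit directly after the first letter or at the very end, which is worth keeping in mind if "in between" is meant strictly.

```python
import re

# Pattern from the slide.
NAME_RE = re.compile(r"^[A-Z][a-z]*\d?[a-z]*$")

def is_valid_name(text):
    """True if text is an uppercase letter, then lowercase letters with at most one digit."""
    return NAME_RE.fullmatch(text) is not None

for sample in ["Hello", "Hel1lo", "Hello1", "A1", "hello", "He11o", "HeLlo"]:
    print(sample, is_valid_name(sample))
# Hello True
# Hel1lo True
# Hello1 True
# A1 True
# hello False
# He11o False
# HeLlo False
```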
