Linux Regular Expression
A regular expression, also called a regex or regexp, is a pattern that describes a set of strings to match. It is a very powerful tool in Linux.
Regex can be used in a variety of programs such as grep, sed, vi, bash, rename and many more.
Regular Expression Metacharacters
A regular expression may contain one or more metacharacters.
Metacharacter   Description
.               Matches any single character.
^               Matches the start of a line. As the first character inside brackets ([^...]) it matches any character not listed.
$               Matches the end of a line.
*               Matches the preceding character zero or more times.
\               Escapes the next character, or introduces a special sequence such as \b.
()              Groups regular expressions.
?               Matches the preceding character zero or one time.
+               Matches the preceding character one or more times.
{N}             Matches the preceding character exactly N times.
{N,}            Matches the preceding character N times or more.
{N,M}           Matches the preceding character at least N times, but not more than M times.
-               Represents a range inside brackets, as in [a-z].
\b              Matches the empty string at the edge of a word.
\B              Matches the empty string if it is not at the edge of a word.
\<              Matches the empty string at the beginning of a word.
\>              Matches the empty string at the end of a word.
Regex Versions
There are three versions of regular expression syntax:
o BRE : Basic Regular Expressions
o ERE : Extended Regular Expressions
o PCRE: Perl Compatible Regular Expressions
Depending on the tool or program, one or more of these versions can be used.
Linux grep Regular Expressions
The grep tool has the following options to use regular expressions:
o -E : String is read as ERE (Extended Regular Expressions)
o -G : String is read as BRE (Basic Regular Expressions)
o -P : String is read as PCRE (Perl Compatible Regular Expressions)
o -F : String is read literally.
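The difference between a regex option and -F can be seen in a minimal sketch like the one below. The sample text is made up for illustration, and the commands assume a grep that accepts -G and -F, as GNU grep does.

```shell
# Two hypothetical sample lines: one contains a literal dot, the other does not.
sample='a.c
abc'

# -G (BRE, the default): the dot is a metacharacter, so both lines match.
printf '%s\n' "$sample" | grep -G 'a.c'

# -F: the pattern is read literally, so only the line with a real dot matches.
printf '%s\n' "$sample" | grep -F 'a.c'
```

The first command prints both lines, the second prints only a.c.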
Print Lines Matching A Pattern
The grep command searches a file for lines that match the specified pattern.
Syntax:
grep <pattern> <fileName>
Example:
grep t msg.txt
grep l msg.txt
grep v msg.txt
Each command displays every line of msg.txt that contains the pattern. The matched text is highlighted when colored output is enabled.
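Since the contents of msg.txt are not shown, the following sketch builds a small stand-in file to try the same kind of search. The file contents and the variable name demo_file are assumptions made for illustration only.

```shell
# Create a temporary stand-in for msg.txt (hypothetical contents).
demo_file=$(mktemp)
trap 'rm -f "$demo_file"' EXIT
printf '%s\n' 'tp is tutorial point' 'java is a language' 'linux is great' > "$demo_file"

# Lines containing the letter "t": the first and the third line.
grep t "$demo_file"

# Lines containing the letter "v": only the "java" line.
grep v "$demo_file"
```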
Concatenating Characters
If a pattern consists of several concatenated characters, a line is displayed only if it contains those characters together and in the same order.
Example:
grep tp msg.txt
grep in msg.txt
grep is msg.txt
Only lines containing exactly the specified sequence of characters are displayed.
One Or The Other
Here the pipe (|) symbol is used as OR to signify one or the other. All three versions are shown. The syntax for options -E and -P is the same, but with -G the pipe must be escaped with a backslash (\|).
Syntax:
grep <option> <'pattern|pattern'> <fileName>
Example:
grep -E 'j|g' msg.txt
grep -P 'j|g' msg.txt
grep -G 'j\|g' msg.txt
A line is displayed if it matches either pattern 'j' or pattern 'g'.
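The escaping difference between -E and -G can be checked with a short sketch. The sample words are invented, and the \| form in BRE is a GNU grep extension, so GNU grep is assumed here.

```shell
# Hypothetical sample lines.
sample='jug
gym
cat'

# ERE: a bare | means OR, so "jug" and "gym" match.
printf '%s\n' "$sample" | grep -E 'j|g'

# BRE (GNU grep): OR is written \| and gives the same result.
printf '%s\n' "$sample" | grep -G 'j\|g'

# BRE: a bare | is a literal pipe character, so nothing matches here.
printf '%s\n' "$sample" | grep -G 'j|g' || echo 'no match'
```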
One Or More / Zero Or More
The * signifies zero or more occurrences of the preceding character and + signifies one or more occurrences.
Syntax:
grep <option> <'pattern*'> <fileName>
Example:
grep -E '1*' list
grep -E '1+' list
With *, zero occurrences of '1' also count as a match, so every line is displayed. With +, only lines containing at least one '1' are displayed.
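A small sketch makes the difference between * and + visible. The sample numbers are made up, since the contents of the file 'list' are not shown.

```shell
# Hypothetical stand-in for the file "list".
sample='1
11
2
21'

# '1*' also matches zero occurrences, so all four lines are printed.
printf '%s\n' "$sample" | grep -E '1*'

# '1+' needs at least one "1", so the line "2" is left out.
printf '%s\n' "$sample" | grep -E '1+'
```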
Match The End Of A String
To match the end of a string we use the $ sign. Because grep works line by line, this is the end of each line.
Syntax:
grep <pattern>$ <fileName>
Example:
grep r$ dupli.txt
grep e$ dupli.txt
Only lines ending with the given character are displayed.
Match The Start Of A String
To match the start or beginning of a line we use the caret sign (^).
Syntax:
grep ^<pattern> <fileName>
Example:
grep ^o dupli.txt
Only lines starting with the given character are displayed.
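Both anchors can be tried together in a sketch like this one. The sample words are assumptions standing in for dupli.txt, and the patterns are quoted so that the shell passes them to grep unchanged.

```shell
# Hypothetical stand-in for dupli.txt.
sample='over
river
one'

# Lines ending in "r": over, river.
printf '%s\n' "$sample" | grep 'r$'

# Lines starting with "o": over, one.
printf '%s\n' "$sample" | grep '^o'
```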
Separating Words
Syntax:
grep '\b<pattern>\b' <fileName>
Example:
grep '\bsome\b' file
The command "grep some file" displays all lines containing 'some', even as part of a longer word. The command "grep '\bsome\b' file" displays only lines in which 'some' appears as a separate word.
Note: This can also be done with the help of the -w option.
Syntax:
grep -w <pattern> <fileName>
Example:
grep -w some file
The command "grep -w some file" displays the same result as the \b version.
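The three variants can be compared side by side in the sketch below. The sample lines are invented, and \b is a GNU grep extension, so GNU grep is assumed.

```shell
# Hypothetical sample lines.
sample='some text
awesome text
somewhere else'

# Plain pattern: matches "some" anywhere, so all three lines are printed.
printf '%s\n' "$sample" | grep 'some'

# Word boundaries: only the line with the separate word "some".
printf '%s\n' "$sample" | grep '\bsome\b'

# -w gives the same result as the \b version.
printf '%s\n' "$sample" | grep -w 'some'
```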
Linux rename Regular Expressions
The rename command is mostly used to search for a string in file names and replace it with another string.
Syntax:
rename 's/string/other string/' <fileName>
Example:
rename 's/text/txt/' *
In every file name in the current directory, 'text' is converted into 'txt'.
You can also limit the replacement to selected files with the following syntax.
Syntax:
rename 's/string/other string/' *.<extension>
Example:
rename 's/txt/TXT/' *.txt
All '.txt' extensions are converted into '.TXT'.
In the above two examples the search string was present only at the end of the file name. But this example is different.
Example:
rename 's/txt/bbb/' atxt.txt
Only the first occurrence of the searched string is replaced.
A Global Replacement
In the above example only the first 'txt' in 'atxt.txt' was replaced. To replace every 'txt' we can use the global replacement flag 'g'.
Syntax:
rename 's/string/other string/g' <fileName>
Example:
rename 's/txt/TXT/g' atxt.txt
Both occurrences of 'txt' are replaced with 'TXT'.
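Because rename implementations differ between distributions, the sketch below does not call rename at all. Instead it applies the same s/// expressions to the file name with sed, only to preview the names that would result. Using sed as a stand-in is an assumption made for illustration.

```shell
# Without the g flag only the first "txt" is replaced.
echo 'atxt.txt' | sed 's/txt/bbb/'     # prints abbb.txt

# With the g flag every "txt" is replaced.
echo 'atxt.txt' | sed 's/txt/TXT/g'    # prints aTXT.TXT
```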
Case Insensitive Replacement
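In a case insensitive replacement, the 'i' flag makes the search string match regardless of upper or lower case.
Syntax:
rename 's/string/other string/i' <fileName>
Example:
rename 's/\.text/.txt/i' *
All '.text' extensions, in any mix of upper and lower case, are replaced with '.txt'. The dot in the search string is escaped (\.) so that it matches only a literal dot.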
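An unescaped dot in the search string matches any character, which can rename more than intended. The sketch below previews this on made-up file names with sed instead of rename. It assumes GNU sed, whose I flag plays the role of the i flag used with rename.

```shell
# Unescaped dot: ".text" also matches "ntext" inside "context".
echo 'context' | sed 's/.text/.txt/I'          # prints co.txt

# Escaped dot anchored at the end: only a real ".text" extension matches.
echo 'report.TEXT' | sed 's/\.text$/.txt/I'    # prints report.txt
echo 'context' | sed 's/\.text$/.txt/I'        # prints context (unchanged)
```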
Linux Sed Regular Expressions
Stream Editor
The sed command is used for stream editing.
Example:
echo interactive | sed 's/inte/dist/'
echo interactive | sed 's:inte:dist:'
echo interactive | sed 's_inte_dist_'
echo interactive | sed 's|inte|dist|'
The string 'interactive' is changed to 'distractive' by the sed command. Instead of the forward slash (/), a colon (:), underscore (_) or pipe (|) will also work as the delimiter.
Interactive Editor
The sed command is meant to be a stream editor, but it can also edit a file in place. For in-place editing the option '-i' is used.
With this option, replacing 'today' with 'tomorrow' changes the contents of 'file' itself instead of only printing the result.
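A minimal sketch of in-place editing follows. It uses a temporary file with made-up contents rather than the original 'file', and it assumes GNU sed, where -i takes no mandatory argument (BSD sed expects a backup suffix after -i).

```shell
# Create a temporary file with hypothetical contents.
demo_file=$(mktemp)
trap 'rm -f "$demo_file"' EXIT
echo 'today is sunny' > "$demo_file"

# -i writes the result back to the file instead of printing it.
sed -i 's/today/tomorrow/' "$demo_file"

# The file itself has changed.
cat "$demo_file"    # prints: tomorrow is sunny
```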
Simple Back Referencing
In the replacement part, an ampersand (&) stands for the string found by the search pattern. A double ampersand (&&) therefore prints the found string twice.
For example, when 'four' is searched for in 'fourty' and replaced with a double ampersand, the result is 'fourfourty'.
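The ampersand behaviour can be reproduced with a sketch like this. The exact command in the original example is not shown, so the commands below are an assumed reconstruction based on the 'fourfourty' result.

```shell
# & stands for the matched text, so && writes "four" twice.
echo 'fourty' | sed 's/four/&&/'     # prints fourfourty

# & can be combined with other text in the replacement.
echo 'fourty' | sed 's/four/[&]/'    # prints [four]ty
```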
A Dot For Any Character
In regex a simple dot can signify any character.
In the example, each dot matches one character of a date, and the matched date is replaced by the date format.
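As a sketch of the dot in action, the commands below use an assumed date string and an assumed replacement text, since the original command is not shown.

```shell
# Each dot matches exactly one character, whatever it is.
echo '2014-06-30' | sed 's/....-..-../YYYY-MM-DD/'    # prints YYYY-MM-DD

# "c.t" matches cat, cut and cot alike.
echo 'cat cut cot' | sed 's/c.t/X/g'                   # prints X X X
```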
Multiple Back Referencing
When more than one pair of parentheses is used for grouping, each group can be referenced separately by a consecutive number.
In the example, a date is printed in different formats. Here, 2014 is referenced as \1, 06 is referenced as \2 and 30 is referenced as \3.
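The grouping described above might look like the following sketch. The output formats are assumptions chosen for illustration. The first command uses -E (extended syntax, plain parentheses), the second uses basic syntax, where the parentheses must be escaped.

```shell
# ERE: groups are written ( ) and referenced as \1, \2, \3.
echo '2014-06-30' | sed -E 's/(....)-(..)-(..)/\3-\2-\1/'       # prints 30-06-2014

# BRE: groups are written \( \); | is used as delimiter so that / can appear in the output.
echo '2014-06-30' | sed 's|\(....\)-\(..\)-\(..\)|\2/\3/\1|'    # prints 06/30/2014
```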
White Space
The white space syntax is '\s' and tab space syntax is '\t'.
In the example, '\s' is used to match a single space.
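A short sketch of both escapes follows, with made-up input. Both \s and \t inside a sed pattern are GNU sed extensions, so GNU sed is assumed. Note that \s matches any whitespace character, including a tab.

```shell
# Replace every whitespace character with an underscore.
echo 'today is monday' | sed 's/\s/_/g'    # prints today_is_monday

# Replace a tab with an underscore.
printf 'a\tb\n' | sed 's/\t/_/'            # prints a_b
```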
Optional Occurrence
You can make the preceding character optional by following it with a question mark (?).
In the example, the third 'i' is made optional. It means that at least two 'i' are required for the conversion into 'Y'.
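The optional third 'i' can be sketched as follows. The pattern 'iii?' and the inputs are an assumed reconstruction of the missing example. The -E option is used so that ? does not need a backslash.

```shell
# Two "i" are enough, the third one is optional.
echo 'ii'  | sed -E 's/iii?/Y/'    # prints Y
echo 'iii' | sed -E 's/iii?/Y/'    # prints Y

# A single "i" does not match, so the input is unchanged.
echo 'i'   | sed -E 's/iii?/Y/'    # prints i
```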
Exact n Times Occurrence
An exact number of occurrences is specified by "{times}".
In the example, exactly three occurrences of 'i' are specified.
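An exact count could be tried with a sketch like this one. The inputs are assumptions, and -E is used so that the braces need no backslashes.

```shell
# Exactly three "i" are replaced by Y.
echo 'iii'  | sed -E 's/i{3}/Y/'    # prints Y

# Two "i" are not enough, so nothing changes.
echo 'ii'   | sed -E 's/i{3}/Y/'    # prints ii

# With four "i", the first three are replaced and one is left over.
echo 'iiii' | sed -E 's/i{3}/Y/'    # prints Yi
```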
Occurrence In Range
We can also specify the number of occurrences as a range. For example, if we specify the range as {m,n}, then 'm' denotes the minimum number of occurrences and 'n' denotes the maximum number of occurrences.
In the example, the minimum is specified as 3 and the maximum as 4.
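The range {3,4} can be explored with the sketch below. The inputs are assumptions, and -E is used so that the braces need no backslashes. The match is greedy, so as many 'i' as allowed are taken.

```shell
# Fewer than three "i": no match, input unchanged.
echo 'ii'    | sed -E 's/i{3,4}/Y/'    # prints ii

# Three or four "i" are replaced as a whole.
echo 'iii'   | sed -E 's/i{3,4}/Y/'    # prints Y
echo 'iiii'  | sed -E 's/i{3,4}/Y/'    # prints Y

# Five "i": the first four are replaced, one is left over.
echo 'iiiii' | sed -E 's/i{3,4}/Y/'    # prints Yi
```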