Filters!
Turning data into information.
Introduction
One of the underlying principles of Linux is that every item should do one thing and one thing only, and that we
can easily join these items together. Think of it like a set of building blocks that we may put together however we
like to build anything we want. In this section and the next we will learn about a few of these building blocks. Then
in sections 11 and 13 we'll look at how we may build them into more complex creations that can do useful work for
us.
This page is quite long, but it is mostly examples, so it's not as scary as it looks.
So what are they?
A filter, in the context of the Linux command line, is a program that accepts textual data and then transforms it in
a particular way. Filters are a way to take raw data, either produced by another program, or stored in a file, and
manipulate it to be displayed in a way more suited to what we are after.
These filters often have various command line options that modify their behaviour, so it is always a good idea to
check the man page for a filter to see what is available.
In the examples below we will provide input to these programs via a file, but in the section Piping and
Redirection we'll see that we may provide input by other means that add a lot more power.
Let's dive in and introduce some of them. (Remember, the examples here only give you a taste of what is
possible with these commands. Make sure you explore and use your creativity to see what else you can do
with them.)
I won't go into too much detail on these. I think they are pretty self-explanatory for the most part.
For each of the demonstrations below I will be using the following file as an example. This example file contains a
list of content purely to make the examples a bit easier to understand, but realise that they will work the same with
absolutely any other textual data. Also, remember that the file is actually specified as a path, so you may use
absolute and relative paths and also wildcards.
Terminal
user@bash: cat mysampledata.txt
Fred apples 20
Susy oranges 5
Mark watermellons 12
Robert pears 4
Terry oranges 9
Lisa peaches 7
Susy oranges 12
Mark grapes 39
Anne mangoes 7
Greg pineapples 3
Oliver rockmellons 2
Betty limes 14
user@bash:
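If you would like to follow along, here is one way to recreate the sample file. This is a minimal sketch assuming a bash-compatible shell and a directory you can write to. It uses a here-document, and note that it will overwrite any existing file called mysampledata.txt.

```shell
# Write the sample data to mysampledata.txt in the current directory.
# Quoting 'EOF' stops the shell from interpreting anything inside the here-document.
cat > mysampledata.txt <<'EOF'
Fred apples 20
Susy oranges 5
Mark watermellons 12
Robert pears 4
Terry oranges 9
Lisa peaches 7
Susy oranges 12
Mark grapes 39
Anne mangoes 7
Greg pineapples 3
Oliver rockmellons 2
Betty limes 14
EOF

# Print the file to check it matches the listing above.
cat mysampledata.txt
```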
head
head is a program that prints the first so many lines of its input. By default it will print the first 10 lines, but we
may modify this with a command line argument.
head [-number of lines to print] [path]
Terminal
user@bash: head mysampledata.txt
Fred apples 20
Susy oranges 5
Mark watermellons 12
Robert pears 4
Terry oranges 9
Lisa peaches 7
Susy oranges 12
Mark grapes 39
Anne mangoes 7
Greg pineapples 3
user@bash:
Above is head's default behaviour. Below, we specify a set number of lines.
Terminal
user@bash: head -4 mysampledata.txt
Fred apples 20
Susy oranges 5
Mark watermellons 12
Robert pears 4
user@bash:
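The -4 style used above is an older shorthand. The POSIX standard specifies the -n option followed by the number instead, and GNU head accepts both, so you will see either form in other people's commands. A small sketch, assuming a temporary file created with mktemp stands in for the sample data:

```shell
# Create a temporary file holding the first five lines of the sample data.
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # remove the temporary file when the script ends
printf '%s\n' 'Fred apples 20' 'Susy oranges 5' 'Mark watermellons 12' \
  'Robert pears 4' 'Terry oranges 9' > "$f"

head -4 "$f"      # shorthand form, as used in the example above
head -n 4 "$f"    # the same request using the -n option
```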
tail
tail is the opposite of head. It is a program that prints the last so many lines of its input. By default it will print
the last 10 lines, but we may modify this with a command line argument.
tail [-number of lines to print] [path]
Terminal
user@bash: tail mysampledata.txt
Mark watermellons 12
Robert pears 4
Terry oranges 9
Lisa peaches 7
Susy oranges 12
Mark grapes 39
Anne mangoes 7
Greg pineapples 3
Oliver rockmellons 2
Betty limes 14
user@bash:
Above is tail's default behaviour. Below, we specify a set number of lines.
Terminal
user@bash: tail -3 mysampledata.txt
Greg pineapples 3
Oliver rockmellons 2
Betty limes 14
user@bash:
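tail accepts the same -n form as head. It also understands a number with a leading +, which means "start printing at this line" rather than "count back from the end". A sketch on a small temporary file standing in for the sample data:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # clean up the temporary file on exit
printf '%s\n' 'Mark grapes 39' 'Anne mangoes 7' 'Greg pineapples 3' \
  'Oliver rockmellons 2' 'Betty limes 14' > "$f"

tail -n 3 "$f"    # the last 3 lines: Greg, Oliver and Betty
tail -n +4 "$f"   # everything from line 4 onwards: Oliver and Betty
```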
sort
sort will sort its input, nice and simple. By default it will sort alphabetically, but there are many options available to
modify the sorting mechanism. Be sure to check out the man page to see everything it can do.
sort [-options] [path]
Terminal
user@bash: sort mysampledata.txt
Anne mangoes 7
Betty limes 14
Fred apples 20
Greg pineapples 3
Lisa peaches 7
Mark grapes 39
Mark watermellons 12
Oliver rockmellons 2
Robert pears 4
Susy oranges 12
Susy oranges 5
Terry oranges 9
user@bash:
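Notice that Susy oranges 12 comes before Susy oranges 5 in the output above: the default sort compares lines as text rather than as numbers, and 1 comes before 5. The sketch below uses three options described in sort's man page: -k to choose which field to sort on, -n to compare it as a number, and -r to reverse the order. The temporary file is just a stand-in for the sample data.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # clean up the temporary file on exit
printf '%s\n' 'Susy oranges 5' 'Susy oranges 12' 'Mark grapes 39' \
  'Greg pineapples 3' > "$f"

sort "$f"                # whole lines compared as text
sort -k 3,3 -n "$f"      # only field 3, compared numerically: 3, 5, 12, 39
sort -k 3,3 -n -r "$f"   # the same, largest amount first
```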
nl
nl stands for number lines and it does just that.
nl [-options] [path]
Terminal
user@bash: nl mysampledata.txt
     1  Fred apples 20
     2  Susy oranges 5
     3  Mark watermellons 12
     4  Robert pears 4
     5  Terry oranges 9
     6  Lisa peaches 7
     7  Susy oranges 12
     8  Mark grapes 39
     9  Anne mangoes 7
    10  Greg pineapples 3
    11  Oliver rockmellons 2
    12  Betty limes 14
user@bash:
The basic formatting is OK, but sometimes you are after something a little different. With a few command line
options, nl is happy to oblige.
Terminal
user@bash: nl -s '. ' -w 10 mysampledata.txt
         1. Fred apples 20
         2. Susy oranges 5
         3. Mark watermellons 12
         4. Robert pears 4
         5. Terry oranges 9
         6. Lisa peaches 7
         7. Susy oranges 12
         8. Mark grapes 39
         9. Anne mangoes 7
        10. Greg pineapples 3
        11. Oliver rockmellons 2
        12. Betty limes 14
user@bash:
In the above example we have used 2 command line options. The first one, -s, specifies what should be printed
after the number, while the second one, -w, sets the width of the column the number is right-aligned in, which
determines how much padding appears before it. For the first one we needed to include a space as part of what was
printed. Because spaces are normally used as separator characters on the command line, we needed a way of
specifying that the space was part of our argument and not just in between arguments. We did that by surrounding
the argument with quotes.
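To see how -w and -s work together, try a narrower column. In this sketch (a two-line temporary file stands in for the sample data), -w 3 right-aligns each number in a column 3 characters wide, and -s '. ' puts a full stop and a space after it.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # clean up the temporary file on exit
printf '%s\n' 'Fred apples 20' 'Susy oranges 5' > "$f"

# Prints "  1. Fred apples 20" and "  2. Susy oranges 5":
# two spaces of padding fill out the 3-character number column.
nl -w 3 -s '. ' "$f"

# With -w 1 there is no room for padding: "1: Fred apples 20".
nl -w 1 -s ': ' "$f"
```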
wc
wc stands for word count and it does just that (as well as characters and lines). By default it gives a count of
all 3 (lines, then words, then characters, followed by the file name), but using command line options we may limit
it to just what we are after.
wc [-options] [path]
Terminal
user@bash: wc mysampledata.txt
12 36 195 mysampledata.txt
user@bash:
Sometimes you just want one of these values. -l will give us lines only, -w will give us words and -m will give us
characters. The example below gives us just a line count.
Terminal
user@bash: wc -l mysampledata.txt
12 mysampledata.txt
user@bash:
You may combine the command line options too. This example gives us both lines and words.
Terminal
user@bash: wc -lw mysampledata.txt
12 36 mysampledata.txt
user@bash:
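However you order the letters, wc prints the counts in the same fixed order (lines, then words, then characters), so -lw and -wl give identical output. A quick check on a three-line temporary file standing in for the sample data:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # clean up the temporary file on exit
printf '%s\n' 'Fred apples 20' 'Susy oranges 5' 'Mark watermellons 12' > "$f"

wc -l "$f"     # 3 lines
wc -w "$f"     # 9 words
wc -m "$f"     # 51 characters, counting the newline at the end of each line
wc -wl "$f"    # same output as wc -lw: lines first, then words
```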
cut
cut is a nice little program to use if your content is separated into fields (columns) and you only want certain fields.
cut [-options] [path]
In our sample file we have our data in 3 columns, the first is a name, the second is a fruit and the third an amount.
Let's say we only wanted the first column.
Terminal
user@bash: cut -f 1 -d ' ' mysampledata.txt
Fred
Susy
Mark
Robert
Terry
Lisa
Susy
Mark
Anne
Greg
Oliver
Betty
user@bash:
cut uses the TAB character as its default separator to identify fields. Our file uses a single space instead, so we
need to tell cut to use that. The separator may be any single character you like; for instance, in a CSV file the
separator is typically a comma (,). This is what the -d option does (we include the space within single quotes so
the command line knows it is part of the argument). The -f option allows us to specify which field or fields we
would like. If we want 2 or more fields, we separate them with a comma as below.
Terminal
user@bash: cut -f 1,2 -d ' ' mysampledata.txt
Fred apples
Susy oranges
Mark watermellons
Robert pears
Terry oranges
Lisa peaches
Susy oranges
Mark grapes
Anne mangoes
Greg pineapples
Oliver rockmellons
Betty limes
user@bash:
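The same options work on comma-separated data. This sketch assumes a small, made-up CSV-style temporary file, and it also shows that -f accepts a range such as 2- (field 2 through to the last field). One caveat: cut splits on every separator character, so it is not a full CSV parser, and quoted fields that themselves contain commas will be split incorrectly.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # clean up the temporary file on exit
printf '%s\n' 'Fred,apples,20' 'Susy,oranges,5' > "$f"

cut -d ',' -f 1 "$f"     # Fred, Susy
cut -d ',' -f 2,3 "$f"   # apples,20 and oranges,5 (the separator is kept between fields)
cut -d ',' -f 2- "$f"    # field 2 to the last field; the same result for this file
```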
sed
sed stands for Stream Editor and it effectively allows us to do a search and replace on our data. It is quite a
powerful command, but we will use it here in its basic form.
sed <expression> [path]
A basic expression is of the following format:
s/search/replace/g
The initial s stands for substitute and specifies the action to perform (there are others, but for now we'll keep it
simple). Then between the first and second slashes (/) we place what it is we are searching for. Then between
the second and third slashes, what it is we wish to replace it with. The g at the end stands for global and is
optional. If we omit it, sed will only replace the first instance of search on each line. With the g option it will
replace every instance of search on each line. Let's see an example. Say we ran out of oranges and wanted to give
those people bananas instead.
Terminal
user@bash: sed 's/oranges/bananas/g' mysampledata.txt
Fred apples 20
Susy bananas 5
Mark watermellons 12
Robert pears 4
Terry bananas 9
Lisa peaches 7
Susy bananas 12
Mark grapes 39
Anne mangoes 7
Greg pineapples 3
Oliver rockmellons 2
Betty limes 14
user@bash:
It's important to note that sed does not identify words but strings of characters. Try running the example above
yourself, replacing oranges with es, and you'll see what I mean. The search term is also actually something
called a regular expression, which is a means to define a pattern (similar to the wildcards we looked at in section 7).
We'll learn more about regular expressions in the next section, and you can use them to make sed even more
powerful.
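Here is that experiment as a sketch, using a few lines of the sample data in a temporary file. The pattern es is replaced wherever those two characters occur, whether or not they form a word on their own.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # clean up the temporary file on exit
printf '%s\n' 'Fred apples 20' 'Mark watermellons 12' 'Anne mangoes 7' > "$f"

# Prints "Fred applES 20", "Mark watermellons 12" (no "es" in it) and "Anne mangoES 7".
sed 's/es/ES/g' "$f"
```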
Also note that we included our expression within single quotes. We did this so that any characters included in it
which may have a special meaning on the command line don't get interpreted and acted upon by the command
line but instead get passed through to sed.
Tip
A common mistake is to forget the single quotes, in which case you may get some strange behaviour
from the command line. If this happens, you may need to press CTRL+c to cancel the program and get
back to the prompt.
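To see the difference the g at the end of the expression makes, compare the two sed commands in this sketch on a line that contains the search term twice. It also shows that sed only prints the changed text; the input file itself is left untouched. The temporary file and its contents are made up for the demonstration.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # clean up the temporary file on exit
printf '%s\n' 'oranges and more oranges' > "$f"

sed 's/oranges/bananas/' "$f"    # bananas and more oranges  (first match only)
sed 's/oranges/bananas/g' "$f"   # bananas and more bananas  (every match)
cat "$f"                         # oranges and more oranges  (file unchanged)
```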
uniq
uniq stands for unique and its job is to remove duplicate lines from the data. One limitation, however, is that those
lines must be adjacent (i.e., one after the other). Sometimes this is not the case, but we'll see one way we can fix
this in Section 11, Piping and Redirection.
uniq [options] [path]
Let's say that our sample file was actually generated from another sales program but after a software update it
had some buggy output.
Terminal
user@bash: cat mysampledata.txt
Fred apples 20
Susy oranges 5
Susy oranges 5
Susy oranges 5
Mark watermellons 12
Robert pears 4
Terry oranges 9
Lisa peaches 7
Susy oranges 12
Mark grapes 39
Mark grapes 39
Anne mangoes 7
Greg pineapples 3
Oliver rockmellons 2
Betty limes 14
user@bash:
No worries, we can easily fix that using uniq.
Terminal
user@bash: uniq mysampledata.txt
Fred apples 20
Susy oranges 5
Mark watermellons 12
Robert pears 4
Terry oranges 9
Lisa peaches 7
Susy oranges 12
Mark grapes 39
Anne mangoes 7
Greg pineapples 3
Oliver rockmellons 2
Betty limes 14
user@bash:
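Because uniq only compares each line with the one directly before it, duplicates that are not next to each other survive. This sketch uses a made-up temporary file to show that, along with uniq's -c option, which prefixes each output line with the number of adjacent copies it stood for.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # clean up the temporary file on exit
printf '%s\n' 'Susy oranges 5' 'Susy oranges 5' 'Mark grapes 39' 'Susy oranges 5' > "$f"

# Three lines come out: the last "Susy oranges 5" is kept
# because a different line sits between it and the earlier copies.
uniq "$f"

# The same, with a count in front of each line (2, 1 and 1).
uniq -c "$f"
```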
tac
Linux guys are known for having a funny sense of humour. The program tac is actually cat in reverse. It was named
this way because it does the opposite of cat: given data, it will print the last line first, through to the first line.
tac [path]
Maybe our sample file is generated by writing each new order to the end of the file. As a result, the most recent
orders are at the end of the file. We would like it the other way so that the most recent orders are always at the
top.
Terminal
user@bash: tac mysampledata.txt
Betty limes 14
Oliver rockmellons 2
Greg pineapples 3
Anne mangoes 7
Mark grapes 39
Susy oranges 12
Lisa peaches 7
Terry oranges 9
Robert pears 4
Mark watermellons 12
Susy oranges 5
Fred apples 20
user@bash:
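A tiny sketch to confirm the order, using a three-line temporary file in place of the sample data. tac comes from GNU coreutils, so it is normally present on Linux; on systems without it, such as macOS, the BSD tail -r option is generally used for the same job (worth checking against your own man page).

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT   # clean up the temporary file on exit
printf '%s\n' 'Fred apples 20' 'Susy oranges 5' 'Mark watermellons 12' > "$f"

# Prints Mark, then Susy, then Fred: the last line first.
tac "$f"
```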
Others
Here are two other programs that are worth investigating if you want to take your knowledge even further. They
are quite powerful but also more complex than the programs listed above.
awk (./bonus.php#awk)
diff
Summary
Stuff We Learnt
head
View the first n lines of data.
tail
View the last n lines of data.
sort
Organise the data into order.
nl
Print line numbers before data.
wc
Print a count of lines, words and characters.
cut
Cut the data into fields and only display the specified fields.
sed
Do a search and replace on the data.
uniq
Remove duplicate lines.
tac
Print the data in reverse order.
Important Concepts
Processing
Filters allow us to process and format data in interesting ways.
man pages
Most of the programs we looked at have command line options that allow you to modify their
behaviour.
Activities
Let's mangle some data.
First off, you may want to make a file with data similar to our sample file.
Now play with each of the programs we looked at above. Make sure you use both relative and absolute
paths.
Have a look at the man page for each of the programs and try at least 2 of the command line options for
them.
By Ryan Chadwick (https://plus.google.com/105636787773904848687) © 2019