Linux tutorials:
What is Linux?
Linux is an operating system, like Windows XP, Windows 7, Windows 8, and Mac OS X.
Unlike them, it is free and open source. An operating system is software that manages
all of the hardware resources associated with your desktop or laptop. To put it simply,
the operating system manages the communication between your software and your hardware.
Without the operating system (often referred to as the "OS"), the software wouldn't function.
Linux commands :
Linux filesystems are based on a directory tree. This means that you can create
directories (or "folders") inside other directories, and files can exist in any directory.
To see which directory you are currently in:
pwd
This stands for "print working directory", and will print the path to your current
directory. The output can look similar to this:
/home/arraygen
whoami
--prints your username.
uname
--prints the name of the operating system kernel (for example, Linux). Use uname -a for
full details.
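The three commands above can also be combined in a small script. This is only a sketch; the variable names user, kernel and here are chosen for illustration.

```shell
# Store each answer in a variable so it can be reused later in a script.
user=$(whoami)          # your username
kernel=$(uname -s)      # kernel name, for example Linux
here=$(pwd)             # current working directory
echo "User $user is working in $here on $kernel"
```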
To see the files and directories that exist in your current working directory, use the
ls (list) command.
ls -l : list contents in long format, i.e. with permissions, owner, file size and time stamp.
ls -lt : long listing sorted by modification time, newest first.
ls -a : list all files, including hidden files (names starting with a dot).
ls dir : list all items in directory dir.
ls *.fasta : display the names of all fasta files in the current directory.
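A short sketch of the ls options, run in a scratch directory so that the result is predictable. The file names (chr1.fasta, notes.txt and so on) are made up for this example.

```shell
# Create a scratch directory with a few sample files (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/chr1.fasta" "$workdir/chr2.fasta" "$workdir/notes.txt" "$workdir/.hidden"

ls "$workdir"            # visible files only
ls -a "$workdir"         # also shows .hidden, plus . and ..
ls -l "$workdir"         # long format: permissions, owner, size, time stamp
ls -lt "$workdir"        # long format, sorted by modification time

# Count the fasta files matched by the wildcard.
fasta_count=$(ls "$workdir"/*.fasta | wc -l)
echo "fasta files: $fasta_count"
```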
The man command displays the manual page of a command as text on your screen.
Press q to quit the manual.
man : get help for a command.
For eg. man ls
Creating Directories :
mkdir : make a directory.
mkdir dirname
--this will create a new directory named dirname
You can create two or more directories in the same command.
For eg. mkdir demo1 demo2 demo3
Removing Directories:
rmdir : delete an empty directory
rmdir dirname
--this will delete the directory dirname, but only if it is empty
Changing Directories:
cd : change to another directory.
For eg. cd ArrayGen
cd ~ : go to your home directory.
cd - : go back to the previous directory.
cd .. : go up to the parent directory.
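The directory commands can be tried together as follows. This is a sketch that works in a scratch directory; demo1 and demo2 are example names.

```shell
# Work inside a scratch directory so nothing important is touched.
base=$(mktemp -d)
cd "$base"
mkdir demo1 demo2            # create two directories at once
cd demo1
pwd                          # prints a path ending in /demo1
cd ..                        # go up to the parent directory
cd - > /dev/null             # go back to the previous directory (demo1)
current=$(basename "$PWD")   # name of the directory we are in now
cd ..
rmdir demo1 demo2            # works because both directories are empty
remaining=$(ls | wc -l)      # number of entries left in the scratch directory
echo "was in: $current, entries left: $remaining"
```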
Renaming Files and Directories:
The mv (move) command moves or renames a file or a directory.
mv : move or rename a file or directory
mv old_name new_name
For eg. mv filename newfile
--rename the existing file filename to newfile
Creating file:
To create a new file, open it in a text editor such as gedit.
gedit file_name.txt
cp : make a copy of a file
cp file1 file2
--the content of file1 is copied into file2. If file2 already exists, it is overwritten.
rm : remove file
rm file_name
You can remove multiple files at a time as follows:
rm filename1 filename2 filename3
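A small sketch of cp, mv and rm working together in a scratch directory. The file names file1, file2 and renamed are placeholders.

```shell
work=$(mktemp -d)
printf 'ACGT\n' > "$work/file1"        # create a small sample file

cp "$work/file1" "$work/file2"         # file2 is now a copy of file1
mv "$work/file2" "$work/renamed"       # rename file2 to renamed
rm "$work/file1"                       # delete the original file

ls "$work"                             # only "renamed" is left
```

Files deleted with rm do not go to a trash folder. When typing commands by hand, rm -i asks for confirmation before each deletion.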
Download a .fasta file from the NCBI Nucleotide database and a .fastq file from the NCBI
SRA database. Save them in your working directory.
cat : display the content of a file.
cat is short for "concatenate". It reads data from files and writes their contents to
standard output. It is the simplest way to display the contents of a file at the
command line, and one of the most commonly used commands in Linux. It can be used to:
Display text files
cat sequence.fasta
Copy text files into a new document
cat mytext.txt > newfile.txt
Similarly, you can concatenate several files into your destination file.
For eg. cat mytext.txt mytext2.txt > newfile.txt
Append the contents of a text file to the end of another text file.
Instead of overwriting another file, you can also append a source text file to another
using the redirection operator ">>".
cat mytext.txt >> another-text-file.txt
This works for multiple text files as well:
cat mytext.txt mytext2.txt >> another_file.txt
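Concatenate all files of one type (such as .txt, .fa or .fasta) into one file.
If you have downloaded the fasta files of all chromosomes and want to concatenate them
into a single reference file, use the following command.
cat *.fa > reference.fasta
--the output file name must not match the wildcard itself (here *.fa); otherwise cat
would try to read its own output file.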
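The difference between ">" (overwrite) and ">>" (append) can be seen in the following sketch. The chromosome files are tiny made-up examples, not real sequences.

```shell
work=$(mktemp -d)
printf '>chr1\nACGT\n' > "$work/chr1.fa"
printf '>chr2\nTTGA\n' > "$work/chr2.fa"

# ">" creates or overwrites the destination. The output name ends in .fasta
# so that it is not matched by the *.fa wildcard.
cat "$work"/*.fa > "$work/reference.fasta"

# ">>" appends to the destination instead of overwriting it.
printf '>chrM\nGGCC\n' > "$work/chrM.txt"
cat "$work/chrM.txt" >> "$work/reference.fasta"

cat "$work/reference.fasta"    # chr1, chr2 and chrM records, in that order
```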
wc : to get a count of the total number of lines, words, and
characters contained in a file.
wc sequence.fasta
Output: 77737 77741 5519193 sequence.fasta
Here are the details of all the four columns:
1. First Column: represents total number of lines in the file.
2. Second Column: represents total number of words in the file.
3. Third Column: represents total number of bytes in the file. This is actual size of
the file.
4. Fourth Column: represents file name.
wc -l : line count
wc -w : word count
wc -m : character count
wc -c : byte count
wc -L : length of the longest line (GNU wc)
For eg. wc -l sequence.fasta
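When only the number is needed (for example inside a script), the file can be passed on standard input so that wc does not print the file name. A sketch with a tiny made-up sequence.fasta:

```shell
work=$(mktemp -d)
printf '>seq1 sample\nACGTAC\nGGT\n' > "$work/sequence.fasta"

wc "$work/sequence.fasta"                  # lines, words, bytes, file name

# Reading from standard input (<) prints the count without the file name.
lines=$(wc -l < "$work/sequence.fasta")
words=$(wc -w < "$work/sequence.fasta")
bytes=$(wc -c < "$work/sequence.fasta")
echo "lines=$lines words=$words bytes=$bytes"
```

For this sample file the counts are 3 lines, 4 words and 24 bytes.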
tr : replaces or removes specific characters in its input.
To replace the characters A, T, G, C with T, A, C, G throughout a .fasta file (to make
the complement), type:
tr 'ATGC' 'TACG' < test.fasta > demo1.fasta
--tr changes every matching character, including those in header lines (lines starting
with >).
To convert lower case to upper case in a fasta sequence, type:
tr "[:lower:]" "[:upper:]" < test.fasta > demo2.fasta
OR
tr '[:lower:]' '[:upper:]' < test.fasta > demo2.fasta
To convert upper case to lower case in a fasta sequence, type:
tr "[:upper:]" "[:lower:]" < test.fasta > demo3.fasta
OR
tr '[:upper:]' '[:lower:]' < test.fasta > demo3.fasta
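Because tr works on every character of its input, the header lines of a fasta file are translated as well. One possible way to leave them untouched is sketched below: the file is read line by line and only lines that do not start with > are passed through tr. The file names and the sample record are made up.

```shell
work=$(mktemp -d)
printf '>seq1 Test\nATGCca\n' > "$work/test.fasta"

# Plain tr: the header line is changed too (the T in "Test" is translated).
tr 'ATGC' 'TACG' < "$work/test.fasta"

# Sketch: complement only the sequence lines, copy header lines unchanged.
while IFS= read -r line; do
  case $line in
    '>'*) printf '%s\n' "$line" ;;                              # header line
    *)    printf '%s\n' "$line" | tr 'ATGCatgc' 'TACGtacg' ;;   # sequence line
  esac
done < "$work/test.fasta" > "$work/complement.fasta"

cat "$work/complement.fasta"
```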
grep command:
grep searches files for lines that match a regular expression.
For eg. grep "pattern" filename
grep ">" test.fasta
--prints the header lines of test.fasta, which contain the > sign. Keep the quotes:
an unquoted > is treated by the shell as output redirection and would overwrite the file.
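A few common variations, sketched on a made-up two-record fasta file. The pattern '^>' matches only lines that begin with >, and plain single or double quotes must be used around it.

```shell
work=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGTT\n' > "$work/test.fasta"

grep '^>' "$work/test.fasta"                   # header lines only
grep -v '^>' "$work/test.fasta"                # everything except header lines

# -c prints the number of matching lines, i.e. the number of sequences.
seq_count=$(grep -c '^>' "$work/test.fasta")
echo "sequences: $seq_count"
```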
Head and tail command:
The head command reads the first few lines of any text given to it as an input and
writes them to standard output (which, by default, is the display screen).
head's basic syntax is:
head [options] [file(s)]
head test.fasta
--Output the first 10 lines of the file test.fasta
head -15 test.fasta
--Display first 15 lines of file test.fasta
head -n 100 test.fasta
--Output the first 100 lines of the file test.fasta
The tail command reads the last few lines of any text given to it as an input
and writes them to standard output (which, by default, is the display
screen).
tail's basic syntax is:
tail [options] [file(s)]
tail test.fasta
--Output the last 10 lines of the file test.fasta
tail -15 test.fasta
--Display last 15 lines of file test.fasta
tail -n 100 test.fasta
--Output the last 100 lines of the file test.fasta
tail -n +2 test.fasta
--Prints the file starting from line 2, i.e. skips the first line (e.g. to remove the
header line of a file)
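head and tail can be combined with a pipe to cut out a block of lines from the middle of a file. A sketch on a made-up five-line file:

```shell
work=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$work/test.txt"

first_two=$(head -n 2 "$work/test.txt")     # lines 1-2
last_two=$(tail -n 2 "$work/test.txt")      # lines 4-5
no_header=$(tail -n +2 "$work/test.txt")    # lines 2-5 (first line skipped)

# Take the first 4 lines, then keep the last 2 of those: lines 3-4.
middle=$(head -n 4 "$work/test.txt" | tail -n 2)
echo "$middle"
```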
Compression of file :
The two most common tools for compressing files are gzip and bzip2.
gzip sra_data.fastq
--the original file is replaced by sra_data.fastq.gz
bzip2 sra_data.fastq
--the original file is replaced by sra_data.fastq.bz2
Uncompression of file :
The two most common tools for uncompressing files are gunzip and bunzip2.
For eg.
gunzip sra_data.fastq.gz
--this will uncompress gzip files
bunzip2 sra_data.fastq.bz2
--this will uncompress bzip2 files
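A round trip with gzip, sketched on a tiny made-up fastq record. It also shows how to look inside a compressed file without uncompressing it on disk.

```shell
work=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$work/sra_data.fastq"

gzip "$work/sra_data.fastq"            # replaces the file with sra_data.fastq.gz
ls "$work"

# -d uncompresses, -c writes to standard output and leaves the .gz file as it is.
gzip -dc "$work/sra_data.fastq.gz" | head -n 2

gunzip "$work/sra_data.fastq.gz"       # restores sra_data.fastq, removes the .gz
ls "$work"
```

bzip2 accepts the same -d and -c options, so bzip2 -dc can be used in the same way for .bz2 files.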
Compression of file in .tar format:
tar -zcvf outfile.fastq.tgz sra_data.fastq
OR
tar -jcvf outfile.fastq.tbz2 sra_data.fastq
z : compress the archive using gzip
c : create a new archive
v : verbosely list files processed
f : use the archive file whose name follows
j : compress the archive using bzip2
Uncompression of tar file:
tar -xjf filename
--to extract a .tar.bz2 (or .tbz2) file
tar -xzf filename
--to extract a .tar.gz (or .tgz) file
x : extract
j : uncompress the archive using bzip2
f : use the archive file whose name follows
z : uncompress the archive using gzip
tar --help will give you more options and info
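Creating, listing and extracting an archive can be sketched as follows. The directory name reads and the archive name reads.tar.gz are examples; t (list the contents) and -C (change to a directory first) are standard tar options that are not described above.

```shell
work=$(mktemp -d)
mkdir "$work/reads"
printf '@read1\nACGT\n+\nIIII\n' > "$work/reads/sra_data.fastq"

# c: create, z: gzip, f: archive name. -C makes tar change into $work first,
# so the archive stores the relative path reads/... only.
tar -czf "$work/reads.tar.gz" -C "$work" reads

# t: list the contents of the archive without extracting it.
listing=$(tar -tzf "$work/reads.tar.gz")
echo "$listing"

# x: extract, here into a separate directory.
mkdir "$work/restored"
tar -xzf "$work/reads.tar.gz" -C "$work/restored"
ls "$work/restored/reads"
```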
exit : log out of the current session
To search packages :
sudo aptitude search packagename
--this will search for packages whose names match packagename
apt-cache search . : list all packages available from your software sources (not only
the installed ones).
Installation of NGS tools by commands
sudo apt-get install PACKAGE_NAME
--this will install the package on the system
If aptitude is not installed on your system, first install it using the following
command:
sudo apt-get install aptitude
sudo runs a command with root (administrator) privileges. It is required for system-wide
changes such as installing, removing, updating and upgrading packages.
apt-get
The above command manages software packages and software sources. install is a
subcommand that tells apt-get to install the package with the given name. apt-get looks
up that name in the software sources, then downloads and installs the latest available
version (or a version you specify).
sudo aptitude
aptitude is a text-based front end to the same APT package system. It offers an
interactive menu interface in the terminal as well as command-line actions such as
search, install, remove and update.
sudo apt-get install samtools
sudo apt-get install fastqc
sudo apt-get install bwa
sudo apt-get install bowtie
Install Cutadapt :
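Download Cutadapt from the following website and extract the folder.
https://pypi.python.org/pypi/cutadapt
To install Cutadapt, use the following commands. (On recent Ubuntu releases the two
packages are named python3-dev and python3-pip.)
sudo apt-get install python-dev
sudo apt-get install python-pip
sudo pip install cutadapt
The following command gives every user permission to read, write and execute the files
in the extracted folder. Replace /path/to/extracted_folder with the actual path.
sudo chmod -R 777 /path/to/extracted_folder
--777 lets every user modify the files; 755 is a safer choice when only the owner needs
to write.
sudo gedit /etc/bash.bashrc
The above command opens the bash.bashrc file. At the end of the file, add the following
line and save it.
export PATH=$PATH:/path/to/extracted_cutadapt_folder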
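How PATH and the execute permission work together can be tried safely with a stand-in tool. In this sketch, mytool is a hypothetical two-line script in a scratch directory, not part of Cutadapt.

```shell
# Scratch directory that plays the role of the extracted tool folder.
tooldir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool"\n' > "$tooldir/mytool"

chmod 755 "$tooldir/mytool"       # owner: read/write/execute, others: read/execute

# Append the folder to PATH for the current shell session only.
export PATH="$PATH:$tooldir"

found=$(command -v mytool)        # full path of the command the shell will run
echo "$found"
mytool                            # can now be started by name from any directory
```

A change made with export lasts only until the shell is closed. Adding the export line to a startup file such as /etc/bash.bashrc or ~/.bashrc makes it apply to new shells as well.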
Installation with jar files for snpEff & GATK
java -jar /path/to/file.jar --help
--jar files are run directly with java; replace /path/to/file.jar with the path of the
jar file.
If java is not installed, first install it using the following command:
sudo apt-get install default-jre
Installation of R and RStudio in linux
Following are the commands for installation
1. sudo apt-get install r-base
Always install R first and then install RStudio. Download RStudio from the following website
https://www.rstudio.com/products/rstudio/download/
or download it with the following wget command (64 bit).
wget -c https://download1.rstudio.org/rstudio-0.99.491-amd64.deb
RStudio is distributed as a .deb file. To install a .deb file, use the following command.
2. sudo dpkg -i packagename
wget :
wget stands for "web get". It is a command-line utility which downloads files over a
network.
Installing wget :
If your operating system is Ubuntu, or another Debian-based Linux distribution which
uses APT for package management, you can install wget with apt-get:
sudo apt-get install wget
Syntax :
wget [option]... [URL]...
For eg. download RStudio:
wget https://download1.rstudio.org/rstudio-0.99.902-amd64.deb
Shortcuts :
ctrl+a : move cursor to beginning of line
ctrl+e : move cursor to end of line
alt+f : move cursor forward 1 word
alt+b : move cursor backward 1 word
ctrl+c : halts (interrupts) the current command
ctrl+z : suspends the current command; resume it with fg (foreground) or bg (background)
ctrl+d : log out of the current session, similar to exit
ctrl+l : clear screen
clear : clear your screen (same as ctrl+l)
logout : log out of a login shell; the system cleans up and closes the connection.
shutdown : shuts down the system (usually requires sudo).