NOTES
WORKING WITH CSV FILES
INTRODUCTION
When you export data from Website, the files can be delivered in the
following formats
1. CSV - COMMA SEPARATED VALUES
2. PSV - PIPE SEPARATED VALUES
3. TSV - TAB SEPARATED VALUES
CSV - COMMA SEPARATED VALUES
A CSV(Comma Separated Values) is a plain-text file format used to store tabular
data such as a spreadsheet or a database. It essentially stores a tabular data which
comprises of numbers and text into plain text.
WHAT IS A CSV FILE?
A CSV(Comma Separated Values) is a plain-text file format used to store
tabular data such as a spreadsheet or a database. It essentially stores a tabular data
which comprises numbers and text into plain text. Many online services allow its
users to export tabular data from the website into a CSV file.
Files of CSV will open into Excel, and nearly all databases have a tool to allow
import from CSV file. The standard format is defined by rows and columns data.
Moreover, each row is terminated by a newline to begin the next row. Also within the
row, each column is separated by a comma.
The CSV file is commonly used to represent tabular data. For example,
consider the following table:
To retain the commas inside the Address field enclose it in double quotation
marks, as follows:
Tabular data is represented by comma separated values
Page no 1
RollNo,Name,Age,Class,Sec
1023,Vishwa,14,VII,A
3078,Varun,16,X,B
3543,Gowtham,15,X,C
Every line of the file is called a record. And each record consists of fields that
are separated by commas which are also known as “delimiter” which is the default
delimiter, others include pipe(|), semicolon(;). Given below is a structure of a Normal
CSV File separated by a comma
WHY IS CSV FILE FORMAT USED?
CSV is a plain-text file which makes it easier for data interchange and also
easier to import onto spreadsheet or database storage.
For example: You might want to export the data of a certain statistical analysis
to CSV file and then import it to the spreadsheet for further analysis. Overall it makes
users working experience very easy programmatically. Any language supporting a
text file or string manipulation like Python can work with CSV files directly.
CSV ADVANTAGES
✔ CSV is human readable and easy to edit manually
✔ CSV is simple to implement and parse
✔ CSV is processed by almost all existing applications
✔ CSV provides a straightforward information schema
✔ CSV is faster to handle
✔ CSV is smaller in size
✔ CSV is considered to be standard format
✔ CSV is compact.
✔ CSV is easy to generate
CSV DISADVANTAGES
✔ CSV allows to move most basic data only. Complex configurations
cannot be imported and exported this way
✔ There is no distinction between text and numeric values
Page no 2
✔ No standard way to represent binary data
✔ Problems with importing CSV into SQL (no distinction between NULL and
quotes)
✔ Poor support of special characters
✔ No standard way to represent control characters
✔ Lack of universal standard
PYTHON CSV MODULE
Python uses a CSV package which is a part of the standard library, so you need
not install it.
Python CSV package contains following functions:
Ser Functions Description
1 csv.field_size_limit It returns the maximum field size
2 csv.get_dialect Fetches the dialect associated with name
3 csv.list_dialects Displays all the registered dialects
4 csv.reader Read data from csv file
5 csv.register_dialect Dialect associated with a name
6 csv.writer Writes data to a csv file
It deletes the dialect associated with the
7 csv.unregister_dialect name dialect registry
8 csv.QUOTE_ALL Quotes everything irrespective of the type
9 csv.QUOTE_MINIMAL Quotes special character field
10 csv.QUOTE_NONNUMERIC Quotes fields that are not numeral
11 csv.QUOTE_NONE Doesn’t quote anything in output
Page no 3
READING AND WRITING FILES
The csv module is used for reading and writing files. It mainly provides following
classes and functions:
1. reader()
2. writer()
1.READING A CSV FILE WITH reader()
The reader() function takes a file object and returns a _csv.reader object that can be
used to iterate over the contents of a CSV file. The syntax of reader() function is as
follows:
Syntax: csvreaderobject = csv.reader(fileobject)
Let's say we have following CSV file:
employees.csv
1id,name,email,age,designation
21,John,john@mail.com,24,programmer
32,Bob,bob@mail.com,34,designer
43,Mary,mary@mail.com,43,sales
import csv
with open('employees.csv', 'rt') as f:
csv_reader = csv.reader(f)
for line in csv_reader:
print(line)
Here is how to read this CSV file:
Expected Output:
1['id', 'name', 'email', 'age', 'designation']
2['1', 'John', 'john@mail.com', '24', 'programmer']
3['2', 'Bob', 'bob@mail.com', '34', 'designer']
4['3', 'Mary', 'mary@mail.com', '43', 'sales']
Notice that each line in the CSV file is returned as a list of strings.
Page no 4
To get the data from certain fields, you can use indexing. For example:
1import csv
2
3with open('employees.csv', 'rt') as f:
4 csv_reader = csv.reader(f)
5
6 for line in csv_reader:
7 print(line[0], line[1], line[2])
Expected Output:
id name email
11 John
2john@mail.com
32 Bob bob@mail.com
43 Mary
mary@mail.com
WRITING CSV FILES WITH writer()
To write data to a CSV file we use the writer() function. It accepts the same
argument as the reader() function but returns a writer object (i.e _csv.writer):
Syntax: csvwriterobject= csv.writerow(fileobject)
The writer instance provides the following two methods to write data:
Method Description
writerow(row) Writes a single row of data and returns the number of characters
written. The row must be a sequence of strings and number.
writerows(rows Writes multiple rows of data and returns None. The rows must be
) a sequence.
Here are examples:
Example 1: Using writerow()
import csv
header = ['id', 'name', 'address', 'zip']
rows = [
[1, 'Hannah', '4891 Blackweet, Anchorage, Alaska', 99503
],
Page no 5
[2, 'Walton', '4223 Half and Half Drive, Lemoore, California',
97401 ],
[3, 'Sam', '3952 Littlt, Akron, Ohio', 93704],
[4, 'Chris', '3192 Fln Road, Arlington, Illinois',
62677],
[5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge',
61257],
]
with open('customers.csv', 'wt') as f:
csv_writer = csv.writer(f)
csv_writer.writerow(header) # write header
for row in rows:
csv_writer.writerow(row)
Example 2: Using writerows()
import csv
header = ['id', 'name', 'address', 'zip']
rows = [
[1, 'Hannah', '4891 Bltreet, Anchorage, Alaska',
99503 ],
[2, 'Walton', '4223 Hrive, Lemoore, California',
97401 ],
[3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
[4, 'Chris', '3192 Fli Road, Arlingto, Illinois',
62677],
[5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge',
61257],
]
with open('customers.csv', 'wt') as f:
csv_writer = csv.writer(f)
csv_writer.writerow(header) # write header
csv_writer.writerows(rows)
The output generated by both listing will be the same and it looks like this:
customers.csv
id,name,address,zip
1,Hannah,"4891 Blackwell Street, Anchorage, Alaska",99503
2,Walton,"4223 Half and Half Drive, Lemoore,
California",97401
Page no 6
3,Sam,"3952 Little Street, Akron, Ohio",93704
4,Chris,"3192 Flinderation Road, Arlington Heights,
Illinois",62677
5,Doug,"3236 Walkers Ridge Way, Burr Ridge",61257
Page no 7