Data file handling
• A file is a sequence of bytes stored on a storage device such as a hard disk.
• Data files are used to store information permanently.
NEED FOR DATA FILE HANDLING
● In most programming languages, values are held in variables, which are volatile: the data exists only while the program runs and is lost once execution is completed.
● To save this data permanently for future use, we need to use data files.
File Object
• File operations in Python use a file object as the interface between the program (in RAM) and the file (on SSD or other storage).
• A file object is a reference to a file on disk. It is also known as a file handle.
• All the operations that you perform on a data file are performed through the file object.
• Python stores the reference to the opened file in the file object.
• A file object in Python is a stream of bytes from which data can be read byte by byte, line by line, or all at once.
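The idea that a file object holds a reference to the opened file can be seen in a minimal sketch like the one below. The file name demo.txt and the temporary folder are assumptions made for illustration only; name, mode and closed are standard attributes of Python file objects.

```python
import os
import tempfile

# Hypothetical demo file in a temporary folder (not from the original notes).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")      # f is the file object (file handle)
handle_name = f.name     # the path that the handle refers to
handle_mode = f.mode     # the mode the file was opened in
f.write("hello\n")       # the operation goes through the file object
open_state = f.closed    # False while the file is open
f.close()
closed_state = f.closed  # True once close() has been called

print(handle_name == path, handle_mode, open_state, closed_state)
```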
Program-Console and Program-File interaction
[Figure: Program-Console interaction and Program-File interaction. The program and its data reside in RAM. The program takes input from the input device (keyboard) and sends output to the output device (screen). It writes data to, and reads data from, data files on the SSD.]
[Figure: Console I/O data stream. Input devices (source) feed the input stream; data is extracted from the input stream into RAM; data is inserted from RAM into the output stream, which goes to the output devices (destination).]
File Input and Output data streams
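[Figure: File input and output streams. The program runs in RAM. Data output from the program travels through the output stream (memory to file) and is written to the disk file on the SSD. Data input to the program is read from the disk file and travels through the input stream (file to memory).]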
Python allows us to create and manage three types of files:
1. TEXT FILE
2. BINARY FILE
3. CSV (Comma Separated Values) FILES
TEXT FILE
● A text file is structured as a sequence of lines.
● Each line is a sequence of characters, stored in ASCII or Unicode.
● Each line of text is terminated by a special character known as the End Of Line (EOL) character (delimiter). Some internal translation takes place when this EOL is read or written. In Python, the EOL character is the newline character ('\n') by default; on disk it may be stored as the carriage return, newline combination ('\r\n'), depending on the platform.
● Text files are stored in human-readable form and can be created using any text editor.
● Files are treated as text files by default. The file extension of a text file is usually .txt.
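The EOL translation described above can be observed with a small sketch. It assumes a hypothetical file eol.txt in a temporary folder, written in binary mode so that a '\r\n' pair is stored exactly as given. Reading the file back in text mode shows the translated form, and reading it in binary mode shows the raw bytes.

```python
import os
import tempfile

# Hypothetical demo file (an assumption for illustration).
path = os.path.join(tempfile.mkdtemp(), "eol.txt")

# Store one line ending in '\r\n' and one ending in '\n', untranslated.
f = open(path, "wb")
f.write(b"one\r\ntwo\n")
f.close()

# Text mode (the default): every line ending is presented as '\n'.
f = open(path, "r")
text = f.read()
f.close()

# Binary mode: the bytes are returned exactly as stored.
f = open(path, "rb")
raw = f.read()
f.close()

print(repr(text))
print(repr(raw))
```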
BINARY FILE
● A binary file contains information in the same format in which it is held in memory. Binary files are generally used to store record-wise data.
● A binary file contains arbitrary binary data.
● So when we work with a binary file, we have to interpret the raw bit pattern(s) read from the file as the correct type of data in our program.
● Python provides special module(s) for encoding and decoding data for binary files.
● There is no line delimiter, so no internal translation for end of line occurs in binary files. As a result, they are faster and easier for a program to read and write than text files. The file extension is generally .dat.
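One standard-library module that encodes and decodes Python objects for binary files is pickle. The notes do not name a module, so treating pickle as the intended one is an assumption. The sketch below stores a hypothetical student record in student.dat and loads it back.

```python
import os
import pickle
import tempfile

# Hypothetical record and file name, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "student.dat")
record = {"roll": 1, "name": "Asha", "marks": 91.5}

f = open(path, "wb")     # "wb": write in binary mode
pickle.dump(record, f)   # encode the dictionary into bytes and write it
f.close()

f = open(path, "rb")     # "rb": read in binary mode
loaded = pickle.load(f)  # decode the bytes back into a dictionary
f.close()

print(loaded)
```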
CSV FILES
● CSV stands for Comma Separated Values.
● A CSV file is a text file in human-readable format that is extensively used to store tabular data, such as data from a spreadsheet (like MS Excel) or a database.
● The separator character of a CSV file is called a delimiter. The default delimiter is the comma (,). Other common delimiters are the tab (\t), colon (:), pipe (|) and semicolon (;) characters.
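A short sketch of writing and reading tabular data with the standard csv module follows. The file names and the rows are hypothetical. The second file uses the pipe character as its delimiter to show that the comma is only the default.

```python
import csv
import os
import tempfile

folder = tempfile.mkdtemp()
rows = [["name", "marks"], ["Asha", "91"], ["Ravi", "78"]]  # sample data

# Default delimiter (comma). newline="" lets the csv module manage line endings.
comma_path = os.path.join(folder, "marks.csv")
f = open(comma_path, "w", newline="")
csv.writer(f).writerows(rows)
f.close()

f = open(comma_path, newline="")
read_back = list(csv.reader(f))  # each row comes back as a list of strings
f.close()

# Same data with a pipe delimiter.
pipe_path = os.path.join(folder, "marks_pipe.csv")
f = open(pipe_path, "w", newline="")
csv.writer(f, delimiter="|").writerows(rows)
f.close()

f = open(pipe_path)
first_line = f.readline()        # header line as plain text
f.close()

print(read_back)
print(repr(first_line))
```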
STEPS TO PROCESS A FILE
1. Determine the type of file usage.
• Reading purpose: if the data is to be brought in from a file to memory.
• Writing purpose: if the data is to be sent from memory to a file.
2. Open the file and assign its reference to a file object or file-handle.
3. Process the file as required: perform the desired operation on the file.
Python provides built-in functions to perform various file manipulation tasks.
4. Close the file.
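The four steps above can be sketched as follows. The file name notes.txt and its contents are assumptions for illustration. The file is first processed for writing and then again for reading.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file

# Step 1: the purpose is writing (data goes from memory to the file).
f = open(path, "w")                    # Step 2: open and keep the file object
f.write("red\ngreen\nblue\n")          # Step 3: process the file
f.close()                              # Step 4: close the file

# Step 1 again: the purpose is now reading (data comes from file to memory).
f = open(path, "r")                    # Step 2
line_count = 0
for line in f:                         # Step 3: count the lines one by one
    line_count += 1
f.close()                              # Step 4

print(line_count)
```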
Opening Files
• To work with a file in a Python program, we need to open it in a specific mode according to the file manipulation task we want to perform.
• Syntax to open a file:
<fileobject> = open(<filename>, <mode>)
• The filename can be simply the name of the data file. In that case the file is stored in, or searched for in, the current folder. We can give the filename along with its path to store or search for the file in a different location on the hard disk.
• Mode is an optional parameter. The default mode is read mode. To write, append, etc., we need to specify the appropriate mode. The file mode governs the type of operations possible on the opened file, i.e. it refers to how the file will be used once it is opened.
More about the parameter - Filename
• Open using the default mode, i.e. read mode
Without specifying the path, i.e. in the default current directory:
f = open("File1.txt")
Specifying the path of the file (make sure that the folder exists at the specified path):
f = open("e:\\testfiles\\file1.txt")
Because a backslash has a special meaning inside a string, it is written as the escape sequence \\ to suppress that meaning.
f = open(r"e:\testfiles\file1.txt")
The prefix r in front of a string makes it a raw string, which means backslashes are not treated as the start of an escape sequence.
• Open specifying the required mode:
f = open("e:\\testfiles\\file1.txt", "w+")
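The effect of the two path spellings can be checked without opening any file. This sketch only compares strings, so it does not need the e:\testfiles folder from the examples to exist. The variable names are chosen for illustration.

```python
escaped = "e:\\testfiles\\file1.txt"  # each backslash written as \\
raw = r"e:\testfiles\file1.txt"       # raw string: backslashes kept as typed
broken = "e:\testfiles\file1.txt"     # \t and \f are read as tab and form feed

same_path = (escaped == raw)          # both spellings give the same string
has_tab = "\t" in broken              # the unescaped version is corrupted

print(same_path, has_tab)
print(raw)
```

The third spelling shows why a plain single backslash is risky in Windows paths: the string is accepted without complaint but no longer names the intended file.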
Various Modes available for opening a file
Mode   Name              Description
"r"    Read              Default value. Opens a file for reading. Error if the file does not exist.
"w"    Write             Opens a file for writing. Creates the file if it does not exist. If the file exists, past data is lost (truncated).
"a"    Append            Opens a file for appending. Creates the file if it does not exist.
"r+"   Read and Write    File must exist, otherwise an error is raised. Both reading and writing operations can take place.
"w+"   Write and Read    File is created if it does not exist. If the file exists, past data is lost (truncated). Both reading and writing operations can take place.
"a+"   Append and Read   File is created if it does not exist. If the file exists, past data is retained and new data is added at the end. Both reading and writing (appending) operations can take place.
Note: We can specify whether the file should be handled in text or binary mode:
• "t" - Text mode. This is the default. Eg: "rt", "at", "r", "a"
• "b" - Binary mode. Eg: "rb", "ab", "rb+"
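The difference between the write, append and read modes can be tried out with a sketch like this one. The file names are hypothetical and live in a temporary folder.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes.txt")   # hypothetical demo file

f = open(path, "w")      # "w" creates the file
f.write("old\n")
f.close()

f = open(path, "a")      # "a" keeps past data and adds at the end
f.write("added\n")
f.close()

f = open(path, "r")
after_append = f.read()
f.close()

f = open(path, "w")      # "w" on an existing file truncates it
f.write("new\n")
f.close()

f = open(path)           # no mode given: "r" is the default
after_write = f.read()
f.close()

# "r" on a file that does not exist raises an error.
try:
    open(os.path.join(folder, "missing.txt"), "r")
    read_failed = False
except FileNotFoundError:
    read_failed = True

print(repr(after_append), repr(after_write), read_failed)
```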
Closing Files
• An open file must be closed by invoking the close() method on the respective file object.
• Files are automatically closed at the end of the program, but the data may not be written to the file until it is closed.
• Hence, to free system resources and prevent any loss of data that could occur due to an unexpected program exit, it is good practice to close the file explicitly using this method.
• Syntax: <fileobject>.close()
• Eg: f.close()
Note:
• open() is a standalone function, while close() is a method called on the respective file object.
• It is important to close all the files opened within a program because, due to buffering, the changes made to a file may not show until you close the file.
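One way to make sure close() is always called, even if an error occurs while processing the file, is a try/finally block. This pattern is an addition for illustration and is not taken from the notes. The sketch also shows that a file object cannot be used once it is closed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "closing.txt")  # hypothetical file

f = open(path, "w")
try:
    f.write("saved\n")   # may still sit in a buffer at this point
finally:
    f.close()            # flushes the buffer and releases the file

is_closed = f.closed     # True after close()

# Any further operation on the closed file object raises ValueError.
try:
    f.write("too late\n")
    write_after_close_failed = False
except ValueError:
    write_after_close_failed = True

f = open(path)
content = f.read()
f.close()

print(is_closed, write_after_close_failed, repr(content))
```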
Writing Textfiles
• write() – writes the string given as parameter to the text file that the file object refers to.
• Syntax: <fileobj>.write(string) # string can be a literal in quotes or the name of a variable (without quotes) that contains the string to write.
• Eg:
• f.write("Testing") or
• St = "Testing"
f.write(St)
• writelines() – writes all the strings in the list (given as parameter) to the file referenced by the file object. It does not add newline characters, so each string must end with '\n' to appear on its own line.
• Eg:
• L = ["line1\n", "line2\n", "line3\n"]
f.writelines(L)
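A small sketch of both writing methods follows, using a hypothetical file in a temporary folder. It also shows that write() returns the number of characters written and that writelines() joins strings exactly as given, without adding line breaks.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "written.txt")  # hypothetical file

f = open(path, "w")
count = f.write("Testing\n")                     # returns characters written
f.writelines(["line1\n", "line2\n", "line3\n"])  # each string ends with \n
f.writelines(["x", "y"])                         # no \n, so "xy" on one line
f.close()

f = open(path)
content = f.read()
f.close()

print(count)
print(repr(content))
```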
Reading Text Files
• read() – reads n characters from the file referenced by the file object (n bytes if the file is opened in binary mode).
Syntax: <var> = <fileobject>.read(n) # if n is not specified, the entire file is read
Eg:
Str = f.read() # the entire file contents are read
Str = f.read(10) # 10 characters are read from the file
• readline() – reads one line of the text file. If a parameter n is given, it reads at most n characters of that line.
Eg:
Str = f.readline() # reads one line
Str = f.readline(3) # reads at most 3 characters from a line. It does not continue into the following lines to reach the number specified
• readlines() – reads all the lines and returns them in a list if no parameter is given. If a size n is specified, it keeps reading whole lines until the total number of characters read reaches about n.
Eg:
Str = f.readlines() # Str is a list. Reads all the lines from the file
Str = f.readlines(1) # returns a list containing only the first line
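The three reading methods can be compared on one small file. The sketch below assumes a hypothetical file with three short lines. Each call continues from the position where the previous call stopped.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "reading.txt")  # hypothetical file

f = open(path, "w")
f.write("alpha\nbeta\ngamma\n")
f.close()

f = open(path)
first_three = f.read(3)          # three characters from the start
rest_of_line = f.readline()      # the rest of the first line, with its \n
two_chars = f.readline(2)        # at most two characters of the second line
remaining_lines = f.readlines()  # whatever is left, as a list of lines
f.close()

f = open(path)
whole_file = f.read()            # no size given: the entire file
f.close()

print(repr(first_three), repr(rest_of_line), repr(two_chars))
print(remaining_lines)
print(repr(whole_file))
```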