1. Data Handling Using Pandas
Python module- A Python module is a Python script file (.py file) containing variables, classes,
functions, statements, etc.
Python Library/package- A Python library is a collection of modules that together cater to a specific type of
need or application. The advantage of using libraries is that we can directly use their functions/methods for
a specific task instead of rewriting the code for that particular use. A library is brought into a program
using the import statement as-
import libraryname
usually written at the top of the Python code/script file.
Some examples of Python Libraries-
1. Python standard library- It is a collection of modules that is normally distributed along with the Python
installation. Some of them are-
a. math module- provides mathematical functions
b. random module- provides functions for generating pseudo-random numbers.
c. statistics module- provides statistical functions
2. NumPy (Numerical Python) library- It provides functions for working with large multi-dimensional
arrays (ndarrays) and matrices. NumPy provides a large set of mathematical functions that can
operate quickly on the entries of an ndarray without the need for explicit loops.
3. Pandas (PANel + DAta) library- Pandas is a fast, powerful, flexible and easy to use open source data
analysis and manipulation tool. Pandas is built on top of NumPy, relying on ndarray and its fast and
efficient array based mathematical functions.
4. Matplotlib library- It provides functions for plotting and drawing graphs.
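As a small illustration of the libraries listed above, the sketch below imports two standard library modules and NumPy, then applies a NumPy function to a whole array without writing a loop. The variable names and sample values are made up for this example.

```python
import math
import statistics
import numpy as np

# Standard library: functions are called as modulename.functionname()
root = math.sqrt(16)
average = statistics.mean([2, 4, 6, 8])

# NumPy: the function is applied to every element of the array at once
values = np.array([1, 4, 9, 16])
roots = np.sqrt(values)

print(root)
print(average)
print(roots)
```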
Data Structure- Data structure is the arrangement of data in such a way that permits efficient access and
modification.
Pandas Data Structures- Pandas offers the following data structures-
a) Series - 1D array
b) DataFrame - 2D array
c) Panel - 3D array (not in syllabus; removed from recent versions of pandas)
Series- Series is a one-dimensional array with homogeneous data.
Index/Label:     0    1    2    3    4
1D Data values:  abc  def  ghi  Jkl  mno
Key features of Series-
A Series has only one dimension, i.e. one axis
Each element of the Series can be associated with an index/label that can be used to access the data
Series is data mutable i.e. the data values can be changed in-place in memory
Series is size immutable i.e. once a series object is created in memory with a fixed number of
elements, then the number of elements cannot be changed in place. Although the series object can
be assigned a different set of values it will refer to a different location in memory.
All the elements of the Series are homogeneous data i.e. their data type is the same. For example-
Index/Label:  0    1    2     3    4
Data values:  223  367  456   339  927     (all data is of int type)
Index/Label:  a    b    c     de   fg
Data values:  1    def  10.5  Jkl  True    (all data is of object type)
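The key features above can be checked with a short sketch. It reuses the sample values from the two tables; the variable names s and m are chosen only for this illustration.

```python
import pandas as pd

# All values are integers, so the Series gets an integer dtype
s = pd.Series([223, 367, 456, 339, 927])
print(s.dtype)

# Data mutable: an existing value can be changed in place
s[2] = 500
print(s[2])

# Values of mixed types are stored with the common dtype 'object'
m = pd.Series([1, 'def', 10.5, 'Jkl', True], index=['a', 'b', 'c', 'de', 'fg'])
print(m.dtype)
```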
Creating a Series- A series object can be created by calling the Series() method in the following ways-
a) Create an empty Series- A Series object not containing any elements is an empty Series. It can be
created as follows-
import pandas as pd
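#dtype is given explicitly because the default dtype of an empty Series depends on the pandas version
s1=pd.Series(dtype='float64')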
print(s1)
o/p-
Series([], dtype: float64)
b) Create a series from array without index- A numpy 1D array can be used to create a Series object as follows-
import pandas as pd
import numpy as np
#np.nan is used here because the alias np.NaN is not available in newer NumPy versions
a1=np.array(['hello', 'world', 'good', np.nan])
s1=pd.Series(a1)
print(s1)
o/p-
0 hello
1 world
2 good
3 nan
dtype: object
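One detail of the output above is worth a closer look: when a NumPy array is built from strings together with a NaN value, NumPy converts every element to a string, so the last entry is the text 'nan' and not a real missing value. The sketch below compares this with passing a list directly to Series(); the names a1 and s2 are used only for this illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array built from strings plus np.nan becomes a pure string array,
# so the missing value is stored as the text 'nan'
a1 = np.array(['hello', 'world', 'good', np.nan])
print(a1.dtype.kind)     # 'U' means a unicode string array
print(repr(a1[3]))

# Passing the list directly to Series() keeps np.nan as a real missing value
s2 = pd.Series(['hello', 'world', 'good', np.nan])
print(s2.isna())
```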
c) Create a series from array with index- The default index for a Series object can be changed and
specified by the programmer by using the index parameter and enclosing the index labels in square
brackets. The number of elements of the array must match the number of index labels specified, otherwise
pandas raises a ValueError.
#Creating a Series object using numpy array and specifying index
import pandas as pd
import numpy as np
a1=np.array(['hello', 'world', 'good', 'morning'])
s1=pd.Series(a1, index=[101, 111, 121, 131])
print(s1)
o/p-
101 hello
111 world
121 good
131 morning
dtype: object
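The length rule can be seen with a short sketch in which four data values are paired with only three index labels. The flag variable created is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

a1 = np.array(['hello', 'world', 'good', 'morning'])
try:
    # 4 data values but only 3 index labels
    s1 = pd.Series(a1, index=[101, 111, 121])
    created = True
except ValueError as err:
    created = False
    print('Error:', err)
```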
d) Create a Series from dictionary- Each element of the dictionary contains a key:value pair. The key of
the dictionary becomes the index of the Series object and the value of the dictionary becomes the
data.
#4 Creating a Series object from dictionary
import pandas as pd
d={101:'hello', 111:'world', 121:'good', 131:'morning'}
s1=pd.Series(d)
print(s1)
o/p-
101 hello
111 world
121 good
131 morning
dtype: object
e) Create a Series from dictionary, reordering the index- When we create a Series object from a
dictionary, we can choose which elements of the dictionary to include in the Series object, and in
which order, by specifying the index argument while calling the Series() method.
If any key of the dictionary is missing in the index argument, then that element is not added
to the Series object.
If the index argument contains a key not present in the dictionary then a value of NaN is
assigned to that particular index.
The order in which the index arguments are specified determines the order of the elements
in the Series object.
#5 Creating a Series object from dictionary reordering the index
import pandas as pd
d={101:'hello', 111:'world', 121:'good', 131:'morning'}
s1=pd.Series(d, index=[131, 111, 121, 199])
print(s1)
o/p-
131 morning
111 world
121 good
199 NaN
dtype: object
f) Create a Series from a scalar value- A Series object can be created from a single value i.e. a scalar
value. The scalar value is repeated once for every label given in the index argument, so the length of
the index decides the number of elements.
#6 Creating a Series object from scalar value
import pandas as pd
s1=pd.Series(7, index=[101, 111, 121])
print(s1)
o/p-
101 7
111 7
121 7
dtype: int64
g) Create a Series from a List- A Series object can be created from a list as shown below.
#7 Creating a Series object from list
import pandas as pd
L=['abc', 'def', 'ghi', 'jkl']
s1=pd.Series(L)
print(s1)
o/p-
0 abc
1 def
2 ghi
3 jkl
dtype: object
h) Create a Series from a Numpy Array (using various array creation methods) - A Series object can be
created from a numpy array as shown below. All the methods of numpy array creation can be used
to create a Series object.
#7a Creating Series objects from numpy arrays
import pandas as pd
import numpy as np
#a. Create an array consisting of elements of a list [2,4,7,10, 13.5, 20.4]
a1=np.array([2,4,7,10, 13.5, 20.4])
s1=pd.Series(a1)
print('s1=', s1)
#b. Create an array consisting of ten zeros.
a2=np.zeros(10)
s2=pd.Series(a2, index=range(101, 111))
print('s2=', s2)
#c. Create an array consisting of five ones.
a3=np.ones(5)
s3=pd.Series(a3)
print('s3=', s3)
#d. Create an array consisting of the elements from 1.1, 1.2, 1.3,1.4, 1.5, 1.6, 1.7
a4=np.arange(1.1,1.8,0.1)
s4=pd.Series(a4)
print('s4=', s4)
#e. Create an array of 4 elements which are linearly spaced between 1 and 10 (both inclusive)
a5=np.linspace(1,10,4)
s5=pd.Series(a5)
print('s5=', s5)
#f. Create an array from the individual characters of the string 'helloworld'
a6=np.fromiter('helloworld', dtype='U1')
s6=pd.Series(a6)
print('s6=', s6)
o/p:
s1= 0 2.0
1 4.0
2 7.0
3 10.0
4 13.5
5 20.4
dtype: float64
s2= 101 0.0
102 0.0
103 0.0
104 0.0
105 0.0
106 0.0
107 0.0
108 0.0
109 0.0
110 0.0
dtype: float64
s3= 0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
dtype: float64
s4= 0 1.1
1 1.2
2 1.3
3 1.4
4 1.5
5 1.6
6 1.7
dtype: float64
s5= 0 1.0
1 4.0
2 7.0
3 10.0
dtype: float64
s6= 0 h
1 e
2 l
3 l
4 o
5 w
6 o
7 r
8 l
9 d
dtype: object
Operations on Series objects-
1. Accessing elements of a Series object
The elements of a series object can be accessed using different methods as shown below-
a) Using the indexing operator []
The square brackets [] can be used to access a data value stored in a Series object. The index
of the element must be entered within the square brackets. If the index is a string then it
must be written in quotes. If the index is a number then it must be written
without quotes. Attempting to use an index label which does not exist raises an error (a KeyError).
#8 Accessing elements of Series using index
import pandas as pd
d={101:'hello', 'abc':'world', 121:'good', 131:'morning'}
s=pd.Series(d)
print(s['abc'])
print(s[131])
o/p-
world
morning
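The error for a missing index can be handled with try/except, as in the sketch below. The label 'xyz' is a made-up label that is not present in the Series.

```python
import pandas as pd

d = {101: 'hello', 'abc': 'world', 121: 'good', 131: 'morning'}
s = pd.Series(d)
try:
    value = s['xyz']      # 'xyz' is not one of the index labels
except KeyError:
    value = None
    print('Index not found')
```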
b) Using the get() method
The get() method returns the data value associated with an index.
Syntax: seriesobject.get(key, default=None)
The first argument to the get method is the index of the element which we want to access.
Here if the key/index is not present in the series object and the second argument is not
specified then None is returned. If the key is not present and we want some default value to
be returned then it is specified using the default argument.
#9 Accessing elements of Series using get() method
import pandas as pd
d={101:'hello', 'abc':'world', 121:'good', 131:'morning'}