DATA HANDLING USING PANDAS–I – SERIES
INTRODUCTION: The process of analyzing a large set of data in order to answer questions related to that
data set is known as Data Science or Data Analytics.
Data Analytics is necessary for handling huge volumes of data. Before data can be analyzed, it has to be processed,
because it may not be readily available in a form suitable for analysis. Data is generally available in different
formats, such as CSV files, Excel files and HTML files, and all these formats have to be converted into a single format.
Data analysis involves a sequence of steps, such as converting data of different types into one type,
storing it, performing operations like join, merge and search, and plotting the data in the form of a graph. Python
supports different libraries for each of these operations.
Python Pandas is a library that enables data analysis through the various methods available in it.
PANDAS: It is a high-level data manipulation tool developed by Wes McKinney for data analysis and
visualization work. It offers powerful and flexible data structures that make data analysis and manipulation easy.
The term ‘Pandas’ is derived from ‘panel data’, an econometrics term for multidimensional, structured
data sets. Pandas provides easy-to-use data structures and data analysis tools.
Features of Pandas: Pandas is the most popular library in the scientific Python ecosystem for data analysis.
Pandas can handle several tasks related to data processing and offers the following features:
It can read and write data in many different formats, such as CSV, Excel, JSON and HTML, and handles many data types, such as integers and floating-point numbers
Columns can be inserted into or deleted from a Pandas data structure
It supports the group by operation for data aggregation and transformation, and allows high-performance
merging and joining of data
It offers good I/O capabilities, as it can easily pull data from a MySQL database directly into a DataFrame
It can easily select subsets of data from bulky data sets and can even combine multiple data sets
It has the functionality to find and fill missing data
It allows operations to be applied to independent groups within the data
It supports reshaping of data into different forms
It supports advanced time-series functionality, such as date ranges, resampling and moving-window statistics, which is
useful for time-series forecasting (using a model to predict future values based on previously observed values)
It supports visualization by integrating libraries such as Matplotlib and Seaborn
Pandas is best at handling huge tabular data sets comprising different data formats
INSTALLING PANDAS: The procedure for installing Pandas is as follows
Step 1: Open Command Prompt as an Administrator
Step 2: Type cd\ to move to the root directory
Step 3: Ensure that the computer is connected to the internet, and type the following command
pip install pandas
DATA STRUCTURES IN PANDAS: A data structure is a specialized format for organizing, processing, retrieving
and storing data. Python Pandas provides three data structures, namely Series, Dataframes and Panel:
Series: It is a one-dimensional structure storing homogeneous (all data elements of the same type), mutable
data
Dataframes: It is a two-dimensional structure storing heterogeneous (data elements may be of different
data types), mutable data
Panel: It is a three-dimensional way of storing items. Panel has been removed from Pandas since version 0.25,
so recent versions provide only Series and Dataframes
SERIES: A series is a one-dimensional, array-like structure with homogeneous data, i.e., all the data elements in
the series are of the same type. However, the data elements may be of any type, like integer, string, float or object.
Ex1: 10 23 56 17 52 61 73 26
Ex2: 1.5 2.6 38.5 45.2 9.7 2.0 3.8 6.4
Ex3: App Box Car Doll ENT 1234 CBSE Mango
Jawahar Navodaya Vidyalaya, Chittoor 2
A series can also be described as an ordered dictionary that maps index values to data values, for example:
Index   Data      Index   Data      Index       Data
0       22        Jan     31        Sunday      1
1       -14       Feb     28        Monday      2
2       52        Mar     31        Tuesday     3
3       100       April   30        Wednesday   4
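The dictionary-like behaviour described above can be tried with a minimal sketch. The variable name months and its values are illustrative only: a value is looked up by its label, and the in operator checks the index labels, just as it checks the keys of a dictionary.

```python
import pandas as pd

# Each index label is mapped to one data value, like a key-value pair
months = pd.Series([31, 28, 31, 30], index=["Jan", "Feb", "Mar", "April"])

print(months["Feb"])        # 28: look up a value by its label
print("Mar" in months)      # True: 'in' checks the index labels
print(31 in months)         # False: 31 is a value, not a label
print(list(months.index))   # ['Jan', 'Feb', 'Mar', 'April'], in the original order
```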
Some characteristics of a series are:
All data elements in a series are homogeneous, i.e., of the same data type
The size of a series is immutable, i.e., the number of elements cannot be altered. Hence, data elements cannot
be added to or removed from a series once it is created
The values of the data are mutable, i.e., the values of data elements in a series can be changed
Creating a Series: A series can be created by using the Series( ) method with various inputs, like (i) List (ii) Scalar
Value or Constant (iii) Dictionary (iv) NumPy Array, etc.
To use the Series( ) method to create a series, the library “pandas” must first be imported using the import
statement, as below
import pandas (Or)
import pandas as pd
1. Creating an Empty Series: An empty series can be created by using the Series( ) function, without any
parameters.
Syntax : import pandas as pd
<Series_Object> = pd.Series( )
Ex : >>>mtsrs = pd.Series( )
>>> mtsrs
Series([ ], dtype: float64)
Here,
mtsrs is the series variable
The Series( ) method creates an empty series with the default data type (float64 in older versions of pandas; from pandas 2.0 onwards the default is object)
The dtype indicates the data type of the elements of the series
pd is an alternate name given to the pandas module. Hence, instead of the module name ‘pandas’
the short name ‘pd’ can be used
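Because the default data type of an empty series differs between pandas versions, one option (a suggestion, not something the handout requires) is to pass the dtype explicitly. The sketch below also checks the empty attribute, which appears later among the series attributes.

```python
import pandas as pd

# Giving dtype explicitly makes the result the same in every pandas version
emptsrs = pd.Series(dtype="float64")

print(emptsrs)          # Series([], dtype: float64)
print(emptsrs.empty)    # True: there are no elements
print(len(emptsrs))     # 0
```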
2. Creating a Series using List: A list can be passed as an argument to Series( ) function to create a series.
The syntax for creating a series using list is,
Syntax : import pandas as pd
<Series_Object> = pd.Series(data, index=idx )
Here, data can be a list, a dictionary or a scalar value, and
idx gives the index labels displayed alongside the values. Providing an index is
optional, and the default index starts from 0
Ex : >>> daysinmonths=pd.Series([31,28,31,30,31,30,31,31,30,31,30,31])
>>> daysinmonths
0 31
1 28
2 31
3 30
4 31
5 30
6 31
7 31
8 30
9 31
10 30
11 31
dtype: int64
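A NumPy array, input (iv) in the list of inputs above, works the same way as a list. In this sketch the variable names are illustrative; both series get the same data and the same default index 0, 1, 2, ...

```python
import numpy as np
import pandas as pd

days_list = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
days_array = np.array(days_list)          # NumPy ndarray with the same values

from_list = pd.Series(days_list)          # series from a list
from_array = pd.Series(days_array)        # series from an array

print(from_list.tolist() == from_array.tolist())   # True: same data
print(list(from_array.index[:3]))                  # [0, 1, 2]: default index
```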
When the index is not provided, the default index starts from 0 and ranges up to len-1, where len is the number
of elements. However, an index can also be provided while creating a series by using the argument index.
Ex: >>> srs_week=pd.Series(["Sun","Mon","Tue","Wed","Thu","Fri","Sat"], index=[1,2,3,4,5,6,7])
>>> srs_week
1 Sun
2 Mon
3 Tue
4 Wed
5 Thu
6 Fri
7 Sat
An index can be assigned to a series at the time of creating the series, or even after the series has been created.
Ex : >>> srs_nat=pd.Series([1,2,3,4,5])
>>> srs_nat
0 1
1 2
2 3
3 4
4 5
>>> srs_nat.index=["First","Second","Third","Fourth","Fifth"]
>>> srs_nat
First 1
Second 2
Third 3
Fourth 4
Fifth 5
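Whether the index is given at creation time or assigned later through the index attribute, it needs exactly one label per value. The minimal sketch below (variable names are illustrative) shows that pandas rejects a mismatch with a ValueError in both cases and leaves the existing index unchanged.

```python
import pandas as pd

days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
srs_week = pd.Series(days, index=range(1, 8))   # 7 labels for 7 values: accepted

errors = []
try:
    pd.Series(days, index=[1, 2, 3])            # 3 labels for 7 values
except ValueError as err:
    errors.append(str(err))

srs_nat = pd.Series([1, 2, 3, 4, 5])
try:
    srs_nat.index = ["First", "Second"]         # 2 labels for 5 values
except ValueError as err:
    errors.append(str(err))

print(len(errors))             # 2: both mismatches were rejected
print(list(srs_nat.index))     # [0, 1, 2, 3, 4]: the old index is kept
```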
If even a single value in the series is a float, the remaining integer values are converted into floats, and
hence the series is displayed as a float series (dtype float64)
Ex: >>> srs_test=pd.Series([2,5,8,9.4,18])
>>> srs_test
0 2.0
1 5.0
2 8.0
3 9.4
4 18.0
dtype: float64
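The conversion can be confirmed through the dtype attribute. In this sketch (names are illustrative), a list of integers gives an integer series, while a single float among them turns every element into a float.

```python
import pandas as pd

ints_only = pd.Series([2, 5, 8, 9, 18])     # integers only
srs_test = pd.Series([2, 5, 8, 9.4, 18])    # one float among integers

print(ints_only.dtype)   # int64
print(srs_test.dtype)    # float64
print(srs_test[0])       # 2.0: the integer 2 is now stored as a float
```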
3. Creating a Series by providing data with the range( ) function: The sequence of values generated using the
range( ) function can be used to create a series.
Ex: >>> srs_data=pd.Series(range(3,20,4))
>>> srs_data
0 3
1 7
2 11
3 15
4 19
4. Creating a Series from a Scalar or Constant Value: A series can also be created from a single scalar
(constant) value
Ex: >>> srs_const=pd.Series(18)
>>> srs_const
0 18
dtype: int64
If an index is provided, the scalar value is assigned to that index; if more than one index label is provided,
every label gets the same scalar value
Ex: >>> srs_const=pd.Series(18,['h','i','j','k'])
>>> srs_const
h 18
i 18
j 18
k 18
dtype: int64
The range( ) function can also be used to provide the indices while creating a series
Ex: >>> srs_const=pd.Series(33,range(3,15,4))
>>> srs_const
3 33
7 33
11 33
dtype: int64
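A scalar with an index behaves like a list that repeats the scalar once per label. The sketch below (names are illustrative) builds the series from the last example and compares it with the equivalent list-based form.

```python
import pandas as pd

# The scalar 33 is repeated for every label produced by range(3, 15, 4)
srs_const = pd.Series(33, index=range(3, 15, 4))

# The same series written with an explicit list of values and labels
srs_list = pd.Series([33, 33, 33], index=[3, 7, 11])

print(srs_const.tolist() == srs_list.tolist())        # True
print(list(srs_const.index) == list(srs_list.index))  # True
```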
5. Creating a Series with an Index of String (Text) Type: Strings can also be specified as the index labels of
the elements of a series.
Ex: >>> srs_num=pd.Series([9,50,17,-6,0],["Odd","Even","Prime","Negative","Zero"])
>>> srs_num
Odd 9
Even 50
Prime 17
Negative -6
Zero 0
dtype: int64
6. Creating a Series with range( ) and a for loop: The data and indices can also be generated using the range( )
function and a for loop.
However, to generate numeric values for either the data or the indices, the range( ) function alone can be used,
without a for statement.
Ex: >>> srs1=pd.Series(range(11,20,2),range(1,10,2))
>>> srs1
1 11
3 13
5 15
7 17
9 19
dtype: int64
But to generate characters as data or index, a for loop over a string (written here as a list comprehension) has to be used, as follows
Ex1: >>> srs2=pd.Series([11,22,33,44,55],index=[i for i in 'apple'])
>>> srs2
a 11
p 22
p 33
l 44
e 55
dtype: int64
Ex2: >>> srs3=pd.Series([ch for ch in "Navodaya"],index=[i for i in 'udaigiri'])
>>> srs3
u N
d a
a v
i o
g d
i a
r y
i a
dtype: object
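The comprehension [i for i in 'apple'] can also be written as list('apple'), which splits a string into its characters; this is an alternative, not the form used above. The sketch also shows a consequence of a repeated label such as 'p': selecting that label returns a smaller series rather than a single value.

```python
import pandas as pd

# list("apple") gives the same labels as [i for i in "apple"]
srs2 = pd.Series([11, 22, 33, 44, 55], index=list("apple"))

print(list(srs2.index))        # ['a', 'p', 'p', 'l', 'e']
print(srs2.index.is_unique)    # False: 'p' occurs twice
print(srs2["p"].tolist())      # [22, 33]: both elements labelled 'p'
```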
7. Creating a Series using two different lists: A series can be created by providing data as one list and the
indices as the other list
Ex: >>> srs_num=pd.Series(["One","Two","Three","Four","Five"], index=[1,2,3,4,5])
>>> srs_num
1 One
2 Two
3 Three
4 Four
5 Five
dtype: object
8. Creating a Series by using NaN for missing values: A series with missing numbers can be created. For
this purpose, the constant NaN (Not a Number) of the NumPy library can be used for the missing numbers. It is
accessed as np.nan, where np is the name given to NumPy by the statement import numpy as np (older code
also uses np.NaN, an alias that was removed in NumPy 2.0)
Ex: >>> import numpy as np
>>> srs_sales = pd.Series([536, 486, np.nan, 472, 86, np.nan, 145], index = ["Sun", "Mon",
"Tue", "Wed", "Thu", "Fri", "Sat"])
>>> srs_sales
Sun 536.0
Mon 486.0
Tue NaN
Wed 472.0
Thu 86.0
Fri NaN
Sat 145.0
dtype: float64
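Because NaN is a floating-point value, the whole series becomes float64, as the output above shows. The sketch below reuses the sales figures from the example, counts the missing values with isna( ), and shows that sum( ) skips NaN by default.

```python
import numpy as np
import pandas as pd

days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
srs_sales = pd.Series([536, 486, np.nan, 472, 86, np.nan, 145], index=days)

print(srs_sales.dtype)                              # float64
print(srs_sales.isna().sum())                       # 2 missing values
print(srs_sales[srs_sales.isna()].index.tolist())   # ['Tue', 'Fri']
print(srs_sales.sum())                              # 1725.0: NaN values are skipped
```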
9. Creating a Series from a Dictionary: A series can also be created using a dictionary. A dictionary is a
collection of elements, where each element is a combination of a key and a value. As every element of a
dictionary already has a key, the keys become the index of the series and the values become its data, so a
separate index need not be specified while declaring it.
Ex: >>> srs_month1=pd.Series({"Jan":31, "Feb":28, "Mar":31, "Apr":30, "May":31, "June":30})
>>> srs_month1
Jan 31
Feb 28
Mar 31
Apr 30
May 31
June 30
dtype: int64
Ex2: >>> srs_month2=pd.Series({31:"July", 31:"Aug", 30:"Sep", 31:"Oct", 30:"Nov", 31:"Dec"})
>>> srs_month2
31 Dec
30 Nov
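dtype: object
A dictionary cannot hold duplicate keys: when the same key appears more than once, only its last value is kept
(31: "Dec" and 30: "Nov"), so this series has only two elements.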
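An index can still be passed along with a dictionary. In that case pandas uses only the listed keys, in the listed order, and a label that is not a key gets NaN. A minimal sketch with illustrative names:

```python
import pandas as pd

month_days = {"Jan": 31, "Feb": 28, "Mar": 31}

# Without index: every key becomes a label, in the dictionary's order
srs_all = pd.Series(month_days)

# With index: only these labels are used; "Apr" is not a key, so it gets NaN
srs_some = pd.Series(month_days, index=["Mar", "Jan", "Apr"])

print(list(srs_all.index))    # ['Jan', 'Feb', 'Mar']
print(srs_some)
```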
10. Creating a Series using a Mathematical Expression / Function: The data values or index values for a series
object can also be provided as the result of an expression or function.
Ex1: >>> srs1=pd.Series(data=[11,22,33,44],index=[1,1+1,2+1,1+3])
>>> srs1
1 11
2 22
3 33
4 44
dtype: int64
Ex2: >>> d=np.arange(10,100,20)
>>> i=d//10
>>> s1=pd.Series(d,i)
>>> s1
1 10
3 30
5 50
7 70
9 90
dtype: int32
Ex3: >>> idx=np.arange(10,15)
>>> srs=pd.Series(index=idx,data=idx**2)
>>> srs
10 100
11 121
12 144
13 169
14 196
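dtype: int32
(The default integer type of NumPy depends on the platform, so int64 may be displayed instead of int32 in Ex2 and Ex3.)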
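The data can also come from a user-defined function. In this sketch, cube( ) is a hypothetical helper written only for illustration; applying it in a list comprehension gives the same values as the vectorized NumPy expression idx ** 3, which needs no loop.

```python
import numpy as np
import pandas as pd

def cube(x):
    """Hypothetical helper: returns x raised to the power 3."""
    return x ** 3

idx = np.arange(1, 6)                                       # 1, 2, 3, 4, 5
srs_func = pd.Series([cube(i) for i in idx], index=idx)     # values from a function
srs_expr = pd.Series(idx ** 3, index=idx)                   # values from an expression

print(srs_func.tolist())                         # [1, 8, 27, 64, 125]
print(srs_func.tolist() == srs_expr.tolist())    # True
```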
Accessing Data from a Series:
1. An entire series can be displayed by using the name of the series
Ex: >>> srs_prime=pd.Series([2,3,5,7,11,13,17,19])
>>> srs_prime
0 2
1 3
2 5
3 7
4 11
5 13
6 17
7 19
2. Individual element(s) of a series can be accessed using position / index.
Ex: >>> srs_prime=pd.Series([2,3,5,7,11,13,17,19])
>>> srs_prime[3]
7
>>> srs_prime[[2,4,7]]
2 5
4 11
7 19
3. A sequence of elements of a series can be accessed by applying slicing on the series
Ex: >>> srs_odd=pd.Series([1,3,5,7,9,11,13,15,17,19], index=['a','b','c','d','e','f','g','h','i','j'])
>>> srs_odd[:3]
a 1
b 3
c 5
dtype: int64
>>> srs_odd[2:8]
c 5
d 7
e 9
f 11
g 13
h 15
dtype: int64
>>> srs_odd[4:10:3]
e 9
h 15
dtype: int64
>>> srs_odd[-3:]
h 15
i 17
j 19
dtype: int64
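Slicing a series by position follows the same rules as slicing a Python list: the stop position is excluded, a step can be given, and negative positions count from the end. The sketch below assumes the srs_odd data from the example and uses iloc to make the positional intent explicit, checking each slice against the equivalent list slice.

```python
import pandas as pd

values = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
srs_odd = pd.Series(values, index=list("abcdefghij"))

# The four slices used above, written as slice objects
slices = [slice(None, 3), slice(2, 8), slice(4, 10, 3), slice(-3, None)]

for s in slices:
    part = srs_odd.iloc[s]                          # positional slice of the series
    print(part.index.tolist(), part.tolist() == values[s])
```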
4. Elements of a series can also be accessed by using iloc and loc
iloc: It is used for indexing or slicing based on position, i.e., by row number (and column number, for a
DataFrame). It refers to position-based indexing, and the end position of a slice is excluded. The syntax for using iloc is,
Syntax: <Object>.iloc[<row number range>, <column number range>]
loc: It is used for indexing or selecting based on name (label), i.e., by row name (and column name, for a
DataFrame). It refers to name-based indexing, and the end label of a slice is included. The syntax for using loc is,
Syntax: <Object>.loc[<list of row names>, <list of column names>]
A series has only one dimension, so only the row part is given, e.g. <Series_Object>.iloc[2 : 5]
Ex:
>>> weeksrs = pd.Series (index = ["S", "M", "T", "W", "Th", "F", "Sa"],
data = ["Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday"])
>>> weeksrs
S Sunday
M Monday
T Tuesday
W Wednesday
Th Thursday
F Friday
Sa Saturday
dtype: object
>>> weeksrs.iloc[2 : 5]
T Tuesday
W Wednesday
Th Thursday
dtype: object
>>> weeksrs.loc["M" : "F"]
M Monday
T Tuesday
W Wednesday
Th Thursday
F Friday
dtype: object
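The difference matters most when the index labels are integers that do not match the positions. In the sketch below (an illustrative series whose labels start from 1), loc looks up labels, iloc counts positions from 0, and a loc slice includes its end label while an iloc slice excludes its end position.

```python
import pandas as pd

primes = [2, 3, 5, 7, 11, 13, 17, 19]
srs_prime = pd.Series(primes, index=range(1, 9))   # labels 1..8, positions 0..7

print(srs_prime.loc[3])              # 5: the element whose label is 3
print(srs_prime.iloc[3])             # 7: the element at position 3
print(srs_prime.loc[2:4].tolist())   # [3, 5, 7]: labels 2, 3 and 4 (end included)
print(srs_prime.iloc[2:4].tolist())  # [5, 7]: positions 2 and 3 (end excluded)
```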
Naming a Series: The name property of a series can be used to name its values, and the name property of its index
(index.name) can be used to name the index. The name assigned to the index is displayed above the index, and the
name assigned to the values is displayed at the bottom of the series
Ex: >>> srs=pd.Series(["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],index=[1, 2, 3, 4, 5, 6, 7])
>>> srs
1 Sun
2 Mon
3 Tue
4 Wed
5 Thu
6 Fri
7 Sat
dtype: object
>>> srs.name="Day"
>>> srs.index.name="S.No."
>>> srs
S.No.
1 Sun
2 Mon
3 Tue
4 Wed
5 Thu
6 Fri
7 Sat
Name: Day, dtype: object
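Both names can also be given while creating the series. In this sketch, the name argument of pd.Series( ) names the values and an index built with pd.Index( ) carries the index name; the result has the same names as the example above.

```python
import pandas as pd

days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# name= names the values; the Index object carries the index name
srs = pd.Series(days, index=pd.Index(range(1, 8), name="S.No."), name="Day")

print(srs.name)         # Day
print(srs.index.name)   # S.No.
print(srs[1])           # Sun
```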
Series Object Attributes: The various properties of a series can be accessed by using its attributes. The syntax
for accessing an attribute of a Series Object is,
<Series_Object>.<Attribute_Name>
Some common attributes of a series object are as follows,
Attribute         Description
Series.index      Returns the index of the series
Series.values     Returns an ndarray holding the values of the series
Series.dtype      Returns the data type of the data in the series
Series.shape      Returns the shape of the data in the form of a tuple
Series.nbytes     Returns the number of bytes occupied by the series data
Series.ndim       Returns the number of dimensions
Series.size       Returns the number of elements
Series.hasnans    Returns True if any NaN values are present
Series.empty      Returns True if the series object is empty
Ex: >>> sales = pd.Series([536, 486, np.nan, 472, 86, np.nan, 145], index = ["Sun", "Mon", "Tue",
"Wed", "Thu", "Fri", "Sat"])
>>> sales.index
Index(['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'], dtype='object')
>>> sales.values
array([536., 486., nan, 472., 86., nan, 145.])
>>> sales.dtype
dtype('float64')
>>> sales.shape
(7,)
>>> sales.nbytes
56
>>> sales.ndim
1
>>> sales.size
7
>>> sales.hasnans
True
>>> sales.empty
False
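Several of these attributes are related to each other. The sketch below rebuilds the sales series and checks a few of those relationships; for example, nbytes counts only the data, so it equals size multiplied by the number of bytes per float64 value (8).

```python
import numpy as np
import pandas as pd

sales = pd.Series([536, 486, np.nan, 472, 86, np.nan, 145],
                  index=["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"])

print(sales.shape == (sales.size,))                        # True: one dimension
print(sales.nbytes == sales.size * sales.dtype.itemsize)   # True: 7 x 8 = 56 bytes
print(sales.hasnans == sales.isna().any())                 # True
print(pd.Series(dtype="float64").empty)                    # True: no elements
```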
Retrieving Values from a Series using head( ) and tail( ) functions:
The head( ) function, when invoked with a series object, returns the specified number of rows from the top.
By default, this function fetches 5 rows
Ex: >>> srs=pd.Series(data=range(1,100,10),index=range(0,10))
>>> srs
0 1
1 11
2 21
3 31
4 41
5 51
6 61
7 71
8 81
9 91
dtype: int64
>>> srs.head( )
0 1
1 11
2 21
3 31
4 41
dtype: int64
>>> srs.head(3)
0 1
1 11
2 21
dtype: int64
The tail( ) function, when invoked with a series object, returns the specified number of rows from the
bottom. By default, this function fetches 5 rows from the bottom
Ex: >>> srs=pd.Series(data=range(1,100,10),index=range(0,10))
>>> srs
0 1
1 11
2 21
3 31
4 41
5 51
6 61
7 71
8 81
9 91
dtype: int64
>>> srs.tail( )
5 51
6 61
7 71
8 81
9 91
dtype: int64
>>> srs.tail(7)
3 31
4 41
5 51
6 61
7 71
8 81
9 91
dtype: int64
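A few more head( ) and tail( ) cases, using the same srs: a negative argument to head( ) returns all rows except the last ones, and asking for more rows than exist is not an error.

```python
import pandas as pd

srs = pd.Series(data=range(1, 100, 10), index=range(0, 10))

print(srs.head(3).tolist())    # [1, 11, 21]
print(srs.tail(2).tolist())    # [81, 91]
print(srs.head(-7).tolist())   # [1, 11, 21]: all rows except the last 7
print(len(srs.head(20)))       # 10: only 10 rows exist
```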
Mathematical Operations on Series: It is possible to perform mathematical / arithmetic operations, such as
addition (+), subtraction (-), multiplication (*) and division (/), on series.
Arithmetic operations are performed between elements that have the same index label. If a label is present in
only one of the two series, the result for that label is NaN, so series with completely different indices produce only NaN values.
>>> srs1 >>> srs2 >>> srs3
1 11 1 21 7 31
2 12 2 22 8 32
3 13 3 23 9 33
4 14 4 24 10 34
dtype: int64 dtype: int64 dtype: int64
Now,
>>> srs1+srs2 >>> srs2-srs1 >>> srs1*srs2 >>> srs2/srs1
1 32 1 10 1 231 1 1.909091
2 34 2 10 2 264 2 1.833333
3 36 3 10 3 299 3 1.769231
4 38 4 10 4 336 4 1.714286
dtype: int64 dtype: int64 dtype: int64 dtype: float64
But,
>>> srs1+srs3 >>> srs3-srs2 >>> srs3*srs1 >>> srs2/srs3
1 NaN 1 NaN 1 NaN 1 NaN
2 NaN 2 NaN 2 NaN 2 NaN
3 NaN 3 NaN 3 NaN 3 NaN
4 NaN 4 NaN 4 NaN 4 NaN
7 NaN 7 NaN 7 NaN 7 NaN
8 NaN 8 NaN 8 NaN 8 NaN
9 NaN 9 NaN 9 NaN 9 NaN
10 NaN 10 NaN 10 NaN 10 NaN
dtype: float64 dtype: float64 dtype: float64 dtype: float64
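When NaN is not wanted for unmatched labels, the method form of the operator can be used instead: Series.add( ) (and similarly sub( ), mul( ) and div( )) accepts a fill_value that is used in place of a missing element. A sketch with srs1 and srs3 rebuilt from the example:

```python
import pandas as pd

srs1 = pd.Series([11, 12, 13, 14], index=[1, 2, 3, 4])
srs3 = pd.Series([31, 32, 33, 34], index=[7, 8, 9, 10])

print((srs1 + srs3).isna().all())   # True: no label is shared, so every result is NaN

# A label missing from one series is treated as 0 instead of producing NaN
total = srs1.add(srs3, fill_value=0)
print(total.index.tolist())   # [1, 2, 3, 4, 7, 8, 9, 10]
print(total.tolist())         # the values of srs1 followed by the values of srs3
```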
Vector Operations on Series: It is possible to perform vector operations on series, i.e., arithmetic operations
such as addition (+), subtraction (-), multiplication (*) and division (/) can be performed between a series and a scalar
value (constant). The operation is applied to every element of the series.
Ex:
>>> srs
1 11
2 12
3 13
4 14
dtype: int64
>>> srs+15 >>> 10-srs >>> srs*0.75
1 26 1 -1 1 8.25
2 27 2 -2 2 9.00
3 28 3 -3 3 9.75
4 29 4 -4 4 10.50
dtype: int64 dtype: int64 dtype: float64
>>> 25/srs >>> srs**3 >>> srs>12.5
1 2.272727 1 1331 1 False
2 2.083333 2 1728 2 False
3 1.923077 3 2197 3 True
4 1.785714 4 2744 4 True
dtype: float64 dtype: int64 dtype: bool
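A vector operation gives the same result as applying the operation to each element in a loop, while keeping the index. The sketch below compares the two for srs * 0.75; the loop version is only for comparison, since the vector form is shorter and avoids the explicit loop.

```python
import pandas as pd

srs = pd.Series([11, 12, 13, 14], index=[1, 2, 3, 4])

vector = srs * 0.75                                           # one operation on the whole series
looped = pd.Series([x * 0.75 for x in srs], index=srs.index)  # element by element

print(vector.tolist())          # [8.25, 9.0, 9.75, 10.5]
print(vector.equals(looped))    # True
```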
Retrieving Values using Conditions: A condition built with relational operators can be applied to a series; only the
elements for which the condition is True are displayed, as below
Ex: >>> numsrs=pd.Series([1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66])
>>> numsrs[numsrs<3]
11 1
22 2
dtype: int64
>>> numsrs[numsrs>=4]
44 4
55 5
66 6
dtype: int64
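Conditions can also be combined: & means "and", | means "or", and each condition needs its own parentheses because & and | bind more tightly than comparison operators. Python's own and / or keywords cannot be used here, since a whole series cannot be reduced to a single True or False (pandas raises a ValueError). A sketch using numsrs:

```python
import pandas as pd

numsrs = pd.Series([1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66])

middle = numsrs[(numsrs > 2) & (numsrs < 5)]    # greater than 2 AND less than 5
ends = numsrs[(numsrs == 1) | (numsrs == 6)]    # equal to 1 OR equal to 6

print(middle.index.tolist(), middle.tolist())   # [33, 44] [3, 4]
print(ends.index.tolist(), ends.tolist())       # [11, 66] [1, 6]
```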
Deleting Elements from a Series: An element of a series can be deleted by passing its index to the drop( ) method.
By default, this method does not change the original Series Object (a series is size-immutable); instead, it creates
and returns a new Series Object without that element, which is displayed.
The syntax of using the drop( ) method is as follows
Syntax: <Series_Object>.drop(Index_of_Element)
Ex: >>> primesrs=pd.Series([2, 3, 5, 7, 9, 11, 13])
>>> primesrs
0 2
1 3
2 5
3 7
4 9
5 11
6 13
dtype: int64
>>> primesrs.drop(4)
0 2
1 3
2 5
3 7
5 11
6 13
dtype: int64
>>> primesrs
0 2
1 3
2 5
3 7
4 9
5 11
6 13
dtype: int64
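To keep the result of drop( ), it can be assigned back to the variable (drop( ) also accepts inplace=True, which changes the series itself and returns None). A list of labels removes several elements at once. A sketch with primesrs:

```python
import pandas as pd

primesrs = pd.Series([2, 3, 5, 7, 9, 11, 13])

primesrs = primesrs.drop(4)             # keep the new series without 9 (label 4)
print(primesrs.tolist())                # [2, 3, 5, 7, 11, 13]
print(primesrs.index.tolist())          # [0, 1, 2, 3, 5, 6]: labels are not renumbered

print(primesrs.drop([0, 1]).tolist())   # [5, 7, 11, 13]: several labels at once
```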
Sorting Series Values: The sort_values( ) function can be used to display a sorted Series Object. This function
returns the Series Object sorted in order of its data items, but by default it does not change the original Series Object.
The syntax of using the sort_values( ) method is as follows
Syntax: <Series_Object>.sort_values( )
Ex: >>> srs=pd.Series([18,25,13,90,35])
>>> srs
0 18
1 25
2 13
3 90
4 35
dtype: int64
>>> srs.sort_values()
2 13
0 18
1 25
4 35
3 90
dtype: int64
>>> srs
0 18
1 25
2 13
3 90
4 35
dtype: int64
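sort_values( ) also takes ascending=False for descending order, and the related sort_index( ) method sorts by the index labels instead of the values. A sketch with the same data:

```python
import pandas as pd

srs = pd.Series([18, 25, 13, 90, 35])

desc = srs.sort_values(ascending=False)   # largest value first
print(desc.tolist())                      # [90, 35, 25, 18, 13]
print(desc.index.tolist())                # [3, 4, 1, 0, 2]: each value keeps its label

print(desc.sort_index().tolist())         # [18, 25, 13, 90, 35]: back in label order
```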