Python Pandas for Data Science

Pandas is an open source Python library used for data analysis and manipulation. It provides data structures like Series and DataFrames that make working with structured data easy. A Series is a one-dimensional array-like object that stores data and associated array labels. A Series can be created from lists, arrays, constants, and dictionaries. Values in a Series can be accessed using indexing or slicing and various attributes provide information about the Series.

Uploaded by

Adithyan R Nair

Python Pandas

Pandas: the name is derived from "panel data"


• Data science or data analytics: the process of analyzing a large set of data to answer questions about that data.
• Data life cycle
   • Data warehousing: the data collected from various sources arrives in different formats, such as CSV, HTML or Excel files. This data is converted to a common format and stored.
   • Data analysis: the stored data is analyzed using techniques like join, merge and search.
   • Data visualization: after analysis, the data is plotted as a graph.
Pandas - why and what?
Data analysis can be performed easily and effectively in Python. The Python library Pandas is one of the most effective tools in the field of data science. Pandas makes importing and analyzing data easier.

Pandas is an open source library that provides high-performance, easy-to-use data structures and data analysis tools.
It is built on top of NumPy and integrates with Matplotlib for plotting.

Data structures in Pandas


A data structure is a way of storing and organizing data so that it can be accessed quickly and worked with in an appropriate way, for example for retrieval, deletion and modification.
1. Series
2. Data frames
3. Panels (deprecated, and removed from pandas in version 0.25)
Python Libraries
Python libraries are sets of useful functions that eliminate the need to write code from scratch.
There are over 137,000 Python libraries available today.
Python libraries play a vital role in developing machine learning, data science, data visualization, image and data manipulation applications, and more.
E.g. NumPy, Pandas, Matplotlib, SciPy etc.
Modules:
A module is a file containing Python statements, functions, classes and variables related to a particular task.
For example, a file named exampleprog.py is a module, and its module name is exampleprog.
We use modules to break large programs down into small, manageable and organized files.

Package: A package is a collection of Python modules, i.e. a directory that contains Python modules.

Importing Python Modules


The import statement is used to import a module into our program so that its functions and statements can be reused.
E.g. import pandas
A module can also be referred to by an alias. An alias is created for a module with the keyword as.
E.g. import pandas as pd
Here pd is the alias for pandas and can be used instead of the module name in the current program.
Series
A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list, string, etc.).
Data in a series is mutable (it can be changed).
A series consists of a sequence of values and associated data labels called the index.
When printed, a series displays two columns: the first is the index and the second is the data.
If the index for a series is not specified, Pandas creates a default index and automatically assigns the index values 0, 1, 2, …, length-1.
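The points above can be seen in a short sketch. The series name marks and its values are made up for illustration.

```python
import pandas as pd

# A Series built without an explicit index gets the default labels 0..length-1.
marks = pd.Series([78, 91, 64])

print(list(marks.index))   # the labels assigned by pandas
print(list(marks.values))  # the stored data

# Series data is mutable: assigning through a label changes the stored value.
marks[1] = 95
print(marks)
```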
Creating a series
To create a series, the Pandas package must first be imported:
import pandas as pd
Create an empty series by calling the function Series() without any data.
Syntax:
<series_name> = pd.Series()
S1 = pd.Series()

1. Creating a series using a list


import pandas
E.g. S1 = pandas.Series([10, 20, 30, 40])
S2 = pandas.Series([100, 500, 1000], index=['KL', 'TN', 'MR'])
If the series is created without an index, the default index is printed in the first column.
Index  Data
0      10
1      20
2      30
3      40

If an index is specified for a series, the specified index is printed as the first column.
Index  Data
KL     100
TN     500
MR     1000

2.Creating a series using a range ()


import pandas as pd
s1=pd.Series(range(5))
print (s1)
0 0
1 1
2 2
3 3
4 4
If one of the elements in a numeric series is a float value, the rest of the elements are converted to float values as well.
E.g. import pandas as pd
s1= pd.Series ([12, 3.4, 7, 2])
print (s1)
0 12.0
1 3.4
2 7.0
3 2.0
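One way to confirm the conversion is to inspect the dtype attribute of the series. This is a minimal sketch; the names ints and mixed are illustrative.

```python
import pandas as pd

ints = pd.Series([12, 7, 2])         # only integers
mixed = pd.Series([12, 3.4, 7, 2])   # one float among integers

print(ints.dtype)    # an integer dtype
print(mixed.dtype)   # float64, because every element was converted to float
```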
3. Creating a series from a constant or scalar value
(i) import pandas as pd
series2 = pd.Series(55)
print(series2)
output
0 55
(ii) import pandas as pd
series2 = pd.Series(55, index=['a', 'b', 'c'])
print(series2)
output
a 55
b 55
c 55
(iii) import pandas as xy
x = xy.Series('welcome to Bhavans', index=['Ann', 'Dain', 'Meenu'])
print(x)
output
Ann welcome to Bhavans
Dain welcome to Bhavans
Meenu welcome to Bhavans

4. Creating a series from a dictionary


A Python dictionary holds key: value pairs, and a value can be retrieved quickly when its key is known.
Dictionary keys can be used to construct the index of a Series.
When a dictionary is used to create a series, the keys of the dictionary become the index and the values become the data in the series.
E.g. import pandas as dd
d = {1: 'a', 2: 'b'}   # named d rather than dict, so the built-in dict type is not shadowed
ds = dd.Series(d)
print(ds)
output
1 a
2 b
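As a further sketch with string keys, a value can then be read back through its key just as in the dictionary. The dictionary prices and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical data: fruit names as keys, prices as values.
prices = pd.Series({"apple": 120, "banana": 40, "mango": 90})

print(list(prices.index))   # the dictionary keys, in insertion order
print(prices["banana"])     # look up a value by its key (label)
```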
Q1: Create a series that stores the name (as index) and
area (as value) of some states (using dictionaries).
Q2: Create a series that accepts the name (as index) and marks (as value) of 5 students (using lists) and display the details of the students.

Q3: Write a program to create a series that accepts the salary (as value) earned by 3 employees (name as index) for a month (using lists).
NumPy (Numerical Python)

The NumPy library is a popular Python library used for scientific computing applications.
NumPy provides functions for fast mathematical and logical operations on arrays and matrices.
An array is a collection of elements of the same data type.
When an array is printed, its elements are enclosed in square brackets and separated by spaces.
The elements of an array are stored in contiguous memory locations.
Each element in an array is referred to by its index number. The index numbers of an array start at zero.
E.g. [1 2 3 4 5]
Arrays can be single-dimensional or multi-dimensional (ndarrays).
To create an array, the numpy module must be imported into our program.
NumPy provides a set of functions/methods with which we can create arrays.
1. array() - This function creates an array from a list; a flat list gives a one-dimensional array.
import numpy as np
lst1=[10,20,30,40]
arr=np.array(lst1)
print(arr)
2. arange() - This function creates a one-dimensional array containing values within the given range (start, stop, step); the stop value itself is excluded.
import numpy as np
arr=np.arange(1,5,1)
print(arr)
output
[1 2 3 4]
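The array properties listed above (zero-based index, a single data type, fast element-wise operations) can be tried out with a short sketch. The values are chosen only for illustration.

```python
import numpy as np

arr = np.arange(10, 60, 10)   # 10, 20, 30, 40, 50; the stop value 60 is excluded

print(arr[0])      # first element: index numbers start at zero
print(arr * 2)     # arithmetic is applied to every element at once
print(arr.dtype)   # all elements share one data type
```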

5. Creating a series from an ndarray


import pandas as pd
import numpy as np
a1=np.arange(1,10,2)
s1=pd.Series(a1)
print(s1)
output
0 1
1 3
2 5
3 7
4 9
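An index can also be supplied when building a series from an ndarray, provided it has exactly one label per value. The sketch below uses made-up labels and shows the ValueError pandas raises when the lengths differ.

```python
import numpy as np
import pandas as pd

odd = np.arange(1, 10, 2)                  # five values: 1, 3, 5, 7, 9
s = pd.Series(odd, index=list("abcde"))    # five labels for five values
print(s)

error_raised = False
try:
    pd.Series(odd, index=["a", "b"])       # only two labels for five values
except ValueError as err:
    error_raised = True
    print("ValueError:", err)
```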

Accessing data from a series


A) Indexing

Indexing is used to access elements in a Series.
Indexes are of two types: positional index and labelled index.
A positional index takes an integer value that corresponds to the element's position in the series, starting from 0, whereas a labelled index takes any user-defined label.
E.g.

1. import pandas as pd
s1 = pd.Series([1, 2, 3, 4, 7, 2])
print(s1[2])
Output
3
2. s1 = pd.Series([11, 12, 13, 14, 17], index=['a', 'b', 'c', 'd', 'e'])
print(s1['a'])
It prints the value corresponding to the labelled index 'a'.
o/p
11

More than one element of a series can be accessed using a list of positional integers or a list of index labels, as shown in the following examples:
(i) print(s1.iloc[[1, 2]]) -> 12, 13
(ii) print(s1[['a', 'd']]) -> 11, 14
Note: on a series with a non-integer index, plain s1[[1, 2]] is treated as positional only by older pandas versions; recent versions deprecate this, so iloc is the reliable way to select by position.
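The two kinds of index can be kept apart explicitly with the loc (label) and iloc (position) accessors. The sketch below reuses the series s1 from example 2.

```python
import pandas as pd

s1 = pd.Series([11, 12, 13, 14, 17], index=["a", "b", "c", "d", "e"])

print(s1.loc["a"])          # labelled index: value stored under label a
print(s1.iloc[0])           # positional index: value at position 0
print(s1.loc[["a", "d"]])   # list of labels
print(s1.iloc[[1, 2]])      # list of positions
```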
(B) Slicing

Sometimes we need to extract a part of a series. This can be done through slicing.
We define which part of the series is to be sliced by specifying the start and end parameters [start:end] after the series name.
When positional indices are used for slicing, the value at the end index position is excluded.
Example (the notes do not define p2; the values below are assumed so that they match the outputs shown):
p2 = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])
print(p2[1:3])
output
b 22
c 33
If labelled indexes are used for slicing, the value at the end index label is also included in the output.
Example:
print(p2['a':'c'])
output
a 11
b 22
c 33
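The difference between the two slicing rules is easiest to see side by side. This sketch assumes a four-element sample series, since the full contents of p2 are not given in the notes.

```python
import pandas as pd

# Assumed sample data, chosen to be consistent with the outputs shown above.
p2 = pd.Series([11, 22, 33, 44], index=["a", "b", "c", "d"])

by_position = p2.iloc[1:3]    # positions 1 and 2; position 3 is excluded
by_label = p2.loc["b":"d"]    # labels b, c and d; the end label is included

print(by_position)
print(by_label)
```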

Write the output

import pandas as pa
list1 = ["Ann", "John", "Denson", "Lalu", "Rahul"]
list2 = [11, 22, 33, 44, 55]
s1 = pa.Series(list1, index=list2)
print(s1[11])
print(s1[[22, 44]])
print(s1[0:3])
print(s1[1:3])
print(s1[-3:-1])
print(s1[[22, 33]])

Attributes of Pandas Series

A Series attribute is any piece of information related to the Series object, such as its size or index. Below are some of the attributes that you can use to get information about a Series object:
1. values
Returns the values in the series as a NumPy array.
s1 = pd.Series([1, 2, 3, 4, 5])
print(s1.values)
2. size
Returns the number of values in the Series object.
print(s1.size)
3. empty
Returns True if the series is empty, and False otherwise.
print(s1.empty)
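A combined sketch of these attributes follows. It also prints index and dtype, two further Series attributes that the list above does not cover, and builds an empty series (named nothing here) to show empty returning True.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
nothing = pd.Series([], dtype="float64")   # an empty series with an explicit dtype

print(s1.values)       # the data as a NumPy array
print(s1.size)         # number of values
print(s1.empty)        # False: the series holds data
print(nothing.empty)   # True: no values at all
print(s1.index)        # the index labels
print(s1.dtype)        # the data type of the values
```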

Retrieving values from a series using functions


1. Series.head() -
Returns the first n members of the series.
If a value for n is not passed, n defaults to 5 and the first five members are returned.
Syntax:
<series_name>.head([argument])
E.g. import pandas as pd
list1 = ["Ann", "Johan", "Don", "Lalu", "Rahul", "Mohan"]
list2 = [11, 22, 33, 44, 55, 66]
s1 = pd.Series(list1, index=list2)
print(s1.head())
o/p ->
11 Ann
22 Johan
33 Don
44 Lalu
55 Rahul
head() with an argument returns the specified number of rows from the beginning.
E.g. import pandas as a1
se = a1.Series(["Anu", "binu", "cinu", "ddd", "eee"], index=[11, 22, 33, 44, 55])
print(se.head(2))
o/p ->
11 Anu
22 binu
2. Series.tail() - this function fetches the last n elements of a series. If the argument is not passed, the last five elements are retrieved by default.
Syntax:
<series_name>.tail([argument])
E.g. import pandas as a1
se = a1.Series(["Anu", "binu", "cinu", "ram", "rose"], index=[11, 22, 33, 44, 55])
print(se.tail())
o/p
11 Anu
22 binu
33 cinu
44 ram
55 rose

print(se.tail(3))
o/p
33 cinu
44 ram
55 rose
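The defaults of head() and tail() are easier to see on a series with more than five members. This is a small sketch with made-up numbers; it also shows that a series shorter than n is simply returned whole.

```python
import pandas as pd

numbers = pd.Series(range(10, 110, 10))   # ten values: 10, 20, ..., 100

first_five = numbers.head()        # n defaults to 5
last_three = numbers.tail(3)       # the last three members
short = pd.Series([1, 2]).head()   # fewer than five members: all are returned

print(first_five.tolist())
print(last_three.tolist())
print(short.tolist())
```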
Mathematical Operations on Series
We can perform mathematical operations on two series in Pandas.
While performing mathematical operations on series, index matching is applied: values are combined label by label.
A) Addition of two Series
Method 1
E.g. import pandas as pd
se1 = pd.Series([10, 20, 30, 40])
se2 = pd.Series([1, 2, 3, 4])
print(se1 + se2)
o/p
0 11
1 22
2 33
3 44
Labels are matched between the two series; any label that is present in only one of them produces NaN (Not a Number) in the result. To get a value for every element, both series must have the same index.

E.g. import pandas as pd
se1 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
se2 = pd.Series([1, 2, 3, 4])
print(se1 + se2)
o/p
a NaN
b NaN
c NaN
d NaN
0 NaN
1 NaN
2 NaN
3 NaN

Method 2

This method is applied when we do not want NaN values in the output.
We can use the series method add() with the parameter fill_value, which supplies a value for any label that is missing from one of the two series.
Example
se1 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
se2 = pd.Series([1, 2, 3, 4])
print(se1.add(se2, fill_value=0))
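The effect of fill_value is clearest when the two indexes overlap only partly. The sketch below uses two small made-up series, x and y, that share the labels b and c.

```python
import pandas as pd

x = pd.Series([10, 20, 30], index=["a", "b", "c"])
y = pd.Series([1, 2, 3], index=["b", "c", "d"])

plain = x + y                     # a and d exist in only one series, so they give NaN
filled = x.add(y, fill_value=0)   # the missing side is treated as 0 instead

print(plain)
print(filled)
```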
(B) Subtraction of two Series
Method 1
import pandas as pd
se1=pd.Series ([10, 20, 30, 40], index= ['a','b','c','d'])
se2=pd.Series ([1, 2, 3, 4])
print (se1- se2)
o/p
a NaN
b NaN
c NaN
d NaN
0 NaN
1 NaN
2 NaN
3 NaN
Method 2

se1 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
se2 = pd.Series([1, 2, 3, 4])
print(se1.sub(se2, fill_value=0))
(C) Multiplication of two Series
Method 1

import pandas as pd
se1=pd.Series ([10, 20, 30, 40], index= ['a','b','c','d'])
se2=pd.Series ([1, 2, 3, 4])
print (se1 * se2)
o/p
a NaN
b NaN
c NaN
d NaN
0 NaN
1 NaN
2 NaN
3 NaN

Method 2
se1=pd.Series ([10, 20, 30, 40], index= ['a','b','c','d'])
se2=pd.Series ([1, 2, 3, 4])
print(se1.mul(se2, fill_value=0))
(D) Division of two Series

Method 1
import pandas as pd
se1=pd.Series ([10, 20, 30, 40], index= ['a','b','c','d'])
se2=pd.Series ([1, 2, 3, 4])
print (se1 / se2)
o/p
a NaN
b NaN
c NaN
d NaN
0 NaN
1 NaN
2 NaN
3 NaN

Method 2
se1=pd.Series ([10, 20, 30, 40], index= ['a','b','c','d'])
se2=pd.Series ([1, 2, 3, 4])
print(se1.div(se2, fill_value=0))
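With division, the choice of fill_value needs some care: a label missing from the divisor is divided by the fill value, so fill_value=0 produces inf (infinity) for it. The following sketch uses two made-up series; fill_value=1 is shown only as one possible alternative.

```python
import pandas as pd

x = pd.Series([10, 20, 30], index=["a", "b", "c"])
y = pd.Series([1, 2, 3], index=["b", "c", "d"])

by_zero = x.div(y, fill_value=0)   # label a is divided by the fill value 0
by_one = x.div(y, fill_value=1)    # label a is divided by 1 instead

print(by_zero)
print(by_one)
```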

Assignments
1. Create a series that stores the strength of 3 divisions
of XII std. as data and label each data with class
name. Print the series.

XII A 36
XII B 30
XII C 35
2. Create a series that stores the names of five of your
friends as data and their roll numbers as data labels
or indexes. Print the series.
3. Create a series that stores the names of the class teachers of std XII as data and their short forms as labels. Print the name of the class teacher of XII B.
XII A - Bindu - BV
XII B - Abhilash - AGN
XII C - Manju - MB
4. Create a series from a dictionary that stores the basic colours as values and their codes as keys (red - 'R', blue - 'B', green - 'G'). Print the series.
5. Write the output of the following
import pandas as pd
se1=pd.Series([10,20,30,40])
print(se1*2)
print(se1.head(2))
print(se1.tail())
print(se1[3])
se1=se1*3
print(se1)
se1[2]=200
print(se1)

sort_values() - this function sorts the values of a series in ascending or descending order and returns the result as a new series.
Syntax:
<series_name>.sort_values([ascending=False])
By default the sorting is done in ascending order.
Example 1

import pandas as pd
S1= pd.Series([23,65,78,89,11,21])
S1.sort_values()
O/p
4 11
5 21
0 23
1 65
2 78
3 89
To Sort the values in descending order use the following
syntax:
Seriesname.sort_values(ascending=False)
Example
S1.sort_values(ascending=False)
O/P
3 89
2 78
1 65
0 23
5 21
4 11
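Note that each value keeps its original index label after sorting, and that the original series is left as it was. A short sketch using the same data:

```python
import pandas as pd

S1 = pd.Series([23, 65, 78, 89, 11, 21])
ordered = S1.sort_values()    # a new, sorted series

print(S1.tolist())            # the original order is unchanged
print(ordered.tolist())       # values in ascending order
print(ordered.index.tolist()) # each value still carries its original label
```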

Retrieving values from a series using conditions


Values can be retrieved from a series based on conditions.
E.g. import pandas as ps
S1 = ps.Series([10, 20, 30, 40])
print(S1 < 30)   # displays True/False for each value, depending on the given condition
o/p ->
0 True
1 True
2 False
3 False

print(S1[S1 < 30])   # displays only those values for which the condition is True
o/p ->
0 10
1 20
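Conditions can also be combined. In pandas this is done with & (and) and | (or), with each condition placed in parentheses. A minimal sketch on the same data:

```python
import pandas as pd

S1 = pd.Series([10, 20, 30, 40])

in_between = S1[(S1 > 10) & (S1 < 40)]   # both conditions must hold
extremes = S1[(S1 == 10) | (S1 == 40)]   # either condition may hold

print(in_between)
print(extremes)
```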

Deleting elements from a Series


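We can delete an element from a Series using the drop() method, passing the index label of the element to be deleted. drop() returns a new Series; the original is unchanged unless the result is assigned back.
E.g. se1 = pd.Series([10, 20, 30, 40])
se1 = se1.drop(1)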
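The sketch below shows what drop() gives back and what happens to the original series. The name trimmed is illustrative.

```python
import pandas as pd

se1 = pd.Series([10, 20, 30, 40])
trimmed = se1.drop(1)          # a new series without the element labelled 1

print(se1.size)                # the original still has four elements
print(trimmed)                 # labels 0, 2 and 3 remain
print(se1.drop([0, 3]))        # a list of labels drops several elements at once
```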

Assignment

1. Write a program to accept n elements into a series using a list and print the three greatest values.
2. Accept a series that stores the area of some states. Write code to find the three biggest and the three smallest areas in the given series.
