Introducing Python Pandas
Introduction
• Pandas (or Python Pandas) is a Python library used for data analysis.
• The term Pandas is derived from “panel data”, an econometrics term for multidimensional, structured data sets.
• Nowadays, Pandas has become a popular option for data analysis.
• Pandas provides various tools for data analysis in a simpler form.
• Pandas is an open-source, BSD-licensed library built for the Python programming language.
• Pandas offers high-performance, easy-to-use data structures and data analysis tools.
• The main author of Pandas is Wes McKinney.
• In this chapter, we will learn about Pandas.
Neha Tyagi, KV5 Jaipur II shift
Installing Pandas
• The “pip” command is used to install Pandas. To run it, open the folder where pip is stored in the Command Prompt (cmd).
• In Windows, go to that folder, then hold Shift and right-click to get the option “Open Command Window Here”. Clicking it opens the Command Prompt at the same path.
Installing Pandas
• The command window opens at the pip location.
• Run the command- “pip install pandas”
• When the command finishes, Pandas is installed successfully.
Using Pandas
• Before proceeding, we first need to import Pandas. The help(pandas) command then gives you all the information about the Pandas module.
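A minimal sketch of the import step, assuming Pandas is already installed. The alias pd is a common convention and not part of the slide.

```python
import pandas as pd  # pd is a conventional alias, not mandatory

# Printing the version is a quick check that the installation worked.
print(pd.__version__)

# help(pd) opens the built-in documentation for the whole module.
# It is left commented out here because the output is very long.
# help(pd)
```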
Features of Pandas
• Pandas is the most popular library in the scientific Python ecosystem for data analysis. Pandas is capable of many tasks, including-
1. It can read or write in many different data formats (integer, float, double, etc.).
2. It can calculate in all the ways data is organized, i.e. across rows and down columns.
3. It can easily select subsets of data from bulky data sets and even combine multiple data sets together.
4. It has functionality to find and fill missing data.
5. It allows you to apply operations to independent groups within the data.
6. It supports reshaping of data into different forms.
7. It supports advanced time-series functionality (time-series forecasting is the use of a model to predict future values based on previously observed values).
8. It supports visualization by integrating libraries such as matplotlib and seaborn.
Pandas is best at handling huge tabular data sets comprising
different data formats.
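Features 3, 4 and 5 from the list above can be sketched on a tiny table. The data, the column names and the variable names are made up for illustration; this is one possible way to use these features, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures; one value is missing (NaN).
sales = pd.DataFrame({
    "city": ["Jaipur", "Delhi", "Jaipur", "Delhi"],
    "amount": [100.0, np.nan, 300.0, 400.0],
})

# Feature 4: find and fill missing data.
missing_count = int(sales["amount"].isna().sum())
sales["amount"] = sales["amount"].fillna(0.0)

# Feature 3: select a subset of the data.
jaipur_rows = sales[sales["city"] == "Jaipur"]

# Feature 5: apply an operation to independent groups.
totals = sales.groupby("city")["amount"].sum()
print(totals)
```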
NumPy Arrays
• Before proceeding to Pandas’ data structures, let us briefly review NumPy arrays because-
1. Some Pandas functions return their result in the form of a NumPy array.
2. It will give you a jump start with data structures.
• NumPy (“Numerical Python” or “Numeric Python”) is an open-source module of Python that provides functions for fast mathematical computation on arrays and matrices.
• To use NumPy, you need to import it. The syntax for that is-
>>> import numpy as np
(here np is an alias for numpy, which is optional)
[Screenshot: the difference between a list and an array]
• NumPy arrays come in two forms-
• 1-D arrays – also known as vectors.
• Multidimensional arrays – also known as matrices.
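A small sketch of the difference between a list and an array, and of the two forms of array. The variable names are only for illustration.

```python
import numpy as np  # the alias np is optional

marks_list = [10, 20, 30]          # a plain Python list
marks_arr = np.array(marks_list)   # a 1-D NumPy array (vector)

# Multiplying a list repeats it; multiplying an array scales each element.
print(marks_list * 2)   # [10, 20, 30, 10, 20, 30]
print(marks_arr * 2)    # [20 40 60]

# A list of lists becomes a multidimensional array (matrix).
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(marks_arr.ndim, matrix.ndim)  # 1 2
```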
2D NumPy Arrays
• A 2D array is created with the help of a list.
• Array elements are accessed with an index.
• Printing of the array.
• To see the type of the array.
• To see the shape of the array.
• NumPy arrays are also known as ndarray (n-dimensional array).
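The steps on this slide could look like the following sketch. The values in the array are made up, since the original screenshot is not available.

```python
import numpy as np

# A 2D array created with the help of a list of lists.
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)          # printing the array
print(arr[1, 2])    # accessing an element by row and column index: 6
print(type(arr))    # <class 'numpy.ndarray'>
print(arr.shape)    # (2, 3): 2 rows, 3 columns
```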
NumPy Arrays vs Python Lists
• Although a NumPy array holds elements like a Python list, NumPy arrays are a different data structure from Python lists. The key differences are-
• Once a NumPy array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.
• A NumPy array contains elements of a homogeneous type, unlike Python lists.
• An equivalent NumPy array occupies much less space than a Python list.
• NumPy arrays support vectorized operations, i.e. an operation is applied to every item of the array at once, so you do not need to process the items one by one, which is not the case with lists. An arithmetic expression that generates an error on a list will be executed on an array.
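A short sketch of the last two differences. Adding a number is used here as one example of an expression that fails on a list; the names are illustrative.

```python
import numpy as np

prices_list = [10, 20, 30]
prices_arr = np.array(prices_list)

# On an array, the addition is applied to every element at once.
print(prices_arr + 5)   # [15 25 35]

# On a list, the same expression raises a TypeError.
try:
    prices_list + 5
except TypeError as exc:
    list_error = exc
    print("List error:", exc)

# Arrays hold one element type, so mixed input is converted to a common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)      # float64
```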
NumPy Data Types
NumPy supports the following data types-
Ways to Create NumPy Arrays
• The empty() function can be used to create an empty (uninitialized) array of a specified shape and dtype.
numpy.empty(shape, [dtype=<datatype>,] [order='C' or 'F'])
Where:
shape: the dimensions of the array.
dtype: the Python or NumPy data type of the array elements.
order: 'C' means the data is arranged row-wise (C means C-like); 'F' means the data is arranged column-wise (F means Fortran-like).
[Screenshots: in one example the array is all zeros; in the other the array is all garbage values.]
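A minimal sketch of empty(). The zeros() call is not from the slide; it is added only to contrast an uninitialized array with one that is guaranteed to hold zeros.

```python
import numpy as np

# empty() only reserves memory; the contents are whatever was there before.
garbage = np.empty((2, 3), dtype=np.int32, order="C")
print(garbage)          # arbitrary values, may differ from run to run

# zeros() is the function to use when actual zeros are required.
zeros = np.zeros((2, 3), dtype=np.int32)
print(zeros)
```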
Ways to Create NumPy Arrays
1. The arange() function is used to create an array from a range.
<arrayname> = numpy.arange([start,] stop, [step,] [dtype])
Examples: passing only the stop value; values from 1 to 7 at a step of 2 (7 itself is excluded).
2. The linspace() function can be used to prepare an array of evenly spaced values over a range.
<arrayname> = numpy.linspace(start, stop, [num,] [dtype])
Examples: an array of 6 values between 2 and 3; an array of 8 values between 2.5 and 8.
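The examples described on this slide can be reproduced with the sketch below. The variable names are illustrative.

```python
import numpy as np

a1 = np.arange(7)          # only stop is passed: 0, 1, ..., 6
a2 = np.arange(1, 7, 2)    # from 1 up to (not including) 7 in steps of 2
print(a1)   # [0 1 2 3 4 5 6]
print(a2)   # [1 3 5]

# linspace() takes the number of values instead of a step,
# and includes both end points by default.
l1 = np.linspace(2, 3, 6)      # 6 values between 2 and 3
l2 = np.linspace(2.5, 8, 8)    # 8 values between 2.5 and 8
print(l1)
print(l2)
```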
Pandas Data Structure
A data structure is a particular way of storing and organizing data in a computer so that it can be accessed and worked with in appropriate ways. For example-
- If you want to store similar types of data items together and process them in an identical way, an array is the solution.
- If you want to store data in such a way that you get access to the very last data item you inserted, a stack is the solution.
- If you want to store data in such a way that the data item inserted first gets accessed first, a queue is the solution.
There are many other types of data structures suited to different types of functionality.
Next, we will learn about the Series and DataFrame data structures of Pandas.
Series Data Structure
– Series is a data structure of pandas. It represents a
1D array of indexed data.
– It has two main components-
• An array of actual data.
• An associated array of indexes or data labels.
– Both components are 1D arrays with the same length.
Examples of series type objects:
Index  Data      Index  Data      Index  Data
0      21        Jan    31        ‘A’    91
1      23        Feb    28        ‘B’    81
2      18        Mar    31        ‘C’    71
3      25        Apr    30        ‘D’    61
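The three example series in the table could be built as in this sketch. The names s1, s2 and s3 are illustrative.

```python
import pandas as pd

s1 = pd.Series([21, 23, 18, 25])   # default index 0, 1, 2, 3
s2 = pd.Series([31, 28, 31, 30], index=["Jan", "Feb", "Mar", "Apr"])
s3 = pd.Series([91, 81, 71, 61], index=["A", "B", "C", "D"])

print(s2)
print(s2.index)    # the array of labels
print(s2.values)   # the array of actual data
```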
Creation of Series Objects
– There are many ways to create a series type object.
1. Using Series()-
<Series Object> = pandas.Series() creates an empty series.
2. Non-empty series creation-
import pandas as pd
<Series Object> = pd.Series(data, index=idx)
where data can be a Python sequence, an ndarray, a Python dictionary or a scalar value, and idx supplies the index labels.
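A minimal sketch of both forms. The dtype argument on the empty series is an addition made here, because some Pandas versions warn when an empty series is created without one.

```python
import numpy as np
import pandas as pd

# An empty series; dtype is given explicitly to avoid a version-dependent warning.
empty_series = pd.Series(dtype="float64")

# From a Python sequence, with the default index 0, 1, 2, ...
from_list = pd.Series([3, 5, 7])

# From an ndarray, with an explicit index.
from_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["x", "y", "z"])

print(empty_series.empty)   # True
print(from_list)
print(from_array)
```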
Series Objects creation
1. Creation of a series with a dictionary- the keys of the dictionary become the index.
2. Creation of a series with a scalar value- the value is repeated for every index label.
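Both cases in a short sketch, with made-up data:

```python
import pandas as pd

# Dictionary: the keys become the index, the values become the data.
days = pd.Series({"Jan": 31, "Feb": 28, "Mar": 31})
print(days)

# Scalar: the single value is repeated for every index label.
tens = pd.Series(10, index=["a", "b", "c"])
print(tens)
```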
Creation of Series Objects – Additional functionality
1. When a series must be created with missing values, the missing data can be filled with a NaN (“Not a Number”) value.
2. The index can also be generated with a loop.
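A sketch of both points. The list comprehension is one possible form of the loop; the exact loop used on the original slide is not known.

```python
import numpy as np
import pandas as pd

# np.nan marks the missing entry.
readings = pd.Series([7.5, np.nan, 3.2])
print(readings)

# The index is produced by a loop (here, a list comprehension).
labelled = pd.Series([10, 20, 30], index=[ch for ch in "abc"])
print(labelled)
```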
Creation of Series Objects – Additional functionality
3. A dtype can also be passed along with the data and index.
Important: it is not necessary to have unique indices.
4. A mathematical function or expression can also be used to supply the data.
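Points 3 and 4, and the note on repeated indices, might look like this sketch with illustrative values:

```python
import numpy as np
import pandas as pd

# dtype passed along with data and index.
as_float = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype=np.float64)

# Index labels may repeat.
repeated = pd.Series([10, 20, 30], index=["a", "a", "b"])
print(repeated["a"])   # both rows labelled "a"

# A mathematical expression supplies the data.
base = np.arange(1, 5)                 # 1, 2, 3, 4
squares = pd.Series(base ** 2, index=base)
print(squares)
```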
Series Object Attributes
Some common attributes-
<series object>.<AttributeName>
Attribute         Description
Series.index      Returns the index of the series
Series.values     Returns the data as an ndarray
Series.dtype      Returns the dtype object of the underlying data
Series.shape      Returns a tuple of the shape of the underlying data
Series.nbytes     Returns the number of bytes of the underlying data
Series.ndim       Returns the number of dimensions
Series.size       Returns the number of elements
Series.itemsize   Returns the size of the dtype (removed in pandas 1.0; use Series.dtype.itemsize instead)
Series.hasnans    Returns True if there are any NaN values
Series.empty      Returns True if the series object is empty
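The attributes can be tried on a small series, as in this sketch. Series.itemsize is left out because it is not available in current Pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan, 3.0, 4.5], index=["a", "b", "c", "d"])

print(s.index)     # the index labels
print(s.values)    # the data as an ndarray
print(s.dtype)     # float64
print(s.shape)     # (4,)
print(s.nbytes)    # 32 (4 elements x 8 bytes each)
print(s.ndim)      # 1
print(s.size)      # 4
print(s.hasnans)   # True
print(s.empty)     # False
```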
Accessing Series Object
• Printing the object's values
• Printing an individual value
• Object slicing
For object slicing, use the following syntax-
<objectName>[<start>:<stop>:<step>]
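A sketch of the three kinds of access on a made-up series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

print(s)          # printing the whole object
print(s["c"])     # printing an individual value by label: 30
print(s[1:4])     # slicing by position: rows b, c, d
print(s[::2])     # every second row: a, c, e
```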
Operations on Series Object
1. Element modification-
<series object>[index] = <new_data_value>
• To change an individual value
• To change the values in a certain slice
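Both kinds of modification in a short sketch, with illustrative values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

s["b"] = 25        # change an individual value
s[2:4] = 0         # change every value in a slice (rows c and d)
print(s)
```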
Operations on Series Object
2. It is possible to change the index-
<series object>.index = <new_index_array>
Here, the index labels are replaced by the new ones.
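A minimal sketch of replacing the index, with made-up labels:

```python
import pandas as pd

s = pd.Series([10, 20, 30])
print(s.index.tolist())   # [0, 1, 2]

# The new index must have as many labels as the series has values.
s.index = ["x", "y", "z"]
print(s.index.tolist())   # ['x', 'y', 'z']
```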
head() and tail() Functions
1. The head(<n>) function fetches the first n rows from a Pandas object. If you do not provide a value for n, it returns the first 5 rows.
2. The tail(<n>) function fetches the last n rows from a Pandas object. If you do not provide a value for n, it returns the last 5 rows.
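Both functions on a series of ten values, as a quick sketch:

```python
import pandas as pd

s = pd.Series(range(1, 11))   # the values 1 to 10

print(s.head())     # first 5 rows (default)
print(s.head(3))    # first 3 rows
print(s.tail())     # last 5 rows (default)
print(s.tail(2))    # last 2 rows
```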
Series Objects - Vector Operations
• All these are vector operations.
Series Objects - Arithmetic Operations
• Arithmetic operations are possible on objects with the same index; otherwise the result is NaN.
• We can also store these results in other objects.
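A sketch of vector and arithmetic operations on two made-up series whose indexes only partly match:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Vector operations act on every element at once.
print(s1 + 2)
print(s1 * 3)
print(s1 > 1)

# Arithmetic between two series is matched by index label;
# labels found in only one series ("a" and "d") give NaN.
total = s1 + s2      # the result is stored in another object
print(total)
```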
Entries Filtering
<seriesObject>[<boolean expression on series>]
Other feature-
• To delete the value at an index
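A sketch of filtering and deletion. The use of drop() is an assumption; the slide does not show which method it used to delete a value.

```python
import pandas as pd

s = pd.Series([5, 15, 25, 35], index=["a", "b", "c", "d"])

# s > 10 is a boolean series; indexing with it keeps only the True rows.
filtered = s[s > 10]
print(filtered)

# drop() returns a new series without the given index label;
# the original series s is left unchanged.
without_b = s.drop("b")
print(without_b)
```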
Difference between NumPy Arrays and Series Objects
1. In the case of ndarrays, a vector operation is possible only when the ndarrays have the same shape (or shapes that NumPy can broadcast together). In the case of series objects, the operation is aligned on matching index labels; where the labels do not match, NaN is returned.
2. In an ndarray, the index always starts from 0 and is always numeric. In a series, the index can be of any type, including numbers, and does not have to start from 0.
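Both differences can be seen in this sketch, which uses illustrative values:

```python
import numpy as np
import pandas as pd

# ndarrays of different lengths cannot be added.
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as exc:
    shape_error = exc
    print("ndarray error:", exc)

# Series of different lengths are aligned on their index labels instead.
s1 = pd.Series([1, 2, 3], index=[10, 20, 30])   # index need not start at 0
s2 = pd.Series([1, 2], index=[20, 40])
result = s1 + s2
print(result)   # only label 20 exists in both, so the other rows are NaN
```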
Thank you
Please follow us on our blog
www.pythontrends.wordpress.com
Neha Tyagi, KV 5 Jaipur II Shift