Pandas

The document introduces Python Pandas, a popular open-source library for data analysis that provides high-performance tools and data structures. It covers installation, features, and the creation of Series and DataFrame objects, as well as comparisons with NumPy arrays. The document also discusses various operations and attributes associated with Series objects.

Introducing Python Pandas

Introduction
• Pandas (or Python Pandas) is a Python library used for data analysis.
• The term Pandas is derived from “panel data”, an econometrics term for multidimensional, structured data sets.
• Nowadays, Pandas has become a popular option for data analysis.
• Pandas provides various tools that make data analysis simpler.
• Pandas is an open-source, BSD-licensed library built for the Python programming language.
• Pandas offers high-performance, easy-to-use data structures and data analysis tools.
• The main author of Pandas is Wes McKinney.
• In this chapter, we will learn about Pandas.

Neha Tyagi, KV5 Jaipur II shift


Installing Pandas
• The “pip” command is used to install Pandas. To run it, open the command prompt (cmd) at the location where the pip file is stored.
• In Windows, go to that location, then hold Shift and right-click to get the option “Open Command Window Here”. Clicking it opens the command prompt at the same path.



Installing Pandas
• In the command window, run the command “pip install pandas”.
• When the command completes, Pandas has been installed successfully.

Using Pandas
• Before proceeding, we first need to import Pandas.
• The help(pandas) command gives you all the information about the Pandas module.
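The import step can be sketched as follows. The alias pd is a common convention rather than a requirement, and printing the version is only one possible way to confirm that the import worked.

```python
import pandas as pd  # "pd" is the conventional alias; any name would work

# The version string confirms that the import succeeded.
version = pd.__version__
print("pandas version:", version)

# help(pd) prints the full module documentation (very long),
# so it is left commented out here.
# help(pd)
```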



Features of Pandas
• Pandas is the most popular library in the scientific Python ecosystem for data analysis. Pandas is capable of many tasks, including-
1. It can read or write many different data formats (integer, float, double, etc.).
2. It can perform calculations in every way the data is organized.
3. It can easily select subsets of data from bulky data sets and even combine multiple data sets together.
4. It has functionality to find and fill missing data.
5. It allows you to apply operations to independent groups within the data.
6. It supports reshaping of data into different forms.
7. It supports advanced time-series functionality (time-series forecasting is the use of a model to predict future values based on previously observed values).
8. It supports visualization by integrating with libraries such as matplotlib and seaborn.
Pandas is best at handling huge tabular data sets comprising different data formats.



NumPy Arrays
• Before proceeding to Pandas’ data structures, let us briefly review NumPy arrays because-
1. Some Pandas functions return their results as NumPy arrays.
2. It will give you a jumpstart with data structures.
• NumPy (“Numerical Python” or “Numeric Python”) is an open-source module of Python that provides functions for fast mathematical computation on arrays and matrices.
• To use NumPy, you need to import it. The syntax is-
>>> import numpy as np
(here np is an alias for numpy, which is optional)
• NumPy arrays come in two forms-
• 1-D arrays – also known as vectors.
• Multidimensional arrays – also known as matrices.
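A minimal sketch of the difference between a list and an array, and of the two array forms. The values are made up for illustration and are not taken from the original slides.

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array(py_list)        # 1-D array (vector) built from the list

# The same operator behaves differently on the two types.
list_times_two = py_list * 2   # repeats the list: [1, 2, 3, 1, 2, 3]
arr_times_two = arr * 2        # multiplies each element: [2 4 6]

matrix = np.array([[1, 2], [3, 4]])  # multidimensional (2-D) array

print(list_times_two)
print(arr_times_two)
print(arr.ndim, matrix.ndim)   # 1 2
```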
2D NumPy Arrays
• A 2D array is created with the help of a list.
• Array elements are accessed with an index.
• Printing of the array.
• To see the type of the array.
• To see the shape of the array.
• NumPy arrays are also known as ndarray (n-dimensional array).
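The steps on this slide might look like the following sketch. The array contents are illustrative, since the original screenshots are not available.

```python
import numpy as np

# A 2D array created from a list of lists (values are illustrative).
arr = np.array([[10, 20, 30], [40, 50, 60]])

element = arr[1, 2]     # row 1, column 2 (indexes start at 0) -> 60
arr_type = type(arr)    # <class 'numpy.ndarray'>
arr_shape = arr.shape   # (2, 3): 2 rows, 3 columns

print(arr)
print(element, arr_type, arr_shape)
```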



NumPy Arrays Vs Python Lists
• Although a NumPy array holds elements like a Python list, NumPy arrays are a different data structure from Python lists. The key differences are-
• Once a NumPy array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.
• NumPy arrays contain elements of a homogeneous type, unlike Python lists.
• An equivalent NumPy array occupies much less space than a Python list.
• NumPy arrays support vectorized operations, i.e. you do not need to apply a function to every item one by one, which is not the case with lists. An element-wise operation that generates an error on a list will be executed on an array.
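A small sketch of the vectorized-operation point, assuming addition of a number as the example operation (the original screenshot is not available).

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# Vectorized: the addition is applied to every element at once.
arr_plus_two = arr + 2          # array([3, 4, 5, 6])

# The same expression on a list raises TypeError.
try:
    values + 2
    list_error = None
except TypeError as exc:
    list_error = exc

# With a list, each item has to be processed one by one.
list_plus_two = [v + 2 for v in values]

print(arr_plus_two, list_error, list_plus_two)
```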
NumPy Data Types
NumPy supports following data types-
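The original table of data types is not reproduced here. As a rough illustration only, the sketch below prints a handful of commonly used NumPy dtypes and their element sizes; the selection is an assumption and is not the list from the slide.

```python
import numpy as np

# A few commonly used NumPy data types and their size in bytes per element.
for name in ["bool", "int8", "int32", "int64", "float32", "float64", "complex128"]:
    dt = np.dtype(name)
    print(name, dt.itemsize)

# The dtype can be chosen when an array is created.
arr = np.array([1, 2, 3], dtype=np.float32)
print(arr.dtype)   # float32
```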



Ways to Create NumPy Arrays
• The empty( ) function can be used to create an empty array, i.e. an uninitialized array of the specified shape and dtype.
numpy.empty(shape, [dtype=<datatype>,] [order='C' or 'F'])

Where:
shape: is the dimension of the array.
dtype: is the Python or NumPy data type of the elements.
order: 'C' means the data is arranged row-wise (C means C-like).
order: 'F' means the data is arranged column-wise (F means Fortran-like).

Because the array is uninitialized, it may happen to contain all zeros, or it may contain garbage values.
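A minimal sketch of empty( ) with both order values. The shape and dtypes are arbitrary choices for illustration, and the contents are not printed because they are unpredictable.

```python
import numpy as np

# Uninitialized 2x3 array of integers, stored row-wise (C order).
a = np.empty((2, 3), dtype=int, order="C")

# Same shape, stored column-wise (Fortran order).
b = np.empty((2, 3), dtype=float, order="F")

print(a.shape, a.dtype)            # contents are arbitrary, so only metadata is printed
print(b.flags["F_CONTIGUOUS"])     # True
```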


Ways to Create NumPy Arrays
1. The arange( ) function is used to create an array from a range.
<arrayname> = numpy.arange([start], stop, [step], [dtype])
• When only the stop value is passed, the range starts at 0.
• With start 1, stop 7 and step 2, the values run from 1 up to (but not including) 7 in steps of 2.

2. The linspace( ) function can be used to prepare an array of evenly spaced values over a range.
<arrayname> = numpy.linspace(start, stop, [num], [dtype])
• For example, an array of 6 values can be created between the values 2 and 3.
• Similarly, an array of 8 values can be created between the values 2.5 and 8.
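The cases described on this slide can be sketched as follows, using the values mentioned in the captions. The variable names are chosen here for illustration.

```python
import numpy as np

a = np.arange(7)            # only stop: [0 1 2 3 4 5 6]
b = np.arange(1, 7, 2)      # start 1, stop 7, step 2: [1 3 5]

c = np.linspace(2, 3, 6)    # 6 evenly spaced values from 2 to 3, both ends included
d = np.linspace(2.5, 8, 8)  # 8 evenly spaced values from 2.5 to 8

print(a, b)
print(c)
print(d)
```

Note that arange( ) excludes the stop value, while linspace( ) includes it by default.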



Pandas Data Structure
A data structure is a particular way of storing and organizing data in a computer so that it can be accessed and worked with in appropriate ways. For example-
- If you want to store data items of a similar type together and process them in an identical way, an array is the solution.
- If you want to store data in such a way that you get access to the very last data item you inserted, a stack is the solution.
- If you want to store data in such a way that the data item inserted first gets accessed first, a queue is the solution.
There are many other types of data structures suited to different types of functionality.
Next, we will learn about the Series and DataFrame data structures of Pandas.
Series Data Structure
– Series is a data structure of pandas. It represents a
1D array of indexed data.
– It has two main components-
• An array of actual data.
• An associated array of indexes or data labels.
– Both components are 1D arrays with the same length.
Examples of Series type objects:
Index  Data      Index  Data      Index  Data
0      21        Jan    31        ‘A’    91
1      23        Feb    28        ‘B’    81
2      18        Mar    31        ‘C’    71
3      25        Apr    30        ‘D’    61
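As a sketch, the month example from the table can be built and split into its two components like this. The variable names are chosen here for illustration.

```python
import pandas as pd

# Days per month, indexed by month name (the second example in the table).
days = pd.Series([31, 28, 31, 30], index=["Jan", "Feb", "Mar", "Apr"])

labels = list(days.index)     # ['Jan', 'Feb', 'Mar', 'Apr']
data = days.to_numpy()        # array([31, 28, 31, 30])

print(days)
print(labels, data, len(labels) == len(data))
```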



Creation of Series Objects
– There are many ways to create a Series type object.
1. Using Series( )-
<Series Object> = pandas.Series( ) creates an empty series.

2. Non-empty series creation-
import pandas as pd
<Series Object> = pd.Series(data, index=idx) where data can be a Python sequence, an ndarray, a Python dictionary or a scalar value.
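A minimal sketch of these creation forms. The explicit dtype on the empty series is an addition made here, because some pandas versions warn when an empty Series is created without one.

```python
import numpy as np
import pandas as pd

# Empty series; the dtype is given explicitly because some pandas versions
# warn when an empty Series is created without one.
empty = pd.Series(dtype="float64")

# From a Python sequence, with the default index 0, 1, 2, ...
from_list = pd.Series([21, 23, 18, 25])

# From an ndarray, with an explicit index.
from_array = pd.Series(np.array([91, 81, 71, 61]), index=["A", "B", "C", "D"])

print(empty.empty, list(from_list.index), from_array["C"])
```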
Series Objects creation
1. Creation of a series with a dictionary- the keys of the dictionary become the index.

2. Creation of a series with a scalar value- the value is repeated for every index label.
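Both forms can be sketched as follows, with made-up keys and values.

```python
import pandas as pd

# Dictionary: keys become the index, values become the data.
from_dict = pd.Series({"Jan": 31, "Feb": 28, "Mar": 31})

# Scalar: the value is repeated for each label of the supplied index.
from_scalar = pd.Series(10, index=["a", "b", "c"])

print(from_dict)
print(from_scalar)
```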



Creation of Series Objects –Additional functionality
1. When a series needs to be created with missing values, the missing data can be filled with a NaN (“Not a Number”) value.

2. The index can also be generated, for example with a loop.
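A sketch of both points. A list comprehension stands in for the loop, which is an assumption about what the original screenshot showed.

```python
import numpy as np
import pandas as pd

# np.nan marks the missing entry; the series is float because NaN is a float.
with_missing = pd.Series([7.5, np.nan, 9.0])

# Index generated with a loop (here a list comprehension over a string).
looped_index = pd.Series([1, 2, 3], index=[ch for ch in "abc"])

print(with_missing)
print(list(looped_index.index))   # ['a', 'b', 'c']
```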



Creation of Series Objects –Additional functionality
3. A dtype can also be passed along with the data and index.

Important: it is not necessary to have unique indices.

4. A mathematical function or expression can also be used to produce the data.
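The following sketch combines these points: a dtype passed with data and index, data produced by an expression, and an index with a repeated label. All values are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.arange(1, 5)               # [1 2 3 4]

# dtype passed together with data and index.
as_float = pd.Series(arr, index=["a", "b", "c", "d"], dtype=np.float64)

# Data produced by an expression; the index labels are deliberately not unique.
squared = pd.Series(arr ** 2, index=["x", "x", "y", "z"])

print(as_float.dtype)               # float64
print(squared["x"].tolist())        # [1, 4] (both rows labelled 'x')
```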



Series Object Attributes
Some common attributes-
<series object>.<AttributeName>

Attribute          Description
Series.index       Returns the index of the series
Series.values      Returns the data as an ndarray
Series.dtype       Returns the dtype object of the underlying data
Series.shape       Returns a tuple of the shape of the underlying data
Series.nbytes      Returns the number of bytes of the underlying data
Series.ndim        Returns the number of dimensions
Series.size        Returns the number of elements
Series.itemsize    Returns the size of the dtype (not available in recent pandas versions; Series.dtype.itemsize gives the same value)
Series.hasnans     Returns True if there are any NaN values
Series.empty       Returns True if the series object is empty
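A short sketch that reads these attributes from a small, made-up series. It uses Series.dtype.itemsize in place of Series.itemsize so that it also runs on recent pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan, 3.0], index=["a", "b", "c"])

print(list(s.index))        # ['a', 'b', 'c']
print(s.dtype)              # float64
print(s.shape)              # (3,)
print(s.nbytes)             # 24 (3 elements x 8 bytes)
print(s.ndim, s.size)       # 1 3
print(s.dtype.itemsize)     # 8
print(s.hasnans, s.empty)   # True False
```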



Accessing Series Object
• Printing the object's values.
• Printing an individual value.
• Object slicing.

For object slicing, use the following syntax-

<objectName>[<start>:<stop>:<step>]
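A minimal sketch of access and slicing, assuming a series with the default index 0, 1, 2, ... so that labels and positions coincide.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])   # default index 0..4

one_value = s[2]        # individual value -> 30
middle = s[1:4]         # positions 1, 2, 3 (stop is excluded)
every_other = s[::2]    # positions 0, 2, 4

print(s)
print(one_value, middle.tolist(), every_other.tolist())
```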



Operations on Series Object
1. Elements modification-
<series object>[index] = <new_data_value>

• To change an individual value, assign to a single index.
• To change the values in a certain slice, assign to the slice.
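Both kinds of modification can be sketched like this, again assuming the default integer index and illustrative values.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

s[0] = 11       # change an individual value
s[2:4] = 0      # change every value in a slice (positions 2 and 3)

print(s.tolist())   # [11, 20, 0, 0, 50]
```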
Operations on Series Object
2. It is possible to change the index-
<series object>.index = <new_index_array>

After the assignment, the series carries the new index labels.
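A small sketch of replacing the index, with made-up labels.

```python
import pandas as pd

s = pd.Series([10, 20, 30])
s.index = ["a", "b", "c"]   # new index must have the same length as the series

print(list(s.index))   # ['a', 'b', 'c']
print(s["b"])          # 20
```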



head() and tail() Functions
1. The head(<n>) function fetches the first n rows from a pandas object. If you do not provide a value for n, it returns the first 5 rows.
2. The tail(<n>) function fetches the last n rows from a pandas object. If you do not provide a value for n, it returns the last 5 rows.
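For illustration, a sketch on a ten-element series.

```python
import pandas as pd

s = pd.Series(range(1, 11))   # values 1..10

first_five = s.head()     # default n = 5
first_two = s.head(2)
last_three = s.tail(3)

print(first_five.tolist())   # [1, 2, 3, 4, 5]
print(first_two.tolist())    # [1, 2]
print(last_three.tolist())   # [8, 9, 10]
```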



Series Objects - Vector Operations
• Operations applied to a whole series are vector operations: they act on every element.

Series Objects - Arithmetic Operations
• Arithmetic operations are possible on objects with the same index; where the indexes do not match, the result is NaN.
• We can also store these results in other objects.
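A sketch of vector and arithmetic operations on two made-up series whose indexes only partly overlap.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

doubled = s1 * 2     # vector operation on every element
bigger = s1 > 1      # element-wise comparison

# Only labels present in both series ("b" and "c") produce a number.
total = s1 + s2      # result stored in another object

print(doubled.tolist())   # [2, 4, 6]
print(bigger.tolist())    # [False, True, True]
print(total)
```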


Entries Filtering
<seriesObject>[<boolean expression on series object>]
Other feature
• To delete the value at an index.
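A minimal sketch of filtering and deletion. The original slide's deletion code is not available, so drop( ) is used here as an assumption. It is one of several ways to remove an entry.

```python
import pandas as pd

s = pd.Series([3, 8, 5, 12], index=["a", "b", "c", "d"])

# The boolean expression is evaluated per element; only True entries are kept.
filtered = s[s > 4]

# Assumption: drop() is used here; it returns a new series without the label.
without_b = s.drop("b")

print(filtered.tolist())        # [8, 5, 12]
print(list(without_b.index))    # ['a', 'c', 'd']
```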



Difference between NumPy array and Series objects

1. In the case of ndarrays, a vector operation is possible only when the ndarrays are of similar shape. In the case of series objects, the operation is aligned on matching indexes; where the index does not match, NaN is returned.

2. In an ndarray, the index always starts from 0 and is always numeric. In a series, the index can be of any type, including numbers, and does not need to start from 0.
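Both differences can be sketched as follows, with illustrative values.

```python
import numpy as np
import pandas as pd

# ndarrays of different shapes cannot be added.
try:
    np.array([1, 2, 3]) + np.array([1, 2])
    array_error = None
except ValueError as exc:
    array_error = exc

# Series of different lengths are aligned on their labels instead.
s1 = pd.Series([1, 2, 3], index=[5, 6, 7])    # numeric index that does not start at 0
s2 = pd.Series([1, 2], index=[6, 7])
aligned = s1 + s2                              # label 5 has no match -> NaN

print(type(array_error).__name__)   # ValueError
print(aligned)
```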
Thank you
Please follow us on our blog

www.pythontrends.wordpress.com

Neha Tyagi, KV 5 Jaipur II Shift
