Program 2: Introduction to the usage of Python 3 packages – matplotlib, numpy, pandas, seaborn,
ggplot, ggplot2, plotly
(i) numpy :
NumPy stands for Numerical Python. NumPy is a Python library used for working with arrays. It
also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
Arrays are collections of elements/values that can have one or more dimensions. An array of one
dimension is called a vector and an array of two dimensions is called a matrix. NumPy arrays are called ndarray
(or n-dimensional array).
Arrays in NumPy: NumPy's main object is the homogeneous multidimensional array.
It is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.
In NumPy, dimensions are called axes. The number of axes is called the rank.
NumPy's array class is called ndarray. It is also known by the alias array.
Creating a NumPy array:
We can create a NumPy ndarray object by using the array() function. To create an ndarray, we can
pass a list, tuple or any array-like object into array() and it will be converted into an ndarray.
# Python program to demonstrate basic array characteristics
import numpy as np
# Creating array object
arr = np.array( [[ 1, 2, 3],[ 4, 2, 5]] )
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
Output :
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64
1. Creation of 0-D array:
import numpy as np
arr = np.array(42)
print(arr)
2. Creation of 1-D array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
3. Creation of 2-D array: An array that has 1-D arrays as its elements is called a 2-D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
4. Creation of 3-D Arrays: An array that has 2-D arrays (matrices) as its elements is called 3-D
array.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Checking the number of dimensions:
NumPy arrays provide the ndim attribute, which returns an integer telling us how many
dimensions the array has.
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
Array of zeros:
This routine creates a NumPy array with the specified shape in which every item
is initialized to 0. The elements are floating-point numbers unless a dtype is given.
Syntax: numpy.zeros(shape)
import numpy as np
b=np.zeros(5)
print(b)
Output: [0. 0. 0. 0. 0.]
Array of ones:
It is used to create the numpy array with the specified shape where each numpy array item is
initialized to 1.
Syntax: np.ones(shape)
import numpy as np
d=np.ones(6,dtype=int)
print(d)
Output:
[1 1 1 1 1 1]
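The shape passed to zeros() and ones() can also be a tuple, which gives a multi-dimensional array. The following is a minimal sketch; the variable names z and o are chosen only for illustration.

```python
import numpy as np

# A tuple shape produces a 2-D array; the default element type is float64.
z = np.zeros((2, 3))

# The dtype keyword selects the element type, here integers.
o = np.ones((2, 3), dtype=int)

print(z.shape, z.dtype)
print(o)
```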
Array of random values:
random is a module present in the NumPy library. It contains functions for generating random
numbers: simple random data generation methods, permutation and distribution functions, and
random generator functions. rand() returns samples drawn uniformly from the interval [0, 1).
Syntax: np.random.rand(d0, d1, ..., dn)
import numpy as np
e=np.random.rand(6)
print(e)
Output (the values differ on each run):
[0.02905376 0.59423152 0.25030791 0.60751057 0.52254074 0.80428618]
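When repeatable output is needed, NumPy also provides a seedable generator object through np.random.default_rng(). The sketch below is an illustration, not part of the original program; the seed value 42 and the variable names are arbitrary choices.

```python
import numpy as np

# Create a generator with a fixed seed so every run gives the same numbers.
rng = np.random.default_rng(seed=42)

e = rng.random(6)                 # six floats drawn uniformly from [0, 1)
m = rng.random((2, 3))            # a 2 x 3 array of floats from [0, 1)
i = rng.integers(0, 10, size=5)   # five integers from 0 to 9

print(e)
print(m)
print(i)
```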
Array of your choice:
To create an array in which every element is a value of your choice, we use full().
Syntax: np.full(shape, fill_value)
import numpy as np
f=np.full((3,3),7)
print(f)
Output:
[[7 7 7]
[7 7 7]
[7 7 7]]
Identity matrix with numpy:
To create an identity matrix we use eye().
Syntax: np.eye(size)
import numpy as np
g=np.eye(4)
print(g)
Output:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
Shape of the given ndarray:
The shape of an array can be defined as the number of elements in each dimension. Dimension is
the number of indices or subscripts that we require in order to specify an individual element of an
array.
Syntax: numpy.shape(array_name)
import numpy as np
a=np.array([[1,2,3],[4,5,6]])
print(a.shape)
Output:
(2, 3)
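The function form numpy.shape(array_name) and the attribute form array_name.shape return the same tuple. A short sketch, in which the names rows, cols and v are used only for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.shape(a))   # function form
print(a.shape)       # attribute form, same result

# The shape tuple can be unpacked: one entry per dimension.
rows, cols = a.shape
print(rows, cols)

# A 1-D array has a shape tuple with a single entry.
v = np.array([1, 2, 3, 4, 5])
print(v.shape)
```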
(ii) pandas
Introduction to Pandas:
Pandas was developed by Wes McKinney in 2008 and is used for data analysis. Data
analysis requires a lot of processing, such as restructuring and cleaning, so we use pandas.
Pandas is an open-source Python library for high-performance data manipulation.
It contains high-level data structures and manipulation tools designed for fast and easy data
analysis in Python.
Pandas is built on top of NumPy and makes NumPy-centric applications easier and more
effective to write.
Introduction to Pandas Data Structure:
To get started with pandas, we need to be comfortable with two data structures:
1. Series
2. Data Frames
Series:
A Series is a one-dimensional array-like object containing an array of data and an associated array of
data labels, called its index. The class name is written with an upper-case "S" (pd.Series).
The following example shows the simplest way to create a Series.
Ex:
import pandas as pd
s=pd.Series([-2,5,7,9,23])
print(s)
Output:
0 -2
1 5
2 7
3 9
4 23
We can also return the index and values of the series.
print(s.index)
Output:
RangeIndex(start=0, stop=5, step=1)
print (s.values)
Output:
[-2  5  7  9 23]
Example: Often it will be desirable to create a Series with an index identifying each data point:
import pandas as pd
s=pd.Series ([-2, 5, 7, 9, 23], index= ['a','b','c','d','e'])
print(s)
Output:
a -2
b 5
c 7
d 9
e 23
Now we can access the value for any index label, or for a list of labels:
print(s['d'])
print(s[['a','b','c']])
Output:
9
a -2
b 5
c 7
Operations on a pandas Series:
We can also perform arithmetic and comparison operations on a Series, such as scalar
multiplication or applying math functions. The index is preserved in the result.
import pandas as pd
k=pd.Series([-7,6,5,-21,5,65])
print(k)
Output:
0 -7
1 6
2 5
3 -21
4 5
5 65
Example: print(k*k)
Output:
0 49
1 36
2 25
3 441
4 25
5 4225
Example: print(k*2)
Output:
0 -14
1 12
2 10
3 -42
4 10
5 130
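The comparison operations and math functions mentioned above can be sketched in the same way. This is an illustrative addition; the name positive is hypothetical, and np.abs is used here as one example of a math function.

```python
import numpy as np
import pandas as pd

k = pd.Series([-7, 6, 5, -21, 5, 65])

# A comparison returns a boolean Series with the same index.
positive = k > 0
print(positive)

# The boolean Series can be used to keep only the matching rows.
print(k[positive])

# NumPy math functions apply element by element and keep the index.
print(np.abs(k))
```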
Working with dictionaries:
Pandas can build a Series directly from a Python dictionary; the dictionary keys become the index.
Example:
import pandas as pd
data= {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
Obj=pd.Series(data)
print(Obj)
Output:
Ohio 35000
Texas 71000
Oregon 16000
Utah 5000
We can give our own index to the Series. Labels with no matching key in the dictionary get the
value NaN (missing). Pandas can check whether values are null or not null with isnull() and notnull().
Ex:
import pandas as pd
data={'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
place=['a','Ohio','Texas','Oregon','Utah','b']
obj=pd.Series(data,index=place)
print(obj)
Output:
a NaN
Ohio 35000.0
Texas 71000.0
Oregon 16000.0
Utah 5000.0
b NaN
a=pd.isnull(obj)
print(a)
Output:
a True
Ohio False
Texas False
Oregon False
Utah False
b True
b=pd.notnull(obj)
print(b)
Output:
a False
Ohio True
Texas True
Oregon True
Utah True
b False
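The boolean results of isnull() and notnull() are typically used to count or filter missing entries. A minimal sketch reusing the data above; the names present and missing_count are hypothetical.

```python
import pandas as pd

data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
place = ['a', 'Ohio', 'Texas', 'Oregon', 'Utah', 'b']
obj = pd.Series(data, index=place)

# Keep only the entries that are not null.
present = obj[pd.notnull(obj)]
print(present)

# True counts as 1, so summing the mask counts the missing entries.
missing_count = pd.isnull(obj).sum()
print(missing_count)
```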
Data Frames:
A Data Frame represents a tabular, spreadsheet-like data structure containing a collection of
columns, each of which can be a different value type (numeric, string, Boolean, etc.). The Data
Frame has both a row and a column index; it can be thought of as a dictionary of Series.
Internally, the data is stored as one or more two-dimensional blocks rather than as a list,
dictionary, ndarray or some other collection of one-dimensional arrays.
Although a Data Frame stores data in a two-dimensional format, we can easily represent
higher-dimensional data in tabular format with hierarchical indexing.
import pandas as pd
data={'Year':[2021,2020,2019,2018],'Month':['Jan','Feb','Mar','Apr']}
df=pd.DataFrame(data)
print(df)
Output
Year Month
0 2021 Jan
1 2020 Feb
2 2019 Mar
3 2018 Apr
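The description of a Data Frame as a dictionary of Series can be checked directly: selecting a column by its name returns a Series that shares the row index. A short sketch; the name years is used only for illustration.

```python
import pandas as pd

data = {'Year': [2021, 2020, 2019, 2018], 'Month': ['Jan', 'Feb', 'Mar', 'Apr']}
df = pd.DataFrame(data)

# Selecting one column works like a dictionary lookup and returns a Series.
years = df['Year']
print(type(years))
print(years)

# The column shares the row index of the Data Frame.
print(years.index.equals(df.index))

# Each column can have its own value type.
print(df.dtypes)
```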
import pandas as pd
k=pd.DataFrame({'R.no':[4401,4402,4403,4404],'Sname':['ABC','DEF','GHI','JKL'],'Rank':[7,
6,3,9]})
print(k)
Output:
R.no Sname Rank
0 4401 ABC 7
1 4402 DEF 6
2 4403 GHI 3
3 4404 JKL 9
A column can be set to a single value, which is then repeated in every row:
import pandas as pd
k=pd.DataFrame({'R.no':[4401,4402,4403,4404],'Sname':['ABC','DEF','GHI','JKL'],'Rank':[7,6,3,9]})
k['Rank']=2
print(k)
Output:
R.no Sname Rank
0 4401 ABC 2
1 4402 DEF 2
2 4403 GHI 2
3 4404 JKL 2
To assign a range of values to a column, pass an array whose length matches the number of rows:
import numpy as np
import pandas as pd
k=pd.DataFrame({'R.no':[4401,4402,4403,4404],'Sname':['ABC','DEF','GHI','JKL'],'Rank':[7,
6,3,9]})
k['Rank']=np.arange(4)
print(k)
Output:
R.no Sname Rank
0 4401 ABC 0
1 4402 DEF 1
2 4403 GHI 2
3 4404 JKL 3
If a Series is assigned to a column, its values are matched to the rows by index label; rows whose labels are not in the Series get NaN:
import numpy as np
import pandas as pd
k=pd.DataFrame({'R.no':[4401,4402,4403,4404],'Sname':['ABC','DEF','GHI','JKL'],'Rank':[7,6,3,9]})
val=pd.Series([11,22],index=[0,3])
k['Rank']=val
print(k)
Output:
R.no Sname Rank
0 4401 ABC 11.0
1 4402 DEF NaN
2 4403 GHI NaN
3 4404 JKL 22.0
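The hierarchical indexing mentioned at the start of this section can be sketched with a two-level row index. The marks and term labels below are hypothetical values made up for illustration; they are not part of the original data set.

```python
import pandas as pd

# Two index levels: roll number and term (hypothetical data).
idx = pd.MultiIndex.from_product([[4401, 4402], ['T1', 'T2']],
                                 names=['R.no', 'Term'])
marks = pd.DataFrame({'Marks': [71, 78, 64, 69]}, index=idx)
print(marks)

# Selecting on the outer level returns all terms for one student.
print(marks.loc[4401])

# unstack() moves the inner level into columns, giving a wide table.
wide = marks['Marks'].unstack('Term')
print(wide)
```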
Altering the index:
We can replace the index of a Series by assigning a new list of labels to its index attribute.
import pandas as pd
a=pd.Series([1,2,3],index=['a','b','c'])
print(a)
Output:
a 1
b 2
c 3
import pandas as pd
a=pd.Series([1,2,3],index=['a','b','c'])
a.index=['x','y','z']
print(a)
Output:
x 1
y 2
z 3
Although the whole index can be replaced, an individual index label cannot be changed in place,
because Index objects are immutable.
a.index['x']=5
Output:
TypeError: Index does not support mutable operations
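The error can be caught and inspected, and a single label can still be changed by building a new Series. The sketch below uses rename() for that purpose as one possible approach; the replacement label 'p' is an arbitrary choice.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])

# Assigning to one position of the index raises TypeError.
try:
    a.index[0] = 'p'
except TypeError as err:
    print("TypeError:", err)

# rename() returns a new Series with the chosen label replaced.
b = a.rename(index={'x': 'p'})
print(b)

# The values of a Series, unlike its index labels, can be changed in place.
a['x'] = 5
print(a)
```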
(iii) matplotlib & pyplot
Matplotlib is a low-level Python library used for data visualization. It is built on
NumPy arrays and offers several plot types, such as line charts, bar charts and histograms.
Matplotlib was originally written by Dr. John D. Hunter.
Example: 1
Draw a line in a diagram from position (1, 3) to position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints)
plt.show()
OUTPUT:
Plotting Without Line
To plot only the markers, you can use shortcut string notation parameter 'o', which means 'rings'.
Example: 2
Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints, 'o')
plt.show()
OUTPUT:
To plot symbols rather than lines, provide an additional format string argument.
Symbols: '-', '--', '-.', ':', '.', ',', 'o', '^', 'v', '<', '>', 's', '+', 'x', 'D', 'd', '1', '2', '3', '4', 'h', 'H', 'p', '|', '_'
Colors: 'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'
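A color letter can be combined with a line or marker symbol in one format string. The sketch below is illustrative: the three format strings are arbitrary picks, and the Agg backend is selected only so the code runs without a display. Remove that line and call plt.show() to view the figure interactively.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])

fig, ax = plt.subplots()

# 'g--' : green dashed line, no markers
line_a, = ax.plot(xpoints, ypoints, 'g--')

# 'r^'  : red upward triangles, no connecting line
line_b, = ax.plot(xpoints, ypoints + 2, 'r^')

# 'ko-.' : black circles joined by a dash-dot line
line_c, = ax.plot(xpoints, ypoints + 4, 'ko-.')
```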
Example: 3
Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to position (8, 10)
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()
OUTPUT:
Example: 4
Mark each point with a circle:
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints, marker = 'o')
plt.show()
OUTPUT:
Example 5: Mark the points with red color diamond symbol
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints,'rD')
plt.show()
Output:
(iv) ggplot, ggplot2
We can use ggplot in Python to create data visualizations using a grammar of graphics. A
grammar of graphics is a high-level tool that allows us to create data plots in an efficient and
consistent way. It abstracts most low-level details, letting us focus on creating meaningful and
beautiful visualizations for our data.
There are several Python packages that provide a grammar of graphics. Here we focus
on plotnine since it's one of the most mature ones. plotnine is based on ggplot2 from the R
programming language, so if we have a background in R, then we can consider plotnine as the
equivalent of ggplot2 in Python.
The Grammar of Graphics
A statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric
objects (points, lines, bars). Faceting can be used to generate the same plot for different subsets of the
dataset.
These are the basic building blocks according to the grammar of graphics:
data: The data plus a set of aesthetic mappings that describe how variables are mapped to aesthetics.
geom: Geometric objects that represent what you actually see on the plot: points, lines, polygons, etc.
stats: Statistical transformations that summarize the data in many useful ways.
scale: Scales map values in the data space to values in an aesthetic space.
coord: A coordinate system that describes how data coordinates are mapped to the plane of the graphic.
facet: A faceting specification that describes how to break up the data into subsets and plot each
subset individually.
Plotting in ggplot style
Let's set up our working environment with the necessary libraries and load our CSV file into a data
frame called survs_df.
Program:
import numpy as np
import pandas as pd
from plotnine import *
# %matplotlib inline is a Jupyter notebook magic; omit it when running as a plain script
%matplotlib inline
survs_df = pd.read_csv('covid2021.csv').dropna()
ggplot(survs_df, aes(x='Weekday', y='Value')) + geom_point()
output:
(v) seaborn
Seaborn is a library in Python predominantly used for making statistical graphics. Seaborn is a data
visualization library built on top of matplotlib and closely integrated with pandas data structures in
Python. Visualization is the central part of Seaborn which helps in exploration and understanding of
data.
One has to be familiar with NumPy, Matplotlib and Pandas to learn Seaborn.
Seaborn offers the following functionalities:
1. Dataset oriented API to determine the relationship between variables.
2. Automatic estimation and plotting of linear regression plots.
3. It supports high-level abstractions for multi-plot grids.
4. Visualizing univariate and bivariate distribution.
These are only some of the functionalities offered by Seaborn; there are many more.
Using Seaborn we can draw a wide variety of plots, such as:
1. Distribution Plots
2. Bar Chart (Seaborn has no pie chart function; pie charts are drawn with matplotlib's pie())
3. Scatter Plots
4. Pair Plots
5. Heat map
Example 1: Getting a list of all the datasets built into Seaborn (requires an internet connection)
import seaborn as sns
print(sns.get_dataset_names())
Output:
['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise',
'flights', 'fmri', 'gammas', 'geyser', 'iris', 'mpg', 'penguins', 'planets', 'tips', 'titanic']
Example 2: A simple line plot which is created using the lineplot() method
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
sns.lineplot(x="sepal_length", y="sepal_width", data=data)
plt.title('Title using Matplotlib Function')
plt.show()
Example 3: A simple bar plot which is created using the barplot() method
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
sns.barplot(x='species', y='sepal_length', data=data)
plt.show()
output: