Matplotlib
Data Visualization using Matplotlib Library
Learning objective
• Data Visualisation with Matplotlib
• What is Data Visualisation?
• Introduction to Matplotlib
• Types of data visualization charts/plots
• Line chart, Scatter plot
• Bar chart, Histogram
• Area Plot, Pie chart
• Boxplot, Stacked and grouped bar plots
What is Data Visualization?
• Data visualization is the graphical representation of data, at both
small and large scale.
• One of the key skills of a data scientist is the ability to tell a
compelling story, visualizing data and findings in an approachable
and stimulating way.
• Learning how to leverage a software tool to visualize data will also
enable you to extract information, better understand the data, and
make more effective decisions.
What is Data Visualization?
• Data visualisation is an efficient way of gaining insight into
data through a visual medium.
• The aim of this session on data visualisation with Matplotlib in
Python is to extract data and present it in a form that makes sense
to people.
• Data visualization shows complex data in a graphical form that is
easy to understand.
Benefits of data visualization
• It simplifies complex quantitative information.
• It helps analyse and explore big data easily.
• It identifies the areas that need attention or improvement.
• It identifies the relationship between data points and variables.
• It explores patterns in the data.
Matplotlib – Introduction
• One of the most popular 2D plotting libraries for data visualization
• Most widely used plotting library in the Python community
• Can be used in Python scripts, the Python and IPython shells,
Jupyter notebooks, etc.
• Created by John D. Hunter, who started the project in 2002 (first
released in 2003)
Visualization Charts / Plots
• matplotlib.pyplot is a collection of command-style functions that
make Matplotlib work like MATLAB.
• Each pyplot function makes some change to a figure.
• For example, a function may create a figure, create a plotting area
in a figure, plot some lines in a plotting area, or decorate the plot
with labels.
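The sequence described above can be sketched as follows, with each call acting on the current figure. The data values and label text are made up for illustration and are not part of the original slides.

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed values)
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig = plt.figure()           # creates a figure
ax = plt.subplot(1, 1, 1)    # creates a plotting area (axes) in the figure
lines = plt.plot(x, y)       # plots a line in the plotting area
plt.xlabel("x")              # decorates the plot with labels
plt.ylabel("x squared")
plt.title("Each pyplot call changes the current figure")
plt.show()
```

Because pyplot keeps track of the current figure and plotting area, none of the calls after the first two needs to be told which plot to modify.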
Line Chart
• A line chart is often used to visualize a trend in data over intervals
of time – a time series – thus the line is often drawn chronologically.
• Data points are connected by straight line segments.
• You can use the plot(x,y) method to create a line chart.
Line Chart
import matplotlib.pyplot as plt
# x and y values of the data points
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
# Connect the points with straight line segments
plt.plot(x, y)
plt.show()
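Since line charts are often used for time series, a small sketch with time on the X-axis may help. The month names and sales figures below are hypothetical values chosen only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures, listed in chronological order
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 165, 160]

# marker="o" draws a dot at each data point; straight segments join them
line, = plt.plot(months, sales, marker="o")
plt.xlabel("Month")
plt.ylabel("Sales (units)")
plt.title("Monthly sales")
plt.show()
```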
Scatter plot
• A scatter plot displays the values of two sets of data on two
dimensions. Each dot represents an observation.
• The position on the X (horizontal) and Y (vertical) axes represents
the values of the two variables.
• It is useful for studying the relationship between the two variables.
• A scatter plot is a diagram where each value in the data set is
represented by a dot.
• Use the scatter() method to draw a scatter plot.
Scatter plot
from matplotlib import pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
# scatter() draws one dot per (x, y) pair; plot() would draw a line instead
plt.scatter(x, y)
plt.show()
Bar Chart
• Shows the relationship between a numerical variable and a
categorical variable.
• For example, you can display the heights of several individuals
using a bar chart.
• Bar charts are used to present categorical data with rectangular
bars.
• The bars can be plotted vertically or horizontally, and their
heights/lengths are proportional to the values that they represent.
• Use the bar() method to draw a bar plot.
Bar plot
from matplotlib import pyplot as plt
# Categories (X-axis) and their numerical values (bar heights)
subjects = ["Mathematics", "Physics", "Chemistry", "Biology"]
marks = [90, 70, 75, 85]
plt.bar(subjects, marks)
plt.show()
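As noted above, bars can also be drawn horizontally. A minimal sketch using barh() with the same subjects and marks; the axis label and title are added for illustration.

```python
import matplotlib.pyplot as plt

subjects = ["Mathematics", "Physics", "Chemistry", "Biology"]
marks = [90, 70, 75, 85]

# barh() draws horizontal bars whose lengths are proportional to the marks
bars = plt.barh(subjects, marks)
plt.xlabel("Marks")
plt.title("Marks by subject")
plt.show()
```

Horizontal bars tend to be easier to read when the category names are long.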
Histogram
• An accurate graphical representation of the distribution of numerical
data.
• It takes as input one numerical variable only.
• The variable is cut into several bins, and the number of observations
per bin is represented by the height of the bar.
• It is a type of bar plot where the X-axis represents the bin ranges
while the Y-axis gives information about frequency.
• Use the hist() method to draw a histogram diagram.
Histogram
from matplotlib import pyplot as plt
import numpy as np
# Marks obtained by 15 students
a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
# bins lists the edges of the four bins
plt.hist(a, bins=[0, 25, 50, 75, 100])
plt.xlabel('Marks')
plt.ylabel('Number of students')
plt.title('Histogram Example')
plt.show()
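• A histogram of marks obtained by students in a class.
• Four bins are defined by the edges 0, 25, 50, 75, and 100. Each bin
includes its left edge and excludes its right edge, except the last
bin, which includes both (0-24, 25-49, 50-74, and 75-100 for
whole-number marks).
• The histogram shows the number of students falling in each range.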
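hist() also returns the per-bin counts and the bin edges, which makes it possible to check how the values were grouped. The sketch below reuses the marks array from the example above; the variable names are chosen here for illustration. Note that each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import matplotlib.pyplot as plt
import numpy as np

marks = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
edges = [0, 25, 50, 75, 100]

# hist() returns the counts per bin, the bin edges, and the drawn bars
counts, bin_edges, patches = plt.hist(marks, bins=edges)

# Print how many students fall between each pair of consecutive edges
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{int(left)} to {int(right)}: {int(count)} students")

plt.xlabel("Marks")
plt.ylabel("Number of students")
plt.show()
```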
Area Chart
• An area chart is very similar to a line chart, except that the area
between the x axis and the line is filled in with color or shading.
• It shows how one numerical variable changes as another numerical
variable increases.
• Use the fill_between() method to draw an Area chart.
Area Chart
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7]
y = [10, 20, 30, 20, 10, 50, 60]
# Fill the area between the x axis (y = 0) and the values in y
plt.fill_between(x, y)
plt.show()
Pie Chart
• Circular statistical plot that can display only one series of data.
• Pie charts show the size of items (called wedges) in one data series,
proportional to the sum of the items.
• The data points in a pie chart are shown as a percentage of the whole
pie.
• The area of each wedge is proportional to the length of its arc.
• Use the pie() method to draw a Pie chart.
Pie Chart
from matplotlib import pyplot as plt
# One label and one value per wedge
cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR', 'MERCEDES']
data = [23, 17, 35, 29, 12, 41]
plt.pie(data, labels=cars)
plt.show()
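To display each wedge as a percentage of the whole pie, pie() accepts an autopct format string. The following is a minimal sketch with the same car data; the percentages list is computed separately only to show where the numbers come from.

```python
import matplotlib.pyplot as plt

cars = ["AUDI", "BMW", "FORD", "TESLA", "JAGUAR", "MERCEDES"]
data = [23, 17, 35, 29, 12, 41]

# Share of each item as a percentage of the sum of all items
total = sum(data)
percentages = [100 * value / total for value in data]

# autopct="%1.1f%%" writes each wedge's percentage with one decimal place
wedges, texts, autotexts = plt.pie(data, labels=cars, autopct="%1.1f%%")
plt.show()
```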
Boxplot
• The boxplot is one of the most common types of graphic.
• It gives a compact summary of one or several numeric variables.
• The line that divides the box into 2 parts represents the median of
the data.
• The ends of the box show the upper and lower quartiles.
• The extreme lines (whiskers) show the highest and lowest values
excluding outliers.
• Use the boxplot() method to draw a box plot.
Boxplot
from matplotlib import pyplot as plt
# A single numeric variable to summarise
data1 = [100, 120, 140, 160, 180]
plt.boxplot(data1)
plt.show()
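The parts of a box plot can be checked against the numbers directly. In this sketch the value 300 is a made-up outlier added to the sample data for illustration. By default, boxplot() treats points more than 1.5 times the interquartile range beyond the box as outliers and draws them as separate markers.

```python
import matplotlib.pyplot as plt
import numpy as np

# Sample values; 300 is a made-up outlier added for illustration
data = [100, 120, 140, 160, 180, 300]

# Quartiles that define the box and the line inside it
q1, median, q3 = np.percentile(data, [25, 50, 75])
print("Q1:", q1, "median:", median, "Q3:", q3)

# boxplot() returns a dict of the drawn parts; "fliers" holds the outliers
result = plt.boxplot(data)
outliers = result["fliers"][0].get_ydata()
print("Outliers:", list(outliers))
plt.show()
```

The whiskers stop at the lowest and highest values that are not outliers, so the upper whisker here ends below 300.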
Stacked bar plot
• A stacked bar plot is a type of bar chart where multiple bars are
stacked on top of each other for each category, and the total height
of the stacked bars represents the cumulative value of the individual
components.
• Each stack represents a different category or group, and each
segment within the stack represents a different sub-category or
component.
• This type of plot is useful when you want to show the total
magnitude across different categories and the contribution of
each sub-category to that total.
Stacked bar plot
import matplotlib.pyplot as plt
categories = ['Category A', 'Category B', 'Category C']
values1 = [10, 15, 20]
values2 = [5, 10, 15]
plt.bar(categories, values1, label='Value 1')
plt.bar(categories, values2, bottom=values1, label='Value 2')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Stacked Bar Plot')
plt.legend()
plt.show()
Group Bar plot
• A group bar plot, also known as a clustered bar chart, is a type of
bar chart that displays multiple bars for each category side by side,
rather than stacking them on top of each other.
• In a group bar plot, each group of bars represents a different
category, and each bar within the group represents a different
sub-category or component.
• This type of plot is useful for comparing the values of different
sub-categories across multiple categories.
Group Bar Plot
import matplotlib.pyplot as plt
import numpy as np
categories = ['Category A', 'Category B', 'Category C']
values1 = [10, 15, 20]
values2 = [5, 10, 15]
bar_width = 0.35
index = np.arange(len(categories))
plt.bar(index, values1, width=bar_width, label='Value 1')
plt.bar(index + bar_width, values2, width=bar_width, label='Value 2')
# Centre the category names under each pair of bars
plt.xticks(index + bar_width / 2, categories)
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Group Bar Plot')
plt.legend()
plt.show()
You have learnt:
• Data Visualisation with Matplotlib
• What is Data Visualisation?
• Introduction to Matplotlib
• Types of data visualization charts/plots
• Line chart, Scatter plot
• Bar chart, Histogram
• Area Plot, Pie chart
• Boxplot, Stacked and grouped bar plots