MODULE – 5
Data Visualization with NumPy Arrays
1. What is Data Visualization?
Data Visualization is the graphical representation of data and information.
It helps us to:
• Understand trends, patterns, and outliers
• Communicate complex data in a simple and meaningful way
• Make data-driven decisions more effectively
2. What is NumPy?
NumPy stands for Numerical Python. It is a core scientific computing library in Python.
NumPy provides:
• Support for large, multi-dimensional arrays and matrices
• Tools to perform mathematical and logical operations on arrays
• Fast performance compared to regular Python lists
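As a rough illustration of the speed claim, the sketch below times the same element-wise calculation on a plain Python list and on a NumPy array. The array size and the formula are arbitrary choices for this demo, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000                                  # arbitrary size chosen for the demo
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)  # int64 avoids overflow when squaring

def with_list():
    # Pure-Python loop over every element
    return [v * v + 1 for v in values_list]

def with_numpy():
    # One vectorized expression over the whole array
    return values_array * values_array + 1

list_time = timeit.timeit(with_list, number=10)
numpy_time = timeit.timeit(with_numpy, number=10)
print(f"list:  {list_time:.4f} s")
print(f"numpy: {numpy_time:.4f} s")

# Both approaches produce the same numbers
same = np.array_equal(np.array(with_list()), with_numpy())
print("same result:", same)
```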
3. Why Use NumPy in Data Visualization?
Using NumPy is extremely beneficial when preparing data for visualization:
Feature | Benefit
Efficient Array Handling | Fast computation and memory-saving
Mathematical Operations | Easy calculation of values for plotting
Compatibility | Works smoothly with libraries like matplotlib, seaborn, pandas
Simulation | Can generate sample data (e.g., sine waves, random data) for practice or models
4. How NumPy Supports Data Visualization
NumPy makes it easy to:
• Create regular intervals of values (e.g., 0 to 10 with 100 steps)
• Perform mathematical operations (e.g., sin(x), log(x), etc.)
• Create data for function plots, trends, and distributions
• Clean, manipulate, or reshape data before plotting
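The four tasks listed above can be sketched on made-up data as follows. The variable names and the 14-value `readings` array are illustrative assumptions and do not come from any dataset used in these notes.

```python
import numpy as np

# 1. Regular intervals: 100 evenly spaced values from 0 to 10
x = np.linspace(0, 10, 100)

# 2. Mathematical operations applied to every element at once
y_sin = np.sin(x)
y_log = np.log(x + 1)            # +1 avoids log(0) at x = 0

# 3. Data for a distribution: 1000 random values from a normal distribution
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0, scale=1, size=1000)

# 4. Clean and reshape: drop missing values, then arrange as 3 rows x 4 columns
readings = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, np.nan,
                     8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0])
clean = readings[~np.isnan(readings)]   # keeps the 12 valid values
grid = clean.reshape(3, 4)
print(grid.shape)
```

Arrays such as `x`, `y_sin`, `samples`, and `grid` can then be passed directly to plotting functions.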
5. Integration with Visualization Libraries
NumPy arrays are commonly used with:
• Matplotlib: For basic plots (line, bar, scatter, etc.)
• Seaborn: For statistical data visualization (with Pandas/NumPy)
• Plotly: For interactive charts
• Pandas: For labeled data structures (DataFrames use NumPy internally)
6. Common Use Cases of NumPy Arrays in Visualization
Use Case | Example Plot
Plotting mathematical functions | Line plot of sin(x)
Data distribution | Histogram using random data
Comparing categories | Bar charts
Time-series or signal data | Line plots from sensor data
Matrix or image display | 2D heatmaps or image plots
7. Very Small Example Code – Line Plot with NumPy
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100) # 100 values from 0 to 10
y = np.sin(x) # Calculate sine for each x
plt.plot(x, y) # Line graph
plt.title("Sine Wave")
plt.xlabel("X-axis")
plt.ylabel("sin(x)")
plt.show()
8. Advantages of Using NumPy for Visualization
• Speed: Vectorized array operations run much faster than equivalent loops over Python lists
• Simplicity: Easy syntax for math operations
• Flexibility: Works with multiple libraries
• Real-world ready: Can be used in scientific, financial, and engineering
visualizations
NumPy Data Visualization
Why Use NumPy for Data Visualization?
NumPy makes it easy to generate and process numerical data for visualization. Here’s
why:
Feature | Benefit
Fast numerical computations | Efficient handling of large datasets
Easy function operations | Perform math (sin, cos, log, etc.) directly
Integration with libraries | Works perfectly with matplotlib, seaborn
Real-world modeling | Simulate waves, signals, and distributions
How It Works
You use NumPy to:
1. Generate data (e.g., an array of values from 0 to 10)
2. Apply functions (e.g., sine, square, exponential)
3. Pass data to visualization libraries like matplotlib for plotting
NumPy Data Visualization (with Car Data)
First, create basic car data using NumPy:
import numpy as np
import matplotlib.pyplot as plt
# Sample data
brands = np.array(['Toyota', 'Honda', 'Ford', 'BMW', 'Audi'])
weights = np.array([1200, 1300, 1500, 1800, 1700]) # in kg
prices = np.array([20000, 22000, 25000, 35000, 40000]) # in dollars
1. Line Plot
• Line plots connect data points with straight lines.
• Useful for showing trends across ordered data (e.g., increasing weight or price).
• Easy to spot rises, falls, or patterns.
plt.plot(brands, weights, marker='o')
plt.title('Car Brand vs Weight')
plt.xlabel('Brand')
plt.ylabel('Weight (kg)')
plt.show()
2. Scatter Plot
• Scatter plots show the relationship between two numeric variables.
• Each point represents one observation (like one car).
• Good for spotting correlations or clusters.
Code:
plt.scatter(weights, prices)
plt.title('Weight vs Price of Cars')
plt.xlabel('Weight (kg)')
plt.ylabel('Price ($)')
plt.show()
3. Histogram
• A histogram shows how data is distributed into intervals (bins).
• Useful for checking data spread like car weights.
• Helps detect skewness, outliers, and frequency.
Code:
plt.hist(weights, bins=5, color='skyblue', edgecolor='black')
plt.title('Distribution of Car Weights')
plt.xlabel('Weight (kg)')
plt.ylabel('Frequency')
plt.show()
4. Bar Plot
• A bar plot shows quantities of different categories.
• Useful for comparing car weights or prices across brands.
• Heights of bars represent values.
Code:
plt.bar(brands, prices, color='orange')
plt.title('Car Brand vs Price')
plt.xlabel('Brand')
plt.ylabel('Price ($)')
plt.show()
5. Horizontal Bar Plot (Barh)
• A horizontal bar plot is like a bar plot but flipped sideways.
• Useful when category names (brands) are long.
• Makes comparisons easier when many categories exist.
Code:
plt.barh(brands, weights, color='green')
plt.title('Car Brand vs Weight (Horizontal)')
plt.xlabel('Weight (kg)')
plt.ylabel('Brand')
plt.show()
6. Box Plot
• A box plot shows the spread of data with min, max, median, and quartiles.
• It helps detect outliers and variability.
• Perfect for summarizing distributions like car weights.
Code:
plt.boxplot(weights)
plt.title('Box Plot of Car Weights')
plt.ylabel('Weight (kg)')
plt.show()
Matplotlib
Matplotlib is one of the most widely used data visualization libraries in Python. It is mainly
used to create static, interactive, and animated plots. It provides a variety of tools for
creating different types of graphs like line plots, bar charts, scatter plots, histograms,
and more.
Features of Matplotlib
• Wide Range of Plots: Supports line plots, scatter plots, bar charts, pie charts,
histograms, etc.
• Customization: High level of control over every aspect of a figure (size, colors,
fonts, axes).
• Integration: Works well with NumPy, Pandas, and integrates with GUI toolkits like
Tkinter, wxPython.
• Exporting: Graphs can be exported in various formats like PNG, PDF, SVG, EPS.
• Interactive Plots: Supports zooming, panning, and updating plots dynamically.
Basic Structure of Matplotlib
Matplotlib works on an object-oriented approach. The two main interfaces are:
1. Pyplot API (matplotlib.pyplot) — Easy-to-use interface like MATLAB.
2. Object-oriented API — Gives full control of plot elements.
The most commonly used is the Pyplot interface.
import matplotlib.pyplot as plt
Important Concepts
1. Figure and Axes
• Figure: The whole window where plots appear (like a canvas).
• Axes: A part of the figure where the data is plotted (could be multiple in one figure).
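To make the Figure/Axes distinction concrete, here is a minimal sketch with one Figure holding two Axes. The data values and titles are made up for illustration.

```python
import matplotlib.pyplot as plt

# One Figure (the canvas) holding two Axes side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

ax_left.plot([1, 2, 3], [1, 4, 9])
ax_left.set_title("Left Axes")

ax_right.bar(["A", "B", "C"], [3, 5, 2])
ax_right.set_title("Right Axes")

print(len(fig.axes))   # number of Axes the Figure contains
plt.show()
```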
2. Plotting Basic Graphs
A simple line plot:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
Types of Plots in Matplotlib
1. Line Plot
Useful for showing trends over time.
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]
plt.plot(x, y, color='red', linestyle='--', marker='o')
plt.title("Line Plot Example")
plt.show()
2. Bar Chart
Good for comparing categories.
categories = ['A', 'B', 'C']
values = [10, 20, 15]
plt.bar(categories, values, color='green')
plt.title("Bar Chart Example")
plt.show()
3. Scatter Plot
Used for correlation between two variables.
x = [5, 7, 8, 7, 2, 17]
y = [99, 86, 87, 88, 100, 86]
plt.scatter(x, y, color='blue')
plt.title("Scatter Plot Example")
plt.show()
4. Histogram
Used to see the distribution of data.
import numpy as np
data = np.random.randn(1000)
plt.hist(data, bins=30, color='purple')
plt.title("Histogram Example")
plt.show()
Customization in Matplotlib
You can control color, line styles, markers, labels, legend, grids, and more.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y, color='magenta', marker='s', linestyle=':')
plt.title('Customized Plot', fontsize=14)
plt.xlabel('X-axis', fontsize=12)
plt.ylabel('Y-axis', fontsize=12)
plt.grid(True)
plt.legend(['Data Line'])
plt.show()
Subplots in Matplotlib
You can create multiple plots in the same figure using subplot().
# 2 rows, 1 column, 1st plot
plt.subplot(2, 1, 1)
plt.plot([1,2,3], [1,4,9])
# 2 rows, 1 column, 2nd plot
plt.subplot(2, 1, 2)
plt.plot([1,2,3], [1,2,3])
plt.show()
Saving Plots
You can save plots to files.
plt.plot([1,2,3], [5,7,4])
plt.title("Save Example")
plt.savefig('myplot.png')
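The output format is normally chosen from the file extension. The sketch below saves one figure as PNG, PDF, and SVG; the temporary folder, the file names, and the dpi value are assumptions made for this demo.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [5, 7, 4])
ax.set_title("Save Example")

out_dir = tempfile.mkdtemp()   # temporary folder so the demo leaves no clutter
saved = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, f"myplot.{ext}")
    # dpi affects raster output (PNG); bbox_inches="tight" trims extra margins
    fig.savefig(path, dpi=150, bbox_inches="tight")
    saved.append(path)

print(saved)
```

Call savefig() before plt.show() when both are used, because the figure may be cleared once the window is closed.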
Why use Matplotlib?
• Professional Quality: You can create publication-quality figures.
• Flexible and Powerful: Suitable for simple plots and complex visualizations.
• Community Support: Extensive documentation and large user community.
• Cross-platform: Works on Windows, Mac, Linux.
How Does It Work?
Matplotlib follows these general steps:
1. Import the library
2. Prepare the data (can be lists or arrays)
3. Plot the data using functions like plot(), bar(), scatter()
4. Customize the plot with titles, labels, grid, etc.
5. Display the plot using plt.show()
Advantages of Using Matplotlib
• Works with other libraries like NumPy, Pandas, and Seaborn
• Highly customizable: you can change colors, styles, legends, labels, and more
• Supports saving plots as images or PDFs
• Used by students, researchers, data scientists, and developers
Matplotlib Packages in Python
Matplotlib is a powerful plotting library for the Python programming language.
It provides a full set of functions for creating static, animated, and interactive
visualizations.
It has become a default choice for plotting and graphing in data science,
machine learning, and scientific computing.
Visualization is one of the most important steps in any data-driven project. Without clear
graphs, data analysis can become confusing and incomplete.
Matplotlib also integrates seamlessly with other libraries like NumPy, Pandas, and SciPy.
Why Study Matplotlib Packages?
Matplotlib is made up of many small sub-packages working together.
Understanding these packages helps to:
• Customize and control plots deeply.
• Create professional-quality graphs.
• Manage complex figures and multiple plots.
Main Packages of Matplotlib
Matplotlib is modular.
Here are the important packages and modules you must know:
Package/Module | Description
matplotlib.pyplot | Simplified plotting functions (main interface)
matplotlib.figure | Manages the figure window (the canvas)
matplotlib.axes | Handles the plotting area (x-axis, y-axis)
matplotlib.artist | Base for all visible elements like lines, texts
matplotlib.backends | Deals with rendering output (screen, file)
matplotlib.animation | Creates animated plots
matplotlib.gridspec | Controls layout of subplots
matplotlib.style | Predefined styles to change appearance
1. matplotlib.pyplot
• Most commonly used module.
• Contains functions like plot(), show(), xlabel(), ylabel(), title(), etc.
• It is based on a state machine — calls affect the current figure and axes.
Small Code Example
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
plt.plot(x, y)
plt.title("Basic Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
2. matplotlib.figure
• A Figure is the overall window where plots appear.
• You can have multiple Axes (plots) inside one Figure.
Small Code Example
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
fig = Figure()
canvas = FigureCanvas(fig) # Attach figure to a canvas
ax = fig.add_subplot(111) # 1x1 grid, 1st subplot
ax.plot([1, 2], [3, 4])
fig.savefig('figure1.png') # Save figure to file
3. matplotlib.axes
• Axes represent the actual plot area (with X-axis and Y-axis).
• You can add multiple Axes inside a single Figure.
Small Code Example
import matplotlib.pyplot as plt
fig, ax = plt.subplots() # Create a figure and a single axes
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])
ax.set_title("Plot using Axes")
plt.show()
4. matplotlib.artist
• Everything that appears on a figure (lines, text, legends, etc.) is an Artist.
• Even the Figure, the Axes, and the title text are instances of Artist subclasses.
Small Code Example
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
line = ax.plot([1, 2, 3], [1, 2, 3])
print(type(line)) # <class 'list'>, containing Artist elements
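The list returned by ax.plot() can be unpacked to reach the Artist itself. This short sketch checks the class relationships with isinstance() and then changes the line through its setter methods; the color and width values are arbitrary.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.lines import Line2D

fig, ax = plt.subplots()
lines = ax.plot([1, 2, 3], [1, 2, 3])   # a list holding one Line2D object
line = lines[0]

print(isinstance(line, Line2D))                         # the plotted line
print(isinstance(line, Artist))                         # Line2D derives from Artist
print(isinstance(ax, Artist), isinstance(fig, Artist))  # so do Axes and Figure

# Artists are customized through setter methods
line.set_color("red")
line.set_linewidth(3)
```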
5. matplotlib.backends
• Matplotlib supports many backends — these determine how the plots are
displayed (screen, saved as file, web applications, etc.).
• Examples: TkAgg, Agg, PDF, SVG, etc.
Small Note
# You usually don't need to set backend manually
# But you can check the current backend:
import matplotlib
print(matplotlib.get_backend())
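If you do need a specific backend, for example on a server without a display, it can be selected with matplotlib.use(). The sketch below picks the non-interactive Agg backend and writes the plot to a file; the file name and the temporary folder are assumptions for this demo.

```python
import os
import tempfile
import matplotlib

matplotlib.use("Agg")            # choose the backend before importing pyplot
import matplotlib.pyplot as plt

print(matplotlib.get_backend())

plt.plot([1, 2, 3], [2, 4, 6])
out_path = os.path.join(tempfile.gettempdir(), "backend_demo.png")  # hypothetical file name
plt.savefig(out_path)            # Agg renders to a file, so no window opens
```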
6. matplotlib.animation
• Allows you to animate plots.
• Useful for simulations, dynamic updates in real-time graphs.
Small Code Example
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np
fig, ax = plt.subplots()
x = np.arange(0, 2*np.pi, 0.01)
line, = ax.plot(x, np.sin(x))
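def animate(i):
    # Shift the sine wave slightly on each frame (assumed update rule)
    line.set_ydata(np.sin(x + i / 10.0))
    return line,
ani = animation.FuncAnimation(fig, animate, frames=200, interval=100)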
plt.show()
7. matplotlib.gridspec
• Used to organize multiple subplots in complex layouts.
• More flexible than subplot().
Small Code Example
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
fig = plt.figure()
gs = gridspec.GridSpec(2, 2)
ax1 = fig.add_subplot(gs[0, 0])
ax2 = fig.add_subplot(gs[0, 1])
ax3 = fig.add_subplot(gs[1, :])
ax1.set_title('Top Left')
ax2.set_title('Top Right')
ax3.set_title('Bottom Row')
plt.tight_layout()
plt.show()
8. matplotlib.style
• Provides predefined styles to quickly change the appearance of plots.
Small Code Example
import matplotlib.pyplot as plt
plt.style.use('ggplot') # Apply ggplot style
plt.plot([1, 2, 3], [2, 4, 6])
plt.title("Styled Plot")
plt.show()
You can view all available styles:
print(plt.style.available)
Main Parts of a Matplotlib Graph
1. Figure
• The entire window or canvas where everything is drawn.
• Think of it as a blank paper where all plots, titles, and labels go.
fig = plt.figure()
2. Axes
• The actual plot area inside the figure where data is drawn.
• It contains:
o X-axis and Y-axis
o Titles
o Ticks
o Lines, bars, etc.
ax = fig.add_subplot(1, 1, 1) # add one Axes to an existing Figure
fig, ax = plt.subplots() # or create the Figure and Axes together in one call
3. X-Axis and Y-Axis
• The two coordinate lines that define the horizontal and vertical scales.
• You can label them using:
plt.xlabel("X-axis Label")
plt.ylabel("Y-axis Label")
4. Plot (Data/Lines/Bars)
• The visual representation of your data — like a line, bar, or scatter dots.
• Created using commands like:
o plt.plot() for line graph
o plt.bar() for bar chart
o plt.scatter() for dot plots
5. Title
• The heading of the graph, explaining what the graph shows.
plt.title("Your Graph Title")
6. Ticks and Tick Labels
• Ticks are the small lines or markers along the axes.
• Tick labels are the numbers or names (like 1, 2, 3 or Jan, Feb, Mar).
• These help you read the values on the graph easily.
7. Legend
• A box that explains what each line, color, or symbol represents (especially in
multi-line plots).
plt.legend(["Line A", "Line B"])
8. Grid
• Background lines that make it easier to read the plot.
plt.grid(True)
Object-Oriented Interface for Plotting Graphs in Matplotlib
In Matplotlib, there are two ways to create plots:
• State-based interface (using pyplot) — simpler, like MATLAB.
• Object-Oriented (OO) interface — more flexible and powerful, especially for
complex figures.
The Object-Oriented approach means you explicitly create Figure and Axes objects and
then call methods on these objects to create and customize the plots.
It gives better control when working with multiple plots, layouts, and customizations.
Steps in Object-Oriented Plotting
1. Create a Figure object using plt.figure() or plt.subplots().
2. Create one or more Axes (subplots) inside the Figure.
3. Use methods like .plot(), .set_title(), .set_xlabel(), .set_ylabel() on the Axes object.
Small Code Example
import matplotlib.pyplot as plt
# Step 1: Create figure and axes
fig, ax = plt.subplots()
# Step 2: Plot data on the axes
ax.plot([1, 2, 3, 4], [10, 20, 25, 30])
# Step 3: Set labels and title
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_title('Object-Oriented Plot')
# Step 4: Display the figure
plt.show()
Here, fig is the Figure and ax is the Axes object.
Advantages of OO Interface
• Useful for complex layouts with multiple plots.
• More readable and modular code.
• Easier to customize each part of the plot.
• Better when embedding Matplotlib plots into applications (Tkinter, PyQt, etc.)
Getting and Setting Values in Matplotlib
What does "Setting" and "Getting" Mean?
• Setting values = Changing properties of the plot, such as:
o Plot Title
o Axis Labels (X, Y)
o Axis Limits (range of X and Y)
o Tick Marks (positions and labels)
o Line Colors, Styles, Widths
• Getting values = Reading or checking the current settings of these elements.
o Useful if you want to modify the graph programmatically or check its current
state.
Using the pyplot Interface (plt)
In pyplot, you directly call functions on plt (no need to create figure or axes manually).
It automatically keeps track of the "current" figure and axes for you.
This is a simpler and more beginner-friendly way — perfect for quick plots.
Common Set and Get Methods in plt Interface
Element | Set Function | Get Function
Plot Title | plt.title("My Title") | plt.gca().get_title()
X-axis Label | plt.xlabel("X Axis") | plt.gca().get_xlabel()
Y-axis Label | plt.ylabel("Y Axis") | plt.gca().get_ylabel()
X-axis Limits | plt.xlim(min, max) | plt.gca().get_xlim()
Y-axis Limits | plt.ylim(min, max) | plt.gca().get_ylim()
Tick Positions | plt.xticks([1,2,3]) | plt.gca().get_xticks()
Line Color | Set inside plot() via color='red' | line.get_color()
Here, gca() means Get Current Axes — it allows us to access axes properties even
when using plt style.
Full Code Example with plt Interface
import matplotlib.pyplot as plt
# Data
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
# Plotting the graph
line, = plt.plot(x, y, color='green') # Setting line color at plot time
# Setting values
plt.title("My Sample Plot")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.xlim(0, 5)
plt.ylim(0, 35)
# Getting values using plt.gca() (get current axes)
print("Title:", plt.gca().get_title())
print("X-label:", plt.gca().get_xlabel())
print("Y-label:", plt.gca().get_ylabel())
print("X-limits:", plt.gca().get_xlim())
print("Y-limits:", plt.gca().get_ylim())
print("Line color:", line.get_color())
# Show the plot
plt.show()
How plt Interface is Different from OO Interface
Object-Oriented (fig, ax) | pyplot Interface (plt)
You manually create Figure and Axes | Automatically creates current figure and axes
More flexible for complex plots | Simpler for quick and small plots
Example: ax.set_title() | Example: plt.title()
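For comparison, the same setting and getting can be done through the Axes object. This is a minimal sketch that mirrors the pyplot example with the same data; no plt.gca() call is needed because ax is already the Axes.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
line, = ax.plot([1, 2, 3, 4], [10, 20, 25, 30], color="green")

# Setting values through the Axes object
ax.set_title("My Sample Plot")
ax.set_xlabel("X-Axis")
ax.set_ylabel("Y-Axis")
ax.set_xlim(0, 5)
ax.set_ylim(0, 35)
ax.set_xticks([1, 2, 3, 4])

# Getting the same values back
print("Title:", ax.get_title())
print("X-limits:", ax.get_xlim())
print("Ticks:", ax.get_xticks())
print("Line color:", line.get_color())

plt.show()
```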
Patches in Matplotlib
Patches are basic 2D shapes provided by Matplotlib that you can draw on your plots.
These shapes include:
• Rectangles
• Circles
• Arrows
• Ellipses
• Polygons
• Wedges, etc.
Patches are part of the matplotlib.patches module.
Why Use Patches?
Feature | Use
Add custom shapes | Rectangles, circles, etc. to highlight or annotate
Improve visual appeal | Colored backgrounds, zones, highlights
Add interactivity | Clickable or dynamic shapes
Create custom legends | Using custom patch shapes
How to Use Patches?
1. Import the patch class you want
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Circle
2. Create a figure and axes
fig, ax = plt.subplots()
3. Create a patch object and add it
rect = Rectangle((1, 1), width=2, height=3, color='skyblue')
ax.add_patch(rect)
4. Display the plot
plt.xlim(0, 5)
plt.ylim(0, 5)
plt.show()
Common Patch Types and Classes
Shape | Patch Class
Rectangle | Rectangle((x, y), w, h)
Circle | Circle((x, y), radius)
Ellipse | Ellipse((x, y), w, h)
Polygon | Polygon([(x1,y1),(x2,y2)...])
Arrow | FancyArrow(x, y, dx, dy)
Wedge | Wedge(center, r, θ1, θ2)
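Several of these classes can be drawn on one Axes as sketched below. The positions, sizes, and colors are arbitrary values chosen only to keep the shapes from overlapping.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Ellipse, Polygon, Wedge

fig, ax = plt.subplots()

shapes = [
    Circle((2, 7), radius=1, color="tomato"),
    Ellipse((6, 7), width=3, height=1.5, color="gold"),
    Polygon([(1, 1), (3, 1), (2, 3)], color="seagreen"),             # a triangle
    Wedge((6, 2), r=1.5, theta1=0, theta2=270, color="steelblue"),   # angles in degrees
]
for shape in shapes:
    ax.add_patch(shape)

# Patches do not rescale the axes automatically, so set the limits by hand
ax.set_xlim(0, 9)
ax.set_ylim(0, 9)
ax.set_aspect("equal")      # keep the circle round
ax.set_title("Common Patch Types")
plt.show()
```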
Plotting
A plot is a visual representation of data using shapes like lines, bars, dots, etc., to help us
understand patterns, relationships, and trends.
Why Use Plots?
• To visualize large data in an understandable way
• To compare data
• To detect trends, patterns, or outliers
• To support data-driven decisions
Types of Plots in Matplotlib (with Simple Explanations)
Plot Type | Use Case | Function
Line Plot | Show trends or time series | plt.plot()
Bar Plot | Compare quantities | plt.bar()
Histogram | Show distribution of data | plt.hist()
Scatter Plot | Show relation between 2 variables | plt.scatter()
Pie Chart | Show parts of a whole | plt.pie()
Box Plot | Show statistical summary | plt.boxplot()
Area Plot | Similar to line plot, with filled area | plt.fill_between()
Stack Plot | Cumulative data over time | plt.stackplot()
Heatmap (via Seaborn) | Show data intensity in grid | sns.heatmap()
1. Line Plot
• A line plot connects individual data points with straight lines.
• It is mostly used to show trends over time (e.g., sales over months).
• Good for continuous data where points are related.
• X-axis usually shows time or sequence; Y-axis shows values.
• It is the most basic and widely used type of plot.
Small Code Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title("Line Plot")
plt.show()
2. Bar Plot
• A bar plot displays data with rectangular bars.
• Bar height represents the value; useful for categorical data.
• Can be vertical or horizontal bars.
• Shows comparisons between different groups.
• Great for discrete data like survey results or sales by category.
Small Code Example:
categories = ['A', 'B', 'C']
values = [5, 7, 3]
plt.bar(categories, values)
plt.title("Bar Plot")
plt.show()
3. Histogram
• A histogram shows the distribution of a dataset.
• It groups data into bins (intervals) and plots how many values fall into each bin.
• Useful for understanding the spread and shape of data.
• Helps identify patterns like normal distribution, skewness, etc.
• Different from bar plots — histograms are for continuous data.
Small Code Example:
data = [1, 2, 2, 3, 3, 3, 4, 5, 5]
plt.hist(data, bins=5)
plt.title("Histogram")
plt.show()
4. Scatter Plot
• A scatter plot shows the relationship between two variables.
• Each point represents one observation with x and y coordinates.
• Useful to see correlations, trends, or clusters.
• No lines connecting points — just dots.
• Often used in regression analysis and machine learning.
Small Code Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
plt.scatter(x, y)
plt.title("Scatter Plot")
plt.show()
5. Pie Chart
• A pie chart shows parts of a whole as slices of a circle.
• Good for displaying percentage or proportional data.
• Each slice size is proportional to the quantity it represents.
• Useful when total = 100% and parts are clearly divided.
• Too many categories can make pie charts confusing.
Small Code Example:
sizes = [30, 40, 20, 10]
labels = ['A', 'B', 'C', 'D']
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.title("Pie Chart")
plt.show()
6. Box Plot (Box-and-Whisker Plot)
• A box plot summarizes a dataset's minimum, Q1, median, Q3, and maximum.
• Helps to visualize spread, central value, and outliers.
• The box shows the interquartile range (IQR).
• The line inside the box is the median.
• Useful in statistical analysis and comparing distributions.
Small Code Example:
data = [7, 15, 13, 18, 21, 25, 30, 10, 5]
plt.boxplot(data)
plt.title("Box Plot")
plt.show()
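The numbers behind this box plot can be computed with NumPy, as in the sketch below. It assumes Matplotlib's default whisker rule, in which points more than 1.5 × IQR beyond the box are drawn as outliers and the whiskers stop at the furthest points inside that range.

```python
import numpy as np

data = np.array([7, 15, 13, 18, 21, 25, 30, 10, 5])

# Quartiles: the box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Points outside these fences would be plotted as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("Q1:", q1, "Median:", median, "Q3:", q3)
print("IQR:", iqr)
print("Outliers:", outliers)
```

For this dataset no value falls outside the fences, so the whiskers reach the minimum (5) and maximum (30).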
7. Area Plot
• An area plot is like a line plot, but the area under the line is filled with color.
• Shows quantity over time or different groups together.
• Useful for stacked trends (e.g., total sales of products).
• Helps to see cumulative changes clearly.
• Each colored area can represent a category.
Small Code Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [3, 5, 2, 8]
plt.fill_between(x, y)
plt.title("Area Plot")
plt.show()
8. Stack Plot
• A stack plot is a special type of area plot.
• Multiple datasets are stacked on top of each other.
• Shows cumulative changes in multiple variables over time.
• Useful for composition over time (e.g., population, sales growth).
• Each layer represents a different category.
Small Code Example:
days = [1, 2, 3, 4]
apples = [3, 4, 5, 6]
bananas = [1, 2, 1, 2]
plt.stackplot(days, apples, bananas, labels=['Apples', 'Bananas'])
plt.legend()
plt.title("Stack Plot")
plt.show()
Histograms, Binning, and Density
Histograms, binning, and density plots are visualization techniques used to understand
the distribution of data, especially how values are spread across a dataset.
Histogram
A Histogram is a type of graph that represents the distribution of a dataset.
• It divides the entire range of values into a series of intervals called bins.
• The height of each bar shows how many data points fall into each bin.
• Histograms are mainly used for continuous numerical data.
• They help us see patterns like skewness, modality (peaks), and spread of data.
• Example use: Visualizing test scores, age distribution, etc.
Simple code example:
import matplotlib.pyplot as plt
data = [1, 2, 2, 3, 3, 3, 4, 5, 5]
plt.hist(data, bins=5)
plt.title("Histogram Example")
plt.show()
Binning in Histograms
Binning refers to how we divide the data range into intervals (bins):
• Each bin covers a specific range of data.
• Choosing the number of bins is important:
o Too few bins → important details are hidden.
o Too many bins → graph becomes noisy and confusing.
• Common ways to choose bin size:
o Sturges’ rule
o Square root rule (bins ≈ √n, where n = number of data points)
o Freedman-Diaconis rule
Example showing bin change:
plt.hist(data, bins=3) # Only 3 bins
plt.title("Histogram with 3 Bins")
plt.show()
More bins = finer detail; fewer bins = more general view.
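NumPy implements the three rules named above, so their suggested bin counts can be compared directly. This sketch uses np.histogram_bin_edges() on a made-up sample of 500 normally distributed values; the sample itself is an assumption for the demo.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
data = rng.normal(loc=50, scale=10, size=500)   # made-up sample

bin_counts = {}
for rule in ("sturges", "sqrt", "fd"):          # "fd" = Freedman-Diaconis
    edges = np.histogram_bin_edges(data, bins=rule)
    bin_counts[rule] = len(edges) - 1           # n edges -> n - 1 bins
    print(rule, "->", bin_counts[rule], "bins")
```

The same rule names can be passed to plt.hist(), for example plt.hist(data, bins='fd').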
Density Estimation (KDE)
• Kernel Density Estimation (KDE) is a smooth version of a histogram.
• Instead of sharp bars, it produces a continuous curve that estimates the
probability density of the data.
• KDE smooths the distribution by placing a smooth "bump" (kernel) over each data
point.
• Useful when we want to understand the underlying distribution more clearly than
with histograms.
• KDE plots are especially good when the dataset is large.
Simple KDE Example (using Seaborn):
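import matplotlib.pyplot as plt
import seaborn as sns
data = [1, 2, 2, 3, 3, 3, 4, 5, 5] # same sample as the histogram example
sns.kdeplot(data)
plt.title("Density Estimation (KDE)")
plt.show()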
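To show what "placing a bump over each data point" means, here is a hand-written Gaussian KDE using only NumPy. The bandwidth of 0.5 and the evaluation grid are assumed values; Seaborn picks its own bandwidth, so its curve will not match this one exactly.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([1, 2, 2, 3, 3, 3, 4, 5, 5], dtype=float)
bandwidth = 0.5                       # assumed smoothing width
grid = np.linspace(-2, 8, 400)        # x positions where the curve is evaluated

# One Gaussian bump per data point (rows = data points, columns = grid positions)
z = (grid[None, :] - data[:, None]) / bandwidth
bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))

density = bumps.mean(axis=0)          # averaging the bumps gives the KDE curve

plt.hist(data, bins=5, density=True, alpha=0.4, label="Histogram")
plt.plot(grid, density, label="KDE (manual)")
plt.legend()
plt.title("Histogram with a Manual KDE Curve")
plt.show()
```

A larger bandwidth gives a smoother curve, and a smaller one follows the individual points more closely.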
What Is Error Visualization?
Error bars are a graphical enhancement that shows the variability of the plotted data
on a Cartesian graph.
Error bars can be applied to graphs to provide an additional layer of detail on the presented
data. They help you indicate estimated error or uncertainty.
Error bars are drawn as lines that extend from the center of a plotted data point, or from
the edge of a bar in a bar chart. The length of an error bar reveals the uncertainty of that
data point.
A short bar implies less error, whereas a long bar indicates more error or deviation.
Error visualization is the process of graphically representing the uncertainty or variability
in data using error bars or shaded regions.
Why Visualize Errors?
• To show how accurate or uncertain your data points or model predictions are
• To show confidence intervals in measurements
• To make your graphs scientifically meaningful
• Useful in experiments, surveys, and model predictions
Common Ways to Visualize Errors:
1. Error Bars (plt.errorbar())
2. Shaded Regions (using fill_between())
3. Box plots (for statistical spread)
4. Confidence bands (for models, often with regression lines)
1. Using plt.errorbar()
plt.errorbar(x, y, yerr=errors, fmt='o')
Simple Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
errors = [0.5, 0.4, 0.6, 0.3, 0.7]
plt.errorbar(x, y, yerr=errors, fmt='o', capsize=5, color='blue', ecolor='red')
plt.title("Error Bar Example")
plt.xlabel("X")
plt.ylabel("Y")
plt.grid(True)
plt.show()
2. Shaded Error Region with fill_between()
Good for showing ranges (confidence intervals or variability).
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
error = 0.2
plt.plot(x, y, label='Mean Value')
plt.fill_between(x, y - error, y + error, alpha=0.3, label='± Error')
plt.title("Shaded Error Region")
plt.legend()
plt.show()
3. Horizontal and Vertical Error Bars
You can also add x-direction error bars using xerr.
# Uses the x, y, and errors lists from the plt.errorbar() example above
plt.errorbar(x, y, xerr=0.2, yerr=errors, fmt='o', ecolor='green', capsize=3)
4. Box Plots (for spread and variability)
Box plots show median, quartiles, and outliers — another form of visualizing error or
spread in data.
data = [[5, 7, 6, 9, 12], [8, 5, 6, 7, 10]]
plt.boxplot(data)
plt.title("Box Plot")
plt.show()
Visualizing Continuous Errors
Continuous error bands are a graphical representation of error or uncertainty as a shaded
region around a main trace, rather than as discrete whisker-like error bars. The
fill_between() function can be used to visualize continuous errors.
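As a minimal sketch of a continuous error band, the code below simulates 20 noisy measurements of a sine curve and shades one standard deviation around their mean. The simulated data, the noise level, and the choice of one standard deviation as the band width are all assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)
x = np.linspace(0, 10, 50)

# 20 simulated noisy measurements of the same sine curve (made-up data)
trials = np.sin(x) + rng.normal(scale=0.3, size=(20, x.size))

mean = trials.mean(axis=0)   # average value at each x
std = trials.std(axis=0)     # spread at each x

plt.plot(x, mean, label="Mean of 20 trials")
plt.fill_between(x, mean - std, mean + std, alpha=0.3, label="± 1 std. dev.")
plt.title("Continuous Error Band")
plt.legend()
plt.show()
```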
Contour Plots
Contour plots, also called level plots, are a tool for multivariate analysis and for
visualizing 3-D surfaces in 2-D space. If X and Y are the variables we plot, the response Z
is drawn as slices on the X-Y plane, which is why contours are sometimes referred to as
Z-slices or iso-response values.
import numpy as np
import matplotlib.pyplot as plt
feature_x = np.linspace(-5.0, 3.0, 70)
feature_y = np.linspace(-5.0, 3.0, 70)
X, Y = np.meshgrid(feature_x, feature_y)
Z = X**2 + Y**2
plt.contourf(X, Y, Z, cmap='viridis') # 'viridis' colormap
plt.colorbar()
plt.title('Filled Contour Plot of Z = X^2 + Y^2')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
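The "slices" idea is easier to see with unfilled contour lines drawn at chosen Z values. This sketch reuses the same function with plt.contour() and labels each line using plt.clabel(); the list of levels is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5.0, 3.0, 70)
y = np.linspace(-5.0, 3.0, 70)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

levels = [1, 4, 9, 16, 25]                       # Z values at which to draw lines
contours = plt.contour(X, Y, Z, levels=levels, colors='black')
plt.clabel(contours, inline=True, fontsize=8)    # write each Z value on its line
plt.title('Contour Lines of Z = X^2 + Y^2')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
```

Each line joins the points where Z has the same value, so here the lines are circles (or arcs) of radius 1, 2, 3, 4, and 5 around the origin.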
Customizing
1. Customizing Plot Legends
What is a Plot Legend?
• A legend is a small box inside the plot that explains what different colors,
markers, or lines represent.
• It is very important when you have multiple plots in a single graph.
• By default, legends are placed automatically, but customization allows you to:
o Change position (top, bottom, left, right)
o Add title to the legend
o Adjust font size, background color, and border.
• The plt.legend() function in Matplotlib is used for legends.
Common Customizations
Feature | Method
Position | loc='upper right', loc='lower left', etc.
Title | title='Legend Title'
Font Size | fontsize=10
Background color | facecolor='lightgray'
Border color and visibility | edgecolor='black', frameon=True
Simple Code Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y1 = [2, 3, 4, 5]
y2 = [3, 4, 2, 1]
plt.plot(x, y1, label="Line 1")
plt.plot(x, y2, label="Line 2")
plt.legend(loc='lower right', title='My Legend', fontsize=10, frameon=True,
facecolor='lightyellow', edgecolor='black')
plt.title("Customized Legend Example")
plt.show()
2. Customizing Colorbars (5 Marks)
What is a Colorbar?
• A colorbar is a visual representation of color mapping used in plots like:
o Heatmaps
o Contour plots
o Images (e.g., pixel intensities)
• It helps the reader understand what different colors mean (e.g., high value = dark
red, low value = light blue).
Common Customizations
Feature | Method
Colorbar Label | colorbar.set_label('label name')
Orientation | orientation='horizontal'
Tick Fontsize | colorbar.ax.tick_params(labelsize=10)
Shrink Size | plt.colorbar(mappable, shrink=0.8)
Colorbar Ticks | colorbar.set_ticks([list_of_ticks])
Simple Code Example:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.rand(5, 5)
img = plt.imshow(data, cmap='viridis')
colorbar = plt.colorbar(img, shrink=0.8, orientation='vertical')
colorbar.set_label('Intensity')
colorbar.ax.tick_params(labelsize=8)
plt.title("Customized Colorbar Example")
plt.show()
3. Text and Annotation in Matplotlib (7 Marks)
Theory
• Text is used to add static words like titles, labels, or general comments inside a
plot.
• Annotations are special texts connected to a specific data point (for highlighting
something important).
• You can control font size, color, rotation, alignment, etc.
• Matplotlib functions used:
o plt.text(x, y, "text") — Add text at position (x, y).
o plt.annotate() — Add an annotation, often with arrows pointing to a point.
Common Parameters:
Parameter | Meaning
fontsize | Size of text
color | Color of text
ha, va | Horizontal, Vertical alignment
arrowprops | To add arrow in annotation
Small Code Example:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
# Add simple text
plt.text(2, 5, "Midpoint", fontsize=12, color='blue')
# Add annotation with arrow
plt.annotate("Peak", xy=(3,6), xytext=(2.5,6.5),
arrowprops=dict(facecolor='red', shrink=0.05))
plt.title("Text and Annotation Example")
plt.show()
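In practice the point to annotate is often found from the data rather than typed in by hand. A possible sketch, assuming a sine curve as sample data and using np.argmax to locate its highest point:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)

# Index of the largest y value, found from the data rather than hard-coded
i_max = np.argmax(y)
x_max, y_max = x[i_max], y[i_max]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_ylim(-1.5, 1.5)   # leave room above the curve for the label

# Arrow tip at the maximum (xy); label offset to the right (xytext)
ax.annotate("Maximum", xy=(x_max, y_max), xytext=(x_max + 1, 1.3),
            arrowprops=dict(facecolor='red', shrink=0.05))
ax.set_title("Annotating a Computed Maximum")
plt.show()
```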
4. Transform and Text Positions (7 Marks)
Theory
• Transforms control where text or annotations appear — based on data
coordinates or figure coordinates.
• Data coordinate → based on the axes scale (e.g., x=2, y=5).
• Figure coordinate → relative to figure size (0 to 1 range).
• Useful for placing text at fixed positions regardless of data.
Types of Transforms:
Transform Meaning
ax.transData Data points (default)
fig.transFigure Figure (0,0) to (1,1)
Small Code Example:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
# Add text in data coordinates
ax.text(2, 5, "Data Text", transform=ax.transData)
# Add text in figure coordinates
fig.text(0.5, 0.9, "Figure Text", ha='center', fontsize=12)
plt.show()
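Besides the two transforms in the table, Matplotlib also provides ax.transAxes, which is not covered above and is added here only as a related illustration. It measures positions relative to the plotting area, from (0, 0) at the bottom-left to (1, 1) at the top-right. A small sketch:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# Axes coordinates: (0, 0) is the bottom-left corner of the plotting area,
# (1, 1) the top-right, regardless of the data limits
label = ax.text(0.05, 0.95, "Top-left corner", transform=ax.transAxes,
                ha='left', va='top')

# Changing the data limits does not move text placed with transAxes
ax.set_xlim(0, 100)
plt.show()
```

This is convenient for notes such as panel labels that should stay in one corner even when the data range changes.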
5. Customizing Ticks (7 Marks)
Theory
• Ticks are the small marks on x-axis and y-axis showing values.
• You can customize:
o Location of ticks
o Labels of ticks
o Font size, rotation, color
• Useful to make plots cleaner or match a specific style.
Main Functions:
Function Purpose
set_xticks() Set custom tick locations (x-axis)
set_yticks() Set custom tick locations (y-axis)
set_xticklabels() Set custom labels
tick_params() Customize appearance (size, color, direction)
Small Code Example:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
# Set custom ticks
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(['One', 'Two', 'Three'])
# Customize tick appearance
ax.tick_params(axis='x', rotation=45, labelsize=12, colors='red')
plt.show()
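Tick locations can also be computed with NumPy instead of being listed by hand. The sketch below assumes a sine curve and places x ticks at multiples of π/2; the label strings are written manually and are an illustrative choice.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Five evenly spaced tick locations: 0, pi/2, pi, 3pi/2, 2pi
tick_locs = np.arange(0, 5) * np.pi / 2
ax.set_xticks(tick_locs)
ax.set_xticklabels(['0', 'π/2', 'π', '3π/2', '2π'])  # one label per location

ax.set_yticks([-1, 0, 1])
ax.tick_params(axis='both', direction='in', labelsize=10)
plt.show()
```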
6. Customizing Plots (7 Marks)
Theory
• Customization means changing the look and feel of the entire plot.
• You can customize:
o Line style (dotted, dashed)
o Line color (blue, red)
o Markers (dots, squares)
o Background color
o Grid style
• Customization helps highlight important patterns and improve readability.
Customization Options:
Feature Example Code
Line color color='green'
Line style linestyle='--'
Marker type marker='o'
Background color fig.patch.set_facecolor('lightgray')
Grid display plt.grid(True, linestyle='--')
Small Code Example:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Customize line color, style, and marker
ax.plot([1, 2, 3], [4, 5, 6], color='purple', linestyle='--', marker='o')
# Change background color
fig.patch.set_facecolor('lightyellow')
# Add grid
plt.grid(True, linestyle=':', color='blue')
plt.title("Customized Plot")
plt.show()
7. Plot: Adjust Line Colors and Styles
Customizing your plot makes it:
• Easier to understand
• More visually appealing
• Helpful when plotting multiple lines to differentiate
Basic Line Plot Syntax
plt.plot(x, y, style)
Where style can include:
• Color
• Line style
• Marker style
Line Color Options
You can specify the color in multiple ways:
Color Name Code Example
Blue 'b' plt.plot(x, y, 'b')
Green 'g' plt.plot(x, y, 'g')
Red 'r' plt.plot(x, y, 'r')
Black 'k' plt.plot(x, y, 'k')
Custom (hex) color='#FF5733' plt.plot(x, y, color='#FF5733')
Line Style Options
Line Style Code Example
Solid Line '-' plt.plot(x, y, '-')
Dashed Line '--' plt.plot(x, y, '--')
Dotted Line ':' plt.plot(x, y, ':')
Dash-dot Line '-.' plt.plot(x, y, '-.')
Marker Style Options
Marker Code Description
'o' Circle marker
's' Square marker
'D' Diamond marker
'^' Triangle marker
'*' Star marker
Simple Example with Colors and Styles
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 2, 1, 2, 1]
plt.plot(x, y1, color='blue', linestyle='--', marker='o', label='Line 1')
plt.plot(x, y2, color='red', linestyle='-', marker='s', label='Line 2')
plt.title("Custom Line Styles and Colors")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.legend()
plt.grid(True)
plt.show()
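The color, marker, and line style codes from the tables can also be combined into one short format string, passed as the third argument to plot(). A brief sketch using the same data as above:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 2, 1, 2, 1]

fig, ax = plt.subplots()
# 'bo--' = blue ('b') + circle markers ('o') + dashed line ('--')
line1, = ax.plot(x, y1, 'bo--', label='Line 1')
# 'rs-' = red ('r') + square markers ('s') + solid line ('-')
line2, = ax.plot(x, y2, 'rs-', label='Line 2')

ax.legend()
plt.show()
```

The keyword form (color=, linestyle=, marker=) is longer but easier to read; the format string is handy for quick plots.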
8. Plot: Adjust Axes Limits
By default, Matplotlib auto-scales the x and y axes based on your data. But sometimes you
want to manually control the axis range to:
• Focus on a specific part of the data
• Ensure consistency between multiple plots
• Zoom into trends or outliers
• Improve plot readability
How to Set Axes Limits
You can set limits using:
plt.xlim() and plt.ylim()
Set the minimum and maximum values for x and y axes.
ax.set_xlim() and ax.set_ylim()
Used in the object-oriented (OO) style with subplots.
Syntax
plt.xlim(min, max)
plt.ylim(min, max)
or:
ax.set_xlim([min, max])
ax.set_ylim([min, max])
Simple Example: Setting Axis Limits
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]
plt.plot(x, y, 'bo--')
plt.xlim(2, 4) # Focus only on x from 2 to 4
plt.ylim(15, 30) # Focus only on y from 15 to 30
plt.title("Axes Limits Example")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.grid(True)
plt.show()
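The object-oriented form from the syntax section can be sketched in the same way. This example assumes two side-by-side panels so the auto-scaled and manually limited views can be compared; the figure size is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([10, 20, 25, 30, 35])

# Two panels: auto-scaled on the left, manually limited on the right
fig, (ax_auto, ax_zoom) = plt.subplots(1, 2, figsize=(8, 3))

ax_auto.plot(x, y, 'bo--')
ax_auto.set_title("Auto-scaled")

ax_zoom.plot(x, y, 'bo--')
ax_zoom.set_xlim(2, 4)      # same effect as plt.xlim(2, 4) on this axes
ax_zoom.set_ylim(15, 30)
ax_zoom.set_title("Limited to x in [2, 4]")

fig.tight_layout()
plt.show()
```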
9. Customizing Axis
Multiple Subplots using subplot() in Matplotlib (for 10 Marks)
What are Subplots?
• Subplots mean creating multiple small plots inside a single figure (one window).
• Instead of drawing only one graph, you can display several related graphs
together.
• This is useful for comparing data, showing trends, or summarizing multiple
results at once.
• Each small plot has its own axes, labels, and title.
• Helps in efficient use of space and makes visual comparisons easier.
The subplot() Function
Matplotlib provides a simple function called plt.subplot() to create subplots easily.
Basic Syntax:
plt.subplot(nrows, ncols, index)
Parameter Meaning
nrows Number of rows in the grid
ncols Number of columns in the grid
index Position of the current plot (starts from 1)
Key Points:
• The figure is divided into a grid of rows × columns.
• Index tells where to place the plot in the grid (counted left to right, top to bottom).
• Each subplot() call creates one subplot at the specified location.
Simple Examples
Example 1: 2 Plots (Side-by-Side)
import matplotlib.pyplot as plt
# 1st plot
plt.subplot(1, 2, 1)
plt.plot([1, 2, 3], [4, 5, 6])
plt.title("Plot 1")
# 2nd plot
plt.subplot(1, 2, 2)
plt.plot([1, 2, 3], [6, 5, 4])
plt.title("Plot 2")
plt.show()
• Here, 1 row, 2 columns, plot 1 and plot 2.
Example 2: 4 Plots (2x2 Grid)
plt.subplot(2, 2, 1)
plt.plot([1, 2, 3], [3, 2, 1])
plt.title("Plot 1")
plt.subplot(2, 2, 2)
plt.plot([1, 2, 3], [1, 2, 3])
plt.title("Plot 2")
plt.subplot(2, 2, 3)
plt.plot([1, 2, 3], [2, 3, 1])
plt.title("Plot 3")
plt.subplot(2, 2, 4)
plt.plot([1, 2, 3], [3, 1, 2])
plt.title("Plot 4")
plt.tight_layout() # Adjusts layout to avoid overlap
plt.show()
• Here, 2 rows, 2 columns, plots 1 to 4.
Important Tips
• Starting index is always 1, not 0.
• Use plt.tight_layout() to automatically adjust spaces between plots.
• You can mix different kinds of plots inside subplots (like a line plot + bar plot
together).
• Subplots share the same figure window but different axes.
• Good practice: keep axes labeled for each subplot.
Difference Between subplot() and subplots()
• subplot() → For simple or few plots.
• subplots() → For complex layouts, it returns figure and axes objects for advanced
control.
Example using subplots():
fig, axes = plt.subplots(2, 2)
axes[0, 0].plot([1, 2, 3], [4, 5, 6])
axes[1, 1].plot([1, 2, 3], [6, 5, 4])
plt.show()
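The tips above mention mixing plot types and labelling every subplot. A short sketch of both with subplots(), using invented sample values:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3])

# 1 row, 2 columns; axes is a 1-D array of two Axes objects
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Left panel: line plot
axes[0].plot(x, x ** 2, marker='o')
axes[0].set_title("Line plot")

# Right panel: bar chart of the same values
axes[1].bar(x, x ** 2, color='orange')
axes[1].set_title("Bar chart")

# Label every panel, as recommended in the tips above
for ax in axes:
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")

fig.tight_layout()
plt.show()
```

Note that indexing the axes array starts at 0, unlike the index argument of subplot(), which starts at 1.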
Seaborn
Seaborn is a Python data visualization library based on Matplotlib. It offers a high-level
interface to produce statistical graphics with ease and beauty.
Seaborn operates on the principle of mapping data variables to visual properties (like
position, color, or size) to reveal patterns and relationships.
It emphasizes statistical plotting, meaning it often includes built-in functionality to
compute and display statistical measures such as means, medians, or distributions
directly within the visualization.
It was developed to make complex visualization simple, especially when working with
Pandas DataFrames, whose columns it maps directly onto plot elements.
Features of Seaborn:
Feature Explanation
High-level API Short code for complex plots
Beautiful defaults Good-looking plots without extra effort
Works with Pandas Easily plots DataFrame data
Statistical analysis Supports regression, distribution, categorization
Thematic styling Built-in themes and color palettes
Why Use Seaborn Over Matplotlib?
Feature Matplotlib Seaborn
Code simplicity More verbose Cleaner and shorter
Statistical support Manual Built-in
Themes and style Basic Beautiful by default
Integration with data Needs more work Native with DataFrame
In short: Seaborn is built on top of Matplotlib but focuses on statistics, dataframes, and
aesthetics.
Real-Life Use Cases
• Analyzing sales data over time
• Visualizing customer behavior
• Understanding exam scores distribution
• Displaying gender-based spending
• Visualizing correlations between multiple variables
Easy syntax: Simplifies complex plots into few lines of code.
Built-in themes and color palettes: Improves visual quality without manual styling.
Automatic statistical aggregation: Makes it easy to summarize and display large data.
Flexible customizations: Allows fine control over every element of the plot if needed.
Integration with pandas: Direct plotting from structured datasets.
Basic Seaborn Syntax & Workflow
1. Import libraries
2. Load or prepare dataset
3. Choose the Seaborn function for the desired chart
4. Call the function with required arguments
5. Customize with labels, title, and style
6. Display the plot
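The six steps can be sketched as follows. The DataFrame is built by hand with invented values so that the example does not depend on downloading a dataset; the column names are hypothetical.

```python
# 1. Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# 2. Prepare a dataset (invented values, built by hand for illustration)
scores = pd.DataFrame({
    'hours': [1, 2, 3, 4, 5, 6],
    'score': [52, 58, 61, 70, 74, 83],
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
})

# 3-4. Choose a Seaborn function and call it with the required arguments
ax = sns.scatterplot(data=scores, x='hours', y='score', hue='group')

# 5. Customize labels and title
ax.set_title("Score vs Study Hours")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")

# 6. Display the plot
plt.show()
```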
Advantages of Seaborn
1. Concise syntax
2. Automatic handling of DataFrame
3. In-built datasets for practice
4. Statistical power built-in
5. Beautiful, professional-looking graphs
6. Excellent for quick exploratory data analysis
Seaborn Packages
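Seaborn's source code is organized into internal modules based on the type of visualization or
data analysis you need to perform.
Each module targets specific data relationships and offers specialized plotting
functions. In practice, all of these functions are called from the top-level namespace
(for example, sns.scatterplot()), so the modules rarely need to be imported directly.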
Organized structure: Different packages for different visualization needs.
Automatic statistical support: Adds things like regression lines, confidence intervals.
Stylish default themes: Makes plots attractive without much effort.
3. Organization of Seaborn
Seaborn organizes its functionalities into packages based on:
Purpose Package Name
Relationship visualization seaborn.relational
Categorical data visualization seaborn.categorical
Distribution visualization seaborn.distributions
Regression analysis seaborn.regression
Matrix (grid) visualization seaborn.matrix
Plot styling and themes seaborn.rcmod (theme and style functions)
Each package contains specialized functions for targeted visualizations.
4.1 seaborn.relational
• Focus: Relationships between numerical variables (e.g., scatterplots, lineplots).
• Helps in finding correlations, clusters, or trends between features.
Functions:
• scatterplot()
• lineplot()
Small Example:
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x=[1, 2, 3], y=[4, 5, 6])
plt.show()
4.2 seaborn.categorical
• Focus: Comparisons among categories (e.g., barplots, boxplots, violinplots).
• Useful when data involves grouping or classifications.
Functions:
• barplot()
• boxplot()
• violinplot()
Small Example:
sns.barplot(x=["Cat1", "Cat2", "Cat3"], y=[7, 8, 5])
plt.show()
4.3 seaborn.distributions
• Focus: Understanding distribution of a single variable (e.g., histograms, KDE
plots).
• Helps detect outliers, skewness, modes, and spread.
Functions:
• histplot()
• kdeplot()
Small Example:
sns.histplot([2, 3, 4, 4, 5, 5, 6, 7])
plt.show()
4.4 seaborn.regression
• Focus: Predictive relationships and trend lines (e.g., regplot, lmplot).
• Adds linear regression lines or confidence intervals automatically.
Functions:
• regplot()
• lmplot()
Small Example:
sns.regplot(x=[1, 2, 3, 4], y=[2, 4, 5, 7])
plt.show()
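To see numerically what kind of straight line a regression plot draws, the same data can be fitted with np.polyfit. This is an illustration with NumPy under the assumption of an ordinary least-squares line; it is not Seaborn's internal code path.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 5, 7])

# Least-squares fit of a degree-1 polynomial: y ≈ slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
# slope = 1.60, intercept = 0.50

# Values predicted by the fitted line at the original x positions
y_fit = slope * x + intercept
print(y_fit)
```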
4.5 seaborn.matrix
• Focus: Matrix visualizations like heatmaps and cluster maps.
• Useful for correlation analysis or similarity detection.
Functions:
• heatmap()
• clustermap()
Small Example:
import numpy as np
data = np.array([[1, 2], [3, 4]])
sns.heatmap(data, annot=True)
plt.show()
4.6 seaborn.rcmod (themes and styling)
• Focus: Styling and formatting plots (e.g., grid style, font scale, background color).
• Important for making graphs publication-ready.
Functions:
• set_style()
• set_context()
• set_palette()
Small Example:
sns.set_style("whitegrid")
sns.lineplot(x=[1, 2, 3], y=[3, 2, 5])
plt.show()
Using Seaborn Along With Matplotlib
Why Combine Seaborn and Matplotlib?
Seaborn makes it easier to plot beautiful and statistical graphs, while Matplotlib gives
full control over plot elements like:
• Titles and labels
• Ticks and axis limits
• Fonts and colors
• Subplots and figure size
So, using both together helps you create professional, customized, and flexible plots.
Real-Life Example: Seaborn + Matplotlib
Let’s plot a simple scatterplot with Seaborn and customize it using Matplotlib functions.
import seaborn as sns
import matplotlib.pyplot as plt
# Load sample dataset
data = sns.load_dataset("tips")
# Set Seaborn theme
sns.set_theme(style="whitegrid")
# Plot with Seaborn
sns.scatterplot(x="total_bill", y="tip", hue="day", data=data)
# Now customize using Matplotlib
plt.title("Total Bill vs Tip by Day")
plt.xlabel("Total Bill (in $)")
plt.ylabel("Tip (in $)")
plt.xlim(0, 60) # Set x-axis limits
plt.ylim(0, 12) # Set y-axis limits
plt.legend(title="Day of Week")
plt.grid(True)
# Show plot
plt.show()
Histograms, KDE, and Density Plots
These are tools to visualize how data is distributed, especially for continuous numerical
data.
1. Histogram
A histogram shows the frequency (count) of data values within specified intervals called
bins.
• Helps in understanding how values are spread (e.g., exam scores, salaries)
• Tall bars = many values in that range
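The counting behind a histogram can be reproduced with np.histogram. The scores below are hypothetical, and the choice of four bins between 20 and 100 is only for illustration.

```python
import numpy as np

# Hypothetical exam scores, used only for illustration
scores = np.array([35, 42, 48, 51, 55, 58, 61, 64, 67, 70, 72, 78, 85, 91])

# Four equal-width bins between 20 and 100: [20,40), [40,60), [60,80), [80,100]
counts, edges = np.histogram(scores, bins=4, range=(20, 100))

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {n} values")
```

Each count is the height of one bar; changing the number of bins changes how coarse or detailed the picture is.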
2. KDE (Kernel Density Estimation)
KDE is a smoothed version of the histogram that estimates the probability density
function of the data.
• Gives a smooth curve instead of bars
• Better for understanding continuous distribution trends
• Represents "how likely" a value is
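The idea behind KDE can be sketched by hand: place a small Gaussian bump on every data value and average the bumps. The function name gaussian_kde_1d and the bandwidth of 1.0 are assumptions made for this illustration; Seaborn chooses its bandwidth automatically and may give a slightly different curve.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    # Pairwise scaled distances, shape (len(grid), len(samples))
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

samples = [2, 3, 4, 4, 5, 5, 6, 7]          # small illustrative sample
grid = np.linspace(-5, 15, 2001)
density = gaussian_kde_1d(samples, grid, bandwidth=1.0)  # bandwidth chosen by hand

# A probability density should integrate to approximately 1
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows individual data points more closely.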
Seaborn Functions for These
Plot Type Function Description
Histogram sns.histplot() Bar-based frequency visualization
KDE sns.kdeplot() Smooth probability distribution curve
Both sns.histplot(..., kde=True) Combine histogram and KDE
Very Simple Code Examples
Histogram Only:
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("tips")
sns.histplot(data["total_bill"], bins=20, color="skyblue")
plt.title("Histogram of Total Bill")
plt.xlabel("Total Bill")
plt.ylabel("Frequency")
plt.show()
KDE Only:
sns.kdeplot(data["total_bill"], fill=True, color="green") # fill replaces the deprecated shade argument
plt.title("KDE Plot of Total Bill")
plt.xlabel("Total Bill")
plt.ylabel("Density")
plt.show()
Histogram + KDE Together:
sns.histplot(data["total_bill"], kde=True, color="purple", bins=20)
plt.title("Histogram + KDE of Total Bill")
plt.xlabel("Total Bill")
plt.ylabel("Frequency / Density")
plt.show()
Uses:
• To identify skewness, peaks, and spread
• To understand normality of data
• To compare distributions of different datasets
Detailed Seaborn Plot Explanations + Code (Iris & Tips Dataset)
1. Line Plot
• Shows trend or pattern over continuous variables like time, measurements, etc.
• Good for time series, progressions, or comparisons.
A line plot displays information as a series of data points connected by straight lines.
It is commonly used to visualize data trends over time or continuous variables.
In Seaborn, lineplot() can automatically handle aggregation and error bands.
It is ideal for analyzing relationships and detecting patterns or cycles.
Line plots are highly readable and great for time-series analysis.
Code (Iris dataset):
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
sns.lineplot(x='sepal_length', y='sepal_width', data=iris)
plt.show()
2. Scatter Plot
• Displays relationship between two continuous variables.
• Good for finding correlations, clusters, or outliers.
A scatter plot shows the relationship between two continuous variables.
Each point represents an observation with x and y values.
It is useful for identifying correlations, clusters, and outliers.
Seaborn's scatterplot() supports grouping using colors or styles.
It is a key tool in exploring relationships between variables.
Code (Iris dataset):
sns.scatterplot(x='petal_length', y='petal_width', hue='species', data=iris)
plt.show()
3. Box Plot
• Shows distribution of data based on five summary statistics: minimum, Q1,
median, Q3, and maximum.
• Detects outliers and variation between categories.
A box plot visualizes the distribution of data using five summary statistics: min, Q1,
median, Q3, and max.
It highlights the presence of outliers and the spread of data.
Seaborn's boxplot() function makes it easy to compare distributions across categories.
The box shape shows the interquartile range where 50% of data lies.
Box plots are excellent for comparing variations between groups.
Code (Tips dataset):
tips = sns.load_dataset('tips')
sns.boxplot(x='day', y='total_bill', data=tips)
plt.show()
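The statistics behind a box plot can be computed directly with NumPy. The bill amounts are invented, and the outlier rule shown is the common 1.5 × IQR convention, which is the default whisker setting in Matplotlib's boxplot; with that default the whiskers stop at the last non-outlier points rather than at the absolute minimum and maximum.

```python
import numpy as np

# Hypothetical bill amounts, including one unusually large value
bills = np.array([8, 10, 12, 13, 15, 16, 18, 20, 22, 60])

q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1                                  # height of the box

# Common convention: points beyond 1.5 * IQR from the box are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print("min, Q1, median, Q3, max:", bills.min(), q1, median, q3, bills.max())
print("outliers:", outliers)
```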
4. Violin Plot
• Combines box plot and KDE (kernel density estimate).
• Shows distribution shape + summary statistics together.
A violin plot combines a box plot and a kernel density plot.
It shows both the distribution shape and summary statistics of the data.
Seaborn's violinplot() allows splitting violins by categories easily.
The width of the violin represents the data density at different values.
It is more informative than boxplots when the distribution is multimodal.
Code (Tips dataset):
sns.violinplot(x='day', y='total_bill', data=tips)
plt.show()
5. Heatmap
• Displays matrix data using colors.
• Commonly used for correlations or feature similarities.
A heatmap displays matrix data as a color-coded grid.
It is commonly used to visualize correlations, tables, or 2D data distributions.
Seaborn's heatmap() function supports annotations and color gradients.
Color intensity represents the magnitude of the values.
Heatmaps make it easy to spot patterns, clusters, or relationships.
Code (Tips dataset):
corr = tips.corr(numeric_only=True) # skip non-numeric columns such as day and sex
sns.heatmap(corr, cmap='coolwarm')
plt.show()
6. Pair Plot
• Plots scatterplots for all feature pairs and histograms for individual features.
• Excellent for EDA (Exploratory Data Analysis).
A pair plot creates scatterplots between all pairs of variables and histograms on the
diagonal.
It provides a quick overview of how variables relate to each other.
Seaborn's pairplot() can color-code points by category labels.
It is useful in exploratory data analysis (EDA) for small datasets.
Pair plots reveal hidden trends, clusters, and relationships visually.
Code (Iris dataset):
sns.pairplot(iris, hue='species')
plt.show()
7. Count Plot
• Counts occurrences of categorical variables.
• Similar to a bar chart, but specifically shows frequencies.
A count plot shows the number of observations for each category.
It is a bar plot where height represents the count of data points.
Seaborn’s countplot() simplifies categorical data visualization.
It helps understand the balance or imbalance in the dataset.
Count plots are best for visualizing frequency distributions.
Code (Tips dataset):
sns.countplot(x='day', data=tips)
plt.show()
8. Displot (Distribution Plot)
• Shows histogram of a variable with an optional KDE overlay.
• Helps understand distribution, skewness, and spread.
A displot shows the distribution of a single variable.
It combines histogram and optional KDE (density curve) in one figure.
Seaborn's displot() supports faceting across multiple subsets.
It is useful for checking skewness, modality, and spread.
Displots are helpful for understanding how a variable behaves.
Code (Tips dataset):
sns.displot(tips['total_bill'], kde=True)
plt.show()
9. Joint Plot
• Combines scatterplot + histograms for two variables.
• Shows relationship + distributions in a single view.
A joint plot combines scatterplots and histograms in a single figure.
It shows both the bivariate relationship and univariate distributions.
Seaborn’s jointplot() can display regression lines, KDEs, or hex bins.
It is highly effective for studying two-variable relationships.
Joint plots offer detailed insights into data structures.
Code (Tips dataset):
sns.jointplot(x='total_bill', y='tip', data=tips, kind='scatter')
plt.show()
10. Faceted Histogram (FacetGrid)
• Creates multiple histograms split by a categorical variable.
• Useful to compare distributions across groups.
Faceted histograms show multiple histograms divided by categories.
Seaborn’s FacetGrid allows splitting a plot into subplots based on a variable.
Each facet shows a distribution for a specific subgroup.
It is useful for comparing how distributions differ across groups.
Faceted plots make multivariate data easier to interpret.
Code (Tips dataset):
g = sns.FacetGrid(tips, col="sex")
g.map(plt.hist, "total_bill")
plt.show()
11. Bar Plot
• Represents summary statistics (mean by default) for each category.
• Shows comparisons between different groups.
A bar plot shows summary statistics (mean by default) for each category.
Each bar's height indicates a value, often with confidence intervals.
Seaborn’s barplot() computes aggregations automatically.
It is commonly used to compare averages between groups.
Bar plots provide a clear and simple way to visualize group comparisons.
Code (Tips dataset):
sns.barplot(x='day', y='total_bill', data=tips)
plt.show()
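The per-category mean that a bar plot shows can be checked with a pandas groupby. The table below is a small hand-made stand-in for the tips dataset with invented values.

```python
import pandas as pd

# Small hand-made table standing in for the tips dataset (values are invented)
sample = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat'],
    'total_bill': [10.0, 14.0, 20.0, 22.0, 30.0, 36.0],
})

# Mean total_bill per day: the quantity each bar's height would represent
bar_heights = sample.groupby('day', sort=False)['total_bill'].mean()
print(bar_heights)
```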
3D Graphs in Python Using Matplotlib
Data visualization plays a critical role in understanding and interpreting data, especially
when dealing with multiple variables.
While 2D graphs (like line plots and bar charts) are widely used, they are limited to
visualizing relationships between two variables.
3D plots, on the other hand, allow us to graph data that involves three dimensions —
commonly referred to as X, Y, and Z axes.
Creating 3D graphs enables us to:
• Visualize mathematical functions in 3D space.
• Represent data that naturally exists in three dimensions (e.g., terrain maps, physical
simulations).
• Observe how two input variables affect an output (useful in machine learning,
physics, finance, etc.).
Matplotlib and the mplot3d Toolkit
Python provides the matplotlib library for plotting, and within it, the mplot3d toolkit is used
to generate 3D plots.
from mpl_toolkits.mplot3d import Axes3D # Registers the 3D projection (only required on older Matplotlib versions)
When creating a 3D plot, you specify:
• A figure using plt.figure()
• A 3D subplot using fig.add_subplot(..., projection='3d')
• Then plot using functions like plot3D(), scatter3D(), or plot_surface().
Types of 3D Plots
Plot Type Description
Line Plot Draws a line in 3D using X, Y, and Z points
Scatter Plot Plots individual points in 3D space
Surface Plot Shows a 3D surface defined by a function of X and Y
Wireframe Plot Similar to surface plot but uses wire-like grid structure
Simple 3D Plot Example
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
# Create data
z = np.linspace(0, 10, 100)
x = np.sin(z)
y = np.cos(z)
# Create figure and 3D axis
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Plot the 3D line
ax.plot3D(x, y, z, color='green')
# Show plot
plt.title("Simple 3D Line Plot")
plt.show()
Why Use 3D Graphs?
• More Insight: Helps to observe complex relationships.
• Intuitive Visuals: Makes it easier to interpret functions or datasets with 3 variables.
• Scientific and Mathematical Applications: Ideal for simulations, surface
modeling, or optimization problems.
2⃣ 3D Scatter Plot
Used to display individual data points in 3D. Great for showing distributions or clusters.
x = np.random.rand(50)
y = np.random.rand(50)
z = np.random.rand(50)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter3D(x, y, z, color='red')
plt.title("3D Scatter Plot")
plt.show()
3⃣ 3D Surface Plot
Best for visualizing functions of two variables (z = f(x, y)). It shows the surface behavior.
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')
plt.title("3D Surface Plot")
plt.show()
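The np.meshgrid step is easier to follow on a tiny grid. This sketch uses three x positions and two y positions, with hypothetical variable names chosen so they do not overwrite the x, y, and z used in the plots.

```python
import numpy as np

# Three x positions and two y positions
xs = np.array([0, 1, 2])
ys = np.array([10, 20])

# meshgrid expands them into two 2-D arrays covering every (x, y) pair
grid_x, grid_y = np.meshgrid(xs, ys)
print(grid_x.shape)   # (2, 3): rows follow ys, columns follow xs
print(grid_x)
print(grid_y)

# Any elementwise expression now yields one z value per grid point
grid_z = grid_x + grid_y
print(grid_z)
```

plot_surface() expects exactly this layout: three 2-D arrays of the same shape, one each for X, Y, and Z.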
4⃣ 3D Wireframe Plot
Similar to surface plots but only shows the edges. Useful for a mesh/grid visualization of
data.
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(x, y, z, color='black')
plt.title("3D Wireframe Plot")
plt.show()
5⃣ 3D Contour Plot
Used to represent 3D surfaces with contour lines — like elevation maps.
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.contour3D(x, y, z, 50, cmap='coolwarm')
plt.title("3D Contour Plot")
plt.show()
Time Series Analysis with Pandas – Step-by-Step
What is Time Series Analysis?
Time Series Analysis is the process of examining data points collected or recorded over
time to extract meaningful insights. It is used in stock prices, weather patterns, sales
forecasting, and many other fields.
In Python, Pandas makes it very easy to handle time-indexed data due to its built-in
support for date-time indexing, resampling, rolling statistics, and data visualization.
Time Series Analysis is the process of studying datasets that are collected or recorded at
successive points in time, usually at equally spaced intervals (like daily, monthly, yearly).
The goal is to identify patterns (like trends, seasonality, and cycles), understand the
underlying structure of the data, and often to make forecasts about future values.
Unlike regular data analysis, time series analysis explicitly accounts for the temporal order
of the data, recognizing that past values can influence future values.
1⃣ Import Libraries and Load Data
You start by importing necessary libraries like pandas, and often matplotlib for plotting.
Then, you load the time-based dataset.
import pandas as pd
import matplotlib.pyplot as plt
2⃣ Parse Dates and Set Date Index (creating a time series dataset)
Make sure your data has a datetime column. Convert raw date strings to datetime objects
with pd.to_datetime() and set that column as the index of the DataFrame. This is crucial
for time-based operations.
# Sample data
dates = pd.date_range(start='2024-01-01', periods=6, freq='D')
values = [100, 102, 101, 98, 105, 110]
df = pd.DataFrame({'Date': dates, 'Value': values})
df.set_index('Date', inplace=True) # Setting the date as index
print(df)
3⃣ Visualize the Time Series (time series plotting)
Plot the time series using .plot(). This helps identify trends, seasonality, and irregularities
visually. Understand the behavior of data over time.
df.plot(title='Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()
4⃣ Resample the Data (handling date time objects)
Sometimes, your data may be too granular (daily/hourly). You can change the frequency
(e.g., to monthly) using .resample(). Aggregate data to a new time frequency.
monthly = df.resample('M').mean() # Month-end bins, mean of each ('M' is spelled 'ME' in pandas 2.2+)
print(monthly)
5⃣ Resampling Time Series Data
Resampling is the process of changing the frequency of your time series data. You can
downsample (reduce frequency) or upsample (increase frequency) the data using
.resample().
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
'Value': [100, 200, 300, 400]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df_resampled = df.resample('M').mean()
print(df_resampled)
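The text mentions both downsampling and upsampling, while the example shows only downsampling to months. A sketch of both directions on the same kind of daily data; the 2-day and 12-hour frequencies and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

# Daily values indexed by date (same shape of data as the example above)
daily = pd.DataFrame(
    {'Value': [100, 200, 300, 400]},
    index=pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04']),
)

# Downsample: one row per 2-day window, averaging the values inside it
two_day = daily.resample('2D').mean()

# Upsample: one row every 12 hours; new rows repeat the last known value
half_day = daily.resample('12h').ffill()

print(two_day)
print(half_day)
```

Downsampling needs an aggregation such as mean(), because several rows collapse into one; upsampling needs a fill rule such as ffill(), because new rows have no data of their own.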
6⃣ Time-based Filtering
You can filter data by specifying time ranges. This is useful for comparing specific time
periods. Analyze specific time windows.
print(df.loc['2024-01-02':'2024-01-04']) # Slice by date range (both ends inclusive); uses the 2024 sample df from step 2
7⃣ Handling Missing Dates
If your data has missing days or timestamps, you can fill them using asfreq() and ffill().
Ensure consistent date frequency.
df_filled = df.asfreq('D').ffill() # Insert missing days, then forward-fill (fillna(method='ffill') is deprecated)
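The effect is easier to see on data that really has a gap. In this sketch one day is left out on purpose; the variable names are hypothetical, and interpolate() is shown as one possible alternative to forward-filling.

```python
import pandas as pd

# Series with 2024-01-03 missing on purpose
gappy = pd.DataFrame(
    {'Value': [100, 102, 98]},
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04']),
)

# asfreq('D') inserts the missing day as a row of NaN
regular = gappy.asfreq('D')
print(regular)

# ffill() copies the previous day's value into the gap;
# interpolate() instead draws a straight line between the neighbours
print(regular.ffill())
print(regular.interpolate())
```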
8⃣ Detect Trends or Seasonality (Basic Visual)
Although Pandas alone doesn’t do full statistical decomposition, you can observe trends
and seasonal effects visually using plots.
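One simple visual aid is a rolling (moving) average, which smooths short-term noise so that the trend stands out. A minimal sketch using the sample values from step 2; the 3-day window and the column name Rolling3 are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range(start='2024-01-01', periods=6, freq='D')
series = pd.DataFrame({'Value': [100, 102, 101, 98, 105, 110]}, index=dates)

# 3-day moving average; the first two rows are NaN because the
# window does not yet contain three observations
series['Rolling3'] = series['Value'].rolling(window=3).mean()
print(series)

# Overlay the raw values and the smoothed trend
ax = series.plot(title='Value with 3-Day Rolling Mean')
ax.set_xlabel('Date')
ax.set_ylabel('Value')
plt.show()
```

A wider window gives a smoother line but reacts more slowly to real changes in the data.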
Real-Life Applications
• Stock price analysis over months or years
• Weather forecasting using historical temperatures
• Sales predictions based on monthly trends
• Website traffic trends and seasonality
Difference Between Matplotlib and Seaborn