
DAP 5 Module

This document provides an overview of data visualization using NumPy arrays and Matplotlib in Python. It explains the importance of data visualization, the benefits of using NumPy for efficient data handling, and how to create various types of plots such as line, scatter, bar, and histogram using Matplotlib. Additionally, it covers the structure and main packages of Matplotlib, emphasizing its integration with NumPy and other libraries for effective data visualization.

Uploaded by

sharadhi080808
Copyright
© © All Rights Reserved

MODULE – 5

Data Visualization with NumPy Arrays


1. What is Data Visualization?

Data Visualization is the graphical representation of data and information.


It helps us to:

• Understand trends, patterns, and outliers

• Communicate complex data in a simple and meaningful way

• Make data-driven decisions more effectively

2. What is NumPy?

NumPy stands for Numerical Python. It is a core scientific computing library in Python.
NumPy provides:

• Support for large, multi-dimensional arrays and matrices

• Tools to perform mathematical and logical operations on arrays

• Fast performance compared to regular Python lists

3. Why Use NumPy in Data Visualization?

Using NumPy is extremely beneficial when preparing data for visualization:

Feature | Benefit
Efficient Array Handling | Fast computation and memory-saving
Mathematical Operations | Easy calculation of values for plotting
Compatibility | Works smoothly with libraries like matplotlib, seaborn, pandas
Simulation | Can generate sample data (e.g., sine waves, random data) for practice or models

4. How NumPy Supports Data Visualization

NumPy makes it easy to:


• Create regular intervals of values (e.g., 0 to 10 with 100 steps)

• Perform mathematical operations (e.g., sin(x), log(x), etc.)

• Create data for function plots, trends, and distributions

• Clean, manipulate, or reshape data before plotting
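The four tasks listed above can be sketched with plain NumPy calls. The array names and sample values below are illustrative assumptions, not data from this module.

```python
import numpy as np

# Regular intervals: 100 evenly spaced values from 0 to 10 (both ends included)
x = np.linspace(0, 10, 100)

# Element-wise math: no explicit loop is needed
y_sin = np.sin(x)
y_log = np.log(x[1:])  # skip x[0] == 0, where log is undefined

# Cleaning: drop missing readings (NaN) before plotting
readings = np.array([2.5, np.nan, 3.1, 2.9, np.nan, 3.4])  # made-up values
clean = readings[~np.isnan(readings)]

# Reshaping: turn 12 values into a 3 x 4 grid (e.g., for a heatmap)
monthly = np.arange(12)
grid = monthly.reshape(3, 4)

print(x[:3], clean, grid.shape)
```

Each result is an ordinary NumPy array, so it can be passed directly to a plotting function.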

5. Integration with Visualization Libraries

NumPy arrays are commonly used with:

• Matplotlib: For basic plots (line, bar, scatter, etc.)

• Seaborn: For statistical data visualization (with Pandas/NumPy)

• Plotly: For interactive charts

• Pandas: For labeled data structures (DataFrames use NumPy internally)

6. Common Use Cases of NumPy Arrays in Visualization

Use Case | Example Plot
Plotting mathematical functions | Line plot of sin(x)
Data distribution | Histogram using random data
Comparing categories | Bar charts
Time-series or signal data | Line plots from sensor data
Matrix or image display | 2D heatmaps or image plots

7. Very Small Example Code – Line Plot with NumPy

import numpy as np

import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100) # 100 values from 0 to 10

y = np.sin(x) # Calculate sine for each x

plt.plot(x, y) # Line graph

plt.title("Sine Wave")

plt.xlabel("X-axis")
plt.ylabel("sin(x)")

plt.show()

8. Advantages of Using NumPy for Visualization

• Speed: Much faster than native Python lists

• Simplicity: Easy syntax for math operations

• Flexibility: Works with multiple libraries

• Real-world ready: Can be used in scientific, financial, and engineering visualizations

NumPy Data Visualization


Why Use NumPy for Data Visualization?

NumPy makes it easy to generate and process numerical data for visualization. Here’s
why:

Feature | Benefit
Fast numerical computations | Efficient handling of large datasets
Easy function operations | Perform math (sin, cos, log, etc.) directly
Integration with libraries | Works well with matplotlib, seaborn
Real-world modeling | Simulate waves, signals, and distributions

How It Works

You use NumPy to:

1. Generate data (e.g., an array of values from 0 to 10)

2. Apply functions (e.g., sine, square, exponential)

3. Pass data to visualization libraries like matplotlib for plotting

NumPy Data Visualization (with Car Data)


First, create basic car data using NumPy:

import numpy as np

import matplotlib.pyplot as plt

# Sample data

brands = np.array(['Toyota', 'Honda', 'Ford', 'BMW', 'Audi'])

weights = np.array([1200, 1300, 1500, 1800, 1700]) # in kg

prices = np.array([20000, 22000, 25000, 35000, 40000]) # in dollars

1. Line Plot

• Line plots connect data points with straight lines.

• Useful for showing trends across ordered data (e.g., increasing weight or price).

• Easy to spot rises, falls, or patterns.

plt.plot(brands, weights, marker='o')

plt.title('Car Brand vs Weight')

plt.xlabel('Brand')

plt.ylabel('Weight (kg)')

plt.show()

2. Scatter Plot

• Scatter plots show the relationship between two numeric variables.

• Each point represents one observation (like one car).

• Good for spotting correlations or clusters.

Code:

plt.scatter(weights, prices)

plt.title('Weight vs Price of Cars')

plt.xlabel('Weight (kg)')
plt.ylabel('Price ($)')

plt.show()

3. Histogram

• A histogram shows how data is distributed into intervals (bins).

• Useful for checking data spread like car weights.

• Helps detect skewness, outliers, and frequency.

Code:

plt.hist(weights, bins=5, color='skyblue', edgecolor='black')

plt.title('Distribution of Car Weights')

plt.xlabel('Weight (kg)')

plt.ylabel('Frequency')

plt.show()

4. Bar Plot

• A bar plot shows quantities of different categories.

• Useful for comparing car weights or prices across brands.

• Heights of bars represent values.

Code:

plt.bar(brands, prices, color='orange')

plt.title('Car Brand vs Price')

plt.xlabel('Brand')

plt.ylabel('Price ($)')

plt.show()

6. Horizontal Bar Plot (Barh)


• A horizontal bar plot is like a bar plot but flipped sideways.

• Useful when category names (brands) are long.

• Makes comparisons easier when many categories exist.

Code:

plt.barh(brands, weights, color='green')

plt.title('Car Brand vs Weight (Horizontal)')

plt.xlabel('Weight (kg)')

plt.ylabel('Brand')

plt.show()

7. Box Plot

• A box plot shows the spread of data with min, max, median, and quartiles.

• It helps detect outliers and variability.

• Perfect for summarizing distributions like car weights.

Code:

plt.boxplot(weights)

plt.title('Box Plot of Car Weights')

plt.ylabel('Weight (kg)')

plt.show()

Matplotlib
Matplotlib is one of the most widely used data visualization libraries in Python. It is mainly
used to create static, interactive, and animated plots. It provides a variety of tools for
creating different types of graphs like line plots, bar charts, scatter plots, histograms,
and more.

Features of Matplotlib
• Wide Range of Plots: Supports line plots, scatter plots, bar charts, pie charts,
histograms, etc.

• Customization: High level of control over every aspect of a figure (size, colors,
fonts, axes).

• Integration: Works well with NumPy, Pandas, and integrates with GUI toolkits like
Tkinter, wxPython.

• Exporting: Graphs can be exported in various formats like PNG, PDF, SVG, EPS.

• Interactive Plots: Supports zooming, panning, and updating plots dynamically.

Basic Structure of Matplotlib

Matplotlib works on an object-oriented approach. The two main interfaces are:

1. Pyplot API (matplotlib.pyplot) — Easy-to-use interface like MATLAB.

2. Object-oriented API — Gives full control of plot elements.

The most commonly used is the Pyplot interface.

import matplotlib.pyplot as plt

Important Concepts

1. Figure and Axes

• Figure: The whole window where plots appear (like a canvas).

• Axes: A part of the figure where the data is plotted (could be multiple in one figure).
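The relationship between one Figure and several Axes can be seen in a short sketch. The sine and cosine data and the variable names are illustrative assumptions. The Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# One Figure (the canvas) holding two Axes (the plotting areas) side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

ax_left.plot(x, np.sin(x))
ax_left.set_title("sin(x)")

ax_right.plot(x, np.cos(x))
ax_right.set_title("cos(x)")

print(len(fig.axes))  # number of Axes that belong to this Figure
```

Each Axes keeps its own title, labels, and data, while the Figure controls the overall size and is the object that gets saved or shown.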

2. Plotting Basic Graphs

A simple line plot:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

y = [10, 20, 25, 30]

plt.plot(x, y)

plt.title("Simple Line Plot")


plt.xlabel("X-axis")

plt.ylabel("Y-axis")

plt.show()

Types of Plots in Matplotlib

1. Line Plot

Useful for showing trends over time.

x = [0, 1, 2, 3]

y = [0, 1, 4, 9]

plt.plot(x, y, color='red', linestyle='--', marker='o')

plt.title("Line Plot Example")

plt.show()

2. Bar Chart

Good for comparing categories.

categories = ['A', 'B', 'C']

values = [10, 20, 15]

plt.bar(categories, values, color='green')

plt.title("Bar Chart Example")

plt.show()

3. Scatter Plot

Used for correlation between two variables.

x = [5, 7, 8, 7, 2, 17]

y = [99, 86, 87, 88, 100, 86]


plt.scatter(x, y, color='blue')

plt.title("Scatter Plot Example")

plt.show()

4. Histogram

Used to see the distribution of data.

import numpy as np

data = np.random.randn(1000)

plt.hist(data, bins=30, color='purple')

plt.title("Histogram Example")

plt.show()

Customization in Matplotlib

You can control color, line styles, markers, labels, legend, grids, and more.

x = [1, 2, 3, 4, 5]

y = [2, 3, 5, 7, 11]

plt.plot(x, y, color='magenta', marker='s', linestyle=':')

plt.title('Customized Plot', fontsize=14)

plt.xlabel('X-axis', fontsize=12)

plt.ylabel('Y-axis', fontsize=12)

plt.grid(True)

plt.legend(['Data Line'])

plt.show()

Subplots in Matplotlib

You can create multiple plots in the same figure using subplot().
# 2 rows, 1 column, 1st plot

plt.subplot(2, 1, 1)

plt.plot([1,2,3], [1,4,9])

# 2 rows, 1 column, 2nd plot

plt.subplot(2, 1, 2)

plt.plot([1,2,3], [1,2,3])

plt.show()

Saving Plots

You can save plots to files.

plt.plot([1,2,3], [5,7,4])

plt.title("Save Example")

plt.savefig('myplot.png')
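A slightly fuller sketch of saving, assuming a temporary output folder (the paths are hypothetical). It shows the commonly used dpi and bbox_inches options and how the file extension picks the format.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [5, 7, 4])
ax.set_title("Save Example")

out_dir = tempfile.mkdtemp()  # hypothetical output folder for this sketch
png_path = os.path.join(out_dir, "myplot.png")
pdf_path = os.path.join(out_dir, "myplot.pdf")

# dpi sets the raster resolution; bbox_inches="tight" trims extra margins
fig.savefig(png_path, dpi=150, bbox_inches="tight")

# The file extension selects the output format (here a vector PDF)
fig.savefig(pdf_path)

plt.close(fig)  # release the figure once it has been written

print(os.path.exists(png_path), os.path.exists(pdf_path))
```

When plt.show() is also used in a script, it is safer to call savefig() first, since the figure may already be closed after the window is dismissed.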

Why use Matplotlib?

• Professional Quality: You can create publication-quality figures.

• Flexible and Powerful: Suitable for simple plots and complex visualizations.

• Community Support: Extensive documentation and large user community.

• Cross-platform: Works on Windows, Mac, Linux.

4. How Does It Work?

Matplotlib follows these general steps:

1. Import the library

2. Prepare the data (can be lists or arrays)

3. Plot the data using functions like plot(), bar(), scatter()

4. Customize the plot with titles, labels, grid, etc.

5. Display the plot using plt.show()


8. Advantages of Using Matplotlib

• Works with other libraries like NumPy, Pandas, and Seaborn

• Highly customizable: you can change colors, styles, legends, labels, and more

• Supports saving plots as images or PDFs

• Used by students, researchers, data scientists, and developers

Matplotlib Packages in Python


Matplotlib is a powerful plotting library for the Python programming language.
It provides a full set of functions for creating static, animated, and interactive
visualizations.

It has become a default choice for plotting and graphing in data science,
machine learning, and scientific computing.

Visualization is one of the most important steps in any data-driven project. Without clear
graphs, data analysis can become confusing and incomplete.

It also integrates seamlessly with other libraries like NumPy, Pandas, and SciPy.

Why Study Matplotlib Packages?

Matplotlib is made up of many small sub-packages working together.


Understanding these packages helps to:

• Customize and control plots deeply.

• Create professional-quality graphs.

• Manage complex figures and multiple plots.

Main Packages of Matplotlib

Matplotlib is modular.
Here are the important packages and modules you must know:
Package/Module | Description
matplotlib.pyplot | Simplified plotting functions (main interface)
matplotlib.figure | Manages the figure window (the canvas)
matplotlib.axes | Handles the plotting area (x-axis, y-axis)
matplotlib.artist | Base for all visible elements like lines, texts
matplotlib.backends | Deals with rendering output (screen, file)
matplotlib.animation | Creates animated plots
matplotlib.gridspec | Controls layout of subplots
matplotlib.style | Predefined styles to change appearance

1. matplotlib.pyplot

• Most commonly used module.

• Contains functions like plot(), show(), xlabel(), ylabel(), title(), etc.

• It is based on a state machine — calls affect the current figure and axes.

Small Code Example

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

y = [2, 4, 6, 8]

plt.plot(x, y)

plt.title("Basic Line Plot")

plt.xlabel("X-axis")

plt.ylabel("Y-axis")

plt.show()

2. matplotlib.figure
• A Figure is the overall window where plots appear.

• You can have multiple Axes (plots) inside one Figure.

Small Code Example

from matplotlib.figure import Figure

from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas

fig = Figure()

canvas = FigureCanvas(fig) # Attach figure to a canvas

ax = fig.add_subplot(111) # 1x1 grid, 1st subplot

ax.plot([1, 2], [3, 4])

fig.savefig('figure1.png') # Save figure to file

3. matplotlib.axes

• Axes represent the actual plot area (with X-axis and Y-axis).

• You can add multiple Axes inside a single Figure.

Small Code Example

import matplotlib.pyplot as plt

fig, ax = plt.subplots() # Create a figure and a single axes

ax.plot([1, 2, 3, 4], [1, 4, 2, 3])

ax.set_title("Plot using Axes")

plt.show()

4. matplotlib.artist

• Everything that appears on a figure (lines, text, legends, etc.) is an Artist.

• Figure and Axes are Artist subclasses, and titles are Text objects, which are Artists as well.

Small Code Example

import matplotlib.pyplot as plt


fig, ax = plt.subplots()

line = ax.plot([1, 2, 3], [1, 2, 3])

print(type(line)) # <class 'list'>, containing Artist elements

5. matplotlib.backends

• Matplotlib supports many backends — these determine how the plots are
displayed (screen, saved as file, web applications, etc.).

• Examples: TkAgg, Agg, PDF, SVG, etc.

Small Note

# You usually don't need to set backend manually

# But you can check the current backend:

import matplotlib

print(matplotlib.get_backend())
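For cases where the backend does need to be chosen by hand (for example, a script running on a server without a screen), a minimal sketch could look like this. The in-memory buffer is an assumption made to avoid writing a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # file-only backend: renders images, never opens a window
import matplotlib.pyplot as plt

print(matplotlib.get_backend())

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

# Render the figure as PNG bytes into memory instead of onto the screen
buffer = io.BytesIO()
fig.savefig(buffer, format="png")

print(buffer.getbuffer().nbytes, "bytes written")
```

The same plotting code works unchanged with an interactive backend such as TkAgg; only the way the result is displayed differs.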

6. matplotlib.animation

• Allows you to animate plots.

• Useful for simulations, dynamic updates in real-time graphs.

Small Code Example

import matplotlib.pyplot as plt

import matplotlib.animation as animation

import numpy as np

fig, ax = plt.subplots()

x = np.arange(0, 2*np.pi, 0.01)

line, = ax.plot(x, np.sin(x))

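def animate(i):

    # Assumed body (missing in the source): shift the sine wave a little each frame

    line.set_ydata(np.sin(x + i / 10.0))

    return line,

ani = animation.FuncAnimation(fig, animate, frames=200, interval=100)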


plt.show()

7. matplotlib.gridspec

• Used to organize multiple subplots in complex layouts.

• More flexible than subplot().

Small Code Example

import matplotlib.pyplot as plt

import matplotlib.gridspec as gridspec

fig = plt.figure()

gs = gridspec.GridSpec(2, 2)

ax1 = fig.add_subplot(gs[0, 0])

ax2 = fig.add_subplot(gs[0, 1])

ax3 = fig.add_subplot(gs[1, :])

ax1.set_title('Top Left')

ax2.set_title('Top Right')

ax3.set_title('Bottom Row')

plt.tight_layout()

plt.show()

8. matplotlib.style

• Provides predefined styles to quickly change the appearance of plots.

Small Code Example

import matplotlib.pyplot as plt

plt.style.use('ggplot') # Apply ggplot style

plt.plot([1, 2, 3], [2, 4, 6])

plt.title("Styled Plot")
plt.show()

You can view all available styles:

print(plt.style.available)
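plt.style.use() changes the style for the rest of the session. When a style should apply to a single plot only, a context manager can be used instead. This is a minimal sketch. It reads the axes.facecolor setting only to show that the change is temporary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import matplotlib.pyplot as plt

default_face = plt.rcParams["axes.facecolor"]

# The ggplot style applies only inside the with-block
with plt.style.context("ggplot"):
    styled_face = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 6])
    ax.set_title("Styled Plot")

# Outside the block the previous settings are back in place
restored_face = plt.rcParams["axes.facecolor"]

print(default_face, styled_face, restored_face)
```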

Main Parts of a Matplotlib Graph


1. Figure

• The entire window or canvas where everything is drawn.

• Think of it as a blank paper where all plots, titles, and labels go.

fig = plt.figure()

2. Axes

• The actual plot area inside the figure where data is drawn.

• It contains:

o X-axis and Y-axis

o Titles

o Ticks

o Lines, bars, etc.

ax = fig.add_subplot(1, 1, 1)  # add one Axes to the existing Figure

fig, ax = plt.subplots()  # alternative: create the Figure and Axes in one call

3. X-Axis and Y-Axis

• The two coordinate lines that define the horizontal and vertical scales.

• You can label them using:

plt.xlabel("X-axis Label")

plt.ylabel("Y-axis Label")

4. Plot (Data/Lines/Bars)

• The visual representation of your data — like a line, bar, or scatter dots.
• Created using commands like:

o plt.plot() for line graph

o plt.bar() for bar chart

o plt.scatter() for scatter plots

5. Title

• The heading of the graph, explaining what the graph shows.

plt.title("Your Graph Title")

6. Ticks and Tick Labels

• Ticks are the small lines or markers along the axes.

• Tick labels are the numbers or names (like 1, 2, 3 or Jan, Feb, Mar).

• These help you read the values on the graph easily.
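Tick positions and tick labels can be set separately. The sketch below uses made-up monthly sales figures and replaces numeric ticks with month names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar"]
sales = [120, 150, 90]  # made-up values
positions = [1, 2, 3]

fig, ax = plt.subplots()
ax.plot(positions, sales, marker="o")

ax.set_xticks(positions)    # where the tick marks go
ax.set_xticklabels(months)  # what text each tick shows
ax.tick_params(axis="x", labelrotation=45)  # tilt labels to avoid overlap

fig.canvas.draw()  # render once so the labels are up to date
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```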

7. Legend

• A box that explains what each line, color, or symbol represents (especially in multi-
line plots).

plt.legend(["Line A", "Line B"])

8. Grid

• Background lines that make it easier to read the plot.

plt.grid(True)

Object-Oriented Interface for Plotting Graphs in Matplotlib

In Matplotlib, there are two ways to create plots:

• State-based interface (using pyplot): simpler, like MATLAB.

• Object-Oriented (OO) interface: more flexible and powerful, especially for complex figures.

The Object-Oriented approach means you explicitly create Figure and Axes objects and
then call methods on these objects to create and customize the plots.
It gives better control when working with multiple plots, layouts, and customizations.

Steps in Object-Oriented Plotting

1. Create a Figure object using plt.figure() or plt.subplots().

2. Create one or more Axes (subplots) inside the Figure.

3. Use methods like .plot(), .set_title(), .set_xlabel(), .set_ylabel() on the Axes object.

Small Code Example

import matplotlib.pyplot as plt

# Step 1: Create figure and axes

fig, ax = plt.subplots()

# Step 2: Plot data on the axes

ax.plot([1, 2, 3, 4], [10, 20, 25, 30])

# Step 3: Set labels and title

ax.set_xlabel('X Axis')

ax.set_ylabel('Y Axis')

ax.set_title('Object-Oriented Plot')

# Step 4: Display the figure

plt.show()

Here, fig is the Figure and ax is the Axes object.

Advantages of OO Interface

• Useful for complex layouts with multiple plots.

• More readable and modular code.


• Easier to customize each part of the plot.

• Better when embedding Matplotlib plots into applications (Tkinter, PyQt, etc.)

Getting and Setting Values in Matplotlib

What does "Setting" and "Getting" Mean?

• Setting values = Changing properties of the plot, such as:

o Plot Title

o Axis Labels (X, Y)

o Axis Limits (range of X and Y)

o Tick Marks (positions and labels)

o Line Colors, Styles, Widths

• Getting values = Reading or checking the current settings of these elements.

o Useful if you want to modify the graph programmatically or check its current
state.

Using the pyplot Interface (plt)

In pyplot, you directly call functions on plt (no need to create figure or axes manually).

It automatically keeps track of the "current" figure and axes for you.
This is a simpler and more beginner-friendly way — perfect for quick plots.

Common Set and Get Methods in plt Interface

Element | Set Function | Get Function
Plot Title | plt.title("My Title") | plt.gca().get_title()
X-axis Label | plt.xlabel("X Axis") | plt.gca().get_xlabel()
Y-axis Label | plt.ylabel("Y Axis") | plt.gca().get_ylabel()
X-axis Limits | plt.xlim(min, max) | plt.gca().get_xlim()
Y-axis Limits | plt.ylim(min, max) | plt.gca().get_ylim()
Tick Positions | plt.xticks([1,2,3]) | plt.gca().get_xticks()
Line Color | Set inside plot() via color='red' | line.get_color()

Here, gca() means Get Current Axes — it allows us to access axes properties even
when using plt style.

Full Code Example with plt Interface

import matplotlib.pyplot as plt

# Data

x = [1, 2, 3, 4]

y = [10, 20, 25, 30]

# Plotting the graph

line, = plt.plot(x, y, color='green') # Setting line color at plot time

# Setting values

plt.title("My Sample Plot")

plt.xlabel("X-Axis")

plt.ylabel("Y-Axis")

plt.xlim(0, 5)

plt.ylim(0, 35)

# Getting values using plt.gca() (get current axes)

print("Title:", plt.gca().get_title())

print("X-label:", plt.gca().get_xlabel())
print("Y-label:", plt.gca().get_ylabel())

print("X-limits:", plt.gca().get_xlim())

print("Y-limits:", plt.gca().get_ylim())

print("Line color:", line.get_color())

# Show the plot

plt.show()

How plt Interface is Different from OO Interface

Object-Oriented (fig, ax) | pyplot Interface (plt)
You manually create Figure and Axes | Automatically creates current figure and axes
More flexible for complex plots | Simpler for quick and small plots
Example: ax.set_title() | Example: plt.title()
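For comparison, the same setting and getting can be done through the Object-Oriented interface. This is a minimal sketch that reuses the data from the plt example. The line width and style values are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3, 4], [10, 20, 25, 30], color="green")

# Setting values: ax.set() accepts several properties in one call
ax.set(title="My Sample Plot", xlabel="X-Axis", ylabel="Y-Axis",
       xlim=(0, 5), ylim=(0, 35))

# Line properties are set on the Line2D object returned by plot()
line.set_linewidth(2.5)
line.set_linestyle("--")

# Getting values: each set_* method has a matching get_* method
print("Title:", ax.get_title())
print("X-limits:", ax.get_xlim())
print("Line color:", line.get_color())
print("Line width:", line.get_linewidth())
```

Because every call names the object it acts on, this form stays unambiguous when a figure contains several Axes.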

Patches in Matplotlib
Patches are basic 2D shapes provided by Matplotlib that you can draw on your plots.
These shapes include:

• Rectangles

• Circles

• Arrows

• Ellipses

• Polygons

• Wedges, etc.
Patches are part of the matplotlib.patches module.

Why Use Patches?

Feature | Use
Add custom shapes | Rectangles, circles, etc. to highlight or annotate
Improve visual appeal | Colored backgrounds, zones, highlights
Add interactivity | Clickable or dynamic shapes
Create custom legends | Using custom patch shapes

How to Use Patches?

1. Import the patch class you want

import matplotlib.pyplot as plt

from matplotlib.patches import Rectangle, Circle


2. Create a figure and axes

fig, ax = plt.subplots()

3. Create a patch object and add it

rect = Rectangle((1, 1), width=2, height=3, color='skyblue')

ax.add_patch(rect)

4. Display the plot

plt.xlim(0, 5)

plt.ylim(0, 5)

plt.show()

Common Patch Types and Classes

Shape | Patch Class
Rectangle | Rectangle((x, y), w, h)
Circle | Circle((x, y), radius)
Ellipse | Ellipse((x, y), w, h)
Polygon | Polygon([(x1,y1),(x2,y2)...])
Arrow | FancyArrow(x, y, dx, dy)
Wedge | Wedge(center, r, θ1, θ2)
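Several of the patch classes in the table can be drawn on one Axes. The positions, sizes, and colors in this sketch are arbitrary illustrative values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Ellipse, Polygon, Wedge

fig, ax = plt.subplots()

shapes = [
    Circle((1.5, 4.5), radius=0.8, color="tomato"),
    Ellipse((4.5, 4.5), width=2.5, height=1.2, color="gold"),
    # A polygon is defined by its corner points; closed=True joins last to first
    Polygon([(0.5, 0.5), (2.5, 0.5), (1.5, 2.5)], closed=True, color="seagreen"),
    # A wedge is a slice of a circle between two angles given in degrees
    Wedge((4.5, 1.5), r=1.2, theta1=30, theta2=300, color="slateblue"),
]

for shape in shapes:
    ax.add_patch(shape)

# Patches do not rescale the axes automatically, so set the limits by hand
ax.set_xlim(0, 6)
ax.set_ylim(0, 6)
ax.set_aspect("equal")  # keep circles round

print(len(ax.patches))
```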

Plotting
A plot is a visual representation of data using shapes like lines, bars, dots, etc., to help us
understand patterns, relationships, and trends.

Why Use Plots?

• To visualize large data in an understandable way

• To compare data

• To detect trends, patterns, or outliers


• To support data-driven decisions

Types of Plots in Matplotlib (with Simple Explanations)

Plot Type | Use Case | Function
Line Plot | Show trends or time series | plt.plot()
Bar Plot | Compare quantities | plt.bar()
Histogram | Show distribution of data | plt.hist()
Scatter Plot | Show relation between 2 variables | plt.scatter()
Pie Chart | Show parts of a whole | plt.pie()
Box Plot | Show statistical summary | plt.boxplot()
Area Plot | Similar to line plot, with filled area | plt.fill_between()
Stack Plot | Cumulative data over time | plt.stackplot()
Heatmap (via Seaborn) | Show data intensity in grid | sns.heatmap()

1. Line Plot

• A line plot connects individual data points with straight lines.

• It is mostly used to show trends over time (e.g., sales over months).

• Good for continuous data where points are related.

• X-axis usually shows time or sequence; Y-axis shows values.

• It is the most basic and widely used type of plot.

Small Code Example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

y = [10, 20, 25, 30]

plt.plot(x, y)
plt.title("Line Plot")

plt.show()

2. Bar Plot

• A bar plot displays data with rectangular bars.

• Bar height represents the value; useful for categorical data.

• Can be vertical or horizontal bars.

• Shows comparisons between different groups.

• Great for discrete data like survey results or sales by category.

Small Code Example:

categories = ['A', 'B', 'C']

values = [5, 7, 3]

plt.bar(categories, values)

plt.title("Bar Plot")

plt.show()

3. Histogram

• A histogram shows the distribution of a dataset.

• It groups data into bins (intervals) and plots how many values fall into each bin.

• Useful for understanding the spread and shape of data.

• Helps identify patterns like normal distribution, skewness, etc.

• Different from bar plots — histograms are for continuous data.

Small Code Example:

data = [1, 2, 2, 3, 3, 3, 4, 5, 5]

plt.hist(data, bins=5)

plt.title("Histogram")
plt.show()

4. Scatter Plot

• A scatter plot shows the relationship between two variables.

• Each point represents one observation with x and y coordinates.

• Useful to see correlations, trends, or clusters.

• No lines connecting points — just dots.

• Often used in regression analysis and machine learning.

Small Code Example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

y = [2, 4, 1, 3, 5]

plt.scatter(x, y)

plt.title("Scatter Plot")

plt.show()

5. Pie Chart

• A pie chart shows parts of a whole as slices of a circle.

• Good for displaying percentage or proportional data.

• Each slice size is proportional to the quantity it represents.

• Useful when total = 100% and parts are clearly divided.

• Too many categories can make pie charts confusing.

Small Code Example:

sizes = [30, 40, 20, 10]

labels = ['A', 'B', 'C', 'D']

plt.pie(sizes, labels=labels, autopct='%1.1f%%')


plt.title("Pie Chart")

plt.show()

6. Box Plot (Box-and-Whisker Plot)

• A box plot summarizes a dataset's minimum, Q1, median, Q3, and maximum.

• Helps to visualize spread, central value, and outliers.

• The box shows the interquartile range (IQR).

• The line inside the box is the median.

• Useful in statistical analysis and comparing distributions.

Small Code Example:

data = [7, 15, 13, 18, 21, 25, 30, 10, 5]

plt.boxplot(data)

plt.title("Box Plot")

plt.show()
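The numbers behind a box plot can be checked by hand with NumPy. This sketch uses the same data as the example above and assumes the common convention, which is also Matplotlib's default, that points more than 1.5 × IQR beyond the quartiles count as outliers.

```python
import numpy as np

data = np.array([7, 15, 13, 18, 21, 25, 30, 10, 5])

# Quartiles: the box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range (height of the box)

# Points beyond these fences are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

For this data the quartiles are 10, 15, and 21, so the fences lie at -6.5 and 37.5 and no value is flagged as an outlier. The whiskers then reach the actual minimum (5) and maximum (30).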

7. Area Plot

• An area plot is like a line plot, but the area under the line is filled with color.

• Shows quantity over time or different groups together.

• Useful for stacked trends (e.g., total sales of products).

• Helps to see cumulative changes clearly.

• Each colored area can represent a category.

Small Code Example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

y = [3, 5, 2, 8]

plt.fill_between(x, y)
plt.title("Area Plot")

plt.show()

8. Stack Plot

• A stack plot is a special type of area plot.

• Multiple datasets are stacked on top of each other.

• Shows cumulative changes in multiple variables over time.

• Useful for composition over time (e.g., population, sales growth).

• Each layer represents a different category.

Small Code Example:

days = [1, 2, 3, 4]

apples = [3, 4, 5, 6]

bananas = [1, 2, 1, 2]

plt.stackplot(days, apples, bananas, labels=['Apples', 'Bananas'])

plt.legend()

plt.title("Stack Plot")

plt.show()

Histograms, Binning, and Density

Histograms, binning, and density plots are visualization techniques used to understand the distribution of data, especially how values are spread across a dataset.

Histogram

A Histogram is a type of graph that represents the distribution of a dataset.

• It divides the entire range of values into a series of intervals called bins.

• The height of each bar shows how many data points fall into each bin.

• Histograms are mainly used for continuous numerical data.


• They help us see patterns like skewness, modality (peaks), and spread of data.

• Example use: Visualizing test scores, age distribution, etc.

Simple code example:

import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 5, 5]

plt.hist(data, bins=5)

plt.title("Histogram Example")

plt.show()

Binning in Histograms

Binning refers to how we divide the data range into intervals (bins):

• Each bin covers a specific range of data.

• Choosing the number of bins is important:

o Too few bins → important details are hidden.

o Too many bins → graph becomes noisy and confusing.

• Common ways to choose bin size:

o Sturges’ rule

o Square root rule (bins ≈ √n, where n = number of data points)

o Freedman-Diaconis rule

Example showing bin change:

plt.hist(data, bins=3) # Only 3 bins

plt.title("Histogram with 3 Bins")

plt.show()

More bins = finer detail; fewer bins = more general view.
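The three bin-selection rules named above are available in NumPy by name, so their results can be compared directly. The sketch below uses simulated data: 1000 normally distributed values with an arbitrary mean and spread.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1000)  # simulated sample, n = 1000

bin_counts = {}
for rule in ("sturges", "sqrt", "fd"):  # fd = Freedman-Diaconis
    edges = np.histogram_bin_edges(data, bins=rule)
    bin_counts[rule] = len(edges) - 1  # n edges enclose n - 1 bins

for rule, count in bin_counts.items():
    print(f"{rule:8s} -> {count} bins")
```

plt.hist() accepts the same rule names, for example plt.hist(data, bins='fd'), so there is no need to compute the edges separately before plotting.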

Density Estimation (KDE)


• Kernel Density Estimation (KDE) is a smooth version of a histogram.

• Instead of sharp bars, it produces a continuous curve that estimates the probability density of the data.

• KDE smooths the distribution by placing a smooth "bump" (kernel) over each data
point.

• Useful when we want to understand the underlying distribution more clearly than
with histograms.

• KDE plots are especially good when the dataset is large.

Simple KDE Example (using Seaborn):

import seaborn as sns

sns.kdeplot(data)

plt.title("Density Estimation (KDE)")

plt.show()
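To make the "bump over each data point" idea concrete, a KDE can be computed by hand with NumPy. This is a simplified sketch, not how Seaborn does it internally. The function name gaussian_kde_manual and the bandwidth of 0.5 are assumptions chosen for illustration, whereas libraries normally pick the bandwidth automatically.

```python
import numpy as np

def gaussian_kde_manual(data, grid, bandwidth):
    """Average of Gaussian bumps, one centred on each data point."""
    data = np.asarray(data, dtype=float)
    # Distance of every grid point from every data point, in bandwidth units
    z = (grid[:, None] - data[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Sum the bumps and normalise so the curve integrates to 1
    return bumps.sum(axis=1) / (len(data) * bandwidth)

data = [1, 2, 2, 3, 3, 3, 4, 5, 5]
grid = np.linspace(-2, 8, 501)
density = gaussian_kde_manual(data, grid, bandwidth=0.5)

# Check: the area under a density curve should be close to 1
dx = grid[1] - grid[0]
area = density.sum() * dx
print("Area under curve:", round(area, 3))
print("Peak near x =", grid[density.argmax()])
```

A larger bandwidth gives wider bumps and a smoother curve, while a smaller one follows the data more closely. This is the same trade-off as choosing fewer or more histogram bins.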

What Is Error Visualization?


Error bars are a graphical enhancement that shows the variability of the plotted data on a Cartesian graph.

Error bars can be added to a graph to provide an additional layer of detail about the presented data. They indicate the estimated error or uncertainty of a measurement.

Error bars are drawn as lines that extend from the center of a plotted data point (or from the edge of a bar in a bar chart). The length of an error bar reveals the uncertainty of that data point, as shown in the graph below.

A short bar implies less error, whereas a long bar indicates more error or deviation.

Error visualization is the process of graphically representing the uncertainty or variability in data using error bars or shaded regions.

Why Visualize Errors?

• To show how accurate or uncertain your data points or model predictions are

• To show confidence intervals in measurements

• To make your graphs scientifically meaningful

• Useful in experiments, surveys, and model predictions

Common Ways to Visualize Errors:

1. Error Bars (plt.errorbar())

2. Shaded Regions (using fill_between())

3. Box plots (for statistical spread)

4. Confidence bands (for models, often with regression lines)

1. Using plt.errorbar()

plt.errorbar(x, y, yerr=errors, fmt='o')

Simple Example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

y = [2, 4, 6, 8, 10]

errors = [0.5, 0.4, 0.6, 0.3, 0.7]

plt.errorbar(x, y, yerr=errors, fmt='o', capsize=5, color='blue', ecolor='red')

plt.title("Error Bar Example")

plt.xlabel("X")
plt.ylabel("Y")

plt.grid(True)

plt.show()

2. Shaded Error Region with fill_between()

Good for showing ranges (confidence intervals or variability).

import numpy as np

x = np.linspace(0, 10, 100)

y = np.sin(x)

error = 0.2

plt.plot(x, y, label='Mean Value')

plt.fill_between(x, y - error, y + error, alpha=0.3, label='± Error')

plt.title("Shaded Error Region")

plt.legend()

plt.show()

3. Horizontal and Vertical Error Bars

You can also add x-direction error bars using xerr.

# Reuses x, y, and errors (five values each) from the plt.errorbar() example above

plt.errorbar(x, y, xerr=0.2, yerr=errors, fmt='o', ecolor='green', capsize=3)

4. Box Plots (for spread and variability)

Box plots show median, quartiles, and outliers — another form of visualizing error or
spread in data.

data = [[5, 7, 6, 9, 12], [8, 5, 6, 7, 10]]

plt.boxplot(data)

plt.title("Box Plot")

plt.show()
Visualizing Continuous Errors

Continuous error bands are a graphical representation of error or uncertainty as a shaded region around a main trace, rather than as discrete whisker-like error bars. The fill_between() function can be used to visualize continuous errors.
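In practice the width of the band usually comes from the data itself. The sketch below simulates 20 noisy repeats of the same measurement (made-up data with an arbitrary noise level) and shades one standard deviation around their mean.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)

# 20 simulated noisy repeats of a sine-shaped signal (rows = trials)
trials = np.sin(x) + rng.normal(scale=0.3, size=(20, x.size))

mean = trials.mean(axis=0)         # average across trials at each x
std = trials.std(axis=0, ddof=1)   # sample standard deviation at each x

fig, ax = plt.subplots()
ax.plot(x, mean, label="Mean of 20 trials")
ax.fill_between(x, mean - std, mean + std, alpha=0.3,
                label="± 1 standard deviation")
ax.set_title("Continuous Error Band")
ax.legend()

print(mean.shape, std.shape)
```

Unlike the constant error of 0.2 used earlier, the band here varies in width along x, because the spread is estimated separately at every point.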

Contour Plots

Contour plots, also called level plots, are a tool for multivariate analysis that show a 3-D surface in 2-D space. If X and Y are the two input variables, the response Z is drawn as slices on the X-Y plane: each contour line joins points with the same Z value. For this reason contours are sometimes referred to as Z-slices or iso-response lines.

import numpy as np

import matplotlib.pyplot as plt

feature_x = np.linspace(-5.0, 3.0, 70)

feature_y = np.linspace(-5.0, 3.0, 70)

X, Y = np.meshgrid(feature_x, feature_y)

Z = X**2 + Y**2

plt.contourf(X, Y, Z, cmap='viridis') # 'viridis' colormap

plt.colorbar()

plt.title('Filled Contour Plot of Z = X^2 + Y^2')

plt.xlabel('X-axis')

plt.ylabel('Y-axis')

plt.show()
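The filled version above shows Z through color. Labeled contour lines show the exact levels instead. This sketch uses the same function with a symmetric grid. The chosen levels (1, 4, 9, 16) are an illustrative assumption, picked so that each line is a circle of radius 1, 2, 3, or 4.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, x)  # 2-D coordinate grids
Z = X**2 + Y**2           # height of the surface at each grid point

fig, ax = plt.subplots()

# Draw one line for each requested Z value
cs = ax.contour(X, Y, Z, levels=[1, 4, 9, 16], colors="black")

# Write the Z value directly on each contour line
ax.clabel(cs, inline=True, fontsize=8)

ax.set_aspect("equal")  # so the circular contours are not stretched
ax.set_title("Contour Lines of Z = X^2 + Y^2")

print(list(cs.levels))
```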

Customizing

1. Customizing Plot Legends


What is a Plot Legend?

• A legend is a small box inside the plot that explains what different colors,
markers, or lines represent.

• It is very important when you have multiple plots in a single graph.

• By default, legends are placed automatically, but customization allows you to:

o Change position (top, bottom, left, right)

o Add title to the legend

o Adjust font size, background color, and border.

• The plt.legend() function in Matplotlib is used for legends.

Common Customizations

Feature Method

Position loc='upper right', 'lower left', etc.

Title title='Legend Title'

Font Size fontsize=10

Background color facecolor='lightgray'

Border (on/off and color) frameon=True, edgecolor='black'

Simple Code Example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

y1 = [2, 3, 4, 5]

y2 = [3, 4, 2, 1]

plt.plot(x, y1, label="Line 1")

plt.plot(x, y2, label="Line 2")


plt.legend(loc='lower right', title='My Legend', fontsize=10, frameon=True,
facecolor='lightyellow', edgecolor='black')

plt.title("Customized Legend Example")

plt.show()

2. Customizing Colorbars (5 Marks)

What is a Colorbar?

• A colorbar is a visual representation of color mapping used in plots like:

o Heatmaps

o Contour plots

o Images (e.g., pixel intensities)

• It helps the reader understand what different colors mean (e.g., high value = dark
red, low value = light blue).

Common Customizations

Feature Method

Colorbar Label colorbar.set_label('label name')

Orientation orientation='horizontal'

Tick Fontsize colorbar.ax.tick_params(labelsize=10)

Shrink Size plt.colorbar(mappable, shrink=0.8)

Colorbar Ticks colorbar.set_ticks([list_of_ticks])

Simple Code Example:

import matplotlib.pyplot as plt


import numpy as np

data = np.random.rand(5, 5)

img = plt.imshow(data, cmap='viridis')

colorbar = plt.colorbar(img, shrink=0.8, orientation='vertical')

colorbar.set_label('Intensity')

colorbar.ax.tick_params(labelsize=8)

plt.title("Customized Colorbar Example")

plt.show()
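The table above also lists orientation and set_ticks(), which the example does not use. A small sketch of both, assuming random data in the range 0 to 1 and illustrative variable names:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = rng.random((5, 5))               # values in [0, 1)

fig, ax = plt.subplots()
img = ax.imshow(data, cmap='viridis', vmin=0, vmax=1)

# Horizontal colorbar placed under the image, with fixed tick positions
cbar = fig.colorbar(img, ax=ax, orientation='horizontal', shrink=0.8)
cbar.set_label('Intensity')
cbar.set_ticks([0, 0.25, 0.5, 0.75, 1])
cbar.ax.tick_params(labelsize=8)
ax.set_title("Horizontal Colorbar with Fixed Ticks")
# plt.show()  # uncomment to display the figure
```

Fixing vmin and vmax keeps the color scale the same from one run to the next, so the chosen ticks always fall inside the bar.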

3. Text and Annotation in Matplotlib (7 Marks)

Theory

• Text is used to add static words like titles, labels, or general comments inside a
plot.

• Annotations are special texts connected to a specific data point (for highlighting
something important).

• You can control font size, color, rotation, alignment, etc.

• Matplotlib functions used:

o plt.text(x, y, "text") — Add text at position (x, y).

o plt.annotate() — Add an annotation, often with arrows pointing to a point.


Common Parameters:

Parameter Meaning

fontsize Size of text

color Color of text

ha, va Horizontal, Vertical alignment

arrowprops To add arrow in annotation

Small Code Example:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [4, 5, 6])

# Add simple text

plt.text(2, 5, "Midpoint", fontsize=12, color='blue')

# Add annotation with arrow

plt.annotate("Peak", xy=(3,6), xytext=(2.5,6.5),

arrowprops=dict(facecolor='red', shrink=0.05))

plt.title("Text and Annotation Example")

plt.show()
4. Transform and Text Positions (7 Marks)
Theory

• Transforms control where text or annotations appear, based on either data
coordinates or figure coordinates.

• Data coordinate → based on the axes scale (e.g., x=2, y=5).

• Figure coordinate → relative to figure size (0 to 1 range).

• Useful for placing text at fixed positions regardless of data.

Types of Transforms:

Transform Meaning

ax.transData Data points (default)

fig.transFigure Figure (0,0) to (1,1)

Small Code Example:

fig, ax = plt.subplots()

ax.plot([1, 2, 3], [4, 5, 6])

# Add text in data coordinates

ax.text(2, 5, "Data Text", transform=ax.transData)

# Add text in figure coordinates

fig.text(0.5, 0.9, "Figure Text", ha='center', fontsize=12)

plt.show()
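Besides the two transforms in the table, Matplotlib also provides ax.transAxes, which is not covered above. It measures positions relative to the plotting area (0 to 1 in each direction). A minimal sketch, with an illustrative label text:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])

# Axes coordinates: (0, 0) is the bottom-left corner of the plotting area,
# (1, 1) the top-right, regardless of the data limits.
note = ax.text(0.05, 0.95, "Top-left note", transform=ax.transAxes,
               ha='left', va='top')

ax.set_xlim(0, 100)   # changing the data range does not move the note
# plt.show()  # uncomment to display the figure
```

This is handy for labels such as "(a)" or a short remark that should stay in a corner of the axes even when the data range changes.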

5. Customizing Ticks (7 Marks)


Theory

• Ticks are the small marks on x-axis and y-axis showing values.

• You can customize:


o Location of ticks

o Labels of ticks

o Font size, rotation, color

• Useful to make plots cleaner or match a specific style.

Main Functions:

Function Purpose

set_xticks() Set custom tick locations (x-axis)

set_yticks() Set custom tick locations (y-axis)

set_xticklabels() Set custom labels

tick_params() Customize appearance (size, color, direction)

Small Code Example:

fig, ax = plt.subplots()

ax.plot([1, 2, 3], [4, 5, 6])

# Set custom ticks

ax.set_xticks([1, 2, 3])

ax.set_xticklabels(['One', 'Two', 'Three'])

# Customize tick appearance

ax.tick_params(axis='x', rotation=45, labelsize=12, colors='red')

plt.show()

6. Customizing Plots (7 Marks)


Theory

• Customization means changing the look and feel of the entire plot.

• You can customize:


o Line style (dotted, dashed)

o Line color (blue, red)

o Markers (dots, squares)

o Background color

o Grid style

• Customization helps highlight important patterns and improve readability.

Customization Options:

Feature Example Code

Line color color='green'

Line style linestyle='--'

Marker type marker='o'

Background color fig.patch.set_facecolor('lightgray')

Grid display plt.grid(True, linestyle='--')

Small Code Example:

fig, ax = plt.subplots()

# Customize line color, style, and marker

ax.plot([1, 2, 3], [4, 5, 6], color='purple', linestyle='--', marker='o')

# Change background color

fig.patch.set_facecolor('lightyellow')

# Add grid

plt.grid(True, linestyle=':', color='blue')

plt.title("Customized Plot")

plt.show()
7. Plot: Adjust Line Colors and Styles
Customizing your plot makes it:

• Easier to understand

• More visually appealing

• Helpful when plotting multiple lines to differentiate

Basic Line Plot Syntax

plt.plot(x, y, style)

Where style can include:

• Color

• Line style

• Marker style

Line Color Options

You can specify the color in multiple ways:

Color Name Code Example

Blue 'b' plt.plot(x, y, 'b')

Green 'g' plt.plot(x, y, 'g')

Red 'r' plt.plot(x, y, 'r')

Black 'k' plt.plot(x, y, 'k')

Custom (hex RGB) '#FF5733' plt.plot(x, y, color='#FF5733')

Line Style Options

Line Style Code Example

Solid Line '-' plt.plot(x, y, '-')

Dashed Line '--' plt.plot(x, y, '--')

Dotted Line ':' plt.plot(x, y, ':')


Dash-dot Line '-.' plt.plot(x, y, '-.')

Marker Style Options

Marker Code Description

'o' Circle marker

's' Square marker

'D' Diamond marker

'^' Triangle marker

'*' Star marker

Simple Example with Colors and Styles

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

y1 = [2, 4, 6, 8, 10]

y2 = [1, 2, 1, 2, 1]

plt.plot(x, y1, color='blue', linestyle='--', marker='o', label='Line 1')

plt.plot(x, y2, color='red', linestyle='-', marker='s', label='Line 2')

plt.title("Custom Line Styles and Colors")

plt.xlabel("X-Axis")

plt.ylabel("Y-Axis")

plt.legend()

plt.grid(True)

plt.show()
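The color, marker, and line style codes from the tables above can also be combined into one short format string, as in the plt.plot(x, y, style) syntax. A small sketch with made-up data:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()
# 'g^:' = green ('g'), triangle markers ('^'), dotted line (':')
(line,) = ax.plot(x, y, 'g^:', label='y = x^2')
ax.legend()

# The format string sets the same properties as the keyword arguments
print(line.get_color(), line.get_marker(), line.get_linestyle())
# plt.show()  # uncomment to display the figure
```

The format string is quick to type; the keyword form (color=, marker=, linestyle=) is easier to read and allows colors such as hex values that have no one-letter code.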

8. Plot: Adjust Axes Limits


By default, Matplotlib auto-scales the x and y axes based on your data. But sometimes you
want to manually control the axis range to:

• Focus on a specific part of the data

• Ensure consistency between multiple plots

• Zoom into trends or outliers

• Improve plot readability

How to Set Axes Limits

You can set limits using:

plt.xlim() and plt.ylim()

Set the minimum and maximum values for x and y axes.

ax.set_xlim() and ax.set_ylim()

Used in the object-oriented (OO) style with subplots.

Syntax

plt.xlim(min, max)

plt.ylim(min, max)

or:

ax.set_xlim([min, max])

ax.set_ylim([min, max])

Simple Example: Setting Axis Limits

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

y = [10, 20, 25, 30, 35]

plt.plot(x, y, 'bo--')

plt.xlim(2, 4) # Focus only on x from 2 to 4

plt.ylim(15, 30) # Focus only on y from 15 to 30

plt.title("Axes Limits Example")


plt.xlabel("X-Axis")

plt.ylabel("Y-Axis")

plt.grid(True)

plt.show()
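The same limits in the object-oriented style, as a short sketch using the data from the example above. It also checks that setting limits only changes the visible window and does not delete any points.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]

fig, ax = plt.subplots()
(line,) = ax.plot(x, y, 'bo--')
ax.set_xlim(2, 4)      # visible x-range
ax.set_ylim(15, 30)    # visible y-range

print(ax.get_xlim(), ax.get_ylim())   # the limits currently in effect
print(len(line.get_xdata()))          # all 5 points are still stored in the line
# plt.show()  # uncomment to display the figure
```

Points outside the limits are simply not drawn, so widening the limits later brings them back.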

9. Customizing Axis

Multiple Subplots using subplot() in Matplotlib (for 10 Marks)

What are Subplots?

• Subplots mean creating multiple small plots inside a single figure (one window).

• Instead of drawing only one graph, you can display several related graphs
together.
• This is useful for comparing data, showing trends, or summarizing multiple
results at once.

• Each small plot has its own axes, labels, and title.

• Helps in efficient use of space and makes visual comparisons easier.

The subplot() Function

Matplotlib provides a simple function called plt.subplot() to create subplots easily.

Basic Syntax:

plt.subplot(nrows, ncols, index)

Parameter Meaning

nrows Number of rows in the grid

ncols Number of columns in the grid

index Position of the current plot (starts from 1)

Key Points:

• The figure is divided into a grid of rows × columns.

• Index tells where to place the plot in the grid (counted left to right, top to bottom).

• Each subplot() call creates one subplot at the specified location.

Simple Examples

Example 1: 2 Plots (Side-by-Side)

import matplotlib.pyplot as plt

# 1st plot

plt.subplot(1, 2, 1)

plt.plot([1, 2, 3], [4, 5, 6])

plt.title("Plot 1")
# 2nd plot

plt.subplot(1, 2, 2)

plt.plot([1, 2, 3], [6, 5, 4])

plt.title("Plot 2")

plt.show()

• Here, 1 row, 2 columns, plot 1 and plot 2.

Example 2: 4 Plots (2x2 Grid)

plt.subplot(2, 2, 1)

plt.plot([1, 2, 3], [3, 2, 1])

plt.title("Plot 1")

plt.subplot(2, 2, 2)

plt.plot([1, 2, 3], [1, 2, 3])

plt.title("Plot 2")

plt.subplot(2, 2, 3)

plt.plot([1, 2, 3], [2, 3, 1])

plt.title("Plot 3")

plt.subplot(2, 2, 4)

plt.plot([1, 2, 3], [3, 1, 2])

plt.title("Plot 4")

plt.tight_layout() # Adjusts layout to avoid overlap

plt.show()

• Here, 2 rows, 2 columns, plots 1 to 4.

Important Tips

• Starting index is always 1, not 0.


• Use plt.tight_layout() to automatically adjust spaces between plots.

• You can mix different kinds of plots inside subplots (like a line plot + bar plot
together).

• Subplots share the same figure window but different axes.

• Good practice: keep axes labeled for each subplot.

Difference Between subplot() and subplots()

• subplot() → For simple or few plots.

• subplots() → For complex layouts, it returns figure and axes objects for advanced
control.

Example using subplots():

fig, axes = plt.subplots(2, 2)

axes[0, 0].plot([1, 2, 3], [4, 5, 6])

axes[1, 1].plot([1, 2, 3], [6, 5, 4])

plt.show()
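With subplots(), the returned axes array can be filled in a loop instead of writing one block per plot. A minimal sketch that redraws the 2x2 grid from Example 2; the dictionary name series is illustrative.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]
series = {"Plot 1": [3, 2, 1], "Plot 2": [1, 2, 3],
          "Plot 3": [2, 3, 1], "Plot 4": [3, 1, 2]}

# sharex / sharey give all four plots the same axis ranges
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)

# axes is a 2x2 NumPy array; axes.flat visits it left to right, top to bottom
for ax, (title, y) in zip(axes.flat, series.items()):
    ax.plot(x, y, marker='o')
    ax.set_title(title)

fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

Note that axes[row, col] is indexed from 0, unlike the index argument of plt.subplot(), which starts from 1.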

Seaborn
Seaborn is a Python data visualization library based on Matplotlib. It offers a high-level
interface to produce statistical graphics with ease and beauty.

Seaborn operates on the principle of mapping data variables to visual properties (like
position, color, or size) to reveal patterns and relationships.

It emphasizes statistical plotting, meaning it often includes built-in functionality to
compute and display statistical measures such as means, medians, or distributions
directly within the visualization.

Seaborn operates by using the structure of datasets (especially pandas DataFrames) to
map relationships between variables directly into plots.

It was developed to make complex visualization simple, especially when working with
Pandas DataFrames.
Features of Seaborn:

Feature Explanation

High-level API Short code for complex plots

Beautiful defaults Good-looking plots without extra effort

Works with Pandas Easily plots DataFrame data

Statistical analysis Supports regression, distribution, categorization

Thematic styling Built-in themes and color palettes

Why Use Seaborn Over Matplotlib?

Feature Matplotlib Seaborn

Code simplicity More verbose Cleaner and shorter

Statistical support Manual Built-in

Themes and style Basic Beautiful by default

Integration with data Needs more work Native with DataFrame

In short: Seaborn is built on top of Matplotlib but focuses on statistics, dataframes, and
aesthetics.

Real-Life Use Cases

• Analyzing sales data over time

• Visualizing customer behavior

• Understanding exam scores distribution

• Displaying gender-based spending

• Visualizing correlations between multiple variables

Easy syntax: Simplifies complex plots into few lines of code.


Built-in themes and color palettes: Improves visual quality without manual styling.

Automatic statistical aggregation: Makes it easy to summarize and display large data.

Flexible customizations: Allows fine control over every element of the plot if needed.

Integration with pandas: Direct plotting from structured datasets.

Basic Seaborn Syntax & Workflow

1. Import libraries

2. Load or prepare dataset

3. Choose the Seaborn function for the desired chart

4. Call the function with required arguments

5. Customize with labels, title, and style

6. Display the plot

Advantages of Seaborn

1. Concise syntax

2. Automatic handling of DataFrame

3. In-built datasets for practice

4. Statistical power built-in

5. Beautiful, professional-looking graphs

6. Excellent for quick exploratory data analysis

Seaborn Packages
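Seaborn is organized into internal modules based on the type of visualization or
data analysis you need to perform.

Each module targets specific data relationships and offers specialized plotting
functions. In practice these functions are called from the top-level namespace (for
example sns.scatterplot()), so the modules rarely need to be imported directly.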

Organized structure: Different packages for different visualization needs.


Automatic statistical support: Adds things like regression lines, confidence intervals.

Stylish default themes: Makes plots attractive without much effort.

3. Organization of Seaborn

Seaborn organizes its functionalities into packages based on:

Purpose Package Name

Relationship visualization seaborn.relational

Categorical data visualization seaborn.categorical

Distribution visualization seaborn.distributions

Regression analysis seaborn.regression

Matrix (grid) visualization seaborn.matrix

Plot styling and themes seaborn.rcmod

Each package contains specialized functions for targeted visualizations.

4.1 seaborn.relational

• Focus: Relationships between numerical variables (e.g., scatterplots, lineplots).

• Helps in finding correlations, clusters, or trends between features.

Functions:

• scatterplot()

• lineplot()

Small Example:

sns.scatterplot(x=[1, 2, 3], y=[4, 5, 6])

plt.show()

4.2 seaborn.categorical

• Focus: Comparisons among categories (e.g., barplots, boxplots, violinplots).


• Useful when data involves grouping or classifications.

Functions:

• barplot()

• boxplot()

• violinplot()

Small Example:

sns.barplot(x=["Cat1", "Cat2", "Cat3"], y=[7, 8, 5])

plt.show()

4.3 seaborn.distributions

• Focus: Understanding distribution of a single variable (e.g., histograms, KDE
plots).

• Helps detect outliers, skewness, modes, and spread.

Functions:

• histplot()

• kdeplot()

Small Example:

sns.histplot([2, 3, 4, 4, 5, 5, 6, 7])

plt.show()

4.4 seaborn.regression

• Focus: Predictive relationships and trend lines (e.g., regplot, lmplot).

• Adds linear regression lines or confidence intervals automatically.

Functions:

• regplot()

• lmplot()
Small Example:

sns.regplot(x=[1, 2, 3, 4], y=[2, 4, 5, 7])

plt.show()

4.5 seaborn.matrix

• Focus: Matrix visualizations like heatmaps and cluster maps.

• Useful for correlation analysis or similarity detection.

Functions:

• heatmap()

• clustermap()

Small Example:

import numpy as np

data = np.array([[1, 2], [3, 4]])

sns.heatmap(data, annot=True)

plt.show()

4.6 seaborn.rcmod (themes and styling)

• Focus: Styling and formatting plots (e.g., grid style, font scale, background color).

• Important for making graphs publication-ready.

Functions:

• set_style()

• set_context()

• set_palette()

Small Example:

sns.set_style("whitegrid")
sns.lineplot(x=[1, 2, 3], y=[3, 2, 5])

plt.show()

Using Seaborn Along With Matplotlib


Why Combine Seaborn and Matplotlib?

Seaborn makes it easier to plot beautiful and statistical graphs, while Matplotlib gives
full control over plot elements like:

• Titles and labels

• Ticks and axis limits

• Fonts and colors

• Subplots and figure size

So, using both together helps you create professional, customized, and flexible plots.

Real-Life Example: Seaborn + Matplotlib

Let’s plot a simple scatterplot with Seaborn and customize it using Matplotlib functions.

import seaborn as sns

import matplotlib.pyplot as plt

# Load sample dataset

data = sns.load_dataset("tips")

# Set Seaborn theme

sns.set_theme(style="whitegrid")

# Plot with Seaborn

sns.scatterplot(x="total_bill", y="tip", hue="day", data=data)

# Now customize using Matplotlib

plt.title("Total Bill vs Tip by Day")

plt.xlabel("Total Bill (in $)")


plt.ylabel("Tip (in $)")

plt.xlim(0, 60) # Set x-axis limits

plt.ylim(0, 12) # Set y-axis limits

plt.legend(title="Day of Week")

plt.grid(True)

# Show plot

plt.show()
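Seaborn's axes-level functions such as scatterplot() and boxplot() also accept an ax argument, so they can be drawn into Matplotlib subplots. A minimal sketch, assuming a small made-up DataFrame instead of the tips dataset so that nothing needs to be downloaded:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small made-up dataset (hypothetical values, for illustration only)
df = pd.DataFrame({
    "total_bill": [10.5, 20.0, 15.2, 30.1, 25.4, 18.3],
    "tip":        [1.5, 3.0, 2.0, 5.0, 4.0, 2.5],
    "day":        ["Sat", "Sat", "Sun", "Sun", "Sat", "Sun"],
})

# Matplotlib creates the layout, Seaborn draws into each axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.scatterplot(data=df, x="total_bill", y="tip", hue="day", ax=ax1)
sns.boxplot(data=df, x="day", y="total_bill", ax=ax2)

# Matplotlib methods then customize each subplot separately
ax1.set_title("Bill vs Tip")
ax2.set_title("Bill by Day")
fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

Figure-level functions such as pairplot() and jointplot() create their own figure and do not take an ax argument.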

Histograms, KDE, and Density Plots


These are tools to visualize how data is distributed, especially for continuous numerical
data.

1. Histogram

A histogram shows the frequency (count) of data values within specified intervals called
bins.

• Helps in understanding how values are spread (e.g., exam scores, salaries)

• Tall bars = many values in that range
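What a histogram counts can be sketched with NumPy alone. The exam scores and bin edges below are made up for illustration.

```python
import numpy as np

# Hypothetical exam scores
scores = np.array([35, 42, 48, 51, 55, 58, 61, 63, 67, 72, 75, 88])

# Four bins: 30-45, 45-60, 60-75, 75-90
counts, edges = np.histogram(scores, bins=[30, 45, 60, 75, 90])

# Print how many scores fall into each bin
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {n}")
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. The heights of the bars in a histogram plot are exactly these counts.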

2. KDE (Kernel Density Estimation)

KDE is a smoothed version of the histogram that estimates the probability density
function of the data.

• Gives a smooth curve instead of bars

• Better for understanding continuous distribution trends

• Represents "how likely" a value is
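The smoothing behind a KDE can be sketched by hand: place one Gaussian bump on every data point and average them. The function name gaussian_kde_1d and the bandwidth of 0.8 are assumptions made for this illustration. Seaborn chooses its own bandwidth, so its curve will not match this one exactly.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    # One standardized distance per (grid point, sample) pair
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (samples.size * bandwidth)

samples = [2, 3, 4, 4, 5, 5, 6, 7]
grid = np.linspace(-3, 12, 601)
density = gaussian_kde_1d(samples, grid, bandwidth=0.8)

# A density curve encloses a total area of (approximately) 1
area = (density * (grid[1] - grid[0])).sum()
print(round(area, 3))
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows the individual data points more closely.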

Seaborn Functions for These

Plot Type Function Description

Histogram sns.histplot() Bar-based frequency visualization


KDE sns.kdeplot() Smooth probability distribution curve

Both sns.histplot(..., kde=True) Combine histogram and KDE

Very Simple Code Examples

Histogram Only:

import seaborn as sns

import matplotlib.pyplot as plt

data = sns.load_dataset("tips")

sns.histplot(data["total_bill"], bins=20, color="skyblue")

plt.title("Histogram of Total Bill")

plt.xlabel("Total Bill")

plt.ylabel("Frequency")

plt.show()

KDE Only:

sns.kdeplot(data["total_bill"], fill=True, color="green")  # fill= replaces the deprecated shade=

plt.title("KDE Plot of Total Bill")

plt.xlabel("Total Bill")

plt.ylabel("Density")

plt.show()

Histogram + KDE Together:

sns.histplot(data["total_bill"], kde=True, color="purple", bins=20)

plt.title("Histogram + KDE of Total Bill")


plt.xlabel("Total Bill")

plt.ylabel("Frequency / Density")

plt.show()

Uses:

• To identify skewness, peaks, and spread

• To understand normality of data

• To compare distributions of different datasets

Detailed Seaborn Plot Explanations + Code (Iris & Tips Dataset)

1. Line Plot

• Shows trend or pattern over continuous variables like time, measurements, etc.

• Good for time series, progressions, or comparisons.

A line plot displays information as a series of data points connected by straight lines.
It is commonly used to visualize data trends over time or continuous variables.
In Seaborn, lineplot() can automatically handle aggregation and error bands.
It is ideal for analyzing relationships and detecting patterns or cycles.
Line plots are highly readable and great for time-series analysis.

Code (Iris dataset):

import seaborn as sns

import matplotlib.pyplot as plt

iris = sns.load_dataset('iris')

sns.lineplot(x='sepal_length', y='sepal_width', data=iris)

plt.show()

2. Scatter Plot
• Displays relationship between two continuous variables.

• Good for finding correlations, clusters, or outliers.

A scatter plot shows the relationship between two continuous variables.
Each point represents an observation with x and y values.
It is useful for identifying correlations, clusters, and outliers.
Seaborn's scatterplot() supports grouping using colors or styles.
It is a key tool in exploring relationships between variables.

Code (Iris dataset):

sns.scatterplot(x='petal_length', y='petal_width', hue='species', data=iris)

plt.show()

3. Box Plot

• Shows distribution of data based on five summary statistics: minimum, Q1,
median, Q3, and maximum.

• Detects outliers and variation between categories.

A box plot visualizes the distribution of data using five summary statistics: min, Q1,
median, Q3, and max.
It highlights the presence of outliers and the spread of data.
Seaborn's boxplot() function makes it easy to compare distributions across categories.
The box shape shows the interquartile range where 50% of data lies.
Box plots are excellent for comparing variations between groups.

Code (Tips dataset):

tips = sns.load_dataset('tips')

sns.boxplot(x='day', y='total_bill', data=tips)

plt.show()
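The summary statistics behind a box plot can be computed directly with NumPy. This sketch uses made-up numbers and the common 1.5 x IQR rule, which is also the default whisker rule in Matplotlib and Seaborn: whiskers stop at the last data point inside the fences, and anything beyond is drawn as an outlier.

```python
import numpy as np

data = np.array([5, 6, 7, 9, 12, 30])   # hypothetical values, 30 is unusually large

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                            # interquartile range (height of the box)

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3, iqr)
print(outliers)
```

So when outliers are present, the whisker ends are not the true minimum and maximum of the data.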

4. Violin Plot

• Combines box plot and KDE (kernel density estimate).


• Shows distribution shape + summary statistics together.

A violin plot combines a box plot and a kernel density plot.
It shows both the distribution shape and summary statistics of the data.
Seaborn's violinplot() allows splitting violins by categories easily.
The width of the violin represents the data density at different values.
It is more informative than boxplots when the distribution is multimodal.

Code (Tips dataset):

sns.violinplot(x='day', y='total_bill', data=tips)

plt.show()

5. Heatmap

• Displays matrix data using colors.

• Commonly used for correlations or feature similarities.

A heatmap displays matrix data as a color-coded grid.
It is commonly used to visualize correlations, tables, or 2D data distributions.
Seaborn's heatmap() function supports annotations and color gradients.
Color intensity represents the magnitude of the values.
Heatmaps make it easy to spot patterns, clusters, or relationships.

Code (Tips dataset):

corr = tips.corr(numeric_only=True)  # skip the text/category columns (sex, day, ...)

sns.heatmap(corr, cmap='coolwarm')

plt.show()

6. Pair Plot

• Plots scatterplots for all feature pairs and histograms for individual features.

• Excellent for EDA (Exploratory Data Analysis).

A pair plot creates scatterplots between all pairs of variables and histograms on the
diagonal.
It provides a quick overview of how variables relate to each other.
Seaborn's pairplot() can color-code points by category labels.
It is useful in exploratory data analysis (EDA) for small datasets.
Pair plots reveal hidden trends, clusters, and relationships visually.

Code (Iris dataset):

sns.pairplot(iris, hue='species')

plt.show()

7. Count Plot

• Counts occurrences of categorical variables.

• Similar to a bar chart, but specifically shows frequencies.

A count plot shows the number of observations for each category.
It is a bar plot where height represents the count of data points.
Seaborn’s countplot() simplifies categorical data visualization.
It helps understand the balance or imbalance in the dataset.
Count plots are best for visualizing frequency distributions.

Code (Tips dataset):

sns.countplot(x='day', data=tips)

plt.show()

8. Displot (Distribution Plot)

• Shows histogram of a variable with an optional KDE overlay.

• Helps understand distribution, skewness, and spread.

A displot shows the distribution of a single variable.
It combines histogram and optional KDE (density curve) in one figure.
Seaborn's displot() supports faceting across multiple subsets.
It is useful for checking skewness, modality, and spread.
Displots are helpful for understanding how a variable behaves.
Code (Tips dataset):

sns.displot(tips['total_bill'], kde=True)

plt.show()

9. Joint Plot

• Combines scatterplot + histograms for two variables.

• Shows relationship + distributions in a single view.

A joint plot combines scatterplots and histograms in a single figure.
It shows both the bivariate relationship and univariate distributions.
Seaborn’s jointplot() can display regression lines, KDEs, or hex bins.
It is highly effective for studying two-variable relationships.
Joint plots offer detailed insights into data structures.

Code (Tips dataset):

sns.jointplot(x='total_bill', y='tip', data=tips, kind='scatter')

plt.show()

10. Faceted Histogram (FacetGrid)

• Creates multiple histograms split by a categorical variable.

• Useful to compare distributions across groups.

Faceted histograms show multiple histograms divided by categories.
Seaborn’s FacetGrid allows splitting a plot into subplots based on a variable.
Each facet shows a distribution for a specific subgroup.
It is useful for comparing how distributions differ across groups.
Faceted plots make multivariate data easier to interpret.

Code (Tips dataset):

g = sns.FacetGrid(tips, col="sex")

g.map(plt.hist, "total_bill")

plt.show()
11. Bar Plot

• Represents summary statistics (mean by default) for each category.

• Shows comparisons between different groups.

A bar plot shows summary statistics (mean by default) for each category.
Each bar's height indicates a value, often with confidence intervals.
Seaborn’s barplot() computes aggregations automatically.
It is commonly used to compare averages between groups.
Bar plots provide a clear and simple way to visualize group comparisons.

Code (Tips dataset):

sns.barplot(x='day', y='total_bill', data=tips)

plt.show()

3D Graphs in Python Using Matplotlib


Data visualization plays a critical role in understanding and interpreting data, especially
when dealing with multiple variables.

While 2D graphs (like line plots and bar charts) are widely used, they are limited to
visualizing relationships between two variables.

3D plots, on the other hand, allow us to graph data that involves three dimensions —
commonly referred to as X, Y, and Z axes.

Creating 3D graphs enables us to:

• Visualize mathematical functions in 3D space.

• Represent data that naturally exists in three dimensions (e.g., terrain maps, physical
simulations).

• Observe how two input variables affect an output (useful in machine learning,
physics, finance, etc.).

Matplotlib and the mplot3d Toolkit


Python provides the matplotlib library for plotting, and within it, the mplot3d toolkit is used
to generate 3D plots.

from mpl_toolkits.mplot3d import Axes3D # Registers the 3D projection (only required for Matplotlib older than 3.2)

When creating a 3D plot, you specify:

• A figure using plt.figure()

• A 3D subplot using fig.add_subplot(..., projection='3d')

• Then plot using functions like plot3D(), scatter3D(), or plot_surface().

Types of 3D Plots

Plot Type Description

Line Plot Draws a line in 3D using X, Y, and Z points

Scatter Plot Plots individual points in 3D space

Surface Plot Shows a 3D surface defined by a function of X and Y

Wireframe Plot Similar to surface plot but uses wire-like grid structure

Simple 3D Plot Example

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

import numpy as np

# Create data

z = np.linspace(0, 10, 100)

x = np.sin(z)

y = np.cos(z)

# Create figure and 3D axis

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Plot the 3D line

ax.plot3D(x, y, z, color='green')

# Show plot

plt.title("Simple 3D Line Plot")

plt.show()

Why Use 3D Graphs?

• More Insight: Helps to observe complex relationships.

• Intuitive Visuals: Makes it easier to interpret functions or datasets with 3 variables.

• Scientific and Mathematical Applications: Ideal for simulations, surface
modeling, or optimization problems.

2⃣ 3D Scatter Plot

Used to display individual data points in 3D. Great for showing distributions or clusters.

x = np.random.rand(50)

y = np.random.rand(50)

z = np.random.rand(50)

fig = plt.figure()

ax = fig.add_subplot(111, projection='3d')

ax.scatter3D(x, y, z, color='red')

plt.title("3D Scatter Plot")

plt.show()

3⃣ 3D Surface Plot

Best for visualizing functions of two variables (z = f(x, y)). It shows the surface behavior.

x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)

x, y = np.meshgrid(x, y)

z = np.sin(np.sqrt(x**2 + y**2))

fig = plt.figure()

ax = fig.add_subplot(111, projection='3d')

ax.plot_surface(x, y, z, cmap='viridis')

plt.title("3D Surface Plot")

plt.show()

4⃣ 3D Wireframe Plot

Similar to surface plots but only shows the edges. Useful for a mesh/grid visualization of
data.

fig = plt.figure()

ax = fig.add_subplot(111, projection='3d')

ax.plot_wireframe(x, y, z, color='black')

plt.title("3D Wireframe Plot")

plt.show()

5⃣ 3D Contour Plot

Used to represent 3D surfaces with contour lines — like elevation maps.

fig = plt.figure()

ax = fig.add_subplot(111, projection='3d')

ax.contour3D(x, y, z, 50, cmap='coolwarm')

plt.title("3D Contour Plot")

plt.show()

Time Series Analysis with Pandas – Step-by-Step


What is Time Series Analysis?

Time Series Analysis is the process of examining data points collected or recorded over
time to extract meaningful insights. It is used in stock prices, weather patterns, sales
forecasting, and many other fields.

In Python, Pandas makes it very easy to handle time-indexed data due to its built-in
support for date-time indexing, resampling, rolling statistics, and data visualization.

Time Series Analysis is the process of studying datasets that are collected or recorded at
successive points in time, usually at equally spaced intervals (like daily, monthly, yearly).
The goal is to identify patterns (like trends, seasonality, and cycles), understand the
underlying structure of the data, and often to make forecasts about future values.

Unlike regular data analysis, time series analysis explicitly accounts for the temporal order
of the data, recognizing that past values can influence future values.

1⃣ Import Libraries and Load Data

You start by importing necessary libraries like pandas, and often matplotlib for plotting.
Then, you load the time-based dataset.

import pandas as pd

import matplotlib.pyplot as plt

2⃣ Parse Dates and Set Date Index (creating a time series dataset)

Make sure your data has a datetime column. If the dates are stored as strings, convert
them with pd.to_datetime(), then set that column as the index of the DataFrame. A
datetime index is what enables time-based operations such as slicing and resampling.

# Sample data

dates = pd.date_range(start='2024-01-01', periods=6, freq='D')

values = [100, 102, 101, 98, 105, 110]

df = pd.DataFrame({'Date': dates, 'Value': values})

df.set_index('Date', inplace=True) # Setting the date as index

print(df)

3⃣ Visualize the Time Series (time series plotting)


Plot the time series using .plot(). This helps identify trends, seasonality, and irregularities
visually. Understand the behavior of data over time.

df.plot(title='Time Series Data')

plt.xlabel('Date')

plt.ylabel('Value')

plt.show()

4⃣ Resample the Data (handling date time objects)

Sometimes, your data may be too granular (daily/hourly). You can change the frequency
(e.g., to monthly) using .resample(). Aggregate data to a new time frequency.

monthly = df.resample('M').mean() # Resample by month-end and take mean (pandas 2.2+ prefers 'ME')

print(monthly)

5⃣ Resampling Time Series Data

Resampling is the process of changing the frequency of your time series data. You can
downsample (reduce frequency) or upsample (increase frequency) the data using
.resample().

data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],

'Value': [100, 200, 300, 400]}

df = pd.DataFrame(data)

df['Date'] = pd.to_datetime(df['Date'])

df.set_index('Date', inplace=True)

df_resampled = df.resample('M').mean() # Downsample: daily -> monthly mean (pandas 2.2+ prefers 'ME')

print(df_resampled)
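The example above downsamples. Upsampling goes the other way and creates new rows that must be filled somehow. A minimal sketch, assuming made-up readings taken every 2 days that are upsampled to daily data:

```python
import pandas as pd

# Hypothetical readings every 2 days: Jan 1, Jan 3, Jan 5
idx = pd.date_range(start='2023-01-01', periods=3, freq='2D')
df = pd.DataFrame({'Value': [100.0, 104.0, 108.0]}, index=idx)

# Upsample to daily frequency; the new days (Jan 2, Jan 4) have no data yet
daily_ffill = df.resample('D').ffill()          # repeat the last known value
daily_interp = df.resample('D').interpolate()   # straight line between known points

print(daily_ffill)
print(daily_interp)
```

Forward fill suits values that stay constant until the next reading (such as a price list); interpolation suits quantities that change gradually (such as temperature).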

6⃣ Time-based Filtering

You can filter data by specifying time ranges. This is useful for comparing specific time
periods. Analyze specific time windows.

print(df.loc['2023-01-02':'2023-01-03']) # Slice rows by date range (both end dates are included)

7⃣ Handling Missing Dates


If your data has missing days or timestamps, you can fill them using asfreq() and fillna().
Ensure consistent date frequency.

df_filled = df.asfreq('D').ffill() # Insert missing days, then forward-fill their values
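A small self-contained sketch of the same idea, assuming made-up data in which one day (3 January) is missing:

```python
import pandas as pd

# Jan 3 is missing from the index
dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04'])
df = pd.DataFrame({'Value': [100.0, 102.0, 98.0]}, index=dates)

with_gaps = df.asfreq('D')   # inserts 2024-01-03 with NaN
filled = with_gaps.ffill()   # copies the previous day's value into the gap

print(with_gaps)
print(filled)
```

asfreq() only makes the gaps visible; how they are filled (forward fill, interpolation, or left as NaN) is a separate choice that depends on the data.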

8⃣ Detect Trends or Seasonality (Basic Visual)

Although Pandas alone doesn’t do full statistical decomposition, you can observe trends
and seasonal effects visually using plots.
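One simple way to make a trend easier to see is a rolling (moving) average, which smooths out day-to-day noise. A minimal sketch using the sample values from step 2; the 3-day window and the column name Rolling3 are arbitrary choices for illustration.

```python
import pandas as pd

dates = pd.date_range(start='2024-01-01', periods=6, freq='D')
df = pd.DataFrame({'Value': [100, 102, 101, 98, 105, 110]}, index=dates)

# Mean of the current day and the two days before it
df['Rolling3'] = df['Value'].rolling(window=3).mean()

print(df)
```

The first two rows of Rolling3 are NaN because a full 3-day window is not yet available. Calling df.plot() on this DataFrame draws the raw values and the smoothed line together.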

Real-Life Applications

• Stock price analysis over months or years

• Weather forecasting using historical temperatures

• Sales predictions based on monthly trends

• Website traffic trends and seasonality

Difference Between Matplotlib and Seaborn
