
Data Visualization - U5

The document introduces Seaborn, a Python library for creating statistical graphics that simplifies data visualization and integrates with Pandas. It covers installation instructions, various plotting functions like bar plots, count plots, and heatmaps, as well as key features such as built-in themes and statistical estimation. Additionally, it discusses spatial analysis and the Folium library for visualizing geospatial data, highlighting its applications in urban planning and resource management.

UNIT-5 Introduction to Seaborn: Seaborn functionalities and usage, Spatial Visualizations and Analysis in Python with Folium, Case Study.

Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.

Seaborn helps you explore and understand your data. Its plotting functions operate on
dataframes and arrays containing whole datasets and internally perform the necessary
semantic mapping and statistical aggregation to produce informative plots. Its
dataset-oriented, declarative API lets you focus on what the different elements of your
plots mean, rather than on the details of how to draw them.

Overview - Seaborn
Seaborn is a Python data visualisation toolkit that simplifies the process of constructing visually appealing and useful statistical graphs. Simply put, it is a tool that makes your data not only understandable but also visually appealing. It is based on Matplotlib, another well-known plotting package, but offers a higher-level interface that makes it easier to create meaningful statistical visualisations.

Seaborn's appeal comes from its capacity to generate complex visualizations with only a few lines of code. Its concise syntax allows you to generate elaborate statistical graphs without delving into the complexity of the underlying plotting calls.

One of its distinguishing characteristics is its interoperability with Pandas DataFrames. It can plot a data frame directly, using column names to select the variables, which ensures a smooth transition from data processing to data display.

It goes beyond creating visually appealing graphs; it also incorporates statistical estimates into the visualizations. In a scatter plot, for example, it can automatically fit and draw a linear regression line, revealing the underlying trend in your data. This combination of visualization and statistical analysis is what Seaborn is known for, and it is useful to anyone who wants to extract information from their data.

Installing Seaborn on multiple operating systems is simple, and here's a brief instruction to get you started.

Windows: Windows users can install Seaborn by opening their command prompt and typing the command:

pip install seaborn

MacOS: For Mac users, open up your terminal and enter the installation command:

pip install seaborn

Linux: Installing Seaborn on Linux is equally easy. Open your terminal and type:

pip install seaborn

Verifying the Installation: Regardless of your operating system, it is recommended to check that Seaborn is available in your Python environment. Open a Python shell and type:

import seaborn as sns
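If the import raises no error, the installation worked. As a small optional check, printing the version string shows which release Python picked up:

```python
import seaborn as sns

# A successful import means the package is installed; the version string
# tells you which release is active in this environment.
print(sns.__version__)
```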

Python Seaborn Plotting Functions


The Seaborn library provides a range of plotting functions that make the visualization
and analysis of data easier. You’ll cover some of the crucial plots in this tutorial.

Barplot
A bar plot gives an estimate of the central tendency for a numeric variable with the height of each rectangle. It provides some indication of the uncertainty around that estimate using error bars. To build this plot, you usually choose a categorical column on the x-axis and a numerical column on the y-axis.
With the mtcars dataset, for example, you can call the barplot() function with the cylinder (cyl) column on the x-axis and the carburetors (carb) column on the y-axis. There are two equivalent ways to write the call: pass the two columns themselves, or explicitly name the x and y columns and pass the data frame using the data argument.
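The two ways of calling barplot() can be sketched as follows. The data frame here is a small hand-typed sample in the mtcars column layout, an assumption standing in for the full dataset, so the bar heights are only illustrative:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-typed sample in the mtcars column layout (an assumption, not the full dataset)
mtcars = pd.DataFrame({
    "cyl":  [6, 4, 6, 8, 6, 8, 4, 6],
    "carb": [4, 1, 1, 2, 1, 4, 2, 4],
})

# Form 1: pass the two columns themselves
ax_columns = sns.barplot(x=mtcars["cyl"], y=mtcars["carb"])

# Form 2: name the columns and pass the data frame through `data`
plt.figure()  # new figure so the second plot does not draw over the first
ax_named = sns.barplot(x="cyl", y="carb", data=mtcars)
```

Both calls draw one bar per cylinder count, with the bar height showing the mean number of carburetors. In a Jupyter notebook the figures display automatically.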

Python Seaborn allows the users to assign colors to the bars. Passing a single color, such as yellow, through the color parameter draws all the bars in that color.
Seaborn's barplot() also has the palette parameter, which you can use to give different colors to the bars, for example palette = 'rocket'.
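A minimal sketch of both coloring options, again on a hand-typed sample in the mtcars layout (an assumption). Note that recent Seaborn releases may emit a deprecation warning when palette is passed without hue, and suggest assigning the x variable to hue instead:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-typed sample in the mtcars column layout (an assumption, not the full dataset)
mtcars = pd.DataFrame({
    "cyl":  [6, 4, 6, 8, 6, 8, 4, 6],
    "carb": [4, 1, 1, 2, 1, 4, 2, 4],
})

# One color for every bar
ax_yellow = sns.barplot(x="cyl", y="carb", data=mtcars, color="yellow")

# A different color per bar, taken from the built-in "rocket" palette
plt.figure()
ax_rocket = sns.barplot(x="cyl", y="carb", data=mtcars, palette="rocket")
```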

Countplot
The countplot() function in the Python Seaborn library shows the number of observations in each category using bars. For the mtcars dataset, a count plot of the cylinder column shows the number of vehicles for each category of cylinders, and a count plot of the carburetor column shows the number of cars for each carburetor count.

Python Seaborn allows you to create horizontal count plots where the feature column is on the y-axis and the count is on the x-axis. A horizontal count plot of the gear column shows the count of cars for each category of gear: mtcars has 15 vehicles with 3 gears, 12 vehicles with 4 gears, and 5 vehicles with 5 gears.

You can also create a grouped count plot using the hue parameter. The hue parameter accepts the column name for color encoding. With hue set to the cylinder column, the plot shows the count of cars for each category of gears, grouped by the number of cylinders.
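The three count plot variants described above can be sketched like this. The data frame is a hand-typed sample in the mtcars layout (an assumption), so its counts differ from the 15/12/5 split of the full dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-typed sample in the mtcars column layout (an assumption, not the full dataset)
mtcars = pd.DataFrame({
    "cyl":  [6, 4, 6, 8, 6, 8, 4, 6],
    "gear": [4, 4, 3, 3, 3, 3, 4, 4],
})

# Vertical count plot: one bar per cylinder count
ax_vertical = sns.countplot(x="cyl", data=mtcars)

# Horizontal count plot: the category goes on the y-axis
plt.figure()
ax_horizontal = sns.countplot(y="gear", data=mtcars)

# Grouped count plot: bars for each gear, split by number of cylinders
plt.figure()
ax_grouped = sns.countplot(x="gear", hue="cyl", data=mtcars)
```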

Distribution Plot
The Seaborn library provides the distplot() function, which plots the distribution of any continuous variable. Note that distplot() has been deprecated since Seaborn 0.11 in favor of histplot() and displot(). As an example, you can plot the distribution of miles per gallon (mpg) of the different vehicles. The mpg value measures the distance a car can travel per gallon of fuel.
Heatmap
Heatmaps in the Seaborn library let you visualize matrix-like data. The values of the variables are contained in a matrix and are represented as colors. A common example is a heatmap of the correlation between each pair of variables in the mtcars dataset.
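A correlation heatmap can be sketched in two steps: compute the correlation matrix with pandas, then pass it to heatmap(). The data frame is a hand-typed sample in the mtcars layout (an assumption), and the cmap, vmin, and vmax choices are illustrative:

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample in the mtcars column layout (an assumption, not the full dataset)
mtcars = pd.DataFrame({
    "mpg":  [21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 19.2],
    "cyl":  [6, 4, 6, 8, 6, 8, 4, 6],
    "gear": [4, 4, 3, 3, 3, 3, 4, 4],
    "carb": [4, 1, 1, 2, 1, 4, 2, 4],
})

# Pairwise correlation between all numeric columns
corr = mtcars.corr()

# annot=True writes each coefficient into its cell; fixing the color range
# to [-1, 1] keeps the colors comparable across datasets
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```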

Scatterplot
The Seaborn scatterplot() function helps you create plots that show the relationship between two continuous variables.

Moving ahead, to understand scatter plots and other plotting functions, you will use the iris flower dataset, so go ahead and load it.
A scatter plot of sepal length against petal length shows the relationship between these two measurements for the different species of iris flowers.

You can then distinguish the different species by passing "species" as the hue parameter of the function. With the points colored by species, you can easily differentiate the three types of iris flowers based on their sepal length and petal length.
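A minimal sketch of the scatter plot with species coloring. To keep the example self-contained, it uses a small hand-typed sample in the iris column layout (an assumption) instead of loading the full dataset:

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample in the iris column layout (an assumption, not the full dataset)
iris = pd.DataFrame({
    "sepal_length": [5.1, 5.4, 4.6, 4.6, 7.0, 6.4, 6.9, 5.5, 6.3, 5.8, 7.1, 6.3],
    "petal_length": [1.4, 1.7, 1.4, 1.5, 4.7, 4.5, 4.9, 4.0, 6.0, 5.1, 5.9, 5.6],
    "species": ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4,
})

# hue="species" gives each species its own color and adds a legend
ax = sns.scatterplot(x="sepal_length", y="petal_length", hue="species", data=iris)
```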

Pairplot
The Python Seaborn library lets you visualize data using pair plots that produce a matrix of relationships between each pair of variables in the dataset.
By default, the plots on the diagonal are histograms that represent the distribution of each feature, and the off-diagonal plots are scatter plots of one feature against another.

When you pass the hue parameter, the diagonal plots become KDE plots drawn separately for each category, while the remaining plots stay scatter plots colored by category. This makes it easier to tell each type of flower apart in the pair plot.

Linear Regression Plot
The lmplot() function in the Seaborn library draws the linear relationship between two continuous variables, as determined through regression. For the iris data, it can show the relationship between petal length and petal width of the different species of iris flowers.

The hue parameter can differentiate between each species of flower, and you can set a different marker for each species.
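A sketch of a regression plot with one fitted line and one marker style per species. The data is a hand-typed sample in the iris layout (an assumption); the confidence band is switched off with ci=None because so few points per species would not give a meaningful interval:

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample in the iris column layout (an assumption, not the full dataset)
iris = pd.DataFrame({
    "petal_length": [1.4, 1.7, 1.4, 1.5, 4.7, 4.5, 4.9, 4.0, 6.0, 5.1, 5.9, 5.6],
    "petal_width":  [0.2, 0.4, 0.3, 0.2, 1.4, 1.5, 1.5, 1.3, 2.5, 1.9, 2.1, 1.8],
    "species": ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4,
})

# One regression line per species; the markers list needs one entry per hue level
grid = sns.lmplot(
    x="petal_length", y="petal_width", hue="species",
    markers=["o", "s", "D"], ci=None, data=iris,
)
```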
Boxplot
A boxplot, also known as a box and whisker plot, depicts the distribution of quantitative data. The box represents the quartiles of the dataset. The whiskers show the rest of the distribution, except for the outlier points. For the iris data, a boxplot can show the distribution of sepal width for each of the three species.
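A minimal boxplot sketch, with species on the x-axis and sepal width on the y-axis. The data is a hand-typed sample in the iris layout (an assumption):

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample in the iris column layout (an assumption, not the full dataset)
iris = pd.DataFrame({
    "sepal_width": [3.5, 3.9, 3.4, 3.1, 3.2, 3.2, 3.1, 2.3, 3.3, 2.7, 3.0, 2.9],
    "species": ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4,
})

# One box per species, summarizing the spread of sepal width
ax = sns.boxplot(x="species", y="sepal_width", data=iris)
```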

Key Features of Seaborn

1. Built-in Themes and Aesthetics


Seaborn includes several predefined themes that enhance the visual appeal of plots.
These themes—"darkgrid", "whitegrid", "dark", "white", and "ticks"—help create
professional-looking visualizations with minimal customization. The sns.set_theme()
function allows users to apply consistent styles across multiple plots.
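As a small illustration, a theme is applied once and then affects every plot drawn afterwards. The choice of "whitegrid" here is arbitrary:

```python
import seaborn as sns

# Apply one of the predefined styles to all subsequent plots
sns.set_theme(style="whitegrid")

# axes_style() with no arguments returns the style parameters now in effect
current_style = sns.axes_style()
print(current_style["axes.grid"])
```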

2. Statistical Color Palettes


Seaborn provides a diverse range of color palettes optimized for categorical and numerical data:
● Sequential Palettes – Best for ordered data representation.
● Categorical Palettes – Ideal for distinguishing discrete categories.
● Diverging Palettes – Useful for highlighting deviations from a central value.
The sns.color_palette() function helps customize and apply these palettes.
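The three palette families can be sketched with color_palette(). The palette names "deep", "rocket", and "coolwarm" are examples chosen for illustration; each call returns a list of RGB tuples:

```python
import seaborn as sns

# Categorical: distinct hues for unordered categories
categorical = sns.color_palette("deep", 3)

# Sequential: light-to-dark progression for ordered values
sequential = sns.color_palette("rocket", 5)

# Diverging: two contrasting hues that meet at a neutral midpoint
diverging = sns.color_palette("coolwarm", 7)
```

Any of these lists can be passed to the palette parameter of a plotting function.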
3. Flexible Plotting Functions
Seaborn simplifies complex visualizations by offering high-level plotting functions built on top of Matplotlib. These functions work directly with Pandas DataFrames, making data manipulation and visualization seamless. Commonly used plot types include:
● Scatter Plots (sns.scatterplot()) – Show relationships between two variables.
● Line Plots (sns.lineplot()) – Display trends over time or ordered data.
● Bar Plots (sns.barplot()) – Represent categorical comparisons with statistical estimation.
● Box Plots & Violin Plots (sns.boxplot(), sns.violinplot()) – Analyze data distribution and variability.
● Heatmaps (sns.heatmap()) – Visualize correlations and matrix data.

4. Statistical Estimation and Regression


Seaborn incorporates statistical estimation techniques to enhance visual analysis. It automatically computes means, confidence intervals, and regression trends within plots.
● Regression Plots (sns.lmplot(), sns.regplot()) – Fit and visualize linear regression models with confidence intervals.
● Kernel Density Estimation (KDE) (sns.kdeplot()) – Display smooth probability density functions.

5. Categorical Data Visualization


Seaborn excels in visualizing categorical data distributions and comparisons. It offers various functions for analyzing categorical variables:
● Count Plots (sns.countplot()) – Display the frequency of categorical values.
● Bar Plots (sns.barplot()) – Show the mean of a quantitative variable per category.
● Box & Violin Plots – Compare distributions across different categories.
● Grouped Bar & Point Plots – Facilitate comparative analysis between multiple groups.

6. Matrix and Heatmap Visualizations


Seaborn includes powerful tools for visualizing matrix-like or two-dimensional data structures.
● Heatmaps (sns.heatmap()) – Represent data intensity using color gradients, useful for correlation matrices and large datasets.
● Clustermaps (sns.clustermap()) – Cluster and visualize relationships in complex datasets.
7. Multi-Plot Grids for Advanced Analysis
Seaborn enables the creation of multi-plot grids to compare multiple variables efficiently.
● FacetGrid (sns.FacetGrid()) – Generate multiple plots based on categorical variables.
● PairGrid (sns.PairGrid()) – Display pairwise relationships between different variables.
● Pairplot (sns.pairplot()) – Automatically plots scatter and histogram relationships for multiple numerical variables.
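As one illustration of a multi-plot grid, the sketch below uses FacetGrid to draw the same scatter plot once per species, side by side. The data is a hand-typed sample in the iris layout (an assumption):

```python
import pandas as pd
import seaborn as sns

# Hand-typed sample in the iris column layout (an assumption, not the full dataset)
iris = pd.DataFrame({
    "sepal_length": [5.1, 5.4, 4.6, 4.6, 7.0, 6.4, 6.9, 5.5, 6.3, 5.8, 7.1, 6.3],
    "petal_length": [1.4, 1.7, 1.4, 1.5, 4.7, 4.5, 4.9, 4.0, 6.0, 5.1, 5.9, 5.6],
    "species": ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4,
})

# One column of the grid per species
grid = sns.FacetGrid(iris, col="species")

# Draw the same kind of plot in every panel, using that panel's subset of rows
grid.map_dataframe(sns.scatterplot, x="sepal_length", y="petal_length")
```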

8. Time Series Visualization


Seaborn supports time series analysis by allowing direct plotting of data with time-based indices.
● Line Plots (sns.lineplot()) – Track changes in variables over time with numeric, datetime, or categorical time representations.

9. Seamless Integration with Pandas


Seaborn is designed to work directly with Pandas DataFrames, making it easy to create
plots without manually extracting arrays. It recognizes column names and automatically
maps them to appropriate variables in plots. This integration allows for efficient data
exploration and analysis.

Usage & Example Gallery


Seaborn VS. Matplotlib

| Feature | Matplotlib | Seaborn |
| --- | --- | --- |
| Functionality | Utilized for making basic graphs. Datasets visualized with bar graphs, histograms, pie charts, scatter plots, lines, etc. | Contains several patterns and plots for data visualization. Uses fascinating themes. Helps compile whole data into a single plot. Provides data distribution. |
| Syntax | Comparatively complex and lengthy syntax. Example: matplotlib.pyplot.bar(x_axis, y_axis) | Comparatively simple syntax, easier to learn and understand. Example: seaborn.barplot(x=x_axis, y=y_axis) |
| Dealing with Multiple Figures | Can open and use multiple figures simultaneously, but they must be closed explicitly: matplotlib.pyplot.close() (one figure) and matplotlib.pyplot.close("all") (all figures). | Sets the time for figure creation, potentially leading to out-of-memory issues. |
| Visualization | Well-connected with NumPy and Pandas. Acts as a graphics package for data visualization in Python. Pyplot provides similar features and syntax as in MATLAB. | More comfortable handling Pandas data frames. Uses basic methods to provide beautiful graphics in Python. |
| Pliability | Highly customizable and robust. | Avoids overlapping plots with default themes. |
| Data Frames and Arrays | Works efficiently with data frames and arrays. Treats figures and axes as objects. Stateful APIs allow plot() methods to work without parameters. | More functional and organized, treats the whole dataset as a single unit. Less stateful, requires parameters for methods like plot(). |
| Use Cases | Plots various graphs using Pandas and NumPy. | Extended version of Matplotlib, using Matplotlib, NumPy, and Pandas for plotting graphs. |

What is Spatial Analysis?


Spatial Analysis is a technique of building and analyzing map-based visualizations made from GPS data, sensors, mobile devices, satellite imagery, and other sources. Visuals can be maps, cartograms, graphs, etc. Because maps are familiar, they make the data easy to understand and act upon, and location-based events are easy to follow with geospatial analysis. Location often dictates trends. Spatial analysis goes beyond simply displaying data on a map; it seeks to uncover patterns, relationships, and trends that are inherently tied to location. This involves using various analytical methods to examine the spatial distribution of phenomena, understand how they interact, and model their behavior over space and time. Spatial analysis is crucial for understanding complex systems, making informed decisions, and solving problems that have a spatial component. It helps answer questions like "Where are things located?", "Why are they located there?", and "What are the implications of their location?".

For example, a residential area in a city with more expensive properties will likely have residents with higher incomes who spend more money.
Applications and Uses of Spatial Analysis

● Spatial analysis can be used to map natural resources and track weather phenomena like rainfall, snow, humidity, air pressure, etc.

● For telecommunication data, geospatial analysis can help understand connection strength, subscriber spread, and other parameters.

● Commercial data such as sales can be plotted on maps to analyze the most profitable locations and make better decisions.

● Urban planning and city planning can use map-based analysis techniques to understand the growing population's electricity and water needs.

● With the data plotted on a map, one can determine which regions need an urgent upgrade and more supply, and all aspects of urban planning can be done easily with proper geospatial analysis.

Introduction to Folium for Spatial Analysis


Folium is a Python library that can be used to visualize geospatial data. Its simple commands make it a convenient choice for plotting on maps. Folium is a wrapper for Leaflet.js, a leading open-source JavaScript library for plotting interactive maps. Folium has a number of built-in tilesets from Mapbox, OpenStreetMap, and Stamen, and also supports custom tilesets. The set of built-in tilesets can vary between Folium versions.
Installation of Folium:
pip install folium

Now that Folium is installed, we can get started.

import folium
import numpy as np
import pandas as pd

We import Folium, NumPy, and pandas.

# Create a map centred on Kolkata
kol = folium.Map(location=[22.57, 88.36], tiles='openstreetmap', zoom_start=12)
kol
We created a basic map of Kolkata in Python.

Plotting interesting locations in Folium is very easy if you know their map coordinates.

# add markers for two places: Victoria Memorial and Eden Gardens
tooltip_1 = "This is Victoria Memorial"
tooltip_2 = "This is Eden Gardens"

folium.Marker(
    [22.54472, 88.34273], popup="Victoria Memorial", tooltip=tooltip_1
).add_to(kol)

folium.Marker(
    [22.56487826917627, 88.34336378854425], popup="Eden Gardens", tooltip=tooltip_2
).add_to(kol)

kol

# add a third marker with a custom red icon
folium.Marker(
    location=[22.55790780507432, 88.35087264462007],
    popup="Indian Museum",
    icon=folium.Icon(color="red", icon="info-sign"),
).add_to(kol)

kol
Here are the results of the above code.
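# A second map with a different tileset; "Stamen Toner" may not be available
# as a built-in tileset in recent Folium releases
kol2 = folium.Map(location=[22.55790780507432, 88.35087264462007], tiles="Stamen Toner", zoom_start=13)
kol2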
Output:
Adding markers to the map serves the purpose of labelling and identifying something.
With labelling, one can mark any particular point of interest on the map.

#adding circle

folium.Circle(
location=[22.585728381244373, 88.41462932675563],
radius=1500,
popup="Salt Lake",
color="blue",
fill=True,
).add_to(kol2)

folium.Circle(
location=[22.56602918189088, 88.36508424354102],
radius=2000,
popup="Old Kolkata",
color="red",
fill=True,
).add_to(kol2)

kol2
Output:
The map can be panned and zoomed interactively. Circles can be used for zoning and zone marking purposes in the case of real-life data.

# Create a map
india = folium.Map(location=[20.180862078886562, 78.77642751195584],
tiles='openstreetmap', zoom_start=5)
india
To choose any specific place on the map, we can change the coordinates and edit the
zoom_start parameter.
Output:

# adding 3 locations: Mumbai, Delhi and Kolkata
loc = [
    (19.035698150834815, 72.84981409864244),
    (28.61271068361265, 77.22359851696532),
    (22.564213404457185, 88.35872006950966),
]
We will take three cities in India, and plot a line between them.

# opacity is the Leaflet path option that controls line transparency
folium.PolyLine(locations=loc, opacity=0.5).add_to(india)

india
Output:
In this way, we can plot some basic data based on coordinates.

Next, we work with a Kaggle dataset containing the population centres of Indian states as per the 2011 census data. Let us proceed.

df_state = pd.read_csv("/kaggle/input/indian-census-data-with-geospatial-indexing/state wise centroids_2011.csv")
df_state.head()
Output:
We will plot the data, which has 35 entries.

# creating a new map of India on which all state population centres will be plotted
india2 = folium.Map(location=[20.180862078886562, 78.77642751195584],
                    tiles='openstreetmap', zoom_start=4.5)

# adding one marker per state/UT
for i in range(len(df_state)):
    state = df_state["State"][i]
    lat = df_state["Latitude"][i]
    long = df_state["Longitude"][i]
    folium.Marker([lat, long], popup=state, tooltip=state).add_to(india2)

india2
Output:

The plot is generated, and the location of each of the markers is the population centre
for the respective state/UT.
Applications

● Real Estate Analysis – Visualizes neighborhood details, property prices, and listings to help buyers and real estate professionals understand market trends.

● Tourism & Travel Planning – Highlights attractions, hotels, and restaurants, assisting tourists in itinerary planning.

● Supply Chain & Logistics – Tracks shipment routes, distribution hubs, and delivery statuses for better route optimization.

● Public Health & Epidemiology – Maps disease outbreaks, vaccination rates, and healthcare facility locations for informed decision-making.

● Urban Planning & Smart Cities – Helps city planners analyze infrastructure projects, transportation networks, and urban development.

● Education & Research – Acts as an interactive teaching tool for geography, environmental science, and spatial data studies.

Now that you are familiar with Folium, let us use it for our next case study, described below.

Case Study: An e-commerce company wants to get into logistics with “Deliver4U”. It wants to know the pattern of maximum pickup calls from different areas of the city throughout the day. This will help it to:

i) Build the optimum number of stations where its pickup delivery personnel will be located.
ii) Ensure pickup personnel reach the pickup location at the earliest possible time.

For this, the company uses its existing customer data in Delhi to find the highest density of probable pickup locations in the future.

Solution:

1) Pre-requisites: Python, Jupyter Notebooks, Pandas

2) Data set: Please download the data set from the location specified by the trainer. It contains two separate data files, train_del.csv and test_del.csv. The difference is that train_del.csv contains an additional column, trip_duration, which is not needed for our present analysis.
3) Importing and pre-processing data:
a) Import libraries: Pandas and Folium. Drop the trip_duration column and combine the 2 different files into one dataframe.
We will need to generate some columns such as month or other time features using Python's datetime functionality before using the data with Folium.
Please note that the month, week, day, and hour columns will be used next for our analysis.
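The pre-processing steps can be sketched as below. To keep the example runnable without the data files, two tiny hand-made data frames stand in for train_del.csv and test_del.csv; the rows are hypothetical, and the timestamp column name pickup_datetime is an assumption, since the text does not name it. The time features are derived with the pandas .dt accessor rather than the datetime module directly:

```python
import pandas as pd

# Stand-ins for pd.read_csv("train_del.csv") and pd.read_csv("test_del.csv");
# the rows and the pickup_datetime column name are hypothetical
train = pd.DataFrame({
    "pickup_datetime": ["2016-05-14 17:24:55", "2016-06-12 00:43:35"],
    "pickup_latitude": [28.61, 28.63],
    "pickup_longitude": [77.21, 77.22],
    "trip_duration": [455, 663],
})
test = pd.DataFrame({
    "pickup_datetime": ["2016-06-30 23:59:58"],
    "pickup_latitude": [28.65],
    "pickup_longitude": [77.23],
})

# Drop the column that only the training file has, then stack the two files
train = train.drop(columns="trip_duration")
df = pd.concat([train, test], ignore_index=True)

# Parse the timestamps and derive the time features used in the analysis
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])
df["month"] = df["pickup_datetime"].dt.month
df["week"] = df["pickup_datetime"].dt.isocalendar().week
df["day"] = df["pickup_datetime"].dt.day
df["hour"] = df["pickup_datetime"].dt.hour
```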
Note the following regarding visualizing spatial data with Folium:
• Maps are defined as folium.Map objects. We will need to add other objects on top of the map before rendering.
• Different map tiles for maps rendered by Folium can be seen at: https://github.com/pythonvisualization/folium/tree/master/folium/templates/tiles
• folium.Map(): the first thing to be executed when you work with Folium.
Let us define the default map object:

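Let us now visualize the rides data using the HeatMap class from folium.plugins.
Code for reference:
from folium.plugins import HeatMap

# keep only rides after April and give each ride a weight of 1
df_copy = df[df.month > 4].copy()
df_copy['count'] = 1
# generateBaseMap() is the helper that returns the default map object
base_map = generateBaseMap()
# sum the counts per pickup coordinate to get [lat, lon, weight] rows
heat_data = (df_copy[['pickup_latitude', 'pickup_longitude', 'count']]
             .groupby(['pickup_latitude', 'pickup_longitude'])
             .sum().reset_index().values.tolist())
HeatMap(data=heat_data, radius=8, max_zoom=13).add_to(base_map)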

Interpretation of the output:


There is high demand for cabs in the areas highlighted by the heat map, most probably central Delhi and the surrounding areas. Now let us add the ability to place markers on the map by using the folium.ClickForMarker() object. Once it is added to the map, we can click to add markers that recommend points where logistic pickup stops can be built.

We can also animate our heat maps so that the displayed data changes along a time dimension. This can be done using HeatMapWithTime(). Use the following code:
from folium.plugins import HeatMapWithTime

# build one list of [lat, lon, weight] rows per hour of the day
df_hour_list = []
for hour in df_copy.hour.sort_values().unique():
    df_hour_list.append(
        df_copy.loc[df_copy.hour == hour, ['pickup_latitude', 'pickup_longitude', 'count']]
        .groupby(['pickup_latitude', 'pickup_longitude'])
        .sum().reset_index().values.tolist()
    )

base_map = generateBaseMap(default_zoom_start=11)
HeatMapWithTime(
    df_hour_list, radius=5,
    gradient={0.2: 'blue', 0.4: 'lime', 0.6: 'orange', 1: 'red'},
    min_opacity=0.5, max_opacity=0.8, use_local_extrema=True,
).add_to(base_map)
base_map

Conclusion
Throughout the city, pickups are more probable from the central area so it is better to
set a lot of pickup stops at these locations. Therefore, by using maps we can highlight
trends and uncover patterns and derive insights from the data.
