11/3/23, 5:23 AM Seaborn.ipynb - Colaboratory
Visualization with Seaborn
Matplotlib has proven to be an incredibly useful and popular visualization tool, but even avid
users will admit it often leaves much to be desired. There are several valid complaints about
Matplotlib that often come up:
Prior to version 2.0, Matplotlib's defaults were not exactly the best choices.
Matplotlib's API is relatively low level. Doing sophisticated statistical visualization is
possible, but often requires a lot of redundant code.
Matplotlib predates Pandas by several years, and thus is not designed for use with
Pandas DataFrames. In order to visualize data from a Pandas DataFrame, you must extract
each Series and often concatenate them together into the right format. It would be nicer
to have a plotting library that can intelligently use the DataFrame labels in a plot.
An answer to these problems is Seaborn. Seaborn provides an API on top of Matplotlib that
offers sane choices for plot style and color defaults, defines simple high-level functions for
common statistical plot types, and integrates with the functionality provided by Pandas
DataFrames.
Exploring Seaborn Plots
Histograms, KDE, and densities
Often in statistical data visualization, all you want is to plot histograms and joint distributions of
variables. Seaborn makes this relatively straightforward:
# Import the necessary libraries
import seaborn as sns
import numpy as np
import pandas as pd
# Generate a dataset of 200 standard-normal random numbers
x = np.random.randn(200)
x = pd.Series(x, name="Numerical Variable")
# Plot a histogram without the density estimate
sns.histplot(x, kde=False)
https://colab.research.google.com/drive/1UOWXvbwioSb7EnnUYklNyqdM1htuCLre?usp=sharing#printMode=true 1/8
By default, the kde parameter of seaborn.histplot is False. Setting kde=True computes a
kernel density estimate to smooth the distribution and draws the density curve over the histogram.
# Plot a histogram with the density estimate overlaid
sns.histplot(x, kde=True)
# Plot the KDE alone (fill replaces the deprecated shade parameter)
sns.kdeplot(x, fill=False)
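To make the idea of a kernel density estimate concrete, here is a minimal NumPy sketch. It is not Seaborn's own implementation: it simply places a Gaussian bump on every sample and averages them. The helper name gaussian_kde_1d, the seeded sample x_demo, and the use of Scott's rule of thumb for the bandwidth are assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data (an assumed default)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # One standard-normal bump per sample, averaged and rescaled
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)          # seeded so the sketch is repeatable
x_demo = rng.standard_normal(200)       # hypothetical stand-in for x
grid = np.linspace(-5, 5, 501)
density = gaussian_kde_1d(x_demo, grid)

# A density should integrate to roughly 1 (simple Riemann sum)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a wigglier curve and a larger one a smoother curve, which is the trade-off behind the smoothing described above.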
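Joint distributions of two variables can be plotted with sns.jointplot. There are other
parameters that can be passed to jointplot; for example, kind="hex" draws a hexagonally based
histogram instead of the default scatter plot.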
Pair plots
To plot multiple pairwise bivariate distributions in a dataset, you can use the sns.pairplot() function.
The diagonal plots are the univariate plots, and the off-diagonal plots show the relationship for each
of the C(n, 2) pairs of numeric variables in a DataFrame, arranged as a matrix of plots.
iris = sns.load_dataset("iris")
iris.head()
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Visualizing the multidimensional relationships among the samples is as easy as calling
sns.pairplot:
sns.pairplot(iris, hue='species', height=2.5);
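As a quick check on how many panels to expect, the pairs of variables can be enumerated directly. This is only an illustrative sketch: the column list is copied from the iris.head() output above, and the variable names are ours.

```python
from itertools import combinations

# Numeric columns of the iris DataFrame, as shown in iris.head()
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Every unordered pair of distinct variables: C(4, 2) of them
pairs = list(combinations(columns, 2))
n = len(columns)

print(len(pairs))              # number of distinct bivariate relationships
print(n * n, n + 2 * len(pairs))  # total panels: diagonal + mirrored pairs
```

Each pair appears twice in the grid, once above and once below the diagonal with the axes swapped, which is why the n-by-n matrix has more panels than there are distinct pairs.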
Faceted histograms
Sometimes the best way to view data is via histograms of subsets. Seaborn's FacetGrid makes
this extremely simple. We'll take a look at some data that shows the amount that restaurant staff
receive in tips based on various indicator data:
tips = sns.load_dataset('tips')
tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
# Plot tip percentage as histograms, faceted by sex (rows) and time (columns)
import matplotlib.pyplot as plt
tips['tip_pct'] = 100 * tips['tip'] / tips['total_bill']
grid = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15));
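Conceptually, the faceted plot above splits the rows into one subset per (sex, time) combination and histograms each subset with the same bins. The sketch below mimics that with a pandas groupby. It is not how FacetGrid is implemented, and the small demo frame contains only the five rows shown in tips.head(), so it has no Lunch facet.

```python
import numpy as np
import pandas as pd

# The five rows printed by tips.head() above (a tiny stand-in for the full dataset)
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "time": ["Dinner"] * 5,
})
demo["tip_pct"] = 100 * demo["tip"] / demo["total_bill"]

# Same bin edges as the FacetGrid call: 14 bins between 0% and 40%
bins = np.linspace(0, 40, 15)

# One histogram per (sex, time) subset, keyed by the facet it belongs to
counts = {
    key: np.histogram(group["tip_pct"], bins=bins)[0]
    for key, group in demo.groupby(["sex", "time"])
}

for key, hist in counts.items():
    print(key, hist.sum())
```

Sharing one set of bin edges across subsets is what keeps the panels directly comparable.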
Factor plots
Factor plots, which Seaborn now calls categorical plots (sns.factorplot was renamed to
sns.catplot), can be useful for this kind of visualization as well. They let you view the
distribution of a parameter within bins defined by any other parameter.
A box-and-whisker plot, also called a box plot, displays the five-number summary of a set of
data: the minimum, first quartile, median, third quartile, and maximum. In a box plot, we draw
a box from the first quartile to the third quartile, and a line goes through the box at the
median. By default, Seaborn draws the whiskers out to the most extreme points within 1.5 times
the interquartile range of the box and marks anything beyond them as individual outlier points.
# Box plot of total bill per day, split by sex
sns.catplot(x="day", y="total_bill", hue="sex", data=tips, kind="box")
plt.show()
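The five-number summary behind a box plot can be computed by hand. The sketch below uses NumPy's default (linearly interpolated) percentiles on the five total_bill values shown in tips.head(); the variable names and the 1.5 × IQR fences are illustrative, and a plotting library may use a slightly different quantile convention.

```python
import numpy as np

# total_bill values from the five rows printed by tips.head()
values = np.array([16.99, 10.34, 21.01, 23.68, 24.59])

# Minimum, first quartile, median, third quartile, maximum
minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])

# Points outside these fences would be drawn as outliers
# under the common 1.5 * IQR whisker rule
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(minimum, q1, median, q3, maximum)
print(lower_fence, upper_fence)
```

Here every value lies inside the fences, so the whiskers of the corresponding box would reach all the way to the minimum and maximum.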
# Count plot (bar chart of the number of rows per day)
sns.catplot(x="day", data=tips, kind="count")
plt.show()
# Violin plot of total bill per day, split by sex
sns.catplot(x="day", y="total_bill", hue="sex", data=tips, kind="violin")
plt.show()
#Plotting a pie chart
plt.figure(figsize=[9,7])
tips['size'].value_counts().plot.pie()
plt.show()
Advantages of Seaborn: Seaborn makes it easy to represent data on a plot without taking care of
the internal details. We just pass a dataset to a function such as relplot() and name the columns
to plot, and Seaborn computes and places the values accordingly.
Within relplot(), we can switch to another representation of the data using the kind parameter
(for example, "scatter" or "line").
The result is an informative plot that is easy for the reader to understand.
Seaborn performs statistical aggregation when generating plots. Because it is built on Matplotlib,
installing Seaborn also installs Matplotlib as a dependency, and Matplotlib's own features and
functions remain available for customizing the plots further.