Simple Line Plots
Perhaps the simplest of all plots is the visualization of a single function y=f(x). Here we will take a
first look at creating a simple plot of this type. As with all the following sections, we'll start by setting up
the notebook for plotting and importing the packages we will use:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
fig = plt.figure()
ax = plt.axes()
In Matplotlib, the figure (an instance of the class plt.Figure) can be thought of as a single container that
contains all the objects representing axes, graphics, text, and labels. The axes (an instance of the
class plt.Axes) is what we see above: a bounding box with ticks and labels, which will eventually
contain the plot elements that make up our visualization. Throughout this book, we'll commonly use the
variable name fig to refer to a figure instance, and ax to refer to an axes instance or group of axes
instances.
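As a quick check of this container relationship, the following sketch creates a figure and an axes in one call with plt.subplots (a standard pyplot function, used here instead of plt.figure and plt.axes purely for brevity) and inspects the two objects:

```python
import matplotlib.pyplot as plt

# Create a figure and a single axes in one call.
fig, ax = plt.subplots()

# The figure is the container; the axes lives inside it.
print(isinstance(fig, plt.Figure))
print(isinstance(ax, plt.Axes))
print(fig.axes == [ax])
```

Each of the three checks should report that the relationship holds: fig is a Figure, ax is an Axes, and ax is the only axes registered on fig.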
Once we have created an axes, we can use the ax.plot function to plot some data. Let's start with a
simple sinusoid:
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
If we want to create a single figure with multiple lines, we can simply call the plot function
multiple times:
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x));
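The same idea can be sketched with the object-oriented interface. The labels, line styles, and legend call below are illustrative additions rather than part of the example above:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)

fig, ax = plt.subplots()
# Each call to ax.plot adds one more line to the same axes.
ax.plot(x, np.sin(x), linestyle='-', label='sin(x)')
ax.plot(x, np.cos(x), linestyle='--', label='cos(x)')
ax.legend()  # uses the label given to each line
```

Giving each line a label when it is plotted keeps the legend in sync with the data, which tends to be less error-prone than listing labels separately.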
Simple Scatter Plots
Another commonly used plot type is the simple scatter plot, a close cousin of the line plot.
Instead of points being joined by line segments, here the points are represented individually
with a dot, circle, or other shape. We’ll start by setting up the notebook for plotting and
importing the functions we will use:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');
A second, more powerful method of creating scatter plots is the plt.scatter function, which can be used very
similarly to the plt.plot function:
plt.scatter(x, y, marker='o');
The primary difference of plt.scatter from plt.plot is that it can be used to create scatter plots where the
properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to
data.
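A minimal sketch of this per-point control follows. The data are random numbers made up for illustration, and the colormap and scaling factor are arbitrary choices:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, made up purely for illustration.
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)         # one value per point, mapped through the colormap
sizes = 1000 * rng.rand(100)   # one marker area (in points squared) per point

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
fig.colorbar(points, ax=ax)    # show the color scale
```

Because c and s are arrays with one entry per point, color and size can each encode an extra dimension of the data.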
Visualizing Errors
For any scientific measurement, accurate accounting for errors is nearly as important as, if not
more important than, accurate reporting of the number itself. For example, imagine that I am
using some astrophysical observations to estimate the Hubble Constant, the local
measurement of the expansion rate of the Universe. I know that the current literature suggests
a value of around 71 (km/s)/Mpc, and I measure a value of 74 (km/s)/Mpc with my method. Are
the values consistent? The only correct answer, given this information, is this: there is no way
to know, because neither value comes with an uncertainty.
In visualization of data and results, showing these errors effectively can make a plot convey
much more complete information.
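To see why the uncertainties matter, here is a minimal sketch of a consistency check for the Hubble Constant example. The two uncertainty values are hypothetical (the text gives none), and the check assumes independent, roughly Gaussian errors:

```python
import math

literature, measured = 71.0, 74.0   # (km/s)/Mpc, the values from the text
sigma_lit, sigma_meas = 2.5, 5.0    # HYPOTHETICAL uncertainties, for illustration only

# For independent errors, the uncertainty of the difference adds in quadrature.
sigma_diff = math.hypot(sigma_lit, sigma_meas)
n_sigma = abs(measured - literature) / sigma_diff

print(f"difference = {n_sigma:.2f} sigma")
```

With these assumed uncertainties the two values differ by well under one sigma and would count as consistent; with much smaller uncertainties, the same gap of 3 (km/s)/Mpc would not.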
Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');
Here the fmt is a format code controlling the appearance of lines and points, and has the same
syntax as the shorthand used in plt.plot, outlined in Simple Line Plots and Simple Scatter
Plots.
In addition to these basic options, the errorbar function has many options to fine-tune the
outputs. Using these additional options you can easily customize the aesthetics of your
errorbar plot. I often find it helpful, especially in crowded plots, to make the errorbars lighter
than the points themselves:
plt.errorbar(x, y, yerr=dy, fmt='o', color='black',
             ecolor='lightgray', elinewidth=3, capsize=0);
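One of those additional options is asymmetric errors: yerr may be a pair of arrays giving the lower and upper error separately. The sketch below uses made-up error values to show the call:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)
y = np.sin(x)

# Made-up asymmetric uncertainties: smaller below each point, larger above.
lower = np.full_like(x, 0.2)
upper = np.full_like(x, 0.5)

fig, ax = plt.subplots()
# yerr with shape (2, N): row 0 is the lower error, row 1 the upper error.
bars = ax.errorbar(x, y, yerr=[lower, upper], fmt='o', color='black',
                   ecolor='lightgray', elinewidth=3, capsize=0)
```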
Density and Contour Plots
Sometimes it is useful to display three-dimensional data in two dimensions using contours or
color-coded regions. There are three Matplotlib functions that can be helpful for this
task: plt.contour for contour plots, plt.contourf for filled contour plots, and plt.imshow for
showing images. This section looks at examples of using these. As before, the code starts by
setting up the notebook for plotting and importing the functions we will use.
A contour plot can be created with the plt.contour function. It takes three arguments: a grid
of x values, a grid of y values, and a grid of z values. The x and y values represent positions
on the plot, and the z values will be represented by the contour levels. Perhaps the most
straightforward way to prepare such data is to use the np.meshgrid function, which builds two-
dimensional grids from one-dimensional arrays:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
import numpy as np
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');
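To make the grid shapes explicit and to show the filled variant, the following sketch reuses the same function with plt.contourf. The number of levels and the colormap are arbitrary choices for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)

# meshgrid turns the 1-D arrays into 2-D coordinate grids of shape (len(y), len(x)).
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
print(X.shape, Y.shape, Z.shape)

fig, ax = plt.subplots()
filled = ax.contourf(X, Y, Z, levels=20, cmap='RdGy')  # 20 levels is an arbitrary choice
fig.colorbar(filled, ax=ax)                            # key for the filled regions
```

Note that rows of the grids follow y and columns follow x, so all three arrays here have 40 rows and 50 columns.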
Histograms
A simple histogram can be a great first step in understanding a dataset. Earlier, we saw a
preview of Matplotlib's histogram function (see Comparisons, Masks, and Boolean Logic),
which creates a basic histogram in one line, once the normal boiler-plate imports are done:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
data = np.random.randn(1000)
plt.hist(data);
The hist() function has many options to tune both the calculation and the display.
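A few of the hist() options for tuning the calculation and the display can be sketched as follows. The specific values (30 bins, the color, the transparency) are arbitrary choices for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(42)
data = rng.randn(1000)

fig, ax = plt.subplots()
# bins and density control the calculation; the remaining keywords control the display.
counts, edges, patches = ax.hist(data, bins=30, density=True, alpha=0.5,
                                 histtype='stepfilled', color='steelblue',
                                 edgecolor='none')

# To get bin counts without drawing anything, NumPy offers np.histogram.
raw_counts, raw_edges = np.histogram(data, bins=5)
print(raw_counts)
```

With density=True the bar heights are scaled so that the histogram integrates to one, which makes histograms of differently sized samples comparable.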
Customizing Plot Legends
Plot legends give meaning to a visualization by labeling its various plot elements.
We previously saw how to create a simple legend; here we'll take a look at customizing the
placement and aesthetics of the legend in Matplotlib.
The simplest legend can be created with the plt.legend() command, which automatically
creates a legend for any labeled plot elements:
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.axis('equal')
leg = ax.legend();
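As a sketch of the kind of customization this section has in mind, the legend call accepts keywords for placement and appearance. The particular choices below (position, two columns, no frame) are only examples:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')

# loc sets the position, ncol the number of columns, frameon toggles the box.
leg = ax.legend(loc='lower center', ncol=2, frameon=False)
```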
Customizing Colorbars
In Matplotlib, a colorbar is a separate axes that can provide a key for the meaning of colors in a plot.
Because the book is printed in black-and-white, this section has an accompanying online supplement
where you can view the figures in full color. The simplest colorbar can be created with the
plt.colorbar function:
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np
x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])
plt.imshow(I)
plt.colorbar();
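A few common colorbar customizations can be sketched as follows. The colormap, the color limits, and the label text are arbitrary choices for illustration, and a smaller grid is used to keep the example light:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
I = np.sin(x) * np.cos(x[:, np.newaxis])

fig, ax = plt.subplots()
im = ax.imshow(I, cmap='gray')   # choose the colormap with the cmap argument
im.set_clim(-0.5, 0.5)           # arbitrary color limits, for illustration only
cbar = fig.colorbar(im, ax=ax, extend='both')  # arrows mark out-of-range values
cbar.set_label('value of I')
```

Narrowing the color limits spends the full color range on the values of interest, while extend='both' signals that some data fall outside those limits.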
Multiple Subplots
Sometimes it is helpful to compare different views of data side by side. To this end, Matplotlib has the
concept of subplots: groups of smaller axes that can exist together within a single figure. These subplots
might be insets, grids of plots, or other more complicated layouts. In this section we'll explore four
routines for creating subplots in Matplotlib.
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
import numpy as np
ax1 = plt.axes()  # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])  # inset axes: [left, bottom, width, height] in figure coordinates
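For regular grids of plots, plt.subplots creates the whole grid in one call. The following is a minimal sketch; the 2 by 3 layout and the text labels are arbitrary:

```python
import matplotlib.pyplot as plt

# 2 rows x 3 columns; share the x-axis within columns and the y-axis within rows.
fig, axes = plt.subplots(2, 3, sharex='col', sharey='row')

# axes is a 2-D NumPy array indexed as [row, col].
for i in range(2):
    for j in range(3):
        axes[i, j].text(0.5, 0.5, str((i, j)), fontsize=18, ha='center')
```

Sharing axes removes the redundant inner tick labels, which usually makes a grid easier to read.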
Text and Annotation
Creating a good visualization involves guiding the reader so that the figure tells a story. In
some cases, this story can be told in an entirely visual manner, without the need for added
text, but in others, small textual cues and labels are necessary. Perhaps the most basic types
of annotations you will use are axes labels and titles, but the options go beyond this. Let's take
a look at some data and how we might visualize and annotate it to help convey interesting
information. We'll start by setting up the notebook for plotting and importing the functions we
will use:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.style.use('seaborn-whitegrid')
import numpy as np
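import pandas as pd
from datetime import datetime
births = pd.read_csv('data/births.csv')
# Remove outliers: 0.74 * IQR is a robust estimate of the standard deviation
quartiles = np.percentile(births['births'], [25, 50, 75])
mu, sig = quartiles[1], 0.74 * (quartiles[2] - quartiles[0])
births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)').copy()
# Build a datetime index from the year, month, and day columns
births['day'] = births['day'].astype(int)
births.index = pd.to_datetime(10000 * births.year +
                              100 * births.month +
                              births.day, format='%Y%m%d')
# Average births for each (month, day) pair
births_by_date = births.pivot_table('births',
                                    [births.index.month, births.index.day])
# Use a leap year (2012) as a dummy year so that February 29 is valid;
# pd.datetime was removed from pandas, so the standard-library datetime is used
births_by_date.index = [datetime(2012, month, day)
                        for (month, day) in births_by_date.index]
fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax);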
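To show the annotation calls themselves without depending on the births file, here is a self-contained sketch on a sine curve. The curve, the label strings, and the text positions are all made up for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set(title='Annotating a curve', xlabel='x', ylabel='sin(x)', ylim=(-1.5, 2))

# ax.text places a string at the given data coordinates.
ax.text(np.pi, 0.1, 'zero crossing', ha='center')

# ax.annotate adds text plus an arrow pointing at the xy location.
note = ax.annotate('local maximum', xy=(np.pi / 2, 1), xytext=(3, 1.6),
                   arrowprops=dict(facecolor='black', shrink=0.05))
```

The same two calls apply to the births plot: ax.text for a plain label at a date, and ax.annotate when an arrow is needed to tie the label to a feature.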
Three-Dimensional Plotting in Matplotlib
Matplotlib was initially designed with only two-dimensional plotting in mind. Around the time of the 1.0
release, some three-dimensional plotting utilities were built on top of Matplotlib's two-dimensional
display, and the result is a convenient (if somewhat limited) set of tools for three-dimensional data
visualization. Three-dimensional plots are enabled by importing the mplot3d toolkit, included with the
main Matplotlib installation:
from mpl_toolkits import mplot3d
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')
With this three-dimensional axes enabled, we can now plot a variety of three-dimensional plot
types. Three-dimensional plotting is one of the functionalities that benefits immensely from
viewing figures interactively rather than statically in the notebook; recall that to use interactive
figures, you can use %matplotlib notebook rather than %matplotlib inline when running this
code.
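As a small sketch of two such plot types, the following draws a spiral line and a set of scattered points on a three-dimensional axes. The data are synthetic and chosen only for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d  # registers the '3d' projection on older Matplotlib versions

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# A spiral line: x and y trace a circle while z increases.
zline = np.linspace(0, 15, 1000)
ax.plot3D(np.sin(zline), np.cos(zline), zline, color='gray')

# Scattered points near the line; the noise is synthetic.
rng = np.random.RandomState(0)
zdata = 15 * rng.rand(100)
xdata = np.sin(zdata) + 0.1 * rng.randn(100)
ydata = np.cos(zdata) + 0.1 * rng.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')
```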
Geographic Data with Basemap
One common type of visualization in data science is that of geographic data. Matplotlib's main
tool for this type of visualization is the Basemap toolkit, which is one of several Matplotlib
toolkits that live under the mpl_toolkits namespace. Admittedly, Basemap feels a bit
clunky to use, and often even simple visualizations take much longer to render than you might
hope. More modern solutions such as Leaflet or the Google Maps API may be a better choice
for more intensive map visualizations. Still, Basemap is a useful tool for Python users to have
in their virtual toolbelts. In this section, we'll show several examples of the type of map
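visualization that is possible with this toolkit.
Basemap must be installed separately; with conda, this can be done as follows:
$ conda install basemap
With Basemap installed, we add its import to the standard setup: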
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);
Visualization with Seaborn
Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color
defaults, defines simple high-level functions for common statistical plot types, and integrates
with the functionality provided by Pandas DataFrames.
To be fair, the Matplotlib team has been addressing these gaps: it added the plt.style tools
discussed in Customizing Matplotlib: Configurations and Style Sheets, and has been improving
its handling of Pandas data. Since the 2.0 release, the library has also shipped a new default
stylesheet that improves on the earlier defaults. Even so, for the reasons just listed,
Seaborn remains an extremely useful addon.
The main idea of Seaborn is that it provides high-level commands to create a variety of plot
types useful for statistical data exploration, and even some statistical model fitting.
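import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set()  # switch to Seaborn's default plot style
# Six random walks to plot (illustrative data; this section does not define x and y itself)
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
y = np.cumsum(rng.randn(500, 6), 0)
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');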