SCHOOL OF SOFTWARE ENGINEERING OF USTC
UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA
Matplotlib
A tutorial
Devert Alexandre
School of Software Engineering of USTC
30 November 2012
Table of Contents
First steps
Curve plots
Scatter plots
Boxplots
Histograms
Usage example
Curve plot
Let's plot a curve
    import math
    import matplotlib.pyplot as plt

    # Generate a sinusoid
    nbSamples = 256
    xRange = (-math.pi, math.pi)
    x, y = [], []
    for n in range(nbSamples):
        k = (n + 0.5) / nbSamples
        x.append(xRange[0] + (xRange[1] - xRange[0]) * k)
        y.append(math.sin(x[-1]))

    # Plot the sinusoid
    plt.plot(x, y)
    plt.show()
This will show you something like this
[Figure: the sine curve, with the y axis running from -1.0 to 1.0]
numpy
matplotlib can work with numpy arrays
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Generate a sinusoid
    nbSamples = 256
    xRange = (-math.pi, math.pi)
    x, y = numpy.zeros(nbSamples), numpy.zeros(nbSamples)
    for n in range(nbSamples):
        k = (n + 0.5) / nbSamples
        x[n] = xRange[0] + (xRange[1] - xRange[0]) * k
        y[n] = math.sin(x[n])

    # Plot the sinusoid
    plt.plot(x, y)
    plt.show()
numpy provides many functions and is efficient
numpy
- zeros builds an array filled with 0
- linspace builds an array filled with an arithmetic sequence
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Generate a sinusoid
    nbSamples = 256
    x = numpy.linspace(-math.pi, math.pi, num=nbSamples)
    y = numpy.zeros(nbSamples)
    for n in range(nbSamples):
        y[n] = math.sin(x[n])

    # Plot the sinusoid
    plt.plot(x, y)
    plt.show()
numpy
numpy functions can work on entire arrays
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Generate a sinusoid
    x = numpy.linspace(-math.pi, math.pi, num=256)
    y = numpy.sin(x)

    # Plot the sinusoid
    plt.plot(x, y)
    plt.show()
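As a quick check that the vectorized call gives the same values as an explicit loop over the samples, here is a minimal sketch. The names y_loop, y_vec and same are introduced only for this illustration, and numpy.allclose is used only for the comparison.

```python
import math
import numpy

nbSamples = 256
x = numpy.linspace(-math.pi, math.pi, num=nbSamples)

# Explicit loop: one math.sin call per sample
y_loop = numpy.zeros(nbSamples)
for n in range(nbSamples):
    y_loop[n] = math.sin(x[n])

# Vectorized call: numpy.sin applied to the whole array at once
y_vec = numpy.sin(x)

# Compare the two results element by element, within floating-point tolerance
same = numpy.allclose(y_loop, y_vec)
print(same)
```

The vectorized form is shorter and avoids the Python-level loop, which is usually where the speed difference comes from.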
PDF output
Exporting to a PDF file takes just one change: call savefig instead of show
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Generate a sinusoid
    x = numpy.linspace(-math.pi, math.pi, num=256)
    y = numpy.sin(x)

    # Plot the sinusoid and write it to a file
    plt.plot(x, y)
    plt.savefig('sinplot.pdf', transparent=True)
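savefig picks the output format from the file extension, so the same figure can also be written as a PNG image. The sketch below is one way to try this without opening a window; the Agg backend, the temporary directory and the dpi value are choices made for this illustration, not part of the original slides.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is needed
import matplotlib.pyplot as plt
import numpy

# Generate and plot a sinusoid
x = numpy.linspace(-numpy.pi, numpy.pi, num=256)
fig = plt.figure()
axis = fig.add_subplot(111)
axis.plot(x, numpy.sin(x))

# Write the same figure in two formats; the extension selects the format
out_dir = tempfile.mkdtemp()
pdf_path = os.path.join(out_dir, "sinplot.pdf")
png_path = os.path.join(out_dir, "sinplot.png")
fig.savefig(pdf_path, transparent=True)
fig.savefig(png_path, dpi=150)
plt.close(fig)
```

A vector format such as PDF usually suits printed documents, while PNG is convenient for web pages.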
Multiple curves
It's often convenient to show several curves in one figure
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Generate a sine curve and a cosine curve
    x = numpy.linspace(-math.pi, math.pi, num=256)
    y = numpy.sin(x)
    z = numpy.cos(x)

    # Plot the curves
    plt.plot(x, y)
    plt.plot(x, z)
    plt.show()
[Figure: the sine and cosine curves drawn on the same axes]
Custom colors
Changing colors can help to make nice documents
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Generate a sine curve and a cosine curve
    x = numpy.linspace(-math.pi, math.pi, num=256)
    y = numpy.sin(x)
    z = numpy.cos(x)

    # Plot the curves, each with its own color
    plt.plot(x, y, c='#FF4500')
    plt.plot(x, z, c='#4682B4')
    plt.show()
[Figure: the sine curve in orange-red and the cosine curve in steel blue]
Line thickness
Line thickness can be changed as well
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Generate a sine curve and a cosine curve
    x = numpy.linspace(-math.pi, math.pi, num=256)
    y = numpy.sin(x)
    z = numpy.cos(x)

    # Plot the curves, the first one with a thicker line
    plt.plot(x, y, linewidth=3, c='#FF4500')
    plt.plot(x, z, c='#4682B4')
    plt.show()
[Figure: the same two curves, with the sine curve drawn thicker]
Line patterns
For printed documents, colors can be replaced by line patterns
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Line styles can be '-', '--', '-.', ':'

    # Generate a sine curve and a cosine curve
    x = numpy.linspace(-math.pi, math.pi, num=256)
    y = numpy.sin(x)
    z = numpy.cos(x)

    # Plot the curves
    # (the line style value was lost in the source; '--' is assumed here)
    plt.plot(x, y, linestyle='--', c='#000000')
    plt.plot(x, z, c='#808080')
    plt.show()
[Figure: the sine curve as a black patterned line and the cosine curve as a solid gray line]
Markers
It is sometimes relevant to show the data points
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Markers can be '.', ',', 'o', '1' and more

    # Generate a sine curve and a cosine curve
    x = numpy.linspace(-math.pi, math.pi, num=64)
    y = numpy.sin(x)
    z = numpy.cos(x)

    # Plot the curves, the first one with a marker on each data point
    plt.plot(x, y, marker='1', markersize=15, c='#000000')
    plt.plot(x, z, c='#000000')
    plt.show()
[Figure: the sine curve with a marker at each data point and the cosine curve as a plain line]
Legend
A legend helps to make figures self-explanatory
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Legend location can be 'best', 'center', 'center left', 'right', etc.

    # Generate a sine curve and a cosine curve
    x = numpy.linspace(-math.pi, math.pi, num=256)
    y = numpy.sin(x)
    z = numpy.cos(x)

    # Plot the curves; the labels are what the legend displays
    plt.plot(x, y, c='#FF4500', label='sin(x)')
    plt.plot(x, z, c='#4682B4', label='cos(x)')
    plt.legend(loc='best')
    plt.show()
[Figure: the two curves with a legend labelling sin(x) and cos(x)]
Custom axis scale
Changing the axis scale can improve readability
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Generate a sine curve and a cosine curve
    x = numpy.linspace(-math.pi, math.pi, num=256)
    y = numpy.sin(x)
    z = numpy.cos(x)

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111)
    axis.set_ylim(-0.5 * math.pi, 0.5 * math.pi)

    # Plot the curves
    plt.plot(x, y, c='#FF4500', label='sin(x)')
    plt.plot(x, z, c='#4682B4', label='cos(x)')
    plt.legend(loc='best')
    plt.show()
[Figure: the same plot with the y axis extended to about -1.5 to 1.5]
Grid
The same goes for a grid, which can be helpful
    import math
    import numpy
    import matplotlib.pyplot as plt

    # Generate a sine curve and a cosine curve
    x = numpy.linspace(-math.pi, math.pi, num=256)
    y = numpy.sin(x)
    z = numpy.cos(x)

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111)
    axis.set_ylim(-0.5 * math.pi, 0.5 * math.pi)
    axis.grid(True)

    # Plot the curves
    plt.plot(x, y, c='#FF4500', label='sin(x)')
    plt.plot(x, z, c='#4682B4', label='cos(x)')
    plt.legend(loc='best')
    plt.show()
[Figure: the same plot with a grid]
Error bars
Your data might come with a known measurement error
    import math
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate a noisy sinusoid
    x = numpy.linspace(-math.pi, math.pi, num=48)
    y = numpy.sin(x + 0.05 * numpy.random.standard_normal(len(x)))
    # Error magnitudes must not be negative, hence the absolute value
    y_error = numpy.abs(0.1 * numpy.random.standard_normal(len(x)))

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111)
    axis.set_ylim(-0.5 * math.pi, 0.5 * math.pi)

    # Plot the curve and its error bars
    plt.plot(x, y, c='#FF4500')
    plt.errorbar(x, y, yerr=y_error)
    plt.show()
[Figure: the noisy sinusoid with a vertical error bar at each point]
Scatter plot
A scatter plot just shows one point for each dataset entry
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate a 2d normal distribution
    nbPoints = 100
    x = numpy.random.standard_normal(nbPoints)
    y = numpy.random.standard_normal(nbPoints)

    # Plot the points
    plt.scatter(x, y)
    plt.show()
[Figure: scatter plot of the 100 random points]
Aspect ratio
It can be very important to have the same scale on both axes
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate a 2d normal distribution
    nbPoints = 100
    x = numpy.random.standard_normal(nbPoints)
    y = 0.1 * numpy.random.standard_normal(nbPoints)

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111, aspect='equal')

    # Plot the points
    plt.scatter(x, y, c='#FF4500')
    plt.show()
[Figure: scatter plot drawn with the same scale on both axes, giving a wide, flat cloud of points]
Aspect ratio
An alternative way to keep the same scale on both axes
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate a 2d normal distribution
    nbPoints = 100
    x = numpy.random.standard_normal(nbPoints)
    y = 0.1 * numpy.random.standard_normal(nbPoints)

    # Axis setup: same limits on both axes, with a 5% margin on each side
    fig = plt.figure()
    axis = fig.add_subplot(111)
    cmin, cmax = min(min(x), min(y)), max(max(x), max(y))
    margin = 0.05 * (cmax - cmin)
    cmin -= margin
    cmax += margin
    axis.set_xlim(cmin, cmax)
    axis.set_ylim(cmin, cmax)

    # Plot the points
    plt.scatter(x, y, c='#FF4500')
    plt.show()
[Figure: the same kind of scatter plot with identical limits on both axes]
Multiple scatter plots
As with curves, you can show two datasets on one figure
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    colors = ('#FF4500', '#3CB371', '#4682B4', '#DB7093', '#FFD700')

    # Generate two 2d normal distributions
    nbPoints = 100
    x, y = [], []
    x += [numpy.random.standard_normal(nbPoints)]
    y += [0.25 * numpy.random.standard_normal(nbPoints)]
    x += [0.5 * numpy.random.standard_normal(nbPoints) + 3.0]
    y += [2 * numpy.random.standard_normal(nbPoints) + 2.0]

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111, aspect='equal')

    # Plot the points, one color per dataset
    for i in range(len(x)):
        plt.scatter(x[i], y[i], c=colors[i % len(colors)])
    plt.show()
[Figure: two point clouds, each in its own color]
Showing centers
It can help to see the centers or the median points
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    colors = ('#FF4500', '#3CB371', '#4682B4', '#DB7093', '#FFD700')

    # Generate two 2d normal distributions
    nbPoints = 100
    x, y = [], []
    x += [numpy.random.standard_normal(nbPoints)]
    y += [0.25 * numpy.random.standard_normal(nbPoints)]
    x += [0.5 * numpy.random.standard_normal(nbPoints) + 3.0]
    y += [2 * numpy.random.standard_normal(nbPoints) + 2.0]

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111, aspect='equal')

    # Plot the points, then a large dot at the median point of each dataset
    for i in range(len(x)):
        col = colors[i % len(colors)]
        plt.scatter(x[i], y[i], c=col)
        plt.scatter([numpy.median(x[i])],
                    [numpy.median(y[i])], c=col, s=250)
    plt.show()
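The code above marks each cloud with its median point. As a minimal sketch of why the median can be a more representative marker than the mean when a dataset contains outliers, consider the made-up values below (they are not taken from the slides):

```python
import numpy

# A small cluster plus one far outlier (made-up values for illustration)
x = numpy.array([1.0, 2.0, 3.0, 4.0, 100.0])

# The mean is pulled toward the outlier, the median stays inside the cluster
center_mean = numpy.mean(x)
center_median = numpy.median(x)
print(center_mean, center_median)
```

To mark the mean instead, numpy.mean can be substituted for numpy.median in the scatter call.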
[Figure: the two point clouds, each with a large dot at its median point]
Marker styles
You can use different marker styles
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # The second marker was lost in the source; '*' is assumed here
    markers = ('+', '*', '.')

    # Generate two 2d normal distributions
    nbPoints = 100
    x, y = [], []
    x += [numpy.random.standard_normal(nbPoints)]
    y += [0.25 * numpy.random.standard_normal(nbPoints)]
    x += [0.5 * numpy.random.standard_normal(nbPoints) + 3.0]
    y += [2 * numpy.random.standard_normal(nbPoints) + 2.0]

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111, aspect='equal')

    # Plot the points, one marker style per dataset
    for i in range(len(x)):
        m = markers[i % len(markers)]
        plt.scatter(x[i], y[i], marker=m, c='#000000')
        plt.scatter([numpy.median(x[i])], [numpy.median(y[i])],
                    marker=m, s=250, c='#000000')
    plt.show()
[Figure: the two point clouds in black, distinguished by marker style]
Boxplots
Let's draw a simple boxplot
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate normal distribution data
    x = numpy.random.standard_normal(256)

    # Show a boxplot of the data
    plt.boxplot(x)
    plt.show()
[Figure: a single boxplot of the data]
Boxplots
You might want to show the original data at the same time
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate normal distribution data
    x = numpy.random.standard_normal(256)

    # Show the data points and a boxplot of the data at the same position
    plt.scatter([0] * len(x), x, c='#4682B4')
    plt.boxplot(x, positions=[0])
    plt.show()
[Figure: the boxplot together with the original data points]
Multiple boxplots
Boxplots are often used to show several distributions side by side
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate normal distribution data
    data = []
    for i in range(5):
        mu = 10 * numpy.random.random_sample()
        sigma = 2 * numpy.random.random_sample() + 0.1
        data.append(numpy.random.normal(mu, sigma, 256))

    # Show a boxplot of the data
    plt.boxplot(data)
    plt.show()
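To read these plots, it helps to know what a box stands for. With matplotlib's default settings, each box spans the first to the third quartile with a line at the median, and the whiskers reach at most 1.5 times the interquartile range beyond the box. A minimal sketch computing those quantities with numpy.percentile, on a made-up sample:

```python
import numpy

# Made-up sample: the integers 1 to 9
data = numpy.arange(1, 10)

# First quartile, median and third quartile: the three lines of the box
q1, median, q3 = numpy.percentile(data, [25, 50, 75])

# Interquartile range, which sets the maximum whisker length
iqr = q3 - q1
print(q1, median, q3, iqr)
```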
[Figure: five boxplots side by side]
Orientation
Changing the orientation is easily done
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate normal distribution data
    data = []
    for i in range(5):
        mu = 10 * numpy.random.random_sample()
        sigma = 2 * numpy.random.random_sample() + 0.1
        data.append(numpy.random.normal(mu, sigma, 256))

    # Show a horizontal boxplot of the data
    plt.boxplot(data, vert=False)
    plt.show()
[Figure: the five boxplots drawn horizontally]
Legend
Good graphics have a proper title and labels
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Generate normal distribution data
    labels = ['mercury', 'lead', 'lithium', 'tungsten', 'cadmium']
    data = []
    for i in range(len(labels)):
        mu = 10 * numpy.random.random_sample() + 100
        sigma = 2 * numpy.random.random_sample() + 0.1
        data.append(numpy.random.normal(mu, sigma, 256))

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111)
    axis.set_title('Alien nodules composition')
    axis.set_ylabel('concentration (ppm)')

    # Show a boxplot of the data
    plt.boxplot(data)
    # Name the boxes after boxplot(), which otherwise numbers them
    axis.set_xticklabels(labels)
    plt.show()
[Figure: five boxplots titled "Alien nodules composition"; y axis: concentration (ppm); x ticks labelled with the five element names]
Histograms
Histograms are convenient to sum up results
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Some data
    data = numpy.abs(numpy.random.standard_normal(30))

    # Show a histogram
    plt.barh(range(len(data)), data, color='#4682B4')
    plt.show()
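The code above draws one bar per data item. If the goal is instead to summarize how the values are distributed, one possible sketch is to bin them first with numpy.histogram and draw one bar per bin. The bin count of 8, the fixed random seed and the Agg backend are arbitrary choices made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
import numpy

# Some data, with a fixed seed so the result is reproducible
rng = numpy.random.RandomState(0)
data = numpy.abs(rng.standard_normal(30))

# Count how many values fall into each of 8 equal-width bins
counts, edges = numpy.histogram(data, bins=8)

# Draw one horizontal bar per bin, as tall as the bin is wide
fig = plt.figure()
axis = fig.add_subplot(111)
axis.barh(edges[:-1], counts, height=numpy.diff(edges),
          align='edge', color='#4682B4')
axis.set_xlabel('count')
axis.set_ylabel('value')
```

Calling fig.savefig with a file name, or plt.show() with an interactive backend, displays the result.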
[Figure: horizontal bar chart with one bar per data item]
Histograms
A variant shows two quantities per item on one figure
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Some data
    data = [[], []]
    for i in range(len(data)):
        data[i] = numpy.abs(numpy.random.standard_normal(30))

    # Show a histogram: first quantity to the right, second to the left
    labels = range(len(data[0]))
    plt.barh(labels, data[0], color='#4682B4')
    plt.barh(labels, -1 * data[1], color='#FF4500')
    plt.show()
[Figure: horizontal bars extending to the right for the first quantity and to the left for the second]
Labels
Very often, we need to name the items on a histogram
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Some data
    names = ['Wang Bu', 'Cheng Cao', 'Zhang Xue Li', 'Tang Wei', 'Sun Wu Kong']
    marks = 7 * numpy.random.random_sample(len(names)) + 3

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111)
    axis.set_xlim(0, 10)
    # Recent matplotlib versions center the bars on their positions by default
    plt.yticks(numpy.arange(len(marks)), names)
    axis.set_title('Data-mining marks')
    axis.set_xlabel('mark')

    # Show a histogram
    plt.barh(range(len(marks)), marks, color='#4682B4')
    plt.show()
[Figure: horizontal bar chart titled "Data-mining marks", one named bar per student, x axis labelled "mark" and running from 0 to 10]
Error bars
Error bars indicate the accuracy of the values
    import numpy
    import numpy.random
    import matplotlib.pyplot as plt

    # Some data
    names = ['6809', '6502', '8086', 'Z80', 'RCA1802']
    speed = 70 * numpy.random.random_sample(len(names)) + 30
    error = 9 * numpy.random.random_sample(len(names)) + 1

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111)
    # Recent matplotlib versions center the bars on their positions by default
    plt.yticks(numpy.arange(len(names)), names)
    axis.set_title('8 bits CPU benchmark - string test')
    axis.set_xlabel('score')

    # Show a histogram with horizontal error bars
    plt.barh(range(len(names)), speed, xerr=error, color='#4682B4')
    plt.show()
[Figure: horizontal bar chart titled "8 bits CPU benchmark - string test", one bar with an error bar per CPU, x axis labelled "score"]
Old Faithful
Let's display some real data: the Old Faithful geyser
Old Faithful
This works, but it is a good example of a half-done job
    import numpy
    import matplotlib.pyplot as plt

    # Read the data
    data = numpy.loadtxt('./datasets/geyser.dat')

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111, aspect='equal')

    # Plot the points
    plt.scatter(data[:, 0], data[:, 1], c='#FF4500')
    plt.show()
[Figure: scatter plot of the geyser data, without title or axis labels]
Old Faithful
Let's make a more readable figure
    import numpy
    import matplotlib.pyplot as plt

    # Read the data
    data = numpy.loadtxt('./datasets/geyser.dat')

    # Axis setup: a title and a label on each axis
    fig = plt.figure()
    axis = fig.add_subplot(111)
    axis.set_title('Old Faithful geyser dataset')
    axis.set_xlabel('waiting time (min)')
    axis.set_ylabel('eruption duration (min)')

    # Plot the points
    plt.scatter(data[:, 0], data[:, 1], c='#FF4500')
    plt.show()
[Figure: scatter plot titled "Old Faithful geyser dataset"; x axis: waiting time, y axis: eruption duration]
Mercury & fishes
Let's display more complex data: fish and mercury poisoning
Mercury & fishes
A first try
    import numpy
    import matplotlib.pyplot as plt

    # Read the data
    data = numpy.loadtxt('./datasets/fish.dat')

    # Axis setup
    fig = plt.figure()
    axis = fig.add_subplot(111)
    axis.set_title('North Carolina fishes mercury concentration')
    axis.set_xlabel('weight (g.)')
    axis.set_ylabel('mercury concentration (ppm)')

    # Plot the points
    plt.scatter(data[:, 3], data[:, 4], c='#FF4500')
    plt.show()
[Figure: scatter plot titled "North Carolina fishes mercury concentration"; x axis: weight (g.), y axis: mercury concentration (ppm)]
Mercury & fishes
Show more information with subplots
[Figure: "North Carolina fishes mercury concentration" shown as three scatter subplots: weight (g.) against length (cm.), mercury concentration (ppm) against weight (g.), and mercury concentration (ppm) against length (cm.)]
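The slides do not include the code for the subplot figure. The following is only a sketch of how such a figure could be built with add_subplot. It uses synthetic data, because the column layout of the fish dataset is not fully described here; the variable names, the generated values and the Agg backend are all assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
import numpy

# Synthetic stand-in for the fish dataset (hypothetical values)
rng = numpy.random.RandomState(0)
length = rng.uniform(25.0, 65.0, 100)
weight = 0.02 * length ** 3 * rng.uniform(0.8, 1.2, 100)
mercury = numpy.abs(0.03 * length + 0.3 * rng.standard_normal(100))

# One entry per subplot: x data, y data, x label, y label
panels = [
    (length, weight, 'length (cm.)', 'weight (g.)'),
    (weight, mercury, 'weight (g.)', 'mercury concentration (ppm)'),
    (length, mercury, 'length (cm.)', 'mercury concentration (ppm)'),
]

fig = plt.figure(figsize=(6, 9))
fig.suptitle('Synthetic fish data')
for i, (px, py, xlabel, ylabel) in enumerate(panels):
    # add_subplot(rows, columns, index): 3 rows, 1 column, index from 1
    axis = fig.add_subplot(3, 1, i + 1)
    axis.scatter(px, py, c='#FF4500')
    axis.set_xlabel(xlabel)
    axis.set_ylabel(ylabel)
fig.subplots_adjust(hspace=0.5)
```

The same three-digit form used earlier in the slides, add_subplot(111), is shorthand for one row, one column, first subplot.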
Mercury & fishes
Data from each river is drawn in its own color
[Figure: the same three scatter subplots of "North Carolina fishes mercury concentration", with the points colored by river]
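The slides do not show how the per-river coloring is done. One possible sketch is to scatter each group separately, reusing the color-cycling idea from the multiple scatter plots. The river identifiers, the data values and the Agg backend below are hypothetical and serve only this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
import numpy

colors = ('#FF4500', '#3CB371', '#4682B4')

# Synthetic stand-in data: 90 fish spread over 3 hypothetical rivers
rng = numpy.random.RandomState(0)
river = numpy.arange(90) % 3
weight = rng.uniform(200.0, 4000.0, 90)
mercury = numpy.abs(0.5 + 0.4 * river + 0.3 * rng.standard_normal(90))

fig = plt.figure()
axis = fig.add_subplot(111)
axis.set_xlabel('weight (g.)')
axis.set_ylabel('mercury concentration (ppm)')

# One scatter call per river, each with its own color and legend entry
for r in numpy.unique(river):
    mask = river == r
    axis.scatter(weight[mask], mercury[mask],
                 c=colors[r % len(colors)], label='river %d' % r)
axis.legend(loc='best')
```

A boolean mask such as river == r selects the rows of one group, so the same loop works for any number of groups.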