Data Visualization in Python
Marc Garcia - @datapythonista
Data Visualisation Summit - London, 2017
Data Visualization in Python - @datapythonista
About me
http://datapythonista.github.io
Python for data science
Why Python?
Python is the favorite of many:
Fast to write: Batteries included
Easy to read: Readability is KEY
Excellent community: Conferences, local groups, Stack Overflow...
Ubiquitous: Present in all major platforms
Easy to integrate: Implements main protocols and formats
Easy to extend: C extensions for low-level operations
Python performance
Is Python fast for data science?
Short answer: No
Long answer: Yes
numpy
Cython
C extensions
Numba
etc.
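As a rough illustration of the "long answer", the sketch below computes the same sum of squares twice: once with an interpreted Python loop and once with a vectorised numpy call. The function names and the array size are made up for this example, and the timings it prints depend on the machine.

```python
import timeit

import numpy


def sum_of_squares_loop(values):
    """Pure-Python loop: every iteration runs in the interpreter."""
    total = 0.0
    for value in values:
        total += value * value
    return total


def sum_of_squares_numpy(values):
    """Vectorised version: the loop runs inside numpy's compiled code."""
    return float(numpy.dot(values, values))


data = numpy.random.rand(100_000)

# Time five repetitions of each implementation on the same data
loop_seconds = timeit.timeit(lambda: sum_of_squares_loop(data), number=5)
numpy_seconds = timeit.timeit(lambda: sum_of_squares_numpy(data), number=5)
print(f'loop: {loop_seconds:.4f}s  numpy: {numpy_seconds:.4f}s')
```

Cython, Numba and hand-written C extensions follow the same idea: the hot loop leaves the interpreter while the calling code stays in Python.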
Python is great for data science
A whole ecosystem exists:
numpy
scipy
pandas
statsmodels
scikit-learn
etc.
Python environment
One ring to rule them all:
Python platform
Jupyter notebook
Python for visualization
Main libraries:
Matplotlib
Seaborn
Bokeh
HoloViews
Datashader
Domain-specific
Folium: maps
yt: volumetric data
Visualization tools
Matplotlib
First Python visualization tool
Still a de-facto standard
Replicates the MATLAB API
Supports many backends
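A minimal sketch of what backend support means in practice, assuming a machine without a display: the non-interactive Agg backend is selected explicitly and one figure is written out as both PNG and SVG. The data and the buffer names are invented for the example.

```python
import io

import matplotlib

matplotlib.use('Agg')  # select a non-interactive backend before importing pyplot
from matplotlib import pyplot

figure, axes = pyplot.subplots()
axes.plot([0, 1, 2, 3], [0, 1, 4, 9])
axes.set_title('Rendered without a display')

# The same figure can be written out in different formats
png_buffer = io.BytesIO()
figure.savefig(png_buffer, format='png')
svg_buffer = io.BytesIO()
figure.savefig(svg_buffer, format='svg')
pyplot.close(figure)

print(matplotlib.get_backend())
```

In a Jupyter notebook or a desktop session an interactive backend is normally picked automatically, so the explicit `matplotlib.use` call is only needed when the default is not the one wanted.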
# Plot a noisy linear signal
import numpy
from matplotlib import pyplot

x = numpy.linspace(0., 100., 1001)
y = x + numpy.random.randn(1001) * 5  # linear trend plus Gaussian noise
pyplot.plot(x, y)
pyplot.xlabel('time (seconds)')
pyplot.ylabel('some noisy signal')
pyplot.title('A simple plot in matplotlib')
# Same kind of plot, now with a second signal and a legend
import numpy
from matplotlib import pyplot

x = numpy.linspace(0., 100., 1001)
y1 = x + numpy.random.randn(1001) * 3
y2 = 45 + x * .4 + numpy.random.randn(1001) * 7  # flatter trend, more noise
pyplot.plot(x, y1, label='Our previous signal')
pyplot.plot(x, y2, color='orange', label='A new signal')
pyplot.xlabel('time (seconds)')
pyplot.ylabel('some noisy signal')
pyplot.title('A simple plot in matplotlib')
pyplot.legend()  # uses the label= values given to plot()
Seaborn
Matplotlib wrapper
Built-in themes
Higher level plots:
Heatmap
Violin plot
Pair plot
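# Heatmap of airline passengers, one row per month and one column per year
from matplotlib import pyplot
import seaborn

flights_flat = seaborn.load_dataset('flights')  # downloads the example dataset
# pivot() needs keyword arguments; positional ones fail on pandas 2.0 and later
flights = flights_flat.pivot(index='month', columns='year', values='passengers')
seaborn.heatmap(flights, annot=True, fmt='d')
pyplot.title('Number of flight passengers (thousands)')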
Bokeh
Client-server architecture: JavaScript front-end
Interactive
Drawing shapes to generate plots
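The talk showed Bokeh in a live demo, so the following is only a small stand-in sketch of "drawing shapes": a line glyph and marker glyphs are added to a figure, which is then serialised to a standalone HTML page that the JavaScript front-end renders in the browser. The data, labels and title are made up, and no Bokeh server is involved here.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# Interactive tools (pan, zoom, reset) are handled in the browser
plot = figure(title='Shapes drawn with Bokeh', tools='pan,wheel_zoom,reset')
plot.line(x, y, legend_label='line glyph')
plot.scatter(x, y, size=10, color='orange', legend_label='marker glyph')

# Serialise the plot to a self-contained HTML document
html = file_html(plot, CDN, 'Bokeh sketch')
```

Writing `html` to a file and opening it in a browser shows the interactive plot. In a Jupyter notebook, `bokeh.io.output_notebook()` followed by `bokeh.io.show(plot)` displays it inline instead.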
Demo
HoloViews
Bokeh wrapper
Higher level plots
Mainly for Bokeh, but other backends supported
# Choropleth of unemployment rates in Texas counties.
# Written against the 2017 HoloViews API; newer releases style plots with .opts().
import holoviews as hv
from bokeh.sampledata.us_counties import data as counties
from bokeh.sampledata.unemployment import data as unemployment

hv.extension('bokeh')

# Keep only the Texas counties
counties = {code: county for code, county in counties.items()
            if county['state'] == 'tx'}
county_xs = [county['lons'] for county in counties.values()]
county_ys = [county['lats'] for county in counties.values()]
county_names = [county['name'] for county in counties.values()]
county_rates = [unemployment[county_id] for county_id in counties]

# One polygon per county, coloured by its unemployment rate
county_polys = {name: hv.Polygons((xs, ys), level=rate, vdims=['Unemployment'])
                for name, xs, ys, rate in zip(county_names, county_xs, county_ys,
                                              county_rates)}
choropleth = hv.NdOverlay(county_polys, kdims=['County'])

plot_opts = dict(logz=True, tools=['hover'], xaxis=None, yaxis=None,
                 show_grid=False, show_frame=False, width=500, height=500)
style = dict(line_color='white')
choropleth({'Polygons': {'style': style, 'plot': plot_opts}})
Datashader
Bokeh wrapper
Built for big data
Advanced subsampling and binning techniques
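The binning idea can be sketched with plain numpy. This is not Datashader's API, only a stand-in that shows why binning scales: a million synthetic points are reduced to a fixed-size grid of counts, so the image to draw has the same size no matter how many points went in. The point cloud and the grid size are made up for the example.

```python
import numpy

# A synthetic cloud of one million correlated points
rng = numpy.random.default_rng(0)
n_points = 1_000_000
x = rng.normal(size=n_points)
y = x * 0.5 + rng.normal(size=n_points)

# Aggregate the points onto a fixed 300x200 grid of counts
counts, x_edges, y_edges = numpy.histogram2d(x, y, bins=(300, 200))

# Compress the dynamic range so sparse regions stay visible next to dense ones
image = numpy.log1p(counts)
```

The resulting `image` array could be handed to any plotting library, for example `pyplot.imshow(image.T, origin='lower')`.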
Folium
Visualization of maps
Compatible with Google Maps and OpenStreetMap
Visualization of markers, paths and polygons
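# Map centred near Mount Hood with three markers
import folium

# Recent folium releases no longer bundle the Stamen tiles; if this name is
# rejected, pass another tiles value such as 'OpenStreetMap'.
m = folium.Map(location=[45.372, -121.6972],
               zoom_start=12,
               tiles='Stamen Terrain')
folium.Marker(location=[45.3288, -121.6625],
              popup='Mt. Hood Meadows',
              icon=folium.Icon(icon='cloud')).add_to(m)
folium.Marker(location=[45.3311, -121.7113],
              popup='Timberline Lodge',
              icon=folium.Icon(color='green')).add_to(m)
folium.Marker(location=[45.3300, -121.6823],
              popup='Some Other Location',
              icon=folium.Icon(color='red', icon='info-sign')).add_to(m)
m  # in a Jupyter notebook, the last expression displays the map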
yt
Visualization of volumetric data
Compatible with many formats
Projects multidimensional data to a 2-D plane
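What projecting onto a 2-D plane means can be sketched with plain numpy. This is not yt's API: a made-up 3-D density field on a regular grid is integrated along one axis, which is the simplest kind of projection (a column density along the line of sight).

```python
import numpy

# Synthetic 3-D density: a Gaussian blob on a 32x32x32 grid
axis = numpy.linspace(-1.0, 1.0, 32)
x, y, z = numpy.meshgrid(axis, axis, axis, indexing='ij')
density = numpy.exp(-(x**2 + y**2 + z**2) / 0.1)

# Project along z: sum every line of sight and scale by the cell size
cell_size = axis[1] - axis[0]
projection = density.sum(axis=2) * cell_size
```

`projection` is an ordinary 2-D array that can be shown as an image. yt adds what this sketch leaves out, such as units, many file formats, non-uniform meshes and arbitrary viewing directions.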
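# Volume rendering of an unstructured-mesh sample dataset
import yt

ds = yt.load('MOOSE_sample_data/out.e-s010')
sc = yt.create_scene(ds)

# Choose the colormap of the mesh source
ms = sc.get_source()
ms.cmap = 'Eos A'

# Position the camera and set the image resolution
cam = sc.camera
cam.focus = ds.arr([0.0, 0.0, 0.0], 'code_length')
cam_pos = ds.arr([-3.0, 3.0, -3.0], 'code_length')
north_vector = ds.arr([0.0, -1.0, -1.0], 'dimensionless')
cam.set_position(cam_pos, north_vector)
cam.resolution = (800, 800)

sc.save()  # render the scene and write the image to disk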
Conclusions
Python is great as a programming language
And is great for data science
Plenty of options for visualization:
Standard plots
Ad-hoc plots
Interactive
3D plots
Maps
Big data
Specialized
Questions?
@datapythonista