Barcelona Python Meetup



Plotting data with python and pylab
        Giovanni M. Dall'Olio
Problem statement
   Let's say we have a table of data like this:
     name        country     apples      pears
     Giovanni    Italy       31          13
     Mario       Italy       23          33
     Luigi       Italy       0           5
     Margaret    England     22          13
     Albert      Germany     15          6

   How to read it in python?
   How to do some basic plotting?
Alternatives for plotting data in python
   Pylab → Matlab/Octave approach
   Enthought → extended version of Pylab (free for
     academic use)
   rpy/rpy2 → lets you run R commands from within
      python
   Sage → interfaces python with Matlab, R, Octave,
      Mathematica, ...
The Pylab system
   pylab is a system of three libraries plus an interactive
     interpreter, which together turn python into a
     Matlab-like environment
   It is composed of:
          Numpy (arrays, matrices, complex numbers, etc. in
            python)
          Scipy (extended scientific/statistics functions)
          Matplotlib (plotting library)
          iPython (extended interactive interpreter)
How to install pylab
   There are several ways to install PyLab:
          use the package manager of your linux distro
          use enthought's distribution
             (http://www.enthought.com/products/epd.php), free
             for academic use
          compile from source and google for help!
   Numpy and scipy contain some Fortran libraries,
     therefore easy_install doesn't work well with
     them
ipython -pylab
   Ipython is an extended version of the standard
      python interpreter
   It has a mode especially designed for pylab
   The standard python interpreter doesn't support
     plotting very well (no multi-threading)
   So if you want an interactive interpreter, use
     ipython with the pylab option
     (newer ipython releases spell it --pylab):

           $: alias pylab="ipython -pylab"
           $: pylab

        In [1]:
Why the python interpreter is not the best for plotting
     It gets blocked when you create a plot
How to read a CSV file with
         python
   To read a file like this in pylab:
      name        country     apples     pears
      Giovanni    Italy       31         13
      Mario       Italy       23         33
      Luigi       Italy       0          5
      Margaret    England     22         13
      Albert      Germany     15         6

   → Use the function 'matplotlib.mlab.csv2rec'
         >>> data = csv2rec('exampledata.txt',
           delimiter='\t')
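
csv2rec lives in matplotlib.mlab and, as far as I know, is no longer shipped with recent matplotlib releases. If it is missing, the same table can be read with the standard library alone. This is a minimal sketch, not the method used in the talk: the helper name read_table is made up for illustration, and the sample table is written to a temporary file only so the snippet runs on its own.

```python
import csv
import os
import tempfile

# The example table from the slides, tab-separated with a header row.
SAMPLE = (
    "name\tcountry\tapples\tpears\n"
    "Giovanni\tItaly\t31\t13\n"
    "Mario\tItaly\t23\t33\n"
    "Luigi\tItaly\t0\t5\n"
    "Margaret\tEngland\t22\t13\n"
    "Albert\tGermany\t15\t6\n"
)


def read_table(path):
    """Read a tab-separated file with a header row into a list of dicts.

    Hypothetical helper, not part of any library.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    # The csv module returns strings only, so convert numeric columns by hand.
    for row in rows:
        row["apples"] = int(row["apples"])
        row["pears"] = int(row["pears"])
    return rows


# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(SAMPLE)
    tmp_path = tmp.name

rows = read_table(tmp_path)
os.remove(tmp_path)

print(rows[1]["name"], rows[1]["apples"])
```

Unlike csv2rec, this gives rows as dictionaries rather than a record array, so column access needs a list comprehension such as [r['apples'] for r in rows].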
Numpy - record arrays
   csv2rec stores data in a numpy recarray object, where 
      you can access columns and rows easily:
     >>> print(data['name'])
     ['Giovanni' 'Mario' 'Luigi' 'Margaret'
      'Albert']

     >>> data['apples']
     array([31, 23, 0, 22, 15])

     >>> data[1]
     ('Mario', 'Italy', 23, 33)
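
To try record arrays without a data file, the table can also be built by hand. The sketch below assumes only numpy; the field widths ("U10") and integer type ("i4") are choices made for this example, not something csv2rec guarantees.

```python
import numpy as np

# Build a structured array with named columns, then view it as a recarray
# so that columns are also reachable as attributes (data.apples).
data = np.array(
    [
        ("Giovanni", "Italy", 31, 13),
        ("Mario", "Italy", 23, 33),
        ("Luigi", "Italy", 0, 5),
        ("Margaret", "England", 22, 13),
        ("Albert", "Germany", 15, 6),
    ],
    dtype=[("name", "U10"), ("country", "U10"), ("apples", "i4"), ("pears", "i4")],
).view(np.recarray)

# Column access by key or by attribute.
print(data["name"])
print(data.apples)

# Row selection with a boolean mask on one column.
italians = data[data["country"] == "Italy"]
print(italians["name"])

# Element-wise arithmetic between columns.
total_fruit = data["apples"] + data["pears"]
print(total_fruit)
```

The boolean mask is what makes record arrays handy for plotting: a subset of rows can be selected first and then passed straight to bar or pie.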
Alternative to csv2rec
   numpy.genfromtxt (new in 2009)
   More options than csv2rec, included in numpy
   Tricky default parameters: need to specify dtype=None

      >>> data = numpy.genfromtxt('datafile.txt',
     dtype=None)
      >>> data
      array....
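
For the example table, a fuller genfromtxt call might look like the sketch below. The table is read from an in-memory string so the snippet runs on its own; names=True (take column names from the first row) and encoding="utf-8" are additions for this example and assume a reasonably recent numpy.

```python
import io

import numpy as np

# The example table from the slides, tab-separated with a header row.
SAMPLE = (
    "name\tcountry\tapples\tpears\n"
    "Giovanni\tItaly\t31\t13\n"
    "Mario\tItaly\t23\t33\n"
    "Luigi\tItaly\t0\t5\n"
    "Margaret\tEngland\t22\t13\n"
    "Albert\tGermany\t15\t6\n"
)

data = np.genfromtxt(
    io.StringIO(SAMPLE),  # a file name works here as well
    delimiter="\t",
    names=True,        # use the header row as field names
    dtype=None,        # guess a type for each column instead of float
    encoding="utf-8",  # keep text columns as str rather than bytes
)

print(data.dtype.names)
print(data["apples"])
```

Without dtype=None every column is parsed as float, so the name and country columns would come back as nan.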
Barchart
>>> data = csv2rec('exampledata.txt', delimiter='\t')

>>> bar(arange(len(data)), data['apples'], color='red',
        width=0.1, label='apples')

>>> bar(arange(len(data))+0.1, data['pears'],
        color='blue', width=0.1, label='pears')

>>> xticks(range(len(data)), data['name'])

>>> legend()

>>> grid(True)
Barchart
>>> data = csv2rec('exampledata.txt',
        delimiter='\t')

>>> figure()
>>> clf()


 Read a CSV file and store
  it in a record array object


 Use figure() and clf() to
  reset the graphic device
Barchart
>>> data = csv2rec('exampledata.txt',
        delimiter='\t')

>>> bar(arange(len(data)), data['apples'],
        color='red', width=0.1, label='apples')

   The bar function creates a
     barchart
Barchart
>>> data = csv2rec('exampledata.txt',
        delimiter='\t')

>>> bar(arange(len(data)), data['apples'],
        color='red', width=0.1, label='apples')

>>> bar(arange(len(data))+0.1, data['pears'],
        color='blue', width=0.1, label='pears')


   This is the second barchart
Barchart
>>> data = csv2rec('exampledata.txt',
        delimiter='\t')

>>> bar(arange(len(data)), data['apples'],
        color='red', width=0.1, label='apples')

>>> bar(arange(len(data))+0.1, data['pears'],
        color='blue', width=0.1, label='pears')


>>> xticks(range(len(data)), data['name'])


   Redefining the labels on the X axis
     (xticks)
Barchart
>>> data = csv2rec('exampledata.txt',
        delimiter='\t')

>>> bar(arange(len(data)), data['apples'],
        color='red', width=0.1, label='apples')

>>> bar(arange(len(data))+0.1, data['pears'],
        color='blue', width=0.1, label='pears')

>>> xticks(range(len(data)), data['name'])

>>> legend()
>>> grid(True)
>>> title('apples and pears by person')

   Adding legend, grid, title
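
Outside an interactive pylab session, the same chart can be produced by a standalone script. This is a sketch under a few assumptions: it uses the matplotlib.pyplot object interface instead of the pylab namespace, hard-codes the table instead of reading it, selects the non-interactive "Agg" backend so it runs without a display, and saves to a file name chosen for this example.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # assumption: no display available, render to file only

import matplotlib.pyplot as plt
import numpy as np

# The example table from the slides, hard-coded to keep the script standalone.
names = ["Giovanni", "Mario", "Luigi", "Margaret", "Albert"]
apples = np.array([31, 23, 0, 22, 15])
pears = np.array([13, 33, 5, 13, 6])

positions = np.arange(len(names))
width = 0.4  # wider than the 0.1 used on the slides, so the pairs fill the axis

fig, ax = plt.subplots()

# Two bar series, shifted left and right so each pair is centred on its tick.
ax.bar(positions - width / 2, apples, width=width, color="red", label="apples")
ax.bar(positions + width / 2, pears, width=width, color="blue", label="pears")

# Replace the numeric tick labels with the person names.
ax.set_xticks(positions)
ax.set_xticklabels(names)

ax.legend()
ax.grid(True)
ax.set_title("apples and pears by person")

# Hypothetical output location, chosen for this example.
out_path = os.path.join(tempfile.gettempdir(), "barchart.png")
fig.savefig(out_path)
```

Shifting the two series by half a bar width in opposite directions keeps each name centred under its pair of bars, which the +0.1 offset on the slides does not quite do.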
Barchart (result)
Pie Chart
>>> pie(data['pears'], labels=data['name'])
>>> pie(data['pears'], labels=['%s\n(%s pears)' % (i, j)
        for (i, j) in zip(data['name'], data['pears'])])
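
The label trick in the second call is easier to see in isolation: each label is the name, a newline, and the count in brackets. The sketch below hard-codes the two columns and uses the "Agg" backend so it runs without a display; both are assumptions for this example.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available

import matplotlib.pyplot as plt

names = ["Giovanni", "Mario", "Luigi", "Margaret", "Albert"]
pears = [13, 33, 5, 13, 6]

# One two-line label per slice: "<name>\n(<count> pears)".
labels = ["%s\n(%s pears)" % (name, count) for name, count in zip(names, pears)]
print(labels[0])

fig, ax = plt.subplots()
ax.pie(pears, labels=labels)
ax.set_aspect("equal")  # keep the pie circular
```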
Pie chart (result)
A plot chart
>>> x = linspace(1,10, 10)
>>> y = randn(10)
>>> plot(x,y, 'r.', ms=15)
 
A histogram
>>> x = randn(1000)
>>> hist(x, bins=40)
>>> title('histogram of random numbers')
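
hist also returns what it computed, which is useful when the bin counts are needed afterwards. A minimal sketch, assuming a seeded numpy generator in place of randn so the run is repeatable, and the "Agg" backend so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen for this example
x = rng.standard_normal(1000)

fig, ax = plt.subplots()

# hist returns the count per bin, the bin edges, and the drawn rectangles.
counts, edges, patches = ax.hist(x, bins=40)
ax.set_title("histogram of random numbers")

print(int(counts.sum()), len(edges))
```

With 40 bins there are 41 edges, and the counts add up to the number of samples.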
 
Matplotlib gallery
Scipy Cookbook
Thanks for the attention!!
   PyLab - http://www.scipy.org/PyLab
   matplotlib - http://matplotlib.sourceforge.net/
   scipy - http://www.scipy.org/
   numpy - http://numpy.scipy.org/
   ipython - http://ipython.scipy.org/moin/


   These slides: http://bioinfoblog.it 
