KEMBAR78
Making use of OpenStreetMap data with Python | PDF
Using OpenStreetMap data with
              Python

                           Andrii V. Mishkovskyi


                               June 22, 2011




Andrii V. Mishkovskyi ()     Using OpenStreetMap data with Python   June 22, 2011   1 / 47
Who is this dude anyway?



      I love Python
      I love OpenStreetMap
      I do map rendering at CloudMade using Python
      CloudMade uses OpenStreetMap data extensively




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   2 / 47
Objectives



      Understand OpenStreetMap data structure
      How to parse it
      Get a feel of how basic GIS services work




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   3 / 47
OpenStreetMap


      Founded in 2004 as a response to Ordnance
      Survey pricing scheme
      >400k registered users
      >16k active mappers
      Supported by Microsoft, MapQuest (AOL), Yahoo!
      Crowd-sourcing at its best




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   4 / 47
Why OSM?



     Fairly easy
     Good quality
     Growing community
     Absolutely free




 Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   5 / 47
1     Data layout




Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   6 / 47
Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   7 / 47
Storage type



      XML (.osm)
      Protocol buffers (.pbf, in beta status)
      Other formats through 3rd parties (Esri shapefile,
      Garmin GPX, etc.)




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   8 / 47
The data


      Each object has geometry, tags and changeset
      information
      Tags are simply a list of key/value pairs
      Geometry definition differs for different types
      Changeset is not interesting when simply using the
      data (as opposed to editing)




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   9 / 47
Data types



   Node Geometric point or point of interest
    Way Collection of points
 Relation Collections of objects of any type




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   10 / 47
Nodes



< node id = " 592637238 " lat = " 47.1675211 " lon = " 9.5 89882 "
        version = " 2 " changeset = " 6628391 "
        user = " phinret " uid = " 135921 "
        timestamp = " 2 1 -12 -11 T19:2 :16Z " >
   < tag k= " amenity " v = " bar " / >
   < tag k= " name " v = " Black Pearl " / >
</ node >




   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   11 / 47
Ways

< way id =" 4781367 " version = " 1 " changeset = " 1 226 "
       uid = " 871 " user = " murmel "
       timestamp = " 2 7 - 6 -19 T 6:25:57Z " >
   < nd ref = " 3 6 4 7 " / >
   < nd ref = " 3 6 4 15 " / >
   < nd ref = " 3 6 4 17 " / >
   < nd ref = " 3 6 4 19 " / >
   < nd ref = " 3 6 4 2 " / >
   < tag k= " created_by " v = " JOSM " / >
   < tag k= " highway " v = " residential " / >
   < tag k= " name " v = " In den ÃĎusseren " / >
</ way >




   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   12 / 47
Relations
< relation id = " 16239 " version = " 699 " changeset = " 844 52 "
              uid = " 1224 6 " user = " hanskuster "
              timestamp = " 2 11 - 6 -14 T18:53:49Z " >
   < member type = " way " ref = " 75393767 " role = " outer " / >
   < member type = " way " ref = " 75393837 " role = " outer " / >
   < member type = " way " ref = " 75393795 " role = " outer " / >
   ...
   < member type = " way " ref = " 75393788 " role = " outer " / >
   < tag k= " admin_level " v = " 2 " / >
   < tag k= " boundary " v = " administrative " / >
   < tag k= " currency " v = " EUR " / >
   < tag k= " is_in " v = " Europe " / >
   < tag k= " ISO3166 -1 " v = " AT " / >
   < tag k= " name " v = " ÃŰsterreich " / >
   ...
   < tag k= " wikipedia:de " v = " ÃŰsterreich " / >
   < tag k= " wikipedia:en " v = " Austria " / >
</ relation >

   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   13 / 47
2     Importing




Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   14 / 47
Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   15 / 47
Major points when parsing OSM

      Expect faulty data
      Parse iteratively
      Cache extensively
      Order of elements is not guaranteed
      But it’s generally: nodes, ways, relations
      Ids are unique to datatype, not to the whole data
      set



  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   16 / 47
Parsing data



      Using SAX
      Doing simple reprojection
      Create geometries using Shapely




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   17 / 47
Parsing data
Projection




 import pyproj

 projection = pyproj . Proj (
     ’+ proj = merc + a =6378137 + b =6378137 ’
     ’+ lat_ts = . + lon_ = . + x_ = . + y_ = ’
     ’+k =1. + units = m + nadgrids = @null + wktext ’
     ’+ no_defs ’)




     Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   18 / 47
Parsing data
Nodes




 from shapely . geometry import Point

 class Node ( object ):

        def __init__ ( self , id , lonlat , tags ):
            self . id = id
            self . geometry = Point ( projection (* lonlat ))
            self . tags = tags




    Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   19 / 47
Parsing data
Nodes

 class SimpleHandler ( sax . handler . ContentHandler ):

        def __init__ ( self ):
            sax . handler . ContentHandler . __init__ ( self )
            self . id = None
            self . geometry = None
            self . nodes = {}

        def startElement ( self , name , attrs ):
            if name == ’ node ’:
                self . id = attrs [ ’ id ’]
                self . tags = {}
                self . geometry = map (
                     float , ( attrs [ ’ lon ’] , attrs [ ’ lat ’ ]))
            elif name == ’ tag ’:
                self . tags [ attrs [ ’k ’ ]] = attrs [ ’v ’]

    Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   19 / 47
Parsing data
Nodes




        def endElement ( self , name ):
            if name == ’ node ’:
                self . nodes [ self . id ] = Node ( self . id ,
                                                    self . geometry ,
                                                    self . tags )
                self . id = None
                self . geometry = None
                self . tags = None




    Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   19 / 47
Parsing data
Ways



 from shapely . geometry import LineString

 nodes = {...} # dict of nodes , keyed by their ids

 class Way ( object ):

       def __init__ ( self , id , refs , tags ):
           self . id = id
           self . geometry = LineString (
                [( nodes [ ref ]. x , nodes [ ref ]. y )
                  for ref in refs ])
           self . tags = tags




    Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   20 / 47
Parsing data
Ways


 class SimpleHandler ( sax . handler . ContentHandler ):

       def __init__ ( self ):
           ...
           self . ways = {}

       def startElement ( self , name , attrs ):
           if name == ’ way ’:
               self . id = attrs [ ’ id ’]
               self . tags = {}
               self . geometry = []
           elif name == ’ nd ’:
               self . geometry . append ( attrs [ ’ ref ’ ])




    Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   20 / 47
Parsing data
Ways



       def reset ( self ):
           self . id = None
           self . geometry = None
           self . tags = None

       def endElement ( self , name ):
           if name == ’ way ’:
               self . way [ self . id ] = Way ( self . id ,
                                                self . geometry ,
                                                self . tags )
               self . reset ()




   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   20 / 47
Parsing data
Relations
 from shapely . geometry import MultiPolygon , MultiLineString ,

 ways = {...} # dict of ways , with ids as keys

 class Relation ( object ):

        def __init__ ( self , id , members , tags ):
            self . id = id
            self . tags = tags
            if tags [ ’ type ’] == ’ multipolygon ’:
                 outer = [ ways [ member [ ’ ref ’ ]]
                             for member in members
                             if member [ ’ role ’] == ’ outer ’]
                 inner = [ ways [ member [ ’ ref ’ ]]
                             for member in members
                             if member [ ’ role ’] == ’ inner ’]
                 self . geometry = MultiPolygon ([( outer , inner )])

     Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   21 / 47
Parsing data
Relations




          The importing code is left as an
              exercise for the reader




     Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   21 / 47
For language zealots




           Excuse me for not using
               namedtuples.



  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   22 / 47
Parsing data: homework


      The idea is simple
      The implementation can use ElementTree if you
      work with small extracts of data
      Have to stick to SAX when parsing huge extracts
      or the whole planet data




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   23 / 47
Existing solutions



      Osmosis
      osm2pgsql
      osm2mongo, osm2shp, etc.




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   24 / 47
3     Rendering




Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   25 / 47
Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   26 / 47
Principles



      Scale
      Projection
      Cartography
      Types of maps




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   27 / 47
Layers



      Not exactly physical layers
      Layers of graphical representation
      Don’t render text in several layers




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   28 / 47
How to approach rendering



      Split your data in layers
      Make projection configurable
      Provide general way to select data sources
      Think about cartographers




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   29 / 47
The magic of Mapnik


import mapnik

map = mapnik . Map (1      , 1    )
mapnik . load_map ( map , " style . xml " )
bbox = mapnik . Envelope ( mapnik . Coord ( -18 . , -9 . ) ,
                             mapnik . Coord (18 . , 9 . ))
map . zoom_to_box ( bbox )
mapnik . render_to_file ( map , ’ map . png ’ , ’ png ’)




   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   30 / 47
Magic?



      Mapnik’s interface is straightforward
      The implementation is not
      Complexity is hidden in XML




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   31 / 47
Mapnik’s XML
< Style name = " Simple " >
   < Rule >
      < PolygonSymbolizer >
         < CssParameter name = " fill " ># f2eff9
         </ CssParameter >
      </ PolygonSymbolizer >
      < LineSymbolizer >
         < CssParameter name = " stroke " > red
         </ CssParameter >
         < CssParameter name = " stroke - width " > .1
         </ CssParameter >
      </ LineSymbolizer >
   </ Rule >
</ Style >



   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   32 / 47
Mapnik’s XML


< Layer name = " world " srs = " + proj = latlong + datum = WGS84 " >
   < StyleName > My Style </ StyleName >
   < Datasource >
      < Parameter name = " type " > shape
      </ Parameter >
      < Parameter name = " file " > world_borders
      </ Parameter >
   </ Datasource >
</ Layer >




   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   32 / 47
4     Searching




Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   33 / 47
Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   34 / 47
What’s that?



      Codename geocoding
      Similar to magnets
      Fast or correct – choose one




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   35 / 47
Why is it hard?


      Fuzzy search
      Order matters
      But not always
      One place can have many names
      One name can correspond to many places
      People don’t care about this at all!




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   36 / 47
Why is it hard?




                             I blame Google.




  Andrii V. Mishkovskyi ()     Using OpenStreetMap data with Python   June 22, 2011   36 / 47
Attempt at implementation



      Put restrictions
      Make the request structured
      Or at least assume order
      Assume valid input from users




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   37 / 47
Attempt at implementation
def geocode (** query ):
    boundary = world
    for key in [ ’ country ’ , ’ zip ’ , ’ city ’ ,
                  ’ street ’ , ’ housenumber ’ ]:
        try :
              value = query [ key ]
              boundary = find ( key , value , boundary )
        except KeyError :
              continue
    return boundary

def find ( key , value , boundary ):
    for tags , geometry in data :
         if geometry in boundary and 
                  tags . get ( key ) == value :
               return geometry


   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   37 / 47
Fixing user input


Soundex/Metaphone/DoubleMetaphone
    Phonetic algorithms
    Works in 90% of the cases
    If your language is English
    Doesn’t work well for placenames




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   38 / 47
Fixing user input

from itertools import groupby

def soundex ( word ):
    table = { ’b ’: 1 , ’f ’: 1 , ’p ’: 1 , ’v ’: 1 ,
               ’c ’: 2 , ’g ’: 2 , ’j ’: 2 , ...}
    yield word [ ]
    codes = ( table [ char ]
               for char in word [1:]
               if char in table )
    for code in groupby ( codes ):
        yield code




   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   38 / 47
Fixing user input

Edit distance
     Works for two words
     Most geocoding requests consist of several words
     Scanning database for each pair distance isn’t
     feasible
     Unless you have it cached already
     Check out Peter Norvig’s “How to Write Spelling a
     Corrector” article



  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   38 / 47
Fixing user input


N-grams
    Substrings of n items from the search string
    Easier to index than edit distance
    Gives less false positives than phonetic algorithm
    Trigrams most commonly used




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   38 / 47
Fixing user input

from itertools import izip , islice , tee

def nwise ( iterable , count =2):
    iterables = enumerate ( tee ( iterable , count ))
    return izip (*[ islice ( iterable , start , None )
                     for start , iterables in iterables ])

def trigrams ( string ):
    string = ’ ’. join ([ ’ ’ , string , ’ ’ ]). lower ()
    return nwise ( string , 3)




   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   38 / 47
Making the search free-form



      Normalize input: remove the, a, . . .
      Use existing free-form search solution
      Combine ranks from different sources




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   39 / 47
Making the search free-form

from operator import itemgetter
from collections import defaultdict

def freeform ( string ):
    ranks = defaultdict ( float )
    searchfuncs = [( phonetic , .3) ,
                      ( levenshtein , .15) ,
                      ( trigrams , .55)]
    for searchfunc , coef in searchfuncs :
        for match , rank in searchfunc ( string ):
             ranks [ match ] += rank * coef
    return max ( ranks . iteritems () , key = itemgetter (1))




   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   39 / 47
5     Routing




Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   40 / 47
Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   41 / 47
The problem



When introduced with routing problem, people think
Build graph, use Dijsktra, you’re done! (And they are
mostly right)




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   42 / 47
The problem


Not that simple
     Graph is sparse
     Graph has to be updated often
     Dijkstra algorithm is too general
     A* is no better




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   42 / 47
The problem


      Routing is not only a technical problem
      Different people expect different results for the
      same input
      Routing through cities is always a bad choice (even
      if it’s projected to be faster)




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   42 / 47
Building the graph



      Adjacency matrix is not space-efficient
      The graph representation has to very compact
      networkx and igraph are both pretty good for a start




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   43 / 47
Building the graph
from networkx import Graph , shortest_path

...

def build_graph ( ways ):
    graph = Graph ()
    for way , tags in ways :
        for segment in nwise ( way . coords ):
             weight = length ( segment ) * coef ( tags )
             graph . add_edge ( segment [ ] , segment [1] ,
                                weight = weight )
    return graph

shortest_path ( graph , source , dest )



   Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   43 / 47
Building the graph


      There is no silver bullet
      No matter how nice these libs are, importing even
      Europe will require more than 20 GB of RAM
      Splitting data into country graphs is not enough
      Our in-house C++ graph library requires 20GB of
      mem for the whole world




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   43 / 47
Other solutions


      PgRouting – easier to start with, couldn’t make it
      fast, harder to configure
      Neo4j – tried 2 years ago, proved to be lacking
      when presented with huge sparse graphs
      Eat your own dogfood – if doing “serious business”,
      most probably the best solution. Half-wink.




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   44 / 47
Bored already?




Andrii V. Mishkovskyi ()     Using OpenStreetMap data with Python   June 22, 2011   45 / 47
Lighten up, I’m done




Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   45 / 47
Highlights



      Start using OpenStreetMap data – it’s easy
      Try building something simple – it’s cool
      Try building something cool – it’s simple
      Python is one of the best languages [for doing GIS]




  Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   46 / 47
Questions?


                       contact@mishkovskyi.net
                   Slides: mishkovskyi.net/ep2 11




Andrii V. Mishkovskyi ()   Using OpenStreetMap data with Python   June 22, 2011   47 / 47

Making use of OpenStreetMap data with Python

  • 1.
    Using OpenStreetMap datawith Python Andrii V. Mishkovskyi June 22, 2011 Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 1 / 47
  • 2.
    Who is thisdude anyway? I love Python I love OpenStreetMap I do map rendering at CloudMade using Python CloudMade uses OpenStreetMap data extensively Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 2 / 47
  • 3.
    Objectives Understand OpenStreetMap data structure How to parse it Get a feel of how basic GIS services work Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 3 / 47
  • 4.
    OpenStreetMap Founded in 2004 as a response to Ordnance Survey pricing scheme >400k registered users >16k active mappers Supported by Microsoft, MapQuest (AOL), Yahoo! Crowd-sourcing at its best Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 4 / 47
  • 5.
    Why OSM? Fairly easy Good quality Growing community Absolutely free Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 5 / 47
  • 6.
    1 Data layout Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 6 / 47
  • 7.
    Andrii V. Mishkovskyi() Using OpenStreetMap data with Python June 22, 2011 7 / 47
  • 8.
    Storage type XML (.osm) Protocol buffers (.pbf, in beta status) Other formats through 3rd parties (Esri shapefile, Garmin GPX, etc.) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 8 / 47
  • 9.
    The data Each object has geometry, tags and changeset information Tags are simply a list of key/value pairs Geometry definition differs for different types Changeset is not interesting when simply using the data (as opposed to editing) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 9 / 47
  • 10.
    Data types Node Geometric point or point of interest Way Collection of points Relation Collections of objects of any type Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 10 / 47
  • 11.
    Nodes < node id= " 592637238 " lat = " 47.1675211 " lon = " 9.5 89882 " version = " 2 " changeset = " 6628391 " user = " phinret " uid = " 135921 " timestamp = " 2 1 -12 -11 T19:2 :16Z " > < tag k= " amenity " v = " bar " / > < tag k= " name " v = " Black Pearl " / > </ node > Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 11 / 47
  • 12.
    Ways < way id=" 4781367 " version = " 1 " changeset = " 1 226 " uid = " 871 " user = " murmel " timestamp = " 2 7 - 6 -19 T 6:25:57Z " > < nd ref = " 3 6 4 7 " / > < nd ref = " 3 6 4 15 " / > < nd ref = " 3 6 4 17 " / > < nd ref = " 3 6 4 19 " / > < nd ref = " 3 6 4 2 " / > < tag k= " created_by " v = " JOSM " / > < tag k= " highway " v = " residential " / > < tag k= " name " v = " In den ÃĎusseren " / > </ way > Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 12 / 47
  • 13.
    Relations < relation id= " 16239 " version = " 699 " changeset = " 844 52 " uid = " 1224 6 " user = " hanskuster " timestamp = " 2 11 - 6 -14 T18:53:49Z " > < member type = " way " ref = " 75393767 " role = " outer " / > < member type = " way " ref = " 75393837 " role = " outer " / > < member type = " way " ref = " 75393795 " role = " outer " / > ... < member type = " way " ref = " 75393788 " role = " outer " / > < tag k= " admin_level " v = " 2 " / > < tag k= " boundary " v = " administrative " / > < tag k= " currency " v = " EUR " / > < tag k= " is_in " v = " Europe " / > < tag k= " ISO3166 -1 " v = " AT " / > < tag k= " name " v = " ÃŰsterreich " / > ... < tag k= " wikipedia:de " v = " ÃŰsterreich " / > < tag k= " wikipedia:en " v = " Austria " / > </ relation > Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 13 / 47
  • 14.
    2 Importing Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 14 / 47
  • 15.
    Andrii V. Mishkovskyi() Using OpenStreetMap data with Python June 22, 2011 15 / 47
  • 16.
    Major points whenparsing OSM Expect faulty data Parse iteratively Cache extensively Order of elements is not guaranteed But it’s generally: nodes, ways, relations Ids are unique to datatype, not to the whole data set Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 16 / 47
  • 17.
    Parsing data Using SAX Doing simple reprojection Create geometries using Shapely Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 17 / 47
  • 18.
    Parsing data Projection importpyproj projection = pyproj . Proj ( ’+ proj = merc + a =6378137 + b =6378137 ’ ’+ lat_ts = . + lon_ = . + x_ = . + y_ = ’ ’+k =1. + units = m + nadgrids = @null + wktext ’ ’+ no_defs ’) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 18 / 47
  • 19.
    Parsing data Nodes fromshapely . geometry import Point class Node ( object ): def __init__ ( self , id , lonlat , tags ): self . id = id self . geometry = Point ( projection (* lonlat )) self . tags = tags Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 19 / 47
  • 20.
    Parsing data Nodes classSimpleHandler ( sax . handler . ContentHandler ): def __init__ ( self ): sax . handler . ContentHandler . __init__ ( self ) self . id = None self . geometry = None self . nodes = {} def startElement ( self , name , attrs ): if name == ’ node ’: self . id = attrs [ ’ id ’] self . tags = {} self . geometry = map ( float , ( attrs [ ’ lon ’] , attrs [ ’ lat ’ ])) elif name == ’ tag ’: self . tags [ attrs [ ’k ’ ]] = attrs [ ’v ’] Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 19 / 47
  • 21.
    Parsing data Nodes def endElement ( self , name ): if name == ’ node ’: self . nodes [ self . id ] = Node ( self . id , self . geometry , self . tags ) self . id = None self . geometry = None self . tags = None Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 19 / 47
  • 22.
    Parsing data Ways fromshapely . geometry import LineString nodes = {...} # dict of nodes , keyed by their ids class Way ( object ): def __init__ ( self , id , refs , tags ): self . id = id self . geometry = LineString ( [( nodes [ ref ]. x , nodes [ ref ]. y ) for ref in refs ]) self . tags = tags Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 20 / 47
  • 23.
    Parsing data Ways classSimpleHandler ( sax . handler . ContentHandler ): def __init__ ( self ): ... self . ways = {} def startElement ( self , name , attrs ): if name == ’ way ’: self . id = attrs [ ’ id ’] self . tags = {} self . geometry = [] elif name == ’ nd ’: self . geometry . append ( attrs [ ’ ref ’ ]) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 20 / 47
  • 24.
    Parsing data Ways def reset ( self ): self . id = None self . geometry = None self . tags = None def endElement ( self , name ): if name == ’ way ’: self . way [ self . id ] = Way ( self . id , self . geometry , self . tags ) self . reset () Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 20 / 47
  • 25.
    Parsing data Relations fromshapely . geometry import MultiPolygon , MultiLineString , ways = {...} # dict of ways , with ids as keys class Relation ( object ): def __init__ ( self , id , members , tags ): self . id = id self . tags = tags if tags [ ’ type ’] == ’ multipolygon ’: outer = [ ways [ member [ ’ ref ’ ]] for member in members if member [ ’ role ’] == ’ outer ’] inner = [ ways [ member [ ’ ref ’ ]] for member in members if member [ ’ role ’] == ’ inner ’] self . geometry = MultiPolygon ([( outer , inner )]) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 21 / 47
  • 26.
    Parsing data Relations The importing code is left as an exercise for the reader Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 21 / 47
  • 27.
    For language zealots Excuse me for not using namedtuples. Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 22 / 47
  • 28.
    Parsing data: homework The idea is simple The implementation can use ElementTree if you work with small extracts of data Have to stick to SAX when parsing huge extracts or the whole planet data Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 23 / 47
  • 29.
    Existing solutions Osmosis osm2pgsql osm2mongo, osm2shp, etc. Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 24 / 47
  • 30.
    3 Rendering Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 25 / 47
  • 31.
    Andrii V. Mishkovskyi() Using OpenStreetMap data with Python June 22, 2011 26 / 47
  • 32.
    Principles Scale Projection Cartography Types of maps Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 27 / 47
  • 33.
    Layers Not exactly physical layers Layers of graphical representation Don’t render text in several layers Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 28 / 47
  • 34.
    How to approachrendering Split your data in layers Make projection configurable Provide general way to select data sources Think about cartographers Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 29 / 47
  • 35.
    The magic ofMapnik import mapnik map = mapnik . Map (1 , 1 ) mapnik . load_map ( map , " style . xml " ) bbox = mapnik . Envelope ( mapnik . Coord ( -18 . , -9 . ) , mapnik . Coord (18 . , 9 . )) map . zoom_to_box ( bbox ) mapnik . render_to_file ( map , ’ map . png ’ , ’ png ’) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 30 / 47
  • 36.
    Magic? Mapnik’s interface is straightforward The implementation is not Complexity is hidden in XML Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 31 / 47
  • 37.
    Mapnik’s XML < Stylename = " Simple " > < Rule > < PolygonSymbolizer > < CssParameter name = " fill " ># f2eff9 </ CssParameter > </ PolygonSymbolizer > < LineSymbolizer > < CssParameter name = " stroke " > red </ CssParameter > < CssParameter name = " stroke - width " > .1 </ CssParameter > </ LineSymbolizer > </ Rule > </ Style > Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 32 / 47
  • 38.
    Mapnik’s XML < Layername = " world " srs = " + proj = latlong + datum = WGS84 " > < StyleName > My Style </ StyleName > < Datasource > < Parameter name = " type " > shape </ Parameter > < Parameter name = " file " > world_borders </ Parameter > </ Datasource > </ Layer > Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 32 / 47
  • 39.
    4 Searching Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 33 / 47
  • 40.
    Andrii V. Mishkovskyi() Using OpenStreetMap data with Python June 22, 2011 34 / 47
  • 41.
    What’s that? Codename geocoding Similar to magnets Fast or correct – choose one Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 35 / 47
  • 42.
    Why is ithard? Fuzzy search Order matters But not always One place can have many names One name can correspond to many places People don’t care about this at all! Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 36 / 47
  • 43.
    Why is ithard? I blame Google. Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 36 / 47
  • 44.
    Attempt at implementation Put restrictions Make the request structured Or at least assume order Assume valid input from users Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 37 / 47
  • 45.
    Attempt at implementation defgeocode (** query ): boundary = world for key in [ ’ country ’ , ’ zip ’ , ’ city ’ , ’ street ’ , ’ housenumber ’ ]: try : value = query [ key ] boundary = find ( key , value , boundary ) except KeyError : continue return boundary def find ( key , value , boundary ): for tags , geometry in data : if geometry in boundary and tags . get ( key ) == value : return geometry Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 37 / 47
  • 46.
    Fixing user input Soundex/Metaphone/DoubleMetaphone Phonetic algorithms Works in 90% of the cases If your language is English Doesn’t work well for placenames Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 38 / 47
  • 47.
    Fixing user input fromitertools import groupby def soundex ( word ): table = { ’b ’: 1 , ’f ’: 1 , ’p ’: 1 , ’v ’: 1 , ’c ’: 2 , ’g ’: 2 , ’j ’: 2 , ...} yield word [ ] codes = ( table [ char ] for char in word [1:] if char in table ) for code in groupby ( codes ): yield code Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 38 / 47
  • 48.
    Fixing user input Editdistance Works for two words Most geocoding requests consist of several words Scanning database for each pair distance isn’t feasible Unless you have it cached already Check out Peter Norvig’s “How to Write Spelling a Corrector” article Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 38 / 47
  • 49.
    Fixing user input N-grams Substrings of n items from the search string Easier to index than edit distance Gives less false positives than phonetic algorithm Trigrams most commonly used Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 38 / 47
  • 50.
    Fixing user input fromitertools import izip , islice , tee def nwise ( iterable , count =2): iterables = enumerate ( tee ( iterable , count )) return izip (*[ islice ( iterable , start , None ) for start , iterables in iterables ]) def trigrams ( string ): string = ’ ’. join ([ ’ ’ , string , ’ ’ ]). lower () return nwise ( string , 3) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 38 / 47
  • 51.
    Making the searchfree-form Normalize input: remove the, a, . . . Use existing free-form search solution Combine ranks from different sources Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 39 / 47
  • 52.
    Making the searchfree-form from operator import itemgetter from collections import defaultdict def freeform ( string ): ranks = defaultdict ( float ) searchfuncs = [( phonetic , .3) , ( levenshtein , .15) , ( trigrams , .55)] for searchfunc , coef in searchfuncs : for match , rank in searchfunc ( string ): ranks [ match ] += rank * coef return max ( ranks . iteritems () , key = itemgetter (1)) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 39 / 47
  • 53.
    5 Routing Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 40 / 47
  • 54.
    Andrii V. Mishkovskyi() Using OpenStreetMap data with Python June 22, 2011 41 / 47
  • 55.
    The problem When introducedwith routing problem, people think Build graph, use Dijsktra, you’re done! (And they are mostly right) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 42 / 47
  • 56.
    The problem Not thatsimple Graph is sparse Graph has to be updated often Dijkstra algorithm is too general A* is no better Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 42 / 47
  • 57.
    The problem Routing is not only a technical problem Different people expect different results for the same input Routing through cities is always a bad choice (even if it’s projected to be faster) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 42 / 47
  • 58.
    Building the graph Adjacency matrix is not space-efficient The graph representation has to very compact networkx and igraph are both pretty good for a start Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 43 / 47
  • 59.
    Building the graph fromnetworkx import Graph , shortest_path ... def build_graph ( ways ): graph = Graph () for way , tags in ways : for segment in nwise ( way . coords ): weight = length ( segment ) * coef ( tags ) graph . add_edge ( segment [ ] , segment [1] , weight = weight ) return graph shortest_path ( graph , source , dest ) Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 43 / 47
  • 60.
    Building the graph There is no silver bullet No matter how nice these libs are, importing even Europe will require more than 20 GB of RAM Splitting data into country graphs is not enough Our in-house C++ graph library requires 20GB of mem for the whole world Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 43 / 47
  • 61.
    Other solutions PgRouting – easier to start with, couldn’t make it fast, harder to configure Neo4j – tried 2 years ago, proved to be lacking when presented with huge sparse graphs Eat your own dogfood – if doing “serious business”, most probably the best solution. Half-wink. Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 44 / 47
  • 62.
    Bored already? Andrii V.Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 45 / 47
  • 63.
    Lighten up, I’mdone Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 45 / 47
  • 64.
    Highlights Start using OpenStreetMap data – it’s easy Try building something simple – it’s cool Try building something cool – it’s simple Python is one of the best languages [for doing GIS] Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 46 / 47
  • 65.
    Questions? contact@mishkovskyi.net Slides: mishkovskyi.net/ep2 11 Andrii V. Mishkovskyi () Using OpenStreetMap data with Python June 22, 2011 47 / 47