This is the second part of a two-part blog post about some of our work on making it easier to deploy FixMyStreet and MapIt in new countries. This part describes how to generate KML files for every useful administrative and political boundary in OpenStreetMap.
The previous post on this subject described how to take the ID for a particular relation or way that represents a boundary in OpenStreetMap, and generate a KML file for it. That’s much of the work that we needed to create MapIt Global, but there are a few more significant steps that were required:
Efficiently extracting boundaries en masse
The code I previously described for extracting a boundary from OpenStreetMap used a public Overpass API server. This is fine for occasional boundaries, but, given that there are hundreds of thousands of boundaries in OSM, ideally we don’t want to be hitting a public server that many times – it puts a large load on that server, and is extremely slow. As an alternative, I tried parsing the OSM planet file with a SAX-based parser, but this also turned out to be very slow – multiple passes of the file were required to pick out the required nodes, ways and elements, and keeping the memory requirements down to something reasonable was tricky. (Using the PBF format would have helped a bit, but presented the same algorithmic problems.) Eventually, I decided that a better approach was simply to set up a local Overpass API server, and to query that. This is a great improvement – it allows very fast lookups of the data we need, and the query language is flexible enough to be able to retrieve huge sets of relations and ways that match the tags we’re interested in. It also means we would no longer have problems if connectivity to the remote server went down.
Another question that arose when scaling up the boundary extraction was, “Which set of tags we should consider as boundaries of interest?” On the first import, we considered any relation or way with the tag boundary=”adminstrative” and where the admin_level tag was one of “2″, “3″, “4″, … “11″. At the time, there were about 225,000 such elements that represented closed boundaries. Afterwards, it was pointed out to me that we should also include elements with the tag boundary=”political”, which includes parliamentary constituencies in the UK, for example. For later import purposes, I gave each of these boundary types a 3-letter code in MapIt, which are as follows:
- O02 (boundary="administrative", admin_level="2")
- O03 (boundary="administrative", admin_level="3")
- O11 (boundary="administrative", admin_level="11")
- OLC (boundary="political", political_division="linguistic_community")
- OIC (boundary="political", political_division="insular_council")
- OEC (boundary="political", political_division="euro_const")
- OCA (boundary="political", political_division="canton")
- OCL (boundary="political", political_division="circonscription_législative")
- OPC (boundary="political", political_division="parl_const")
- OCD (boundary="political", political_division="county_division")
- OWA (boundary="political", political_division="ward")
Importing Boundaries into MapIt
The next step in building our service was to take the 236,000 KML files generated by the previous step and import them into MapIt.
The code that creates the KML file for an element includes all its OpenStreetMap tags in the <ExtendedData/> section. On importing the KML into MapIt, there are only a few of those tags that we’re interested in – chiefly those that describe the alternative names of the area. We have to pick a canonical name for the boundary, which is currently done by taking the first of name, name:en and place_name that exists. If none of those exist, the area is given a default name of the form “Unknown name for (way|relation) with ID (element ID)”. There are also tags for the name of a country in different languages, which we also import into the database so that localized versions of the name of the boundaries will be available through MapIt with their ISO 639 language code.
Another tricky consideration when importing the data is how to deal with boundaries that have changed or disappeared since the previous import. MapIt has a concept of generations, so we could perfectly preserve the boundaries from the previous import as an earlier generation. This would certainly be desirable in one respect, since if someone is depending on the service they should be able to pick a generation that they have tested their application against, and then not have to worry about a boundary disappearing on the next import. However, with quarterly imports the size of our database would grow quite dramatically: I found that approximately 50% of the boundaries in MapIt Global had changed over the 5 months since the initial import. Our proposed compromise solution is that we will only keep the polygons associated with areas in the two most recent generations, and notify any known users of the service when a new generation is available for them to test and subsequently migrate to.
For reference, you can see the script that extracts boundaries and generates the KML files and the Django admin command for importing these files into MapIt.
The end result: MapIt Global
The aim of all this work was to create our now-launched web service, MapIt Global. As far as we know, this is the only API that allows you to take the latitude and longitude of any point on the globe, and look up the administrative and political boundaries in OpenStreetMap that surround that point. We hope it’s of use to anyone trying to build services that need to look up administrative boundaries – please let us know!
Photo credit: Hadrian’s Wall by Joe Dunckley,