This is the second part of a two-part blog post about some of our work on making it easier to deploy FixMyStreet and MapIt in new countries. This part describes how to generate KML files for every useful administrative and political boundary in OpenStreetMap.
The previous post on this subject described how to take the ID for a particular relation or way that represents a boundary in OpenStreetMap, and generate a KML file for it. That’s much of the work that we needed to create MapIt Global, but there are a few more significant steps that were required:
Efficiently extracting boundaries en masse
The code I previously described for extracting a boundary from OpenStreetMap used a public Overpass API server. This is fine for occasional boundaries, but, given that there are hundreds of thousands of boundaries in OSM, ideally we don’t want to be hitting a public server that many times – it puts a large load on that server, and is extremely slow. As an alternative, I tried parsing the OSM planet file with a SAX-based parser, but this also turned out to be very slow – multiple passes of the file were required to pick out the required nodes, ways and elements, and keeping the memory requirements down to something reasonable was tricky. (Using the PBF format would have helped a bit, but presented the same algorithmic problems.) Eventually, I decided that a better approach was simply to set up a local Overpass API server, and to query that. This is a great improvement – it allows very fast lookups of the data we need, and the query language is flexible enough to be able to retrieve huge sets of relations and ways that match the tags we’re interested in. It also means we would no longer have problems if connectivity to the remote server went down.
Another question that arose when scaling up the boundary extraction was, “Which set of tags we should consider as boundaries of interest?” On the first import, we considered any relation or way with the tag boundary=”adminstrative” and where the admin_level tag was one of “2″, “3″, “4″, … “11″. At the time, there were about 225,000 such elements that represented closed boundaries. Afterwards, it was pointed out to me that we should also include elements with the tag boundary=”political”, which includes parliamentary constituencies in the UK, for example. For later import purposes, I gave each of these boundary types a 3-letter code in MapIt, which are as follows:
- O02 (boundary="administrative", admin_level="2")
- O03 (boundary="administrative", admin_level="3")
- O11 (boundary="administrative", admin_level="11")
- OLC (boundary="political", political_division="linguistic_community")
- OIC (boundary="political", political_division="insular_council")
- OEC (boundary="political", political_division="euro_const")
- OCA (boundary="political", political_division="canton")
- OCL (boundary="political", political_division="circonscription_législative")
- OPC (boundary="political", political_division="parl_const")
- OCD (boundary="political", political_division="county_division")
- OWA (boundary="political", political_division="ward")
Importing Boundaries into MapIt
The next step in building our service was to take the 236,000 KML files generated by the previous step and import them into MapIt.
The code that creates the KML file for an element includes all its OpenStreetMap tags in the <ExtendedData/> section. On importing the KML into MapIt, there are only a few of those tags that we’re interested in – chiefly those that describe the alternative names of the area. We have to pick a canonical name for the boundary, which is currently done by taking the first of name, name:en and place_name that exists. If none of those exist, the area is given a default name of the form “Unknown name for (way|relation) with ID (element ID)”. There are also tags for the name of a country in different languages, which we also import into the database so that localized versions of the name of the boundaries will be available through MapIt with their ISO 639 language code.
Another tricky consideration when importing the data is how to deal with boundaries that have changed or disappeared since the previous import. MapIt has a concept of generations, so we could perfectly preserve the boundaries from the previous import as an earlier generation. This would certainly be desirable in one respect, since if someone is depending on the service they should be able to pick a generation that they have tested their application against, and then not have to worry about a boundary disappearing on the next import. However, with quarterly imports the size of our database would grow quite dramatically: I found that approximately 50% of the boundaries in MapIt Global had changed over the 5 months since the initial import. Our proposed compromise solution is that we will only keep the polygons associated with areas in the two most recent generations, and notify any known users of the service when a new generation is available for them to test and subsequently migrate to.
For reference, you can see the script that extracts boundaries and generates the KML files and the Django admin command for importing these files into MapIt.
The end result: MapIt Global
The aim of all this work was to create our now-launched web service, MapIt Global. As far as we know, this is the only API that allows you to take the latitude and longitude of any point on the globe, and look up the administrative and political boundaries in OpenStreetMap that surround that point. We hope it’s of use to anyone trying to build services that need to look up administrative boundaries – please let us know!
Photo credit: Hadrian’s Wall by Joe Dunckley,
One of the projects we’ve been working on at mySociety recently is that of making it easier for people to set up new versions of our sites in other countries. Something we’ve heard again and again is that for many people, setting up new web applications is a frustrating process, and that they would appreciate anything that would make it easier.
You can use these AMIs to create a running version of one of these sites on an Amazon EC2 instance with just a couple of clicks. If you haven’t used Amazon Web Services before, then you can get a Micro instance free for a year to try this out. (We have previously published an AMI for Alaveteli, which helped many people to get started with setting up their own Freedom of Information sites.)
Each AMI is created from an install script designed to be used on a clean installation of Debian squeeze or Ubuntu precise, so if you have a new server hosted elsewhere, you can use that script to similarly create a default installation of the site without being dependent on Amazon:
In addition, we’ve launched new sites with documentation for FixMyStreet and MapIt, which will tell you how to customize those sites and import data if you’ve created a running instance using one of the above methods.
These documentation sites also have improved instructions for completely manual installations of either site, for people who are comfortable with setting up PostgreSQL, Apache / nginx, PostGIS, etc.
We hope that these changes make it easier than ever before to reuse our code, and set up new sites that help people fix things that matter to them.
Photo credit: duncan
This is the first part of a two-part blog post about some of our work on making it easier to deploy FixMyStreet and MapIt in new countries. This part describes how to generate KML for a given boundary in OpenStreetMap. Update: the second part is now available.
As mentioned in a previous post, we’re looking at ways of making it smoother to deploy FixMyStreet for a new country, city or use-case. Essentially there are two fundamental bits of data that you need for this:
- a mapping between a latitude and longitude (or postcode) to all the adminstrative areas that cover that point.
- a mapping between problem type and adminstrative areas to an appropriate email address for reporting that problem.
The first of those is typically provided by a service called MapIt, an open source GeoDjango web application written by Matthew Somerville. In the UK we are fortunate that official boundary data and the postcode database have now been released under an open data license from the Ordnance Survey. However, in other many other countries similar data is unavailable, or not available under reasonable licensing conditions. In such cases, though, all is not lost thanks to the extraordinary work of contributors to the OpenStreetMap project. OpenStreetMap contains high-quality administrative boundary data for many countries of the world, and we know that data submitted to the project is available under the Creative Commons Attribution-ShareAlike license, so we can reuse it in a web service like MapIt.
The first step towards being able to build an instance of MapIt based on OpenStreetMap boundary data is to be able to generate a shapefile that represents a boundary in the project. (In OpenStreetMap’s data model, boundaries are represented as either ways or relations, and those that we are interested in are tagged with boundary=administrative.) Matthew had previously written code to generate KML files for boundaries in Norway in order to help to set up an instance of MapIt for Norway, which is used by FiksGataMi. However, that script was quite specific to the organization of boundaries in that country, and it did not deal with more complex boundary topologies (e.g. enclaves), or different representations of boundaries (e.g. multiply nested relations).
So, Matthew and I wrote a new version of the code to extract a boundary from OpenStreetMap and generate a KML representation of it. The new version uses the Overpass API instead of XAPI, since it allows us to specify multiple predicates in the query and recursively fetch the ways and nodes that are contained in a relation. Once all the ways that make up a relation have been fetched (ignoring those with roles like “defaults” or “subarea”), the script tries to join each unclosed way to any other with which it shares an endpoint. We should end up with a series of closed polygons – the script exits in error if there are any unclosed ways left. We can then directly create KML from these polygons, the only subtlety being that we need to mark certain boundaries as being an inner boundary (i.e., creating a hole in a boundary) if they had the role “enclave” or “inner” in an OpenStreetMap relation. For example, the South Cambridgeshire District Council boundary has a Cambridge City Council-shaped hole in it:
Similarly, the script has to cope with multiple distinct polygons, such as the boundary of Orkney.
$ bin/boundaries.py --help Usage: boundaries.py [options] Options: -h, --help show this help message and exit --test Run all doctests in this file --relation=<RELATION_ID> Output KML for the OSM relation <RELATION_ID> --way=<WAY_ID> Output KML for the OSM way <WAY_ID>
For example, to generate a KML boundary for the Hottingen area of Zürich, you can do:
$ bin/boundaries.py --relation=1701449 > hottingen.kml
In the next blog post in this series, we will discuss extracting such boundaries en masse and creating a service based on them.
We’ve already put a lot of work into making the code base for the FixMyStreet platform generic and country-neutral, but we’d like to make the process of setting up such a site easier than it is at present.
The first step in this project is going to be contacting as many people across the world as we can who have thought about trying to set up their own version of FixMyStreet, or who’ve actually tried.
We’ll be talking to people ranging from those who have active, running sites, to those who never got past the stage of thinking it might be a nice idea. We want to find out what things presented particular difficulties, and which of the next steps we’re considering would make the greatest difference to international adoption of FixMyStreet.
Some of the things we’re interested in, for example, include:
- Would you be interested in a hosted version of FixMyStreet, or do you prefer to set up a version locally?
- How difficult did you find the process of finding administrative boundaries for your country? Are the boundaries in OpenStreetMap good enough for your use? As a last resort, would you want to draw the boundaries manually?
- Would you be interested in an automated setup procedure which deploys a new server for your locality and then just requires web-based setup?