Stefan Wehrmeyer

On the things I do.

A Mapnificent World

I just released Mapnificent for 17 cities in the US and some other cities world wide (before Mapnificent was only available for Berlin and somehow for London night buses). You can watch a short video about what Mapnificent is and what it can do here. This post will explain how Mapnificent works and how I scaled it up to more than 20 cities with hopefully more cities to come.

Summary (tl;dr)

I get the data from the awesome GTFS Data Exchange, extract and compress all information necessary to calculate all journeys from any point to every stop in the system in the browser with JavaScript and visualize the result with an HTML5 Canvas overlay on a Google map. I hope people use it to find the best place to live, work and meet and that they discover new things about their public transport system. To make Mapnificent truly useful, there is an API to embed a Mapnificent overlay in your Google Maps application.

Data

Google specified the GTFS format in order to be able to use public transport information in Google Maps. The public transport providers publish a file with all necessary information inside and the magnificent GTFS Data Exchange collects these files and keeps them up-to-date in one place. So getting the data for the cities is easy. The hard part was that I have to read and adhere to the license agreement of every single provider. Some providers like New York’s MTA do it right by not having any complicated license text at all, some like San Francisco require me to put a text of 98(!) words on the site. I don’t know what they hope to accomplish, but I can work with that. However, I have a problem at the moment with the Terms of Use of Los Angeles Metro and San Diego Metro where they say that I’m only authorized to reproduce the data  in the form provided by them. What? Shall I show users the raw GTFS files? I guess not, but Mapnificent aggregates and converts data and I’m unsure if this is covered by the license. I have tried to contact Metro LA on two different channels, but did not get a response (their sites also make it extremely difficult to contact them by email). If you can change anything about that, please feel free to do it.

Routing

From the GTFS data Mapnificent knows all routes and their stops, how long a route needs between two stops and the average headway of each route in a given time range (e.g. in the morning). This is compressed into JSON and delivered to your browser. When you set a point, Mapnificent looks for the closest couple of stations, then starts to travel to every station in the network it can reach and records the minimum amount of time it takes to a station. The calculation process runs in a Web Worker thread. For some smaller public transport systems it is so fast that I can enable the “Calculate on Drag” feature: check it out for Sacramento and Berlin.

There is a major heuristic involved: how can Mapnificent know how long a trip takes when it doesn’t even know the exact time the next bus/train/helicopter departs from a stop? It really can’t know for sure. To compensate for this lack of knowledge Mapnificent makes some assumptions.

It assumes that you will time your journey in a way that you don’t have to wait for your first transport option. Of course this does not always hold in real life, but it’s a reasonable assumption to work with.

It assumes further that if you change transportation, you will probably not immediatley catch the next bus/train/spaceship. The average waiting time could be thought of as headway divided by two. However, I found that the waiting time (for Berlin) is actually lower. That can be explained by the fact that the public transport agency supposedly designed the timetables in a smart way that allows connecting trains/busses/submarines to wait for each other in order to let riders change more quickly. Also people plan their trips in a way that reduces waiting time. So the waiting time can be considered to be less than the mathematical average. I tried around a bit with a logarithmic formula which worked ok for Berlin, but for the whole world I settled for a simpler approach: headway divided by three.

How accurate is it?

Difficult to say. I have actually never been to any of the cities except Berlin and London, so I don’t know if any important transportation options are missing (because they are not available as a GTFS feed) or if anything is totally incorrect. I created Mapnificent for Berlin first (because I usually live there) and then used it as the basis for a seminar at Hasso Plattner Institute (where I’m a Master student). There I actually had the chance to compare the times calculated by Mapnificent Berlin (starting point was Berlin main station) to times from an actual trip planner command line interface. Here is a graph that shows how many errors in minutes there were for trips to every station in Berlin:

You can see that the majority of trips (53 %) have an error less than or equal to 5 minutes and 96 % have an error of less than or equal to 20 minutes. I traced the outliers to stops only reachable by very infrequent buses with headways of around an hour. This gave me some confidence about the relative accuracy of Berlin, but at the moment I can’t do the same analysis for all the other cities I added to Mapnificent. I asked friends to check out some routes in cities they know and it seemed to be sufficiently correct to them. If you find anything totally odd, please contact me.

Performance

Well, to sum it up, I can’t wait for browsers to get true graphics hardware acceleration. JavaScript performance is ok for latest browsers, but drawing a couple of thousand circles noticeably blocks the UI thread in browsers for some cities. Also a really annoying bug in Webkit’s Canvas Compositing prevents the awesome intersection feature from working in Chrome and Safari. If you can change anything about that, I urge you to do it.

Mapnificent requires Web Worker and Canvas to work. I tried the latest official versions of Opera (fast), Chrome (fast), Safari (ok) and Firefox (slow) as well as the latest beta versions of Webkit, Chromium (both seem to have a bug that cuts of the lower part of the canvas) and Firefox 4. I didn’t try Internet Explorer 9 (not sure if they already have Web Worker support), but I don’t have Vista or Windows 7, so I won’t be able to test it there anyways.

If I was a research department in a big company I would say that the product is ready for the browsers of the average web user in 2-3 years and would wait accordingly. But since that is not the case, I release it now, even though it pains me to see people using Mapnificent in suboptimal environments. Please browser developers, make that pain go away soon.

What is it for?

Mapnificent ist not a trip planning service and you should in fact use the trip planning services of the public transport providers that are linked to in the bottom left corner of the site.

Mapnificent is rather a tool to explore many fuzzy trips at once to see more possibilities where to live, to work or to meet up. With the integrated Google Local Search you may discover that even though some place is a few miles away, it is still reachable with public transport in a couple of minutes. Mapnificent on its own is fun to play with and looks awesome, but to uncover the real use cases, it has a JavaScript API that lets you integrate a Mapnificent overlay into your Google Maps application.

The old Mapnificent

I ported Berlin and London to the new Mapnificent, but they both rely on hand scraped data sources which are difficult to maintain. The old blog post about Mapnificent is wildly outdated, but might be fun read. The source of the new Mapnificent is not published as Open Source at the moment (I still have to sort some stuff out), but you can still find the old Mapnificent source on my GitHub account.

Closing Words

I want to thank all the beta testers that gave awesome feedback, Mapnificent would be uglier and less usable without you. I am excited to bring this project to the next level and I can’t even begin to imagine where Mapnificent is going from here.

It’s Mapnificent

This blog post is wildly outdated in most parts. A description of the latest Mapnificent version can be found here: A Mapnificent World.

Mapnificent visualizes interesting data sets with interactive layers on a map of Berlin. It was inspired by Mapumental, a project by MySociety. Mapumental looks awesome and I wanted to build something like that. I am going to talk about how I implemented Mapnificent, specifically the Public Transport layer and how it compares to Mapumental.

But let me shortly describe how Mapumental seems to work (did not try it myself, it’s still in private beta). The project page of Mapumental on mysociety.org explains that Mapumental is basically a flash application that loads images from a tile server. As I understand it, when you set a point on the Mapumental map covering whole UK, Mapumental will calculate the quickest route from every station in UK (there are 300.000) to this point with an optimized routing algorithm written in C++ based on the actual timetables combined with heavy caching. It’s kind of hardcore and requires a cloud setup to scale.

Before Mapumental there was Travel Maps by mysociety, a program that calculated journeys and displayed them in one image. However, you have to fill in a form to request a quote for a custom map. So it takes a lot more time and is not for free. I am just guessing here, but they probably just made a lot of requests to a travel planner site and scraped the result for their custom maps.

I started similarly. I build an interactive map that shows how long a journey to the Hasso Plattner Institute in Potsdam, Germany would take when you start from any point in a grid over the general area of Berlin. For this I actually queried the VBB Fahrinfo site about 30.000 times and it took a day to gather all the data (their service is not really fast).

When I had a minute value for every point in the grid, I wondered how to visualize this efficiently without falling back to proprietary technologies like flash and heavy stuff like tile servers. My first approach was using Google Maps API Overlay Classes. To keep it simple: doesn’t scale. Too many datapoints each one represented in the DOM did not work for me.

Then I experimented with the Canvas element. I placed a canvas outside but still over the Google Map. Obviously that sucks: you can’t click on the map anymore. With the help of the really handy ELabel (crazy homepage) I placed the canvas inside the map: now you can click on the map and move it, great. But wait, how about zooming in? That doesn’t work. My fix was making the canvas bigger so it would cover the same geographical area. However, apparently no browser I know can handle a canvas with over 50 million pixels. They all just crash and it’s kind of understandable. To counter that I restricted zooming so a user could only zoom to a certain level. But that sucks, too. What I wanted was a canvas that should not grow too big, but the user should still be able to drag the map around without seeing edges of the canvas where there shouldn’t be any. So some not so pretty calculations later and the dimensions of the canvas are now three times bigger than the respective dimensions of the map and the position of the canvas will adapt when a user drags the viewport of the map over a certain threshold.

So that was the visualization part, no flash, no tile-server. What about the Public Transport map that works for every point? Any C++ written routing in the cloud? No, it’s all done in JavaScript. And takes less than two seconds to calculate. So, is this really possible? No, of course not, I fake a lot: I don’t use actual timetables, but only estimates, don’t care about buildings when walking from/to a station and sometimes I am really just guessing a number. But it turns out that this approach still works good enough.

Details? Mapnificent basically knows the location of every station in Berlin, the lines that connect them, the time between two adjacent stations on a line and the time interval of the line. When a user clicks on the map, Mapnificent finds the next couple of stations, calculates the time to walk to them and then takes all the lines from these stations and every other reached station. When switching lines it “waits” half the interval of the next line (or currently six minutes if the line interval is unknown). Mapnificent remembers the time it took to get to a station and if you reach that station with a time less than before, it overrides it and keeps going. If it reaches a station where another line got faster, it stops. Really simple recursion. The outcome is a list of minutes it takes to get to every station. Mapnificent then simply draws a growing circle from every station as soon as this station is reached.

I can’t prove it, but it looks good and by checking some known routes I would say the margin of error is generally at most +-4 minutes which I think is still acceptable if you just want to get the big picture. And Mapnificent is just for the big picture anyways. If you want the real thing, you still need to ask an online trip planner. And if there is S-Bahn chaos in Berlin again, you can’t even trust the trip planner.

Since I have not used Mapumental myself, I can’t really say how it compares to Mapnificent. Mapnificent just works for Berlin whereas Mapumental apparently works for the whole UK. Mapumental is probably really accurate with actual timetables whereas Mapnificent uses a good basis of public transport information and some guesstimates. Mapumental looks really awesome, it needs some time to calculate a route if not cached (10 to 30 seconds according to their website), but the interface is really fast (flash + rendered tiles). Mapnificent looks awesome too (with some flaws here and there), calculates routes faster (but less correct) and the interface is fast enough, but the speed depends on the browser, JavaScript engine, implementation of canvas, number of active layers and zoom level. Currenlty the Mapnificent interface won’t really scale with a lot of active layers (that’s why you can deactivate them).

However, the point I want to make is that the setup of Mapumental is apparently quite complex, expensive (server side routing, rendering, caching) and generally hardcore while Mapnificent is just JavaScript and Canvas running in the browser and performs the same task with comparable results.

JavaScript also makes it really easy to open up the framework for addional layers. Users can add their own layer by just writing a couple of lines of JavaScript and providing a JSONP data source. Have a look at the Mapnificent documentation for details.

I want to thank the creators of Mapumental for their great and really inspiring work.

Update:

I got an invite to Mapumental’s private beta! First hurdle for a non-UK citizen, how the hell do UK postcodes work? You need to enter one of the location you want to arrive at. They give examples, but I wanted to arrive in London and it’s really hard to guess a UK postcode right. First impression when I finally found one on the net: Looks nice, tile server is slow (obviously they can’t compare to Google Maps, but I guess they are working on it) and the Flash overlay lags a bit when dragged.

But all that aside, it is really impressive: you can actually travel the whole UK! I knew that before, but when I saw it, I understood the amount of work behind it. Mapnificent has only 2213 stations for Berlin, that’s 3% of what Mapumental is providing. It’d be really interesting to scale Mapnificent up to this magnitude (or is it mapnitude?). It’s relieving that they are actually also just drawing circles from stations and that their travel patterns look similar: it means I got it quite right. Well, let’s see where we go from here.