Monthly Archives: May 2013

Crunching subway data- a New Yorker’s busiest stations

There are many reasons to complain about the subway system here in New York. It’s underfunded, the air conditioning breaks, and if you’ve ever tried relying on the G line you’ve probably ended up with a deep, serious commitment-phobia. But there are many bright spots of the subway system, and as a tech-head developer I’d like to draw your attention to one in particular- data.

The MTA makes a ton of data available. The entire subway and bus systems are available as GTFS feeds, allowing you to set up your own instance of OpenTripPlanner for all your subway routing needs- something I used in the aftermath of Sandy to set up an emergency trip planner (and OpenPlans then used to create some great heatmaps). It has data on each and every MetroCard swipe in the city, and, er, Pantone colours for each of the subway lines. It also has a crazy amount of data on each subway turnstile in the city, which is what I’ve been playing around with lately.

The most popular stations in New York are already known- the MTA itself has them listed. Unsurprisingly, Times Square tops the list by a large margin, with the other big stations following behind. They don’t share how they came to those numbers, but I assume that they include everyone who travels on the subway- commuters, tourists, even the Mariachi guys traveling from train to train. For a heatmap side project I’m currently working on, I wanted to know which stations are the most popular with weekday commuters.

So I downloaded some turnstile data from mid-January to late April of this year to use as my sample set. The format the MTA uses is… weird, to say the least. The basics are there- it doesn’t record every turnstile turn, but rather reports cumulative totals at set times during the day (3am, 7am, 12pm, and so on). For some reason it has eight repeating sets of columns that should really be rows, so I threw together a quick and dirty node.js script to flatten these out (and merge all my CSV files into one) and imported the data into a Postgres database.
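To give an idea of what that flattening involves, here is a minimal sketch rather than my actual script. It assumes each line starts with three identifying fields and is followed by repeating groups of five fields (date, time, description, entries, exits). That layout, the field names and the sample line are all assumptions made for illustration.

```javascript
// Assumed layout (not confirmed above): three identifying fields, then
// repeating groups of five fields: date, time, description, entries, exits.
const KEY_FIELDS = 3;
const GROUP_SIZE = 5;

// Turn one wide line into one object per reading group.
function flattenLine(line) {
  const fields = line.trim().split(",").map((f) => f.trim());
  const [controlArea, unit, turnstile] = fields;
  const rows = [];
  for (let i = KEY_FIELDS; i + GROUP_SIZE <= fields.length; i += GROUP_SIZE) {
    const [date, time, description, entries, exits] = fields.slice(i, i + GROUP_SIZE);
    rows.push({
      controlArea,
      unit,
      turnstile,
      date,
      time,
      description,
      entries: Number(entries),
      exits: Number(exits),
    });
  }
  return rows;
}

// Made-up sample line containing two reading groups.
const sampleLine =
  "CA1,U1,T1,2013-01-21,03:00:00,REGULAR,100,40,2013-01-21,07:00:00,REGULAR,110,55";
const flattened = flattenLine(sampleLine);
console.log(flattened);
```

Each wide line becomes one row per reading, which is a much friendlier shape for a database import.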

First off, I needed to get my commuter totals. I did this by creating a view that ran an exceptionally messy SELECT statement, which selected the first row of exit data available at or after 3am, then matched it up with the first result after 11am- while excluding all weekend results:
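For anyone who would rather see that selection logic outside the database, here is a rough JavaScript sketch of the same idea. It is not the view itself. It assumes the readings for a single turnstile have already been flattened into objects with `date`, `time` and cumulative `exits` fields; those names and the sample values are made up for illustration.

```javascript
// True for Monday to Friday. Dates are ISO strings such as "2013-01-21".
function isWeekday(isoDate) {
  const day = new Date(isoDate + "T00:00:00Z").getUTCDay(); // 0 = Sunday, 6 = Saturday
  return day !== 0 && day !== 6;
}

// For one turnstile: exits between the first reading at or after 3am
// and the first reading after 11am, for each weekday.
function morningExits(readings) {
  const byDate = new Map();
  for (const r of readings) {
    if (!byDate.has(r.date)) byDate.set(r.date, []);
    byDate.get(r.date).push(r);
  }

  const result = [];
  for (const [date, dayReadings] of byDate) {
    if (!isWeekday(date)) continue;
    // Zero-padded "HH:MM:SS" strings sort and compare correctly as text.
    dayReadings.sort((a, b) => a.time.localeCompare(b.time));
    const start = dayReadings.find((r) => r.time >= "03:00:00");
    const end = dayReadings.find((r) => r.time > "11:00:00");
    if (start && end) {
      result.push({ date, exits: end.exits - start.exits });
    }
  }
  return result;
}

// Made-up readings: one Saturday (skipped) and one Monday.
const sampleReadings = [
  { date: "2013-01-19", time: "03:00:00", exits: 900 },
  { date: "2013-01-19", time: "12:00:00", exits: 950 },
  { date: "2013-01-21", time: "03:00:00", exits: 1000 },
  { date: "2013-01-21", time: "07:00:00", exits: 1040 },
  { date: "2013-01-21", time: "12:00:00", exits: 1300 },
];
const morning = morningExits(sampleReadings);
console.log(morning);
```

Because the counters are cumulative, the morning figure is just the difference between the two readings.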

As I said, awful SQL. If anyone has any suggestions for improvements I’d love them. But it worked. For each turnstile, I now have the number of exits taken during the morning rush hour(…ish). Unfortunately, I quickly realised that I’d need to clean the data up- it appears that at certain points, turnstiles just go absolutely haywire and you end up with -200000 exits on one day, which can really mess with your totals. I discovered that I could easily chop this data out just by calculating how far that day’s result deviated from the turnstile’s overall median. The anomalous results were so different that I could set the cutoff point at 10x the median and still exclude them.
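As a sketch of that clean-up step (one reading of it, not my exact query): take a turnstile's daily totals, find the median, and drop any day whose distance from the median is more than ten times the median itself. The function names and sample numbers below are made up for illustration.

```javascript
// Median of a list of numbers (does not modify the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Keep only the days whose deviation from the median is within
// `factor` times the median. Assumption: "10x the median" is read as
// |value - median| <= 10 * |median|. A median of 0 would drop every
// non-zero day, so a real version would need to handle that case.
function dropOutliers(dailyExits, factor = 10) {
  const m = median(dailyExits);
  const limit = factor * Math.abs(m);
  return dailyExits.filter((v) => Math.abs(v - m) <= limit);
}

// Made-up daily totals for one turnstile, including one haywire day.
const dailyTotals = [310, 295, 305, -200000, 300];
const cleaned = dropOutliers(dailyTotals);
console.log(cleaned);
```

Using the median rather than the mean matters here, because a single -200000 day would drag a mean far enough to make the cutoff meaningless.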

With that done, it was only a short step to aggregate the data up to the station level, and discover the most popular commuter stations, or, The Stations With The Most Turnstile Exits During Peak-ish Hours on Weekdays:

  1. 14th St Union Square
  2. 42nd St Grand Central
  3. 42nd St Times Square
  4. 34th St Penn Station
  5. Fulton St
  6. 47-50th St Rockefeller Plaza
  7. 34th St Herald Square
  8. 23rd St (6)
  9. Chambers St
  10. 59th St – Columbus
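The roll-up from turnstiles to stations is just a sum and a sort. A minimal sketch, assuming each turnstile row already carries its station name (the `station` and `exits` field names, and the sample values, are placeholders rather than real figures):

```javascript
// Sum per-turnstile morning exits by station, busiest first.
function rankStations(turnstileTotals) {
  const totals = new Map();
  for (const { station, exits } of turnstileTotals) {
    totals.set(station, (totals.get(station) || 0) + exits);
  }
  return [...totals.entries()]
    .map(([station, exits]) => ({ station, exits }))
    .sort((a, b) => b.exits - a.exits);
}

// Placeholder data, not real MTA numbers.
const ranked = rankStations([
  { station: "Station A", exits: 120 },
  { station: "Station B", exits: 400 },
  { station: "Station A", exits: 150 },
]);
console.log(ranked);
```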

So while many of the results are similar to the overall station popularity, there are some definite differences- Union Square jumping to the top being one of the most noticeable. Be careful not to read too much into these numbers- as I said, they’re based on a limited dataset of a few months. And I’d welcome any corrections on my working from people smarter than myself!

Customizing your iOS webapp icon- per user

I threw together a little mobile subway-themed webapp hack last weekend, called Subwalkway. I had to make a quick icon for it, and my immediate thought was to make it look like an NYC subway route sign. Luckily for me, the W line was decommissioned a few years ago, so it’s free for me to steal. But what colour to use? Or should I make some hideous beach ball of all of them? No- that’ll just remind me how slow my MacBook is these days. But I got to thinking- maybe I don’t need to choose. The icon is specified with a <link/> tag after all, so why don’t I just randomise it? So I did:

Gotta catch 'em all!

The logic is very simple/stupid. On page load, I call Math.random() and use that number to choose one of the items in an array of file names. I set that as the href of the <link rel="apple-touch-icon-precomposed" /> tag, and we’re done.
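In code, that logic looks roughly like the sketch below. It is a reconstruction rather than the exact script running on Subwalkway, and the file names are hypothetical. The random source and the document are passed in as parameters only so the functions can be exercised outside a browser.

```javascript
// Hypothetical file names, used for illustration only.
const ICON_FILES = ["icon-red.png", "icon-green.png", "icon-blue.png"];

// Pick one entry from the array using a random number in [0, 1).
function pickIcon(files, random = Math.random) {
  return files[Math.floor(random() * files.length)];
}

// Point the touch-icon <link> at the chosen file, creating the tag if needed.
function setTouchIcon(doc, href) {
  let link = doc.querySelector('link[rel="apple-touch-icon-precomposed"]');
  if (!link) {
    link = doc.createElement("link");
    link.setAttribute("rel", "apple-touch-icon-precomposed");
    doc.head.appendChild(link);
  }
  link.setAttribute("href", href);
  return link;
}

// Only runs in a browser, where `document` exists.
if (typeof document !== "undefined") {
  setTouchIcon(document, pickIcon(ICON_FILES));
}
```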

Obviously my example is a little pointless, but it serves as a proof of concept- iOS will respect whatever icon metadata changes you make after page load. So, there are some more reasonable applications out there- if I expanded the app to Boston, say, I could modify my app icon to better fit the Boston T style. Or if your app works in numerous countries/cities, you could make an Apple Maps-style icon- only with local landmarks.

Dear Google, let’s talk about webapps.

I threw together a little hack project last weekend, called Subwalkway. It’s mobile-only, and it’s a bit of a mess- part UI experiment, part subway navigation tool, and rough around the edges. But if you’re on an iPhone, try adding it to your home screen. It looks right, doesn’t it? An icon (with an easter egg!) that seamlessly blends with the phone interface, a splash screen when you launch it, and no navigation chrome when you’re using it. If you squint a little, you could almost imagine that it was a native app.

Now try doing that on an Android phone. Actually, don’t bother, I’ll save you the effort- it does precisely none of these things. If you’re using Google’s Chrome browser you can’t even add a site to your home screen*. So, today I ask: Google, what the hell? From email to entire office suites, you’ve spent years trying to convince us that the web is the future for software- you even went as far as to create an entire OS based around it. We could be making responsive webapps that work great on ChromeOS and adapt to being perfect first-class citizens in Android- if you let us. Instead we’re forced to provide subpar in-browser experiences, or wrap our apps up in a clunky WebView frame and lose all of the performance and automatic updating that HTML5 can provide.

Other phone manufacturers are catching up to the game- even though webOS is dearly departed, BlackBerry 10 lets us make HTML-based apps. Firefox OS makes them first-class citizens. Even Microsoft allows you to write apps in JavaScript. Why has it been left to Apple- who have every incentive to trap me in the sickly embrace of their Objective-C App Store- to be the pioneer in webapps?

I want to make cross-platform apps using HTML technologies, and I want to make them great. How can it be that Google is the one standing in the way of me achieving that?

(Hat tip to Peter Nixey, whose blog post title I shamelessly ripped off)

* As pointed out in the comments, it actually is possible. But it’s little wonder that I never discovered it, given the steps required.