What the Obama IT team teaches us about polyglot programming
4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests. Design, deploy, dismantle in 583 days to elect the President. #madops – Scott VanDenPlas, Obama for America IT team
From the standpoint of tech geeks like myself, the IT operation behind Obama for America was extremely impressive. Why? Because the operation managed to be all of the following at the same time: 100% cloud-based, deeply polyglot, poly-framework, poly-DB, and, perhaps most importantly, unbelievably agile and time- and resource-constrained. And when you consider that they were passing around terabytes of data across over 100 applications… Yeah, this was inspiring.
There have been a variety of articles on Obama’s IT operation, many of them very engaging and informative, but none of them, I feel, has covered some of the things I want to know: what is the broader significance of an operation like this? What does it tell us about the cloud and the promise it offers (not to mention the potential pitfalls)? About polyglot programming? About the state of web development going into 2013?
So I decided to do some digging to answer my own questions and share my thoughts with all of you.
The Obama stack: a (more or less) merry medley of components
Obama for America’s web dev stack, according to CTO and minor celebrity Harper Reed, consisted of over 100 different apps performing a bewildering array of different functions. I won’t list all of these apps here, of course, but I will shine some light on some of the more important components of the Obama team’s tech arsenal.
I’ll begin with social media. Now, any politician with a brain nowadays has a Facebook page and probably a Twitter account and maybe even a presence on a handful of other platforms. This is simply de rigueur.
But the Obama team did something pretty impressive in their social media tactics: they simply built their own social media platform, Dashboard. This platform acted as a sort of decentralized online field office, complete with message boards and other communication channels for local organizing. Users could organize their own teams, invite people to sign up, upload pictures and customize their Facebook-esque home pages, and so on. And the Call Tool built into Dashboard gave people (semi-)randomized phone numbers to call before and on Election Day. The interface was clean and aesthetically pleasing. Dashboard was a real triumph.
A presence in the social media space is essential to modern campaigns, and now crowdsourcing has to be considered standard as well. The Obama team built a crowdsourcing platform to act as a “voter incident tracker” on Election Day which enabled volunteers to report incidents such as illegal electioneering or improper ballot handling. The team opted to use Ushahidi (written in PHP), originally built to work seamlessly across Facebook, Twitter, email, SMS, etc., as the basis for their platform.
Alongside Dashboard and the Ushahidi crowdsourcing tool were a vast variety of other apps, for example Dreamcatcher, which helped target voters on the basis of sentiments expressed on a variety of platforms. For the sake of brevity I won’t describe all of them here, because it’s far more important that I introduce the most important element of the entire Obama tech stack: the Narwhal data layer.
Narwhal: demonstrating the power of abstraction
Narwhal is probably the most impressive piece of the puzzle. It is a tool charged with “linking separate depositories of information to build a more complete picture of the potential voter” and then exposing that information via API to the more than 100 applications built by the team. Basically, it played the same role for the campaign’s apps that the Twitter, Facebook, and Twilio APIs play for the app ecosystems built around them.
The primary goal of Narwhal was essentially to secure the scalability of the entire system depending on it. If you’re managing a system of 100+ apps, having an ultra-reliable, API-driven Single Point of Truth tying the loose strands together is simply beyond necessary. Narwhal clearly rose to the challenge. The end result is that services like the Dashboard (and all the others) never spoke to any database whatsoever. They never made SQL queries or put key/value pairs in a NoSQL database. The 100+ apps in the stack only ever spoke to the Narwhal API.
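To make that idea concrete, here is a minimal sketch in Python of an abstraction layer sitting between applications and their data. To be clear, this is not Narwhal's actual code or API: the names DataLayer, InMemoryStore, and get_voter are hypothetical, and the in-memory dictionaries stand in for whatever real databases sit behind the layer. The point is only that the calling application sees one interface and never touches a store directly.

```python
from typing import Dict, List, Protocol


class Store(Protocol):
    """Anything that can return the fields it knows about a voter."""

    def fetch(self, voter_id: str) -> dict: ...


class InMemoryStore:
    """Hypothetical stand-in for a backing store (relational, key/value, ...)."""

    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = rows

    def fetch(self, voter_id: str) -> dict:
        # Return a copy so callers cannot mutate the store's data.
        return dict(self._rows.get(voter_id, {}))


class DataLayer:
    """Hypothetical facade: the only object applications ever call."""

    def __init__(self, stores: List[Store]) -> None:
        self._stores = stores

    def get_voter(self, voter_id: str) -> dict:
        # Merge what each store knows into a single profile.
        # Later stores win if two stores define the same field.
        profile: dict = {"id": voter_id}
        for store in self._stores:
            profile.update(store.fetch(voter_id))
        return profile


if __name__ == "__main__":
    contacts = InMemoryStore({"v1": {"phone": "555-0100"}})
    donations = InMemoryStore({"v1": {"donated": True}})
    api = DataLayer([contacts, donations])
    print(api.get_voter("v1"))
```

Swapping one of the stores for a different database would change nothing for the application calling get_voter, which is the agility win described above.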
Well executed APIs are such a powerful thing because they enable the construction of “aggregated organic” systems of this sort. They provide a kind of lingua franca within a system that enables all kinds of “citizens” to seamlessly interact. With a RESTful API-driven platform as solid as Narwhal, suddenly Ushahidi and Rails and Node.js and Java and many others can play quite nicely with one another. Apps can be scaled one by one, in isolation from one another and from the data layer, which constitutes a major agility win.
Another impressive aspect of Narwhal is that it relied on a variety of database systems. The team began with a SQL-based data retrieval system that was to be used by all applications. But they ultimately decided to separate applications and data with a single, common abstraction layer. The bulk of the team’s database needs were filled by Amazon RDS. RDS’s “snapshot” feature enabled the team to perform large data dumps into Amazon’s S3 file store.
According to multiple articles, if the campaign had gone on longer, the team would have had to switch to their own large EC2 instances running MySQL. From the beginning, they really strained the limits of RDS. But staying on RDS meant significant cost savings, as hiring database administrators to manage such a huge and complex transition was safely avoided.
The team also relied on Amazon’s SQS (Simple Queue Service) as the basis for distributed systems-style architecting. Narwhal pulled in streams of data from a variety of sources and passed the data along to a vast constellation of worker applications. Quite often, that data was then passed into SQS queues for further processing, to be made available via a variety of APIs. SQS was also used to process emails in conjunction with RDS and S3 storage. The team even experimented with Amazon’s DynamoDB when it was introduced. It ended up being used in some of the campaign’s get-out-the-vote efforts. It will indeed be interesting to see if key/value stores will become core technological components in future campaign operations.
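The queue-and-workers pattern described here can be sketched with nothing but Python's standard library. This is an in-process illustration, assuming a made-up record shape and a made-up processing step (normalizing a name); it does not use SQS or any AWS client, and the function name run_pipeline is my own invention. A real deployment would replace the local queue with a hosted one and the threads with separate worker applications.

```python
import queue
import threading


def run_pipeline(records, num_workers=3):
    """Push records onto a queue and let a pool of workers process them."""
    inbox = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            record = inbox.get()
            if record is None:  # sentinel: no more work for this worker
                inbox.task_done()
                return
            # Hypothetical processing step: tidy up the name field.
            processed = dict(record)
            processed["normalized_name"] = record["name"].strip().title()
            with lock:
                results.append(processed)
            inbox.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for record in records:
        inbox.put(record)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    inbox.join()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for row in run_pipeline([{"name": " ada lovelace "}, {"name": "alan turing"}]):
        print(row["normalized_name"])
```

Because the producer and the workers share only the queue, either side can be scaled or replaced without the other noticing. Results may arrive in any order.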
With all of these elements tied together, the team had assembled a powerful anchor for the campaign’s digital effort. While Narwhal entailed a lot of work up front (not least of all because of the sheer quantity of data that the designers knew the system would ultimately have to be able to manage), the payoff in terms of agility, according to multiple members of the campaign, was enormous. At its inception, the idea of having a single interface and shared data store for everything probably seemed like a promising but frightening approach. But it paid off because it made every bit of the campaign’s terabytes of data more valuable.
Against all odds, the campaign managed to do all of this with only 30 minutes of downtime over the entire course of the campaign. The whole stack was even architected to resist catastrophic infrastructure failure (which we know to be an all-too-real phenomenon). That resilience was put to the test in late October, when Hurricane Sandy bore down on the East Coast, home to AWS’s US-East region, just days before the election. Fortunately, the team had a failover plan in place to move everything to AWS-US-West in case precisely that scenario transpired.
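For readers who haven't built one, the core of a failover plan is simple to sketch: try the primary region, and if it is unavailable, fall through to the next one. The Python below is a toy illustration under my own assumptions, not the campaign's mechanism. RegionUnavailable, call_with_failover, and the handler functions are all hypothetical, and real failover also involves data replication and DNS or load-balancer changes that this leaves out.

```python
class RegionUnavailable(Exception):
    """Raised by a handler when its region cannot serve the request."""


def call_with_failover(endpoints):
    """Try each (name, handler) pair in order; return the first success.

    Returns a (region_name, result) tuple, or raises RegionUnavailable
    if every region fails.
    """
    errors = []
    for name, handler in endpoints:
        try:
            return name, handler()
        except RegionUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RegionUnavailable("all regions failed: " + "; ".join(errors))


def east_handler():
    # Simulate the primary region being down.
    raise RegionUnavailable("connection refused")


def west_handler():
    return "ok"


if __name__ == "__main__":
    region, result = call_with_failover(
        [("us-east", east_handler), ("us-west", west_handler)]
    )
    print(region, result)
```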
In the digital landscape as we know it today, this kind of approach–polyglot, distributed, fault-tolerant, cloud-based–is the Holy Grail. Monoglots beware.
Polyglot by necessity
The Obama IT team could have opted to go monoglot from the outset. They could have chosen Ruby or Python or PHP or whatever and made their entire technology stack conform to that principle. Database clients, API proxies, crowdsourcing tools, social media platforms, etc., all in one language. In the abstract, there really is something compelling about a vision like that. The people you hire don’t have to be as flexible; getting things to play nicely in the same sandbox is easier.
But knowing what we know about the operation, a monoglot approach would have been absolute MADNESS. By going polyglot, the team was able to capture what I would call comparative advantage gains. If you survey the entire web dev space, you see a bewildering array of tools, each with its own strengths and weaknesses. Most of the time, picking the absolute best tool for a specific task means drawing upon multiple runtime communities. The Ruby community simply hasn’t come up with a crowdsourcing tool as powerful as Ushahidi. Ruby on Rails, however, strikes me as the best possible way to build a platform like Dashboard in a short period of time. This is to give one example among many.
Going polyglot means never being forced to say “no” just because something–or someone–lies outside of the walled garden of a language community. And so yet another important advantage for the Obama team of building a deeply polyglot architecture is that the team was able to reap personnel gains. Team members came from a vast variety of tech companies, from Twitter to Google to Facebook to Quora to Orbitz to small startups and beyond. When you’re simply looking for the best and brightest, you often don’t have a lot of latitude to simply say “we’re using X, so get used to it.”
Note: I relied on articles from Mother Jones, Ars Technica (this article as well as this one), The Atlantic, and Current for almost all of the information contained here.