Why JSON will continue to push XML out of the picture
The world’s digital infrastructure is currently characterized by a plethora of data interchange formats. It’s not the least bit surprising that such a multiplicity undergirds things at the moment. The internet is scarcely a generation old, while the “Internet of Things” and “Big Data” more closely resemble regulative ideals than realities. But I nonetheless believe that there are strong, discernible historical tendencies currently at work in this field, tendencies that favor JSON over its competitors.
Ten years ago, XML was the primary data interchange format. When it came on the scene, it was a breath of fresh air and a vast improvement over the truly appalling SGML (Standard Generalized Markup Language). It enabled people to do previously unthinkable things, like exchange Microsoft Office documents across HTTP connections. With all the dissatisfaction surrounding XML, it’s easy to forget just how crucial it was in the evolution of the web in its capacity as a “Swiss Army Knife of the internet.”
I strongly believe that this transformation can be attributed to three broad trends, which I’ll discuss in turn:
1. APIs (application programming interfaces)
2. Big Data
3. The Internet of Things
1. APIs (application programming interfaces)
Like it or not, today’s web landscape remains heavily siloed in a lot of crucial respects. There’s tons of information out there that you will never, ever be privy to (and this extends beyond things like authentication information that should be secret in principle). But beginning with companies like eBay in the mid-oughts, APIs have come along as a kind of de-siloing force.
This has created a scenario in which organizations like Twitter, Facebook, LinkedIn, and millions of others (a) essentially offer information-based services in exchange for data, and (b) increasingly have an interest in opening up a wide variety of information to third parties. A lot of that data never sees the light of day (hence the silo metaphor). I’ll have a lot more to say about APIs next week, but suffice it to say for now that they’re a force to be reckoned with, and changes in that space leave a mark on the rest of the web.
There isn’t a lot of hard data on XML vs. JSON usage in APIs, but sources like Programmable Web strongly suggest that while XML is still a major player in the world of APIs, JSON’s star is rising fast. Twitter’s API went JSON-only almost two years ago. Foursquare has followed suit.
Scott Gilbertson agrees with my general assessment: “When it comes to the data APIs serve up, XML is still the most used format, but JSON is hot on its heels and growing much faster. Even though there are still more XML APIs, the more recent the API, the more likely it’s serving JSON. In many cases…companies are rapidly moving from XML to JSON.” Gilbertson was making this argument more than a year ago, but there’s little to suggest that his argument is any less valid today.
In short: no longer simply a cool thing to have or an appendage to a web service, APIs have become, in Gilbertson’s words, a “first class citizen of the web.” On top of that, REST is displacing SOAP as the dominant style for exchanging data over the web. XML is not REST-incompatible, of course, but if SOAP continues its precipitous decline, then XML usage will shrink along with it.
2. Big Data
The rise of JSON as a key player in database technologies is another bad portent for XML. As it stands, Big Data does not have a preferred data interchange format per se. But the claim that I’d like to make about Big Data and JSON is a bit more specific. What I’d like to argue is that JSON is emerging as a preferred format in web-centric, so-called “NoSQL” databases. These are databases that are (a) intended to accommodate massive scalability, (b) designed to deal with data that often does not seamlessly conform to a columnar/relational model, and (c) web-oriented at their very core.
The most well-known examples of databases of this sort are MongoDB, CouchDB, and Riak. All three are JSON-based, horizontally scalable, and deeply web-driven.
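To illustrate why JSON suits the document model these databases share, here is a minimal sketch (the record and its fields are hypothetical, not drawn from any real MongoDB collection) of a single nested document that would otherwise be normalized across several relational tables:

```python
import json

# A hypothetical user record as a document store might hold it.
# Nested arrays and objects capture one-to-many relationships inline,
# rather than splitting them into separate joined tables.
user = {
    "name": "Ada",
    "signup_year": 2012,
    "emails": ["ada@example.com", "ada@work.example.com"],  # one-to-many, inline
    "profile": {"timezone": "UTC", "newsletter": True},     # nested sub-document
}

doc = json.dumps(user)       # the on-the-wire / on-disk representation
restored = json.loads(doc)   # round-trips losslessly

print(restored["emails"][1])  # → ada@work.example.com
```

Because the document is itself JSON, there is no translation layer between what the database stores and what a web client consumes.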
Other examples abound: the architecture of Amazon’s DynamoDB is entirely REST/JSON. Neo4J, a graph database that challenges much of our conventional thinking about what databases are all about, has a REST/JSON API, with no corresponding XML support. HBase’s REST architecture currently supports XML, but that support is on the way to deprecation.
For some time now, it has been possible, via various means, to feed queries into MySQL and get JSON back, though the conversion typically happens outside the database itself, in application code or user-defined functions. The same goes for Postgres and other relational databases. But MySQL, Postgres, and the others were not constructed with JSON as a fundamental building block.
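The general pattern looks something like the following sketch, with SQLite standing in for MySQL or Postgres (the table and its columns are made up for illustration): the database hands back rows, and the application serializes them to JSON after the fact.

```python
import json
import sqlite3

# In-memory database standing in for MySQL/Postgres; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# The database returns plain tuples; the JSON conversion happens in
# application code, not inside the database engine.
cursor = conn.execute("SELECT id, name FROM users")
columns = [desc[0] for desc in cursor.description]
rows = [dict(zip(columns, row)) for row in cursor.fetchall()]

print(json.dumps(rows))
# → [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
```

The point is that JSON here is bolted on at the edges: the database itself knows nothing about it.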
For Postgres, this will soon be changing. In version 9.2, Postgres will be adding support for a JSON data type, which will “allow for hybrid document-relational databases which can store JSON documents, and JSON functions which convert array and row data into JSON” (quoted from this article). Although Postgres has had an XML data type for some time, this change strikes me as a not-so-subtle acknowledgment of the rising importance of JSON.
There are a few databases out there that are XML-based (such as MarkLogic), but there isn’t any movement in this sphere analogous to what we’re seeing involving the rapid adoption of JSON-based storage models.
3. The Internet of Things
Movements in this sphere are more difficult to discern than in the other spheres I’ve mentioned. The Internet of Things remains an idea, albeit a powerful one. It remains far too unrealized to be able to make claims about ideal or even preferred data formats. For now, the internet is basically a whole bunch of computers hooked up to a relative handful of things.
We haven’t yet reached a point where you can look around at a densely interconnected world of objects and almost feel the JSON coursing through the air. But who knows?
One way or another, the future is bright for JSON
It would be quite surprising if the above-mentioned tendencies had nothing to do with JSON itself. Many have argued that JSON is better because it’s less “verbose” than XML, and more readily intelligible to humans than formats like pure binary.
These factors have certainly helped JSON, but our developer Matthew Lyon has a far more compelling argument. He believes JSON’s rise has to do with the fact that JSON possesses a very limited set of data types: it’s essentially restricted to null, Booleans, numbers, strings, arrays, and objects (dictionaries). It doesn’t even have a Date data type. JSON is thus not only generally less verbose than XML: it is more parsimonious in its use of data types. Restricting itself to primitive data types makes JSON deeply and immediately interoperable with pretty much any programming language out there (in fact, the list of languages on JSON’s main page is frankly staggering).
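That parsimony is visible in any language’s standard JSON support. A quick sketch in Python (the payload is invented for illustration) shows both the entire type repertoire and the Date gap Lyon points to:

```python
import json
from datetime import datetime

# JSON's entire repertoire: null, Booleans, numbers, strings, arrays, objects.
payload = {"ok": True, "count": 3, "tags": ["a", "b"], "extra": None}
print(json.dumps(payload))

# There is no Date type, so serializers reject datetimes outright...
try:
    json.dumps({"when": datetime(2012, 6, 1)})
except TypeError:
    print("datetime is not JSON-serializable")

# ...and dates must instead be encoded by convention, typically as
# ISO 8601 strings.
print(json.dumps({"when": datetime(2012, 6, 1).isoformat()}))
# → {"when": "2012-06-01T00:00:00"}
```

Every mainstream language has native equivalents for those six types, which is precisely why a JSON parser is easy to write and easy to trust.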
Overall, my claim isn’t really as audacious as it might seem at first. It basically has two components: (1) in order to have a global digital infrastructure, you need to have pervasive data interchange formats to knit everything together and establish intelligibility across nodes; and (2) there are good reasons to think that JSON will someday hold a privileged position in our digital architecture. Our expectations and skill sets should register this change and adjust accordingly.
Postscript: According to AppFog dev Troy Howard, we should all just be using YAML, which is a superset of JSON. That particular discussion is going to have to wait for another day!