Why JSON will continue to push XML out of the picture

The world’s digital infrastructure is currently characterized by a plethora of data interchange formats. It’s not the least bit surprising that such a multiplicity undergirds things at the moment. The internet is scarcely a generation old, while the “Internet of Things” and “Big Data” more closely resemble regulative ideals than realities. But I nonetheless believe that there are strong, discernible historical tendencies at work in this field, tendencies that strongly favor JSON over its competitors.

Ten years ago, XML was the primary data interchange format. When it came on the scene, it was a breath of fresh air and a vast improvement over the truly appalling SGML (Standard Generalized Markup Language). It enabled people to do previously unthinkable things, like exchange Microsoft Office documents across HTTP connections. With all the dissatisfaction surrounding XML today, it’s easy to forget just how crucial it was to the evolution of the web in its capacity as a “Swiss Army knife of the internet.”

But it’s no secret that in the last few years, a bold transformation has been afoot in the world of data interchange. The more lightweight, less bandwidth-intensive JSON (JavaScript Object Notation) has emerged not just as an alternative to XML, but as a potential full-blown successor. A variety of historical forces are now converging to render XML less and less relevant and to crown JSON as the privileged data format of the global digital architecture of the future. I think the only question is how near that future is.

I strongly believe that this transformation can be attributed to four broad trends, which I’ll discuss in turn:

1. APIs (application programming interfaces)
2. Big Data
3. The Internet of Things
4. Full-stack JavaScript

1. APIs

Like it or not, today’s web landscape remains heavily siloed in a lot of crucial respects. There’s tons of information out there that you will never, ever be privy to (and this extends beyond things like authentication information that should be secret in principle). But beginning with companies like eBay in the mid-oughts, APIs have come along as a kind of de-siloing force.

This has created a scenario in which organizations like Twitter, Facebook, LinkedIn, and millions of others (a) essentially offer information-based services in exchange for data, and (b) increasingly have an interest in opening up a wide variety of information to third parties. A lot of that data never sees the light of day (hence the silo metaphor). I’ll have a lot more to say about APIs next week, but suffice it to say for now that they’re a force to be reckoned with, and changes in that space leave a mark on the rest of the web.

There isn’t a lot of hard data on XML vs. JSON usage in APIs, but sources like ProgrammableWeb strongly suggest that while XML is still a major player in the world of APIs, JSON’s star is rising fast. Twitter’s API went JSON-only almost two years ago, and Foursquare has followed suit.

Scott Gilbertson agrees with my general assessment: “When it comes to the data APIs serve up, XML is still the most used format, but JSON is hot on its heels and growing much faster. Even though there are still more XML APIs, the more recent the API, the more likely it’s serving JSON. In many cases…companies are rapidly moving from XML to JSON.” Gilbertson was making this argument more than a year ago, but there’s little to suggest that his argument is any less valid today.

In short: no longer simply a cool thing to have or an appendage to a web service, APIs have become, in Gilbertson’s words, a “first class citizen of the web.” On top of that, REST is steadily displacing SOAP as the dominant approach to building web APIs. XML is not incompatible with REST, of course, but if SOAP continues its precipitous decline, then XML usage will shrink along with it.
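The appeal on the consuming side is easy to demonstrate. Here is what handling a JSON API response looks like in JavaScript; the payload below is hypothetical (the field names are loosely modeled on a tweet), but the parsing step is exactly this simple: one built-in call, no envelope to unwrap, no DOM to walk.

```javascript
// A hypothetical REST response body, loosely modeled on the kind of
// payload a JSON-only API like Twitter's returns. Field names and
// values here are invented for illustration.
const body = '{"id": 12345, "user": {"screen_name": "appfog"}, "text": "JSON all the way down"}';

// One built-in call turns the response into ordinary JavaScript objects.
const tweet = JSON.parse(body);

console.log(tweet.user.screen_name); // "appfog"
console.log(tweet.text);             // "JSON all the way down"
```

Compare that with the equivalent XML workflow: pick a parser, traverse nodes, pull out text content, and coerce the types yourself.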

2. Big Data

The rise of JSON as a key player in database technologies is another bad portent for XML. As it stands, Big Data does not have a preferred data interchange format per se. But the claim that I’d like to make about Big Data and JSON is a bit more specific. What I’d like to argue is that JSON is emerging as a preferred format in web-centric, so-called “NoSQL” databases. These are databases that are (a) intended to accommodate massive scalability, (b) designed to deal with data that often does not seamlessly conform to a columnar/relational model, and (c) web-oriented at their very core.

The most well-known examples of databases of this sort are MongoDB, CouchDB, and Riak. All three are JSON-based, horizontally scalable, and deeply web-driven.
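A sketch of what “JSON-based” means in practice for these stores: the unit of storage is a document, not a row. The field names below are invented for illustration, but the shape is the kind of nested, schemaless record MongoDB or CouchDB holds natively.

```javascript
// A sketch of a schemaless document of the sort a document database
// stores natively. The structure (customer, line items) would require
// several joined tables in a relational model; here it is one record.
const order = {
  _id: "order-1001",
  customer: { name: "Ada", email: "ada@example.com" },
  items: [
    { sku: "fog-01", qty: 2, price: 9.5 },
    { sku: "fog-02", qty: 1, price: 24.0 }
  ]
};

// Because the document *is* JSON, serializing it for storage or for a
// REST response is a single call -- no object-relational mapping layer.
const stored = JSON.stringify(order);

// Application code works with the same nested structure directly.
const total = order.items.reduce((sum, i) => sum + i.qty * i.price, 0);
console.log(total); // 43
```

That round trip, from application object to wire format to stored document, with no translation in between, is a large part of why these databases feel so web-native.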

Other examples abound: the architecture of Amazon’s DynamoDB is entirely REST/JSON. Neo4j, a graph database that confounds much of our conventional thinking about what databases are for, has a REST/JSON API with no corresponding XML support. HBase’s REST interface currently supports XML, but that support is on its way to deprecation.

For some time now, it has been possible, via various means, to feed queries into MySQL and get JSON back (typically through user-defined functions or serialization in the application layer). The same goes for Postgres and other relational databases. But MySQL, Postgres, and the others were not constructed with JSON as a fundamental building block.

For Postgres, this will soon be changing. In version 9.2, Postgres will be adding support for a JSON data type, which will “allow for hybrid document-relational databases which can store JSON documents, and JSON functions which convert array and row data into JSON” (as one preview of the release put it). Although Postgres has had an XML data type for some time, this change strikes me as a not-so-subtle acknowledgment of the rising importance of JSON.

There are a few databases out there that are XML-based (such as MarkLogic), but there isn’t any movement in this sphere analogous to what we’re seeing involving the rapid adoption of JSON-based storage models.

3. The Internet of Things

Movements in this sphere are more difficult to discern than in the others I’ve mentioned. The Internet of Things remains an idea, albeit a powerful one, and it is far too unrealized for us to make claims about ideal or even preferred data formats. For now, the internet is basically a whole bunch of computers hooked up to a handful of things.

But it deserves mention that JSON has begun to establish a toehold in this realm. There’s a library for using JSON on the Arduino. It is argued in the book “Architecting the Internet of Things” (p. 102) that “JSON is better adapted [than XML] to devices with limited capabilities such as smart things. Furthermore, it can be parsed to JavaScript objects. This makes it an ideal candidate for integration into Web Mashups.” You can construct LED gauges running on JSON. Your next thermostat might run on JSON.
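The bandwidth argument for constrained devices is easy to make concrete. Below, the same hypothetical thermostat reading is encoded both ways (both encodings are invented for illustration); the JSON payload comes out measurably smaller, and that difference compounds across millions of chatty devices.

```javascript
// The same sensor reading, once as JSON and once as a minimal XML
// encoding. Both formats are hypothetical; the point is the relative
// byte count on the wire.
const json = JSON.stringify({ device: "thermostat-7", tempC: 21.5, ts: 1340000000 });
const xml = '<reading><device>thermostat-7</device><tempC>21.5</tempC>' +
            '<ts>1340000000</ts></reading>';

console.log(json.length < xml.length); // true: closing tags cost bytes
```

XML repeats every field name as a closing tag; JSON pays for each name once.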

We haven’t yet reached a point where you can look around at a densely interconnected world of objects and almost feel the JSON coursing through the air. But who knows?

4. Full-stack JavaScript

In addition to the three forces mentioned above, there is one more that deserves brief mention: JavaScript is the new hotness, and that probably won’t change anytime soon. node.js has gone mainstream and the community around it is rabidly productive; new client-side JavaScript libraries appear every single day; JavaScript is already the lingua franca of the web. To say that the people involved in this growing branch of the web dev world prefer JSON to XML is more than a wee bit of an understatement.

Sure, there’s an XML parser for node, but it’s largely geared toward dealing with legacy XML-based endpoints. The fact remains that if you’re doing top-to-bottom, full-stack JavaScript, using anything besides JSON is borderline silly. And full-stack JavaScript has already gone mainstream.

One way or another, the future is bright for JSON

It would be quite surprising if the tendencies mentioned above had nothing to do with the intrinsic merits of JSON itself. Many have argued that JSON is better because it’s less “verbose” than XML, and more readily intelligible to humans than formats like pure binary.

These factors have certainly helped JSON, but our developer Matthew Lyon has a far more compelling argument. He believes JSON’s rise has to do with the fact that JSON possesses a very limited set of data types: it’s essentially restricted to null, Booleans, numbers, strings, arrays, and dictionaries. It doesn’t even have a Date data type. JSON is thus not only generally less verbose than XML; it is more parsimonious in its use of data types. Restricting itself to primitive data types makes JSON deeply and immediately interoperable with pretty much any programming language out there (in fact, the list of languages on JSON’s main page is frankly staggering).
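Lyon’s point is easy to see in code. The document below exercises JSON’s entire type system in a few lines; the Date example shows the flip side of that parsimony, since dates must travel as strings under some agreed-upon encoding (usually ISO 8601).

```javascript
// JSON's entire type system in one document: null, booleans, numbers,
// strings, arrays, and objects (dictionaries). Nothing else -- which is
// why every mainstream language maps it cleanly onto native types.
const doc = '{"n": null, "flag": true, "pi": 3.14, "name": "fog", "list": [1, 2], "map": {"k": "v"}}';
const parsed = JSON.parse(doc);
console.log(Array.isArray(parsed.list)); // true

// The flip side of that parsimony: no Date type. Dates travel as
// strings, and both sides must agree on the encoding.
const serialized = JSON.stringify({ when: new Date(0) });
console.log(serialized); // {"when":"1970-01-01T00:00:00.000Z"}
```

Six types is a small enough surface that every language’s parser can cover it completely; XML’s open-ended schema machinery offers far more power, at the price of far more per-language machinery.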

Overall, my claim isn’t really as audacious as it might seem at first. It basically has two components: (1) in order to have a global digital infrastructure, you need to have pervasive data interchange formats to knit everything together and establish intelligibility across nodes; and (2) there are good reasons to think that JSON will someday hold a privileged position in our digital architecture. Our expectations and skill sets should register this change and adjust accordingly.

Postscript: According to AppFog dev Troy Howard, we should all just be using YAML, which is a superset of JSON. That particular discussion is going to have to wait for another day!

  • http://danpalmer.me Dan Palmer

    As a web developer I love using JSON. It’s easy to output and requires considerably less effort to use. However as an app developer, error handling on the formatting of it is very tiresome, and for big data, parsing it can actually be quite slow.

    For many developers, more guarantees on data integrity and format are needed and so XML schemas are invaluable for ensuring correctness of software. For example, I would want my bank to be using properly validated XML in their transaction APIs.

    Speed is the other issue. XML parsers are a very well researched and perfected area, options are generally fast and mature. On the other hand JSON parsing isn’t quite at the same level yet. You can even buy hardware XML decoders, although these are very specialist and not in common use.

    JSON is a great data format for many web APIs, but it’s not suitable for everything, and XML still has its uses. All well-designed, mature, publicly accessible APIs should support both formats.

  • Anonymous

    “When it came on the scene, it was a breath of fresh air and a vast improvement over the truly appalling SGML”

    Can you elaborate on why you think this? I’ve used both, and for documents I prefer SGML.

  • Hogart

    I just want XML to stay where it’s most useful, and JSON to stay where it’s most useful. No need to evangelize them outside their niches like they’re revolutionary, because that will only cause JSON et al to go the way of XML – used too much for the wrong things until it’s stuck with that stigma.

  • Frank

    You really should do something about your background colour or your link colour.

    • Frank

      It seems like there was just something wrong when I first loaded the page. Sorry.

  • Roger

    I agree with all your main points, but I think that you are confusing *XML* with the *abuse of XML*. XML as the hammer and any data that needed to be exchanged were the nails.

  • Anonymous

    You obviously are confused about the different uses of each and are trying to shoehorn one into the other. I would hate to have everything stored in my text files as JSON when trying to convert between multiple formats of documents. I would also prefer my data sent from the browser to the server in JSON unless it was a few text strings in which case I’d prefer XML. If it weren’t for search engines, I would far prefer serving XML to browsers and let XSL transform them than do anything with JSON.

  • Emre

    I love to use JSON, but sometimes I need CDATA.

  • Anonymous

    JSON is a really lean data format, and it will continue to dominate. I wrote an article outlining three reasons why I think you should use JSON; take a look :)

  • piyush

    I have a question regarding JSON: is it able to store images, or is it only used for text and Boolean values? I’m really confused, because if JSON is going to work as a database format, then it should be able to store them.

    • Peter Drinnan

      My company is using MongoDB to store binary blobs, so I guess the answer would be yes.

  • ijabz

    So basically you’re saying JSON is starting to get the tools that XML has had for years. How does that make it better? The only point I agree with is the last one: it’s a more natural fit if you’re using node.js.

  • David Webber

    Out today: the new CAM editor v3.0 release allows you to use JSON or XML alongside each other. Pick whichever format suits your application needs; no need to sacrifice one for the other. It provides full validation services using the CAMV validation rules engine and Open Data API creation services. See http://www.cameditor.org for more details.

  • Clemens

    If you need a JSON editor on Windows take a look at JSONBuddy http://www.xml-tools.com/download.htm

    It supports syntax-coloring, auto-completion, well-formed checking and validation of JSON data. You can also convert CSV input to JSON, XML to JSON and do JSON Pointer evaluation.

  • Eduardo

    I see JSON as a light format for the internet. The main benefit is for mobile applications and external interfaces. In my opinion it is just an external interface and only that. But moving and saving everything in JSON format is criminal, because you waste time reprocessing. Imagine that! We have stable relational databases! Why do we need to move everything to JSON? JSON is only a lightweight way to present data, that’s all!
