It's often said that XML is very verbose and therefore JSON is better. I wanted to challenge that assumption and find the smallest way to represent any JSON value using XML.

Constants

  JSON     XML     Difference
  true     <t/>    0
  false    <f/>    -1
  null     <l/>    0

Number

  JSON      XML              Difference
  NaN       <n/>             +1
  123.45    <n>123.45</n>    +6

String

  JSON      XML            Difference
  ""        <s/>           +2
  "Abcd"    <s>Abcd</s>    +4

Array

  JSON    XML     Difference
  []      <a/>    +2

  JSON:
  [1, "two", false]
  XML:
  <a>
    <n>1</n>
    <s>two</s>
    <f></f>
  </a>
  Difference: +4 - n

Object

  JSON    XML     Difference
  {}      <o/>    +2

  JSON:
  {
    "first": 1,
    "second": "two",
    "third": false
  }
  XML:
  <o>
    <n k="first">1</n>
    <s k="second">two</s>
    <f k="third"></f>
  </o>
  Difference: +5 + n

As you can see, the XML counterpart is a bit more verbose, and less readable because of the way syntax highlighting is set up. However, while it is bigger, it isn't disproportionately bigger. The structure can be at most twice as big.

Implementation

In order to implement it, I decided to go with the same API as JSON:

  • XSON.stringify(object, formatter, space)
  • XSON.parse(string)
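To make the mapping concrete, here is a minimal sketch of the stringify direction. It is not the XSON source: toXml, element and escapeText are hypothetical names, the formatter and space parameters are left out, and text is escaped with the \uXXXX scheme described under String encoding. Because it emits no indentation, an empty element always comes out self-closed (<f/> rather than <f></f>). It assumes the input holds only JSON-compatible values.

```javascript
// Hypothetical helper: backslashes first, then \uXXXX for risky characters.
function escapeText(str) {
  return str
    .replace(/\\/g, '\\\\')
    .replace(/[\u0000-\u0008\u000b-\u001f&<>"\n\t]/g, function (c) {
      return '\\u' + c.charCodeAt(0).toString(16).padStart(4, '0');
    });
}

// Wraps content in a one-letter tag, collapsing empty content to <tag/>.
function element(tag, keyAttr, content) {
  return content === ''
    ? '<' + tag + keyAttr + '/>'
    : '<' + tag + keyAttr + '>' + content + '</' + tag + '>';
}

// Converts a JSON-compatible value to the compact XML form.
// `key` is only passed for object members and becomes the k="..." attribute.
function toXml(value, key) {
  var k = key === undefined ? '' : ' k="' + escapeText(key) + '"';
  if (value === true) return element('t', k, '');
  if (value === false) return element('f', k, '');
  if (value === null) return element('l', k, '');
  if (typeof value === 'number') {
    return element('n', k, isNaN(value) ? '' : String(value));
  }
  if (typeof value === 'string') return element('s', k, escapeText(value));
  if (Array.isArray(value)) {
    var items = value.map(function (item) { return toXml(item); });
    return element('a', k, items.join(''));
  }
  // Anything else is treated as a plain object.
  var members = Object.keys(value).map(function (name) {
    return toXml(value[name], name);
  });
  return element('o', k, members.join(''));
}

console.log(toXml([1, 'two', false]));
// prints <a><n>1</n><s>two</s><f/></a>
```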

You can play with it on this page or check it out on GitHub.

The implementation was more straightforward than I expected, thanks to the XML parser built into browsers. However, I had to deal with nasty encoding issues 🙁
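Once the browser has parsed the XML, turning the tree back into a value is a short recursive walk. The sketch below is an assumption about how that could look, not the actual XSON code: fromNode and decodeText are hypothetical names, and the function only relies on tagName, textContent, children and getAttribute, so it works on DOM elements or on any object with the same shape.

```javascript
// Hypothetical sketch: converts one element of the compact XML form back
// into a JavaScript value. `decodeText` undoes the text escaping and
// defaults to the identity function.
function fromNode(node, decodeText) {
  decodeText = decodeText || function (s) { return s; };
  switch (node.tagName) {
    case 't': return true;
    case 'f': return false;
    case 'l': return null;
    case 'n':
      // An empty <n/> stands for NaN.
      return node.textContent === '' ? NaN : Number(node.textContent);
    case 's':
      return decodeText(node.textContent);
    case 'a':
      return Array.from(node.children).map(function (child) {
        return fromNode(child, decodeText);
      });
    case 'o': {
      var result = {};
      Array.from(node.children).forEach(function (child) {
        // The member name lives in the k="..." attribute.
        result[decodeText(child.getAttribute('k'))] = fromNode(child, decodeText);
      });
      return result;
    }
    default:
      throw new Error('Unexpected element <' + node.tagName + '>');
  }
}

// In a browser, the root element would come from the built-in parser:
//   var doc = new DOMParser().parseFromString(xml, 'text/xml');
//   var value = fromNode(doc.documentElement);
```

Using children rather than childNodes skips the whitespace text nodes that indented output would otherwise introduce.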

String encoding

There are some characters, such as < and \0, that we want to escape because otherwise they are likely to cause problems when the XML is parsed. The way to encode those characters in XML is the &#number; notation, where number is the character code. For example, a is represented by &#97;:

> new DOMParser().parseFromString('<a>&#97;</a>', 'text/xml')
    .getElementsByTagName('a')[0].textContent
"a"

Unfortunately, you cannot express every character with this notation. The XML specification defines Restricted Characters in the following ranges: [#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84] and [#x86-#x9F]. When you try to read those characters, the XML parser generates an error.
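The ranges listed above are easy to check in code. This is only an illustrative sketch: isRestricted and RESTRICTED_RANGES are hypothetical names, and the table is copied from the list above rather than from any library. Note that the null character (code 0) falls outside these ranges even though it is one of the characters the article wants to escape.

```javascript
// Ranges copied from the Restricted Characters list, as [first, last] pairs.
var RESTRICTED_RANGES = [
  [0x01, 0x08],
  [0x0b, 0x0c],
  [0x0e, 0x1f],
  [0x7f, 0x84],
  [0x86, 0x9f]
];

// Hypothetical helper: true when the first character of `ch` falls inside
// one of the ranges above.
function isRestricted(ch) {
  var code = ch.codePointAt(0);
  return RESTRICTED_RANGES.some(function (range) {
    return code >= range[0] && code <= range[1];
  });
}

console.log(isRestricted('\u0001')); // prints true
console.log(isRestricted('\t'));     // prints false (code 9 is not listed)
```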

> new DOMParser().parseFromString('<a>&#0;</a>', 'text/xml')
    .getElementsByTagName('a')[0].textContent
"error on line 1 at column 8: xmlParseCharRef: invalid xmlChar value 0"

Instead of fighting with the XML spec, I decided to use my own encoding. I replace the character with \uXXXX, where XXXX is the hexadecimal representation of the character code, padded so that it has exactly four digits. For example, the null character becomes \u0000.

To do that, we first need to escape all the \ characters; then a regex does the rest in a few lines of code 🙂

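function encode(str) {
  return str
    // Escape backslashes first so that a literal \ stays distinguishable
    // from the \uXXXX sequences introduced below.
    .replace(/\\/g, '\\\\')
    // Replace control characters and XML-sensitive characters with \uXXXX.
    .replace(/[\u0000-\u0008\u000b-\u001f&<>"\n\t]/g, function(c) {
      var hex = c.charCodeAt(0).toString(16);
      // Left-pad with zeros to exactly four hexadecimal digits.
      while (hex.length < 4) {
        hex = '0' + hex;
      }
      return '\\u' + hex;
    });
}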

Then, in order to decode it, we do the opposite: we first decode all the \uXXXX sequences and then remove the backslash escapes. To make sure that a \uXXXX sequence was not itself escaped, I'm using a small trick: count the number of \ in front of it. If it's an even number, the sequence is not escaped; otherwise it is escaped!

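function decode(str) {
  return str
    // Capture the run of backslashes that precedes each \uXXXX sequence.
    .replace(/(\\*)\\u([0-9a-f]{4})/g, function(match, backslash, n) {
      // An odd count means the \ before "u" is the second half of an
      // escaped backslash, so the sequence is literal text: leave it alone.
      if (backslash.length % 2 !== 0) {
        return match;
      }
      return backslash + String.fromCharCode(parseInt(n, 16));
    })
    // Finally, collapse escaped backslashes back into single ones.
    .replace(/\\\\/g, '\\');
}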
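A quick way to convince yourself that the two functions are inverses is a round-trip check. The block below repeats encode and decode from the article so that it runs on its own; the sample strings are my own picks, chosen to cover the tricky cases (a literal \u sequence and a backslash directly followed by an escaped character).

```javascript
// encode() and decode() are copied from the article, without comments.
function encode(str) {
  return str
    .replace(/\\/g, '\\\\')
    .replace(/[\u0000-\u0008\u000b-\u001f&<>"\n\t]/g, function (c) {
      var hex = c.charCodeAt(0).toString(16);
      while (hex.length < 4) {
        hex = '0' + hex;
      }
      return '\\u' + hex;
    });
}

function decode(str) {
  return str
    .replace(/(\\*)\\u([0-9a-f]{4})/g, function (match, backslash, n) {
      if (backslash.length % 2 !== 0) {
        return match;
      }
      return backslash + String.fromCharCode(parseInt(n, 16));
    })
    .replace(/\\\\/g, '\\');
}

var samples = [
  'plain text',
  'a<b & "c"',          // XML-sensitive characters
  'nul\u0000char',      // a character the parser refuses
  'C:\\path',           // a lone backslash
  'literal \\u0041',    // text that merely looks like an escape
  'slash \\\u0000 nul'  // backslash directly followed by an escaped character
];

samples.forEach(function (sample) {
  var roundTripped = decode(encode(sample));
  // JSON.stringify makes the control characters visible in the log.
  console.log(JSON.stringify(encode(sample)), roundTripped === sample);
});
```

Every line should end with true. The literal \u0041 case is the one the backslash-counting trick exists for: it is encoded as \\u0041, and the odd number of preceding backslashes tells decode to leave it alone.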
 
  • http://www.facebook.com/warren.seine Warren Seine

    You should include a comparison of a large file, including gzipped sizes. In the end, it's basically the same size.

    I never really understood what made JSON so successful. I really don't care which one I use, but I'm still wondering why we needed to renew our default exchange format. Parsing is hard? Maybe encoding issues, like the ones you show? Readability (meh…)?

  • Martin Davis

    @Warren:

    There are a few reasons why people prefer JSON to XML:

    - clearer syntax. The approach in the post is a very pragmatic approach to defining an equivalent encoding, but the JSON is just more readable.

    - more powerful semantics. XML doesn't have arrays or typed constants. Period. So it needs workarounds to provide this. JSON has them built in.

    - ecosystem. This follows from the last point - because there are more constructs native to JSON, the JSON ecosystem handles them all natively. Thus it provides more functionality "out of the box" than XML does. And of course a huge part of the JSON ecosystem is JavaScript, which gives a powerful language with JSON fully embedded.

    There's lots I don't like about JSON (#1 being the lack of a standard schema language). But purely pragmatically speaking, it's just a lot easier to work with in the Web ecosystem. Which like it or not is taking over a good chunk of the development world.

  • http://christopher.lord.ac clord

    I don't think it's accurate to say the semantics (and ecosystem) of JSON are more powerful than those of XML (of all things). First of all, the point of JSON is to be restricted and low power. Second, XML is very (extremely) powerful, with namespaces, query languages, transform languages, attributes and *arrays* of children nodes.

  • http://christopher.lord.ac clord

    The forced schema of objects + arrays is probably the main thing going for JSON. It's simpler than most other encodings and maps cleanly to language constructs, and hence has a lower cognitive load for beginner/intermediate developers. The ease of using it is probably another factor. Also, XML has several problems attributable to its complexity.

 
