It's often said that XML is very verbose and therefore JSON is better. I wanted to challenge that assumption and find the smallest way to represent any JSON value using XML.

Constants
true
<t/> 0
false
<f/> - 1
null
<l/> 0
Number
NaN
<n/> + 1
123.45
<n>123.45</n>
+ 6
String

""
<s/> + 2
"Abcd"
<s>Abcd</s>
+ 4
Array
[]
<a/> + 2
[1, "two", false]
<a>
  <n>1</n>
  <s>two</s>
  <f></f>
</a>
+ 4 - n
Object
{}
<o/> + 2
{
  "first": 1,
  "second": "two",
  "third": false
}
<o>
  <n k="first">1</n>
  <s k="second">two</s>
  <f k="third"></f>
</o>
+ 5 + n

As you can see, the XML counter part is a bit more verbose and less readable because of the way syntax highlighting is setup. However, while it is bigger, it isn't out of proportion bigger. The structure can be at most twice as big.

Implementation

In order to implement it, I decided to go with the same API as JSON:

  • XSON.stringify(object, formatter, space)
  • XSON.parse(string)

You can play with it on this current page or can check it out on GitHub.

The implementation was more straightforward than I expected thanks to the fact that there's a XML Parser inside browsers. However, I had to deal with nasty encoding issues 🙁

String encoding

There are some characters such as < and \0 that we want to escape because otherwise they are likely to be problematic while parsing the XML. The way to encode those characters in XML is to use the &#number; notation where number is the character code. For example a is represented by a:

> new DOMParser().parseFromString('<a>&#97;</a>', 'text/xml')
    .getElementsByTagName('a')[0].textContent
"a"

Unfortunately, you cannot express all the characters with this notation. The XML specification introduces Restricted Characters in the following ranges: [#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84] and [#x86-#x9F]. When you try to read those characters, then the XML parser generates an error.

> new DOMParser().parseFromString('<a>&#0;</a>', 'text/xml')
    .getElementsByTagName('a')[0].textContent
"error on line 1 at column 8: xmlParseCharRef: invalid xmlChar value 0"

Instead of fighting with the XML spec, I decided to use my own encoding. I replace the character by \u0000. Where the number is an hexadecimal representation of the number padding so it has exactly four digits.

To do that, we need first to escape all the \ and can use a regex to do it in few lines of code 🙂

function encode(str) {
   return str
     .replace(/\\/g, '\\\\')
     .replace(/[\u0000-\u0008\u000b-\u001f&<>"\n\t]/g, function(c) {
       var hex = c.charCodeAt(0).toString(16);
       while (hex.length < 4) {
         hex = '0' + hex;
       }
       return '\\u' + hex;
     });
 }

Then, in order to decode it, we do the opposite: we first decode all the unicode characters and remove the escapes. In order to make sure that the unicode character was not escape, I'm using a small trick. You can count the number of \. If it's an even number, then it is not escaped, otherwise it is escaped!

function decode(str) {
  return str
    .replace(/(\\*)\\u([0-9a-f]{4})/g, function(match, backslash, n) {
      if (backslash.length % 2 !== 0) {
        return match;
      }
      return backslash + String.fromCharCode(parseInt(n, 16));
    })
    .replace(/\\\\/g, '\\');
}
If you liked this article, you might be interested in my Twitter feed as well.
 
 

Related Posts

  • September 22, 2011 URLON: URL Object Notation (43)
    #json, #urlon, #rison { width: 100%; font-size: 12px; padding: 5px; height: 18px; color: #560061; } I am in the process of rewriting MMO-Champion Tables and I want a generic way to manage the hash part of the URL (#table__search_results_item=4%3A-slot). I no longer […]
  • August 20, 2011 Idea – mouseFreeze – A solution for Browser FPS Games (8)
    There is an open problem in porting real game into the web browser related to cursor handling. Problem Many games such as First-Person Shooters require the mouse to freely move, without the constraints of screen edges. However there is no such API in the browser to make this […]
  • March 6, 2012 Github Oauth Login – Browser-Side (15)
    I'm working on an application in the browser that lets you take notes. I don't want to have the burden to save them on my own server therefore I want to use Github Gists as storage. The challenge is to be able to communicate with the Github API 100% inside the browser. Since it is a […]
  • September 11, 2011 World of Warcraft HTML Tooltip Diff (0)
    MMO-Champion is a World of Warcraft news website. When a new patch is released, we want to show what has changed in the game (Post Example). An english summary of each spell change is hand written, but we want to show the exact tooltip changes. jsHTMLDiff is available on […]
  • March 19, 2012 MMO-Champion Miscellaneous Work (0)
    WoWDB Design I was the only active developper on db.mmo-champion.com and since I was no longer working at Curse, they decided to restart a database project, WoWDB.com, on the shiny Cobalt platform that powers SWOTR, Aion and Rift databases. The release of Mist of Pandaria beta […]