Writing a parser for a structured binary format such as a 3D model is extremely annoying. You first have to declare your file structure, and then go over every structure again and write the code to parse it. This is mainly due to the lack of introspection in C/C++, and to performance concerns.

jParser is available on GitHub. It works in both Node.js and the browser.

In Javascript, it does not have to be that way! jParser is a class that only asks you to write a JSON version of the structure. It will parse the file automatically for you.

Here is an example of what you could write with jParser:

var description = {
  header: {
    magic: ['string', 4],
    version: 'uint32'
  },
  model: {
    header: 'header'
  }
};
 
var model = new jParser(file, description).parse('model');
console.log(model);
// {
//   header: {
//     magic: 'MD20',
//     version: 272
//   }
// }

Description

Standard Structure

The description object defines the blocks that need to be parsed. In the previous example, we define two blocks, header and model, where each block is a list of labelled sub-blocks.

Default blocks such as int32, char, double are provided by jDataView.

This organization makes it easy to reproduce C structures. Let's see the description of the BMP format.

// Javascript Description
header: {
  header_sz:     'uint32',
  width:         'int32',
  height:        'int32',
  nplanes:       'uint16',
  bitspp:        'uint16',
  compress_type: 'uint32',
  bmp_bytesz:    'uint32',
  hres:          'int32',
  vres:          'int32',
  ncolors:       'uint32',
  nimpcolors:    'uint32'
}
// C Structure
typedef struct {
  uint32_t header_sz;
  int32_t  width;
  int32_t  height;
  uint16_t nplanes;
  uint16_t bitspp;
  uint32_t compress_type;
  uint32_t bmp_bytesz;
  int32_t  hres;
  int32_t  vres;
  uint32_t ncolors;
  uint32_t nimpcolors;
} BITMAPINFOHEADER;
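
For comparison, here is a sketch of what reading the same header by hand could look like with the standard DataView API. The function name readBitmapInfoHeader and the sample values are made up for the illustration; the little-endian flag is there because BMP stores its integers in little-endian order.

```javascript
// Hand-written counterpart of the description above (illustrative only).
// Every field needs its own call and its own hard-coded offset.
function readBitmapInfoHeader(view, offset) {
  var LE = true; // BMP integers are little-endian
  return {
    header_sz:     view.getUint32(offset + 0, LE),
    width:         view.getInt32(offset + 4, LE),
    height:        view.getInt32(offset + 8, LE),
    nplanes:       view.getUint16(offset + 12, LE),
    bitspp:        view.getUint16(offset + 14, LE),
    compress_type: view.getUint32(offset + 16, LE),
    bmp_bytesz:    view.getUint32(offset + 20, LE),
    hres:          view.getInt32(offset + 24, LE),
    vres:          view.getInt32(offset + 28, LE),
    ncolors:       view.getUint32(offset + 32, LE),
    nimpcolors:    view.getUint32(offset + 36, LE)
  };
}

// Build a 40-byte header in memory to try it out (made-up values).
var bmpView = new DataView(new ArrayBuffer(40));
bmpView.setUint32(0, 40, true);   // header_sz
bmpView.setInt32(4, 640, true);   // width
bmpView.setInt32(8, -480, true);  // height
bmpView.setUint16(12, 1, true);   // nplanes
bmpView.setUint16(14, 24, true);  // bitspp

var bmpHeader = readBitmapInfoHeader(bmpView, 0);
console.log(bmpHeader.width, bmpHeader.height, bmpHeader.bitspp);
```

The offsets have to be kept in sync with the field sizes by hand, which is exactly the bookkeeping the declarative description avoids.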

Reference Structures

As you may have noticed, instead of using basic blocks, we can use our own blocks. In the following example, uvAnimation uses animationBlock, which in turn uses nofs:

nofs: {
  count: 'uint32',
  offset: 'uint32'
},
 
animationBlock: {
  interpolationType: 'uint16',
  globalSequenceID: 'int16',
  timestamps: 'nofs',
  keyFrame: 'nofs'
},
 
uvAnimation: {
  translation: 'animationBlock',
  rotation: 'animationBlock',
  scaling: 'animationBlock'
}

Functions

At this point, it is possible to express C structures built from fixed-size fields and nested structures, and to parse files that could be loaded using a simple read. We now need to add logic to our parser, which we do with anonymous functions.

Recursive Parsing

Reading consecutive blocks is a common operation. We can make an array block that takes a block name and a count. It parses all these blocks and aggregates them into a Javascript array.

array: function (type, length) {
  var array = [];
  for (var i = 0; i < length; ++i) {
    array.push(this.parse(type));
  }
  return array;
},

To call a function, we use an array literal whose first element is the block name and whose remaining elements are the arguments. This makes float2, float3 and float4 easy to define.

float2: ['array', 'float', 2],
float3: ['array', 'float', 3],
float4: ['array', 'float', 4]

We can use the array block to build a string block. We parse an array of char and join it.

string: function (length) {
  return this.parse(['array', 'char', length]).join('');
},
 
filename: ['string', 32]
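
As a standalone illustration of the same idea, the sketch below reads a fixed-length field byte by byte from a standard DataView and joins the characters. The helper names readFixedString and trimAtNul are hypothetical, and cutting at the first NUL byte is an assumption about how unused bytes are padded, not something the string block above does.

```javascript
// Read `length` bytes as one-byte characters and join them, the same way
// the string block joins an array of char.
function readFixedString(view, offset, length) {
  var chars = [];
  for (var i = 0; i < length; ++i) {
    chars.push(String.fromCharCode(view.getUint8(offset + i)));
  }
  return chars.join('');
}

// Assumption: unused bytes are NUL padding, so keep what comes before it.
function trimAtNul(text) {
  var end = text.indexOf('\0');
  return end === -1 ? text : text.slice(0, end);
}

// A 32-byte field holding a short name; ArrayBuffer starts zero-filled.
var nameView = new DataView(new ArrayBuffer(32));
'model.m2'.split('').forEach(function (c, i) {
  nameView.setUint8(i, c.charCodeAt(0));
});

var rawName = readFixedString(nameView, 0, 32);
var fileName = trimAtNul(rawName);
console.log(rawName.length, fileName);
```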

Seek & Tell

In World of Warcraft models, there is a small structure called nofs that tells us "There are [count] consecutive [type] at [offset]". We build a struct block to parse this pattern. It uses seek and tell to navigate through the file.

nofs: {
  count: 'uint32',
  offset: 'uint32'
},
 
struct: function (type) {
  // Read the count & offset
  var nofs = this.parse('nofs');
 
  // Save the current offset & Seek to the new one
  var pos = this.tell();
  this.seek(nofs.offset);
 
  // Read the array
  var result = this.parse(['array', type, nofs.count]);
 
  // Seek back & Return the result
  this.seek(pos);
  return result;
},
 
triangles: ['struct', 'uint16'],
properties: ['struct', 'boneIndices']
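
To see the save, seek and restore steps in isolation, here is a small self-contained sketch. Cursor and readUint16Struct are hypothetical names built on the standard DataView, not the jParser API, and the buffer layout is made up for the example.

```javascript
// Hypothetical cursor over a DataView (little-endian), with just enough
// methods to show the pattern.
function Cursor(view) {
  this.view = view;
  this.offset = 0;
}
Cursor.prototype.tell = function () { return this.offset; };
Cursor.prototype.seek = function (offset) { this.offset = offset; };
Cursor.prototype.uint16 = function () {
  var value = this.view.getUint16(this.offset, true);
  this.offset += 2;
  return value;
};
Cursor.prototype.uint32 = function () {
  var value = this.view.getUint32(this.offset, true);
  this.offset += 4;
  return value;
};

// Follow a nofs record to an array of uint16, then come back.
function readUint16Struct(cursor) {
  var count = cursor.uint32();
  var offset = cursor.uint32();

  var pos = cursor.tell();   // position right after the nofs record
  cursor.seek(offset);

  var result = [];
  for (var i = 0; i < count; ++i) {
    result.push(cursor.uint16());
  }

  cursor.seek(pos);          // resume where the caller left off
  return result;
}

// Layout: nofs {count: 3, offset: 12}, one uint32 field, then the data.
var nofsView = new DataView(new ArrayBuffer(18));
nofsView.setUint32(0, 3, true);
nofsView.setUint32(4, 12, true);
nofsView.setUint32(8, 0xCAFE, true);
[10, 20, 30].forEach(function (value, i) {
  nofsView.setUint16(12 + i * 2, value, true);
});

var cursor = new Cursor(nofsView);
var triangles = readUint16Struct(cursor);
var nextField = cursor.uint32(); // read continues right after the nofs record
console.log(triangles, nextField);
```

Because the position is restored, the field that follows the nofs record is read as if the jump never happened.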

Code

The code that powers this is only 30 lines long (70 including the standard integral types). It just handles each possible kind of description.

parse: function (description, param) {
  var type = typeof description;
 
  // Function: call it with the arguments taken from the array literal
  if (type === 'function') {
    return description.apply(this, param || []);
  }
 
  // Shortcut: 'string' == ['string']
  if (type === 'string') {
    description = [description];
  }
 
  // Array: Function Call
  if (description instanceof Array) {
    return this.parse(this.description[description[0]], description.slice(1));
  }
 
  // Object: Structure
  if (type === 'object') {
    var output = {};
    for (var key in description) {
      if (description.hasOwnProperty(key)) {
        output[key] = this.parse(description[key]);
      }
    }
    return output;
  }
 
  throw new Error('Unknown description type ' + description);
}
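
To try the dispatch logic end to end, here is a minimal runnable sketch built around the same idea. It is not jParser itself: MiniParser, builtins and read are hypothetical names, it reads from a standard DataView in little-endian order, and it passes the arguments of the array literal straight to the block function.

```javascript
// Minimal sketch, not the real library.
function MiniParser(view, description) {
  this.view = view;
  this.offset = 0;
  // User blocks are layered on top of the built-in ones.
  this.description = Object.assign({}, MiniParser.builtins, description);
}

// Built-in blocks are functions called with `this` set to the parser.
MiniParser.builtins = {
  uint16: function () { return this.read('getUint16', 2); },
  uint32: function () { return this.read('getUint32', 4); },
  float: function () { return this.read('getFloat32', 4); },
  char: function () { return String.fromCharCode(this.read('getUint8', 1)); },
  array: function (type, length) {
    var array = [];
    for (var i = 0; i < length; ++i) {
      array.push(this.parse(type));
    }
    return array;
  },
  string: function (length) {
    return this.parse(['array', 'char', length]).join('');
  }
};

// Read one value with the given DataView getter and advance the offset.
MiniParser.prototype.read = function (method, size) {
  var value = this.view[method](this.offset, true);
  this.offset += size;
  return value;
};

MiniParser.prototype.parse = function (description, param) {
  var type = typeof description;
  if (type === 'function') {
    return description.apply(this, param || []);
  }
  if (type === 'string') {
    description = [description]; // 'uint32' is shorthand for ['uint32']
  }
  if (Array.isArray(description)) {
    var name = description[0];
    if (!Object.prototype.hasOwnProperty.call(this.description, name)) {
      throw new Error('Unknown block ' + name);
    }
    return this.parse(this.description[name], description.slice(1));
  }
  if (type === 'object' && description !== null) {
    var output = {};
    for (var key in description) {
      if (Object.prototype.hasOwnProperty.call(description, key)) {
        output[key] = this.parse(description[key]);
      }
    }
    return output;
  }
  throw new Error('Unknown description type ' + description);
};

// In-memory file: the 4 characters 'MD20' followed by the uint32 272.
var view = new DataView(new ArrayBuffer(8));
'MD20'.split('').forEach(function (c, i) {
  view.setUint8(i, c.charCodeAt(0));
});
view.setUint32(4, 272, true);

var model = new MiniParser(view, {
  header: { magic: ['string', 4], version: 'uint32' },
  model: { header: 'header' }
}).parse('model');
console.log(JSON.stringify(model));
// {"header":{"magic":"MD20","version":272}}
```

Keeping the built-in types in the same table as the user blocks means a string such as 'uint32' and a user block such as 'header' go through exactly the same lookup.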

Conclusion

This little parser shows how the dynamic features of Javascript, such as object literals, anonymous functions and dynamic typing, can be combined to build a powerful and easy-to-use tool.

I don't want to release the library just yet, as I need to explore more use cases and find elegant solutions for them too. But I hope it inspires you to use the full power of Javascript.

Demo

You can see it in action in my 0.1% completed Javascript WoW Model Viewer demo. The two following files are important:

  • http://twitter.com/g_marty Guillaume Marty

    Great post!
    About other use cases, how would you parse bit fields to structures?

    typedef struct Foo {
    int flag : 1;
    int counter : 15;
    } Foo;

    I hope you'll release this lib soon!

  • http://blog.vjeux.com/ Vjeux

    I would probably do something like this:

    foo: ['bit', {
    flag: 1,
    counter: 15
    }]

    What I'm investigating is the addition of back-references. For example if you want to parse a BMP you would want to write

    header: {
    width: 'uint32',
    height: 'uint32'
    },
    bmp: {
    header: 'header',
    pixels: ['array', 'uint3', header.width * header.height]
    }

    But how do you express header.width and header.height, given that the header has not been parsed yet? Back-references could also be very useful for structures with conditions, such as an extended header that is only present when some values are set.

    extended_header: {
    // ...
    },
    header: {
    version: 'uint32'
    },
    bmp: {
    header: 'header',
    extended_header: ['cond', 'extended_header', header.version > 42]
    }

    Often, the first number of a block is its length. I'd like to be able to pad the structure automatically to that length.

    data: ['padding', data.size * 8, {
    size: 'uint32',
    // ...
    }]

    The last thing I would explore is error handling. For example, if the magic number at the beginning is not "MD20", then throw an error. I'd like to be able to add checks so that, if the description is no longer correct, we can easily spot where it went wrong.

  • Henrik

    Looking forward to playing with your code in some pet projects.

    Even prior to the first release -- do you mind adding an OS license to your github projects jDataView and jsWoWModelViewer? Or did I miss one?

    Thank you for considering,
    Henrik

  • Vjeux

    They are both under the Do What The Fuck You Want To Public License:

    http://sam.zoy.org/wtfpl/

    You can use it however you want, in public/private projects, modify it, tell that you wrote it ... Just have fun with it :)

  • http://twitter.com/g_marty Guillaume Marty

    So you have great plans for this lib!

    I'm highly interested in having back references and error handling on BinaryParser, as well as bitfields.

    I'm currently working on porting a C lib to JavaScript and it relies on many structures, so I might be helpful to test new features (remember my pull request on jDataView yesterday?!).

    Don't hesitate, I'd be glad to help!

  • Mike

    How would I parse bit fields? I.e. uint5. I see in your ico example on GitHub you make a new type for a nibble but I can't figure out how to apply this for any sized field.

    Thanks for the great library :-)

  • http://seriyps.ru/me/ Sergey

    How does it handle big-endian and little-endian byte-orders?
    Also, can I suggest this library https://github.com/squaremo/bitsyntax-js ?

  • http://twitter.com/RReverser Ingvar Stepanyan

    Are you still going to add bitfield support into jParser/jDataView? I'll need that soon and wondering if you still have plans to add it or if I have to implement it myself.

  • http://blog.vjeux.com/ Vjeux

    I don't have plans. Feel free to do it and submit a pull request :)

    --
    Christopher "vjeux" Chedeau
    Facebook Engineer
    http://blog.vjeux.com/
