C++ has a new standard called C++0x (Wikipedia, Bjarne Stroustrup) that includes many interesting features such as Lambda, For Each, List Initialization ... Those features are so powerful that they let you write C++ as if it were Javascript.

The goal of this project is to transform C++ into Javascript. We want to be able to copy & paste Javascript into C++ and be able to run it. While this is not 100% feasible, the result is quite amazing.

This is only a prototype. In about 600 lines of code we manage to implement the core of the Javascript language.

You can view the source and compile examples at the JSPP Github Repository.

JSON

The Javascript Object Notation can be emulated thanks to C++0x initialization lists and a bit of operator overload hackery. _ has an operator [] that returns a KeyValue object, which has an operator = overload that fills both the key and the value. Each value of the initialization list is then handled as follows: if it is an object, it is treated like an Array element (add one to the length and use the length as key); if it is a KeyValue, both key and value are set.

There is an ambiguity with nested initialization lists, so we use _() to cast the list into an Object. It is probably possible to fix it.

C++

var json = {
    _["number"] = 42,
    _["string"] = "vjeux",
    _["array"] = {1, 2, "three"},
 
    _["nested"] = _({
        _["first"] = 1
    })
};
 
std::cout << json;
// {array: [1, 2, three], nested: {first: 1},
//  number: 42, string: vjeux}
Javascript

var json = {
    "number": 42,
    "string": "vjeux",
    "array": [1, 2, "three"],
 
    "nested": {
        "first": 1
    }
};
 
console.log(json);
// {number: 42, string: 'vjeux',
//  array: [1, 2, three], nested: {first: 1}}
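
The list-processing rule described above can be sketched in plain Javascript. This is only an illustration of the idea, not the jspp source: KeyValue as written here and buildObject are hypothetical stand-ins, the initialization list is modelled as an array, and the exact order of "use the length as key" and "add one to the length" is my assumption.

```javascript
// Hypothetical stand-in for the C++ KeyValue type produced by _["key"] = value.
function KeyValue(key, value) {
  this.key = key;
  this.value = value;
}

// Build an object from an initialization list (modelled here as an array).
function buildObject(list) {
  var obj = { length: 0 };
  list.forEach(function (item) {
    if (item instanceof KeyValue) {
      // A KeyValue sets both the key and the value.
      obj[item.key] = item.value;
    } else {
      // Anything else is treated like an array element:
      // use the current length as the key, then add one to the length.
      obj[obj.length] = item;
      obj.length += 1;
    }
  });
  return obj;
}

var arrayLike = buildObject([1, 2, "three"]);
var mixed = buildObject([new KeyValue("number", 42), "first"]);
console.log(arrayLike.length, mixed.number, mixed[0]);
```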

Function

C++0x added lambda to the language with the following syntax: [capture] (arguments) -> returnType { body }. function is a macro that transforms function (var i) into [=] (Object This, Object arguments, var i) -> Object. This lets us use the Javascript syntax and sneakily add the this and arguments magic variables.

C++ is strongly typed and even lambdas have types. We can overload the Object constructor on lambda arity and have a typed container for each one. Then we overload the () operator, which calls the stored lambda. We carefully add undefined values for unspecified arguments and fill the This and arguments variables.

In Javascript, when a function does not return a value, it returns undefined. Sadly, we cannot have a default return value in C++, so you have to write it yourself.

Since everything must be typed in C++, we have to add var before the argument name.

C++

var Utils = {
  _["map"] = function (var array, var func) {
    for (var i = 0; i < array["length"]; ++i) {
      array[i] = func(i, array[i]);
    }
    return undefined;
  }
};
 
var a = {"a", "b", "c"};
std::cout << a;
// [a, b, c]
 
Utils["map"](a, function (var key, var value) {
  return "(" + key + ":" + value + ")";
});
std::cout << a;
// [(0:a), (1:b), (2:c)]
Javascript

var Utils = {
  "map": function (array, func) {
    for (var i = 0; i < array["length"]; ++i) {
      array[i] = func(i, array[i]);
    }
 
  }
};
 
var a = ["a", "b", "c"];
console.log(a);
// [a, b, c]
 
Utils["map"](a, function (key, value) {
  return "(" + key + ":" + value + ")";
});
console.log(a);
// [(0:a), (1:b), (2:c)]
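
Here is a rough Javascript model of the call path described in this section: missing arguments are padded with undefined, and This and arguments are passed explicitly as the first two parameters. wrap is a hypothetical helper name; in jspp this work happens inside the overloaded () operator, so take this as a sketch of the idea only.

```javascript
// Hypothetical model of the jspp call operator.
// `arity` is the number of declared parameters of the stored lambda.
function wrap(lambda, arity) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    var padded = args.slice(0, arity);
    while (padded.length < arity) {
      padded.push(undefined); // unspecified arguments become undefined
    }
    // The stored lambda receives This and arguments as its first two parameters.
    return lambda.apply(null, [this, args].concat(padded));
  };
}

var describe = wrap(function (This, args, a, b) {
  return [args.length, a, b];
}, 2);

console.log(describe(1)); // one argument given, the second one is padded
```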

Closure

There are two ways to capture variables with lambda in C++: either by reference or by value. What we would like is to capture by reference in order for all the variables to be bound to the same object. However, when the initial variable gets out of scope it is destroyed, and any attempt to read it results in a Segmentation Fault!

Instead, we have to capture it by value. It means that a new object is created for each lambda capturing the variable. Our objects are manipulated by reference, meaning that assigning a new value to one copy updates only that copy and not all the others. We introduce a new assignment operator, obj |= value, that updates all the copies.

C++

var container = function (var data) { 
  var secret = data;
 
  return {
    _["set"] = function (var x) {
        secret |= x;
        return undefined;
    },
    _["get"] = function () { return secret; }
  };
};
 
var a = container("secret-a");
var b = container("secret-b");
 
a["set"]("override-a");
 
std::cout << a["get"](); // override-a
std::cout << b["get"](); // secret-b
Javascript

var container = function (data) {
  var secret = data;
 
  return {
    set: function (x) {
        secret = x;
 
    },
    get: function () { return secret; }
  };
};
 
var a = container("secret-a");
var b = container("secret-b");
 
a.set("override-a");
 
console.log(a.get()); // override-a
console.log(b.get()); // secret-b
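
The difference between plain assignment and |= can be modelled in Javascript with handles that point at a shared cell. This is my reading of the mechanism, not the actual jspp implementation: Handle, copy, assign and update are hypothetical names.

```javascript
// Hypothetical model: a handle points at a shared cell holding the value.
function Handle(cell) {
  this.cell = cell;
}
Handle.prototype.copy = function () {
  return new Handle(this.cell);  // capture by value: new handle, same cell
};
Handle.prototype.assign = function (v) {
  this.cell = { value: v };      // plain "=": only this handle changes
};
Handle.prototype.update = function (v) {
  this.cell.value = v;           // "|=": every copy sees the new value
};
Handle.prototype.get = function () {
  return this.cell.value;
};

var secret = new Handle({ value: "secret-a" });
var capturedBySet = secret.copy();
var capturedByGet = secret.copy();

capturedBySet.assign("lost");
var afterAssign = capturedByGet.get(); // the other copy is unaffected

capturedBySet = secret.copy();
capturedBySet.update("override-a");
var afterUpdate = capturedByGet.get(); // the update is visible through the other copy

console.log(afterAssign, afterUpdate);
```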

This

There are four ways to set the this value:

  • Function call: foo(). this is set to the global object. As this is not a proper way to do things, I set it to undefined.
  • Method call: object.foo(). this is set to object.
  • Constructor: new foo(). foo is called with a new instance of this.
  • Explicit: foo.call(this, arguments...). We explicitly set the this value.

All four ways are implemented in jspp, but differently than in Javascript. In Javascript, the language knows the construction and can therefore deduce what this is going to be. In C++, on the other hand, we only have a local view of what is going on. We have to develop another strategy for setting this that works for usual usage patterns.

We associate a this value with every object, undefined by default. If we obtain the object through another object (test.foo), this is set to the base object.

New creates a new function object with this set to itself. Therefore it can be called to initialize the object. Contrary to Javascript, the constructor function has to return this.

C++

var f = function (var x, var y) {
    std::cout << "this: " << this;
    this["x"] = x;
    this["y"] = y;
    return this;
};
 
// New creates a new object this
var a = new (f)(1, 2); // this: [function 40d0]
var b = new (f)(3, 4); // this: [function 48e0]
 
// Unbound call, 
var c = f(5, 6); // this: undefined
 
// Bound call
var obj = {42};
obj["f"] = f;
 
var d = obj["f"](1, 2); // this: [42]
 
// Call
var e = f["call"](obj, 1, 2); // this: [42]
Javascript

var f = function (x, y) {
    console.log("this:", this);
    this["x"] = x;
    this["y"] = y;
 
};
 
// New creates a new object this
var a = new f(1, 2); // this: [object]
var b = new f(3, 4); // this: [object]
 
// Unbound call, 
var c = f(5, 6); // this: global object
 
// Bound call
var obj = [42];
obj["f"] = f;
 
var d = obj["f"](1, 2); // this: [42]
 
// Call
var e = f["call"](obj, 1, 2); // this: [42]
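
The "remember the base object when a property is read" strategy can be sketched in Javascript itself. getProperty and whoAmI are hypothetical names used for illustration; jspp does the equivalent inside its overloaded [] operator.

```javascript
// Hypothetical model of how a property read can remember its base object.
function getProperty(base, key) {
  var value = base[key];
  if (typeof value !== "function") {
    return value;
  }
  // Reading a function through an object yields a callable
  // whose `this` is the base object.
  return function () {
    return value.apply(base, arguments);
  };
}

function whoAmI() {
  "use strict";
  return this; // undefined when called without a base object
}

var obj = { name: "base", f: whoAmI };
var boundThis = getProperty(obj, "f")();
var unboundThis = whoAmI();
console.log(boundThis === obj, unboundThis === undefined);
```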

Prototypal Inheritance

In order to use prototypal inheritance, we can use Douglas Crockford's Object.create.

When reading a property, we try to read it on the current object, and if it does not exist we try again on the prototype. However, when writing a property we want to write it on the object itself. Therefore the returned object contains in fact two objects, one used for reading and one for writing.

C++

var createObject = function (var o) {
    var F = function () {return this;};
    F["prototype"] = o;
    return new (F)();
};
 
var Person = {
    _["name"] = "Default",
    _["greet"] = function () {
        return "My name is " + this["name"];
    }
};
 
var vjeux = createObject(Person);
vjeux["name"] = "Vjeux";
 
var blog = createObject(Person);
blog["name"] = "Blog";
 
var def = createObject(Person);
 
std::cout << vjeux["greet"](); // My name is Vjeux
std::cout << blog["greet"]();  // My name is Blog
std::cout << def["greet"]();   // My name is Default
Javascript

var createObject = function (o) {
    var F = function () {};
    F.prototype = o;
    return new F();
};
 
var Person = {
    name: "Default",
    greet: function () {
        return "My name is " + this.name;
    }
};
 
var vjeux = createObject(Person);
vjeux.name = "Vjeux";
 
var blog = createObject(Person);
blog.name = "Blog";
 
var def = createObject(Person);
 
console.log(vjeux.greet()); // My name is Vjeux
console.log(blog.greet());  // My name is Blog
console.log(def.greet());   // My name is Default
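
The "one object for reading, one for writing" idea can be modelled with a small Javascript helper. createView is a hypothetical name and this is a sketch of the lookup rule only, not the jspp code.

```javascript
// Hypothetical model: reads fall back to the prototype, writes stay local.
function createView(proto) {
  var own = {};
  return {
    read: function (key) {
      return Object.prototype.hasOwnProperty.call(own, key) ? own[key] : proto[key];
    },
    write: function (key, value) {
      own[key] = value; // never touches the prototype
    }
  };
}

var person = { name: "Default" };
var vjeuxView = createView(person);
vjeuxView.write("name", "Vjeux");
var defView = createView(person);

console.log(vjeuxView.read("name"), defView.read("name"), person.name);
```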

Iteration

We use the new iteration facility of C++0x to deal with for(var in) Javascript syntax. We just define in to be :.

As this is a prototype, it currently loops over all the keys of the object. However, it is possible to implement the isEnumerable functionality.

C++

var array = {10, 42, 30};
for (var i in array) {
    std::cout << i << " - " << array[i];
}
// 0 - 10
// 1 - 42
// 2 - 30
// length - 3
// prototype - undefined
 
var object = {
    _["a"] = 1,
    _["b"] = 2,
    _["c"] = 3
};
for (var i in object) {
    std::cout << i << " - " << object[i];
}
// a - 1
// b - 2
// c - 3
// prototype - undefined
Javascript

var array = [10, 42, 30];
for (var i in array) {
    console.log(i, array[i]);
}
// 0 - 10
// 1 - 42
// 2 - 30
 
 
 
var object = {
    "a": 1,
    "b": 2,
    "c": 3
};
for (var i in object) {
    console.log(i, object[i]);
}
// a - 1
// b - 2
// c - 3
//

Dynamic Typing

There is only one class called var. All the operators +, +=, ++, <, * ... are overloaded in order to produce the right behavior. Since this is only a prototype, not all of them work properly or follow the ECMAScript standard.

C++

var repeat = function (var str, var times) {
    var ret = "";
    for (var i = 0; i < times; ++i) {
        ret += str + i;
    }
    return ret;
};
 
std::cout << repeat(" js++", 3);
// " js++0 js++1 js++2"
Javascript

var repeat = function (str, times) {
    var ret = "";
    for (var i = 0; i < times; ++i) {
        ret += str + i;
    }
    return ret;
};
 
console.log(repeat(" js++", 3));
// " js++0 js++1 js++2"

Scope

Scope management is done with lambdas. Since they are implemented in C++0x, it works without pain.

C++

var global = "global";
var $ = "prototype";
var jQuery = "jQuery";
 
_(function (var $) {
    var global = "local";
 
    std::cout << "Inside:      $ = " << $;
    std::cout << "Inside: global = " << global;
 
    // Inside:      $ = jQuery
    // Inside: global = local
 
    return undefined;
})(jQuery);
 
std::cout << "Outside:      $ = " << $;
std::cout << "Outside: global = " << global;
 
// Outside:      $ = prototype
// Outside: global = global
Javascript

var global = "global";
var $ = "prototype";
var jQuery = "jQuery";
 
(function ($) {
    var global = "local";
 
    console.log("Inside:      $ = ", $);
    console.log("Inside: global = ", global);
 
    // Inside:      $ = jQuery
    // Inside: global = local
 
    return undefined;
})(jQuery);
 
console.log("Outside:      $ = ", $);
console.log("Outside: global = ", global);
 
// Outside:      $ = prototype
// Outside: global = global

Reference

As in Javascript, everything is passed by reference. The current implementation uses a simple reference count to handle garbage collection.

C++

var a = {};
a["key"] = "old";
 
var b = a;
b["key"] = "new";
 
std::cout << a["key"] << " " << b["key"];
// new new
Javascript

var a = {};
a["key"] = "old";
 
var b = a;
b["key"] = "new";
 
console.log(a["key"], b["key"]);
// new new

Exception

The Javascript exception mechanism is directly borrowed from C++, therefore we can use the native one.

We need to throw a Javascript object. We can either throw a new instance of a Javascript function or use _() to cast a string into an object.

C++

var go_die = function () {
    throw "Exception!";
};
 
try {
    go_die();
} catch (e) {
    std::cout << "Error: " << e;
}
// Error: Exception!
Javascript

var go_die = function () {
    throw "Exception!";
};
 
try {
    go_die();
} catch (e) {
    console.log("Error:", e);
}
// Error: Exception!

How to use

Note: Only the strict minimum of code able to run the examples has been written. It is a prototype, do not try to use it for any serious development.

The library can be compiled under g++ 4.6, Visual Studio 2010 and the latest version of ICC. However, Visual Studio and ICC do not support initialization lists, so you cannot use the JSON syntax with them. All the other examples will compile.

All the examples of this page are available in the example/ folder. The following execution will let you run the examples.

> make
g++ -o example/dynamic.jspp example/dynamic.cpp -Wall -std=gnu++0x
g++ -o example/exception.jspp example/exception.cpp -Wall -std=gnu++0x
...
> cd example
> ./json.jspp
{array: [1, 2, three], nested: {first: 1}, number: 42, string: vjeux}
> node json.js
{ number: 42,
  string: 'vjeux',
  array: [ 1, 2, 'three' ],
  nested: { first: 1 } }

Pros / Cons

The awesome part is the fact that it is possible to develop nearly all the concepts of Javascript in C++.

Pros

  • Write C++ in a dynamic fashion!
  • Extremely easy to integrate all the existing C++ code base.
  • Fun 🙂

Cons

  • Not possible to optimize as much as the latest Javascript engines.
  • Some features are impossible to write such as eval, with, named functions ...
  • No REPL.
  • A bit more verbose than Javascript.

How to Improve

  • Code the arguments management.
  • Develop the Javascript standard library (operators, Array, Regex ...).
  • Find ways to minimize the C++ overhead (remove the use of _()).
  • Find concepts that I did not introduce.

Stoyan Stefanov did a similar proof of concept but instead of targeting C++ he did it for PHP.

Lazy iteration has been actively researched recently. There are two main strategies: generators and iteration with callbacks.

Generators are widely implemented and their use cases are quite well understood. Mainstream languages only recently implemented lambda functions (Lisp has had them since 1958!), which are required for iteration with callbacks. This article introduces Streams, the most basic way to do iteration with callbacks.

Lazy Iteration

An iterator is used to modularize your code. You first put all your values in a container. Then you have an iterator that iterates through the container, and finally some code that processes the values.

The goal of lazy iteration is to merge the generation and iteration steps. This has several benefits:

  • No storage: values are generated, processed and then garbage collected.
  • Arbitrary number of values: Can represent asynchronous I/O.
  • Earlier processing: As soon as one value is generated, it can be processed. Better for user interactivity.

Generators

Lazy iteration is traditionally done with generators. A generator is a function that returns the next value each time it is called. Languages that support generators provide a yield keyword to return multiple values: when you call the function again, it jumps back to where it stopped.

function* generator() { // function* marks a generator (ES2015 syntax)
  for (var x = 0; x < 10; ++x) {
    for (var y = 0; y < 10; ++y) {
      yield [x, y];
    }
  }
}
 
for (var value of generator()) {
  // Do something with value
}

However, the implementation of generators is not trivial. There are several articles that explain how it works in C#: Behind the Scenes - Yield Keyword, What does Yield keyword generate.

To give you an idea of the complexity, this is a version of the same generator without yield.

var p = null;
function generator() {
  // First time
  if (p == null) {
    p = [0, 0];
    return p;
  }
 
  // Loop through the 2 dimensions (x, y)
  for (var dim = 2 - 1; dim >= 0; --dim) {
    p[dim] += 1;         // i++
    if (p[dim] == 10) {  // i < 10
      p[dim] = 0;        // i = 0
      continue;
    }
    return p;            // return [x, y]
  }
  // If we are here, we got through all the values.
}

Streams

All the functions of this article are available on this page. Just open-up your Javascript console and start using the streams 🙂 (You can also embed streams.js).

Generators are not the only way to do lazy iteration. We can use continuations (or callbacks). We will call "Stream" a function that generates values and passes them to the continuation.

function stream(continuation) {
  for (var x = 0; x < 10; ++x) {
    for (var y = 0; y < 10; ++y) {
      continuation([x, y]); // We call the function that will process the value
    }
  }
}

As you can see, the code used to generate the values remains unchanged. Now we are going to use the stream: the most basic way to do that is to print the values.

stream(function (value) {
  console.log(value);
});
// [0, 0]
// [0, 1]
// [0, 2]
// ...
// [9, 9]

A stream is a function and we call it with another function. This is probably the first thing that will intrigue you. We just landed in the functional world! A stream is a higher-order function. If you are not familiar with this concept, you can think of a stream as a foreach function.

function print(stream) {
  stream(function (v) { // For each value of the stream
    console.log(v);     // Print it
  });
}

Rebuilding core functions

Now that we defined what a stream is, we want to see if we can express the basic functional operations that are map, filter and reduce.

function map(stream, f) {
  return function (continuation) { // We return a stream that
    stream(function (value) {      // For each value of the stream
      continuation(f(value));      // Continues with the application of f on the value
    });
  };
}
 
 
function filter(stream, f) {
  return function (continuation) { // We return a stream that
    stream(function (value) {      // For each value of the stream
      if (f(value)) {              // Tests if it matches the filter
        continuation(value);       // And continues it
      }
    });
  };
}
 
function reduce(stream, f, initial) {
  var ret = initial;               // Store the initial value
  stream(function (value) {        // For each value of the stream
    ret = f(value, ret);           // Reduce it with the current value
  });
  return ret;                      // and return the computed value
}

Example: Range

It is probably not clear yet how to use the functions we just built. Let us take a simple example and play with the simplest thing we can generate: numbers from 0 to 9 🙂

// The range function is a factory for a stream
function range(min, max) {
  return function (continuation) {
    for (var i = min; i < max; ++i) {
      // Once we have generated a value, we pass it to the continuation
      continuation(i);
    }
  };
  // The returned object is a function that takes a continuation
  // and calls it with the generated values,
  // therefore this is a stream
}
 
// We create a stream
stream = range(0, 10);
 
// A stream takes a continuation that will be executed on all the values it generates
print(stream);
// 0 1 2 3 4 5 6 7 8 9
 
// We can use the map, to change every value from v to 2 * v
stream = map(stream, function (v) { return 2 * v; });
print(stream);
// 0 2 4 6 8 10 12 14 16 18
 
// And filter only the multiples of 3
stream = filter(stream, function (v) { return v % 3 == 0; });
print(stream);
// 0 6 12 18
 
// In our case the number of generated values is finite
// We can therefore reduce them with the + operation
reduce(stream, function (a, b) { return a + b; }, 0);
// 36  (Note: This is a real value, not a stream)

Example: Message

A stream is a function that will call the continuation for all the values it generates. Therefore, any API that generates values with a callback can be used as a stream. As an example, we will use the window.postMessage API.

// We create the stream
events = function (continuation) {
  window.addEventListener("message", continuation);
}
 
// We only want events that are from a trusted origin
trusted_events = filter(events, function (e) { return e.origin == 'http://vjeux.com'; });
 
// We don't care about the event object, we just want the data
messages = map(trusted_events, function (e) { return e.data; });
 
// Now that we have the messages we wanted
// in the form we wanted, we can process them.
messages(function (m) {
  console.log('Message received!', m);
});
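
The same pattern works outside the browser with any callback-based source. Below is a minimal sketch using Node's events module instead of window.postMessage; the emitter and the shape of the emitted objects (origin and data fields) are assumptions made for the example, and map and filter are repeated so the snippet runs on its own.

```javascript
var EventEmitter = require("events");

function map(stream, f) {
  return function (continuation) {
    stream(function (value) { continuation(f(value)); });
  };
}

function filter(stream, f) {
  return function (continuation) {
    stream(function (value) { if (f(value)) { continuation(value); } });
  };
}

// Hypothetical event source standing in for window.
var emitter = new EventEmitter();

// The stream simply registers the continuation as a listener.
var events = function (continuation) {
  emitter.on("message", continuation);
};

var trusted = filter(events, function (e) { return e.origin === "http://vjeux.com"; });
var messages = map(trusted, function (e) { return e.data; });

var received = [];
messages(function (m) { received.push(m); });

emitter.emit("message", { origin: "http://vjeux.com", data: "hello" });
emitter.emit("message", { origin: "http://example.org", data: "ignored" });
console.log(received);
```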

enumerate

We can easily reproduce the enumerate function of Python.

function enumerate(stream) {
  var i = 0;
  return map(stream, function (v) {
    return [i++, v];
  });
}
 
print(enumerate(range(5, 10)));
// [0, 5]
// [1, 6]
// [2, 7]
// [3, 8]
// [4, 9]

Stream Comprehension

It is quite painful to work with streams for basic operations (filtering and mapping). Languages like Python introduced a shorthand called List Comprehension. It is possible to do the same with streams.

function comprehension(f_map, stream, f_filter) {
  if (f_filter) {
    stream = filter(stream, f_filter);
  }
  return map(stream, f_map);
}
 
print(comprehension(#(v) { 2 * v }, range(0, 10), #(v) { v % 3 == 0 }));
// 0 6 12 18
 
// Python equivalent:
//   [2 * v for v in range(0, 10) if v % 3 == 0]

I make use of the Harmony # function proposal. It is still not really user-friendly. A modification of the language is probably required to make it enjoyable.
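
Since the # syntax is only a proposal, the same call can also be written with ordinary function expressions. The sketch below repeats range, map and filter so that it runs on its own, tests the f_filter argument before filtering, and collects the values into an array (out is a name chosen for illustration).

```javascript
function range(min, max) {
  return function (continuation) {
    for (var i = min; i < max; ++i) { continuation(i); }
  };
}

function map(stream, f) {
  return function (continuation) {
    stream(function (value) { continuation(f(value)); });
  };
}

function filter(stream, f) {
  return function (continuation) {
    stream(function (value) { if (f(value)) { continuation(value); } });
  };
}

function comprehension(f_map, stream, f_filter) {
  if (f_filter) {                      // the filter is optional
    stream = filter(stream, f_filter);
  }
  return map(stream, f_map);
}

var doubledMultiplesOf3 = comprehension(
  function (v) { return 2 * v; },
  range(0, 10),
  function (v) { return v % 3 == 0; }
);

var out = [];
doubledMultiplesOf3(function (v) { out.push(v); });
console.log(out);
```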

Recursive Stream

It took some time for C# to have recursive yield. Our stream proposal on the other hand works directly in a recursive fashion.

function traverse(tree) {
  return function (continuation) { // We return a stream that
    continuation(tree.value);      // Continues with the value
    for (var i = 0; i < tree.children.length; ++i) {
                                   // And traverse recursively on the children
      traverse(tree.children[i])(continuation);
    }
  };
}
 
var tree = {
  value: '1', children: [ {
    value: '1.1', children: [ {
      value: '1.1.1', children: [] } ] }, {
    value: '1.2', children: [] } ] };
 
print(traverse(tree));
// 1
// 1.1
// 1.1.1
// 1.2

List

Since we are building a stream using a functional approach, we want to build functional lists. We need two things, an empty list and a way to construct a new list by adding an element to an existing list.

function empty() {
  // An empty list is a stream that does not call the continuation
  return function (continuation) { };
}
 
function cons(head, tail) {
  return function (continuation) { // To construct a new list, we return a stream
    continuation(head);            // That first continues with the head
    tail(continuation);            // and continues the tail
  };
}
 
print(cons(1, cons(2, cons(3, empty()))));
// 1 2 3

I am not exactly sure how to write a head & tail function that returns two streams, one with the head and one with the tail.
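
One possible answer, offered as a sketch and not as a settled design: return two derived streams that both replay the source, one passing only the first value and one skipping it. headOf, tailOf and toArray are hypothetical names. The obvious limitation is that neither can stop the source early, so headOf still runs the whole underlying stream, and toArray only works for synchronous streams.

```javascript
function empty() {
  return function (continuation) { };
}

function cons(head, tail) {
  return function (continuation) {
    continuation(head);
    tail(continuation);
  };
}

// A stream that continues only with the first value of the source.
function headOf(stream) {
  return function (continuation) {
    var seen = false;
    stream(function (v) {
      if (!seen) {
        seen = true;
        continuation(v);
      }
    });
  };
}

// A stream that continues with every value except the first one.
function tailOf(stream) {
  return function (continuation) {
    var skipped = false;
    stream(function (v) {
      if (skipped) {
        continuation(v);
      } else {
        skipped = true;
      }
    });
  };
}

// Helper for the example: collect a synchronous stream into an array.
function toArray(stream) {
  var values = [];
  stream(function (v) { values.push(v); });
  return values;
}

var list = cons(1, cons(2, cons(3, empty())));
console.log(toArray(headOf(list)), toArray(tailOf(list)));
```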

zip

The zip function takes two streams and returns a single stream where each value is the combination of both streams.

function zip(stream_a, stream_b) {
  var values_a = []; // declared with var to avoid leaking globals
  var values_b = [];
  return function (continuation) {
    stream_a(function (v) {  // For each value of stream_a
      values_a.push(v);      // Store it
      if (values_b.length) { // If there is a value of stream_b awaiting
                             // Continue with both values
        continuation([values_a.shift(), values_b.shift()]);
      }
    });
 
    // Same for stream_b
    stream_b(function (v) {
      values_b.push(v);
      if (values_a.length) {
        continuation([values_a.shift(), values_b.shift()]);
      }
    });
  }
}
 
print(zip(range(0, 10), range(5, 10)));
// [0, 5]
// [1, 6]
// [2, 7]
// [3, 8]
// [4, 9]
// Note: All the values of the first range after 4 are being ignored
// because both streams do not have the same length.

Infinite zip

Here, the stream is generated by a for loop. Since the scheduler of Javascript is not pre-emptive, nothing can be executed while we are generating the values. Therefore we need to see all the values of one stream before we can return any value. It is problematic for infinite streams.

print(zip(range(1, Infinity), range(10, Infinity)));
// Freeze!

We are going to simulate a scheduler to solve this problem. We first need a generator. This is a function that will return the next value every time it is called. We can either use Firefox 2+ yield to build a generator or do it by hand.

We will construct a stream on top of this generator. The trick is to use window.setTimeout with a zero-delay between the generation of two values. Both streams will be able to generate values in parallel.

// With yield
function xrange(min, max) {
  for (var i = min; i < max; ++i) {
    yield i;
  }
}
 
// Without yield
function StopIteration() {}        // We first define the exception
function xrange(min, max) {
  var i = min;                     // We create the cursor
  return {                         // And return an object with a
    next: function () {            // next() that will be called to get the next value
      if (i == max) {              // If we reached the end,
        throw new StopIteration(); // We throw the StopIteration exception
      }                            // else
      return i++;                  // We return the value
    }
  };
}
 
// Then we make a function that converts a generator to a stream
function generator2stream(generator) {
  return function rec(continuation) { // We return a stream that
    try {
      continuation(generator.next()); // continues with the next value of the generator
      window.setTimeout(function () { // and yields control back.
        rec(continuation);            // It will provide the next value
      }, 0);                          // as soon as the scheduler calls it again.
 
    } catch (e) {
      // When the generator has finished, it throws a StopIteration exception
      // We want to ignore it.
      if (!(e instanceof StopIteration)) {
        throw e;
      }
    }
  };
}
 
a = generator2stream(xrange(1, Infinity));
b = generator2stream(xrange(10, Infinity));
print(zip(a, b));
// [1, 10]
// [2, 11]
// [3, 12]
// ...

However, if the source of the stream releases the flow of execution between the generation of two values, zip works well as is. This is the case for all async I/O. Let's take click and mousemove events as an example. We want to synchronize the click and move events, i.e. every time both have happened, do something with them.

var click = function (continuation) { $('body').click(function (e) { continuation(e); }); };
var move = function (continuation) { $('body').mousemove(function (e) { continuation(e); }); };
// The 2 lines before are best understood like this:
// var click = $('body').click;
// var move = $('body').mousemove;
// However it is not working because of the dynamic scoping of `this` :(
 
// Return only the type and enumerate for a better display
click = enumerate(map(click, function (e) { return e.type; }));
move = enumerate(map(move, function (e) { return e.type; }));
 
print(zip(click, move));
// move 0
// move 1
// click 0
// -> [click 0, move 0]
// click 1
// -> [click 1, move 1]
// click 2
// click 3
// move 2
// -> [click 2, move 2]

Conclusion

This article showed that streams support all the basic iteration techniques. Even better, all of them are straightforward to implement. This all looks shiny, so why does nobody use it?

I think it is because it relies heavily on functional programming. Lambda functions are a requirement and unfortunately they used to be implemented only in languages such as Lisp, Haskell, ML ... However, this is changing. Languages with lambdas such as Javascript, Python and Ruby are growing, and mainstream languages such as C# and C++ are getting lambdas. The next step is to educate people about functional programming.


I made a DataView API Wrapper to read binary data from either a string or a binary buffer. You probably want to load it from a file, so you need to make an XHR request. Sadly, no ajax wrapper implements it yet.

XHR and Binary

In order to get a binary string one must use the charset=x-user-defined Mime type. If you fail to do so, special characters such as \0 or unicode characters will mess everything up.

Calumny found out that both Firefox and Chrome (nightly builds) implemented a way (sadly not the same) to get the response as an ArrayBuffer.
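
For the string route, the usual trick is to mask every character code with 0xff, because with x-user-defined the browser exposes bytes above 0x7f as characters in the 0xf780 to 0xf7ff range. The sketch below simulates such a response with String.fromCharCode; byteAt and uint16At are hypothetical helper names, not part of any library.

```javascript
// Read the byte at `index` from a string obtained with charset=x-user-defined.
function byteAt(binaryString, index) {
  return binaryString.charCodeAt(index) & 0xff;
}

// Hypothetical helper: little-endian unsigned 16-bit read built on byteAt.
function uint16At(binaryString, index) {
  return byteAt(binaryString, index) | (byteAt(binaryString, index + 1) << 8);
}

// Simulated response: byte 0x4d, then byte 0x90 as the browser
// would expose it (0xf790), then byte 0x00.
var response = String.fromCharCode(0x4d, 0xf790, 0x00);
console.log(byteAt(response, 0), byteAt(response, 1), byteAt(response, 2));
```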

jQuery Patch

I am a big fan of jQuery to abstract all the browser incompatibilities, therefore I made a small patch in order to support a new data type: binary.

@@ -5755,6 +5755,7 @@
       script: "text/javascript, application/javascript",
       json: "application/json, text/javascript",
       text: "text/plain",
+      binary: "text/plain; charset=x-user-defined", // Vjeux: Add a binary type
       _default: "*/*"
     }
   },
@@ -5934,6 +5935,15 @@
         xhr.setRequestHeader("X-Requested-With", "XMLHttpRequest");
       }
 
+      // Vjeux: Set OverrideMime Type
+      if ( s.dataType == "binary" ) {
+        if (xhr.hasOwnProperty("responseType")) {
+          xhr.responseType = "arraybuffer";
+        } else {
+          xhr.overrideMimeType('text/plain; charset=x-user-defined');
+        }
+      }
+
       // Set the Accepts header for the server, depending on the dataType
       xhr.setRequestHeader("Accept", s.dataType && s.accepts[ s.dataType ] ?
         s.accepts[ s.dataType ] + ", */*; q=0.01" :
@@ -6228,7 +6238,9 @@
   httpData: function( xhr, type, s ) {
     var ct = xhr.getResponseHeader("content-type") || "",
       xml = type === "xml" || !type && ct.indexOf("xml") >= 0,
+      responseArrayBuffer = xhr.hasOwnProperty('responseType') && xhr.responseType == 'arraybuffer', // Vjeux
+      mozResponseArrayBuffer = 'mozResponseArrayBuffer' in xhr,
+      data = mozResponseArrayBuffer ? xhr.mozResponseArrayBuffer : responseArrayBuffer ? xhr.response : xml ? xhr.responseXML : xhr.responseText; // Vjeux
-      data = xml ? xhr.responseXML : xhr.responseText;
 
     if ( xml && data.documentElement.nodeName === "parsererror" ) {
       jQuery.error( "parsererror" );

Result!

Manipulating a binary stream is now as simple as this.

$.get(
  'data.bin',
  function (data) {
    var view = new jDataView(data);
    console.log(view.getString(4), view.getUint32());
    // 'MD20', 732
  },
  'binary'
);

Demo

Now the part you are all waiting for, the demo 🙂 Here's a tar reader in 50 lines of Javascript.

jDataView provides a standard way to read binary files in all the browsers. It follows the DataView Specification and even extends it for a more practical use.

Explanation

There are three ways to read a binary file from the browser.

  • The first one is to download the file through XHR with charset=x-user-defined. You get the file as a String, and you have to rewrite all the decoding functions (getUint16, getFloat32, ...). All the browsers support this.
  • Then browsers that implemented WebGL also added ArrayBuffers. It is a plain buffer that can be read with views called TypedArrays (Int32Array, Float64Array, ...). You can use them to decode the file but this is not very handy. They have a big drawback: they cannot read non-aligned data. It is supported by Firefox 4 and Chrome 7.
  • A new revision of the specification added DataViews. It is a view around your buffer that can read arbitrary data types directly through functions: getUint32, getFloat64 ... Only Chrome 9 supports it.

jDataView provides the DataView API for all the browsers using the best available option between Strings, TypedArrays and DataViews.
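
To make the alignment drawback concrete, here is a small sketch using only the standard ArrayBuffer, Int32Array and DataView objects. The byte values are made up for the example: one padding byte followed by the little-endian encoding of 272.

```javascript
var buffer = new ArrayBuffer(8);
var bytes = new Uint8Array(buffer);
bytes.set([0xff, 0x10, 0x01, 0x00, 0x00, 0, 0, 0]);

// A typed array view must start at a multiple of its element size,
// so an Int32Array cannot begin at byte offset 1.
var alignmentError = null;
try {
  new Int32Array(buffer, 1, 1);
} catch (e) {
  alignmentError = e;
}

// A DataView can read the same 4 bytes at any offset.
var view = new DataView(buffer);
var value = view.getInt32(1, true); // little-endian read starting at byte 1

console.log(alignmentError instanceof RangeError, value);
```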

API

See the specification for a detailed API. http://www.khronos.org/registry/webgl/doc/spec/TypedArray-spec.html#6. Any code written for DataView will work with jDataView (except if it writes something).

Constructor

  • new jDataView(buffer, offset, length). buffer can be either a String or an ArrayBuffer

Specification API

The wrapper satisfies all the specification getters.

  • getInt8(byteOffset)
  • getUint8(byteOffset)
  • getInt16(byteOffset, littleEndian)
  • getUint16(byteOffset, littleEndian)
  • getInt32(byteOffset, littleEndian)
  • getUint32(byteOffset, littleEndian)
  • getFloat32(byteOffset, littleEndian)
  • getFloat64(byteOffset, littleEndian)

Extended Specification

The byteOffset parameter is now optional. If you omit it, it will read right after the latest read offset. You can interact with the internal pointer with those two functions.

    • seek(byteOffset): Moves the internal pointer to the position
    • tell(): Returns the current position
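
The internal pointer can be pictured with a small wrapper around the standard DataView. This is only a sketch of the idea: CursorView and its fields are hypothetical names, and jDataView's real implementation may differ.

```javascript
// Hypothetical wrapper that remembers where the last read ended.
function CursorView(buffer) {
  this.view = new DataView(buffer);
  this.offset = 0;
}
CursorView.prototype.seek = function (byteOffset) { this.offset = byteOffset; };
CursorView.prototype.tell = function () { return this.offset; };
CursorView.prototype.getUint16 = function (byteOffset, littleEndian) {
  // Fall back to the internal pointer when no offset is given.
  if (byteOffset === undefined) byteOffset = this.offset;
  var value = this.view.getUint16(byteOffset, littleEndian);
  this.offset = byteOffset + 2; // advance past the 2 bytes just read
  return value;
};

var cursorBuffer = new Uint8Array([0x10, 0x01, 0x20, 0x02]).buffer;
var cursor = new CursorView(cursorBuffer);
var firstValue = cursor.getUint16(undefined, true);  // 272, read at offset 0
var secondValue = cursor.getUint16(undefined, true); // 544, read at offset 2
console.log(firstValue, secondValue, cursor.tell()); // 272 544 4
```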

Addition of getChar and getString utilities.

  • getChar(byteOffset)
  • getString(length, byteOffset)

Addition of createBuffer, a utility to easily create buffers with the latest available storage type (String or ArrayBuffer).

  • createBuffer(byte1, byte2, ...)
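
As an illustration of picking the best available storage type, a utility of this kind could be sketched as follows. createBufferSketch is a hypothetical stand-in written for this post, not the library's code.

```javascript
// Hypothetical re-implementation for illustration only.
function createBufferSketch() {
  var i;
  if (typeof ArrayBuffer !== 'undefined') {
    // Preferred storage: a real binary buffer.
    var buffer = new ArrayBuffer(arguments.length);
    var bytes = new Uint8Array(buffer);
    for (i = 0; i < arguments.length; ++i) bytes[i] = arguments[i];
    return buffer;
  }
  // Fallback for older browsers: one character per byte.
  var str = '';
  for (i = 0; i < arguments.length; ++i) str += String.fromCharCode(arguments[i]);
  return str;
}

var sketchBuffer = createBufferSketch(0x4d, 0x44, 0x32, 0x30); // "MD20"
console.log(sketchBuffer.byteLength); // 4 when ArrayBuffer is available
```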

Shortcomings

  • Only the read API is wrapped; jDataView does not provide any set method.
  • The Float64 implementation on strings does not have full precision.

Example

First we need a file. Either you get it through XHR or use the createBuffer utility.

var file = jDataView.createBuffer(
	0x10, 0x01, 0x00, 0x00, // Int32 - 272
	0x90, 0xcf, 0x1b, 0x47, // Float32 - 39887.5625
	0, 0, 0, 0, 0, 0, 0, 0, // 8 blank bytes
	0x4d, 0x44, 0x32, 0x30, // String - MD20
	0x61                    // Char - a
);

Now we use the DataView as defined in the specification. The only thing that changes is the j before DataView.

var view = new jDataView(file);
var version = view.getInt32(0); // 272
var float = view.getFloat32(4); // 39887.5625

The wrapper extends the specification to make the DataView easier to use.

var view = new jDataView(file);
// A position counter is managed. Remove the argument to read right after the last read.
version = view.getInt32(); // 272
float = view.getFloat32(); // 39887.5625
 
// You can move around with tell() and seek()
view.seek(view.tell() + 8);
 
// Two helpers: getChar and getString will make your life easier
var tag = view.getString(4); // MD20
var char = view.getChar(); // a

Demos

I'm working on a World of Warcraft Model Viewer. It uses jDataView to read the binary file and then WebGL to display it. Stay tuned for more info about it 🙂

While reading An Open Letter to JavaScript Leaders Regarding Semicolons, where Isaac Z. Schlueter explains his unorthodox coding style, a line of code struck me.

if (!cb_ && typeof conf === "function") cb_ = conf , conf = {}

He was able to execute more than one statement in an if without the need for { }. I have recently been working on Python scripts for http://db.mmo-champion.com/ and this discovery made me want to imitate Pythonic indentation in Javascript.

The comma trick

You can use the , operator to chain expressions. This groups them into a single statement, so you can execute all of them without the need for { }. The rule is easy: put a , at the end of every line except the last line of the block, which ends with a ;.

if (test)
  first_action(), // Note the important ','
  second_action(); // Note the lack of ','
third_action();
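
The snippet above can be turned into something runnable to check what actually belongs to the if. The three functions below are stand-ins that only record that they were called.

```javascript
var log = [];
function first_action() { log.push('first'); return 1; }
function second_action() { log.push('second'); return 2; }
function third_action() { log.push('third'); return 3; }

var test = false;
if (test)
  first_action(),
  second_action();
third_action(); // not part of the if: runs even though test is false

// The comma operator evaluates its operands left to right
// and yields the value of the last one.
var result = (first_action(), second_action());
console.log(log, result); // log is ['third', 'first', 'second'], result is 2
```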

For example, it is possible to write a little program that outputs the Fibonacci numbers without using any { }, and therefore imitate the Python indentation style with no closing }.

var curr = 0, next = 1, tmp;
for (var i = 0; i < 10; ++i)
  tmp = curr + next,
  curr = next,
  next = tmp,
  console.log('Fibo', i, '=', curr);
 
// ...
// Fibo 5 = 8
// Fibo 6 = 13
// Fibo 7 = 21
// Fibo 8 = 34
// ...

The issues

Sadly, the use of this trick is extremely limited. You cannot use any of these keywords inside the "blocks": if, for, var.

for (var i = 0; i < 3; ++i)
  k = i * 10 + 1,
  if (k % 2 == 0)
    console.log(i);
// SyntaxError: Unexpected token if
 
for (var i = 0; i < 3; ++i)
  var k = 10,
  console.log(k);
// Firefox: SyntaxError: missing ; before statement
// Chrome: SyntaxError: Unexpected token .
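
Both errors come from the fact that the comma only joins expressions, while if and var are statements. One possible workaround, which is my own sketch and stays within plain expressions, is to declare the variable before the loop and replace the if with the && operator. The evens array and the i * 10 + i formula are made up so that the condition is sometimes true.

```javascript
var k, evens = [];
for (var i = 0; i < 3; ++i)
  k = i * 10 + i,              // 0, 11, 22
  k % 2 == 0 && evens.push(i); // the right side only runs when k is even

console.log(evens); // contains 0 and 2
```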

Beginning with comma

If you don't run into these issues but are a bit worried about bugs caused by mixing up , and ;, you can try starting your lines with commas.

var k;
for (var i = 0; i < 10; ++i)
  , k = i * 10
  , console.log(i)
// SyntaxError: Unexpected token ,

But we need to add some empty expression before the first , so that it parses. In Python, : is used, but it doesn't parse in Javascript. We can use $ for example. It is a valid expression: it reads the variable and does nothing with it.

var $, k;
for (var i = 0; i < 10; ++i)$ // Use of $ instead of : in python
  , k = i * 10
  , console.log(k)
// 0
// 10
// ...

Debugging purpose

The main use I can see for this trick is debugging. Say some code is executed in a test without { } and you want to log something when the program reaches that part of the code. Before, you had to add { } and then remove them, which is really annoying. Now it's easier!

if (test)
  doSomething();
// Before
if (test) {
  val = doSomething();
  console.log('Executed!', val);
}
 
// After
if (test)
  val = doSomething(),
  console.log('Executed!', val);

Conclusion

Using the comma trick to do { }-less indentation is far from viable. However, it may still be useful for debugging, and overall it is fun to try new coding styles!