Image processing is an active research area, and most of the work is done in C and C++ for performance reasons. Browser makers are competing hard to make Javascript fast, so I want to find out whether it is viable to do image processing in Javascript.

At the moment, very few people are doing image processing in Javascript. I have listed some examples here:

Image processing is computationally heavy: you have to perform the same operation on millions of elements. In order to get acceptable performance, we should look into all the possible ways to implement it and micro-benchmark them. Today we start with the different types of arrays.

Different types of Arrays

In Javascript there are 3 ways to get an array of data.

  • Classic Arrays. The Javascript Arrays that you get using [1, 2, 3] or new Array(size)
    var classic = new Array(1048576);

  • Canvas Data. You can access the raw pixels of a canvas as an array. It is an array of 4 * width * height unsigned 8-bit integers (0 - 255).
    var canvasData = document.getElementById('canvas').getContext('2d').getImageData(0, 0, 512, 512).data;

  • Typed Arrays. They are being introduced in Javascript for WebGL (spec).
    var typed8 = new Uint8Array(1048576);
    var typed32 = new Uint32Array(1048576);

Benchmarks

The objective of this benchmark is to test both read and write performance of the arrays for use in image processing. We used an array of one million elements (1024 * 1024 = 1048576) with random values ranging from 0 to 255. We benchmark a loop that increments every value by one.

for (var i = 0; i < 1048576; ++i) {
  array[i] += 1;
}
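
For reference, here is a minimal sketch of the whole benchmark. The setup and timing code are my assumptions; only the inner loop comes from the benchmark above.

// Sketch: fill the array with random 8-bit values, then time the loop.
// Replace `new Array(size)` with a canvas data array or a typed array
// to benchmark the other storage types.
var size = 1048576; // 1024 * 1024
var array = new Array(size);
for (var i = 0; i < size; ++i) {
  array[i] = Math.floor(Math.random() * 256); // random value in [0, 255]
}

var start = Date.now();
for (var i = 0; i < size; ++i) {
  array[i] += 1;
}
console.log('Elapsed: ' + (Date.now() - start) + ' ms');
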
Note: The following results are from December 2010. The implementation of Typed Arrays has improved a lot since then; the benchmark and conclusions are no longer accurate.

(Chrome 10.0.611.0 Canary, Firefox Minefield 4.0b9pre, Opera 11.00 Beta 1111, Safari 5.0.3 (7533.19.4))

Conclusion

Typed Arrays are not viable

Typed arrays are a new addition to Javascript and the specification is not frozen yet. One would have expected them to perform better, since they map directly to the machine representation. However, the benchmarks show the opposite. There is probably a lot of boxing / unboxing being done behind the scenes between that representation and the Number type.

One advantage of typed arrays is the ability to work directly on a raw binary image (like the PBM/PGM/PPM formats). All you have to do is map your typed array onto the data part of the file.
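
As an illustration, here is a sketch for a binary (P5) PGM file; the headerLength offset is assumed to have been computed by parsing the PGM header.

// Sketch: view the pixel section of a binary PGM file as a typed array.
// `buffer` is an ArrayBuffer containing the whole file, e.g. from an
// XMLHttpRequest with responseType = 'arraybuffer'.
function pixelsFromPGM(buffer, headerLength) {
  // No copy involved: the typed array is a direct view on the file bytes.
  return new Uint8Array(buffer, headerLength);
}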

Since Typed Arrays are not implemented on all browsers and are slower than both classic arrays and canvas, they should not be used right now.
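
If you still want to experiment with them today, a feature check with a fallback is a reasonable pattern (a minimal sketch):

// Sketch: use a typed array when available, a classic array otherwise.
var buffer = (typeof Uint8Array !== 'undefined')
  ? new Uint8Array(1048576)
  : new Array(1048576);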

Classical Arrays

In both Chrome and Firefox, classical arrays are faster than canvas data (by 25% and 45% respectively). They contain numbers that are stored on either 32 or 64 bits (Mozilla implementation). This is more than the 8 bits of the canvas element.

Images are usually stored in binary files, readable either through the canvas element or through Typed Arrays. Using classical arrays therefore requires two type conversions (one when loading, one when saving). This is a big overhead for small processing tasks.
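
As a sketch, the two conversions look like this; the processing in between is where classical arrays pay off.

// Loading: copy the canvas bytes into a classic array.
var ctx = document.getElementById('canvas').getContext('2d');
var imageData = ctx.getImageData(0, 0, 512, 512);
var array = new Array(imageData.data.length);
for (var i = 0; i < array.length; ++i) {
  array[i] = imageData.data[i];
}

// ... heavy processing on `array` goes here ...

// Saving: copy the values back and write them to the canvas.
for (var i = 0; i < array.length; ++i) {
  imageData.data[i] = array[i];
}
ctx.putImageData(imageData, 0, 0);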

If you want to do heavy processing and are targeting either Firefox or Chrome, classical arrays may be a good choice. For values larger than 8 bits, they are the preferred method.

Canvas Arrays

Canvas arrays are slower than classical arrays in Chrome and Firefox; however, they are really fast in Opera. Their huge advantage is that they can be read and written directly through the canvas element. But they are limited to unsigned 8-bit values.

If you want to do normal processing on traditional RGB values, the canvas array is the best option, especially in Opera where it is blazing fast.
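
As a minimal sketch, here is what processing RGBA values directly in the canvas array looks like; unlike the classical array approach, no copy is needed.

// Sketch: invert the colors of the image in place, skipping the alpha
// channel (every 4th byte).
var ctx = document.getElementById('canvas').getContext('2d');
var imageData = ctx.getImageData(0, 0, 512, 512);
var data = imageData.data;
for (var i = 0; i < data.length; i += 4) {
  data[i]     = 255 - data[i];     // red
  data[i + 1] = 255 - data[i + 1]; // green
  data[i + 2] = 255 - data[i + 2]; // blue
}
ctx.putImageData(imageData, 0, 0);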

After reading an article about solving the eight-queens problem by storing the board in a 64-bit integer (in French), I wanted to try it in Javascript. I knew that numbers were not stored as int64, but who knows, maybe it would have worked!

As you may have expected, it failed, giving completely wrong results. The reason is a lack of precision: the least significant bits are being dropped.

(0x8040201008040201).toString(2)
//  10000000 01000000 00100000 00010000 00001000 00000100 00000010 00000001
>> "10000000 01000000 00100000 00010000 00001000 00000100 00000000 00000000"
                                                                ^         ^

Floating Points

The reason behind this behavior is that numbers are stored as floating point values! The best way to find out the exact storage format is to look at the specification:

The type Number is a set of values representing numbers. In ECMAScript, the set of values represents the double-precision 64-bit format IEEE 754 values including the special "Not-a-Number" (NaN) values, positive infinity, and negative infinity.

Source: ECMA 262 - 4.3.20 Number Type

Looking at Wikipedia we've got this table:

Total bits | Sign | Exponent | Significand
        64 |    1 |       11 |          52

If you don't know how floating point values work, let me quickly explain. It is actually really simple. You start with a normal unsigned integer, in this case 52 bits wide. You shift it left or right by the value of the exponent, here stored on 11 bits. And if the sign bit is set, you mark the number as negative. (This is a really simplistic view, but it serves the purpose here!)

Example:

Significand: 1000 1100 1001, Exponent: 4, Sign: 1
 
- 1000 1100 1001 0000
                 <<<<

As you can see, when the exponent is set, zeros are appended at the end of the number. This is the source of our precision loss.
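
To make the simplified model concrete, here is a toy decoder (my own illustration; real IEEE 754 decoding involves an implicit leading bit and a biased exponent):

// Toy decoder for the simplified model described above.
function decode(sign, exponent, significand) {
  return (sign ? -1 : 1) * significand * Math.pow(2, exponent);
}

decode(1, 4, parseInt('100011001001', 2));
// >> -35984, i.e. -1000110010010000 in binary, as in the example above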

Maximum Integer

With this in mind, we can work out the biggest number that does not suffer from this: fill the significand with 1s, and set the exponent and the sign to 0. (IEEE 754 actually puts an implicit leading 1 bit in front of the 52-bit significand, which is how we end up with 53 bits of precision.) That number is 2^53 - 1.

Significand: 111..111 ~ 52, Exponent: 0, Sign: 0
 
  1111 1111 .... 1111 = 2 ^ 53 - 1

Now let's see what happens if we want to write 2^53.

Significand: 100..000 ~ 52, Exponent: 1, Sign: 0
 
1 0000 0000 .... 0000 = 2 ^ 53
                    <

It still works because the least significant bit is set to 0 by the shift. However, it is impossible to write 2^53 + 1; the next number we can write is 2^53 + 2!

Significand: 100..001 ~ 52, Exponent: 1, Sign: 0
 
1 0000 0000 .... 0010 = 2 ^ 53 + 2
                    <

When Trouble Arises!

Finally, we can exhibit our first case where n == n + 1!

Math.pow(2, 53) == Math.pow(2, 53) + 1
>> true

You should be careful when using traditional loops with big numbers: they will run forever once the counter passes that maximum integer value.

// WARNING: DO NOT TRY THIS AT HOME!
 
var MAX_INT = Math.pow(2, 53); // 9 007 199 254 740 992
for (var i = MAX_INT; i < MAX_INT + 2; ++i) {
  // infinite loop: ++i computes 2^53 + 1, which rounds back to 2^53,
  // so i never reaches MAX_INT + 2
}

Conclusion

The integer part of the Number type in Javascript is safe in [-2^53 .. 2^53] (2^53 = 9 007 199 254 740 992). Beyond that, there is precision loss on the least significant bits.

If you want to use bigger integers there are several options:

  • String: if you don't want to perform arithmetic operations on them.
  • Combine Multiple Numbers: one number gives you 53 bits of integer precision. It is possible to use an array of them to achieve the precision you need; see Closure Long (Int64) and Leemon Big Integer for examples. A minimal sketch follows.
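
Here is the sketch (the hi/lo layout and the add64 helper are hypothetical, not Closure's actual API): a 64-bit addition built from two 32-bit halves.

// Represent a 64-bit unsigned integer as two 32-bit halves.
// Work on 16-bit chunks so intermediate sums stay far below 2^53.
function add64(a, b) {
  var lo = (a.lo & 0xFFFF) + (b.lo & 0xFFFF);
  var mid = (a.lo >>> 16) + (b.lo >>> 16) + (lo >>> 16);
  var hi = (a.hi + b.hi + (mid >>> 16)) >>> 0;
  return {
    hi: hi,
    lo: ((mid & 0xFFFF) << 16 | (lo & 0xFFFF)) >>> 0
  };
}

// Example: 0xFFFFFFFF + 1 carries into the high word.
add64({hi: 0, lo: 0xFFFFFFFF}, {hi: 0, lo: 1});
// >> {hi: 1, lo: 0}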

A friend of mine gave me a great challenge: find out how jQuery manages to return objects that behave like arrays but are not arrays! The aim of this article is to find out how Firebug and Web Inspector can display an object with the bracket notation.

$('a')
>> [<a href="http://vjeux.com/">vjeux</a>, <a href="http://google.com/">google</a>]

Not an Array!

First, let's make sure that this really is not an Array. If you wonder why jQuery does not simply subclass Array, there are many reasons, explained in this great article.

// Easy way
typeof $('a')
>> "object"
 
// Normal way
$('a') instanceof Array
>> false
 
// Duck typing ...
$('a').indexOf
>> undefined
 
// Object toString
// http://whattheheadsaid.com/2010/10/cross-context-isarray-and-internet-explorer
Object.prototype.toString.call($('a'))
>> "[object Object]"
 
// http://ajaxian.com/archives/isarray-why-is-it-so-bloody-hard-to-get-right
$('a').constructor
>> function Object() { [native code] }
 
// ES5 Way
// https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/isArray
Array.isArray($('a'))
>> false

Testing ...

Obviously, Firebug and Web Inspector must use another technique. All they should need to display an object as an array is the length property and the keys [0 .. length-1]. So let's try!

({0: 42, 1: 666, length: 2})
>> Object { 0=42, 1=666, length=2 }

This is obviously not working 🙁

In Source the Truth Lies!

Fortunately, both WebKit and Firebug are open source projects, which means that anyone can browse the source code and tell exactly what happens behind the scenes.

// Web Inspector Source Code
// http://trac.webkit.org/browser/trunk/WebCore/inspector/front-end/InjectedScript.js#L409
 
// FireBug's array detection.
if (isFinite(obj.length) && typeof obj.splice === "function")
  return "array";

We can see that duck typing is being used. It is now really easy to trick Firebug and Web Inspector into believing that your objects are arrays! Only 3 small rules are required:

  • [0 .. length - 1]: Array elements
  • length: Set to a positive integer
  • splice: Any function (even empty)

And here is the demo 🙂

({0: 42, 1: 666, length: 2, splice: function() {}})
>> [42, 666]

Check out the updated World of Warcraft talent calculator I've been working on 🙂

Popularity System

In Starcraft 2, the Custom Maps are being listed by popularity. The popularity is the number of times the map has been played for more than 5 minutes during the last 12 hours.

As I run the website sc2mapster.com, I would like to have this listing. It would let players know what the popular maps are in the other regions (US, EU, KR ...) and also let them quickly find the maps they have played in order to leave comments!

Getting the list!

Bad news: the listing is only available inside the game, and with the anti-hack measures Blizzard is taking, extracting it from the client is not a viable solution. So we have to find another one!

Fortunately, there is one. Blizzard updates a website called Battle.net with the profile of every player, and each profile lists that player's latest 24 games.

Since we can't parse all the characters (there are more than 1.6 million as I write this), we are going to parse as many as we can, picking them randomly. For every game we see, we add +1 to the pair [Map, Date]. As a result, for each map we get an approximation of the number of times it has been played. A sketch of the counting is shown below.
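
In Javascript, the counting could look like this (the profile object and its exact data shapes are hypothetical):

// Sketch: +1 for each [map, date] pair seen in a randomly picked profile.
var playCount = {}; // key: map + '|' + date, value: plays seen so far

function tally(profile) {
  // `profile.games` is the list of the player's latest 24 games.
  for (var i = 0; i < profile.games.length; ++i) {
    var game = profile.games[i];
    var key = game.map + '|' + game.date;
    playCount[key] = (playCount[key] || 0) + 1;
  }
}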

As you can see, there is a hole around September 9; this is a side effect of the 24-latest-games limit. The data was gathered over 2 days, between the 15th and the 16th. Before the 9th you can see all the casual players who don't play too often and whose 24-game limit has not been reached; between the 11th and the 13th, the frequent players; and after that, the hardcore ones.

However, those artifacts are likely to disappear with constant spidering.

Does it match?

Now that we've got a lot of data, here is the question everyone's waiting for ... does it match the popularity listing?


(Table: Calculated Popularity | Position Difference | Real Popularity)

As you can see, the top 12 maps are the same, with some small ordering differences. Since the order is constantly changing anyway, that is not a big issue.

To get those results, about 30 000 players were parsed, and the most popular map, Nexus Wars, has 2200 points (its real popularity is about 6000). Instead of using the data from the last 12 hours, we used the data from the current day and the day before (the granularity of the Battle.net listing is one day).

Conclusion

It was really hard to know whether this method would give results similar to the in-game popularity. The only way to make sure was to test! I've set up the website scladder.com (a recycled domain name) to show the values. However, due to the number of requests needed, I am probably not going to keep the spider running for very long.

I just showed you that it is possible to datamine websites to obtain useful statistics 🙂 I hope you liked it!