Hey, I'm Christopher Chedeau aka Vjeux, a 22-year-old Frenchy! I started this blog to talk about the various projects I am working on and to reveal some of my programming tricks! I hope you will find some of my stuff fun if not useful :)

I'm a Facebook Software Engineer in the Photo Team. Before that, I went to EPITA, a 5-year Computer Science school and majored in its R&D lab LRDE. I also worked for Curse during the nights and week-ends.

Laurent Senta had the opportunity to go to the 5th European Lisp Symposium to present Climb, the project I've been working on during the past 2 years. He did an excellent job at writing a 4-page paper that sums up the interesting parts of the project (Download).

Presentation

I recommend reading the short article before getting to the slides. Download the PPTX if you want to see the speaker text.

This is the third (and last) presentation about my work on Climb at the LRDE. During the first one I tackled genericity on data structures, the second was about genericity on values and this one talks about genericity on algorithms.

Climb - Property-based dispatch in functional languages

Abstract: "Climb is a generic image processing library. A generic algorithm interface often requires several different specialized implementations. Olena, a C++ library, solves this using properties.

We present a way to dispatch a function call to the best specialized implementation using properties in a dynamic programming language: Common Lisp. Then, we introduce examples of algorithms and properties used in image processing."

WoWDB Design

I was the only active developer on db.mmo-champion.com and since I was no longer working at Curse, they decided to restart a database project, WoWDB.com, on the shiny Cobalt platform that powers the SWTOR, Aion and Rift databases.

With the release of the Mists of Pandaria beta being close (less than 24 hours away) and the website still without any CSS, I was asked to come up with a design. 3 hours later, here's the result :)

As you can see, I borrowed a lot of design elements and CSS from the original MMO-Champion website. I really like the end result. Database websites often use black backgrounds, so making this one light gives it a fresh look.

Countdown

MMO-Champion uses countdowns to build hype around certain events. I've had the pleasure of making two of them, one for each expansion.

Cataclysm

28 Hours 56 Minutes 43 Seconds
23 Hours 56 Minutes 43 Seconds

That's my first one. The hardest part was to find a good font that doesn't suck with a big font-size. (Note: the times here are placeholders!)

Mists of Pandaria


051648

As you can see, my Photoshop skills have improved a lot since the first one :) I've been able to steal design elements from Blizzard's website to make the artwork look better.

Notice that each digit is absolutely positioned, so the numbers don't constantly shift when a digit changes.

I also use Brawler, a custom Google Web Font, and text-stroke to help with anti-aliasing.

I'm working on an application in the browser that lets you take notes. I don't want the burden of saving them on my own server, so I want to use Github Gists as storage. The challenge is to be able to communicate with the Github API 100% inside the browser.

Since it is a difficult task due to Cross-Origin Resource Sharing limitations and the multi-step OAuth process, I decided to share a working procedure I found. It involves several communication mechanisms: HTTP redirects, window.postMessage, Ajax POST and GET requests, and a small PHP proxy using cURL.

Login Phase

Phase 0 - Create an application

Before doing anything, you have to create a Github application. It will provide you with the client_id and client_secret, as well as an admin page where you set the redirect URL.

Phase 1 - Get authentication code

Using the Github API OAuth guide, we learn that we have to redirect the user to a page on Github's server. After the user authorizes the application, that page redirects to one of our pages with a code.

Since we do not want to leave the current page (it would make all the user changes vanish), we must open the page in another context. The first one I tried was an iframe, but Github sets the X-Frame-Options header, which prevents embedding the page in an iframe.

So the other option was to open a new window. With window.open it was really easy to do so. The only tricky part was to actually give back the result to the main window. After digging, I found the following snippet of code that works well: window.opener.postMessage(message, window.location).
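Here is a minimal sketch of that popup handshake, not the actual source of the demo. The names openLoginPopup and sendCodeToOpener are hypothetical, the callback page is assumed to live on the same origin as the main page, and the window objects are passed in as parameters so that the logic can be tried outside of a browser. It passes window.location.origin as the target origin, which is a slightly stricter variant of the snippet above.

```javascript
// Main window: open the Github authorization page in a popup and wait for
// the callback page to post the code back. `win` is the browser `window`.
function openLoginPopup(win, clientId, onCode) {
  var url = 'https://github.com/login/oauth/authorize?client_id=' +
    encodeURIComponent(clientId);
  win.addEventListener('message', function (event) {
    // Only trust messages sent by our own callback page.
    if (event.origin !== win.location.origin) { return; }
    onCode(event.data);
  });
  return win.open(url, 'github-login');
}

// Callback page: extract ?code=... from the URL Github redirected to,
// hand it to the window that opened us, then close the popup.
function sendCodeToOpener(win) {
  var match = /[?&]code=([^&#]+)/.exec(win.location.search);
  if (!match) { return false; }
  win.opener.postMessage(decodeURIComponent(match[1]), win.location.origin);
  win.close();
  return true;
}
```

In the browser you would call openLoginPopup(window, clientId, callback) from the main page and sendCodeToOpener(window) from the page set as redirect URL.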

Phase 2 - Get access token

We are back in the main window and have the code. We now need to exchange this code for a token. I really wonder why they didn't give us the token already but well, there must be a reason! In order to get the token, we must send a POST request to a page on github.

However, another difficulty comes in: this page does not have an Access-Control-Allow-Origin header set to our domain. So basically, we cannot access it from the browser using AJAX. Since it's a POST request, we cannot even use JSON-P to bypass it.

I did not want to have a server, but I resigned myself to writing a small PHP proxy that forwards the call. I believe the main reason they blocked it is that the request requires the client_secret. They don't want us to write it down in our Javascript in plain sight.
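The browser side of this exchange could look like the sketch below. The proxy URL /proxy.php is a made-up name, and the proxy is assumed to add the client_id and client_secret, forward the request to https://github.com/login/oauth/access_token and return Github's reply untouched. By default that reply is form-encoded, hence the small parser.

```javascript
// Parse a form-encoded reply such as "access_token=abc&token_type=bearer".
function parseTokenResponse(body) {
  var result = {};
  body.split('&').forEach(function (pair) {
    if (!pair) { return; }
    var parts = pair.split('=');
    var key = decodeURIComponent(parts[0]);
    result[key] = decodeURIComponent(parts.slice(1).join('='));
  });
  return result;
}

// Ask the (hypothetical) same-origin proxy to exchange the code for a token.
// `post(url, data, callback)` is any Ajax helper with that shape,
// for example jQuery's $.post.
function exchangeCode(post, code, onToken) {
  post('/proxy.php', { code: code }, function (body) {
    onToken(parseTokenResponse(body).access_token);
  });
}
```

Passing the Ajax helper as a parameter keeps the sketch independent from any particular library.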

Phase 3 - Enjoy!

Now that we have got our token, we can call all the APIs on Github using POST and GET AJAX requests and they all work fine. One good thing is that the token is permanent. Unless you change the permissions you request or the user revokes your application, the user gets the same token every time they log in.

You can safely store the token in the user's browser with localStorage in order to keep them logged in when they come back to the application. Just make sure to catch the 401 Unauthorized error on requests, in case the token is no longer valid, and ask the user to log in again.
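As a rough sketch of that logic (the storage key and the shape of the request helper are assumptions of mine, not part of the demo):

```javascript
var TOKEN_KEY = 'github_token'; // hypothetical storage key

// `storage` is localStorage in the browser. `request(url, token, callback)`
// is a hypothetical Ajax helper that calls callback(status, body).
function apiGet(storage, request, url, onData, onLoginNeeded) {
  var token = storage.getItem(TOKEN_KEY);
  if (!token) { return onLoginNeeded(); }
  request(url, token, function (status, body) {
    if (status === 401) {
      // The token was revoked or is no longer valid: forget it.
      storage.removeItem(TOKEN_KEY);
      return onLoginNeeded();
    }
    onData(body);
  });
}
```

After a successful login you would store the new token with storage.setItem(TOKEN_KEY, token) and retry the request.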

Demo

And here's the demo! The source code is really small and available on github. If you plan to integrate an in-browser login, it can be used as a starting point.

You might want the link to revoke the access from the dummy application for testing purposes.

Conclusion

At first glance, the login process seemed to be really straightforward, you just had 2 requests to get your code and token and you are good to go. But doing so in the browser revealed itself to be a lot harder. I'm not satisfied with the process as it involves many different technologies but that's the best I could find. If you handled things differently please tell me :)

I recently had the chance to give a 2-hour Javascript evangelism talk at Dassault Systèmes. Unfortunately the presentation was not recorded. For the first part I reused the presentation I gave at EPITA, and I added a second part with a lot of demos. I've written down notes about the second part so you can get an idea.

Developer Tools

  • Web Inspector. It is integrated into Google Chrome and has all the features you would be expecting in an IDE. The console is really powerful as it lets you browse through the Javascript objects. You no longer need to write endless printing functions. You can edit the HTML and CSS without a page reload, it makes designing interfaces a lot more efficient. There is also a full panel dedicated to profiling both Javascript and DOM events.
  • JSFiddle. Web programming is all about interactivity. Not only you with the program (REPL) but also with other people. Everything you do can be one link away. JSFiddle lets you try and experiment things without the need of an IDE and allows you to show it to the world easily.
  • JSHint. Because Javascript, the language, has design issues and is highly dynamic, it is useful to enforce good practices and to set common programming rules when working together. Always in the spirit of the web, you can just copy and paste your code to check it. Note that JSHint can also be integrated in all major text editors and IDEs.

CSS

HTML and CSS were traditionally used to make websites and forms. We can now make completely different things.

  • jmpress.js. Here's an example of how to use 3D in CSS in order to make animated presentations. One important thing to notice is how easy it is to use. Just include jmpress.js in the page and add data-x="-5000" data-rotate="180" attributes to your HTML. It just works!
  • CSS Panic. In order to show how powerful CSS has become these days, here's a game completely written in HTML + CSS. There are 0 lines of Javascript!

Canvas

Canvas is just a rectangle where you can manipulate each pixel's color.

  • RayTracer. A Ray Tracer is a common computer science school project. Usually, you write it on your computer and sometimes share the resulting image, but you don't really share the program because no one wants to take the pain of compiling it on their machine. With Javascript, you can just share a link and everyone can test it!
  • Canvas Rider. You can now create games in the browser. There is even a level editor that follows the principle of the web: interactivity. You can draw the map and move your character at the same time. When you are done, you are a single click away from saving the map and sharing it with people.
  • pdf.js. The browser is now able to run applications that used to be restricted to native code. The perfect example is this demo by Mozilla of a PDF renderer written exclusively in Javascript!

SVG

SVG lets you manipulate vector graphics such as lines, curves, circles ...

  • Cloth Simulation. Javascript implementations have gotten fast enough to do real time constraint solving simulations such as this one. It uses SVG to easily render the graph.
  • Simulated Annealing. It's another school project that gains from being written for the web. It would probably have been written as a console program, using ASCII art and generating images as output, with the parameters entered on the command line. We can instead exploit HTML to make forms that update in real time, and SVG to render the problem and a graph that displays the progress.

WebGL

WebGL brings OpenGL (OpenGL ES 2.0, to be precise) to the browser. It lets us use the graphics card from Javascript.

  • 3D Simulated Annealing. Same as previous demo but this time there is a 3D representation of the progress. It also follows the interactivity rule, you can use your mouse and WASD to explore the scene. This demo uses Web Workers to exploit multiple cores.
  • WebGL Maps. The graphics card is dedicated to manipulating images, therefore you can use it to improve performance in image-intensive applications such as Google Maps.
  • Bevelity. Ever wondered if it was possible to write a complex application such as 3DSMax in the browser? Bevelity is an attempt to prove it true.
  • Water Simulation. Another physics simulation demo with always the web plus: you can move the ball :)
  • Hello Racer. This is one of the thousand demos that show you a beautiful car with glossy reflections ... This one has a unique feature: you can move the car! This must not have taken more than a few dozen lines and yet it has a huge impact!
  • Morph Target. Pixar also finds uses for the web. Here's a demo to create facial expressions.
  • Rome. Browsers now embed a video tag. Using it in combination with WebGL, Google made a wonderful 3-minute animation. You can use the mouse to interact with it: it moves the camera, pixelates the video and even makes various monsters appear.

Performance

Javascript performance is improving impressively month after month. It is now possible to write compute-intensive programs and have them run at a decent speed.

  • JSPerf. If you wonder which browser is faster for a specific feature, or which of two ways of doing something is faster, JSPerf is made for you!
  • JSLinux. Typed Arrays, introduced for WebGL, made it possible to write a CPU virtual machine able to boot a copy of Linux in under 7 seconds.
  • repl.it. Emscripten is a wonderful tool that translates LLVM assembly code into Javascript. It made it possible to compile Ruby, Python, Lua and Scheme directly from their sources to Javascript.
  • Broadway. Last but not least, an H.264 video decoder has been compiled to Javascript using Emscripten. It manages to decode the sample videos at 60 frames per second. This is an exceptional feat for a scripting language!

Conclusion

It is now possible to write in the browser the same complex applications we have seen in the past. And it gives one huge added value: interactivity. There's absolutely nothing to install, you just have to give a link! You can combine all the rendering options such as HTML, CSS, Canvas, SVG and WebGL to make your program.

The next talk I'm going to do is at the JSConf! I hope to see you there!

I am happy to tell you that I am now a Facebook employee!

A bit of history

Two years ago, like many of you, I applied to Google (thanks tsuna). Obviously I didn't get in. I did not even make it to the second interview! Looking back, I screwed up everything!

  • Spoken English is hard without training (I'm French). I struggled explaining simple things such as "What's the difference between Linked Lists and Arrays".
  • I had not taken any parallelism or Java courses yet. Therefore the implementation of the classical producer & consumer problem was painful.
  • At the end, I had no questions to ask. It made me look unmotivated.
  • I was asked about my hardest-to-fix bug. This was the lethal question: I had no idea what to answer!

Meanwhile

Soon after the interview, I read the excellent book What Would Google Do?. It talks about the business models of the new internet companies such as Google, Facebook, CraigsList, Wikipedia ... One chapter about blogs was a revelation.

When I applied to Google, the only thing they had on me was a resume with the names of various projects I had been working on. I find it extremely hard to judge someone's skills based on a resume alone. This is where a blog comes in. A blog lets you show off your skills and interests without the constraints of a resume.

Most of the articles fall into one of those three categories:

  • Projects I've worked on, presented with videos, dozen-page reports ...
  • In-depth explanation of specific techniques (that no one cares about).
  • Fun programming stuff I found.

It gives me the opportunity to show what I am interested in and concrete examples of what I am capable of. If you scroll through the many pages of my blog, you will get a much better picture of who I am than from a resume.

Another try

And one more thing: A blog also makes you visible! I was contacted by a Facebook employee after he saw my post JSPP - Morph C++ into Javascript on Hacker News! (Yeah I know, that's crazy!!!). Since I did not want to fail miserably again, I prepared much more seriously (thanks Xavier). Here is a summary of what made me ace the interviews.

  1. Know the interview process. A typical 45-minute interview goes like this:
    • Explain a project of your resume (10 minutes).
    • CS Puzzle (25 minutes)
    • Questions (10 minutes)

    I completely failed my Google interview because I had no idea how interviews work. As you can see, nearly half of the interview is not about Computer Science! So you have to prepare for it as well. Prepare a speech for 2 or 3 projects from your resume that make you shine for the position you apply for. Make a list of 15-20 questions and you should be good to go.

  2. Train on CS problems. More than half of the recruitment process is about your Computer Science skills. However the process is flawed: it is mostly focused on solving puzzles. You can be a wonderful programmer who excels at making easy-to-use APIs and wonderful self-documented code, but those skills will not be tested.

    In order to train, the book Cracking the Coding Interview has 150 questions. The quality of individual questions and answers is not top notch, but it will give you a good insight into what will be asked. If you are done with it, you can get more on CareerCup.com.

  3. Your interviewer should want to have a beer with you. This is probably the most helpful advice I have taken from the really good book The Google Resume. Your interviewer is going to be your co-worker right after you get hired. As a consequence, during the interview process, act as if you were talking to a friend rather than to a faceless institution.

Conclusion

All those adventures taught me one thing. In order to get your dream job, you not only have to be a good programmer, you also have to learn how to sell yourself and prepare well for the extremely codified process that interviews are.

If you want to get a job in Silicon Valley, I urge you to read the three books I referenced and start a blog right now. It is a long-term investment that pays off!

Bonus

This is what I sent to accept the job offer :p

Here is a report on the Ray Tracer written by me, Christopher Chedeau. I've taken the file format and most of the examples from the Ray Tracer of our friends Maxime Mouial and Clément Bœsch. The source is available on Github.

It is powered by Open Source technologies: glMatrix, CodeMirror, CoffeeScript, Twitter Bootstrap, jQuery and Web Workers.

Check out the demo, or click on any of the images.

Objects

Our Ray Tracer supports 4 object types: Plane, Sphere, Cylinder and Cone.

The core idea of the Ray Tracer is to send rays that will be reflected on items. Given a ray (origin and direction), we need to know if it intersects an object in the scene and, if it does, how to get the ray' that is reflected off the object.

Knowing that, we open up our high school math book and come up with all the following formulas.

Legend: Ray Origin \(O\), Ray Direction \(D\), Intersection Position \(O'\), Intersection Normal \(N\) and Item Radius \(r\).

Intersection Normal
Plane \[t = -\frac{O_z}{D_z}\] \[
N = \left\{
\begin{array}{l}
x = 0 \\
y = 0 \\
z = -sign(D_z)
\end{array} \right.
\]
Sphere \[
\begin{array}{l l l}
& t^2 & (D \cdot D) \\
+ & 2t & (O \cdot D) \\
+ & & (O \cdot O) - r^2
\end{array}
= 0\]
\[
N = \left\{
\begin{array}{l}
x = O'_x \\
y = O'_y \\
z = O'_z
\end{array} \right.
\]
Cylinder \[
\begin{array}{l l l}
& t^2 & (D_x D_x + D_y D_y) \\
+ & 2t & (O_x D_x + O_y D_y) \\
+ & & (O_x O_x + O_y O_y - r^2)
\end{array}
= 0\]
\[
N = \left\{
\begin{array}{l}
x = O'_x \\
y = O'_y \\
z = 0
\end{array} \right.
\]
Cone \[
\begin{array}{l l l}
& t^2 & (D_x D_x + D_y D_y - r^2 D_z D_z) \\
+ & 2t & (O_x D_x + O_y D_y - r^2 O_z D_z) \\
+ & & (O_x O_x + O_y O_y - r^2 O_z O_z)
\end{array}
= 0\]
\[
N = \left\{
\begin{array}{l}
x = O'_x \\
y = O'_y \\
z = - r^2 O'_z
\end{array} \right.
\]

In order to solve the equation \(at^2 + bt + c = 0\), we use
\[\Delta = b^2 - 4ac \]\[
\begin{array}{c c c}
\Delta \geq 0 & t_1 = \frac{-b - \sqrt{\Delta}}{2a} & t_2 = \frac{-b + \sqrt{\Delta}}{2a}
\end{array}
\]
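As an illustration, here is a small sketch of the solver applied to the sphere, which is centered at the origin of its own coordinate system. It is not the code of the project: the function names are mine, and picking the smallest positive root (the nearest hit in front of the ray) is an assumption. It also assumes \(a \neq 0\).

```javascript
// Smallest positive root of a*t^2 + b*t + c = 0, or null if there is none.
function solveQuadratic(a, b, c) {
  var delta = b * b - 4 * a * c;
  if (delta < 0) { return null; }          // the ray misses the object
  var sqrtDelta = Math.sqrt(delta);
  var t1 = (-b - sqrtDelta) / (2 * a);
  var t2 = (-b + sqrtDelta) / (2 * a);
  var near = Math.min(t1, t2);
  var far = Math.max(t1, t2);
  if (near > 0) { return near; }
  if (far > 0) { return far; }             // the origin is inside the object
  return null;                             // the object is behind the ray
}

function dot(u, v) {
  return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

// Sphere of radius r: t^2 (D.D) + 2t (O.D) + (O.O) - r^2 = 0
function intersectSphere(O, D, r) {
  return solveQuadratic(dot(D, D), 2 * dot(O, D), dot(O, O) - r * r);
}
```

The cylinder and the cone only differ by the three coefficients passed to solveQuadratic.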

And here is the formula for the reflected ray:

\[
\left\{
\begin{array}{l}
O' = O + tD + \varepsilon D' \\
D' = D - 2 (D \cdot N) * N
\end{array}
\right.
\]

In order to fight numerical precision errors, we move the origin of the reflected ray a little bit in the direction of the reflected ray (\(\varepsilon D'\)). This avoids falsely detecting a collision with the current object.
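A direct transcription of these two formulas could look like this. The value of EPSILON is an arbitrary choice of mine, and \(N\) is assumed to be normalized.

```javascript
var EPSILON = 1e-6; // assumed value, small compared to the scene dimensions

function dot(u, v) {
  return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

// O, D: origin and direction of the incoming ray.
// t: intersection parameter, N: unit normal at the intersection.
function reflect(O, D, t, N) {
  var k = 2 * dot(D, N);
  // D' = D - 2 (D . N) * N
  var Dr = [D[0] - k * N[0], D[1] - k * N[1], D[2] - k * N[2]];
  // O' = O + tD + epsilon D'
  var Or = [
    O[0] + t * D[0] + EPSILON * Dr[0],
    O[1] + t * D[1] + EPSILON * Dr[1],
    O[2] + t * D[2] + EPSILON * Dr[2]
  ];
  return { origin: Or, direction: Dr };
}
```

For a ray going down at 45 degrees onto the plane \(z = 0\), the reflected direction goes up at 45 degrees and the new origin sits just above the plane.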

Coordinates, Groups and Rotations

We want to move and rotate objects. In order to do that, we compute a transformation matrix (and its inverse) for each object in the scene using the following formulas, where the subscript \(g\) stands for the group and \(i\) for the item:

\[
T = \begin{array}{l}
(Identity * Translate_g * RotateX_g * RotateY_g * RotateZ_g) * \\
(Identity * Translate_i * RotateX_i * RotateY_i * RotateZ_i)
\end{array}
\]\[ I = T^{-1} \]

\[Translate(x, y, z) = \left(\begin{array}{c c c c}
1 & 0 & 0 & x \\
0 & 1 & 0 & y \\
0 & 0 & 1 & z \\
0 & 0 & 0 & 1
\end{array}\right)\]
\[RotateX(\alpha) = \left(\begin{array}{c c c c}
1 & 0 & 0 & 0 \\
0 & cos(\alpha) & -sin(\alpha) & 0 \\
0 & sin(\alpha) & cos(\alpha) & 0 \\
0 & 0 & 0 & 1
\end{array}\right)\]
\[RotateY(\alpha) = \left(\begin{array}{c c c c}
cos(\alpha) & 0 & sin(\alpha) & 0 \\
0 & 1 & 0 & 0 \\
-sin(\alpha) & 0 & cos(\alpha) & 0 \\
0 & 0 & 0 & 1
\end{array}\right)\]
\[RotateZ(\alpha) = \left(\begin{array}{c c c c}
cos(\alpha) & -sin(\alpha) & 0 & 0 \\
sin(\alpha) & cos(\alpha) & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{array}\right)\]

We have written the intersection and normal calculations in the object's coordinate system instead of the world's coordinate system. It makes them easier to write. We use the transformation matrix to do object -> world and the inverse matrix to do world -> object.

\[
\left\{\begin{array}{l}
O_{world} = T * O_{object} \\
D_{world} = (T * D_{object}) - (T * 0_4)
\end{array}\right.
\]
\[
\left\{\begin{array}{l}
O_{object} = I * O_{world} \\
D_{object} = (I * D_{world}) - (I * 0_4)
\end{array}\right.
\]
\[0_4 = \left(\begin{array}{c} 0 \\
0 \\
0 \\
1
\end{array}\right)
\]
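To make the difference between points and directions concrete, here is a sketch with plain arrays (the project itself relies on glMatrix; the helper names below are mine):

```javascript
// Multiply a 4x4 matrix (array of 4 rows) by a homogeneous column vector.
function mulMatVec(M, v) {
  return M.map(function (row) {
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3];
  });
}

function translate(x, y, z) {
  return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]];
}

function rotateZ(a) {
  var c = Math.cos(a), s = Math.sin(a);
  return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];
}

// Points are transformed directly: O' = M * O
function transformPoint(M, p) {
  return mulMatVec(M, [p[0], p[1], p[2], 1]).slice(0, 3);
}

// Directions: D' = (M * D) - (M * 0_4), which cancels the translation.
function transformDirection(M, d) {
  var a = mulMatVec(M, [d[0], d[1], d[2], 1]);
  var b = mulMatVec(M, [0, 0, 0, 1]);
  return [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
}
```

With M = translate(1, 2, 3), the point (1, 0, 0) moves to (2, 2, 3) while the direction (1, 0, 0) is left unchanged.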


Bounding Box

The previous equations give us objects with infinite dimensions (except for the sphere) whereas objects in real life have finite dimensions. To simulate this, it is possible to provide two points that will form a bounding box around the object. On the intersection test, we are going to use the nearest point that is inside the bounding box.

This gives us the ability to do various objects such as mirrors, table surface and legs, light bubbles and even a Pokeball!


Light

An object is composed of an Intensity \(I_o\), a Color \(C_o\) and a Brightness \(B_o\). Each light has a Color \(C_l\) and there is an ambient color \(C_a\). Using all those properties, we can calculate the color of a point using the following formula:

\[
I_o * (C_o + B_o) * \left(C_a + \sum_{l}{(N \cdot D) * C_l}\right)
\]

Only the lights visible from the intersection point are used in the sum. In order to check this, we send a shadow ray from the intersection point to the light and see if it intersects any object.
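Here is how the formula could be evaluated per color channel. This is a sketch under several assumptions that the formula does not spell out: colors are [r, g, b] arrays, the brightness \(B_o\) is a scalar added to each channel, \(D\) is the unit vector from the intersection point to the light, and lights located behind the surface contribute nothing. The visible flag stands for the result of the shadow ray test.

```javascript
function dot(u, v) {
  return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

// object:  { intensity: I_o, color: C_o, brightness: B_o }
// ambient: C_a
// lights:  [{ color: C_l, direction: D, visible: true or false }]
function shade(object, normal, ambient, lights) {
  var color = [0, 0, 0];
  for (var ch = 0; ch < 3; ch++) {
    var light = ambient[ch];
    for (var i = 0; i < lights.length; i++) {
      if (!lights[i].visible) { continue; }   // blocked by another object
      // Assumption: a negative (N . D) is clamped to zero.
      var facing = Math.max(0, dot(normal, lights[i].direction));
      light += facing * lights[i].color[ch];
    }
    color[ch] = object.intensity * (object.color[ch] + object.brightness) * light;
  }
  return color;
}
```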

The following images are examples to demonstrate the lights.


Textures

In order to put a texture on an object, we need to map a point \((x, y, z)\) in the object's coordinate system to a point \((x, y)\) in the texture's coordinate system. For planes, it is straightforward: we just drop the \(z\) coordinate (which is equal to zero anyway). For spheres, cylinders and cones it is a bit more involved. Here is the formula, where \(w\) and \(h\) are the width and height of the texture.

\[
\begin{array}{c c}
\phi = acos(\frac{O'_y}{r}) & \theta = \frac{acos\left(\frac{O'_x}{r * sin(\phi)}\right)}{2\pi}
\end{array}
\]\[
\begin{array}{c c}
x = w * \left\{\begin{array}{l l} \theta & \text{if } O'_x < 0 \\
1 - \theta & \text{else}\end{array}\right. & y = h * \frac{\phi}{\pi}
\end{array}
\]

Once we have the texture coordinates, we can easily create a checkerboard or put a texture. We added options such as scaling and repeat in order to control how the texture is placed.


We also support the alpha mask in order to make a color from a texture transparent.

Progressive Rendering

Ray tracing is a slow technique. At first, I generated pixels line by line, but I found out that the first few lines do not hold much information.

Instead, what we want is a fast overview of the scene that we then refine. In order to do that, during the first iteration we only generate 1 pixel per 32x32 square. Then we generate 1 pixel per 16x16 square and so on ... We generate the top-left pixel of each square and fill all the unknown pixels of the square with it.

In order not to regenerate pixels we have already seen, I came up with a condition to know if a pixel has already been generated in a previous iteration. \(size\) is the current square size (32, 16, ...).

\[\left\{\begin{array}{l}
x \equiv 0 \pmod{size * 2}\\
y \equiv 0 \pmod{size * 2}
\end{array}\right.
\]
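The iteration order can be sketched as follows. The function name is mine and the start size is assumed to be a power of two. The condition is ignored during the very first pass since nothing has been generated yet.

```javascript
// Returns the [x, y, size] squares in the order they are traced.
function progressiveOrder(width, height, startSize) {
  var order = [];
  for (var size = startSize; size >= 1; size /= 2) {
    for (var y = 0; y < height; y += size) {
      for (var x = 0; x < width; x += size) {
        // Already generated by the previous pass, which used size * 2.
        var done = size < startSize &&
          x % (size * 2) === 0 && y % (size * 2) === 0;
        if (!done) { order.push([x, y, size]); }
      }
    }
  }
  return order;
}
```

For each entry you trace one ray at (x, y) and fill the size x size square with the resulting color. Every pixel of the image ends up being traced exactly once.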

Supersampling

Aliasing is a problem with Ray Tracing and we solve this issue using supersampling. Basically, we send more than one ray for each pixel. We have to choose representative points within the pixel's square. There are multiple strategies: in the middle, in a grid or random. Check the result of various combinations in the following image:

Perlin Noise

We can generate random textures using Perlin Noise. We can control several parameters such as \(octaves\), the number of basic noises that are summed, the initial scale \(f\) and the contribution factor \(p\) of the high-frequency noises.

\[ noise(x, y, z) = \sum_{i = 0}^{octaves}{p^i * PerlinNoise(\frac{2^i}{f}x, \frac{2^i}{f}y, \frac{2^i}{f}z)} \]


\[noise\] \[noise * 20 - \lfloor noise * 20 \rfloor\] \[\frac{cos(noise) + 1}{2}\]

As seen in the example, we can apply additional functions after the noise has been generated to make interesting effects.
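The sum itself is short to write. In the sketch below the basic noise function is passed as a parameter since its implementation is out of scope here, and the names of the two post-processing helpers are made up for the occasion.

```javascript
// noise(x, y, z) = sum for i = 0..octaves of p^i * baseNoise(2^i / f * (x, y, z))
function fractalNoise(baseNoise, x, y, z, octaves, f, p) {
  var sum = 0;
  for (var i = 0; i <= octaves; i++) {
    var scale = Math.pow(2, i) / f;
    sum += Math.pow(p, i) * baseNoise(scale * x, scale * y, scale * z);
  }
  return sum;
}

// Post-processing: noise * 20 - floor(noise * 20)
function fractionalBands(noise) {
  return noise * 20 - Math.floor(noise * 20);
}

// Post-processing: (cos(noise) + 1) / 2
function cosineBands(noise) {
  return (Math.cos(noise) + 1) / 2;
}
```

Any function taking three coordinates and returning a number can be plugged in as baseNoise, which makes it easy to check the sum with a trivial one.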

Portal

Last but not least, Portals from the game of the same name. They are easy to reproduce in a Ray Tracer and yet, I haven't seen any done.

If a ray enters portal A, it goes out from portal B. It is trivial to implement: it is just a coordinate system transformation. Like we did for the world and object transformations, we do it between A and B using their transformation matrices.

\[
\left\{\begin{array}{l}
O_{a}' = T * O_{b} \\
D_{a}' = (T * D_{b}) - (T * 0_4)
\end{array}\right.
\]
\[
\left\{\begin{array}{l}
O_{b}' = T * O_{a} \\
D_{b}' = (T * D_{a}) - (T * 0_4)
\end{array}\right.
\]

Scene Editor

In order to create scenes more easily, we have defined a scene description language. We developed a basic CodeMirror syntax highlighting script. Just write your scene down and press Ray Trace :)

I've been working on code that runs in the browser, in Web Workers and in NodeJS. In order to export my module, I've been writing ugly code like this:

(function () {
  /* ... Code that defines MyModule ... */
 
  var all;
  if (typeof self !== 'undefined') {
    all = self; // Web Worker
  } else if (typeof window !== 'undefined') {
    all = window; // Browser
  } else if (typeof global !== 'undefined') {
    all = global; // NodeJS
  }
  all.MyModule = MyModule;
 
  if (typeof module !== 'undefined') {
    module.exports = MyModule;
  }
})();

One-line Solution

Guillaume Marty showed me that sink.js uses this as a replacement for self, window and global. I managed to add support for module.exports in a one-liner!

(function (global) {
  /* ... Code that defines MyModule ... */
 
  global.MyModule = (global.module || {}).exports = MyModule;
})(this);

I have been looking for this magic line for a long time, I hope it will be useful to you too :)

On MMO-Champion, we often paste World of Warcraft patch notes taken from Blizzard. The main problem is that it's plain text. We want to be able to add links to all the spells, quests, zones ... This way people can mouseover and see the description. It helps figuring out what changed.

We create a Trie that contains item/spell/... names as keys and URLs as values. At each position in the text, we search for the longest string in the trie that matches the text starting there. If one is found, we link it and move right after the end of the name, else we advance by one character.
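Here is a minimal sketch of this algorithm, not the production script. The node layout and function names are mine, and the URLs in the usage note are placeholders.

```javascript
// entries: { name: url, ... }
function buildTrie(entries) {
  var root = { children: {}, url: null };
  Object.keys(entries).forEach(function (name) {
    var node = root;
    for (var i = 0; i < name.length; i++) {
      var c = name[i];
      if (!node.children[c]) { node.children[c] = { children: {}, url: null }; }
      node = node.children[c];
    }
    node.url = entries[name];
  });
  return root;
}

// Longest name of the trie that starts at text[start], or null.
function longestMatch(trie, text, start) {
  var node = trie;
  var best = null;
  for (var i = start; i < text.length; i++) {
    node = node.children[text[i]];
    if (!node) { break; }
    if (node.url !== null) { best = { end: i + 1, url: node.url }; }
  }
  return best;
}

function linkify(trie, text) {
  var out = '';
  var i = 0;
  while (i < text.length) {
    var match = longestMatch(trie, text, i);
    if (match) {
      out += '<a href="' + match.url + '">' + text.slice(i, match.end) + '</a>';
      i = match.end;          // move right after the end of the name
    } else {
      out += text[i];         // no name starts here, advance by one character
      i += 1;
    }
  }
  return out;
}
```

With a trie built from { 'Cat': '/a', 'Cat Form': '/b', 'Rake': '/c' }, the text "Cat Form and Rake" links "Cat Form" as a whole rather than stopping at "Cat".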

Specialized Rules

The algorithm above works well but there are many little problems that arise. In order to solve them, we apply several specialized rules.

  • There are names that have more than one link. We prioritize by source (Ability > Item > Quest > ...) and sort them by descending id.
  • All the interesting names start with a capital letter. Requiring one removes a lot of noise, although it still lets through the first word of sentences.
  • Stamina, Gladiator, Buff, Stat. There are many common words that are spells, we have a blacklist to remove them.
  • [Heal]ing. If the name found ends in the middle of a word, we discard it.
  • [Cinderweb Spiderling]s. But there's an exception, if there is only an s after.
  • [Fireball] Barrage. If the next word is capitalized, it means the name is wrong.
  • [Sanctuary] of Malorne. We also discard if the next word is of.
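Most of these rules only look at the name and at the few characters that follow it. As an illustration, here is one way they could be written, assuming the match is given as a start and end position in the text. The function name and the exact regular expressions are mine.

```javascript
// Returns true if text.slice(start, end) should be turned into a link.
function acceptMatch(text, start, end, blacklist) {
  var name = text.slice(start, end);
  var rest = text.slice(end);

  // Names must start with a capital letter.
  if (!/^[A-Z]/.test(name)) { return false; }
  // Common words such as Stamina or Buff are ignored.
  if (blacklist.indexOf(name) !== -1) { return false; }
  // "[Heal]ing" is rejected but "[Cinderweb Spiderling]s" is accepted.
  if (/^[A-Za-z]/.test(rest) && !/^s(?![A-Za-z])/.test(rest)) { return false; }
  // Look at the next word, skipping the optional plural "s".
  var next = /^s? ([A-Za-z]+)/.exec(rest);
  if (next) {
    // "[Fireball] Barrage": the real name is longer than what we found.
    if (/^[A-Z]/.test(next[1])) { return false; }
    // "[Sanctuary] of Malorne"
    if (next[1] === 'of') { return false; }
  }
  return true;
}
```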

Example

Druid

  • Druids now gain 1 attack power per point of Strength, down from 2. They continue to gain 2 attack power per point of Agility while in Cat Form or Bear Form. In addition, Cat Form's scaling rate from gear upgrades was slower than other classes, which was causing them to fall behind in damage with higher item levels. To counter the Strength change and improve scaling, the following changes have been made. All numbers cited are for level-85 druids.
  • Ferocious Bite damage has been increased by 15%. In addition, its base cost has been reduced to 25 energy and it can use up to 25 energy, for up to a 100% damage increase.
  • Mangle (Cat) damage at level 80 and above has been increased to 540% weapon damage, up from 460%, and bonus damage has been lowered to 302.
  • Rake initial damage on hit now deals the same damage as each periodic tick (and is treated the same for all combat calculations). Periodic damage now gains 14.7% of attack power per tick, up from 12.6%, and base damage per tick has been lowered from 557 to 56. There is a known issue with Rake's tooltip being incorrect from this change will be corrected in a future patch.
  • Ravage damage at level 80 and above has been increased to 950% weapon damage, up from 850%, and bonus damage has been lowered to 532.
  • Savage Roar now grants 80% increased damage to melee auto attacks, up from 50%. The Glyph of Savage Roar remains an unchanged bonus of 5% to that total.
  • Shred damage at level 80 and above has been increased to 540% weapon damage, up from 450%, and bonus damage has been lowered to 302.
  • Entangling Roots and the equivalent spell triggered by Nature's Grasp no longer deal damage.
  • Innervate now grants an ally target 5% of his or her maximum mana over 10 seconds, but still grants 20% of the druid's maximum mana over 10 seconds when self-cast.
  • Omen of Clarity clearcasting buff from now lasts 15 seconds, up from 8 seconds.
  • Starfire damage has been increased by approximately 23%.
  • Swipe (Cat) now deals 600% weapon damage at level 80 or higher, down from 670%.
  • Wrath damage has been increased by approximately 23%.

Rest of the example ...

Conclusion

It takes around a minute to generate the trie, which needs to be done once per big patch. Then it takes less than a second to process a full patch note, automatically adding around 700 links.

The script does not generate a perfect output and needs to be reviewed by a human. However, it takes an order of magnitude less time to improve the generated result than doing it from scratch.

For a school project, I had to write part of a spell-check program. Given a dictionary of words, you have to determine all the words that are within K mistakes of the original word.

Trie

As input, we've got a list of words along with their frequency. For example, with the following list, we are going to build a trie.

do     100 000
dont    15 000
done     5 000
donald     400

In order to minimize the memory footprint, I've made a node structure that fits into 32 bits.

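/* The two flags are repeated at the start of both layouts so that each
   variant adds up to exactly 32 bits (bit-field packing is compiler
   dependent, but common compilers give a 4-byte node). */
typedef union {
	struct {
		unsigned int is_link : 1;
		unsigned int is_last_child : 1;
		unsigned int letter : 6;       // 2^6  = 64 different characters
		unsigned int next : 24;        // 2^24 = 16 777 216 nodes
	} link;
	struct {
		unsigned int is_link : 1;
		unsigned int is_last_child : 1;
		unsigned int is_overflow : 1;
		unsigned int freq : 29;        // 2^29 = 536 870 912
	} final;
} node;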

The frequency can be greater than 2^30. We store values between 0 and 2^29 - 1 directly inside the node; if the value doesn't fit, we store it in a separate memory location and keep only its id in the node.
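To illustrate the layout and the overflow trick, here is the same packing done by hand with bit operations in Javascript. The bit positions (flags in the two highest bits) are an arbitrary choice of mine, and the overflow table is a plain array whose indices play the role of ids.

```javascript
// Link node: is_link (1) | is_last_child (1) | letter (6) | next (24)
function packLink(isLastChild, letter, next) {
  return ((1 << 31) | ((isLastChild ? 1 : 0) << 30) |
          ((letter & 0x3f) << 24) | (next & 0xffffff)) >>> 0;
}

// Final node: is_link = 0 (1) | is_last_child (1) | is_overflow (1) | freq (29)
function packFinal(isLastChild, freq, overflowTable) {
  var overflow = freq >= (1 << 29);
  // Big frequencies live in the table, the node only keeps their id.
  var value = overflow ? overflowTable.push(freq) - 1 : freq;
  return (((isLastChild ? 1 : 0) << 30) | ((overflow ? 1 : 0) << 29) | value) >>> 0;
}

function unpack(node, overflowTable) {
  var isLastChild = ((node >>> 30) & 1) === 1;
  if ((node >>> 31) === 1) {
    return {
      isLink: true, isLastChild: isLastChild,
      letter: (node >>> 24) & 0x3f, next: node & 0xffffff
    };
  }
  var overflow = ((node >>> 29) & 1) === 1;
  var value = node & 0x1fffffff;
  return {
    isLink: false, isLastChild: isLastChild,
    freq: overflow ? overflowTable[value] : value
  };
}
```

Every packed node is a number between 0 and 2^32 - 1, so a whole trie fits in a Uint32Array.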

Damerau Levenshtein Distance

The distance between two words we use is Damerau Levenshtein. In order to calculate the distance, we compute the following table.

Where each slot D(i,j) is calculated using the following formula:
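For reference, a common form of this recurrence for two words \(a\) and \(b\) is the restricted variant below, also known as optimal string alignment. I am assuming this variant here; the boundary conditions are \(D(i, 0) = i\) and \(D(0, j) = j\), and \([a_i \neq b_j]\) is 1 when the letters differ and 0 otherwise.

\[
D(i, j) = \min \left\{
\begin{array}{l l}
D(i-1, j) + 1 & \text{deletion} \\
D(i, j-1) + 1 & \text{insertion} \\
D(i-1, j-1) + [a_i \neq b_j] & \text{substitution} \\
D(i-2, j-2) + 1 & \text{transposition, if } i, j > 1,\ a_i = b_{j-1} \text{ and } a_{i-1} = b_j
\end{array}
\right.
\]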

There are two things to tweak in order to use this distance with a trie.

Incremental computation

Obviously, we are not going to recompute the whole distance for each node. We can use one table for the whole search. For each node you explore, you only compute the row corresponding to its depth in the tree. For example, if you look for elephant and are currently at rel in the tree, you only compute the 3rd row: the first two rows, r and e, are already correct.

When to stop?

Now we need to find out, given the current table, when to stop. For example, if we search for elephant with 1 error and we are at zz, we can stop: there won't be any word that satisfies the request. I've found out that if the minimum value of the current row is bigger than the number of tolerated errors, we can stop the search. When searching for elephant with 1 error, we stop at the 4th row (rele) because the minimum is 2.
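Both tweaks fit in a short recursive search. The sketch below is not the code of the project: it uses a naive object-based trie instead of the packed nodes, and it assumes the restricted (optimal string alignment) variant of the distance. One row of the table is kept per depth, and a branch is abandoned as soon as the minimum of its row exceeds the number of tolerated errors.

```javascript
function buildTrie(words) {
  var root = { children: {}, isWord: false };
  words.forEach(function (word) {
    var node = root;
    for (var i = 0; i < word.length; i++) {
      var c = word[i];
      if (!node.children[c]) { node.children[c] = { children: {}, isWord: false }; }
      node = node.children[c];
    }
    node.isWord = true;
  });
  return root;
}

function fuzzySearch(trie, word, maxErrors) {
  var results = [];
  var firstRow = [];
  for (var j = 0; j <= word.length; j++) { firstRow.push(j); }
  var rows = [firstRow];   // rows[d] is the row of the current prefix of length d
  var prefix = [];

  function visit(node) {
    var depth = prefix.length;
    var row = rows[depth];
    if (node.isWord && row[word.length] <= maxErrors) {
      results.push({ word: prefix.join(''), distance: row[word.length] });
    }
    Object.keys(node.children).forEach(function (c) {
      var next = [depth + 1];
      for (var j = 1; j <= word.length; j++) {
        var cost = word[j - 1] === c ? 0 : 1;
        var best = Math.min(row[j] + 1, next[j - 1] + 1, row[j - 1] + cost);
        // Transposition of two adjacent letters.
        if (depth >= 1 && j >= 2 &&
            c === word[j - 2] && prefix[depth - 1] === word[j - 1]) {
          best = Math.min(best, rows[depth - 1][j - 2] + 1);
        }
        next.push(best);
      }
      // When to stop: no deeper row can get back under the limit.
      if (Math.min.apply(null, next) > maxErrors) { return; }
      rows[depth + 1] = next;   // only this row is computed, the others are reused
      prefix.push(c);
      visit(node.children[c]);
      prefix.pop();
    });
  }

  visit(trie);
  return results;
}
```

With the word list from the beginning of the article, searching for dnoe with 1 error only returns done (one transposition).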

Heuristics

When we traverse the trie in a fuzzy way, we explore a lot of useless nodes. For example, when we fuzzy search for eve, we are going to explore the beginning of the word evening. Let's see some strategies to reduce the amount of nodes we explore.

Different trie per word length

If you fuzzy search for a 5-letter word like happy with 1 error, you know that your results are going to be words of length 4, 5 or 6. More generally, you only look for words whose length is in the range [len - error; len + error]. You can create a trie for each length. Then you search in all the required tries and aggregate the results.

The downside with this approach is that you lose the high prefix compression of the trie.

Word length

Instead of making one trie per word length, you can encode the word lengths directly in the nodes. Each node is going to have a bitfield containing the lengths of the words in its sub-tree. For example, with the word list above, the first node leads to words of size 2, 4 and 6, therefore the field will be 0b010101....

If you are looking for a word of size 5 with 1 error, you create a bitfield of the word lengths you are interested in: 0b000111.... For each node, you AND both bitfields; if the result is not 0, you've got a potential match.
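A possible implementation of this idea on a naive object-based trie is sketched below. Here length n is stored in bit n - 1 counted from the right (the notation above reads from the left), and word lengths are assumed to be at most 31 so that the mask fits in a Javascript integer. The function names are mine.

```javascript
function insert(root, word) {
  var node = root;
  for (var i = 0; i < word.length; i++) {
    var c = word[i];
    if (!node.children[c]) { node.children[c] = { children: {}, isWord: false, lengths: 0 }; }
    node = node.children[c];
  }
  node.isWord = true;
}

// Store in every node the lengths of the words found in its sub-tree.
function annotate(node, depth) {
  var mask = node.isWord && depth > 0 ? 1 << (depth - 1) : 0;
  Object.keys(node.children).forEach(function (c) {
    mask |= annotate(node.children[c], depth + 1);
  });
  node.lengths = mask;
  return mask;
}

// Bitfield of the lengths in [minLen; maxLen].
function lengthMask(minLen, maxLen) {
  var mask = 0;
  for (var len = Math.max(1, minLen); len <= maxLen; len++) {
    mask |= 1 << (len - 1);
  }
  return mask;
}

// Can this sub-tree contain a word within `errors` of a word of this length?
function mayMatch(node, wordLength, errors) {
  return (node.lengths & lengthMask(wordLength - errors, wordLength + errors)) !== 0;
}
```

During the fuzzy search, a node for which mayMatch returns false can be skipped together with its whole sub-tree.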

Conclusion

In order to test performance, we search for every word within distance 2 of the 400 most popular English words. The dictionary contains 3 million words. It takes 22 seconds on my old 2GHz server to run, which is an average of 55ms per fuzzy search, and each search generates an average of 247 results.

Felix Abecassis, a friend of mine, spent time parallelizing the search with Intel TBB. You might want to read his write-up in order to speed up your search even further :)

Since it's a competitive school project, I'm not going to give the source publicly. Please ask them if you are interested :)