I've been playing Trackmania, a racing game, recently and they introduced a new concept called Cup of the Day. Every day a brand new map is released, for 15 minutes everyone is trying to get the fastest time and based on that time are put into groups of 64 players. Then for 23 rounds people play and the slowest ones are eliminated until there's only 1 winner left. This is super fun to play!

Each map has 4 times associated: "Author Time", "Gold Medal", "Silver Medal", "Bronze Medal". Not all those times are of equivalent difficulty for all the maps. I've been trying to get gold medals on all the tracks and for some it takes me a few minutes compared to hours for some others. So I've been trying to figure out a way to get a sense of how difficult it is to get.

Check out the results!

Getting the data

Fortunately, there's an in-game leaderboard that tells you the times everyone made. So my plan was to scrape this leaderboard, figure out how many people got which medal and hope to get a sense of how hard the map is.

Trackmania Leaderboard (I'm not GranaDy)

Fortunately, the website trackmania.io has all the information I needed. It has the medal times and the actual leaderboard.

Trackmania.io leaderboard page

Not only that but the way the website is written is a single page app using Vue that queries the data from a server endpoint using JSON. So this makes retrieving the information even more straightforward, no need to parse HTML.

Example of JSON for the leaderboard information

At this point, what I need is to figure out how to get the number of people that got each medal. The traditional way to do a leaderboard is to have the endpoint return a fixed number of results each time and have a pagination system. So I would do a binary search in order to find where the medal boundary lie.

But, it turns out that this was even easier, the pagination API is doing the limits based on a specific time. So the algorithm was to query the map medal times, and for each of them query the leaderboard and take the position of the first result to know how many people had the previous medal!

For example, Gold time is 1:03.000. I query the leaderboard starting at 1:03, the first person will have 1:03.012 and be at position 2310. They won't have the gold medal, only silver. But 2309 people will have either the author time or gold medal.

Coding the scraper

I decided to go with a nodejs script this time around. But you can use whatever language you want, I've been doing a lot of scraping in PHP in the past.

The first thing you want to do is to add a caching layer so you don't spam the server and get you banned, but also make it much quicker to iterate as the next times it'll be instant. Here's a quick & dirty way to build caching:

async function fetchCachedJSON(url) {
  const key = url.replace(/[^a-zA-Z0-9]/g, '-').replace(/[-]+/g, '-');
  const cachePath = `cache/${key}.json`;
  if (fs.existsSync(cachePath)) {
    return JSON.parse(fs.readFileSync(cachePath));
  }
  const json = await fetchJSON(url);
  fs.writeFileSync(cachePath, JSON.stringify(json));
  return json;
}

And this is what my cache/ folder looks like after it's been running for a while.

cache/ folder for the project

The great aspect about this is that all those are single files that can be looked at manually and edited if needs be. If you someone messed up or got banned, you can delete the specific files and retry later.

If you look at the code, you'll notice that I didn't use the fetch API directly but instead used a fetchJSON function. The reason for this is that you'll most likely want to do some special things.

You probably will need some sort of custom headers for authentication or mime type. It's also a good place to add a sleep so you don't spam the server too heavily and get banned.

async function fetchJSON(url) {
  const response = await fetch(url, {
    headers: {
      'Authorization': 'nadeo_v1 t=' + accessToken,
      'Accept': 'application/json',
      'Content-Type': 'application/json',
      'User-Agent': 'vjeux-totd-medal-ranks',
    }
  });
  const json = await response.json();
  await sleep(5000);
  return json;
}
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

After this, the logic was pretty straightforward, where I would just do the algorithm I described at the beginning. In order to write it, the usual way is to first implement the deepest part and test it standalone (fetching a single time) then wrap it for a map, then for a month, then for all the months.

async function fetchRankFromTime(trackID, time) {
  const json = await fetchCachedJSON('https://trackmania.io/api/leaderboard/map/' + trackID + '?from=' + time);
  return json.tops[0].position - 1;
}
async function fetchRanks(trackID) {
  const map = await fetchCachedJSON('https://trackmania.io/api/map/' + trackID);
  const rankAT = await fetchRankFromTime(trackID, map.authorScore);
  const rankGold = await fetchRankFromTime(trackID, map.goldScore);
  const rankSilver = await fetchRankFromTime(trackID, map.silverScore);
  const rankBronze = await fetchRankFromTime(trackID, map.bronzeScore);
  return [map.authorplayer.name, rankAT, rankGold, rankSilver, rankBronze];
}
async function fetchTOTDMonth(month) {
  const json = await fetchCachedJSON('https://trackmania.io/api/totd/' + month);
  const days = [];  
  for (jsonDay of json.days) {
    [authorName, rankAT, rankGold, rankSilver, rankBronze] = await fetchRanks(jsonDay.map.mapUid);
    days.push({day: jsonDay.monthday, month: json.month, year: json.year, authorName, rankAT, rankGold, rankSilver, rankBronze});
  }
  return days;
}
async function fetchAll() {
  const days = [];
  for (month of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]) {
    days.push(...await fetchTOTDMonth(month));
  }
  console.log('data =', JSON.stringify(days.sort((a, b) => b.rankAT - a.rankAT)));
}

There are two distinct parts of the project. The first part is the data collection, described above. The second is how do you display all this data. I like to keep both separate.

In this case, the end result of the collection is a standalone JSON document where I prepend data =, that I can then include as a <script>. This way in my front-end, also written in Vue for this project, I can use that global data variable to access it.

Displaying the data

I didn't really know how to display the data. We have the number of people that have all the medals before. I started displaying an horizontal bar for each where 1px = 1 position. It worked pretty well but it was way too large.

The Trackmania API stops giving any kind of precision past 10,000 so I used CSS and made all the numbers where 100% is 10,000 and it gave this results which worked well!

Since most tracks are finished by around 8k players it turns out to be working really well in practice.

Now that we have that for every single map, we can start having fun and sort this data in many ways.

Sorting by number of people that got author time gives us what I was looking when going into this project. We can find the easiest maps:

Easiest maps to get Author Time

As well as the hardest maps.

Hardest maps to get author Time

I've implemented various ways to sort such as date released, medal times and map author name. In doing so, I found out that if 10k people got a medal then the sort is going to give different orderings every time you sort them.

A quick and dirty fix is to sort the data by all the previous pivots so that it always give a stable list. It is wasteful but easy to implement by copy and pasting and the dataset is small enough that it doesn't matter much in practice.

sortAuthor: function() {
  this.days.sort((a, b) =&gt;
    (b.year * 10000 + b.month * 100 + b.day) -
    (a.year * 10000 + a.month * 100 + a.day));
  this.days.sort((a, b) =&gt; b.rankAT - a.rankAT);
  this.days.sort((a, b) =&gt; b.rankGold - a.rankGold);
  this.days.sort((a, b) =&gt; b.rankSilver - a.rankSilver);
  this.days.sort((a, b) =&gt; b.rankBronze - a.rankBronze);
  this.days.sort((a, b) =&gt; a.authorName.localeCompare(b.authorName));

Conclusion

This was a fun project and I'm happy that I was able to figure out how hard a map was in practice. I'd like to give big props to Miss who built all the Trackmania APIs I used during this project.

I'm now [in July 2018] in a group full of compiler engineers at Facebook and learning a lot. Yesterday, I read a post by David Detlefs (summarizing a collaborative idea involving several members of his team) about how to efficiently encode strings for concatenation and since it's very clever I figured I would share it.

Problem

A lot of programs are taking a string as input and building a string as output. You can imagine the following code. Note: I'm going to use JavaScript as an example but it applies to almost all the languages out there.

var str = '\n';
for (var elem of elems) {
  str += ' * ';
  if (elem.isExpired) {
    str += '[expired] ';
  }
  str += elem.name + '\n';
}

An example output might be

'
 * Nutella
 * Eggs
 * [expired] Milk
'

What is being executed is

'\n' + ' * ' + 'Nutella' + '\n' + ' * ' + 'Eggs' + '\n' + ' * ' + '[expired] ' + 'Milk' + '\n'

If you implement this naively, the execution would look something like:

'\n' + ' * ' = '\n * '
'\n * ' + 'Nutella' = '\n * Nutella'
'\n * Nutella' + '\n' = '\n * Nutella\n'
'\n * Nutella\n' + ' * ' = '\n * Nutella\n * '
'\n * Nutella\n * ' + 'Eggs' = '\n * Nutella\n * Eggs'
'\n * Nutella\n * Eggs' + '\n' = '\n * Nutella\n * Eggs\n'
'\n * Nutella\n * Eggs\n' + ' * ' = '\n * Nutella\n * Eggs\n * '
'\n * Nutella\n * Eggs\n * ' + '[expired] ' = '\n * Nutella\n * Eggs\n * [expired] '
'\n * Nutella\n * Eggs\n * [expired] ' + 'Milk' = '\n * Nutella\n * Eggs\n * [expired] Milk'
'\n * Nutella\n * Eggs\n * [expired] Milk' + '\n' = '\n * Nutella\n * Eggs\n * [expired] Milk\n'

Because strings are immutable, we need to do a full copy of the string for every small concatenation. In practice this turn a O(n) algorithm into O(n²).

Solution 1: Change the code

If this becomes a bottleneck, instead of using a string all the way through, you can use an array and push all the string pieces to it. Once you are done building the result, you can join all the pieces together into the final string. Since at this point you know all the strings the operation can sum all the sizes and allocate exactly the right size.

var buffer = ['\n'];
for (var elem of elems) {
  buffer.push(' * ');
  if (elem.isExpired) {
    buffer.push('[expired] ');
  }
  buffer.push(elem.name, '\n');
}
str = buffer.join('')

This pattern works to solve the problem but requires the programmer to know about it and the performance to be bad enough that it is worth writing code in a different way. In practice, a lot of code is not written that way and it's unclear that any amount of education will change this fact.

Note that a compiler to a bytecode format could, in many cases, make the transformation of the original code to the explicit StringBuffer code. But not in all cases, since compilers have to be conservative: if the string being concatentated is passed as an argument, all bets are off.

Broken Solution 2: Mutate the original string

The solution that comes to mind is: can normal strings act as a buffer?

The idea is that you allocate a buffer of characters and whenever you do a concatenation, you keep writing at the end of the buffer. If it isn't big enough, you allocate a bigger one, do a single copy and keep going.

var str = '\n';   // size = 1, ['\n', _, _, _, _, _, _, _]
str += ' * ';     // size = 4, ['\n', ' ', '*', ' ', _, _, _, _]
 
// The next one doesn't fit so we need to alloc a new buffer and do a full copy
str += 'Nutella'; // size = 11, ['\n', ' ', '*', ' ', 'N', 'u', 't', 'e', 'l', 'l', 'a', _, _, _, _, _]
str += '\n';      // size = 12, ['\n', ' ', '*', ' ', 'N', 'u', 't', 'e', 'l', 'l', 'a', '\n', _, _, _, _]

If you are curious, this is how the Java StringBuilder class is implemented. Performance-wise, this is what we want, but there's one problem...

Aliasing

You can assign the string to a variable and assign another variable with that variable. For example:

var str = '\n';
var str2 = str; // here we make an alias

In this case, both str and str2 are pointing to the same '\n' string. In the compiler literature this is called aliasing. The big question is what happens if you try to update one of the variable:

str += ' * ';

If you look at the JavaScript specification, strings are immutable meaning that you expect str2 to be unchanged but str to be:

str2 == '\n'
str  == '\n * '

Unfortunately, if you mutate the string like in the above solution, then both of them would be '\n * because they both point to the same underlying storage.

Solution 3: Linear Types

If you've not been living under a rock, you probably have heard about Rust and linear types. This is a fancy name to say that you cannot have aliasing: there's only a single variable that can point to a value at all time.

What this means in this case is that the line var str2 = str; would be illegal. If you want to do that, you need to do a full copy of the value so it's effectively a different one.

In practice, aliasing happens all the time in normal programs, for example calling a function with a string as argument is a form of aliasing. We wouldn't want to do full copies every time aliasing is happening.

Rust is getting away with it using a concept calling "borrowing" where you can create an alias if the compiler can guarantee that the previous variable cannot be accessed during the lifetime that the alias exists.

In my understanding, you need a strong type system in order to properly enforce those guarantees and in dynamic languages like JavaScript you would have to be too pessimistic and do way more copies than necessary when you are just passing the variable around, ruining the wins you get from building the string in the first place.

Solution 4: Size in the variable

Aliasing is usually a dealbreaker because you can mutate the underlying storage that another variable could observe. But in this particular case we can exploit the fact that the only mutation we care about is appending something at the end.

So, in the variable we not only keep a pointer to the buffer but also the size we care about. If someone else appends something at the end, it will not affect us because what's in the buffer for that size didn't change.

var str = '\n';
// buffer1, size = 1, ['\n', _, _, _, _, _, _, _]
// str : size = 1, buffer = buffer1
 
var str2 = str;
// str2: size = 1, buffer = buffer1
 
str += ' * ';
// buffer1, size = 4, ['\n', ' ', '*', ' ', _, _, _, _]
// str : size = 4, buffer = buffer1

At this point, str2 points to buffer1 with '\n * ' but because it has size = 1 then we know it really is '\n' as intended.

The only edge case to consider is if you are trying to also concatenate str2. If the size of the variable is not equal to the size of the underlying buffer, this means that someone else clobbered the buffer. In this case, our only option is to do a full copy.

str2 += '|';
// buffer2, size = 8, ['\n', '|', _, _, _, _, _, _]
// str2: size = 2, buffer = buffer2

Conclusion

Before joining the team, I knew about the string builder pattern but I had no idea that there was so much theory behind this particular problem like aliasing, linear types... I hope that explaining those concepts in terms of JavaScript is helpful to get some insights into what's happening inside of compilers.

During the past few weeks, I've been working on prettier, which is a JavaScript pretty printer. We are approaching the phase where we can actually use it so this is a good time to explain how it works.

We're going to go through an example

if (!pretty) { makePretty() }

String -> AST

The first step is to take this string that represents some JavaScript and to parse it in order to get an AST out of it. An AST is a tree that represents the program. Using either Babylon or Flow we can parse this example and we get the following tree.

Program
  IfStatement
    UnaryExpression(!)
      Identifier(pretty)
    BlockStatement({})
      ExpressionStatement
        CallExpression
          Identifier(makePretty)

You can explore the full AST using astexplorer.net.

AST -> IR

Now that we have this tree, we want to print it. For each type of node like IfStatement, UnaryExpression... we're going to output something. In the case of prettier, this something is an intermediate representation called a document as described by the paper a prettier printer by Philip Wadler.

[
  group([
    "if (",
    group([ indent(2, [ softline, "!", "pretty" ]), softline ]),
    ")",
    " ",
    "{",
    indent(2, [ hardline, "makePretty", "()", ";" ]),
    hardline,
    "}"
  ]),
  hardline
];

You can play around with this representation on the prettier explorer.

IR -> String

The interesting thing about this representation is that it is the same no matter what the line-length is. The basic idea is that the primitives such as group, indent, softline encode the way they should look if they fit in the line or if they don't.

The most important primitive is group. The algorithm will first try to recursively print a group on a single line. If it doesn't fit the desired width, then it's going to break the outer group and keep going.

Then, we have primitives that behave differently if they are in a group that fits a single line or not: softline that does not print anything if the group it is contained in fits and a line otherwise. indent adds a level of indentation if it doesn't fit. If you are curious, you can look at the short list of available commands.

So, we just need to take this IR, send it through a solver along with the desired line width and we get the result!

HN6WzI9wFW

Conclusion

Hopefully this gives you a better idea of how a pretty printer that takes into account the desired width work.

Yesterday, there was a big discussion on Twitter on how hard it is to start hacking on a js project. One comment by Dan Abramov struck me in particular: "Right: don’t use tools, face a problem, choose a tool or roll your own. Wrong: learn tools that don’t solve your problems, hate the tools."

This is spot on. All the solutions presented in this thread do not solve the problems I have when I'm hacking on a new project.

My dream PHP Setup

Back when I was 14, I had the best setup I've ever used in my life. Here it is:

  • Launch WinSCP and connect to fooo.fr.
  • Create a file test.php, write <?php echo 'Hello World'; locally with EditPlus 2, press cmd+s to save, wait 0.5 second for the upload.
  • Open my browser and go to http://fooo.fr/~vjeux/. Click on test.php.
  • ?????
  • Profit

Challenge

I want to get the same attributes but with JavaScript and React. Here are the constraints:

  • No setup: I'm happy to have to setup something initially (dedicated server, apache, php...) but nothing should be required to create a new project. No npm install, react-native init, creating a github project, yo webapp...
  • One file: I want to write a single js file to start with. No package.json, no .babelrc, no Procfile...
  • Sharable: I want to be able to share it with a url without any extra step. No git push heroku master or git push gh-pages.
  • Keeps working: Once online, it should stay there and keep working 6 months later. No npm start to run it, no special port that's going to conflict with my 10 other prototypes...
  • Not generic: I don't care about it being generic, I will use whatever transforms you decided. Happy to write js without semi-colons and using SASS for my CSS if you checked all the boxes above.
  • Not prod-ready: This setup doesn't have to be prod-ready, support unit testing or anything that involves it being a real product. This is meant for hacking on stuff. When the project becomes good, I'll spend the time to add all the proper boilerplate.

So, that's the challenge. Can you make it happen?

One common pattern when implementing user interface optimizations is to compute some value for a node where the computation involves looking at neighbor nodes and want to keep this value updated when the tree is mutated.

On this article, I'm going to explain the pattern we implement to solve this use case on various places in React Native.

Example: background color propagation

On React Native, we implement an optimization where the background color is propagated from the parent instead of being transparent. This provides a guarantee to the GPU that it won't need to paint pixels that are underneath. You can read this release notes for a more complete explanation.

In any case, the algorithm is pretty simple, if the background color is not set on an element, we take the one from the nearest parent that has one set. Here's an example:

Now, let say that the red node background color is being unset, the example would look like:

In order to implement this behavior, we first need to traverse up the hierarchy and find the background color and then down to propagate it, but stop at nodes that have set background colors. This means that we have to implement two different algorithms: one for the initial rendering and one for the update.

The complexity of this example as explained is small enough that it is easy to maintain the same invariants. But in practice, this algorithm is a bit more complex: we don't forward the color for transparent nodes nor for the image component in certain cases... We also experimented with more conditions that we didn't end up using.

Dirty-up and execute top-down

What would be great is to just write the top-down recursive algorithm you learn in school to apply colors however you want and whenever there's a mutation just re-run it on the entire tree. The problem with that approach is that you're going to spend a lot of CPU time running this algorithm on the entire tree when you only need to update a few nodes.

Instead, you can dirty the node you are mutating and all the nodes up to the root.

Then, run your algorithm top-down starting at the root.

You need to implement a way to figure out if a non-dirty node needs to be recomputed. The strategy we use is to cache all the arguments of getColor, for example (parentColor, nodeColor, isNodeText, ...) along with the result. If we're being called with the same arguments and the node is not dirty, then we don't need to go further and can just bail. The pseudo code looks like this:

function getColor(isDirty, prevArgs, prevValue, nextArgs) {
  if (isDirty || prevArgs !== nextArgs) {
    return realGetColorRecursive(...nextArgs);
  }
  return prevValue;  
}

Conclusion

The benefits of this technique is that you can write your business logic once in a traditional top-down recursive way. You don't need to care about the much harder update case.

The downside is that you are doing more work: instead of touching only the elements that have changed, you need to traverse everything up from the root.

Another downside is that it requires more infrastructure as you need to add a caching layer and refactor your code to add a pass that starts at the top.

We're making use of this technique for a few aspects of React Native:

  • Background color is propagation as explained in this article.
  • Computing the layout (top, left, width, height) values based on the css attributes (eg marginTop, flex: 1...).
  • Flag if a text node is the root of the tree of text nodes to be able to flatten it into a single native view.
  • Flag if a node only affect layout and doesn't paint anything so that it can safely be removed from the UI tree.