Vjeux

Oct

Birth of Prettier

vjeux Javascript Add you comment

React Conf is around the corner and it's been almost 10 years since Prettier was released. I figured it would be a good time to recount the journey from its early days to now.

This is the story of how the "Space vs Tabs Holy War" ended, not through one side winning over the other but instead a technological invention making it the underlying source of tensions no longer being a thing.

Back Story

School time

In order to understand my state of mind when I worked on Prettier, we need to go back in time when I did my college at EPITA, a French school dedicated to Computer Science.

When you start, you are given a PDF with the list of formatting rules you need to obey when turning your coding assignments. Not following them is extremely punishing where for each failure you are losing 2 points out of a grade over 20. So 10 formatting issues and you're getting a 0!

This taught people the hard way that they need to train themselves to format their code properly. As you can expect, I tried to code my way out of this and implement such a linter with fixers myself so that I wouldn't have to deal with it.

But it turns out that writing what is commonly called a linter + formatter for the C programming language was way out of my depth for a first year student. So I ended up learning these formatting rules but it engraved in the back of my mind the need for a way to solve this problem automatically.

Facebook time

A few years later, I joined Facebook and my first interaction with an awesome engineer there included a bunch of formatting style changes. Turns out the guidelines I learned at school was not the same ones that Facebook was using.

It felt like a pretty bad setup for everyone involved. Yuan Tian had to spend her valuable time going through all my code and pointing out all the formatting issues (eslint wasn't yet created!). I then had to go back and fix everything.

At first I thought this was pointless but then a few months later a new person joined the team and I found myself in the same situation in reverse because having a different style convention really did make the code harder to read and understand!

Potential Solutions

One thing I've learned is that if there's a big problem that a lot of people faced, many people must have tried to solve it. So I passively investigated over the years what were all these approaches and why they didn't succeed.

Lint fixers

The way most people approached this problem is by figuring out what rule you are breaking (eg: shouldn't be a space before a parenthesis) and writing custom logic to fix it.

This worked well for some rules, like spaces around operators, but didn't for the ones that revolve around line length. It also wasn't guaranteed that the operations converged so you could have rules fixing one and breaking another.

gofmt

When go was released, they published a formatter along with the programming language which they used for all their code and examples. As a result, it became the way most people wrote go.

For the first time it felt like there was a light at the end of the tunnel. But even though it was released in 2009, it didn't seem like any other programming language got such a similar treatment 8 years later in 2017 when Prettier was released.

Maybe they could only pull it off because they started with it...

Dartfmt

In 2015, dartfmt made the rounds with a blog post titled "The Hardest Program I’ve Ever Written". This was the first time a proper formatter got traction on an existing popular language.

This gave me hope. But reading through the post, I also disagreed with the way the problem was approached. The idea is to create a scoring function for the "prettyness" of a given AST output and find the output that maximizes this function.

While it makes sense from a coding perspective, this feels terrible from a human perspective as there's no clear explanation as to why a specific code is printed the way it is and there's likely going to be big swings that can happen with small variations of input if two very far away branches have a similar score.

It's also not immediately clear to me how to build that scoring function. And as the article says, it's very hard to optimize as you may need to explore a huge amount of possible states.

All the JavaScript formatters

There's been a long list of people building JavaScript formatters since gofmt came out but none of them got any strong adoption. I was very curious to understand why so I reached out to their authors.

The good thing I learned is that I could just send cold emails to these authors. They spent months of their life working on this so they were super happy when somebody reach out to geek out on the subject.

After talking to many people, I came to the conclusion that we are facing a very different flavor of project. Most programming projects in the wild follow a Pareto curve where you can build 80% of the project in 20% of the time, ship and then iterate to improve on the last 20%.

But the problem with formatters is that you can't ship if it's doing the right thing 80% of the time. This would mean that every 5 lines it format things in a weird way. People are very sensitive to the way their code is written so this won't fly.

Most of the projects failed not because the approach wasn't sound but because the authors were not willing to commit to build the 99.999% before the project could be viable.

Winter Break

In December of 2016, two people I know, James Long who evangelized React thanks to his amazing blog posts and did great work on the Firefox devtools and Pieter Vanderwerff who built a lot of the JavaScript infrastructure at Facebook both started working on a JavaScript pretty printer.

Testing

One of the most annoying part of working on such project is to have a good test setup. The usual way to set it up was to write manual tests for each examples. You put the input as a string within a test file, you need to make sure you escape it properly, run the printer and paste the output.

Unfortunately it's super annoying to have to do this for many tests and if you change the formatter output it may "break" a lot of tests and you need to manually repeat this process. This is a huge motivation killer.

Thankfully a few months earlier, Jest introduced "snapshot testing" support which was the perfect fit for this problem. With that, it was possible to just write js files in the test folder and all the "output" would be auto-generated.

When you run all the test suite again, you see the differences in formatting and you can figure out whether it's what you intended or not. With one command you can update all the output if it is.

Not only did it help with the person coding but it was invaluable for the reviewer. The pull request now shows all the differences that this change would do. This let us welcome contributions to the formatter as we could see all the changes. If nothing looks off, we can just merge it without fear.

Friendly Competition

I got super excited but I still didn't actually know how to build a pretty printer, so I instead took on the role of a "cheerleader". I've learned that motivation is a key aspect of having an ambitious project go through.

I setup the testing infra and created a leaderboard for how many AST nodes both their implementation supported, updating it every day and getting both to compete on who is going to support all of them first 🙂

I also kept feeding real world code to both their implementation and categorize + prioritizing all the weird looking code and giving it to them to fix it.

This was super fun seeing so much progress done so quickly! This is for these kind of moments that I love being in tech.

End of the break

Speed of execution wasn't just a nice to have but a necessity for this project. As I mentioned, the shape of the project requires the huge long tail to be implemented before it would be usable. My goal was to get as much progress done as possible while both of them were able to dedicate time to it.

When the break was over, both of them decided to get back to their real job and to shelve this side project. At this point, the 80% was done and working, I could see the problem that has been bothering me for years being solved...

So I decided to drop everything I had on my plate at work and focus full time on this. As you can expect, my manager wasn't super happy about it but this felt like a now or never kind of situation, I used all my acquired political capital at the company to make that bet.

Prettier

First order of business was to make a choice on which of the two projects to build on-top of. In reality this was an easy decision. Pieter's project was built in Reason, a dialect of OCaml. While I've written my fair share of OCaml at school, this was still not a language I was particularly comfortable with and it would drastically reduce the ability for people in the JavaScript ecosystem to contribute. So I went with James Long's project which was written in JavaScript.

Open Source

James Long did all the work to open source and used his incredible writing abilities to publish a blog post called "A Prettier JavaScript Formatter" which got people really excited about the idea of the project and kick started its growth.

I ended up contracting James Long for him to work on it for a few more weeks but my inexperience going through the contracting process at Facebook made it a bad experience for him. On the bright side, this led me to discover Open Collective which is amazing at handling money exchanges for open source projects.

Algorithm

At this point, I still had no idea how the printer actually worked. I know it was based on the paper called "A prettier printer" from Philip Wadler but even today this paper reads like gibberish to me as it is written in Haskel and full of scientific verbiage.

The good news is that it's actually extremely elegant and powerful. There are just two concepts: a list of "instructions" with string literal, group, indent, softline, hardline... and an algorithm that takes these instructions along with a line length which outputs the final string. I wrote a small blog post that explains how it works if you are interested.

The mind blowing thing to me is that we've been able to print all of JavaScript, CSS, HTML, Ruby... with only 23 commands. This is a testament to the brilliancy of the design.

Strategy

At this point we had the first 80%, now we needed to do the next 19.999%. My approach was to "go into a submarine", make as much progress as possible and once it would be ready for prime time, "go back to the surface" and get people to adopt it.

In 6 months, I shipped 500 commits, which is around 3 commits a day. The amazing thing with Open Source is that I only did half of the commits of the project during that period! 31 other people contributed and built many things that I would likely not been able to do.

Correctness

The #1 thing I was concerned about is that the formatter would output code that actually executed the same as the original. You can have the prettiest code but if it introduces bugs, nobody will use the software.

I tried many approach to test for correctness. The obvious one was checking that the AST would be the same but the issue is that adding or removing parenthesis would sometimes change the AST, but it was required to print it the way we wanted to. So had to do a lot of massaging and it wasn't clear if it actually properly tested correctness anymore.

What ended up being the best indicator is a simple idempotency test:

prettier(input) == prettier(prettier(input))

It's not intuitive why it actually works but the vast vast majority of all the correctness issues we've ever found broke this test.

The good news is that I had a massive JavaScript codebase within Facebook to play with. So for these 6 months I would wake up in the morning, run this test on the entire codebase, eye ball to see the most common issue, fix it and repeat. It was a grind but after about 4 months, we finally got it down to 0 correctness issues!

Formatting Decisions

The thing I was most anxious going into the project was figuring out how to deal with everyone's extremely strong opinions on formatting. People called "Space vs Tabs Holy War" for a reason.

My mindset going into it was that I would use my own judgement for printing most of the code and provide the minimum number of options so that I wouldn't have to deal with the most controversial takes.

So we ended up with a small set for tabs vs spaces, semicolons vs no semi, single quotes vs double quotes and a few smaller ones. While it would have been amazing to not have any options, I feel like this sidestepped the most likely distraction that would have killed the project.

For all the other decisions, the strategy I used was to grep the giant Facebook codebase to see what was the most used variant and use that one in case I needed to make a choice between two reasonable ways to print the same AST structure.

This also gave me some data driven way to justify my choices instead of just "I think it looks better". Being able to communicate with people in this project removing the preferences was a key part of how to remain sane.

It's a really weird feeling to think that I ended up defining how most of JavaScript code looks like in the world these days.

Printing Difficulties

There's a reason why nobody had solved the problem before, we ran into many types of code that were not trivial to print.

Comments

Comments have become the bane of my existence. They can be inserted literally anywhere, they have bunch of different forms, you sometimes want them to extend past 80 columns but not always... The code for this is a huge list of special cases that doesn't fully work.

If I were to reimplement prettier, I would change the way we handle them. Most JavaScript parsers are AST parsers where the A stands for Abstract. Instead I would go for a CST parser where the C stands for Concrete. It gives access to the list of all the tokens including comments. This is how we did it for the programming language Skip which solved it elegantly.

Chained Methods

The other one is chained.method().calls(). The reason for this one is more subtle. In practice for most of the language constructs, they have a single "intent". So you need to figure out what the meaning of it is and print it accordingly.

The issue with chaining is that people have used it to mean a ton of different things. People have built entire new languages out of it for example for building SQL queries, it is used to paper over the lack of namespaces, doing tons of various operations with jQuery...

As a result, we need 500 lines of code and a bunch of heuristics just for this single node type.

Expanded last argument

If you put a function definition or an object as the last argument of a function call, you want to kind of inline it but also expand it. It doesn't neatly fall into the primitives that the core algorithm gives us.

const response = await fetch("https://example.org/post", { <--
  method: "POST",
  body: JSON.stringify({ username: "example" }),
});

We hacked it up with "conditionalGroup" primitive which is unfortunately O(n²) so this is the only part of prettier that can actually spend a really large amount of time if you nest many of these. Thankfully this isn't a pattern that people actually do.

It haunted me so much that when I built the Skip pretty printer, I ended up finding a much more elegant solution using the notion of a "marker". If anyone reading is building a pretty printer based on the same infrastructure, I highly recommend using it instead.

Object literals

Like chained methods, object literals is used in a lot of different contexts. Unfortunately, there wasn't a heuristic that actually worked well for all the use cases.

We have a really important principle with prettier which is that we should print the code regardless of how it was formatted in the first place. This makes it so that however badly (or differently) the input was formatted, it'll always print the same way.

Sadly, we had to break this rule for object literals with a slight twist. If there's a new line anywhere in an object, we'll always print its expanded form. This means that if you want to shrink an object if it fits 80 columns, you'll have to manually remove all the empty lines.

To this day, nobody that I know about has come up with a better alternative unfortunately. If you have ideas please let me know!

No-semi

At that time there was a big movement around not putting semi-colons at the end of lines. While this worked most of the time, some of the edge cases were very subtle but completely changed the meaning of the code. For example this is how the following seemingly innocuous code would be evaluated:

let start, end;
 
let range = getRange()
[start, end] = range
// How it actually is executed by the language:
// let range = (getRange()[(start, end)] = range);
 
// How we print it with nosemi in prettier:
let range = getRange()
;[start, end] = range

In order to get around, the convention was to put a semicolon at the start of the next line. Finding out how to make this happen was actually pretty complicated as we needed to figure out what is the last character we would print in the previous line and the first of the current line and put a semicolon if they would trigger issues.

One of the great thing about prettier is that it would reformat whatever code you pass in to a human readable variant. So if you see prettier do weird things with your code, at this point it's way more likely that your code triggered some JavaScript parsing edge case than prettier being wrong.

Internal Rollout

Product Manager Shape

One thing I've learned through this experience is that I needed to learn how to shape shift based on what the project needed. At first it required me being a coding machine but then a lot more of coordination was required.

Within Meta, we needed to build all the integrations. It turns out having a working printer is only a piece of the puzzle. We needed to have IDE integration such that it would format on save, a linter and CI integration such that you couldn't ship incorrectly formatted code, a way to upgrade existing files in batch...

I ended up coding only a small part of these integrations but instead directed many people into the best way to do it.

Options

While prettier itself should support various options to help increase adoption, I wanted a single set of options within the company. That came with a challenge where it's really hard to enforce in a big company where people are one pull request away from changing the options.

The strategy I used was to figure out how to line up the incentives such that using a different set of options was orders of magnitude more annoying than not.

Prettier is used in two main places: CI and the IDE. I made it so that both would read their options from a different place with a different rollout schedule. CI would read the options from source control. The IDE would read the options from the IDE extension.

So if you wanted your own set of options, you'd need to add a coordination system so that both would read from the same place, which was not trivial to do and would require you to work on a foreign codebase.

A few people that weren't using the standard IDE did rollout their own options but when they started working with more people within the company eventually ended up aligning.

Granularity

There were a lot of debates internally as to how we're going to roll it out. So far all the formatting fixes were rolled out only on lines changed by the person modifying the code. While we did implement this mode in prettier, it meant losing a lot of the value proposition.

So we needed to figure out a way to ship it such that it wouldn't update a bunch of unrelated lines when people modified a file. The solution we came up with was to add a @format annotation in the first comment of the file. If it's there, then the file is always formatted, both in the IDE and with a pre-commit hook to enforce it.

Rollout

With that strategy planned, the rollout process was a script that took a list of files, would run the check mode to make sure prettier didn't change the meaning of the code, run the formatter and add the @format annotation to the top of the file.

My only rule with this rollout is that I would personally not reformat any single line of the codebase. The rationale behind it is that I wanted the printed code to be so good that people didn't need to be forced into it but instead they'd want to do it on their own accord.

This strategy worked really well, after 6 months we got 50% of our giant codebase converted and after a year 75%. At that point it was clear that the value proposition was there and I organized a "Getting to 100%" event where a bunch of volunteers updated the rest of the codebase.

Up until then we were at 0 production issues caused by prettier (as far as I'm aware) but we got the first and only one... Turns out it was caused by the script that added @format to the header, it accidentally reformatted the comment and a polyfill would no longer be bundled in.

The fact that we were able to do meaningful change to every JavaScript and CSS file at the company only causing one issue is one of my proudest engineering moment.

Upgrades

The setup described above worked really well for getting adoption but ended up being a nightmare to upgrade prettier. Since we were still pretty early in the evolution of prettier, we would update the way we print commonly written code pretty frequently.

As a result, anytime we needed to upgrade prettier, we would need to update a large percentage of the files in the codebase. Because we had a commit hook that prevents landing code that is not properly formatted, we needed to do it very quickly to avoid preventing people from shipping code.

So I ended up spending many weekends (where the chances of merge conflicts are the lowest) sending gigantic pull requests, hitting all sorts of limits of our source control and continuous integration infrastructure in the process.

Thankfully as time passed the amount of large changes in prettier's format went dramatically down and our infra got a lot better so it stopped becoming a big issue.

One interesting side effect is that I changed the most number of lines of code company wide that year and got my name on the "blame" for most of the JavaScript files that existed at that time. So over the years I kept getting random people or scripts pinging me about code I had no idea about!

Solving the problem

Format on Save

I completely underestimated the profound impact that format on save feature would have. I saw it as a nice to have feature but it ended up being the defining factor for the project.

The mindset of the programmer at the time was that you write your code mostly formatted and then you've got this annoying linter that tells you that you did all these things wrong and you need to spent time correcting.

At the beginning people were doing the same thing with prettier and got a slight productivity boost as prettier would fix the issues. But then people realized that they could just write code in whatever shape, press save and it would automagically format itself.

Ending the Tabs vs Spaces Holy War

During a programmer career, you've likely written millions of open curly bracket and based on your coding style put a space, newline or nothing before. If you do it wrong, you've got linters or coworkers annoyed at you.

The big realization is that if you do something many many many times, at some point it becomes part of your identity. So programmers, due to the nature of the work have become fundamentally attached with how they format their code.

As a result, they have all sorts of opinions that are strongly held and end up being very spicy debates such as tabs vs spaces, semicolon vs nosemi... But the magic of prettier is that you no longer need to do this repetitive task anymore, so people cared a lot less.

The other aspect is that changing the formatting style used to be a huge endeavor, you not only needed to update the codebase but also rewire the brain of all the engineers working on it. But with prettier, you just change the config, send a big diff and you're done.

So at the end of the day, the way the war ended was not by picking a winner but lowering the stakes so people just stopped caring.

Maintenance

While the early days are the exciting stories people want to hear, the reality is that Prettier is now one of the foundation upon which most of the JavaScript and CSS code is written in the world. This requires people to maintain this project.

Open Collective <3

After a few months somebody suggested that I open up a way for people to donate to the project. I signed up prettier on opencollective.com and added a link to the GitHub repo.

After a few years the project had received $50k of donations. I was very unsure on how to use that money. On one hand, this was a significant amount of money, but on the other hand, it was far from being enough to hire anyone competent to work on it.

My first attempt was to ask all the people that had significant contributions and give them a slice of that. But almost all of them declined. Either because they didn't feel justified to earn money on contributions that happened a few years prior at this point, or because the hassle of handling taxes on that money transfer were not worth it.

Paid Maintainers

A few years later, there were 2 people that kept contributing and were running the show. After doing the math of the amount of donations we receive, we could pay them $1.5k / month each. This is not enough to live off of but a not insignificant amount either.

So we started paying Fisker Cheung and Sosuke Suzuki and have been for the past many years. They've been doing an amazing job at maintaining the project, supporting latest versions of the various JavaScript standards and language extensions (TypeScript, Flow) that came up over the years.

Stressful Situation

I have to be honest, this part of the project has caused me a lot of stress. Even though Prettier is used by the entire industry, the project has received a grand total of $200k of donations over its lifetime.

In the past few years, the $3k * 12 = $36k a year of maintenance costs have been more than the natural flow of money coming in. So I had to spend time fund raising.

The challenge is that there's not a clear value proposition. The way the world works is that you pay money in exchange for something. Since Prettier itself is free, people are not paying for that.

Money Sources

There are a few things that people are willing to pay for. One is advertising as the Prettier website and GitHub repo are getting a fair amount of traffic. But a lot of the project interested didn't feel like something I would want to put my name behind.

The second common one is doing consulting related to the project. In practice Prettier is very easy to install or integrate with your environment (by design) so there isn't a real need there. There could be work for supporting new language features but that's a very niche use case.

We tried unconventional avenues, the team at Sentry ran a swag store for Prettier tshirts but it was a lot of logistics for not a lot of revenue in the end. I also quickly looked at getting grants but the amount of paperwork needed for likely not getting any money didn't feel like a good use of time.

Big shoutout to Facebook / Meta that has been critical paying for my salary and contributing significantly to the Open Collective.

Call for money

If you as an individual or your company want to support Prettier maintainers, please send money to opencollective.com/prettier. The reality is that you're not going to get anything concrete for that donation except for the knowledge that you kept a critical piece of infrastructure running.

If you have ideas on how to get it funded, please let me know. Or maybe there's a tech billionaire or VC reading this post that will do a one million dollar donation ¯\_(ツ)_/¯

Aftermath

My initial assumption was that I would down the submarine and only after I go back up would the adoption really kickstart. But thanks to the power of open source and adventurous people, people started adopting prettier extremely quickly.

By 2021, 83% mentioned that they were using Prettier in the State of JS survey. After that, the people running the survey stopped asking because they felt like everyone was using it and it didn't provide much value.

Other Languages

Łukasz Langa who worked at Facebook at the time saw my work on prettier and built an equivalent for Python called Black. It eventually got adopted by The Python Software Foundation who is the organization behind Python.

Prettier started with JavaScript but then expanded to cover languages in the web space like CSS, JSON, HTML, YAML, Markdown, GraphQL... and many of their popular variants.

Nowadays, it's pretty rare to see a mainstream programming language not have a formatter that's widely adopted by its community.

The End

This is pretty insane to me where we went from formatting being one of the most controversial engineering topic to the problem being truly solved in only the span of a few years.

It feels surreal, even after all these years, that I got to be part of the solution and ended up deciding how a large part of the code written in the world looks like.

Now that this chapter is closed, onto solving the next big problem faced by engineers!

Aug

Trackmania Medal Distribution

vjeux Javascript Add you comment

I've been playing Trackmania, a racing game, recently and they introduced a new concept called Cup of the Day. Every day a brand new map is released, for 15 minutes everyone is trying to get the fastest time and based on that time are put into groups of 64 players. Then for 23 rounds people play and the slowest ones are eliminated until there's only 1 winner left. This is super fun to play!

Each map has 4 times associated: "Author Time", "Gold Medal", "Silver Medal", "Bronze Medal". Not all those times are of equivalent difficulty for all the maps. I've been trying to get gold medals on all the tracks and for some it takes me a few minutes compared to hours for some others. So I've been trying to figure out a way to get a sense of how difficult it is to get.

Check out the results!

Getting the data

Fortunately, there's an in-game leaderboard that tells you the times everyone made. So my plan was to scrape this leaderboard, figure out how many people got which medal and hope to get a sense of how hard the map is.

Trackmania Leaderboard (I'm not GranaDy)

Fortunately, the website trackmania.io has all the information I needed. It has the medal times and the actual leaderboard.

Not only that but the way the website is written is a single page app using Vue that queries the data from a server endpoint using JSON. So this makes retrieving the information even more straightforward, no need to parse HTML.

Example of JSON for the leaderboard information

At this point, what I need is to figure out how to get the number of people that got each medal. The traditional way to do a leaderboard is to have the endpoint return a fixed number of results each time and have a pagination system. So I would do a binary search in order to find where the medal boundary lie.

But, it turns out that this was even easier, the pagination API is doing the limits based on a specific time. So the algorithm was to query the map medal times, and for each of them query the leaderboard and take the position of the first result to know how many people had the previous medal!

For example, Gold time is 1:03.000. I query the leaderboard starting at 1:03, the first person will have 1:03.012 and be at position 2310. They won't have the gold medal, only silver. But 2309 people will have either the author time or gold medal.

Coding the scraper

I decided to go with a nodejs script this time around. But you can use whatever language you want, I've been doing a lot of scraping in PHP in the past.

The first thing you want to do is to add a caching layer so you don't spam the server and get you banned, but also make it much quicker to iterate as the next times it'll be instant. Here's a quick & dirty way to build caching:

async function fetchCachedJSON(url) {
  const key = url.replace(/[^a-zA-Z0-9]/g, '-').replace(/[-]+/g, '-');
  const cachePath = `cache/${key}.json`;
  if (fs.existsSync(cachePath)) {
    return JSON.parse(fs.readFileSync(cachePath));
  }
  const json = await fetchJSON(url);
  fs.writeFileSync(cachePath, JSON.stringify(json));
  return json;
}

And this is what my cache/ folder looks like after it's been running for a while.

The great aspect about this is that all those are single files that can be looked at manually and edited if needs be. If you someone messed up or got banned, you can delete the specific files and retry later.

If you look at the code, you'll notice that I didn't use the fetch API directly but instead used a fetchJSON function. The reason for this is that you'll most likely want to do some special things.

You probably will need some sort of custom headers for authentication or mime type. It's also a good place to add a sleep so you don't spam the server too heavily and get banned.

async function fetchJSON(url) {
  const response = await fetch(url, {
    headers: {
      'Authorization': 'nadeo_v1 t=' + accessToken,
      'Accept': 'application/json',
      'Content-Type': 'application/json',
      'User-Agent': 'vjeux-totd-medal-ranks',
    }
  });
  const json = await response.json();
  await sleep(5000);
  return json;
}
function sleep(ms) {
  return new Promise(resolve =&gt; setTimeout(resolve, ms));
}

After this, the logic was pretty straightforward, where I would just do the algorithm I described at the beginning. In order to write it, the usual way is to first implement the deepest part and test it standalone (fetching a single time) then wrap it for a map, then for a month, then for all the months.

async function fetchRankFromTime(trackID, time) {
  const json = await fetchCachedJSON('https://trackmania.io/api/leaderboard/map/' + trackID + '?from=' + time);
  return json.tops[0].position - 1;
}
async function fetchRanks(trackID) {
  const map = await fetchCachedJSON('https://trackmania.io/api/map/' + trackID);
  const rankAT = await fetchRankFromTime(trackID, map.authorScore);
  const rankGold = await fetchRankFromTime(trackID, map.goldScore);
  const rankSilver = await fetchRankFromTime(trackID, map.silverScore);
  const rankBronze = await fetchRankFromTime(trackID, map.bronzeScore);
  return [map.authorplayer.name, rankAT, rankGold, rankSilver, rankBronze];
}
async function fetchTOTDMonth(month) {
  const json = await fetchCachedJSON('https://trackmania.io/api/totd/' + month);
  const days = [];  
  for (jsonDay of json.days) {
    [authorName, rankAT, rankGold, rankSilver, rankBronze] = await fetchRanks(jsonDay.map.mapUid);
    days.push({day: jsonDay.monthday, month: json.month, year: json.year, authorName, rankAT, rankGold, rankSilver, rankBronze});
  }
  return days;
}
async function fetchAll() {
  const days = [];
  for (month of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]) {
    days.push(...await fetchTOTDMonth(month));
  }
  console.log('data =', JSON.stringify(days.sort((a, b) =&gt; b.rankAT - a.rankAT)));
}

There are two distinct parts of the project. The first part is the data collection, described above. The second is how do you display all this data. I like to keep both separate.

In this case, the end result of the collection is a standalone JSON document where I prepend data =, that I can then include as a <script>. This way in my front-end, also written in Vue for this project, I can use that global data variable to access it.

Displaying the data

I didn't really know how to display the data. We have the number of people that have all the medals before. I started displaying an horizontal bar for each where 1px = 1 position. It worked pretty well but it was way too large.

The Trackmania API stops giving any kind of precision past 10,000 so I used CSS and made all the numbers where 100% is 10,000 and it gave this results which worked well!

Since most tracks are finished by around 8k players it turns out to be working really well in practice.

Now that we have that for every single map, we can start having fun and sort this data in many ways.

Sorting by number of people that got author time gives us what I was looking when going into this project. We can find the easiest maps:

As well as the hardest maps.

I've implemented various ways to sort such as date released, medal times and map author name. In doing so, I found out that if 10k people got a medal then the sort is going to give different orderings every time you sort them.

A quick and dirty fix is to sort the data by all the previous pivots so that it always give a stable list. It is wasteful but easy to implement by copy and pasting and the dataset is small enough that it doesn't matter much in practice.

sortAuthor: function() {
  this.days.sort((a, b) =&gt;
    (b.year * 10000 + b.month * 100 + b.day) -
    (a.year * 10000 + a.month * 100 + a.day));
  this.days.sort((a, b) =&gt; b.rankAT - a.rankAT);
  this.days.sort((a, b) =&gt; b.rankGold - a.rankGold);
  this.days.sort((a, b) =&gt; b.rankSilver - a.rankSilver);
  this.days.sort((a, b) =&gt; b.rankBronze - a.rankBronze);
  this.days.sort((a, b) =&gt; a.authorName.localeCompare(b.authorName));

Conclusion

This was a fun project and I'm happy that I was able to figure out how hard a map was in practice. I'd like to give big props to Miss who built all the Trackmania APIs I used during this project.

Jul

Efficient String Encoding for Concatenation

vjeux Javascript Add you comment

I'm now [in July 2018] in a group full of compiler engineers at Facebook and learning a lot. Yesterday, I read a post by David Detlefs (summarizing a collaborative idea involving several members of his team) about how to efficiently encode strings for concatenation and since it's very clever I figured I would share it.

Problem

A lot of programs are taking a string as input and building a string as output. You can imagine the following code. Note: I'm going to use JavaScript as an example but it applies to almost all the languages out there.

var str = '\n';
for (var elem of elems) {
  str += ' * ';
  if (elem.isExpired) {
    str += '[expired] ';
  }
  str += elem.name + '\n';
}

An example output might be

'
 * Nutella
 * Eggs
 * [expired] Milk
'

What is being executed is

'\n' + ' * ' + 'Nutella' + '\n' + ' * ' + 'Eggs' + '\n' + ' * ' + '[expired] ' + 'Milk' + '\n'

If you implement this naively, the execution would look something like:

'\n' + ' * ' = '\n * '
'\n * ' + 'Nutella' = '\n * Nutella'
'\n * Nutella' + '\n' = '\n * Nutella\n'
'\n * Nutella\n' + ' * ' = '\n * Nutella\n * '
'\n * Nutella\n * ' + 'Eggs' = '\n * Nutella\n * Eggs'
'\n * Nutella\n * Eggs' + '\n' = '\n * Nutella\n * Eggs\n'
'\n * Nutella\n * Eggs\n' + ' * ' = '\n * Nutella\n * Eggs\n * '
'\n * Nutella\n * Eggs\n * ' + '[expired] ' = '\n * Nutella\n * Eggs\n * [expired] '
'\n * Nutella\n * Eggs\n * [expired] ' + 'Milk' = '\n * Nutella\n * Eggs\n * [expired] Milk'
'\n * Nutella\n * Eggs\n * [expired] Milk' + '\n' = '\n * Nutella\n * Eggs\n * [expired] Milk\n'

Because strings are immutable, we need to do a full copy of the string for every small concatenation. In practice this turn a O(n) algorithm into O(n²).

Solution 1: Change the code

If this becomes a bottleneck, instead of using a string all the way through, you can use an array and push all the string pieces to it. Once you are done building the result, you can join all the pieces together into the final string. Since at this point you know all the strings the operation can sum all the sizes and allocate exactly the right size.

var buffer = ['\n'];
for (var elem of elems) {
  buffer.push(' * ');
  if (elem.isExpired) {
    buffer.push('[expired] ');
  }
  buffer.push(elem.name, '\n');
}
str = buffer.join('')

This pattern works to solve the problem but requires the programmer to know about it and the performance to be bad enough that it is worth writing code in a different way. In practice, a lot of code is not written that way and it's unclear that any amount of education will change this fact.

Note that a compiler to a bytecode format could, in many cases, make the transformation of the original code to the explicit StringBuffer code. But not in all cases, since compilers have to be conservative: if the string being concatentated is passed as an argument, all bets are off.

Broken Solution 2: Mutate the original string

The solution that comes to mind is: can normal strings act as a buffer?

The idea is that you allocate a buffer of characters and whenever you do a concatenation, you keep writing at the end of the buffer. If it isn't big enough, you allocate a bigger one, do a single copy and keep going.

var str = '\n';   // size = 1, ['\n', _, _, _, _, _, _, _]
str += ' * ';     // size = 4, ['\n', ' ', '*', ' ', _, _, _, _]
 
// The next one doesn't fit so we need to alloc a new buffer and do a full copy
str += 'Nutella'; // size = 11, ['\n', ' ', '*', ' ', 'N', 'u', 't', 'e', 'l', 'l', 'a', _, _, _, _, _]
str += '\n';      // size = 12, ['\n', ' ', '*', ' ', 'N', 'u', 't', 'e', 'l', 'l', 'a', '\n', _, _, _, _]

If you are curious, this is how the Java StringBuilder class is implemented. Performance-wise, this is what we want, but there's one problem...

Aliasing

You can assign the string to a variable and assign another variable with that variable. For example:

var str = '\n';
var str2 = str; // here we make an alias

In this case, both str and str2 are pointing to the same '\n' string. In the compiler literature this is called aliasing. The big question is what happens if you try to update one of the variable:

str += ' * ';

If you look at the JavaScript specification, strings are immutable meaning that you expect str2 to be unchanged but str to be:

str2 == '\n'
str  == '\n * '

Unfortunately, if you mutate the string like in the above solution, then both of them would be '\n * because they both point to the same underlying storage.

Solution 3: Linear Types

If you've not been living under a rock, you probably have heard about Rust and linear types. This is a fancy name to say that you cannot have aliasing: there's only a single variable that can point to a value at all time.

What this means in this case is that the line var str2 = str; would be illegal. If you want to do that, you need to do a full copy of the value so it's effectively a different one.

In practice, aliasing happens all the time in normal programs, for example calling a function with a string as argument is a form of aliasing. We wouldn't want to do full copies every time aliasing is happening.

Rust is getting away with it using a concept calling "borrowing" where you can create an alias if the compiler can guarantee that the previous variable cannot be accessed during the lifetime that the alias exists.

In my understanding, you need a strong type system in order to properly enforce those guarantees and in dynamic languages like JavaScript you would have to be too pessimistic and do way more copies than necessary when you are just passing the variable around, ruining the wins you get from building the string in the first place.

Solution 4: Size in the variable

Aliasing is usually a dealbreaker because you can mutate the underlying storage that another variable could observe. But in this particular case we can exploit the fact that the only mutation we care about is appending something at the end.

So, in the variable we not only keep a pointer to the buffer but also the size we care about. If someone else appends something at the end, it will not affect us because what's in the buffer for that size didn't change.

var str = '\n';
// buffer1, size = 1, ['\n', _, _, _, _, _, _, _]
// str : size = 1, buffer = buffer1
 
var str2 = str;
// str2: size = 1, buffer = buffer1
 
str += ' * ';
// buffer1, size = 4, ['\n', ' ', '*', ' ', _, _, _, _]
// str : size = 4, buffer = buffer1

At this point, str2 points to buffer1 with '\n * ' but because it has size = 1 then we know it really is '\n' as intended.

The only edge case to consider is if you are trying to also concatenate str2. If the size of the variable is not equal to the size of the underlying buffer, this means that someone else clobbered the buffer. In this case, our only option is to do a full copy.

str2 += '|';
// buffer2, size = 8, ['\n', '|', _, _, _, _, _, _]
// str2: size = 2, buffer = buffer2

Conclusion

Before joining the team, I knew about the string builder pattern but I had no idea that there was so much theory behind this particular problem like aliasing, linear types... I hope that explaining those concepts in terms of JavaScript is helpful to get some insights into what's happening inside of compilers.

Jan

Anatomy of a JavaScript Pretty Printer

vjeux Javascript Add you comment

During the past few weeks, I've been working on prettier, which is a JavaScript pretty printer. We are approaching the phase where we can actually use it so this is a good time to explain how it works.

We're going to go through an example

if (!pretty) { makePretty() }

String -> AST

The first step is to take this string that represents some JavaScript and to parse it in order to get an AST out of it. An AST is a tree that represents the program. Using either Babylon or Flow we can parse this example and we get the following tree.

Program
  IfStatement
    UnaryExpression(!)
      Identifier(pretty)
    BlockStatement({})
      ExpressionStatement
        CallExpression
          Identifier(makePretty)

You can explore the full AST using astexplorer.net.

AST -> IR

Now that we have this tree, we want to print it. For each type of node like IfStatement, UnaryExpression... we're going to output something. In the case of prettier, this something is an intermediate representation called a document as described by the paper a prettier printer by Philip Wadler.

[
  group([
    "if (",
    group([ indent(2, [ softline, "!", "pretty" ]), softline ]),
    ")",
    " ",
    "{",
    indent(2, [ hardline, "makePretty", "()", ";" ]),
    hardline,
    "}"
  ]),
  hardline
];

You can play around with this representation on the prettier explorer.

IR -> String

The interesting thing about this representation is that it is the same no matter what the line-length is. The basic idea is that the primitives such as group, indent, softline encode the way they should look if they fit in the line or if they don't.

The most important primitive is group. The algorithm will first try to recursively print a group on a single line. If it doesn't fit the desired width, then it's going to break the outer group and keep going.

Then, we have primitives that behave differently if they are in a group that fits a single line or not: softline that does not print anything if the group it is contained in fits and a line otherwise. indent adds a level of indentation if it doesn't fit. If you are curious, you can look at the short list of available commands.

So, we just need to take this IR, send it through a solver along with the desired line width and we get the result!

HN6WzI9wFW

Conclusion

Hopefully this gives you a better idea of how a pretty printer that takes into account the desired width work.

Dec

Challenge: Best JavaScript Setup for Quick Prototyping

vjeux Javascript Add you comment

Yesterday, there was a big discussion on Twitter on how hard it is to start hacking on a js project. One comment by Dan Abramov struck me in particular: "Right: don’t use tools, face a problem, choose a tool or roll your own. Wrong: learn tools that don’t solve your problems, hate the tools."

This is spot on. All the solutions presented in this thread do not solve the problems I have when I'm hacking on a new project.

My dream PHP Setup

Back when I was 14, I had the best setup I've ever used in my life. Here it is:

Launch WinSCP and connect to fooo.fr.
Create a file test.php, write <?php echo 'Hello World'; locally with EditPlus 2, press cmd+s to save, wait 0.5 second for the upload.
Open my browser and go to http://fooo.fr/~vjeux/. Click on test.php.
?????
Profit

Challenge

I want to get the same attributes but with JavaScript and React. Here are the constraints:

No setup: I'm happy to have to setup something initially (dedicated server, apache, php...) but nothing should be required to create a new project. No npm install, react-native init, creating a github project, yo webapp...
One file: I want to write a single js file to start with. No package.json, no .babelrc, no Procfile...
Sharable: I want to be able to share it with a url without any extra step. No git push heroku master or git push gh-pages.
Keeps working: Once online, it should stay there and keep working 6 months later. No npm start to run it, no special port that's going to conflict with my 10 other prototypes...
Not generic: I don't care about it being generic, I will use whatever transforms you decided. Happy to write js without semi-colons and using SASS for my CSS if you checked all the boxes above.
Not prod-ready: This setup doesn't have to be prod-ready, support unit testing or anything that involves it being a real product. This is meant for hacking on stuff. When the project becomes good, I'll spend the time to add all the proper boilerplate.

So, that's the challenge. Can you make it happen?

All Posts

Frenchy Front-end Engineer

Birth of Prettier

Back Story

School time

Facebook time

Potential Solutions

Lint fixers

gofmt

Dartfmt

All the JavaScript formatters

Winter Break

Testing

Friendly Competition

End of the break

Prettier

Open Source

Algorithm

Strategy

Correctness

Formatting Decisions

Printing Difficulties

Comments

Chained Methods

Expanded last argument

Object literals

No-semi

Internal Rollout

Product Manager Shape

Options

Granularity

Rollout

Upgrades

Solving the problem

Format on Save

Ending the Tabs vs Spaces Holy War

Maintenance

Open Collective <3

Paid Maintainers

Stressful Situation

Money Sources

Call for money

Aftermath

Other Languages

The End

Trackmania Medal Distribution

Getting the data

Coding the scraper

Displaying the data

Conclusion

Efficient String Encoding for Concatenation

Problem

Solution 1: Change the code

Broken Solution 2: Mutate the original string

Aliasing

Solution 3: Linear Types

Solution 4: Size in the variable

Conclusion

Anatomy of a JavaScript Pretty Printer

String -> AST

AST -> IR

IR -> String

Conclusion

Challenge: Best JavaScript Setup for Quick Prototyping

My dream PHP Setup

Challenge

About Me

Recent Posts

@vjeux on Twitter

Categories