Hey, I'm Christopher Chedeau aka Vjeux, a front-end engineer at Facebook who graduated from EPITA! I hope you will find some of my stuff fun, if not useful 🙂

Here are some projects I worked on in the past:

  • Part of the original team that started Curse.com, helped with MMO-Champion and created sc2mapster.com.
  • Improved Facebook tagging flow with face detection.
  • Optimized Facebook image sizes infrastructure that led to major savings in CDN bandwidth and storage.
  • Promoted React by organizing the first two React Confs and starting create-react-app.
  • Part of the original team that built React Native. Designed Core Components API and Animated library.
  • Started the CSS-in-JS movement.
  • Designed and implemented the first version of Yoga, the layout engine that now powers most of Facebook's mobile UIs.
  • Brought Prettier, a JavaScript pretty printing library, from a prototype to widespread use.
  • Helped open source many Facebook front-end projects, built website infrastructure that eventually became Docusaurus.

On January 1st I started building a little tool that lets you create diagrams that look like they are hand-drawn. That whole project exploded: in two weeks it got 12k unique active users, 1.5k stars on GitHub, and 26 contributors (who produced real code; we don't have any docs). If you want to play with it, go to Excalidraw.com.

Many people have asked me how I got so many people to contribute to Excalidraw in such a short amount of time. While this is still fresh in my mind, let me write about what I was thinking during the process.

S curve

Before we get started with the actual content, here's an interesting concept that was on my mind throughout the project. I discovered the concept of an S curve through Kent Beck's video series. There are three rough phases:

  • the first phase is when you do R&D and develop the product, there's a lot of work done but no real visible impact
  • the second phase is the exponential part where everything is growing tremendously
  • the third phase is when the growth flattens and you're doing smaller improvements (which can still be huge if the baseline is huge)

The S curve is usually used to describe bigger projects, but it turns out Excalidraw just went through an S curve, as seen in this chart that plots the number of stars over the past two weeks.

The most important part for me was to capitalize on the growth phase so that the project doesn't die when it hits the stabilization phase.

Proven Value Proposition

Excalidraw didn't come out of nowhere: I've been using a tool called Zwibbler for probably 10 years to build hand-drawn-looking diagrams to illustrate my blog posts. I've always had the feeling that this tool was underrated. I was seemingly the only one using it even though it felt like it could be used much more broadly.

Example of image drawn with Zwibbler

So when Excalidraw came out, there was a clear value proposition and I knew it was going to be somewhat successful. These days I don't have that much free time, so I tend to spend it on things that I believe have a high likelihood of being successful, especially side projects.

Make Some Noise

The first thing was to get people excited! I'm fortunate to have a sizable audience on Twitter so I used it by posting a bunch of videos of the progress of building the first version of the tool.

Convert Attention to Action

I got more attention than I anticipated so I felt like I could convert it into actual action. For this, the best way I've found is to create a bunch of issues about all the things that need to be done. I've been thinking about rebuilding a Zwibbler equivalent for a long time so I had a pretty good sense of what needed to be done.

People who wanted to contribute could just skim through the list of things to be done and start hacking. That worked really well!


Who is Contributing?

When I open sourced React Native, I was convinced that the same people who contributed to React would contribute to React Native. It turns out I was plain wrong: a new set of people started contributing. The same pattern has applied to all the subsequent projects I've worked on since then.

This is a very broad generalization, but most people who contribute significantly to early projects like this are unknown (if they were well known, they'd likely have better opportunities for their time) but experienced (they are able to jump into a random codebase and contribute).

Keeping People Engaged

The name of the game is to get as much as possible from the people who are interested in contributing. Your initial buzz is only going to last so long (a few days), so you want to capitalize on that time. Everyone (myself included) is likely going to have to go back to their real job soon.

For this, I usually try to be very responsive on incoming pull requests. If you can get the turnaround under 10 minutes, then you get real-time collaboration, and people will stay engaged as long as you are.

I tried something new this time and gave commit access to everyone who got a PR merged; in the past I would only do it after seeing sustained work. This worked really well: it gave people extra motivation to contribute, and they also started reviewing each other's code, which was awesome! I am not worried about people abusing their power; people who spend energy getting something of quality in tend to be considerate.

A trick I've also been using is to merge pull requests even if they're not exactly the way I want them, and then push all the follow-ups I had in mind. This way the person gets their feature shipped and is likely to come back, without expensive back-and-forth (you never know when, or if, they're going to apply suggestions).

Be Decisive

People are going to try to steer the project in all sorts of directions with their ideas and pull requests. It's pretty tricky to anticipate what kind of suggestions you're going to get because people tend to get very creative (in both good and bad ways...).

If you want something to happen, you need to give a very clear "yes" with concrete things that need to be done. If you're not sure, change your mind multiple times, or answer days or weeks later, people are either not going to invest their time making it happen, or will lose interest and not push it to conclusion.

On the flip side, you're likely going to see a lot of pull requests or suggestions that you don't think are a good idea. I've found that it's usually not a good idea to give a clear "no", as it's a hard message to deliver to a stranger over text. Instead, what I've found works better is to space out replies and ask for more information. The other party will naturally lose interest and move on. Use this technique very sparingly, as it is not a nice approach.

Keeper of Quality

With so many simultaneous contributions, the product can easily start losing quality. I view myself as the keeper of quality. I've been pretty obsessed about all the small details and things that feel off.

Every time I see a problem, I open an issue with a small repro case. In many cases those issues are easy to fix and someone will get to them. I also make sure to clear the backlog so that we're always in good enough shape.

I've also made sure that some core values were maintained. I want minimal friction to get started drawing; in particular, this means that the first thing you see should be the shapes. I had to actively prevent people from adding title selection and login to keep this property.

Celebrate Success

Posting about all the good things that happen, be it a new cool feature, interesting usage, or thoughts on the topic, will increase the size of that channel, as those posts will attract an audience.

The other interesting thing that will happen is that you will provide an audience to a lot of the people who are contributing. As I mentioned earlier, they're unlikely to have a big audience of their own that cares about this topic.

This is a win-win situation! It takes time to actually post all those things, but I've seen it pay off time and time again.

Empty Canvas

What I found fascinating with this project is that many people were able to project their dreams and ideas onto it. At least three people have told me that I should quit my job and build a startup around this project, as they saw a lot of growth potential in different areas. (Sorry, I'm not going to, but if you want to, the business is up for grabs!)

I'm not exactly sure what to make of that but it led to great conversations! That's more than I hoped for with this project.

Things That Went My Way

I wish anyone could read this and reproduce it, but that's not completely true: a lot of things went my way. I find it useful to know what advantages the people behind success stories had, to see how those advantages affected their ability to deliver.

  • I have more than 10 years of experience building front-end software, and it turns out that I learned very little on the technical front during this project. I've done all the pieces many times one way or another. So when it was time to architect the project, split up the work, review code or suggestions, do the work, manage contributors, evangelize... all of it was pretty much mechanical and didn't require much thinking. This sped everything up so that a lot more than usual would fit within one buzz cycle.
  • I have a large audience on Twitter and I've worked closely in the past with other people with large audiences (hi Dan Abramov and Jordan Walke!) who were willing to evangelize the project. Without that, I wouldn't have been able to get the project in front of so many people so quickly.
  • Excalidraw was built on top of other projects such as CodeSandbox, Zeit, and Rough. They've been fantastic to use and were part of the reason the project got off the ground so quickly. I encountered some small issues with those dependencies, which would likely have ended up somewhere on an issue tracker and eventually gotten fixed. But because I personally knew the owners of the first two projects and was visible enough for the third, I was able to get those issues resolved extremely quickly, which is not everyone's experience.

Conclusion

This was a fun project to work on while procrastinating on writing performance reviews. I'm not exactly sure what the future holds for Excalidraw but I'm happy that it is now at a point where I can finally use it to illustrate the blog post I wanted to write that started this whole project (hello rabbit hole!).

Now, go draw some things with excalidraw.com and if you see something you'd like improved, please contribute on github! https://github.com/excalidraw/excalidraw

I'm now [in July 2018] in a group full of compiler engineers at Facebook and learning a lot. Yesterday, I read a post by David Detlefs (summarizing a collaborative idea involving several members of his team) about how to efficiently encode strings for concatenation, and since it's very clever, I figured I would share it.

Problem

A lot of programs take a string as input and build a string as output. Imagine the following code. Note: I'm going to use JavaScript as an example but it applies to almost all the languages out there.

var str = '\n';
for (var elem of elems) {
  str += ' * ';
  if (elem.isExpired) {
    str += '[expired] ';
  }
  str += elem.name + '\n';
}

An example output might be

'
 * Nutella
 * Eggs
 * [expired] Milk
'

What is being executed is

'\n' + ' * ' + 'Nutella' + '\n' + ' * ' + 'Eggs' + '\n' + ' * ' + '[expired] ' + 'Milk' + '\n'

If you implement this naively, the execution would look something like:

'\n' + ' * ' = '\n * '
'\n * ' + 'Nutella' = '\n * Nutella'
'\n * Nutella' + '\n' = '\n * Nutella\n'
'\n * Nutella\n' + ' * ' = '\n * Nutella\n * '
'\n * Nutella\n * ' + 'Eggs' = '\n * Nutella\n * Eggs'
'\n * Nutella\n * Eggs' + '\n' = '\n * Nutella\n * Eggs\n'
'\n * Nutella\n * Eggs\n' + ' * ' = '\n * Nutella\n * Eggs\n * '
'\n * Nutella\n * Eggs\n * ' + '[expired] ' = '\n * Nutella\n * Eggs\n * [expired] '
'\n * Nutella\n * Eggs\n * [expired] ' + 'Milk' = '\n * Nutella\n * Eggs\n * [expired] Milk'
'\n * Nutella\n * Eggs\n * [expired] Milk' + '\n' = '\n * Nutella\n * Eggs\n * [expired] Milk\n'

Because strings are immutable, we need to do a full copy of the string for every small concatenation. In practice this turns an O(n) algorithm into an O(n²) one.
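
To see the quadratic blow-up concretely, here is a small sketch (my own illustration, with made-up helper names) that counts how many characters each approach ends up copying:

```javascript
// Count the characters copied by naive `+=` on immutable strings: every
// append copies the whole intermediate result built so far.
function naiveCopyCount(pieces) {
  let copied = 0;
  let length = 0;
  for (const piece of pieces) {
    length += piece.length;
    copied += length;
  }
  return copied;
}

// `join` copies every character exactly once into the final buffer.
function joinCopyCount(pieces) {
  return pieces.reduce((sum, piece) => sum + piece.length, 0);
}

const pieces = Array(1000).fill('x');
console.log(naiveCopyCount(pieces)); // 500500, roughly n²/2
console.log(joinCopyCount(pieces)); // 1000, exactly n
```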

Solution 1: Change the code

If this becomes a bottleneck, instead of using a string all the way through, you can use an array and push all the string pieces to it. Once you are done building the result, you can join all the pieces together into the final string. Since at this point all the pieces are known, the join operation can sum their sizes and allocate exactly the right amount of memory.

var buffer = ['\n'];
for (var elem of elems) {
  buffer.push(' * ');
  if (elem.isExpired) {
    buffer.push('[expired] ');
  }
  buffer.push(elem.name, '\n');
}
var str = buffer.join('');

This pattern solves the problem, but it requires the programmer to know about it and the performance to be bad enough that it's worth writing the code in a different way. In practice, a lot of code is not written that way, and it's unclear that any amount of education will change this.

Note that a compiler to a bytecode format could, in many cases, transform the original code into the explicit string-buffer code. But not in all cases, since compilers have to be conservative: if the string being concatenated is passed as an argument, all bets are off.

Broken Solution 2: Mutate the original string

The solution that comes to mind is: can normal strings act as a buffer?

The idea is that you allocate a buffer of characters and whenever you do a concatenation, you keep writing at the end of the buffer. If it isn't big enough, you allocate a bigger one, do a single copy and keep going.

var str = '\n';   // size = 1, ['\n', _, _, _, _, _, _, _]
str += ' * ';     // size = 4, ['\n', ' ', '*', ' ', _, _, _, _]
 
// The next one doesn't fit so we need to alloc a new buffer and do a full copy
str += 'Nutella'; // size = 11, ['\n', ' ', '*', ' ', 'N', 'u', 't', 'e', 'l', 'l', 'a', _, _, _, _, _]
str += '\n';      // size = 12, ['\n', ' ', '*', ' ', 'N', 'u', 't', 'e', 'l', 'l', 'a', '\n', _, _, _, _]

If you are curious, this is how the Java StringBuilder class is implemented. Performance-wise, this is what we want, but there's one problem...

Aliasing

You can assign a string to a variable, and then assign that variable to another variable. For example:

var str = '\n';
var str2 = str; // here we make an alias

In this case, both str and str2 point to the same '\n' string. In the compiler literature this is called aliasing. The big question is what happens if you try to update one of the variables:

str += ' * ';

If you look at the JavaScript specification, strings are immutable, meaning that you expect str2 to be unchanged and str to be updated:

str2 == '\n'
str  == '\n * '

Unfortunately, if you mutate the string like in the above solution, then both of them would be '\n * ' because they both point to the same underlying storage.

Solution 3: Linear Types

If you've not been living under a rock, you've probably heard about Rust and linear types. This is a fancy name for saying that you cannot have aliasing: only a single variable can point to a value at any time.

What this means in this case is that the line var str2 = str; would be illegal. If you want to do that, you need to do a full copy of the value so that it's effectively a different one.

In practice, aliasing happens all the time in normal programs; for example, calling a function with a string argument is a form of aliasing. We wouldn't want to do full copies every time aliasing happens.

Rust gets away with this using a concept called "borrowing": you can create an alias if the compiler can guarantee that the previous variable cannot be accessed during the lifetime of the alias.

In my understanding, you need a strong type system to properly enforce those guarantees. In dynamic languages like JavaScript you would have to be too pessimistic and do way more copies than necessary when just passing the variable around, ruining the wins you got from building the string this way in the first place.

Solution 4: Size in the variable

Aliasing is usually a dealbreaker because mutating the underlying storage can be observed through another variable. But in this particular case we can exploit the fact that the only mutation we care about is appending at the end.

So, in the variable, we keep not only a pointer to the buffer but also the size we care about. If someone else appends something at the end, it will not affect us, because the buffer's contents up to that size didn't change.

var str = '\n';
// buffer1, size = 1, ['\n', _, _, _, _, _, _, _]
// str : size = 1, buffer = buffer1
 
var str2 = str;
// str2: size = 1, buffer = buffer1
 
str += ' * ';
// buffer1, size = 4, ['\n', ' ', '*', ' ', _, _, _, _]
// str : size = 4, buffer = buffer1

At this point, str2 points to buffer1, which contains '\n * ', but because it has size = 1 we know it really is '\n', as intended.

The only edge case to consider is concatenating to str2 as well. If the size stored in the variable is not equal to the size of the underlying buffer, someone else has clobbered the buffer. In this case, our only option is to do a full copy.

str2 += '|';
// buffer2, size = 8, ['\n', '|', _, _, _, _, _, _]
// str2: size = 2, buffer = buffer2
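
Here is a toy sketch of this representation in JavaScript (the class and its layout are my own invention for illustration; real engines would do this inside their internal string representation):

```javascript
// A string represented as (buffer, size). Appends write in place when the
// variable still owns the end of the shared buffer, and fall back to a full
// copy only when someone else has appended to it in the meantime.
class ConcatString {
  constructor(text) {
    this.buffer = { chars: text.split(''), size: text.length };
    this.size = text.length;
  }
  append(text) {
    if (this.size !== this.buffer.size) {
      // Someone else clobbered the shared buffer: do a full copy.
      this.buffer = { chars: this.buffer.chars.slice(0, this.size), size: this.size };
    }
    for (const ch of text) this.buffer.chars.push(ch);
    this.buffer.size += text.length;
    this.size = this.buffer.size;
    return this;
  }
  alias() {
    // Share the storage but remember our own size.
    const copy = Object.create(ConcatString.prototype);
    copy.buffer = this.buffer;
    copy.size = this.size;
    return copy;
  }
  toString() {
    return this.buffer.chars.slice(0, this.size).join('');
  }
}

const str = new ConcatString('\n');
const str2 = str.alias(); // shares buffer1 with size = 1
str.append(' * '); // writes in place; str2 is unaffected
str2.append('|'); // sizes differ, so this copies into a new buffer
console.log(str.toString()); // '\n * '
console.log(str2.toString()); // '\n|'
```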

Conclusion

Before joining the team, I knew about the string builder pattern but I had no idea that there was so much theory behind this particular problem like aliasing, linear types... I hope that explaining those concepts in terms of JavaScript is helpful to get some insights into what's happening inside of compilers.

Andres Suarez pointed me to some interesting code in the Hack codebase:

let slash_escaped_string_of_path path =
  let buf = Buffer.create (String.length path) in
  String.iter (fun ch ->
    match ch with
    | '\\' -> Buffer.add_string buf "zB"
    | ':' -> Buffer.add_string buf "zC"
    | '/' -> Buffer.add_string buf "zS"
    | '\x00' -> Buffer.add_string buf "z0"
    | 'z' -> Buffer.add_string buf "zZ"
    | _ -> Buffer.add_char buf ch
  ) path;
  Buffer.contents buf

What it does is turn all occurrences of \, :, /, \0 and z into zB, zC, zS, z0 and zZ respectively. This way, none of those characters (which are probably invalid in the context where the string is transported) remain in the output, but you can still recover the original string by transforming all the z-sequences back.
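
For illustration, here is a JavaScript sketch of the same scheme (the function names are mine):

```javascript
// Escape \, :, /, NUL and z as two-character z-sequences.
function zEscape(path) {
  let out = '';
  for (const ch of path) {
    switch (ch) {
      case '\\': out += 'zB'; break;
      case ':': out += 'zC'; break;
      case '/': out += 'zS'; break;
      case '\0': out += 'z0'; break;
      case 'z': out += 'zZ'; break;
      default: out += ch;
    }
  }
  return out;
}

// Invert the escaping: every remaining 'z' starts a two-character sequence.
function zUnescape(escaped) {
  const back = { B: '\\', C: ':', S: '/', '0': '\0', Z: 'z' };
  return escaped.replace(/z(.)/g, (_, code) => back[code]);
}

console.log(zEscape('/var/log:zoo')); // 'zSvarzSlogzCzZoo'
console.log(zUnescape(zEscape('/var/log:zoo'))); // '/var/log:zoo'
```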

Why is it useful?

The first interesting aspect is that it uses z as the escape character instead of the usual \. In practice, it's less likely for a string to contain a z than a \, so we have to escape less often.

But the big wins come when escaping multiple times. With \ as the escape character, repeated escaping looks like this:

  • \ -> \\ -> \\\\ -> \\\\\\\\ -> \\\\\\\\\\\\\\\\

whereas with the z escape sequence:

  • z -> zZ -> zZZ -> zZZZ -> zZZZZ

The fact that escaping a second time doubles the number of escape characters is problematic in practice. I was once working on a project where we found out that the \ character represented 70% of the payload!
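
You can check this growth with a quick sketch (simplified to just the escape character itself; function names are mine):

```javascript
// With \ as the escape character, every pass doubles the escapes;
// with z, every pass adds just one character per original z.
function backslashEscape(s) {
  return s.replace(/\\/g, '\\\\');
}
function zEscapeOnly(s) {
  return s.replace(/z/g, 'zZ'); // only the 'z' case, for brevity
}

let a = '\\';
let b = 'z';
for (let i = 0; i < 4; i++) {
  a = backslashEscape(a);
  b = zEscapeOnly(b);
}
console.log(a.length); // 16: doubles on every pass
console.log(b); // 'zZZZZ': grows by one character per pass
```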

Conclusion

It's way too late to change all the existing programming languages to use a different way of escaping characters, but if you ever get to design an escape sequence, know that the \ escape sequence is not always the best 🙂

It's very trendy to bash Computer Science degrees, saying that they cost a lot of time and money and that, at the end, you haven't learned many useful things for your day job. My experience going to EPITA, a French school in Paris, has been the complete opposite!

I started programming when I was around 10. By 13, I was already being contracted for real money by someone in the US (Hi Thott!). So, when I started EPITA at 18, I already had a ton of experience as a self-taught programmer. My biggest fear was: "Will I learn anything new?" And I did learn a ton! I know I wouldn't have done all the impactful things I have at Facebook without it.

What struck me was that I already knew a lot and I knew there was a lot left to learn, but I didn't really know what, nor how to get started learning it on my own.

Big Themes

Here are some themes where I learned entirely new domains of knowledge at school, many of which I would likely not have learned, or at least not in as much depth, had I not done a CS degree.

  • C: I wrote a ton of C at school and it was super annoying, but learning how to do manual memory management, use pointers, and have raw access to memory was a huge breakthrough in how I understood the way code actually runs.
  • Data Structures and Algorithms: I had 6 hours of courses per day for 2 years. There's so much to learn! What's interesting is that you don't need any of it to build what you want, but once you hit scale and want to optimize and maintain it, knowing about all of them becomes critical!
  • OCaml: The whole idea of manipulating immutable lists via recursion to actually do something useful at the end was very weird. Even though I still despise this style of programming, there are good ideas behind it and React is a good example of a practical system that makes use of them.
  • Source Control: I used to change index2.php on the production machine and cp it to index.php using sftp when I was happy with my changes... for a website with 1 million unique visitors a day... It was fun and scary, but I wouldn't go back to that in a million years!
  • Assembly: Learning how a computer works from NAND gates all the way to assembly was really mind-blowing. There are so many levels of abstraction between the hardware and some piece of JavaScript executing in the browser that it's hard to believe it works and isn't crazy slow.
  • Machine Learning (it wasn't hyped at the time) is really not that magical: it's a bunch of heuristics to massage the data so that it can be separated by a line. It also made me happy that all the linear algebra I'd learned for years was finally useful for something. (Math was also useful for all the signal processing stuff with Fourier transforms!)
  • Academia: My theory is that nothing is new; everything has already been explored, but because people there insist on using LaTeX with math symbols everywhere and publish to closed platforms, it doesn't get the reach it deserves.
  • Tradeoffs: If I were to summarize all the learnings, it's probably that there is no one perfect solution. There's usually a bunch of solutions with different sets of tradeoffs, and you need to find the least shitty one. It was a sad realization, but it gave me a much better framework to work with.

Specific School Projects

What was awesome about EPITA is that not only did we have classes like any normal school, but we also had a ton of actual non-trivial projects to implement. Here are some that had a profound impact on me:

  • Reimplementing Warcraft 3 from scratch: I learned so much about parsing (and reverse engineering) data structures and the whole 3D rendering space... but also how to run a year-long project with a team of 4 people, and how to sell it. It was a pretty awesome first-year project 🙂
  • Reimplementing malloc from scratch (by the way, printf uses malloc, which makes debugging interesting...): I learned so much about memory management, and running Firefox on my malloc was so awesome!
  • Reimplementing bash from scratch: I learned a ton about how to parse and execute programming languages and how crazy the Unix low-level APIs are. Thanks to this project, I'm still using this knowledge today when writing scripts.
  • Implementing regex execution in OCaml: OCaml is --really-- good for working with well-defined data structures and algorithms (I have big reservations about anything outside of this narrow scope...). Since I use regexes almost daily, it's been super helpful.
  • We had 4-hour machine exams every couple of weeks for a year where you basically had to solve as many interview-style problems as possible. This was so much fun and helped me land a job at Facebook.
  • Implementing the fastest possible fuzzy finder: I learned so much about orders of magnitude in performance: just opening the file in JavaScript took way too long; C is faster... malloc is really slow; a custom bump allocator is better. Smaller structs are better; bit packing ftw. Pre-computing data for heuristics can dramatically help.

And many more... but that's a good enough list for now!

Actual Use Cases

Now, the big question is: but did you use any of that at work? And the answer is yes, so many times!

  • Small thing, but when building our open source websites, I wanted to only write next: page in the markdown, not have to write prev nor maintain a global index, and yet still get a table of contents. It took me a second to realize that I had a list of graph edges and needed to rebuild the graph from them. Because I've written so many graph traversals at school, it took me under an hour to build it correctly.
  • EPITA enforced strict lint rules: for each violation you'd lose 2 points (out of 20) on the assignment. The trick is that you were only given a PDF with the rules and no program to verify them, so you had to learn them the hard way. This motivated me, years later, to work on prettier so that we could get the benefits of a consistent codebase the less hardcore way...
  • When I was pitched React for the first time, it sounded crazy to me that re-rendering everything could be fast, but once I was shown that it was, I already knew it would work. I had already made the mental journey of thinking about tree traversals, functional programming paradigms... It didn't sound as foreign to me as it did to so many other people.
  • I've spent a lot of time building lint rules, codemods, extended syntaxes, pretty printers... Because I had excellent courses on how to parse and execute programming languages, and had to build many at school, the challenge was not learning how it works but how to actually implement it in the context of JavaScript.
  • While on the photos team at Facebook, one of the issues was that we cropped photos in a terrible way. Building an algorithm to find the best crop turned out to be manageable because I knew how to read the literature around existing approaches. Once I settled on one, having already written a ton of similar algorithms, it was just a matter of writing the code down with little challenge.
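
The graph-rebuilding task in the first bullet can be sketched like this (a hypothetical miniature, not the actual website code):

```javascript
// Each doc page only declares its successor via `next`; rebuild the ordered
// chain (the table of contents) and the implied `prev` links from those edges.
function buildIndex(pages) {
  // Any page that appears as someone's `next` has a predecessor...
  const hasPrev = new Set(
    Object.values(pages).map((p) => p.next).filter(Boolean)
  );
  // ...so the first page is the only one nothing points to.
  let current = Object.keys(pages).find((id) => !hasPrev.has(id));
  const order = [];
  const prev = {};
  let previous = null;
  while (current) {
    order.push(current);
    prev[current] = previous;
    previous = current;
    current = pages[current].next;
  }
  return { order, prev };
}

const pages = {
  intro: { next: 'setup' },
  setup: { next: 'usage' },
  usage: { next: null },
};
console.log(buildIndex(pages).order); // ['intro', 'setup', 'usage']
console.log(buildIndex(pages).prev); // { intro: null, setup: 'intro', usage: 'setup' }
```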

Conclusion

I could have gotten a well-paying programming job right out of high school as a web developer, but going to EPITA made me a much better programmer because it exposed me in depth to a lot of areas that I would likely never have explored on my own.

Now, when I see a problem in a web development task, I also know that changing the programming language, using machine learning, writing code in a more C-like way, or using other data structures and algorithms are possible tools to solve it. More importantly, if I need to use them, I can implement them because I've already done similar things in the past.

Your experience may vary, but for me, getting a CS degree at EPITA was a really good decision and I would do it again if I had to.

During the past few weeks, I've been working on prettier, which is a JavaScript pretty printer. We are approaching the phase where we can actually use it, so this is a good time to explain how it works.

We're going to go through an example:

if (!pretty) { makePretty() }

String -> AST

The first step is to take this string of JavaScript and parse it to get an AST. An AST (abstract syntax tree) is a tree that represents the program. Using either Babylon or Flow, we can parse this example and get the following tree.

Program
  IfStatement
    UnaryExpression(!)
      Identifier(pretty)
    BlockStatement({})
      ExpressionStatement
        CallExpression
          Identifier(makePretty)

You can explore the full AST using astexplorer.net.

AST -> IR

Now that we have this tree, we want to print it. For each type of node (IfStatement, UnaryExpression...) we're going to output something. In the case of prettier, this something is an intermediate representation called a document, as described in the paper "A prettier printer" by Philip Wadler.

[
  group([
    "if (",
    group([ indent(2, [ softline, "!", "pretty" ]), softline ]),
    ")",
    " ",
    "{",
    indent(2, [ hardline, "makePretty", "()", ";" ]),
    hardline,
    "}"
  ]),
  hardline
];

You can play around with this representation on the prettier explorer.

IR -> String

The interesting thing about this representation is that it is the same no matter what the line length is. The basic idea is that primitives such as group, indent, and softline encode both the way they should look when they fit on the line and the way they should look when they don't.

The most important primitive is group. The algorithm will first try to recursively print a group on a single line. If it doesn't fit the desired width, then it's going to break the outer group and keep going.

Then, we have primitives that behave differently depending on whether their enclosing group fits on a single line: softline prints nothing if the group it is contained in fits, and a newline otherwise. indent adds a level of indentation when the group doesn't fit. If you are curious, you can look at the short list of available commands.

So, we just need to take this IR, send it through a solver along with the desired line width and we get the result!
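
To make this step concrete, here is a drastically simplified sketch of such a solver (my own toy version; prettier's real algorithm tracks the current column precisely and supports many more commands):

```javascript
const group = (parts) => ({ type: 'group', parts });
const indent = (n, parts) => ({ type: 'indent', n, parts });
const softline = { type: 'softline' };
const hardline = { type: 'hardline' };

// Try to render a node on a single line; returns null if it contains a hardline.
function flat(node) {
  if (typeof node === 'string') return node;
  if (Array.isArray(node)) {
    let out = '';
    for (const part of node) {
      const s = flat(part);
      if (s === null) return null;
      out += s;
    }
    return out;
  }
  if (node.type === 'softline') return '';
  if (node.type === 'hardline') return null;
  return flat(node.parts); // group and indent render their contents flat
}

// Render with breaking: a group whose flat form fits within `width` stays
// flat; otherwise its softlines become newlines at the current indentation.
function print(node, width, level = 0) {
  if (typeof node === 'string') return node;
  if (Array.isArray(node)) return node.map((n) => print(n, width, level)).join('');
  switch (node.type) {
    case 'softline':
    case 'hardline':
      return '\n' + ' '.repeat(level);
    case 'indent':
      return print(node.parts, width, level + node.n);
    case 'group': {
      const s = flat(node.parts);
      if (s !== null && level + s.length <= width) return s;
      return print(node.parts, width, level);
    }
  }
}

// The IR from the example above.
const doc = [
  group([
    'if (',
    group([indent(2, [softline, '!', 'pretty']), softline]),
    ')',
    ' ',
    '{',
    indent(2, [hardline, 'makePretty', '()', ';']),
    hardline,
    '}',
  ]),
  hardline,
];

console.log(print(doc, 80));
// if (!pretty) {
//   makePretty();
// }
console.log(print(doc, 5));
// if (
//   !pretty
// ) {
//   makePretty();
// }
```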


Conclusion

Hopefully this gives you a better idea of how a pretty printer that takes the desired width into account works.