Z Escape Sequence

Andres Suarez pointed me to some interesting code in the Hack codebase:

let slash_escaped_string_of_path path =
  let buf = Buffer.create (String.length path) in
  String.iter (fun ch ->
    match ch with
    | '\\' -> Buffer.add_string buf "zB"
    | ':' -> Buffer.add_string buf "zC"
    | '/' -> Buffer.add_string buf "zS"
    | '\x00' -> Buffer.add_string buf "z0"
    | 'z' -> Buffer.add_string buf "zZ"
    | _ -> Buffer.add_char buf ch
  ) path;
  Buffer.contents buf

What it does is to turn all the occurrences of \, :, /, \0 and z into zB, zC, zS, z0 and zZ. This way, there won't be any of those characters in the original string which are probably invalid in the context where that string is transported. But you still have a way to get them back by transforming all the z-sequences back to their original form.

Why is it useful?

The first interesting aspect about it is that it's using z as an escape character instead of the usual \. In practice, it's less likely for a string to contain a z rather than a \ so we have to escape less often.

But the big wins are coming when escaping multiple times. In the \ escape sequence, it looks something like this:

  • \ -> \\ -> \\\\ -> \\\\\\\\ -> \\\\\\\\\\\\\\\\

whereas with the z escape sequence:

  • z -> zZ -> zZZ -> zZZZ -> zZZZZ

The fact that escaping a second time doubles the number of escape characters is problematic in practice. I was working on a project once where we found out that the \ character represented 70% of the payload!

Conclusion

It's way too late to change all the existing programming languages to use a different way to escape characters but if you have the opportunity to design an escape sequence, know that \ escape sequence is not always the best 🙂

If you liked this article, you might be interested in my Twitter feed as well.
 
 

Related Posts

  • August 27, 2011 Start a technical blog, it’s worth it! (6)
    Lately, I've been advocating to all my student friends to start a blog. Here's an article with the most common questions answered :) What are the benefits? Being known as an expert. The majority of my blog posts are about advanced Javascript topics. As a result, I'm being tagged as […]
  • December 3, 2023 LLM As A Function (8)
    LLMs have seen a huge surge of popularity with ChatGPT by going from prompt to text for various use cases. But what's really exciting is that they are also extremely useful as ways to implement normal functions within a program. This is what I call LLM As A Function. For example, if you […]
  • August 4, 2009 Project – Shortest Path (2)
    A school project was to find the shortest path in a dungeon graph. You start with an amount of hit points, and each edge gives or removes hit points. You have to find the path from two points going through the minimum of edges (no matter their value) alive (hp > 0 all along the path). […]
  • September 22, 2011 URLON: URL Object Notation (43)
    #json, #urlon, #rison { width: 100%; font-size: 12px; padding: 5px; height: 18px; color: #560061; } I am in the process of rewriting MMO-Champion Tables and I want a generic way to manage the hash part of the URL (#table__search_results_item=4%3A-slot). I no longer […]
  • April 28, 2020 Make the game easy (16)
    If you watch pro pool players, most of the time the game is super boring, if you don’t believe me, watch this video from someone that puts 152 balls in a row. What’s interesting is that if you were to look at each shot individually, most of them are easy. I can likely make the 152 pots […]