Andres Suarez pointed me to some interesting code in the Hack codebase:
let slash_escaped_string_of_path path =
  let buf = Buffer.create (String.length path) in
  String.iter (fun ch ->
    match ch with
    | '\\' -> Buffer.add_string buf "zB"
    | ':' -> Buffer.add_string buf "zC"
    | '/' -> Buffer.add_string buf "zS"
    | '\x00' -> Buffer.add_string buf "z0"
    | 'z' -> Buffer.add_string buf "zZ"
    | _ -> Buffer.add_char buf ch
  ) path;
  Buffer.contents buf
What it does is turn every occurrence of \, :, /, \0 and z into zB, zC, zS, z0 and zZ respectively. This way, the escaped string contains none of the characters \, :, / or \0, which are probably invalid in the context where that string is transported. You can still get them back by transforming the z-sequences back to their original form: since a literal z is itself escaped as zZ, every z in the escaped string starts a two-character sequence, so decoding is unambiguous.
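To make the "get them back" step concrete, here is a minimal sketch of the reverse transformation. The name path_of_slash_escaped_string and the choice to raise Invalid_argument on malformed input are my own assumptions for illustration; I haven't checked how the Hack codebase actually decodes these strings.

```ocaml
(* Hypothetical inverse of slash_escaped_string_of_path. The function name
   and the error handling are assumptions, not taken from the Hack codebase. *)
let path_of_slash_escaped_string (escaped : string) : string =
  let len = String.length escaped in
  let buf = Buffer.create len in
  let rec loop i =
    if i < len then begin
      match escaped.[i] with
      | 'z' ->
        (* Every 'z' in an escaped string must be followed by a code letter. *)
        if i + 1 >= len then invalid_arg "dangling escape character";
        let ch =
          match escaped.[i + 1] with
          | 'B' -> '\\'
          | 'C' -> ':'
          | 'S' -> '/'
          | '0' -> '\x00'
          | 'Z' -> 'z'
          | _ -> invalid_arg "unknown escape sequence"
        in
        Buffer.add_char buf ch;
        loop (i + 2)
      | ch ->
        (* Any other character was copied through unchanged. *)
        Buffer.add_char buf ch;
        loop (i + 1)
    end
  in
  loop 0;
  Buffer.contents buf
```

For example, decoding "zSfoozCbarzZ" should give back "/foo:barz".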
Why is it useful?
The first interesting aspect is that it uses z as the escape character instead of the usual \. In practice, a string is less likely to contain a z than a \, so we have to escape less often.
But the big win comes when escaping multiple times. With \ as the escape character, repeatedly escaping a single \ looks like this:
\->\\->\\\\->\\\\\\\\->\\\\\\\\\\\\\\\\
whereas with the z escape sequence:
z->zZ->zZZ->zZZZ->zZZZZ
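With \, every additional round of escaping doubles the number of escape characters, so the size grows exponentially with the nesting depth. With z, every round adds only a single character. The doubling is problematic in practice: I was working on a project once where we found out that the \ character represented 70% of the payload!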
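If you want to see the difference in numbers, here is a small sketch that applies both schemes repeatedly and prints the resulting lengths. escape_z uses the same mapping as the Hack function; escape_backslash and escape_n are hypothetical helpers I made up for the comparison, and escape_backslash only doubles the backslash itself, which is a simplification of real \ escaping.

```ocaml
(* z-style escaping, same mapping as slash_escaped_string_of_path. *)
let escape_z (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun ch ->
      match ch with
      | '\\' -> Buffer.add_string buf "zB"
      | ':' -> Buffer.add_string buf "zC"
      | '/' -> Buffer.add_string buf "zS"
      | '\x00' -> Buffer.add_string buf "z0"
      | 'z' -> Buffer.add_string buf "zZ"
      | _ -> Buffer.add_char buf ch)
    s;
  Buffer.contents buf

(* Hypothetical, simplified backslash-style escaping: only '\' is doubled. *)
let escape_backslash (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun ch ->
      match ch with
      | '\\' -> Buffer.add_string buf "\\\\"
      | _ -> Buffer.add_char buf ch)
    s;
  Buffer.contents buf

(* Apply [escape] to [s] [n] times. *)
let rec escape_n escape n s =
  if n <= 0 then s else escape_n escape (n - 1) (escape s)

let () =
  for n = 0 to 4 do
    Printf.printf "%d rounds: backslash=%d chars, z=%d chars\n" n
      (String.length (escape_n escape_backslash n "\\"))
      (String.length (escape_n escape_z n "z"))
  done
```

Starting from a single character, the backslash column goes 1, 2, 4, 8, 16 while the z column goes 1, 2, 3, 4, 5, which matches the two sequences above.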
Conclusion
It's way too late to change all the existing programming languages to use a different way to escape characters, but if you have the opportunity to design an escape sequence, know that the \ escape sequence is not always the best choice 🙂