Andres Suarez pointed me to some interesting code in the Hack codebase:
```ocaml
let slash_escaped_string_of_path path =
  let buf = Buffer.create (String.length path) in
  String.iter
    (fun ch ->
      match ch with
      | '\\' -> Buffer.add_string buf "zB"
      | ':' -> Buffer.add_string buf "zC"
      | '/' -> Buffer.add_string buf "zS"
      | '\x00' -> Buffer.add_string buf "z0"
      | 'z' -> Buffer.add_string buf "zZ"
      | _ -> Buffer.add_char buf ch)
    path;
  Buffer.contents buf
```
It turns every occurrence of `\`, `:`, `/`, `\0` (the NUL character) and `z` into `zB`, `zC`, `zS`, `z0` and `zZ`, respectively. This way, the resulting string contains none of those characters, which are probably invalid in the context where the string is transported. You can still recover the original by turning every z-sequence back into the character it stands for.
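The snippet only shows the encoding direction. A minimal decoding sketch might look like the following. `path_of_slash_escaped_string` is a hypothetical name, not taken from the Hack codebase, and raising `Invalid_argument` on a malformed sequence is an assumed error policy. The encoder is repeated so the block compiles on its own.

```ocaml
(* Encoder from the Hack snippet, repeated so this block is self-contained. *)
let slash_escaped_string_of_path path =
  let buf = Buffer.create (String.length path) in
  String.iter
    (fun ch ->
      match ch with
      | '\\' -> Buffer.add_string buf "zB"
      | ':' -> Buffer.add_string buf "zC"
      | '/' -> Buffer.add_string buf "zS"
      | '\x00' -> Buffer.add_string buf "z0"
      | 'z' -> Buffer.add_string buf "zZ"
      | _ -> Buffer.add_char buf ch)
    path;
  Buffer.contents buf

(* Hypothetical decoder, not from the Hack codebase: scan left to right.
   A 'z' must be followed by B, C, S, 0 or Z, and the pair is replaced by
   the character it stands for; any other character is copied as-is.
   Malformed input raises Invalid_argument (an assumed error policy). *)
let path_of_slash_escaped_string s =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec loop i =
    if i >= n then ()
    else if s.[i] = 'z' then begin
      if i + 1 >= n then invalid_arg "dangling z at end of string";
      let decoded =
        match s.[i + 1] with
        | 'B' -> '\\'
        | 'C' -> ':'
        | 'S' -> '/'
        | '0' -> '\x00'
        | 'Z' -> 'z'
        | c -> invalid_arg (Printf.sprintf "unknown escape z%c" c)
      in
      Buffer.add_char buf decoded;
      loop (i + 2)
    end
    else begin
      Buffer.add_char buf s.[i];
      loop (i + 1)
    end
  in
  loop 0;
  Buffer.contents buf

let () =
  let path = "C:\\lazy/path" in
  let escaped = slash_escaped_string_of_path path in
  print_endline escaped; (* prints CzCzBlazZyzSpath *)
  assert (path_of_slash_escaped_string escaped = path)
```

Decoding is unambiguous because every `z` in an encoded string starts an escape: a literal `z` in the input always comes out as `zZ`.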
Why is it useful?
The first interesting aspect is that it uses `z` as the escape character instead of the usual `\`. In practice, a string is less likely to contain a `z` than a `\`, so we have to escape less often.
But the big wins come when escaping multiple times. With `\` as the escape character, it looks something like this:
`\` -> `\\` -> `\\\\` -> `\\\\\\\\` -> `\\\\\\\\\\\\\\\\`
whereas with `z` as the escape character:
`z` -> `zZ` -> `zZZ` -> `zZZZ` -> `zZZZZ`
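Every round of `\` escaping doubles the number of escape characters, so n rounds turn a single `\` into 2^n backslashes, while n rounds of `z` escaping turn a single `z` into only n + 1 characters. That doubling is a real problem in practice: I once worked on a project where we found out that the `\` character represented 70% of the payload!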
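To see the two growth rates side by side, here is a small sketch that repeatedly applies two simplified escapers and prints the resulting lengths. `backslash_escape`, `z_escape` and `repeat` are hypothetical helpers that only escape the escape character itself; real escapers would also handle the other special characters.

```ocaml
(* Hypothetical simplified escapers that only handle the escape character:
   [backslash_escape] turns every '\\' into two backslashes, and
   [z_escape] turns every 'z' into "zZ". *)
let backslash_escape s =
  let buf = Buffer.create (2 * String.length s) in
  String.iter
    (fun ch ->
      if ch = '\\' then Buffer.add_string buf "\\\\"
      else Buffer.add_char buf ch)
    s;
  Buffer.contents buf

let z_escape s =
  let buf = Buffer.create (String.length s + 1) in
  String.iter
    (fun ch ->
      if ch = 'z' then Buffer.add_string buf "zZ"
      else Buffer.add_char buf ch)
    s;
  Buffer.contents buf

(* Apply [f] to [s] exactly [n] times. *)
let rec repeat n f s = if n = 0 then s else repeat (n - 1) f (f s)

let () =
  (* Backslash lengths double each round (1, 2, 4, 8, 16);
     z lengths grow by one each round (1, 2, 3, 4, 5). *)
  for n = 0 to 4 do
    Printf.printf "rounds=%d  backslash: %2d chars  z: %d chars\n" n
      (String.length (repeat n backslash_escape "\\"))
      (String.length (repeat n z_escape "z"))
  done
```

After four rounds the single `\` has become 16 characters, while the single `z` has become only 5 (`zZZZZ`).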
Conclusion
It's way too late to change all the existing programming languages to use a different way to escape characters, but if you have the opportunity to design an escape sequence, know that `\` is not always the best choice 🙂