When I was 14, Philippe Pelletier reached out to me with an interesting project. He is passionate about cinema and maintains a database (lots of physical sheets of paper and a growing number of Word documents) of every movie an actor played in, the cover of the movie, a biography of that person. He had the idea of turning this into a website to share this work.

I built the website called CinéArtistes over the summer and forgot about it. Turns out, 10 years later with no code change, it is still running and growing!

Finally something is broken!

Philippe contacted me this week because the website was getting unstable and those errors started appearing more and more regularly over the last few months. It has gotten to a point where it was unusable at peak hours.

error-cineartistes

As a first glance, it looked like the mysql database was saturated. When I went to cineartistes.com, the page would take something like 10 seconds to load. My intuition was that some queries were not scaling properly with the growing number of entries in the database.

When I browsed around the website and went to an actor page, it was actually really fast. This confirmed my suspicion that this was not a site-wide issue but just a problematic query on the home page.

Going down memory lane

Now that I had an idea of what was going on, I asked Philippe how do I connect to the server and he gave me a FTP url, username and password. It's been a long time since I haven't used FTP for anything!

The entire website was a single file 2800-lines file called index.php! Mind you, there isn't any version control and the staging environment is a file called index_dev.php.

Even though today it seems crazy, it's a good reminder that tools are there to help you achieve your goal. This codebase has worked flawlessly to power the website for more than 10 years with 0 intervention.

It is also surprisingly well structured. There's a router that calls a specific function for every page, good thing that at the time having pretty URL were not a thing 🙂

switch ($_POST['page']) {
  case "admin": OnAdmin(); break;
  case "import": OnImport(); break;
  case "modif": OnModif(); break;
  case "modiffestival": OnModifFestival(); break;
  case "ajout": OnAjout(); break;
  case "ajoutfestival": OnAjoutFestival(); break;
  ...

I already structured my code as components:

PrintHeader();
Liste($rech_nom, $rech_naiss, $rech_deces, $rech_activite);
PrintFooter();

Of course, the implementation isn't fancy: echoing concatenated strings for most of the code and ending php interpolation ?><html...><?php for static things.

Finding the performance bottleneck

Time to go back to the task at hand, we need to figure out why the front-page is so slow. My intuition is that there is a query that runs slowly, so I want to instrument all the mysql_query calls and log the time each one takes.

Thankfully, there is a database abstraction in place $db_mysql->query(...)! This kind of blows my mind! I probably got super annoyed that the mysql API was so hard to use. The abstraction code doesn't look like it was coded by me, I probably copy and pasted it from somewhere else 🙂

Since all the calls are centralized there, I just need to do

$t = microtime(true /* Seriously php ... */);
// do the query
print($query, (microtime(true) - $t) * 1000);

But there's one little problem, how do I print? If I'm using echo, it's going to show it in the middle of the content and everything will look broken. I could buffer it to a global variable and print it at the end but there's a better solution: I can use JavaScript console.log!

function print() {
  echo '<script>console.log(';
  foreach (func_get_args() as $elem) {
    echo '"'.$elem.'", ';
  }
  echo '"");</script>'; // notice the "" in order to avoid dealing with trailing comma
}

At first I tried to use json_encode to be safe if there was a " in the query but guess what, this 10 years old version of PHP doesn't implement it!

I was expecting to see one query takes many seconds but I was a bit disconcerted when the most costly one would only take ~100ms. It turns out that this one was being executed 80 times in a row!

SELECT COUNT( * ) FROM `images` WHERE `from`='4020' AND `type`='2' 104ms
SELECT COUNT( * ) FROM `images` WHERE `from`='4019' AND `type`='2' 117ms
SELECT COUNT( * ) FROM `images` WHERE `from`='4019' AND `type`='2' 101ms
SELECT COUNT( * ) FROM `images` WHERE `from`='4018' AND `type`='2' 125ms
SELECT COUNT( * ) FROM `images` WHERE `from`='4018' AND `type`='2' 104ms
... 80 times ...

This is a classical N+1 problem where we first query for the the elements in the list and then send one query for each one. The proper way to fix this is to refactor the code to merge the inner query in the outer one, but this is super annoying to do in practice.

At Facebook, we use GraphQL and Relay which solves this problem elegantly: it lets you write the queries in a nested fashion as it is in this example, but has a pre-process step that merges all those queries into one.

Make it fast

Anyway, I first wanted to figure out if I could optimize the query instead. It is just being used to check if an actor has at least one image of type 2 (a cover photo), it really shouldn't take 100ms.

I saw three possible improvements:

1) COUNT(*) is wasteful because we don't care about the total count, we just want to know if there's at least one. I learned the hard way that at Facebook, count is extremely expensive because you need to privacy check all the elements in order to know the real count. This means that count is as expensive as fetching all the elements. But in this case, it probably isn't doing such dramatic things.

2) While searching, I found someone writing that if you search for a string on an int field, mysql would be much much slower. That seemed to make sense so I removed ' around the value and unfortunately it made the query go 3 times slower!??!? This wasn't the quick win I was hoping to see.

3) I've read many times that adding an index is super important but had never done it before, so I google'd for it and found this command:

ALTER TABLE `images` ADD INDEX (`from`);

After running this single line of code, the time it took went from 100ms to 0.5ms! So, instead of spending 8 seconds to run those 80 queries it only took 40ms. Problem solved.

Conclusion

Rewrite

Philippe asked several people to try and fix this problem before reaching out to me and their response was along the lines of: this website is built using ancient technology, you need to port it to <insert framework/language name>.

Rewriting is usually the first response but I've learned that it is usually not the best answer. In this case, it took one hour end to end to fix the issue. It would have taken a month+, a lot of money and time and disrupted Philippe flow to bring a new website for no obvious gains.

Precision

Looking back at my code, I can't help seeing many good patterns such as use of components, wrapping mysql calls into a helper, centralizing routing code... But, they were not being used consistently and there was also a lot of terrible patterns. So the fact that they were there is probably a mix of luck, intuition and part of trial and error.

After 10 years, I am now able to pick out every single pattern in this code and talk about the pros and cons and decide whether it would be a good or bad idea to use there. Experience made me a lot more confident and intentional when I code.

Practice makes perfect

If you were to give a similar task to my 14-years old self, I would probably have been able to figure out that I needed to instrument the mysql_query wrapper and add an index. But I bet it would have taken me multiple days to do that.

The reason is because it required me to execute a lot of auxiliary tasks such as

  • Setup the dev environment to update files on a remote FTP on save.
  • Figure out how to find out the code responsible for outputting an element on the screen.
  • Write PHP code that can be run on a 10 years old version (we are spoiled with Hack at Facebook!).
  • Implement a print function with variadic arguments that outputs console.log using a script tag and know that I didn't need to implement escaping.
  • Connect to a mysql instance and select the right database in order to be able to run the query.
  • ...

I know that I struggled with each of those for many days/weeks/months in the past, but at this point, I've done variations of those so many times that I have muscle memory for all those tasks. I also used Google extensively during this hour, not to learn, but to remind me how to do those tasks.

Final thoughts

It will be interesting to read this article in 10 years and figure out what I was not seeing while writing this article. It'll certainly be more meta 🙂

PHP and JavaScript are both renowned to be languages with a lot of quirks. However two major initiatives on both sides, Hack for PHP and ES6 for JavaScript made the languages much better and modern. In this article I'm going to show all the ES6 features that are also in Hack.

Arrow Function

Both languages adopted the same shorter way to write functions. On JavaScript side, the main advantage is the automatic binding of this and for PHP it removes the need to declare all the variables you want to use from outside. ES6, Hack.

// JavaScript
var odds = evens.map(v => v + 1);
var nums = evens.map((v, i) => v + i);
nums.filter(v => {
  if (v % 5 === 0) {
    console.log(v);
    return true;
  }
  return false;
});
// Hack
$odds = array_map($v ==> $v + 1, $evens);
$nums = array_map(($v, $i) ==> $v + $i, $evens);
array_filter($nums, $v ==> {
  if ($v % 5 === 0) {
    echo $v;
    return true;
  }
  return false;
});

Class

JavaScript finally gets a class abstraction with ES6. It is however the bare minimal one to be useful, you cannot define constants, protected/private methods, traits ... PHP on this side is much better, without any Hack addition. ES6, PHP5.

// JavaScript
class SkinnedMesh extends THREE.Mesh {
  constructor(geometry, materials) {
    super(geometry, materials);
    this.idMatrix = SkinnedMesh.defaultMatrix();
    this.bones = [];
  }
  update(camera) {
    super.update();
  }
  static defaultMatrix() {
    return new THREE.Matrix4();
  }
}
// Hack
class SkinnedMesh extends THREE\Mesh {
  public function constructor($geometry, $materials) {
    parent::__construct($geometry, $materials);
    $this->idMatrix = SkinnedMesh::defaultMatrix();
    $this->bones = array();
  }
  public function update($camera) {
    parent::update();
  }
  static private function defaultMatrix() {
    return new THREE\Matrix4();
  }
}

Enhanced Object Literal

One long standing issue with object literals in JavaScript is the inability to use an expression as a key. This is fixed with the bracket notation in ES6. PHP 5.4 introduced a short notation for arrays as well. ES6, PHP.

// JavaScript
var obj = { [Math.random()]: true };
// Hack
$obj = [rand() => true];

Template Strings

Multiline strings and variable interpolations are something that have always been possible in PHP, yet they only start to work in ES6! ES6, PHP.

// JavaScript
var multiline = `In JavaScript this is
 not legal.`
var name = 'Bob',
    time = 'today';
`Hello ${name}, how are you ${time}?`
// Hack
$multiline = 'In PHP this is
 legal.';
$name = 'Bob';
$time = 'today';
"Hello $name, how are you $time?";

Default Arguments

It was possible to write default arguments in JavaScript but ES6 adds proper support for it right in the function declaration. Guess what, PHP had support for it all along. ES6, PHP.

// JavaScript
function f(x, y=12) {
  return x + y;
}
f(3) === 15;
f(2, 10) === 12;
// Hack
function f($x, $y=12) {
  return $x + $y;
}
f(3) === 15;
f(2, 10) === 12;

Iterator + for of

JavaScript has two ways to iterate on collections, either

for (var i = 0; i < array.length; ++i) { var element = array[i]; /* ... */ }
for (var key in object) { var element = object[key]; /* ... */ }

ES6 is now introducing a unified way to do iteration, that PHP always had, as well as a way to write custom collections via the iterator pattern, introduced in PHP5. ES6, PHP, PHP5.

// JavaScript
var fibonacci = {
  [Symbol.iterator]: function() {
    var previous = 0;
    var current = 1;
    return {
      next: function() {
        var new_previous = current; 
        current += previous; 
        previous = new_previous; 
 
 
        return {
          value: current,
 
 
          done: false
        }
      }
    }
  }
}
 
 
 
 
 
for (var n of fibonacci) {
  if (n > 1000) break;
  console.log(n);
}
// Hack
class Fibonacci implements Iterator<int> { 
  private $key = 0;    
  private $previous = 1;
  private $current = 0;
 
  public function next() { 
      $new_previous = $this->current; 
      $this->current += $this->previous; 
      $this->previous = $new_previous; 
      $this->key++; 
  } 
  public function current() { 
      return $this->current; 
  } 
  public function valid() { 
      return true; 
  } 
  public function key() { 
      return $this->key; 
  } 
  public function rewind() { 
      $this->previous = 1; 
      $this->current = 0; 
      $this->key = 0; 
  } 
}
foreach (new Fibonacci() as $n) { 
  if ($n > 1000) break; 
  echo $n; 
}

Generators

Python pioneered generators as another tool to manage control flow. It has originally been designed and promoted as an easier way to write iterators, but really shined as a better way to write asynchronous operations than callbacks. ES6, PHP5.

// JavaScript
var fibonacci = {
  [Symbol.iterator]: function*() {
    var previous = 1;
    var current = 0;
    for (;;) {
      var new_previous = current; 
      current += previous; 
      previous = new_previous; 
      yield current;
    }
  }
}
for (var n of fibonacci) {
  if (n > 1000) break;
  console.log(n);
}
// Hack
 
function fibonacci() {
  $previous = 1;
  $current = 0;
  for (;;) {
    $new_previous = $current; 
    $current += $previous; 
    $previous = $new_previous; 
    yield $current;
  }
}
 
foreach (fibonacci() as $n) { 
  if ($n > 1000) break; 
  echo $n; 
}

ES7 Async Await

C# introduced the concept of async/await combination to deal with asynchronous programming. The underlying implementation is very similar to generators but has proper syntax support. It is an addition of Hack on-top of PHP. ES7, Hack.

// JavaScript
async function chainAnimationsAsync(element, animations) {
  var result = null;
  try {
    for (var animation in animations) {
      result = await animation(element);
    }
  } catch (e) { /* ignore and keep going */ }
  return result;
}
// Hack
async function chainAnimationsAsync($element, $animations) {
  $result = null;
  try {
    foreach ($animations as $animation) {
      $result = await animation($element);
    }
  } catch (Exception $e) { /* ignore and keep going */ }
  return $result;
}

Map + Set

Both JavaScript and PHP are notorious for attempting to fit all the collection use cases into a single general purpose type. Both ES6 and Hack bring to the table proper support for Map and Set. ES6, Hack

// JavaScript
var s = new Set();
s.add('hello').add('goodbye').add('hello');
s.size === 2;
s.has('hello') === true;
 
var m = new Map();
m.set('hello', 42);
m.get('hello') === 42;
// Hack
$s = new Set();
$s->add('hello')->add('goodbye')->add('hello');
$s->count() === 2;
$s->contains('hello') === true;
 
$m = new Map();
$m->set('hello', 42);
$m->get('hello') === 42;

TypeScript

Last but not least, both languages are getting gradual typing. TypeScript, Hack.

// JavaScript
class Greeter<T> {
  greeting: T;
  constructor(message: T) {
    this.greeting = message;
  }
  greet() {
    return this.greeting;
  }
}
 
var greeter = new Greeter<string>("Hello, world");
console.log(greeter->greet());
// Hack
class Greeter<T> {
 
  public function __construct(private T $greeting) {}
 
 
  public function greet() {
    return $this->greeting;
  }
}
 
$greeter = new Greeter("Hello, world");
echo $greeter->greet();

Conclusion

With ES6 and Hack efforts, JavaScript and PHP are becoming languages with modern features. If you tried them 5 years ago, you should take another look, they are not as crappy as they once were 🙂

Mysqli Wrapper is shortening the code required to run queries, make them 100% safe against SQL Injections and gives a handy Array as result. No more pain writing queries!

You can view the source at the MysqliWrapper Github Repository.

When developing a PHP application, SQL queries is the most dangerous area. This is because the built-in tools are either not secure by conception or too much complicated to use. I've made a little wrapper to mysqli that solves all these problems and make you enjoy writing queries!

You are probably writing your queries like this. We assume we have $zip(string) and $pop(int) variables previously declared in the code.

// Basic Method
mysql_connect('host', 'user', 'pass');
mysql_select_db('database');
$query = "SELECT Name, CountryCode FROM City 
    WHERE Zip = '".mysql_real_escape_string($zip)."' AND Population > ".(int) $pop;
$result = mysql_query($query);
while ($row = mysql_fetch_array($result)) {
    // $row['Name'], $row['CountryCode']
}

With this method, you have to secure all the fields yourself. To do it, you have to type the loooong function mysql_real_escape_string and you must not forget the quotes around the parameter. The code is unreadable, not safe by concept because you may forget to sanitize a field ... And, it is annoying to write!

Mysqli was intended to be a wrapper around Mysql that would provide safe queries. You may wonder why this has not become the default way to write queries. Have a look at the exact same code written with mysqli.

// Mysqli Method
$mysqli = new mysqli('host', 'user', 'pass', 'database');
$query = "SELECT Name, CountryCode FROM City WHERE Zip = ? AND Population > ?";
if ($stmt = $mysqli->prepare($query)) {
    $stmt->bind_param('si', $zip, $pop);
    $stmt->execute();
    $stmt->bind_result($name, $code);
    while ($stmt->fetch()) {
        // $name, $code
    }
    $stmt->close();
}

Yes, 10 lines to make a SELECT. Obviously they failed to design an easy to use API. It is also too restrictive: you have to assign a variable for each of the field you want to select. This seems to be a really waste of time: it's already in the query, why would you write it another time? Also, that's not possible to use '*' to select all the fields.

Here is my attempt to easily to write and safe queries:

// My method
$db = new dbWrapper('host', 'user', 'pass', 'database', true);
$result = $db->q("SELECT Name, CountryCode FROM City WHERE Zip = ? AND Population > ?",
  'si', $zip, $pop);
foreach ($result as $key => $city) {
  // $city['Name'], $city['CountryCode']
}

What's good about this:

  • A single 6 characters function
  • Secure because parameterized
  • Returns an Array
  • Easy migration from the basic method

I'm really happy with that wrapper. It is faster to write the queries because you don't have to sanitize the parameters neither you have to think about safety and there's only a 6 character function to remember. The code is smaller and easier to use because it returns a real Array! Also, this allows to display mysql errors on the development server only and doesn't requires to type or die(mysql_error()); everytime.

If you don't know by advance the number of parameters (A custom search where field may be selected or not by the user for example), this implementation becomes really tricky to deal with. I have no solution at this time for this problem. The v8cgi Query API would handle this problem really well, however i am not confident enough in letting a script writing my queries.

The wrapper code is really hacky but fits in 60 lines so can be embedded really easily. Download

Class dbWrapper {
    protected $_mysqli;
    protected $_debug;
 
    public function __construct($host, $username, $password, $database, $debug) {
        $this->_mysqli = new mysqli($host, $username, $password, $database);
        $this->_debug = (bool) $debug;
        if (mysqli_connect_errno()) {
            if ($this->_debug) {
                echo mysqli_connect_error();
                debug_print_backtrace();
            }
            return false;
        }
        return true;
    }
 
    public function q($query) {
        if ($query = $this->_mysqli->prepare($query)) {
            if (func_num_args() > 1) {
                $x = func_get_args();
                $args = array_merge(array(func_get_arg(1)),
                    array_slice($x, 2));
                $args_ref = array();
                foreach($args as $k => &$arg) {
                    $args_ref[$k] = &$arg; 
                }
                call_user_func_array(array($query, 'bind_param'), $args_ref);
            }
            $query->execute();
 
            if ($query->errno) {
              if ($this->_debug) {
                echo mysqli_error($this->_mysqli);
                debug_print_backtrace();
              }
              return false;
            }
 
            if ($query->affected_rows > -1) {
                return $query->affected_rows;
            }
            $params = array();
            $meta = $query->result_metadata();
            while ($field = $meta->fetch_field()) {
                $params[] = &$row[$field->name];
            }
            call_user_func_array(array($query, 'bind_result'), $params);
 
            $result = array();
            while ($query->fetch()) {
                $r = array();
                foreach ($row as $key => $val) {
                    $r[$key] = $val;
                }
                $result[] = $r;
            }
            $query->close(); 
            return $result;
        } else {
            if ($this->_debug) {
                echo $this->_mysqli->error;
                debug_print_backtrace();
            }
            return false;
        }
    }
 
    public function handle() {
        return $this->_mysqli;
    }
}

Update February 17 2010: Now works for php >= 5.3.0. Added debug_print_backtrace to get the sql query when there is an error.