I was reading about text layout algorithms when I found a striking parallel with Google Plus image layout algorithm. This blog article explains how to find the best line breaks for the image layout.

Line Breaking Algorithm

"Breaking Paragraphs into Lines" article written in 1980 by Donald Knuth and Michael Plass explain how to find the best line breaks in order to make justified text look pretty. The goal is to find where to put line breaks on this paragraph:

In olden times when wishing still helped one, there lived a king
whose daughters were all beautiful; and the youngest was so
beautiful that the sun itself, which has seen so much, was aston-
ished whenever it shone in her face. Close by the king’s castle lay
a great dark forest, and under an old lime-tree in the forest was
a well, and when the day was very warm, the king’s child went
out into the forest and sat down by the side of the cool fountain;
and when she was bored she took a golden ball, and threw it up
on high and caught it; and this ball was her favorite plaything.

The first thing to notice is that you cannot break everywhere. In bold, are all the words that can be end of lines while maintaining the constraint of reasonably stretching white spaces in the line.

The first line only has two allowed words while the line before last line has seven. However, the seven are not allowed at the same time, each of them depend on the previous breaks configuration. You can draw a graph that shows all the possible breaks configurations.

graph-knuth

Each transition in this graph represents a line of text (between two breaks). If you can give a fitness value to each line, then you can apply a shortest path algorithm on the graph to find the best breaks configuration. The rest of the article studies a flexible fitness framework that covers all major text layout constraints, but that will not be of any help for our problem.

Parallel with Image Layout

Let's say we want to layout the following five images:

There are many ways to split the images into lines. Here is an exhaustive list of the lines we can have for our five images:

 1
 12      2
 123     23     3
 1234    234    34    4
 12345   2345   345   45

And, we need to link the lines such that they form 12345 at the end. For example 1-2-3-4-5, 12345, 123-4-5 ... I've drawn the full graph so you get an idea of what it looks like:

Like in the line breaking algorithm for text, not all the break positions are wanted. Having only one image would make it way too big, having more than three would make them too small. So we're going to remove them from the graph.

By disallowing some lines, some nodes are now unreachable, trimming down the graph as well. In our example, we are left with only two valid layouts: 123-45 and 12-345.

possible

Cost Metrics

I found it very hard to visually evaluate two layouts. Most of the time, you can easily spot extreme things you don't want such as a giant image or too many small ones. Then, the goal of the cost metric is to minimize the number of those occurrences.

Max height

The first approach tries to respect the constraints of the first-fit heuristic: as soon as the height goes below a threshold, we display the images. We therefore have a MAX_HEIGHT constant that no line can go above.

Because the graph is acyclic, we can use Dijkstra shortest path algorithm to find the longest path (Proof), we just have to return negative costs.

function cost(images, i, j) {
  var height = compute_height(images, i, j);
  if (height > MAX_HEIGHT) { return null; }
 
  // Maximize the number of breaks
  return -1;
  // Maximize the total height
  return -height;
  // Maximize the average image height
  return -(height * (j - i));
}

Target Height

The other approach is to have a target height per line and try to have all the lines as close as possible to the target. The function should return 0 when the line is exactly the size of the target line.

We can write several versions that are going to weight errors differently. You can have a greater variance with a more precise mean, or the opposite. You can prefer being a bit bigger than smaller ...

function cost(images, i, j) {
  var height = compute_height(images, i, j);
 
  // Minimize the total height difference from the target
  return Math.abs(TARGET_HEIGHT - height);
 
  // Minimize the difference to the target and
  // penalize more harshly big differences
  return Math.pow(TARGET_HEIGHT - height, 2);
 
  // Minimize the difference area from the target
  var diff = 0;
  for (; i < = j; ++i) {
    var ratio = images[i].width / images[i].height;
    var target_area = TARGET_HEIGHT * TARGET_HEIGHT * ratio;
    var area = height * height * ratio;
 
    diff += Math.abs(target_area - area);
  }
  return diff;
}

Conclusion

Check-out the first-fit demo versus best break demo (using cost = (target_height - height)^2)

Thanks to an unrelated reading, I've been able to draw a parallel between text display and image layout. Who would have thought that graph theory would be involved there too 🙂

We can now find the best layout at a cheap cost (nearly-linear with the number of images if you only allow a small number of possible breaks per line). The cost metric allow you to easily disallow patterns you find bad looking.

In this talk I give an overview of the big categories of image layout algorithms with examples for each of them and present criterias to look at when evaluating them.

I'm implementing a layout algorithm in C and want to let the user specify a callback to compute the height based on the width.

Using function pointers, we can provide the callback:

typedef struct {
  float (*measure)(float width);
} layout_node_t;
 
void layout(layout_node_t *node) {
  float width = 10;
  float height = node->measure(width);
}

It works well if we have a function measure that only uses global variables:

float measure(float width) {
  return width * 2;
}
 
int main() {
  layout_node_t node;
  node.measure = measure;
  layout(&node);
}

However, I would like my measure function to take some dynamic input. For example in order to measure an image, you need to take its aspect ratio into account. In JavaScript, I would write the following:

var aspect_ratio = 1.5;
node.measure = function mesure_image(width) {
  return width * aspect_ratio;
}

Unfortunately, C doesn't support closures. I haven't been able to find a way to get a function pointer alone somehow hold some state. The best trade-off I found was to have a void * metadata in the struct and pass it along with the function call. (Thanks Scott and Felix for the help!)

typedef struct {
  float (*measure)(void *context, float width);
  void *measure_context;
} layout_node_t;
 
void layout(layout_node_t *node) {
  float width = 10;
  float height = node->measure(node->measure_context, width);
}

The void * value lets us put anything we want in it. So, with some casting we are able to simulate a closure and write our measure_image function 🙂

float measure_image(void *context, float width) {
  float aspect_ratio = *(float *)context;
  return width / aspect_ratio;
}
 
int main() {
  layout_node_t node;
  node.measure = measure_image;
  float aspect_ratio = 1.5;
  node.measure_context = (void *)&aspect_ratio;
  layout(&node);
}

To compute the height of the image we use a float, but in order to handle text, we can pass a const char * instead. It works as well!

float measure_text(void *content, float width) {
  const char *text = (const char *)content;
  float line_height = 11;
  return ceil(strlen(text) / width) * line_height;
}
 
int main() {
  layout_node_t node;
  node.measure = measure_text;
  node.measure_context = (void *)"this is some super long text";
  layout(&node);
}

This solves the use case pretty well, which is remarkable since C doesn't support closure. The downside is that we are losing all the type information, have to do a lot of type casting and renaming.

PHP and JavaScript are both renowned to be languages with a lot of quirks. However two major initiatives on both sides, Hack for PHP and ES6 for JavaScript made the languages much better and modern. In this article I'm going to show all the ES6 features that are also in Hack.

Arrow Function

Both languages adopted the same shorter way to write functions. On JavaScript side, the main advantage is the automatic binding of this and for PHP it removes the need to declare all the variables you want to use from outside. ES6, Hack.

// JavaScript
var odds = evens.map(v =&gt; v + 1);
var nums = evens.map((v, i) =&gt; v + i);
nums.filter(v =&gt; {
  if (v % 5 === 0) {
    console.log(v);
    return true;
  }
  return false;
});
// Hack
$odds = array_map($v ==&gt; $v + 1, $evens);
$nums = array_map(($v, $i) ==&gt; $v + $i, $evens);
array_filter($nums, $v ==&gt; {
  if ($v % 5 === 0) {
    echo $v;
    return true;
  }
  return false;
});

Class

JavaScript finally gets a class abstraction with ES6. It is however the bare minimal one to be useful, you cannot define constants, protected/private methods, traits ... PHP on this side is much better, without any Hack addition. ES6, PHP5.

// JavaScript
class SkinnedMesh extends THREE.Mesh {
  constructor(geometry, materials) {
    super(geometry, materials);
    this.idMatrix = SkinnedMesh.defaultMatrix();
    this.bones = [];
  }
  update(camera) {
    super.update();
  }
  static defaultMatrix() {
    return new THREE.Matrix4();
  }
}
// Hack
class SkinnedMesh extends THREE\Mesh {
  public function constructor($geometry, $materials) {
    parent::__construct($geometry, $materials);
    $this-&gt;idMatrix = SkinnedMesh::defaultMatrix();
    $this-&gt;bones = array();
  }
  public function update($camera) {
    parent::update();
  }
  static private function defaultMatrix() {
    return new THREE\Matrix4();
  }
}

Enhanced Object Literal

One long standing issue with object literals in JavaScript is the inability to use an expression as a key. This is fixed with the bracket notation in ES6. PHP 5.4 introduced a short notation for arrays as well. ES6, PHP.

// JavaScript
var obj = { [Math.random()]: true };
// Hack
$obj = [rand() =&gt; true];

Template Strings

Multiline strings and variable interpolations are something that have always been possible in PHP, yet they only start to work in ES6! ES6, PHP.

// JavaScript
var multiline = `In JavaScript this is
 not legal.`
var name = 'Bob',
    time = 'today';
`Hello ${name}, how are you ${time}?`
// Hack
$multiline = 'In PHP this is
 legal.';
$name = 'Bob';
$time = 'today';
"Hello $name, how are you $time?";

Default Arguments

It was possible to write default arguments in JavaScript but ES6 adds proper support for it right in the function declaration. Guess what, PHP had support for it all along. ES6, PHP.

// JavaScript
function f(x, y=12) {
  return x + y;
}
f(3) === 15;
f(2, 10) === 12;
// Hack
function f($x, $y=12) {
  return $x + $y;
}
f(3) === 15;
f(2, 10) === 12;

Iterator + for of

JavaScript has two ways to iterate on collections, either

for (var i = 0; i &lt; array.length; ++i) { var element = array[i]; /* ... */ }
for (var key in object) { var element = object[key]; /* ... */ }

ES6 is now introducing a unified way to do iteration, that PHP always had, as well as a way to write custom collections via the iterator pattern, introduced in PHP5. ES6, PHP, PHP5.

// JavaScript
var fibonacci = {
  [Symbol.iterator]: function() {
    var previous = 0;
    var current = 1;
    return {
      next: function() {
        var new_previous = current; 
        current += previous; 
        previous = new_previous; 
 
 
        return {
          value: current,
 
 
          done: false
        }
      }
    }
  }
}
 
 
 
 
 
for (var n of fibonacci) {
  if (n &gt; 1000) break;
  console.log(n);
}
// Hack
class Fibonacci implements Iterator { 
  private $key = 0;    
  private $previous = 1;
  private $current = 0;
 
  public function next() { 
      $new_previous = $this-&gt;current; 
      $this-&gt;current += $this-&gt;previous; 
      $this-&gt;previous = $new_previous; 
      $this-&gt;key++; 
  } 
  public function current() { 
      return $this-&gt;current; 
  } 
  public function valid() { 
      return true; 
  } 
  public function key() { 
      return $this-&gt;key; 
  } 
  public function rewind() { 
      $this-&gt;previous = 1; 
      $this-&gt;current = 0; 
      $this-&gt;key = 0; 
  } 
}
foreach (new Fibonacci() as $n) { 
  if ($n &gt; 1000) break; 
  echo $n; 
}

Generators

Python pioneered generators as another tool to manage control flow. It has originally been designed and promoted as an easier way to write iterators, but really shined as a better way to write asynchronous operations than callbacks. ES6, PHP5.

// JavaScript
var fibonacci = {
  [Symbol.iterator]: function*() {
    var previous = 1;
    var current = 0;
    for (;;) {
      var new_previous = current; 
      current += previous; 
      previous = new_previous; 
      yield current;
    }
  }
}
for (var n of fibonacci) {
  if (n &gt; 1000) break;
  console.log(n);
}
// Hack
 
function fibonacci() {
  $previous = 1;
  $current = 0;
  for (;;) {
    $new_previous = $current; 
    $current += $previous; 
    $previous = $new_previous; 
    yield $current;
  }
}
 
foreach (fibonacci() as $n) { 
  if ($n &gt; 1000) break; 
  echo $n; 
}

ES7 Async Await

C# introduced the concept of async/await combination to deal with asynchronous programming. The underlying implementation is very similar to generators but has proper syntax support. It is an addition of Hack on-top of PHP. ES7, Hack.

// JavaScript
async function chainAnimationsAsync(element, animations) {
  var result = null;
  try {
    for (var animation in animations) {
      result = await animation(element);
    }
  } catch (e) { /* ignore and keep going */ }
  return result;
}
// Hack
async function chainAnimationsAsync($element, $animations) {
  $result = null;
  try {
    foreach ($animations as $animation) {
      $result = await animation($element);
    }
  } catch (Exception $e) { /* ignore and keep going */ }
  return $result;
}

Map + Set

Both JavaScript and PHP are notorious for attempting to fit all the collection use cases into a single general purpose type. Both ES6 and Hack bring to the table proper support for Map and Set. ES6, Hack

// JavaScript
var s = new Set();
s.add('hello').add('goodbye').add('hello');
s.size === 2;
s.has('hello') === true;
 
var m = new Map();
m.set('hello', 42);
m.get('hello') === 42;
// Hack
$s = new Set();
$s-&gt;add('hello')-&gt;add('goodbye')-&gt;add('hello');
$s-&gt;count() === 2;
$s-&gt;contains('hello') === true;
 
$m = new Map();
$m-&gt;set('hello', 42);
$m-&gt;get('hello') === 42;

TypeScript

Last but not least, both languages are getting gradual typing. TypeScript, Hack.

// JavaScript
class Greeter {
  greeting: T;
  constructor(message: T) {
    this.greeting = message;
  }
  greet() {
    return this.greeting;
  }
}
 
var greeter = new Greeter("Hello, world");
console.log(greeter-&gt;greet());
// Hack
class Greeter {
 
  public function __construct(private T $greeting) {}
 
 
  public function greet() {
    return $this-&gt;greeting;
  }
}
 
$greeter = new Greeter("Hello, world");
echo $greeter-&gt;greet();

Conclusion

With ES6 and Hack efforts, JavaScript and PHP are becoming languages with modern features. If you tried them 5 years ago, you should take another look, they are not as crappy as they once were 🙂

A cyber security provider’s main task is to protect your business from all forms of cyber-attacks.  If and when an attack happens, https://www.sapphire.net/ will know exactly what to do.

Originally posted on Perf Planet.

React is a JavaScript library for building user interfaces developed by Facebook. It has been designed from the ground up with performance in mind. In this article I will present how the diff algorithm and rendering work in React so you can optimize your own apps.

Diff Algorithm

Before we go into the implementation details it is important to get an overview of how React works.

var MyComponent = React.createClass({
  render: function() {
    if (this.props.first) {
      return <div className="first"><span>A Span</span></div>;
    } else {
      return <div className="second"><p>A Paragraph</p></div>;
    }
  }
});

At any point in time, you describe how you want your UI to look like. It is important to understand that the result of render is not an actual DOM node. Those are just lightweight JavaScript objects. We call them the virtual DOM.

React is going to use this representation to try to find the minimum number of steps to go from the previous render to the next. For example, if we mount <MyComponent first={true} />, replace it with <MyComponent first={false} />, then unmount it, here are the DOM instructions that result:

None to first

  • Create node: <div className="first"><span>A Span</span></div>

First to second

  • Replace attribute: className="first" by className="second"
  • Replace node: <span>A Span</span> by <p>A Paragraph</p>

Second to none

  • Remove node: <div className="second"><p>A Paragraph</p></div>

Level by Level

Finding the minimal number of modifications between two arbitrary trees is a O(n4) problem. As you can imagine, this isn't tractable for our use case. React uses simple and yet powerful heuristics to find a very good approximation in O(n).

React only tries to reconcile trees level by level. This drastically reduces the complexity and isn't a big loss as it is very rare in web applications to have a component being moved to a different level in the tree. They usually only move laterally among children.

List

Let say that we have a component that on one iteration renders 5 components and the next inserts a new component in the middle of the list. This would be really hard with just this information to know how to do the mapping between the two lists of components.

By default, React associates the first component of the previous list with the first component of the next list, etc. You can provide a key attribute in order to help React figure out the mapping. In practice, this is usually easy to find out a unique key among the children.

Components

A React app is usually composed of many user defined components that eventually turns into a tree composed mainly of divs. This additional information is being taken into account by the diff algorithm as React will match only components with the same class.

For example if a <Header> is replaced by an <ExampleBlock>, React will remove the header and create an example block. We don't need to spend precious time trying to match two components that are unlikely to have any resemblance.

Event Delegation

Attaching event listeners to DOM nodes is painfully slow and memory-consuming. Instead, React implements a popular technique called "event delegation". React goes even further and re-implements a W3C compliant event system. This means that Internet Explorer 8 event-handling bugs are a thing of the past and all the event names are consistent across browsers.

Let me explain how it's implemented. A single event listener is attached to the root of the document. When an event is fired, the browser gives us the target DOM node. In order to propagate the event through the DOM hierarchy, React doesn't iterate on the virtual DOM hierarchy.

Instead we use the fact that every React component has a unique id that encodes the hierarchy. We can use simple string manipulation to get the id of all the parents. By storing the events in a hash map, we found that it performed better than attaching them to the virtual DOM. Here is an example of what happens when an event is dispatched through the virtual DOM.

// dispatchEvent('click', 'a.b.c', event)
clickCaptureListeners['a'](event);
clickCaptureListeners['a.b'](event);
clickCaptureListeners['a.b.c'](event);
clickBubbleListeners['a.b.c'](event);
clickBubbleListeners['a.b'](event);
clickBubbleListeners['a'](event);

The browser creates a new event object for each event and each listener. This has the nice property that you can keep a reference of the event object or even modify it. However, this means doing a high number of memory allocations. React at startup allocates a pool of those objects. Whenever an event object is needed, it is reused from that pool. This dramatically reduces garbage collection.

Rendering

Batching

Whenever you call setState on a component, React will mark it as dirty. At the end of the event loop, React looks at all the dirty components and re-renders them.

This batching means that during an event loop, there is exactly one time when the DOM is being updated. This property is key to building a performant app and yet is extremely difficult to obtain using commonly written JavaScript. In a React application, you get it by default.

Sub-tree Rendering

When setState is called, the component rebuilds the virtual DOM for its children. If you call setState on the root element, then the entire React app is re-rendered. All the components, even if they didn't change, will have their render method called. This may sound scary and inefficient but in practice, this works fine because we're not touching the actual DOM.

First of all, we are talking about displaying the user interface. Because screen space is limited, you're usually displaying on the orders of hundreds to thousands of elements at a time. JavaScript has gotten fast enough business logic for the whole interface is manageable.

Another important point is that when writing React code, you usually don't call setState on the root node every time something changes. You call it on the component that received the change event or couple of components above. You very rarely go all the way to the top. This means that changes are localized to where the user interacts.

Selective Sub-tree Rendering

Finally, you have the possibility to prevent some sub-trees to re-render. If you implement the following method on a component:

boolean shouldComponentUpdate(object nextProps, object nextState)

based on the previous and next props/state of the component, you can tell React that this component did not change and it is not necessary to re-render it. When properly implemented, this can give you huge performance improvements.

In order to be able to use it, you have to have to be able to compare JavaScript objects. There are many issues that raises such as should the comparison be shallow or deep; if it's deep should we use immutable data structures or do deep copies.

And you want to keep in mind that this function is going to be called all the time, so you want to make sure that it takes less time to compute that heuristic than the time it would have taken to render the component, even if re-rendering was not strictly needed.

Conclusion

The techniques that make React fast are not new. We've known for a long time that touching the DOM is expensive, you should batch write and read operations, event delegation is faster ...

People still talk about them because in practice, they are very hard to implement in regular JavaScript code. What makes React stand out is that all those optimizations happen by default. This makes it hard to shoot yourself in the foot and make your app slow.

The performance cost model of React is also very simple to understand: every setState re-renders the whole sub-tree. If you want to squeeze out performance, call setState as low as possible and use shouldComponentUpdate to prevent re-rendering an large sub-tree.