Best Cropping Position

In Facebook image layout algorithm, we use square viewport to display the images. Since images are not usually square, we have an issue to solve.

Contain Cover

There are two possible ways to deal with it. You make the image fit entirely in the viewport and add black borders (think about viewing 4/3 movies in a wide screen). Or you can make the viewport fit entirely in the image. Instead of having black bars, you are going to remove some parts of the image.

In CSS, the names for those two concepts are implemented with background-size property that has two values: contain and cover.

In our case, we display images in a grid. The cover version works best because the images align nicely in the grid. It makes the edges much more visible that gives a structure to the page.

Set up the problem

The choice we made raises another issue: we are no longer displaying the entire photo but only a subset of it. Therefore we have to know what parts of the image we want to keep, and what parts we want to hide.

The first thing to notice, is that there is only one degree of freedom. You can either pan the image horizontally or vertically depending on the aspect ratio of the image and viewport.

In order to make that decision, we need to have an idea of what is important in the image. Thankfully, at Facebook, people can tag the images and tell us where the people and other point of interests are. We also know that people are important so we also use detected faces. In the future we could automatically find more such as text, animals ...

Now we have a clearer view of the inputs. We have a set of point of interests aligned in one dimension. We also have a window that we can slide on this dimension. Here's an example:

Find maximal window

The idea of the algorithm is to find the position of the window that maximizes the number of point of interests it contains.

A window can be defined only by its starting position since its width is constant. This makes the search space to be the number of pixels in a column/line of the image (minus the size of the window as it must not go outside). Then for each position, you have to compute how many points are inside.

A naive implementation is going to be in the order of O(p * n) where p is the number of pixels and n the number of point of interests. For a typical image with people this means 960 * 3 = 2880 checks. This is way too costly because the number of pixels is an order of magnitude higher than the number of point of interests.

We want to approach the problem the starting from the point of interests. A window can contain, or not, a point of interest. Two windows next to each other that contain the same point of interests can be considered equivalent. With this definition, we can find all the windows much quicker.

We are going to iterate on all the point of interests and consider that they are in the left-most edge of the window. This is the boundary between it between inside and outside of the window. We compute how many points are in that window and keep it if it's bigger than what we had before.

In order to implement this effectively, we can iterate on all the point of interests in O(n). Using binary search, we find the right-most point in O(log(n)), if the points of interests are sorted. It's a O(n * log(n)) to sort them at the beginning.

The total complexity is therefore O(n * log(n)), where n is the number of point of interests. Since most of our images have less than 10 point of interests, our algorithm is essentially free.

Center the window

With a window aligned on the left element, we can balance the right padding equally across left and right to center the window.

The previous algorithm can return multiple windows that have the same number of points of interest. In this case, we use the window that has the bigger amount of padding. This ensures that heads are less likely to be cut-off in half.

Force someone inside

When you are viewing all the photos someone is tagged in, you would like to make sure that the person isn't being cropped out by the algorithm. Previously, what was done was to center the image on that person. We could instead keep using the current algorithm but force the person in.

Again, we are going to look at the boundaries. The person can be from the left edge to the right one. So we are going to place the person at the left and right of the viewport and remove all the points that cannot be in the same screen.

Then, we have the guarantee that all the sets of points we are going to chose also contain the person. We run the algorithm again with that limited set of points to find the window.

Conclusion

The current heuristic is to have images horizontally centered and vertically centered at the first third of the photo, where most faces at. Those are empiric values that work surprisingly well. Also, most images are 4:3, therefore cropping to a 1:1 ratio removes 25% of the photo total, which is only 12.5% on each side.

In order to test the algorithm, we use more extreme viewports. A quick run on my photos and some friends photos shows that it either leaves the crop as is or improves it. It is going to be interesting to test it on Facebook views.

If you liked this article, you might be interested in my Twitter feed as well.
 
  • http://www.facebook.com/appu.shaji.5454 Appu Shaji

    Great article! I have a web ( + api ) service http://sight.io which tries to find the best looking crop using a suite of computer vision algorithms. Would like to know your thoughts about it. Specifically, if you find the crops suitable or not.

  • Adam

    Post pull of very interesting tidbits 🙂

    I have one comment though. Actually it is not necessary to use binary search. When you're iterating over the left edge of the window, the right edge is always moving to the right. The strategy is then to keep a pointer to the right edge and move it to the right as far as possible in each operation. That yields O(n) amortized complexity, since you're always moving to the right. That being said, the point have to be sorted leaving us with O(n log n) after all.

  • http://blog.vjeux.com/ Vjeux

    You are right 🙂 I didn't think about it that way!

    In our case, n is still very small (usually ~1-5, at most ~50). It's not really worth implementing it :p

 

Related Posts

  • August 14, 2012 Image Layout Algorithm – Facebook – Reordering (2)
    In this article, we are going to see how to support dynamic updates to Facebook Image Layout Algorithm. Beware, this is not an easy task and there are many special cases to handle :) Making images bigger To make images bigger we just run the algorithm all over again with the new […]
  • August 13, 2012 Image Layout Algorithm – Facebook (1)
    Layout Algorithms: Facebook | Google Plus | Lightbox | Lightbox Android | 500px For the redesign of the Photo Section of Facebook we wanted to highlight some photos by making them bigger. It all started with the following mock by Andy Chung: Layout Alternated Blocks My […]
  • September 1, 2012 Image Layout Algorithm – 500px (3)
    Layout Algorithms: Facebook | Google Plus | Lightbox | Lightbox Android | 500px 500px's front-page uses an interesting image layout algorithm. It stands out from the other ones as it does not use any algorithm nor mathematical properties to be computed. Instead it is based on […]
  • July 8, 2012 Image Layout Algorithm – Lightbox (1)
    Layout Algorithms: Facebook | Google Plus | Lightbox | Lightbox Android | 500px Lightbox.com has a really interesting image layout algorithm. We're going to see how it works and its best use case. How does it work? Column based The algorithm is column based. You pick a […]
  • January 14, 2013 CSS – Contain & Cover (8)
    MathJax.Hub.Config({"HTML-CSS": {scale: 130}});In a previous article I explained how CSS Percentage Background Position was working. This time I'm going to talk about the two ways to resize an image to a viewport: contain and cover. This is such a fundamental operation that I explained […]