The Bistromathique is an Arbitrary Precision Calculator EPITA project where the main focus is optimization. The input can be in any base (up to 250) and the following operations have to be performed: Addition, Subtraction, Multiplication, Division, Modulo.

Base Representation

Going back and forth a textual representation to a computable representation is the first part of the project and several optimizations are possible at this step.

Binary Representation

Encoding the numbers as binary allows you to make the best use of the processor calculation unit. There is no space lost and the number being represented as an uninterrupted series of 0 and 1 can be added and subtracted directly with the hardware.

However, in order to convert the number for a base B representation to a base 2 requires roughly n divisions by B. And to reconvert it back requires another n multiplications. This can be affordable for applications that uses the same numbers again and again but in our case, this cost is prohibitive.

The other major drawback is that you are required to have the multiplications and divisions (and therefore addition and subtraction) in order to have the numbers in the right form to start working on the operations ... In summary, you have to code all these operations before being able to test them!

char Representation

The binary representation not being possible, the solution that comes in mind is to put each digit into a char (8 bits). Since we are limited up to a base of 250, any base will fit. This comes at a price, there is a lot of space wasted, and it increases as the base size decreases.

We are in a 1 to 1 relation between the input and the represented number, this way we can store the represented number inside the input. This prevents from allocating memory at this step. The conversion can be efficiently done, each character goes through vector of 256 slots storing its representation and a magic value used as a non valid input. One final trick is to add a non number at the end of the input to remove a end of input check in the loop.

Filling more

A char can store values up to 256. If we are working on base 16 we could store two digits into 1 char and thus divide by 2 the number of digits. Additions are therefore 2 times faster and even more for n log n operations such as multiplication.

Calculating the number of digits to fill into a char is quite straightforward however some caution has to be taken during the transition. We have to know in advance the length of the input to decide how to start filling the digits. If you can fit 2 digits per char, if the number is even then it goes as expected, but if the number is odd you have to place 0 at the beginning to fill the gap.

Even: [1] [2] [3] [4] -> [12] [34]
Odd (Naive): [2] [3] [4] -> [23] [4?]
Odd (Smart): [2] [3] [4] -> [02] [34]

This method works well but requires 2 passes for each number and a special case at the beginning of each number. Not enough to stop us from using it.

Using a larger representation

Now that we filled the char at its maximum, why stick with it? This should ring a bell since you know that common architecture is 32-bit, and from now we are talking about a 8-bit representation. We would be tempted to change the char to an int (32-bit) or even a long (64-bit) that could benefit from register optimizations.

Moving away from a char, we lose the assumption that the output being smaller (or equal) than the input. Without it, this is no longer possible to work in place. We now have to compare the cost of a reduction in the number of operations and the cost of the memory allocation.

The overhead happens for values that are represented in less than 4 characters for an int representation. This is a maximum 3-characters overhead possible for each number, which is negligible. We could eat the operation character but we are left with 2 characters. I haven't found a solution to stay in place yet and this minor problem prevents from accessing a great speed up.

To be continued ...

In World of Raids Guild Recruitment Tool, we wanted the search to be instant! We achieved this goal by not having a single page refresh or ajax call while updating the settings. Since it runs in the browser, performance was a real concern. We will see in this article what were the tricks used to make this real.

Sorting

In order to sort the guilds, each guild is given a matching score (between 0 and 65535) based on the current filters. At start, each guild has the maximum points. Then, each filter is giving a malus. For example when a specialization doesn't match, the guild loses 10%.

This means that everytime you change a filter, you have to recompute that matching score for all the guilds. Instead of recomputing the whole matching score, the guild holds the result of each filter. Everytime a filter is updated, the value of this filter is upated on every guild. The global matching score is now recalculated by summing all the partial precomputed scores.

Every bit of code in the matching function has been optimized, no more jquery code, loops are unrolled and great use of lazy boolean expression evaluation. The sort function itself also got boosted thanks to a trick I commented on my previous article Speed Up Javascript Sort().

Guild Recruitment Search Filters

Display

Once we have the data sorted, it has to be displayed! And this is another big time eater. When it takes 50ms to sort the data, it takes about 150ms to display 10 guilds!

Since we wanted performances, DOM elements are created at page loading, they are then only being updated as the guilds change. And still, only the required changes are made, for example tooltips are created as the user trigger the mouseover event.

The main problem with this is that we are not able to make bulk changes. Everytime you make the slightest change, the whole UI has to be redrawn which is a waste of time. I hope there will be such API in the future.

Guild Display

Since it's a web application, we had to maintain the url up to date, this is done using SmallHash, a small library of mine that compresses filter data into the smallest possible hash.

Scaling

Having all the data being sent at page loading is a pain, it makes it load slowly. One way to avoid this problem would be to use the localstorage to keep these data accross sessions and only send the updated guilds. However at the time I started developping the application there was no released browser supporting HTML5.

However, the application is focused on World of Warcraft guilds. This is a limited audience and there is no need on heavy scaling there. So we decided that it was a good option to go.

Up to 3000 guilds, this is pretty much instant in every browser in my MacBook Pro. When increasing that number to 10 000 it is taking more than a second to update on IE8. It requires 50 000 guilds for Chrome 3 to get slow, which is a pretty good score. It means that it is possible to do heavy processing, the bandwith now being the limiting factor.

,

World of Raids Guild Recruitment

Guild recruitment is a recurrent problem in World of Warcraft, many attempt have been made but none succedeed so far. After a brainstorming we decided that the following points were crucial.

  • The guild recruiter has to spend less time as possible to set-up a guild and maintain it.
  • The guild search must be easy and focus on what people expect from their guild.

With these points in mind, we had to find technical responses in order to make the World of Raids Guild Recruitment Tool.

  • What You See Is What You Get: The guild management interface is the final display, there is no intermediate step.
  • Tailored Widgets: About every widget has been heavily customized to fit the user need.
  • Automatic Save: Every time you make a change, it is automatically published and available to anyone!
  • Javascript Search: Having all the guilds fetched during the loading allows to have a complex filtering system instantly updated.

Guild Search

Looking for a guild should no more be a pain! You just have to tweak the filters and result appear sorted as you edit them! No more long page reload or even ajax requests. It is instant!

We focused hard on making filters one click away. We also took great care of the raid time filter, it is an important aspect of the guild choice that is too often avoided because of its complexity.

For more details on the optimizations made, see the post Guild Recruitement – Search Optimizations.

Guild Management

Our original goal was to be able to create a guild in less than a minute (yes, 60 seconds!) and we pretty much did it. The proof in the following video.

In order to achieve this, we opted for a Wysiwyg approach to remove the need of making two interfaces for both display and edition. Every widget has been tailored to fit the user needs. For example, as soon as you enter your name and realm, it automatically gathers your progression from the official website.

Having to deal with dependencies in Makefile is a real pain, there are a lot of examples of way to deal with it on the web but none of them is satisfying.

For example using gcc -MM does not work with subfolders, a depend rule requires the user to use it everytimes he adds new files ...

Here is what is wanted:

  • Completly automatic: No user interaction is required when adding new files
  • Works with an arbitrary amount of files
  • Works with an arbitrary amount of level of folders
  • Is not recalculated when nothing changed
  • Use only one file to store dependencies
  • Do not depend on complicated regular expressions

Here is the result:

BINARY  = project.exe
CC      = gcc
CFLAGS  = 
FILES   = $(shell find src/ -name "*.c")
HEADERS = $(shell find src/ -name "*.h")
OBJS    = $(FILES:.c=.o)
 
all: $(BINARY)
 
-include Makefile.deps
 
$(BINARY): Makefile.deps $(OBJS)
        $(CC) $(CFLAGS) $(OBJS) -o $(BINARY)
 
Makefile.deps: $(FILES) $(HEADERS)
        makedepend -- $(CFLAGS) -- $(FILES) -f- > Makefile.deps

This is in fact really easy. In your $(BINARY) rule, you add Makefile.deps as a prerequisite.

In order to generate the Makefile.deps you mark all $(FILES) and $(HEADERS) as prerequisite, so every time you change a file or header it will recompile the list.
We use makedepend to generate the dependencies list. It works like gcc -MM except that it outputs the correct file path when used with folders.

Then all is required is to include the Makefile.deps. We include it with -include so it does work the first time you compile.

Thanks to Lemoine Gauthier who helped me to discover this technique.

Prime number recognition is a very hard problem and yet no good enough solution has been found using classical algorithms. There are two ways to get around those limitations: find an algorithm with a better complexity or find a way to compute faster. The first one has already been researched by a large amount of people so I decided to give a shot to the second one.

We are limited by the speed of the electricity, so my idea is to use speed of light to improve speed by a great factor. I've come up with a simple example of how to use it to compute prime numbers.

We set up 2 mirrors with light sources that are directed.

Caption

We first set the sensor to the position lower 2

2 had no light so we create a new one that is going to en-light all the multiples of 2

Caption

3 had no light so we create a new one that is going to en-light all the multiples of 3

4 is already in the light, we skip it

4 is already in the light, we skip it


Here is the pseudo code of the prime calculation with Light & Mirror.

hasLight(n):
// Sensor placed at lower n that tells weither if it is enlightened or not
 
newLight(n):
// Creates a light that starts from lower n, targeted at the upper n + n/2,
// that first reflects to lower 2n and then to all the multiples of n.
 
isPrime(n):
for i = 2 to sqrt(n)
  if not hasLight(i)
    newLight(i)
 
return not hasLight(n)

This method allows us to recognize prime numbers with light sources, mirrors, sensors and a bit of programming.

Recognizing if a number is prime or not requires to have a lot of lasers and sensors. I don't think that is a viable process, however this could be interesting to do multiplications. You want to fire 2 lights and then see in what point they cross. However there are 2 main problems:

  • It requires to have 2 really long mirrors possibly infinite which is not possible in practice. It may be possible to add a vertical mirror in order to work in a bounded area. Yet to know how many times the light hit that mirror.
  • We are required either to have an infinite number of sensors. We could have only one sensor that checks all positions sequentially but then we loose the speed of light! What is wanted instead are mirrors in each position that would redirect light to a unique sensor capable of knowing where it came from.

This is an experiment and it is not at the moment near to be faster than actual methods but this is a new way to think about programming. I don't know if anything useful is going to be deviated from this but it shows how to use the light to compute things.