For a project, I want to transparently intercept all the included Javascript files, edit their AST and eval the result. This way I can manipulate all the code of an application just by inserting a custom script. The plan has three steps:

  1. Hook the <script> tag insertion.
  2. Download the Javascript file using XHR.
  3. Parse, transform and eval the AST.

Hook the <script> tag insertion

There is a DOM event called DOMNodeInserted (one of the mutation events, which have since been deprecated in favor of MutationObserver). However it has two major drawbacks:

  1. It does not work with the nodes written directly in HTML.
  2. It does not work with <script> tags.

Given those constraints, we cannot intercept <script> tags written in HTML. However, hope is not lost for dynamically included scripts.

Dynamic Script Include

There are libraries like headjs that let you include your Javascript files from Javascript. They all work the same way: create a <script> tag using document.createElement, set the type, src and onload, and finally insert it in the DOM. The following snippet is a basic implementation:

function include(url, callback) {
  var script = document.createElement('script');
  script.type = 'text/javascript';
  script.src = url;
  script.onload = callback;
  document.getElementsByTagName('head')[0].appendChild(script);
}

Hook script creation

Using this include technique, we get around drawback (1): the node is inserted dynamically instead of being written in HTML. To work around drawback (2), we need to rename the <script> tag into something else, for example <custom:script>. This is done by hooking document.createElement.

var createElement = document.createElement;
document.createElement = function (tag) {
  // Tag names are case-insensitive in HTML documents
  if (String(tag).toLowerCase() === 'script') {
    tag = 'custom:script';
  }
  return createElement.call(document, tag);
};
 
document.addEventListener('DOMNodeInserted', function (event) {
  var el = event.target;
  if (el.nodeName !== 'CUSTOM:SCRIPT') {
    return;
  }

Do our stuff

The hardest, messiest part is done. Now we can get to work on our implementation.

Download the file

Let's download the file using a synchronous XHR:

  var request = new XMLHttpRequest();
  request.open('GET', el.src, false);
  request.send(null);

This implementation is really basic. A real one should do something more sophisticated.

Transform the AST

In order to parse and transform the AST, I'm using the Reflect.js library. It has the advantage of shipping as a standalone Javascript file that doesn't require NPM dependencies.

Note: I'm using window.eval in order to eval from the global scope.

  var ast = Reflect.parse(request.responseText);
  ast = transform(ast); // transform is your own AST rewriting function
  window.eval(Reflect.stringify(ast));

Fire DOM Load event

If you want your dynamic include library to work, you need to fire the load event and set the appropriate readyState value.

  // Fire the load event, if the include library registered a handler
  el.readyState = 'loaded';
  if (typeof el.onload === 'function') {
    el.onload();
  }
});
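
The transform function used above is not defined in this post. As a rough sketch of what it could look like, assuming the AST is made of plain objects with a type field (the Mozilla Parser API shape that Reflect.js is modelled on), here is a hypothetical walker that renames every identifier called foo to bar. The hand-written ast object stands in for the output of Reflect.parse so that the example runs on its own.

```javascript
// Hypothetical helper: visit every node of an AST made of plain objects.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach(function (child) { walk(child, visit); });
    return;
  }
  if (node === null || typeof node !== 'object') {
    return;
  }
  if (typeof node.type === 'string') {
    visit(node);
  }
  Object.keys(node).forEach(function (key) {
    walk(node[key], visit);
  });
}

// Example transform: rename the identifier `foo` to `bar`, in place.
function transform(ast) {
  walk(ast, function (node) {
    if (node.type === 'Identifier' && node.name === 'foo') {
      node.name = 'bar';
    }
  });
  return ast;
}

// Hand-written AST for `foo(1);`, standing in for Reflect.parse('foo(1);').
var ast = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: 'foo' },
      arguments: [{ type: 'Literal', value: 1 }]
    }
  }]
};

var renamed = transform(ast);
console.log(renamed.body[0].expression.callee.name); // prints bar
```

Any of the use cases listed below would follow the same pattern: only the body of the visit callback changes.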

Conclusion

This technique does not work with HTML <script> tags (both with src and with inline code). It is really sad since the majority of scripts are included this way. However, we found a way to make it work on dynamic script injection. It is better than nothing.

The technique also has a big drawback: it makes debugging a lot harder, as both the filename and the line numbers are lost with the eval call.
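
One partial mitigation, which is not part of the original experiment: browser developer tools recognise a //# sourceURL= comment at the end of evaluated code and use it as the name of the script. The line numbers then refer to the regenerated code, not to the original file. A minimal sketch, where withSourceUrl and the transformed/app.js name are hypothetical:

```javascript
// Hypothetical helper: name the generated code so the debugger can list it.
function withSourceUrl(code, url) {
  return code + '\n//# sourceURL=' + url;
}

var generated = withSourceUrl('6 * 7', 'transformed/app.js');
// Indirect eval runs in the global scope, like window.eval in the browser.
var result = (0, eval)(generated);
console.log(result); // prints 42
```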

Use Cases

I think that the technique is not completely useless. Here are some thoughts:

  • Better GreaseMonkey: Alter the code of a website you don't control.
    • Remove Closure. Developers often wrap their code in a closure to prevent others from accessing it from the console. We could remove it.
    • Code injection: You may want to add some code like a console.log, making an additional check or setting up a hook on code you don't own.
  • Code Analysis: Instrument the code to extract data.
    • Code Coverage. It is useful to see if your tests pass through all the branches.
    • Dead-Code Removal. If you run the application, you can see what parts of the code you used and remove those that were not used. It will help an additional pass with an optimizer such as Google Closure Compiler.
    • Profile-Guided Optimization. It is often used to do branch prediction or type analysis.

For a project I will talk about later, I need to hook the function document.createElement. The code I wanted to write was:

var original = document.createElement;
document.createElement = function (tag) {
  // Do something
  return original(tag);
};

Problem

However, a silly Javascript exception is thrown if you keep a reference to the function and call it:

var createElement = document.createElement;
createElement('div');
// TypeError: Illegal Invocation

Naive Solution

Since it looks like we cannot use anything else but document.createElement to execute the function, I decided to restore the original document.createElement within the hook function. It is verbose but works.

var original = document.createElement;
var hook = function (tag) {
  document.createElement = original;
  // Do something
  var el = document.createElement(tag);
  document.createElement = hook;
  return el;
};
document.createElement = hook;

Why?

But then I asked myself: how did they implement a function that can only be called in the document.createElement(...) form? Then I remembered that this calling convention sets this to document. So they must be doing a check like this:

document.createElement = function () {
  if (this !== document) {
    throw new TypeError('Illegal Invocation');
  }
  // ...
}

Solution

Now that we know that they check for this === document, we can use .call to force it 🙂

var original = document.createElement;
document.createElement = function (tag) {
  // Do something
  return original.call(document, tag);
};
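
An equivalent way to write this, assuming an ES5 environment where Function.prototype.bind is available, is to bind the original function to its owner once. The sketch below uses a stand-in object called fakeDocument (hypothetical, so that the example runs outside a browser) that performs the same kind of this check guessed at above.

```javascript
// Stand-in for document, with the same kind of `this` check.
var fakeDocument = {
  createElement: function (tag) {
    if (this !== fakeDocument) {
      throw new TypeError('Illegal invocation');
    }
    return { nodeName: String(tag).toUpperCase() };
  }
};

// Bind once, so the saved reference always runs with the right `this`.
var original = fakeDocument.createElement.bind(fakeDocument);
var created = [];

fakeDocument.createElement = function (tag) {
  created.push(tag); // Do something
  return original(tag);
};

var el = fakeDocument.createElement('div');
console.log(el.nodeName); // prints DIV
```

In a browser, the same pattern would read var original = document.createElement.bind(document);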

I've always been annoyed when I want to use a character such as » in HTML, as I struggle to find the corresponding HTML Entity. This is why I made this small utility. Just paste the sexy UTF-8 character you found and it will give you the associated HTML-ready code 🙂
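
The conversion itself is short. Here is a minimal sketch of the idea (not the actual source of the utility), using numeric character references, which work for any character whether or not it has a named entity. The name toHtmlEntities is hypothetical.

```javascript
// Replace every non-ASCII character with a decimal numeric character reference.
function toHtmlEntities(text) {
  // Array.from splits by code point, so characters outside the BMP stay whole.
  return Array.from(text).map(function (ch) {
    var code = ch.codePointAt(0);
    return code > 127 ? '&#' + code + ';' : ch;
  }).join('');
}

console.log(toHtmlEntities('a » b')); // prints a &#187; b
```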

The talk is over. Check out the Slides & Video.

For several months now I've been surveying my friends and teachers at EPITA, and I came to the conclusion that they have absolutely no idea what Javascript really is. In order to help them discover a language that is getting a lot of traction nowadays, I'm organizing a 2-hour presentation on the subject.

If you know how to program and are interested in learning sexy advanced Javascript, you are more than welcome to attend this presentation. It will be Tuesday 25th October from 6:30pm to 8:30pm at EPITA Amphi Master (Metro Porte d'Italie). If you are in Paris at this time and speak French, do not hesitate to send me a mail at [email protected], I will explain how to get there 🙂

I've written a lengthier explanation of my motives and of the presentation's content in the LRDE bulletin L'air de rien #23:

Editorial by Olivier Ricou (Lecturer-Researcher)

[...] This issue is also an opportunity to showcase an LRDE student who is a bit crazy (he likes Javascript) but very friendly, and who knows how to share his passion. He does so through an article here, and also on Tuesday 25 October at 6:40pm in the Amphi Master (free admission). [...]

Javascript Presentation

Sites such as Gmail, Facebook and Google Maps are classic examples of Javascript usage. But did you know that the Windows 8 interface and the Chrome and Firefox extensions are written in Javascript? Or that it is possible to write web servers in Javascript thanks to Node.js?

Javascript is everywhere, and yet, talking with the people around me, I realized that nobody really knew this language. This is why I invite you to a two-hour presentation on the subject on Tuesday 25 October in the Amphi Master, from 6:40pm to 8:40pm.

Javascript, the language

To begin, a little bit of history. Brendan Eich says that he designed and implemented the first Javascript prototype in 10 days in 1995. Indeed, Javascript is a language that contains an extremely small number of concepts. This idea comes from the world of functional languages such as Lisp, Haskell or Caml. The genius of Javascript is that it managed to depart from a perfect mathematical model in favor of ease of use for the developer.

Javascript aims to be used by as many people as possible. The syntax of the language was heavily inspired by C and contains nothing fancy. This makes the source code readable and understandable by any programmer. The language was designed to run as many programs as possible, even malformed ones. For example, a heuristic adds missing semicolons. In the end, the barrier to entry for Javascript is very low.

Lambda Functions and Objects

Javascript draws its power from two fundamental concepts: lambda functions and objects. The main goal of the presentation is to teach you how to handle these two tools. As an introduction to the language, I will show how to reproduce well-known programming paradigms, in particular Object-Oriented Programming.

The browser is a hostile environment. Within a single site, a multitude of Javascript modules developed by different people coexist: the site itself, the ads, the comments, the analytics, the "like" button, etc. We will briefly see how objects and functions help you take one of the following three points of view: being a respectful citizen, hardening your code against attackers or, on the contrary, having fun with other people's code.

A dynamic language

At school we mainly studied static programming languages such as C, C++ and Caml. Javascript, on the other hand, belongs to the category of dynamic languages, like PHP, Ruby or Python. The features of these languages aim to simplify the developer's life by moving away from the constraints of the machine and from the mathematical theories of typing. As a result, dynamic languages are used more and more.

We will study the changes brought by this new way of thinking. For example, strings are used almost systematically in order to make debugging easier, objects are built on the fly without defining their structure in a separate file in order to save time, etc.

Who am I?

This presentation is no accident. I have taken a long-standing interest in Javascript and dynamic languages over the past few years.

Image processing is largely developed using static languages because of its heavy need for performance. My research topic at LRDE is to adapt dynamic concepts such as lambda functions or method chaining to image processing problems.

Alongside my classes, I work for Curse, a company that builds websites for players of online games, including World of Warcraft. I use Javascript, Python and PHP on a daily basis. My findings are published on a blog: http://vjeux.com/.

At EPITA, I was able to use a dynamic programming language as early as my first year. Lua is integrated into Fooo, a remake of Warcraft III, in order to allow easily customizable game interfaces.

This article is about a difference algorithm. It extracts the changes from one version of an object to another, which means less information has to be stored.

Template

In a project, I have a template object with all the default settings for a widget.

var template = {
    achievement: {
        page: 0,
        column: "name",
        ascending: true,
        search: "",
        filter: [1, 2, 3]
    }
};

Alter it

Default settings are changed during the use of the application. It can happen when you load a configuration from an object:

var override = {
    achievement: {
        page: 1,
        ascending: false
    }
};
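var settings = _.extend({}, template, override);

Note: For nested settings like these, extend has to merge objects recursively. underscore's _.extend only copies top-level properties, so used as-is it would replace the whole achievement object and drop column, search and filter.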

Or when the user clicks on the buttons, you have a script that edits a copy of the template:

// Deep copy: a shallow copy would share the achievement object with the template
var settings = JSON.parse(JSON.stringify(template));
settings.achievement.page = 1;
settings.achievement.ascending = false;
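
To make the shallow versus recursive distinction concrete, here is a small dependency-free sketch. Both helpers are hypothetical: shallowExtend mimics a top-level-only extend, and deepExtend recurses into plain objects and copies arrays wholesale, which is the behaviour the rest of this post assumes.

```javascript
// Mimics a shallow extend: later sources overwrite top-level keys.
function shallowExtend(target) {
  for (var i = 1; i < arguments.length; i++) {
    var source = arguments[i];
    for (var name in source) {
      if (Object.prototype.hasOwnProperty.call(source, name)) {
        target[name] = source[name];
      }
    }
  }
  return target;
}

function isPlainObject(value) {
  return Object.prototype.toString.call(value) === '[object Object]';
}

// Recursive extend: plain objects are merged, arrays and leaves are replaced.
function deepExtend(target) {
  for (var i = 1; i < arguments.length; i++) {
    var source = arguments[i];
    for (var name in source) {
      if (!Object.prototype.hasOwnProperty.call(source, name)) {
        continue;
      }
      if (isPlainObject(source[name])) {
        var base = isPlainObject(target[name]) ? target[name] : {};
        target[name] = deepExtend(base, source[name]);
      } else if (Array.isArray(source[name])) {
        target[name] = source[name].slice();
      } else {
        target[name] = source[name];
      }
    }
  }
  return target;
}

var template = { achievement: { page: 0, column: 'name', ascending: true } };
var override = { achievement: { page: 1 } };

var shallow = shallowExtend({}, template, override);
var deep = deepExtend({}, template, override);

console.log(shallow.achievement.column); // undefined: the defaults were dropped
console.log(deep.achievement.column);    // name
console.log(template.achievement.page);  // 0: the template is left untouched
```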

Find the differences

Now, we'd like to store the current user configuration. We could simply dump the current settings variable but we can do better. Instead we want to dump only what has changed from the defaults. This has the benefit of being smaller to store and more resilient to changes.

In order to extract changes between the template and settings, I made a small function called difference (If you have a better name, feel free to suggest :)).

var override = difference(template, settings);
console.log(override);
 
// Result:
{
    achievement: {
        page: 1,
        ascending: false
    }
}

Instead of saving the full object, we only have to save 2 fields. This is therefore smaller 🙂 In order to get the full configuration back, we simply extend the template object with the difference (recursively, so that the untouched nested defaults are kept):

var settings = _.extend({}, template, override);

In a way, difference is the opposite of extend.

  • settings = extend(template, override)
  • override = difference(template, settings)

How it works

Once we know what we want it to do, it is fairly straightforward to write. We traverse the template and build a new object that contains only the attributes whose values differ between template and override. The following implementation makes great use of underscore (isObject, isArray, isEmpty and isEqual).

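function difference(template, override) {
    var ret = {};
    for (var name in template) {
        // Attributes missing from override keep their default: nothing to store
        if (name in override) {
            if (_.isObject(template[name]) && !_.isArray(template[name]) &&
                _.isObject(override[name]) && !_.isArray(override[name])) {
                // Both sides are nested objects: recurse and keep non-empty diffs
                var diff = difference(template[name], override[name]);
                if (!_.isEmpty(diff)) {
                    ret[name] = diff;
                }
            } else if (!_.isEqual(template[name], override[name])) {
                // Leaf value, array or type change: store the new value whole
                ret[name] = override[name];
            }
        }
    }
    return ret;
}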

Note: Any attribute that is present only in the override object will be ignored.
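
If that limitation matters, one possible variant (an assumption about what you might want, not the function used in this post) iterates over the override side instead, so that attributes missing from the template are kept. The sketch below is dependency-free. isPlainObject and sameValue are hypothetical helpers standing in for the underscore checks, and sameValue compares through JSON.stringify, which is only adequate for JSON-like settings.

```javascript
function isPlainObject(value) {
  return Object.prototype.toString.call(value) === '[object Object]';
}

// Crude equality for JSON-like values (arrays, strings, numbers, booleans).
function sameValue(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

function differenceWithAdditions(template, override) {
  var ret = {};
  Object.keys(override).forEach(function (name) {
    var before = template[name];
    var after = override[name];
    if (isPlainObject(before) && isPlainObject(after)) {
      // Nested objects on both sides: recurse and keep non-empty diffs
      var diff = differenceWithAdditions(before, after);
      if (Object.keys(diff).length > 0) {
        ret[name] = diff;
      }
    } else if (!(name in template) || !sameValue(before, after)) {
      // New attribute, or changed value: store it whole
      ret[name] = after;
    }
  });
  return ret;
}

var template = { achievement: { page: 0, column: 'name', filter: [1, 2, 3] } };
var settings = { achievement: { page: 1, column: 'name', filter: [1, 2, 3], compact: true } };

var changes = differenceWithAdditions(template, settings);
console.log(JSON.stringify(changes));
// prints {"achievement":{"page":1,"compact":true}}
```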

Conclusion

I'm still looking to store as much information as possible in the hash part of the URL. Two years ago, I did SmallHash to encode integer ranges. This week, with URLON and this difference algorithm, I explored another way to look at the problem, dealing with structured objects.

It is possible to combine both approaches in order to encode structured objects that also contain integer ranges. Maybe in a future blog post!