
Saturday, February 21, 2009

On My Vicious JavaScript Code

Disclaimer

This post is a reply to kangax's question: "I'm sorry, but why such a vicious coding style?"
I'd like to explain my point of view on JavaScript code style, so if you are interested, please read to the end. Thank you :)



Some Background

I have studied and used JavaScript since 2000, and one of the most interesting things about this language is that even though it is "that old," you never stop learning something new (techniques, strategies, adapted patterns, etc.).
At the same time, the "biggest trouble" with this language is its weight in a page, which is why we all use minified and gzipped or packed versions of our applications. Coming from the "33Kb modem internet era," I have always paid attention to size, and I have studied and created both a JavaScript minifier and a compressor.



Compressor Rules

Unless we are using an "incremental" compressor, as GIF is, we should consider that packer algorithms such as gzip and deflate are dictionary-based compressors (LZ77 followed by Huffman coding).
These techniques give us the best compression ratio when we use a reduced dictionary. In a few words, if we have these two pieces of code:

// code A
function tS(obj){
    return obj.toString();
};

// code B
function toString(obj){
    return obj.toString();
};

even though code A is shorter, it will cost one or more extra bytes than code B once compressed.
The reason is really simple. Try to imagine that both snippets are converted in this way:

// code A
0 1(2){3 2.4();};[function,tS,obj,return,toString]

// code B
0 1(2){3 2.1();};[function,toString,obj,return]

Code B has a reduced dictionary, and a reduced dictionary means a better ratio. That's why, when I introduced the little bytefx library years ago, I described it as "packer friendly" (variable names are similar throughout the entire code).
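The token-to-dictionary rewriting shown above can be sketched with a toy tokenizer. This is only an illustration of the idea, not a real packer: it extracts the distinct word tokens of each snippet, mimicking the bracketed dictionaries in the converted code.

```javascript
// Toy sketch: collect the distinct word tokens of a snippet, in order of
// first appearance, mimicking the dictionaries shown above.
function dictionary(code) {
    var seen = {};
    (code.match(/\w+/g) || []).forEach(function (word) {
        seen[word] = true;
    });
    return Object.keys(seen);
}

var codeA = 'function tS(obj){return obj.toString();};';
var codeB = 'function toString(obj){return obj.toString();};';

console.log(dictionary(codeA)); // [ 'function', 'tS', 'obj', 'return', 'toString' ]
console.log(dictionary(codeB)); // [ 'function', 'toString', 'obj', 'return' ]
```

Code B's dictionary has one entry fewer because the function name reuses a word that must appear anyway.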



How Compressors Changed My Way To Code in JavaScript

Accordingly, and probably against every common naming-convention practice, those studies convinced me that as long as a variable name is meaningful enough, it does not matter whether it is instantly recognizable: in any case, when we want to understand a piece of code, we need to read it from beginning to end. Here are a couple of examples:

// (pseudo)compilable language practices
var splitted = "a.b.c".split(".");
function argsToArray(args){
    return Array.prototype.slice.call(args);
};

// my code style
var split = "a.b.c".split(".");
function slice(arguments){
    return Array.prototype.slice.call(arguments);
};

The split variable is meaningful enough: it is something that has been split via the split method. The same goes for the slice function: I use it to call the Array slice method over a generic array-like object, such as the arguments object.



Confusion, Ambiguity, WTF About arguments? Not Really!


The reason I chose the variable name arguments inside the slice function is simple: in that function I do not need its own arguments object, while I will use that function mainly with the arguments object coming from other functions.

function doStuff(){
    var Array = slice(arguments); // other stuff
};

How, then, can that arguments variable be considered ambiguous?
Moreover, that function "costs zero once compressed", because in every codebase I've seen so far the generic arguments variable, plus the slice method from Array's prototype, are already part of the rest of the code. So, in terms of bytes, I have added a useful function for only 8 extra bytes: (){();};



OMG! I even called a variable Array inside the doStuff function!!!


Yes, I did indeed, and the reason is still the same: almost every library or piece of code uses the Array keyword, but if I do not need the global Array constructor in that scope, why should I leave such a meaningful dictionary word untouched?
Moreover, the global Array constructor is used mainly for its prototype, but once I have an array instance called Array, I can simply use its methods directly instead of the global ones, considering they are exactly the same.

function doStuff(){
    var Array = slice(arguments);
    Array.push.apply(this, Array);
    return this
};

Finally, the day we need a native global constructor (String, Object, Array, Function), we can always use the safe window.Array or, if we decided to name a variable window inside that scope, self.Array, this.Array, or an inline global-scope resolution call:

var toString = function(){return this.Object.prototype.toString}();




Is It Just About Naming Convention?

I use different strategies to be sure my code is as small as possible. For example, as shown above in the toString assignment, if I need a function called at runtime which returns a value, and it is not that big a function, I do not put parentheses around it.
If an assignment or a return comes right before a closing bracket "}", I do not usually put a semicolon at the end. It does not matter to the interpreter, which normalizes the function in its own way:

alert(function(){return this.Object.prototype.toString});

// produced output
function(){
    return this.Object.prototype.toString;
}
// indentation, plus semicolons where necessary

The rest is usually managed by the minifier: the fact that return"a" does not need a space between return and the string "a" is superfluous if we use, for example, a clever minifier like the YUI Compressor. The same goes for loops with a single statement to execute: I do not use brackets (I love Python's indentation style).
for(var key in obj)
    if(typeof obj[key] == "string")
        obj[key] = parse(obj[key]);
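To make that loop self-contained, here is a runnable version with a stand-in parse function (hypothetical, introduced only for this example; it just trims strings):

```javascript
// parse is a hypothetical stand-in transform for the sake of the example:
// it only strips leading/trailing whitespace.
function parse(value){
    return value.replace(/^\s+|\s+$/g, "");
}

var obj = { a: "  hi ", b: 42, c: " ok" };

// bracket-less loop, indentation carries the structure
for(var key in obj)
    if(typeof obj[key] == "string")
        obj[key] = parse(obj[key]);

console.log(obj); // { a: 'hi', b: 42, c: 'ok' }
```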




My Style Cons


  • tools like JSLint often fail to recognize my code as valid, so I do not use JSLint (... but fortunately I do not need it; I know my code is beautiful :P)

  • in a team we should all respect the team's code conventions, so my code style may not be the best choice if other programmers do not know JavaScript that well and/or use a specific, different code style

  • if the final product does not use a minifier plus compressor, my code style could produce a bigger size instead of a smaller one (Object instead of obj or o)

  • my code may require a careful read rather than a quick one, but this is true for every library if we want to properly understand a trick, the entire scope, an assignment, etc.





My Style Pros


  • the final size, minified and packed/gzipped, is in most cases smaller compared with other "code scripters"

  • the bigger the library I want to extend, the less my implementation costs, thanks to a wide range of reusable words (common ones like key, Object, arguments, Function, etc., plus others)

  • variable names are often more meaningful than usual: with function(str, callback, obj){} I bet we cannot understand the variables' types, while with a signature like function(String, Function, Object){} we can. So, where possible and when necessary, my code style is closer to strictly typed languages than any other, allowing me to save time by writing String directly as a variable name instead of /* String */ str





Conclusion

I know that for some people my vicious style could be considered nonsense, or something too "extreme" or hard to maintain. On the other hand, I am not developing public libraries (I prefer to extend them rather than reinvent the wheel when it is not necessary at all), and my implementations are, for me, much easier to understand and maintain than others, since I often perform linear (sometimes tricky) operations which make absolute sense to me, in a single line when possible. You like it? You don't? It does not matter; at least now you know how perverse a programmer's mind can be with such an "out of the common rules" language as JavaScript.
Enjoy my craziness :D

Update
Since telling you that I have studied these cases was not enough, here is a really simple demonstration for the Anonymous user who left the comment:

<?php
$output = array();
for ($i = 0; $i < 80; $i++)
    $output[] = 'testMe';
$str = implode('.', $output);
echo 'WebReflection: '.strlen(gzcompress($str, 9)); // 21
echo "\n";
$output = array();
for ($i = 0; $i < 80; $i++)
    $output[] = 'test'.str_pad($i + 1, 2, '0', STR_PAD_LEFT);
$str = implode('.', $output);
echo 'Anonymous: '.strlen(gzcompress($str, 9)); // 144
?>

21 bytes against 144; so please, if you think my point is wrong, demonstrate it :)

25 comments:

Anonymous said...

I use some code-style tricks myself as well, but when posting code to the public, it's best to clean it up so that even apprentice JavaScripters can understand it, or better yet, to avoid confusing programmers coming from another language.

If for no better reason, it prevents other JavaScripters from commenting about your code style each and every time. =P

Andrea Giammarchi said...

Fortunately it does not happen every time :D

But yes, I agree with you, but I consider my last proposal simple to understand ;)

Anonymous said...

Great post, it had me thinking. I love to read heretical, well-founded arguments about established conventions, it makes you question everything, and that's a good thing.

By the way, there is another con to this coding style: Some syntax highlighters will make a mess with the variable coloring (it also happens in this blog if you notice). And another pro: You get a much deeper understanding of the language when coding like this (although this is a quite temporary pro, after you get used to it, that pro doesn't apply any more).
I also agree with Anonymous, when code is posted publically, it has to be cleaned up first, in order to not confuse the newbies.

Anonymous said...

Your whole point about code weight is based on a wrong belief:

Gzip/deflate uses a Lempel-Ziv 77 algorithm before building the Huffman tree. Therefore, having duplicate strings or 'reducing' your dictionary has virtually no impact on the compression result.

http://www.gzip.org/algorithm.txt

Andrea Giammarchi said...

Anonymous, I simply tested to confirm my point and ... yeah, a reduced dictionary makes my code's final size smaller with every common technique (packer, gzip, deflate).

Just try to gzip the simple toString example I showed, or try to demonstrate that a file with 80 "testMe" tokens, once compressed, is bigger than a file with "test01", "test02", ..., "test80".

Andrea Giammarchi said...

P.S. why on earth do you write comments without a name? Don't you think that this way your critique automatically becomes less credible?

Andrea Giammarchi said...

@Anonymous, demonstration in the update at the end of the post.
I am still open to learning something from you, but a link to a spec I have read I don't know how many times is probably not enough, is it? Regards

Anonymous said...

I only posted the first comment; at the time I could only post as anonymous. This second time I can actually give a name and URL...

Anonymous said...

@Andrea

I post anonymously because I don't have any blogger and/or OpenID account, not to hide myself. You may see me on ajaxian posting under the name 'Ywg'.

Don't take it as an offense, but I still think your size argument is wrong.
The example is not relevant: it does not have the characteristics of real code, nor sufficient length to produce a reliable result.

In the first case you produce a perfectly redundant string:

'TestMe.TestMe.TestMe.TestMe.TestMe...'
Which will result in something like this after the first compression step:

'TestMe.TestMe.TestMe...' [lz77 buffer][back_pointer][back_pointer]

Where each [back_pointer] covers several occurrences of 'TestMe.', and so there are very few back-pointers.

In the second example you produce a string with very little redundancy and a lot of noise:

'test01.test02.test03.test04.test05.test06.test07.test08.test09.test10.test11...' [lz77 buffer][back_pointer]32[back_pointer]33[back_pointer]03[back_pointer]34...

Each [back_pointer] can only cover one occurrence of '.test'; this results in a lot of back-pointers pointing to only five bytes. Considering that a Lempel-Ziv pointer is usually two bytes, that's a lot of overhead.

In fact it may be even more broken: your input string is so small that I'm not even sure you fill up the LZ buffer and get any compression at all before building the Huffman tree.

Try with a real-world code sample of at least 15kb.

PS: Sorry, I didn't understand your last question (my English is a bit limited...).

Andrea Giammarchi said...

So I am sorry, but I am still asking you to demonstrate that, for Huffman-based compressors, dictionary length is not important :)

Anonymous said...

I'm disappointed to hear these arguments of yours. Minification benefits are nothing compared to the unexpected pesky bugs that you, or more likely another developer, will run into. I'm surprised that you, having quite some experience writing/maintaining code, as I understand it, don't see this.

When you edit some part of an app and need to use one of the native objects, do you really want to scan the entire scope for variables named after those native objects? More often than not, the scope of the function being edited is not just a few lines long. It's silly to waste time on things like that.

You don't care if a reader likes it? It's part of your craziness?

Come on, that's just so arrogant of you. Do you not care about collaboration either?

Andrea Giammarchi said...

@kangax, this whole post is about the difference between me coding for myself, showing a little function, and team collaboration.
In a team, of course, the rules are different, and I have worked in teams enough to say that my vicious code is not that good for a team.
On the other hand, tell me how often you need a native constructor in an entire library ... it usually happens a couple of times, and mainly to use one of its prototype methods. As I wrote in the post, if you have a variable which inherits from that constructor without overrides, there is no reason at all to look for the native one when that variable's method is in scope and shorter to write (and, being in scope, faster to resolve as well).
We all use closures to avoid conflicts with the external environment, so why do you think this way is so bad? It is like defining a "$" variable for your library and saying: no, I cannot use it, since there are other "$" libraries out there. I cannot see a big difference, and I think that if we are inside a closure and we understand the meaning of a closure, the limits are only those we choose and nothing else.
Finally, of course I care about readers, and I would like everybody to understand my code; that's why I am here: if something is not clear, I can explain it without problems.
So far, in a couple of years of posts, rarely has somebody asked me: what is that?

Andrea Giammarchi said...

P.S. the "it does not matter" means that I simply exposed my point of view without pretending you like or approve it - aka: I wrote my reasons, I am open to read your one.

Anonymous said...

@Andrea/WebReflection 'So I am sorry, but I am still asking you to demonstrate that for Huffman based compressors, dictionary length is not important'

I'm very disappointed. I took the time to write you a reasoned reply showing how biased and irrelevant your demonstration is. You chose not to publish it and instead wrote this comment pretending I didn't provide any argument.

This is very childish... go on truncating people's posts so you can look right on your own blog.

PS: as you will probably not publish this comment either, I'll also post it on Ajaxian.

Andrea Giammarchi said...

Ehr, I did not notice that post; it was in the queue before kangax's last one. I am not that kind of person: unless a comment contains offensive stuff, I am here to write, listen to you, and learn from other developers. Regards

Andrea Giammarchi said...

P.S. and I am not offended by false assumptions like the one you wrote before. Regards

Andrea Giammarchi said...

Going back on topic, if possible ... take a library bigger than 15Kb, replace \w+ with "same", and compress it.
Then compare the result with the original library compressed.
The more occurrences you have, the smaller the size via common compressors.
This is always true for packer-style algorithms, for example, while it may not hold in a theoretical scenario neither of us can reproduce.

Anonymous said...

I'm very suspicious when posting on blogs... Sadly, this kind of "incident" has happened to me many times.

As it was not your intention, please accept my apologies.

Going back to topic :

Of course dictionary size is important, but considering that your code passes through the LZ algorithm first, its impact is greatly reduced.

Using \w and replacing every word is a biased example; it's no different from your first demonstration, where you used perfect redundancy.

With your code style you only reduce the dictionary by a small number of words... Something more relevant is to replace whitespace.

Try replacing \s -> s on prototype.js and gzip both afterwards:

original prototype.js = 30498 bytes
dictionary-reduced prototype.js = 30371 bytes

A benefit of 127 bytes... Considering that even on the crappiest network you'll have an MTU of at least 1300 bytes, you don't get any real benefit.

IMHO the legibility loss is not worth it: 'Premature optimization is the root of all evil'.

Andrea Giammarchi said...

Spaces are minifier stuff, as I wrote :)
The rest is superfluous, extreme optimization, which is probably not worthwhile in a team but lets me produce the best-ratio scripts. This is simply my point.

Anonymous said...

Spaces were just for the sake of the example; replace them with any set of 5~7 evenly distributed words if you prefer.

Andrea Giammarchi said...

ywg, what you call noise is just different character combinations ... so, again, give me a practical case to study and we can keep talking.

Anonymous said...

'give me a practical case to study'

Take prototype.js and do the following replacements:

IE (56 occurrences)
Opera (20 occurrences)
WebKit (11 occurrences)
Gecko (4 occurrences)
MobileSafari (2 occurrences)

replace by ==> Browser (39 + 92 occurrences)

apply (12 occurrences)
call (47 occurrences)

replace by ==> function (572 + 59 occurrences)

div (29 occurrences)
script (37 occurrences)
style (127 occurrences)
form (108 occurrences)

replace by ==> for (234 + 301 occurrences)

That's a big dictionary reduction, and the file size stays approximately the same. Despite that dictionary reduction, the compression gain is nearly nothing:

original prototype = 134057 bytes
original prototype (gzip) = 30498 bytes
compression rate = 22.7%

reduced prototype = 134137 bytes
reduced prototype (gzip) = 30374 bytes
compression rate = 22.6%

We have reduced the dictionary but gained nothing significant. Dictionary reduction has little impact because the Huffman tree is built on the output of the LZ77 algorithm, not on the original stream.

I don't see how I can give you a better use case.

Andrea Giammarchi said...

ywg, you demonstrated that with only a few reductions you already save 0.1% of the final size.
Now imagine that, for an entire library (written for myself and nobody else), all my practices together could push this reduction to 10% or more, which for 100Kb could mean 30Kb instead of 40.
I already said my style is probably too maniac, but as long as practical results do not demonstrate I am wrong, I will stay with my point (what I mean is that I prefer the kangaz point rather than a test which demonstrate my way still produces a smaller output, even if it is not that smaller and specially removing short works as div, script, IE, and others are).

I am not planning to rewrite the prototype library, but if you were to rename every variable, and every function that takes up to 3 different arguments, following my practices, global prototypes included and everything else, I am sure the margin could only grow.

Finally, I think my code is readable enough for short functions, but I got your points (all of you).

Cheers.

Andrea Giammarchi said...

kangaz => kangax
works => words
:)

Andrea Giammarchi said...

@kangax, guys, I have modified the source, since I will probably add more and more stuff, and I agree with you that a proposal should be as clear as possible.
In summary, I have updated the extend proposal, hoping it will be clearer for everybody.