Monday, October 25, 2010

JavaScript Coercion Demystified

This post complements my Front-Trends slides and covers the performance and safety of sth == null compared with the classic sth === null || sth === undefined.
I have already discussed this in my JSLint: The Bad Part post, but I never went deeper into the topic.

Falsy Values

In JavaScript, and not only in JavaScript, we have so-called falsy values: 0, null, undefined, false, "" and NaN. Note that only the empty string is falsy: unlike PHP, for example, the string "0" is truthy in JavaScript, which is why we need to explicitly cast to a number, especially when we are dealing with the value of a text input.
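A quick way to check the list above is to pass each value through Boolean(), which applies the same conversion an if statement uses. The toNumberOrZero helper is hypothetical: a sketch of the explicit cast for text input values, assuming that empty or non-numeric input should count as 0.

```javascript
// Every value in the list converts to false, as it would inside an if.
var falsy = [0, null, undefined, false, "", NaN];
var allFalsy = falsy.every(function (value) {
    return Boolean(value) === false;
});

// Unlike PHP, any non-empty string is truthy, "0" included.
var zeroStringIsTruthy = Boolean("0");

// Hypothetical helper: explicit number cast for a text input value.
// Assumption: empty or non-numeric input should count as 0.
function toNumberOrZero(text) {
    var n = Number(text);     // "0" -> 0, "" -> 0, "abc" -> NaN
    return n === n ? n : 0;   // n !== n only when n is NaN
}

console.log(allFalsy, zeroStringIsTruthy, toNumberOrZero("0") ? "truthy" : "falsy");
// true true falsy
```

Once the string has been cast, the number 0 behaves as falsy again, which is usually what the "0" check was meant to do.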
In many other languages, even objects such as empty arrays or lists are considered falsy:

<?php
if (array())
    echo 'never';
?>

# Python
if []:
    print('never')


The above examples do not behave the same way in JavaScript: an empty array is still an object (an instance of Object), and every object is truthy.
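In JavaScript both branches of the PHP and Python snippets would run, because every object converts to true, empty or not. When emptiness is what we mean, we have to ask for it explicitly; the isEmptyArray name below is only an illustration.

```javascript
var branches = [];

if ([]) {
    // an empty array is still an object, and every object is truthy
    branches.push("array");
}
if ({}) {
    // the same goes for an empty object
    branches.push("object");
}

var arrayIsObject = [] instanceof Object;

// Hypothetical helper: ask for emptiness explicitly instead of relying on truthiness.
function isEmptyArray(list) {
    return list.length === 0;
}

console.log(branches.join(", "), arrayIsObject, isEmptyArray([]));
// array, object true true
```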
Another language with falsy values is the lower-level C, where, for example, 0 is treated as false inside an if statement.
Falsy values are important to understand, especially if we want to understand coercion.

About Coercion

In the JavaScript world, coercion is considered a sort of evil, unexpected implicit cast while, in my opinion, it is simply a feature, as long as we understand it and know how to use it.
When comparing values, coercion happens via the == (eqeq) operator and its negation !=, and the good news is that it is implemented consistently across browsers, according to the ECMAScript 3 standard.
Here is the scary list newcomers have probably never read, taken directly from the newer ECMAScript 5 specification (section 11.9.3, The Abstract Equality Comparison Algorithm), to show that coercion is here to stay and that there is nothing evil about it.

The comparison x == y, where x and y are values, produces true or false. Such a comparison is performed as follows:


  1. If Type(x) is the same as Type(y), then

    1. If Type(x) is Undefined, return true: undefined == undefined

    2. If Type(x) is Null, return true: null == null

    3. If Type(x) is Number, then

      1. If x is NaN, return false: NaN != NaN

      2. If y is NaN, return false: NaN != NaN

      3. If x is the same Number value as y, return true: 2 == 2

      4. If x is +0 and y is −0, return true: 0 == 0

      5. If x is −0 and y is +0, return true: 0 == 0

      6. Return false: 2 != 1


    4. If Type(x) is String, then return true if x and y are exactly the same sequence of characters (same length and same characters in corresponding positions). Otherwise, return false: "a" == "a" but "a" != "b" and "a" != "aa"

    5. If Type(x) is Boolean, return true if x and y are both true or both false. Otherwise, return false: true == true and false == false but true != false and false != true

    6. Return true if x and y refer to the same object. Otherwise, return false: var o = {}; o == o but o != {} and {} != {} and [] != [] and so on, since two objects are == only when they are the very same object


  2. If x is null and y is undefined, return true: null == undefined

  3. If x is undefined and y is null, return true: undefined == null

  4. If Type(x) is Number and Type(y) is String, return the result of the comparison x == ToNumber(y): 2 == "2"

  5. If Type(x) is String and Type(y) is Number, return the result of the comparison ToNumber(x) == y: "2" == 2

  6. If Type(x) is Boolean, return the result of the comparison ToNumber(x) == y: false == 0 and true == 1 but true != 2

  7. If Type(y) is Boolean, return the result of the comparison x == ToNumber(y): 0 == false and 1 == true but 2 != true

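  8. If Type(x) is either String or Number and Type(y) is Object, return the result of the comparison x == ToPrimitive(y): ToPrimitive means an implicit valueOf call, falling back to toString when valueOf does not return a primitive value (Date objects try toString first)

  9. If Type(x) is Object and Type(y) is either String or Number, return the result of the comparison ToPrimitive(x) == y: [1] == 1

  10. Return false: null != 0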
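The steps above can be sketched in JavaScript itself using only === and typeof. This is a simplified illustration, not how any engine implements ==: the type, toPrimitive and looseEquals names are hypothetical, only the ES5 types are handled, and toPrimitive ignores the Date special case. The sketch also covers the two closing steps of the specification: an Object compared with a String or Number goes through ToPrimitive (step 9), and every remaining combination returns false (step 10).

```javascript
// ES5 Type(): typeof null is "object", so null is handled first.
function type(value) {
    if (value === null) return "Null";
    switch (typeof value) {
        case "undefined": return "Undefined";
        case "boolean": return "Boolean";
        case "number": return "Number";
        case "string": return "String";
        default: return "Object"; // "object" and "function"
    }
}

// Hypothetical, simplified ToPrimitive: valueOf first, then toString,
// keeping the first result that is a primitive (Date's toString-first rule is ignored).
function toPrimitive(obj) {
    var methods = ["valueOf", "toString"];
    for (var i = 0; i < methods.length; i++) {
        var method = obj[methods[i]];
        if (typeof method === "function") {
            var result = method.call(obj);
            if (type(result) !== "Object") return result;
        }
    }
    throw new TypeError("Cannot convert object to primitive value");
}

// Hypothetical sketch of x == y, written with === only.
function looseEquals(x, y) {
    var tx = type(x), ty = type(y);
    // step 1: same type; === already gives NaN != NaN and +0 == -0
    if (tx === ty) return x === y;
    // steps 2 and 3: null and undefined are only equal to each other
    if ((tx === "Null" && ty === "Undefined") || (tx === "Undefined" && ty === "Null")) return true;
    // steps 4 and 5: a String meets a Number, convert the string
    if (tx === "Number" && ty === "String") return looseEquals(x, Number(y));
    if (tx === "String" && ty === "Number") return looseEquals(Number(x), y);
    // steps 6 and 7: a Boolean is always converted to a Number first
    if (tx === "Boolean") return looseEquals(Number(x), y);
    if (ty === "Boolean") return looseEquals(x, Number(y));
    // steps 8 and 9: an Object meets a String or Number, use ToPrimitive
    if ((tx === "String" || tx === "Number") && ty === "Object") return looseEquals(x, toPrimitive(y));
    if (tx === "Object" && (ty === "String" || ty === "Number")) return looseEquals(toPrimitive(x), y);
    // step 10: anything else, e.g. null == 0
    return false;
}

console.log(looseEquals(null, undefined), looseEquals("2", 2), looseEquals(true, 2));
// true true false
```

Note how true == 2 fails: step 6 turns true into 1 before anything else happens.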


About point 8: this is the object coercion we are all scared of...

var one = {
    valueOf: function () {
        return 1;
    },
    toString: function () {
        return "2";
    }
};

alert(one == 1); // true
alert(one == "2"); // false

If we remove the valueOf method, the inherited Object.prototype.valueOf returns the object itself, which is not a primitive, so toString is called instead: one == "2" becomes true and, more generally, ({}) == "[object Object]".
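The valueOf/toString order can be checked without alert, for example in a console. This sketch reuses the one object from the example above, then deletes its own valueOf so that the inherited one, which returns the object itself, forces the fallback to toString.

```javascript
var one = {
    valueOf: function () {
        return 1;
    },
    toString: function () {
        return "2";
    }
};

// valueOf returns a primitive, so it wins
var before = [one == 1, one == "2"];   // [true, false]

// Without its own valueOf, the inherited Object.prototype.valueOf
// returns the object itself, which is not a primitive, so toString is used.
delete one.valueOf;
var after = [one == 1, one == "2"];    // [false, true]

// Same fallback for a plain object (parentheses avoid parsing {} as a block).
var plain = ({}) == "[object Object]"; // true

console.log(before, after, plain);
```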

null == undefined And null == null, Nothing Else!

99% of the time we write a check such as:

function something(arg) {
    if (arg === undefined) {
    }
}

Here we ask the engine to resolve the identifier undefined, looking for it in the current scope and then in every outer scope up to the global one, where, in ES3 engines, it could even have been redefined.
Even worse, we may end up with the silliest check ever:

function something(arg) {
    if (arg === undefined || arg === null) {
    }
}

which shows how little we know JavaScript, since it is exactly equivalent to:

function something(arg) {
    if (arg == null) {
    }
}

with these nice-to-have differences:

  • null cannot be redefined, since it is a literal and NOT a variable

  • null does NOT require scope resolution, nor any lookup up to the global scope
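The equivalence is easy to check on a handful of sample values: arg == null is true exactly when arg === undefined || arg === null is. The shadowed function is a hypothetical, deliberately bad example showing how a local variable named undefined changes the meaning of === undefined, while null, being a literal, cannot be shadowed that way.

```javascript
var samples = [undefined, null, 0, "", false, NaN, "null", {}, []];

// For every sample, arg == null gives the same answer as the double strict check.
var sameResult = samples.every(function (arg) {
    return (arg == null) === (arg === undefined || arg === null);
});

// Hypothetical, deliberately bad code: a local variable named undefined.
function shadowed(arg) {
    var undefined = 0;
    return {
        strictUndefined: arg === undefined, // now compares against 0!
        eqeqNull: arg == null               // unaffected: null is a literal
    };
}

// shadowed(0) gives strictUndefined: true and eqeqNull: false
console.log(sameResult, shadowed(0), shadowed(void 0));
```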

The only side effect of checking against null via == shows up when, in that particular case, we actually need to treat null and undefined as different values... now think how many times you really do this, and ask yourself why.

Performance

Once again I point you to this simple benchmark page: if you click on null VS undefined or null VS eqeq, or on one of the lines shown under the header, you can see that the == null check is usually faster and safer, and that it gives more control than the ! (not) operator, since !arg is true for every falsy value, not only for null and undefined.
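What "more control than the ! operator" means in practice: !value also rejects legitimate inputs such as 0 or "", while value == null only matches null and undefined. Both helpers below are hypothetical and exist only for the comparison.

```javascript
// Hypothetical helper: use the fallback only when the value is missing.
function withDefault(value, fallback) {
    return value == null ? fallback : value;
}

// The same helper written with !, which also swallows 0, "", false and NaN.
function withDefaultNot(value, fallback) {
    return !value ? fallback : value;
}

console.log(withDefault(0, 10), withDefaultNot(0, 10));         // 0 10
console.log(withDefault(undefined, 10), withDefault(null, 10)); // 10 10
```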
The only way to get better performance when we really mean to compare against undefined, in a safer way, is to declare a local variable named undefined without assigning any value: the lookup stays local, the check is safe from outer redefinitions, and minifiers can shrink the variable name.

// whatever nested scope ...
// (a is any array available in this scope)
for (var undefined, i = 0; i < 10; ++i) {
    a[i] === undefined && (a[i] = "not anymore");
}
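A self-contained version of the same pattern, assuming a is simply an array with some holes and looping over its length instead of a fixed 10; fillHoles is a hypothetical name. The local undefined is declared but never assigned, so it holds the real undefined value and is resolved in the function's own scope.

```javascript
// Hypothetical wrapper giving the loop its own function scope.
function fillHoles(a) {
    // "undefined" is a local variable, declared but never assigned:
    // its value is the real undefined and it is resolved in this scope
    for (var undefined, i = 0; i < a.length; ++i) {
        a[i] === undefined && (a[i] = "not anymore");
    }
    return a;
}

console.log(fillHoles([1, , 3]).join(", ")); // 1, not anymore, 3
```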


There Is NO WTF In JavaScript Coercion!

Coercion in JavaScript is well specified and consistently implemented across browsers, being extremely simple to implement in any engine. The rules are there and, if we know what kind of data we are dealing with, we can always choose to be both safer and faster.
Of course, if we do not know what we are dealing with, the strict equality operator === is absolutely the way to go but, for example, how many times have you written something like this?

if (typeof variable === "string") ...

Since typeof is an operator we can neither overwrite nor change, and it always returns a string, there is no reason at all to use the strict eqeqeq operator: by specification, comparing a String with a String via === behaves exactly like comparing them via ==.
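Because typeof always produces a string, a comparison between typeof something and a string literal always falls into the same-type String case of the algorithm above, so == and === cannot disagree. The loop below only checks this on a few sample values and the usual typeof names.

```javascript
var samples = ["text", 42, true, null, undefined, {}, [], function () {}];
var names = ["string", "number", "boolean", "object", "undefined", "function"];

var alwaysAgree = samples.every(function (value) {
    return names.every(function (name) {
        // typeof always returns a string, so both operators compare String with String
        return (typeof value == name) === (typeof value === name);
    });
});

console.log(alwaysAgree); // true
```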
Moreover, as said before, coercion can be perfectly intentional in some cases. Look at what we can do in other languages, for example:

# Python 3
class WTF:
    def __eq__(self, value):
        return value == None

# Python 2
class WTF():
    def __eq__(self, value):
        return value == None

# in both cases ...
if WTF() == None:
    """WTF!!!"""

And this is a C# example:

using System;

namespace wtf {

    class MainClass {

        public static void Main (string[] args) {
            WTF self = new WTF();
            Console.WriteLine (
                self ?
                "true" : "false"
            );
        }
    }

    class WTF {
        static public implicit operator bool(WTF self) {
            return false;
        }
    }

}

Can we consider these last cases some sort of coercion as well? It is simply operator overloading, virtually the same thing JavaScript engines implement behind the scenes to follow the specification steps whenever an eqeq is encountered.
Have fun with coercion ;-)

9 comments:

MichalBe said...

Great, thx for this. A lot of useful information in one place.

Anonymous said...

Wow brilliant post! But that's some scary stuff you report on!

Luke Page said...

Thanks, really interesting.

If you are interested I implemented == in javascript to make it easier to see what it was doing.

http://www.scottlogic.co.uk/2010/10/implementing-eqeq-in-javascript-using-eqeqeq/

Andrea Giammarchi said...

hey Luke, you are one day later :P

Anyway, good practical extension to this one

Take care

Peter van der Zee said...

I created a tool to help a little more with the demystification ;) It's partially inspired by your post.

http://jscoercion.qfox.nl

galambalazs said...

Art, style and the case of Real Programmers

While I think the technical content of this article is really valuable, I tend not to understand all the fuss about being hard on ourselves with coercion. I know it's all about Crockford and his recommendations.

There are several points of this argument which I want to highlight. First off, nobody forces you to use JSLint, and no one ever said that it's more than a person’s opinion. I think we have misunderstandings about the language on a much larger scale than any slight difference of opinion about JSLint.

JSLint is not a syntax checker, it’s a Code Quality tool, and as such it evangelizes Crockford’s opinion on how one can write code that is more readable, maintainable and understandable to more people with less skill.

„Programs must be written for people to read, and only incidentally for machines to execute."
- Abelson & Sussman, SICP, preface to the first edition


When you do Software Development you should always aim at readable code, because we vary in skills, talents and attitude to a language so there has to be some kind of convention of what we use. It’s especially true for high level languages. I would also like to quote Crockford on this:

„The less we have to agree on, the more likely we can cooperate.”

Code Quality is all about these two things. There is hardly any situation where coercion is algorithmically indispensable. The ones you’ve mentioned like `null` and `undefined` are just a matter of taste. The key to understand is that the amount of time writing the code is negligible compared to the time spent understanding other people’s code.

Excluding „bad parts”, „evil parts” is not about mystifying the language, or mystifying ourselves. Sometimes I feel people think of this question as if Crockford had thought „it’s not for the masses, these things can only be understood by me”. And here come the smart challengers who prove to understand deep things about the language.

Crockford has always been clear on this. He wants to keep the language simple to understand, simple to work with. This can be achieved by ignoring language constructs or style of programming that is error prone or hard to understand/debug.

Andrea Giammarchi said...

good points galambalazs, unfortunately in some cases we are forced to use JSLint/caja, which does not improve code quality at all... at least this is what I have always thought: if you don't know the language and its syntax or specs, go deeper into it rather than blindly trusting an automation tool.
JSLint is a "nice to have", not the bible, nor always the best way to write code if you know what you are doing, and this post was about that.

Joshua Gruber said...

I ran a little experiment, and it appears that coercion is significantly FASTER than strict comparison: http://jsfiddle.net/8rKeA/4/

Can someone please verify this for me?

Andrea Giammarchi said...

undefined is a reference that requires a lookup up to the global scope, while null is an immutable literal available inline; plus, internal operations are usually faster than manual checks