Sunday, October 16, 2011

The Missing Tool In Scripting World

A few days ago I was having beers with @aadsm and @sleistner, and we were talking about languages and, of course, JavaScript too.
That night I realized there is a missing process, or better, a missing tool, that could open new doors for the JavaScript world.

The Runtime Nightmare

The main difference between scripting languages and statically typed ones is the inability to pre-optimize or pre-compile the code before it is actually executed.
Engineers from different companies are trying on a daily basis to perform this optimization at runtime, or better, Just In Time, but believe me, that's not an easy task, especially with a language as highly dynamic as JavaScript.
An even worse task is the tracing option: at runtime each reference is tracked and, if its type does not change during its lifecycle, the code can be compiled to native code.
The moment a type, an object structure, or a property changes, the tracer has to compile twice or split the optimizations, and the number of variants can grow exponentially with the changes performed in a single loop. The tracer therefore has to be smart enough to understand when it is actually worth performing such an optimization, and when it is time to drop everything and optimize only sub-tasks via JIT.
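
To make the problem concrete, here is a minimal sketch of the idea, written in plain JavaScript. It is only an illustration and not how any real engine works internally: sumGeneric and sumGuarded are hypothetical names, and the "fast path" is simply a loop that assumes numbers, protected by a guard that falls back to the generic code the moment a non-number shows up.

```javascript
// Generic version: "+" may mean addition or concatenation,
// depending on what the list contains at runtime.
function sumGeneric(list) {
  let total = 0;
  for (let i = 0; i < list.length; i++) {
    total += list[i];
  }
  return total;
}

// Hypothetical "specialized" version, protected by a type guard.
function sumGuarded(list) {
  // Guard: the numeric path is valid only if every element is a number.
  for (let i = 0; i < list.length; i++) {
    if (typeof list[i] !== 'number') {
      // Assumption violated: give up and use the generic code.
      return { path: 'generic', value: sumGeneric(list) };
    }
  }
  // Numeric path: here "+" can only mean number addition.
  let total = 0;
  for (let i = 0; i < list.length; i++) {
    total += list[i];
  }
  return { path: 'numeric', value: total };
}

const stable = sumGuarded([1, 2, 3]);     // types never change
const unstable = sumGuarded([1, 2, '3']); // one string breaks the assumption
console.log(stable, unstable);
```

A single string in the list is enough to invalidate the numeric path, which is exactly the kind of change a tracer has to detect while the code is already running.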

Static Pros And Cons

As I have said, statically typed languages can perform all these optimizations upfront and create, for example, LLVM bitcode, which is highly portable and extremely fast. For example, both C and C++ can be compiled into LLVM.
There is also a disadvantage in this process: if some unexpected input occurs at runtime, the whole logic could crash, be compromised, or exit unexpectedly.
The latter rarely happens in the scripting world, but that can also be a weak point for application stability and reliability, since things may keep going, and who knows what kind of disaster an unexpected input could cause.

What If ...

Try to imagine we have created unit tests for a whole application or, why not, just for a portion of it (a module).
Try to imagine these tests cover 100% of the code, a really hard achievement on the web due to feature detection and differing browser behaviors, but an absolutely easy task in node.js, Rhino, CouchDB, or any JS code that runs in a well-known environment.
The differential mocking approach to solve the web situation requires time and effort, but what the JS community rarely does, for example, is share mocks of the same native objects in both the JS and DOM worlds. This should change, imo, because I have no idea how many different mocks of XMLHttpRequest or document are out there, and still there is no standard way to define a mock and listen to changes to mocked methods or properties in a cross-platform way.
Let's keep imagining ... imagine that our tests cover all possible inputs accepted in each part of the module.
Try to imagine that our tests cover exactly how the application should behave, according to all possible inputs we want to accept.
It's insane to use the typeof or instanceof operator on each argument of each function ... this would kill performance. What is not impossible is to do it in a way that, once in production, these checks are dropped.
Since untested input can produce unexpected behavior, I would say that our application should fail or exit the moment something untested occurs ... don't you agree?
How many fewer buggy web apps would we have out there? How much more stable and trustworthy could we be?
The process I am describing does not exist even in statically typed languages, since in that case developers trust the compiler unconditionally and skip tests for runtime misbehavior ... don't they?
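
The idea of typeof checks that disappear in production can be sketched as follows. This is only a minimal illustration under my own assumptions: checked is a hypothetical helper, not an existing library function, and the enabled flag stands for whatever build or environment switch a real project would use.

```javascript
// Hypothetical helper: wraps fn with typeof checks on its arguments.
// When `enabled` is false the original function is returned untouched,
// so production code pays no cost for the checks.
function checked(types, fn, enabled) {
  if (!enabled) return fn;
  return function (...args) {
    types.forEach((expected, i) => {
      const actual = typeof args[i];
      if (actual !== expected) {
        // Fail the moment an untested kind of input shows up.
        throw new TypeError(
          `argument ${i}: expected ${expected}, got ${actual}`
        );
      }
    });
    return fn.apply(this, args);
  };
}

const add = (a, b) => a + b;

// During tests: every call is verified.
const devAdd = checked(['number', 'number'], add, true);

// In production: the very same function, no wrapper at all.
const prodAdd = checked(['number', 'number'], add, false);

console.log(devAdd(1, 2)); // 3
```

With the checks enabled, devAdd(1, '2') throws a TypeError instead of silently returning '12', which is the "fail on untested input" behavior described above.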

The Point Is ...

We wrote our code, we reached 100% code coverage, and we covered 100% of the expected inputs. At this point the only thing we are missing to compile JavaScript into LLVM is a tool that will trace, and trace only, the tests while they are executed, and that will be able to analyze all cases, all types, all intended behaviors, all loops, and all function calls, so that everything could be statically compiled, and in separate modules ... how great would this be if it were possible today?
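
The tracing half of such a tool could start from something as small as the sketch below. Everything in it is hypothetical (createTracer, trace and report are names I made up for illustration), and it only records the type signatures observed while the tests run; the compilation to LLVM is the part that is still missing.

```javascript
// Hypothetical tracer: records which type signatures each wrapped
// function has actually been called with while the tests run.
function createTracer() {
  const seen = new Map(); // function name -> Set of signatures

  function typeName(value) {
    if (value === null) return 'null';
    if (Array.isArray(value)) return 'array';
    return typeof value;
  }

  function trace(name, fn) {
    const signatures = new Set();
    seen.set(name, signatures);
    return function (...args) {
      const result = fn.apply(this, args);
      const params = args.map(typeName).join(', ');
      signatures.add(`(${params}) -> ${typeName(result)}`);
      return result;
    };
  }

  function report() {
    const out = {};
    for (const [name, signatures] of seen) {
      out[name] = [...signatures].sort();
    }
    return out;
  }

  return { trace, report };
}

// Usage: wrap the code under test, run the tests, read the report.
const tracer = createTracer();
const concat = tracer.trace('concat', (a, b) => a + b);

concat(1, 2);
concat('a', 'b');
concat(3, 4); // same signature as the first call, recorded once

const traceReport = tracer.report();
console.log(traceReport);
```

Each distinct signature in the report is a candidate for its own statically compiled variant, and any signature that never appears there is, by definition, untested input.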

Just try to imagine ...

8 comments:

David Bruant said...

I think that the idea is worth considering, but I disagree with some of your assumptions.

"The main difference between scripting languages and statically typed one is the inability to pre optimize or pre compile the code before it's actually executed."
=> Being a scripting/compiled language and weakly/strongly typed are different concerns. The 4 combinations could exist.

Moreover, being a scripting language doesn't prevent optimizations. There is just a "cultural" assumption that a scripting language gets executed right away (hence avoiding pre-optimisation), but that's cultural rather than a theoretical impossibility.


"Even worst task is the tracing option: at runtime each reference is tracked and if its type does not change during its lifecycle, the code could be compiled as native one."
=> This is one very costly way to do it. Current type inference engines rather try to analyse a function and *prove* that some variable will always be of some type (or some "shape"/"hidden class" for objects).
This is sometimes impossible, so the engine makes a guess and puts a guard in place if necessary.


"Try to imagine these tests cover 100% of code, a really hard achievement on web due feature detections and different browsers behaviors, but absolutely easy task in node.js, Rhino, CouchDB, or any JS code that runs in a well known environment."
=> Code in one branch of feature detection is always dead code. The equivalent in a known environment is what goes after an "if(false)", for instance.


"imagine that our tests cover all possible input accepted in each part of the module."
=> This is infinite in many cases.


Overall, while the idea of compiling JS to LLVM is appealing in theory, I think that JavaScript semantics is so much richer (especially because of dynamicity) than LLVM that it may be impossible to efficiently do this compilation.

Andrea Giammarchi said...

thanks for your answer. About if(false) ... I have talked about differential mocks so that we could simulate a wrong implementation and reach that code too.

LLVM to JavaScript is possible, maybe a subset of JS could be compiled into LLVM after runtime analysis ?

check_ca said...

What about using a meta-language in comments (e.g. JSDoc) to help the JavaScript to LLVM compiler?

Andrea Giammarchi said...

that's neat but at the same time it won't improve much code reliability and quality ... it's something tho

Duoyi wu said...

Check this out, someone already did it.
https://github.com/kripken/emscripten

Andrea Giammarchi said...

Duoyi wu that's the other way round ... a "much more simple" thing to do ... it does not create LLVM out of JS, it creates JS out of LLVM

check_ca said...

@Andrea: Thanks for the (truncated ?) response. JSDoc (for example) is used by Google closure compiler or Eclipse JSDT and it really helps to improve code reliability. Nevertheless, it may not be the appropriate language to help a JS->LLVM compiler.

Andrea Giammarchi said...

I meant that comments can be a help but that these are not enough.

Comments are like signatures, these are used by compilers to understand what's going on but once at runtime things can go really bad ( crash, killed process, etc etc )

One of the reasons software sucks these days is that too many developers rely on compilers without providing a whole spectrum of inputs for each unit test ... you know what I mean? :-)