
Sunday, April 21, 2013

Playing With V8 Native And JavaScript

If nothing else, my Don Quixote like adventure against __proto__ gave me a better understanding of how V8 engine internals work and how to bind JavaScript to native code and vice versa.

Please Note

Nobody told me that anything I've written in this article is true. These are all assumptions made after pushing a patch to V8 and learning from mistakes, compilation errors, and reading the surrounding code, all done in a couple of hours, so apologies in advance if some term or concept is not perfectly and deeply explained.
Let's start with this!

JavaScript In JavaScript

This was kinda surprising to me: almost all of ECMAScript is implemented in JavaScript itself.
In a few words, V8 does not do what Rhino does, which is reimplementing the whole language via a statically compiled one (Java, in Rhino's case). V8 is rather the compiler for the specific ES syntax; an example follows ...

JSON In JavaScript

It's funny when we compare in jsperf json2 vs native vs jQuery.json vs json3 etc etc, knowing that V8 native JSON is 90% JavaScript, isn't it?

// just a chunk/example from V8 JSON
function JSONParse(text, reviver) {
  var unfiltered = %ParseJson(TO_STRING_INLINE(text));
  if (IS_SPEC_FUNCTION(reviver)) {
    return Revive({'': unfiltered}, '', reviver);
  } else {
    return unfiltered;
  }
}

The snippet above is compact and has just a few things that are not common in JavaScript: %PrefixedFunctions() and UPPERCASE_FUNCTIONS().
These may or may not show up in any given V8 JavaScript file, and these JS files are the ones that create the JS environment we are going to use. Let's see how those special things work, OK?
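To make the magic parts a bit less magic, here is a rough plain JavaScript reading of the JSONParse snippet. It is only a sketch under assumptions: JSON.parse stands in for %ParseJson, String() for TO_STRING_INLINE, a typeof check for IS_SPEC_FUNCTION, and reviveSketch is my own simplified walk, not V8's actual Revive.

```javascript
// Plain JS stand-ins for the V8 internals (assumptions, not V8 code).
function toStringInline(value) { return String(value); }
function isSpecFunction(value) { return typeof value === "function"; }

// Simplified reviver walk: children first, then the holder's own key.
function reviveSketch(holder, name, reviver) {
  var value = holder[name];
  if (value !== null && typeof value === "object") {
    Object.keys(value).forEach(function (key) {
      var revived = reviveSketch(value, key, reviver);
      if (revived === undefined) delete value[key];
      else value[key] = revived;
    });
  }
  return reviver.call(holder, name, value);
}

function jsonParseSketch(text, reviver) {
  // JSON.parse without a reviver plays the role of %ParseJson here
  var unfiltered = JSON.parse(toStringInline(text));
  if (isSpecFunction(reviver)) {
    return reviveSketch({"": unfiltered}, "", reviver);
  }
  return unfiltered;
}

// usage: double every number found while reviving
var doubled = jsonParseSketch('{"a": 1, "b": [2, 3]}', function (key, value) {
  return typeof value === "number" ? value * 2 : value;
});
```

The shape is the same as the V8 chunk: parse natively first, then optionally walk the result in JavaScript starting from a wrapper object with an empty key.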

Interoperation Between C++ And JavaScript

runtime.h and its companion runtime source file are dedicated, as the name suggests, to runtime JavaScript calls into the C++ world.
The runtime.js file instead contains many JS functions used internally to satisfy ECMAScript specifications but not exposed to the outer world.
Bear in mind the latter is not the only file with JavaScript that is not exposed. Everything that does go out is defined at bootstrap, as you would expect, once per global context (which is why two sandboxes have different native constructors), and you can find it in the bootstrap source file.
Scroll a bit and you'll see how all the known Array, String, Boolean, etc. are initialized, while in array.js you can see how the prototype is created: it's again a mix of native and JavaScript, passing through macros!


RUNTIME_FUNCTION(
  // return type, for JS interaction
  // will be a MaybeObject* one
  MaybeObject*,
  // the exposed %DoNotExposeProtoSetter()
  // function in the hidden JS world
  // behind the user available scene
  Runtime_DoNotExposeProtoSetter
) {
  // isolate pointer is necessary to access
  // heap and all JS types in C++
  // even true or false are a specific type
  // we cannot return true directly, as example
  // or the type won't match
  NoHandleAllocation ha(isolate);
  // in this case, returning true or false
  // does not require allocation or new objects
  // we can simply use what's already in the heap
  return FLAG_expose_proto_setter ?
    // this is a flag defined at d8 launch,
    // I'll go there too
    isolate->heap()->false_value() :
    // returns a JS false or a JS true on
    // %DoNotExposeProtoSetter() invocation
    isolate->heap()->true_value();
}
The function above is the same one I've used for the patch: it does not accept any argument, it checks a generic program flag, and it returns true or false accordingly.
As every developer with a bit of C background knows, a function defined in a source file should be declared in the header file too, so here is the runtime.h declaration:

  /* __proto__ Setter */ \
  F(DoNotExposeProtoSetter, 0, 1) \

F is the common shortcut used inside the RUNTIME_FUNCTION_LIST_ALWAYS_N lists. After the function name, the first number is the count of accepted arguments, -1 if unknown or dynamic.
The second number seems to be a default for all functions: stick with 1 and things are gonna be fine.
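To show what the argument count in an F(...) entry implies, here is a loose JavaScript emulation. Everything in it is hypothetical (defineRuntime, callRuntime, the flag variable, and the Sum entry are names I made up), and it replaces the last number of the real entry with the function body, so take it as an illustration of the arity rule only.

```javascript
// Hypothetical registry, loosely mimicking F(name, nargs, ...) entries.
var runtimeTable = {};

function defineRuntime(name, nargs, fn) {
  runtimeTable[name] = { nargs: nargs, fn: fn };
}

function callRuntime(name) {
  var entry = runtimeTable[name];
  if (!entry) throw new Error("unknown runtime function: " + name);
  var args = Array.prototype.slice.call(arguments, 1);
  // -1 means the number of arguments is unknown or dynamic
  if (entry.nargs !== -1 && args.length !== entry.nargs) {
    throw new Error(name + " expects " + entry.nargs + " argument(s)");
  }
  return entry.fn.apply(null, args);
}

// stand-in for the launch flag, off unless passed explicitly
var exposeProtoSetterFlag = false;

defineRuntime("DoNotExposeProtoSetter", 0, function () {
  return !exposeProtoSetterFlag;
});

// made-up variadic entry, only to show the -1 case
defineRuntime("Sum", -1, function () {
  return Array.prototype.reduce.call(arguments, function (a, b) {
    return a + b;
  }, 0);
});
```

With this table, callRuntime("DoNotExposeProtoSetter") returns true while the flag is off, and passing it an argument throws because the entry declares zero arguments.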

Dealing With JS Objects

My patch has a quite simplified signature. If you are aiming to accept and/or return objects, you should do things in a slightly different way; here is an example:

// function call :-)
RUNTIME_FUNCTION(MaybeObject*, Runtime_Call) {
  // required to handle the current function scope
  HandleScope scope(isolate);
  // arguments are checked at runtime, always
  ASSERT(args.length() >= 2);
  int argc = args.length() - 2;
  CONVERT_ARG_CHECKED(JSReceiver, fun, argc + 1);
  // the `this` context :-)
  Object* receiver = args[0];

  // If there are too many arguments,
  // allocate argv via malloc.
  const int argv_small_size = 10;
  // now you know, functions with more than 10 args
  // are slower :-)
  Handle<Object> argv_small_buffer[argv_small_size];
  // ...

  // in case it needs to throw an error ...
  bool threw;
  Handle<JSReceiver> hfun(fun);
  Handle<Object> hreceiver(receiver, isolate);

  // the result, could be undefined too,
  // which is a type indeed
  Handle<Object> result =
      // note, threw passed by reference
      Execution::Call(hfun, hreceiver, argc, argv,
      &threw, true);

  // do not return anything if there was an error
  if (threw) return Failure::Exception();

  // eventually, we got a
  return *result;
  // don't you feel a bit like "f**k yeah" ?
}

I've removed some extra loop for malloc but basically that's the lifecycle of a JavaScript call invocation.
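The argument layout used by Runtime_Call can be mirrored in plain JavaScript. This is a sketch, assuming that everything between the receiver (first) and the function (last) is forwarded as argv, which is my reading of the part I cut out; runtimeCallSketch is a made-up name.

```javascript
// Mirrors the layout above: args[0] is the receiver,
// args[argc + 1] is the function, the rest in between is argv.
function runtimeCallSketch() {
  var args = Array.prototype.slice.call(arguments);
  if (args.length < 2) {
    throw new Error("need at least a receiver and a function");
  }
  var argc = args.length - 2;
  var fun = args[argc + 1];
  var receiver = args[0];
  var argv = args.slice(1, 1 + argc);
  if (typeof fun !== "function") throw new TypeError("not callable");
  // errors thrown by fun simply propagate, no `threw` flag needed in JS
  return fun.apply(receiver, argv);
}

// usage: the object is `this`, "a", "b", "c" are the arguments
var joined = runtimeCallSketch({ sep: "-" }, "a", "b", "c", function () {
  return Array.prototype.join.call(arguments, this.sep);
});
```

The C++ version does the same bookkeeping by hand, plus handles, the isolate, and an explicit check for thrown exceptions.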

What About Macros

The macros file contains some (I believe) highly optimized and target specific (x86, x64, arm) functions, able to use %_CPlusPlus() functions, as well as the most commonly used functions across all JavaScript APIs, such as IS_UNDEFINED(obj).
It also contains many constants used here and there, such as common date numbers:

const HoursPerDay      = 24;
const MinutesPerHour   = 60;
const SecondsPerMinute = 60;
const msPerSecond      = 1000;
const msPerMinute      = 60000;
const msPerHour        = 3600000;
const msPerDay         = 86400000;
const msPerMonth       = 2592000000;

I don't know about you, but I often find myself writing similar variables, so I wonder if we should simply have them as public static Date properties ... never mind ...
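As a quick sanity check, the constants above can be derived from each other in plain JavaScript. The daysBetween helper is just a made-up example of the kind of code that ends up needing them.

```javascript
const HoursPerDay = 24;
const MinutesPerHour = 60;
const SecondsPerMinute = 60;
const msPerSecond = 1000;
const msPerMinute = SecondsPerMinute * msPerSecond;
const msPerHour = MinutesPerHour * msPerMinute;
const msPerDay = HoursPerDay * msPerHour;

// the msPerMonth value listed above, expressed in days
const daysInMsPerMonth = 2592000000 / msPerDay;

// made-up helper: whole days between two timestamps
function daysBetween(from, to) {
  return Math.floor((to - from) / msPerDay);
}

const week = daysBetween(Date.UTC(2013, 3, 14), Date.UTC(2013, 3, 21));
```

The arithmetic shows that msPerMonth is a fixed month of 30 days, not a calendar month.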

d8 Options And Flags

To complete my journey into V8, I had to add an --expose-proto-setter flag to d8, which is the command line tool (or the dll) you get after a successful build of the V8 project.

flag-definitions.h is basically all you need to define a runtime flag, providing a decent description that will show up if you run d8 --help:

DEFINE_bool(
  // the flag name
  // d8 --expose_proto_setter
  // OR
  // d8 --expose-proto-setter
  // is the same
  expose_proto_setter,
  // the default value (assumed false here: the flag has to be passed explicitly)
  false,
  // the flag description, better if meaningful ...
  "exposes ObjectSetProto through __proto__ descriptor.set")

In order to access that flag at runtime, do not forget to prefix it, as in FLAG_expose_proto_setter.
Remember, even a value as simple as a boolean cannot be handed to JS just like that, so check the RUNTIME_FUNCTION example again.
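The naming convention, where hyphens and underscores are interchangeable on the command line and the C++ side sees a FLAG_ prefixed name, can be sketched like this. flagToInternalName is a hypothetical helper of mine, not V8's actual flag parser.

```javascript
// Hypothetical helper: shows the naming convention only,
// it is not how V8 parses its flags.
function flagToInternalName(arg) {
  var match = /^--([a-z][a-z0-9_-]*)$/i.exec(arg);
  if (!match) return null;
  // hyphens and underscores end up as the same identifier
  return "FLAG_" + match[1].replace(/-/g, "_");
}

var fromHyphens = flagToInternalName("--expose-proto-setter");
var fromUnderscores = flagToInternalName("--expose_proto_setter");
```

Both spellings map to FLAG_expose_proto_setter, which is the name the RUNTIME_FUNCTION example reads.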

Building V8

There is a V8 page specific to this task. All I can add here is that make x64.release is the fastest way to get a ./out/x64.release/d8 executable and start playing, at least on most common/modern PCs or Macs.

And That's All Folks!

I know it does not seem much, but this article covers:

  1. how to define a launch-time flag for d8
  2. how to interact with C++ bindings for JS code
  3. how to optimize some macro function
  4. how to understand magic V8 JS syntax
  5. how to manipulate, create, change, augment, natives and their prototypes
In a few words, with these steps anyone could write her/his own version of JavaScript, relying on a robust, extremely fast, and cross platform engine that is also able to interact with PHP or other languages.
Bear in mind I am not suggesting anyone out there should start fragmenting V8. I am saying that if you need a very specific thing in the environment for a very specific project, you should not be scared of changing what you need. You can also help V8 get better by trying changes directly instead of just filing bugs: isn't this what Open Source is good for too?

I hope you enjoyed this post, have a nice day!
