Monday, December 12, 2011

Create a JS builder with node.js

A very good and common practice with JS projects bigger than 100 lines of code is to split the code into different files.
The benefits are clear:
  • smaller pieces of code to maintain
  • swappable portions for experiments, improvements, or new features: for example, include magic2.js in a build and try it, rather than drastically changing magic.js and then following the repository logs
  • better organization of the code, and I'll come back to this later in the post
  • the possibility to distribute bigger closures across files, as in the jQuery approach
  • ad hoc builds that include or exclude portions of the library, especially suitable for a specific version of the code that must be compatible with IE only


Solutions All Over The Place

There are tons of solutions that make the build process described above easy to set up and easy to use. For example, I created my own, the JavaScript Builder, and I use it in basically every project I work on.
However, this builder requires a couple of extra technologies such as Python and Java ... but aren't we simply using JavaScript?
So why not an easy-to-follow guide on how to build your code via JS only?
This is what this post is about, and I hope you'll find it useful.

How To Structure Your Project

If all files are in the same directory, it is not easy to find the right one immediately, since there could be many. A good solution I came up with is a folder structure whose paths mirror both namespaces and private keywords, such as function and var.
Here is an example of how I would structure this library (and please ignore the library itself):

var myLib = (function (global, undefined) {"use strict";

// private scope function
function query(selector) {
  return document.querySelectorAll(selector);
}

function Wrapper(nodeList) {
  this.length = nodeList.length;
  this._list = nodeList;
}

// a prototype method of the Wrapper "class"
Wrapper.prototype.item = function item(i) {
  return this._list[i];
};

// public static query method
query.asWrapper = function (selector) {
  return new Wrapper(query(selector));
};

var // private scope variables
  document = global.document,
  slice = [].slice
;

// the actual object/namespace
return {
  query: query,
  internals: {
    Wrapper: Wrapper
  }
};

}(this));

The code should be easy enough to understand. The object used as namespace for myLib has a couple of methods, a few private variables and functions, and something exposed through the internals namespace.
It does not matter what the library does or how well or badly it is structured; what matters is that our folder structure should be smart enough to scale with any sort of allowed JS pattern ... OK?

The Folder

Well, to start with, let's say our source code should be inside a src folder, so we can add other folders for tests or builds beside it, at the same level.

dist
src
tests
builder.js

We'll see builder.js later; in the meantime, let's have a look inside the src folder:

dist
src
  intro.js
  outro.js
  var.js
  function
    Wrapper.js
    query.js
    Wrapper
      prototype
        item.js
    query
      asWrapper.js
tests
builder.js

The distinction will be much clearer once you read the above list in your editor or even your shell ... query and files are well distributed, but bear in mind this is only the first example.
Let's see what we are going to write in each file.

src/intro.js

var myLib = (function (global, undefined) {"use strict";


src/function/query.js

// private scope function
function query(selector) {
  return document.querySelectorAll(selector);
}


src/function/Wrapper.js

function Wrapper(nodeList) {
  this.length = nodeList.length;
  this._list = nodeList;
}


src/function/Wrapper/prototype/item.js

// a prototype method of the Wrapper "class"
Wrapper.prototype.item = function item(i) {
  return this._list[i];
};


src/function/query/asWrapper.js

// public static query method
query.asWrapper = function (selector) {
  return new Wrapper(query(selector));
};


src/var.js

var // private scope variables
  document = global.document,
  slice = [].slice
;


src/outro.js

// the actual object/namespace
return {
  query: query,
  internals: {
    Wrapper: Wrapper
  }
};

}(this));

Got it?

Structure Rules

  • every part of the scope can be distributed
  • a file may or may not be valid on its own when passed to a parser, because to test the library we need to build it first (possibly through automation)
  • function declarations should go in a dedicated folder called function, at the matching nesting level
  • the var declaration of each scope can go in a folder called var, at the matching nesting level. Do not create a var folder for each function where you define variables: if you need one, it means the function is too complex. Split it into subtasks and do not define 100 variables in a single function. Closures are the only exception.
  • nested closures must be named, so that the nested closure structure can be laid out following the previous rules. Every minifier is able to remove function expression names, including those of named closures, while not every developer wants to study the whole code to understand why a nested closure was useful. A classic example is the inclusion, inside our own closure, of an external library that uses its own closure. In this case name that closure, so you know where to look for the library inside your folder structure.
  • function prototypes should be placed inside a prototype folder, inside that function's own folder (function/Wrapper/prototype in the example).
    We don't need to reassign an object when we want to pollute the function prototype, so please stop this awkward common practice ASAP: MyFunction.prototype = { /* THIS IS WRONG */ }. Use instead the prototype object that every ECMAScript standard defines by default for each function declaration or expression.
    If your argument is that the code will be bigger, use the outer scope variables definition to address the prototype once, and reuse this reference within the prototype folder. This approach will make your life easier once you get used to working with structured and distributed JavaScript files.

About the last point in particular: we could have set a shortcut to the Wrapper.prototype object in the var.js file and reused that reference inside the Wrapper folder.
The structured folders will always help you find references in the library, thanks to the lookup that you, as well as the code, have to do.

// in the var.js file
WrapperPrototype = Wrapper.prototype,

// in the Wrapper/prototype/item.js file
WrapperPrototype.item = function item(i) { ... };
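
One side effect of replacing the prototype object, not spelled out above but easy to verify, is that the default constructor reference is lost. The following is a minimal sketch; Augmented and Replaced are names made up for this illustration only.

```javascript
// hypothetical constructors, used only for this illustration
function Augmented() {}
function Replaced() {}

// augment the prototype object every function already has
Augmented.prototype.item = function item(i) {
  return i;
};

// replace the prototype object with a brand new one
Replaced.prototype = {
  item: function item(i) {
    return i;
  }
};

// the default prototype keeps pointing back to its function ...
var augmentedKeepsConstructor = new Augmented().constructor === Augmented;

// ... while the replaced one inherits constructor from Object.prototype
var replacedKeepsConstructor = new Replaced().constructor === Replaced;
var replacedConstructorIsObject = new Replaced().constructor === Object;
```

With one method per file inside the prototype folder, the augmenting form is also the one that splits naturally across files.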


The Order Partially Matters

In ECMAScript 3rd edition or higher, function declarations are always available from the very beginning of their scope. I really don't know why they are so underrated in day-to-day code ... the fact that they are always available means we can reference their prototype at any point in our code:

var internalProto = (function () {

  // address any declaration made in this scope
  var WhateverPrototype = Whatever.prototype;
  return WhateverPrototype;

  // even if defined after a return!!!
  function Whatever() {}
}());

alert(internalProto); // [object Object]

Now, the above code is simply a demonstration of how function declarations work ... I am not suggesting a return in the middle with declarations after it; all I am saying is that the order of things in JavaScript may not be relevant, and function declarations are a perfect example.
Another example is the usage of variables ... if a function, as declaration or as expression, references a variable defined in the outer scope, nothing will break unless we invoke that function before the referenced variable has been defined.

These are really ABC concepts we all should know about JS before even claiming that we know JavaScript ... OK?
It is really important to get these points, because to keep the builder file as simple as possible we need to rely on these assumptions.
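
Both points can be checked with a minimal sketch. The names scope, getValue, and value are invented for this example.

```javascript
var result = (function scope() {
  // declarations are hoisted: getValue is already a function here
  var early = typeof getValue;

  // invoked too early: value is declared but not assigned yet,
  // so nothing throws but the returned value is undefined
  var tooSoon = getValue();

  var value = 123;

  // same function, invoked after the assignment
  var onTime = getValue();

  return {early: early, tooSoon: tooSoon, onTime: onTime};

  // declared after the return, still available from the top of the scope
  function getValue() {
    return value;
  }
}());
```

This is why the builder can push var.js and the function folder in almost any order, as long as nothing is invoked while the files are being evaluated.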

The builder.js File

It's time to create the magic file that will do the job for us, possibly in a smart way, so that we can cover all the edge cases we can think of.
This is the content of the builder.js file, in the root of our project:

// @name builder.js
// @author Andrea Giammarchi
// @license Mit Style License

// list of files to include
var
  scriptName = "myLib", // the namespace/object/project name
  fileList = [
    "intro.js", // beginning of the closure
    "var.js", // all declared variables
    "function/*", // all declared functions
    "function/Wrapper/prototype/*", // all methods
    "function/query/*", // all public statics
    "outro.js" // end of the library
  ],
  fs = require("fs"), // file system module
  out = [], // output
  alreadyParsed = [] // parsed files for visual feedback
;

// per each file in the list ...
fileList.forEach(function addFile(file) {
  // if the file contains a wild char ...
  if (file.charAt(file.length - 1) == "*") {
    // read the directory and per each file found there ..
    fs.readdirSync(
      __dirname + "/src/" + file.slice(0, -2)
    ).forEach(function (file) {
      // if the file type is js
      // and the file has not been defined explicitly
      // in the original list (compared through its whole path,
      // since the list contains paths and not bare file names)
      if (
        file.slice(-3) == ".js" &&
        fileList.indexOf(this + file) < 0
      ) {
        // call this same function providing the whole path
        addFile(this + file);
      }
    // the path is passed as context to simplify the logic
    }, file.slice(0, -1));
  // if the file has not been included yet
  } else if (alreadyParsed.indexOf(file) < 0) {
    // put it into the list of already included files
    alreadyParsed.push(file);
    // add the file content to the output
    out.push(fs.readFileSync(__dirname + "/src/" + file));
  } else {
    // if here, we are messing up with inclusion order
    // or files ... it's a nice to know in console
    try {
      console.log("duplicated entry: " + file);
    } catch(e) {
      // shenanigans
    }
  }
});

// put all ordered content into the destination file inside the dist folder
fs.writeFileSync(__dirname + "/dist/" + scriptName + ".js", out.join("\n"));

// that's it

The reason there are so many checks when a wild char is encountered is quite simple ... the order may not matter, but in some cases it does.
If, for example, a prototype property is used at runtime to define other prototype methods or properties, its file cannot be pushed into the output at a random position: it must come first. For example:

// src/function/Wrapper/prototype/behavior.js
WrapperPrototype.behavior = "forEach" in [];

// src/function/Wrapper/prototype/forEach.js
WrapperPrototype.forEach = WrapperPrototype.behavior ?
  function (callback) {[].forEach.call(this._list, callback, this)} :
  function (callback) { /* a shim for non ES5 compatible browsers */ }
;

Since the second file strongly depends on the first, the list of files could be written like this:

fileList = [
"intro.js", // beginning of the closure
"var.js", // all declared variables
"function/*", // all declared functions
"function/Wrapper/prototype/behavior.js", // precedence
"function/Wrapper/prototype/*", // all methods
"function/query/*", // all public statics
"outro.js" // end of the library
],

When the wild char is encountered and behavior.js is passed to the forEach, it will simply be ignored, since it has already been pushed by the previous entry.
The same applies if a specific file must be included at the very end:

fileList = [
"function/Wrapper/prototype/behavior.js", // precedence
"function/Wrapper/prototype/*", // all methods
"function/Wrapper/prototype/doStuff.js" // after all
],

I believe these are already edge cases most of the time but at least now we can better understand what the builder will do.
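
To see the effect of explicit entries without touching the file system, here is a simplified sketch of the ordering logic only. The resolveOrder function and the directories map are hypothetical stand-ins for builder.js and fs.readdirSync, and the sketch assumes that a wild char entry skips every file that is also listed explicitly.

```javascript
// hypothetical stand-in for fs.readdirSync: folder path -> file names
var directories = {
  "function/Wrapper/prototype": [
    "behavior.js", "doStuff.js", "forEach.js", "item.js"
  ]
};

// returns the files in the order they would be concatenated
function resolveOrder(fileList, directories) {
  var ordered = [];
  fileList.forEach(function (entry) {
    if (entry.slice(-2) === "/*") {
      var folder = entry.slice(0, -2);
      (directories[folder] || []).forEach(function (name) {
        var path = folder + "/" + name;
        // explicitly listed files keep their explicit position
        if (name.slice(-3) === ".js" && fileList.indexOf(path) < 0) {
          ordered.push(path);
        }
      });
    } else if (ordered.indexOf(entry) < 0) {
      ordered.push(entry);
    }
  });
  return ordered;
}

var order = resolveOrder([
  "function/Wrapper/prototype/behavior.js", // precedence
  "function/Wrapper/prototype/*",           // all methods
  "function/Wrapper/prototype/doStuff.js"   // after all
], directories);
```

Here behavior.js comes first, forEach.js and item.js come from the wild char, and doStuff.js closes the list.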

How To Use The Builder

In the console, inside the project folder where builder.js is:

node builder.js

That's pretty much it ... if you open dist/myLib.js after the above call, you will find your beautiful library all in one piece and ready to be minified, debugged, and tested.
If the process does not take long, you may bind the builder to the Control+S action, with a potential sentinel able to inform you if any problem occurred, for example by checking whether the console output has been polluted by some redundant file logged during the process.
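
The sentinel idea could be sketched as follows. This is only one possible approach, assuming the builder prints "duplicated entry: " lines as the builder.js in this post does; duplicatedEntries and build are hypothetical names, and how the call gets bound to Control+S depends on your editor.

```javascript
var execFile = require("child_process").execFile;

// extract the files the builder reported as "duplicated entry: <file>"
function duplicatedEntries(stdout) {
  var prefix = "duplicated entry: ";
  return String(stdout).split("\n").filter(function (line) {
    return line.indexOf(prefix) === 0;
  }).map(function (line) {
    return line.slice(prefix.length);
  });
}

// hypothetical sentinel: run builder.js with the current node binary
// and hand over any error plus the list of redundant files
function build(callback) {
  execFile(
    process.execPath,
    [__dirname + "/builder.js"],
    function (error, stdout, stderr) {
      callback(error, duplicatedEntries(stdout));
    }
  );
}
```

A caller would invoke build and warn whenever the error is set or the returned list is not empty.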

As Summary

All these techniques may be handy for many reasons. First of all, it's always good to maintain a structure rather than a single file with thousands of lines of code. Secondly, once we understand how the process works, nothing can stop us from improving it, changing it, or making it ad hoc for anything we may need, such as regular expressions to strip out some code before the output push, or whatever else could come up at some point.
The minification can be done the way you prefer, for example by adding this single call at the end of the process, assuming you have a jar folder with, for example, Google Closure Compiler.

require('child_process').exec(
  ['java -jar "',
    __dirname + "/jar/compiler.jar",
    '" --compilation_level=SIMPLE_OPTIMIZATIONS --language_in ECMASCRIPT5_STRICT --js "',
    __dirname + "/dist/" + scriptName + ".js",
    '" --js_output_file "',
    __dirname + "/dist/" + scriptName + ".min.js",
  '"'].join(""),
  function (error, stdout, stderr) {
    if (error) console.log(stderr);
  }
);
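
As for the regular expression idea mentioned in this summary, here is a minimal sketch of stripping code before it is pushed to the output. The //<debug> and //</debug> markers and the stripDebug name are assumptions made for this example; they are not part of the builder above.

```javascript
// hypothetical markers wrapping code that should not reach the build
var debugBlock = /^[ \t]*\/\/<debug>[\s\S]*?\/\/<\/debug>[ \t]*\r?\n?/gm;

// remove every marked block, including the marker lines themselves
function stripDebug(source) {
  return String(source).replace(debugBlock, "");
}

var source = [
  "function query(selector) {",
  "  //<debug>",
  "  console.log(selector);",
  "  //</debug>",
  "  return document.querySelectorAll(selector);",
  "}"
].join("\n");

var stripped = stripDebug(source);
```

In the builder, this would mean pushing the stripped content of each file instead of the raw one.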

Enjoy your new builder :)

18 comments:

DBJDBJ said...

why not just bounding oneself to the WIN desktop and use the HTA ?

Proper javascript, HTML UI, drag and drop, visual builder, WSH, WMI, anything ... WIN is the limit...

kekscom said...

Hey, two minor notes:
the 'build' folder is always a bit confusing with 'builder'. 'dist' has done a good job for me.
Another one is the requirement for doing a build instead of just F5.
Add on the fly combining and get extra sparkles :-)

Andrea Giammarchi said...

dist is a good idea ... specially for the tab in console

on the fly combining may not be a good idea, Control+S bound to the build IS better, imo

I will write an on the fly combiner in any case but again, once you get used to building before testing, the problem does NOT exist

Andrea Giammarchi said...

dbj win has node support too ... where is the limit?

Andrea Giammarchi said...

P.S. on the fly compiler does not let you minify on the fly the code ... still not sure we really need it, a build takes a second and provides already visual feedbacks in console if something goes wrong

Andrea Giammarchi said...

P.S.2 also the on the fly compiler cannot benefit from the wild char ... just build the bloody script and get used to this process ;-)

Andrea Giammarchi said...

last, but not least, as I have said there are tons of solutions ... to me, this is the simplest and also the easiest to customize and has no side effects: I change code, I build, I test

Associated with the build there could be tests, and everything else ... this is the way I code, F5 never gave me such a "WOW" or "COOL" feeling, I just don't care about an extra step that could automate A LOT of what I need ( e.g. runtime minification AND runtime tests for both console or web )

kekscom said...

I wouldn't expect a full unit tested + minified build by just hitting F5. It's just the quick check thing.
I care for the extra second during fine tuning and I appreciate a full build from time to time.

Andrea Giammarchi said...

this works with server side JS code too, something that has nothing to do with F5

If you have tests in place those tests would tell you if it's ok or not, at least if you follow TDD

if you don't, once again, you can add an "open webpage.html" at the end of the process so that rather than look and click for F5 you simply build and see the result.

For the last time, this builder is something you'll get used to, the same way I did and many others behind a build process ... F5 is not part of the process BUT you can add/simulate the same behavior, this is the cool part.

Andrea Giammarchi said...

also I realized now who you are and I am surprised you are not familiar with a build process ... that's what we do on a daily basis, don't we? I hope so, also because minification can produce unexpected results.

Forget F5, F5 in a browser is not why JavaScript is cool or anything, it is just laziness :P

Andrea Giammarchi said...

another thing ... if you don't minify, the build takes zero.

If you always use the same browser, you can create a plugin that automatically triggers the build before the page is loaded ... so, first, this is not necessarily work related stuff; secondly, this is not browser only, since 3/4 of my recent projects are node.js compatible and 4/4 of them use a builder ... it has never been a problem for me, surely developers at the very beginning may have your same problem, but get used to it and it will never occur again.

Andrea Giammarchi said...

as a possible solution, I have just thought that if you have a node.js server, triggering the build automatically on each visit, before producing the layout, is also trivial ...

Andrea Giammarchi said...

kekscom, have a look at this, which does what you want and probably more

kekscom said...

This was never intended to be a big thing :-)
I am indeed talking about laziness - see above.
Common development cases - agreed with you.
Just to give you an idea what I really meant: imagine a visualization project where you have a huge amount of fine tuning and it needs very frequent checks, you may care about the extra second.
But hey, you don't need to.
Come on, grab one from the red box tomorrow :-)
Once you

Andrea Giammarchi said...

for me it's automatic to make the change, pass through the console via command tab and press enter while I go to the browser and update ... I know it is not a big thing for you but you are not the first one that talks about the advantage of F5 and I never really got it ... the opposite, I wish all build processes were as easy as this one, you know what I mean ;)

See you tomorrow mate

Misha Reyzlin said...

Hey Andrea,
regarding the prototype extension, what about using a popular pattern such as:
WrapperPrototype = Wrapper.prototype;
_.extend( WrapperPrototype, {
method1 : function() { /* … */ },
method2 : function() { /* … */ }
});

Or even having a reference to that object that is being used to extend prototype. Would you place it in `var` folder or still in prototype folder?

Andrea Giammarchi said...

that is not popular to me, plus it does not matter as long as you distribute the object across more files rather than having 1000 lines of prototype to read, scroll, and change when necessary.

The builder is not about patterns, it is about distributing JS code, which is really not a common technique because builders assume each file has valid syntax on its own, but then JS files get massive and this is bad.

Look at jQuery: the most adopted library has a build process few developers understood, and it is probably the part I like most of that library :D ( since I don't use it )

Andrea Giammarchi said...

_.extend is ok in the prototype folder as long as it's natural to find it there ... if you define the prototype with extend for whatever reason then place a definition.js inside prototype folder and do everything there.

If you have 1000 lines split the prototype and make your life a bit more organized ;-)

it pays back after a while