NWMatcher, A Selector Monster
First of all, I did not know, or did not remember, this project, and I must congratulate Diego Perini for different reasons. The first one is that Diego used almost every best and unobtrusive technique to make it as reliable as possible, which is good; the second one is that Diego introduced something new to achieve his goal: creating a Ferrari matcher via pre-compiled functions.
About NWMatcher
It is a lightweight stand-alone library (so far, but already namespaced) whose aim is to standardize some common DOM related tasks. The initial aim was to create a faster way to understand whether a node matches a specific selector, something truly useful with event delegation.
About The Technique
To make things as fast as possible, Diego used a pre-compilation strategy which is absolutely brilliant. Without XPath transformation, which in any case is not fully compatible with CSS, and taking care of every single browser "gotcha", Diego creates once, and never again, the function able to perform the selector related task. The result is that, the first call with a new selector apart, the match method performs at light speed! This is the notably good part of NWMatcher, and now there's the bad one (not that bad, in any case).
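Before the bad part, to make the idea concrete, here is a minimal sketch of selector pre-compilation, mine and not NWMatcher's actual source, reduced to bare tag-name matching: the matcher function is built once per selector and cached, so every subsequent call skips parsing entirely.
// minimal sketch, not NWMatcher's source: compile once, reuse forever
var compiledMatchers = {};
function compileSelector(selector) {
    // hypothetical translation of a tiny selector subset (tag names only)
    return new Function('element',
        'return element.nodeName.toLowerCase() === "' +
        selector.toLowerCase() + '";');
}
function matchSelector(element, selector) {
    var matcher = compiledMatchers[selector] ||
        (compiledMatchers[selector] = compileSelector(selector));
    return matcher(element);
}
// the first matchSelector(node, "div") call pays the compilation cost,
// every following call with the same selector does not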
A Few Inconsistencies
Apart from a weird procedure to understand whether Array slice can be performed over nodes, something I have always obtained in a few lines:
var slice = Array.prototype.slice;
try { slice.call(document.childNodes); } catch (e) {
    slice = function () {
        // minimal emulation for this scope: copy items by index
        for (var i = 0, a = []; i < this.length; i++) a[i] = this[i];
        return a;
    };
}
Diego uses a callback to distinguish native methods from fakes. This is generally good, and this function is isNative, but since these kinds of checks are for greedy developers with global pollution mania, I cannot understand why at some point there is a root.addEventListener ? ... with no check that addEventListener is the native one, something that could make the entire caching system useless, or able to generate errors. OK, that would be silly, I mean, to inject an event like that, impossible to emulate in Internet Explorer, but who knows what a library could do with such an event ...
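For reference, a common flavour of such a native-method check (a sketch under my assumptions, not NWMatcher's exact isNative) relies on the fact that native methods serialize with "[native code]" in their string form:
// a sketch, not NWMatcher's exact isNative: test the string form of the method
function isNative(object, method) {
    return !!(object && method in object &&
        /\{\s*\[native code\]\s*\}/.test(String(object[method])));
}
// e.g. isNative(document, 'getElementById') should be true everywhere,
// while a method overwritten by a library would fail the test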
Another inconsistency is about being unobtrusive, a goal reached 99.9% of the way ... what is that, a public property attached directly to the used context? We need to be aware that the document will probably have a snapshot property, plus an isCaching property. Not a drama, but I think those are the exceptions that confirm the rule.
Another thing I'll never get is the usage of new in front of functions whose aim is to return a non-primitive value.
function A() {
    return {};
}
The above function can be called with or without new and the result will always be the same: the returned object. This simply means that if we use new we are consuming CPU and RAM for no reason. So why should a performance-based library not take care of this simple fact?
// example
function Snapshot() {
    // here `this` exists,
    // it has a prototype attached,
    // and it has been internally initialized
    // to be used as a Snapshot instance ...
    // and we return an instanceof Object?
    // well done, so the garbage collector
    // has to consider this instance,
    // which is a ghost instance, never used,
    // but it has been created
    return {
        // whatever ...
    };
}
// let every comment inside the function
// make sense simply by adding new
var snap = new Snapshot();
// nonsense, imho, since this
// would have produced exactly the same
// result without instance generation
var snap = Snapshot();
That's it. This stuff is really simple to get with C, C++, or those languages where we have to declare types, names, etc.; there, a new in front of a function is not transparent, it is something to think about, because a new is a new: we are asking for an instance, and a function that always returns another value, rather than the generated instance, shouldn't be called via new.
Diego, I am sorry, I am using this irrelevant gotcha simply because I have written about these things I don't know how many times ... please remove every new Snapshot from your code, or just use the instance, attaching properties:
Snapshot = function () {
    this.isExpired = false;
    this.ChildCount = [ ];
    this.TwinsCount = [ ];
    this.ChildIndex = [ ];
    this.TwinsIndex = [ ];
    this.Results = [ ];
    this.Roots = [ ];
};
Makes sense? For performance reasons, I still suggest simply removing every new, leaving just Snapshot() ... OK, that is probably more than I was planning to say ...
Mutation Events Side Effects
For compatible browsers, NWMatcher uses mutation events such as DOMAttrModified, DOMNodeInserted, and DOMNodeRemoved. The cache is partially arbitrary, activated by default, deactivated via setCache(false). Mutation events are used to cache results. Why mutation events? Let's say that inside a single function it is so common, especially in the jQuery world as an example, to search the same bloody thing hundreds of times ...
$("body").doStuff($whatever);
$("body").doOtherStuff($whatever);
$("body").doSomethingElse();
The above way to code is present in I don't know how many examples, articles, and books, and it is as elegant as it is illogical. If we need a collection, I know we all like the $("thingy") style, but is it that difficult to write code such as:
var $body = $("body"); // that's it, wrapped once, SPEED!
$body.doStuff($whatever);
$body.doOtherStuff($whatever);
$body.doSomethingElse();
I am sure it is difficult, and Diego knows this stuff better than me; indeed, he is using a result cache to avoid repeating expensive searches over a potentially massive structure, as a DOM can be, in order to guarantee the best performance for returned results. So far, so good ... and mutation events, attached to the root of the DOM, the document, are able to clear the cache ... how? Via a callback. Simple? Sure, it is simple ... but there is a little detail we should consider.
Nowadays, Ajax based Web applications are constantly mutated: each Ajax call's aim is to change something, show something, or remove something. With these perpetual changes, whatever callback is attached to a mutation event will be fired dozens of times, but Diego, who is not the last JavaScript hacker around, found a way to partially avoid this problem: he removes the mutation listener, so the first event fired will avoid the same operations for the other N times. This is good, but there is still an event to manage, internally, and a callback to call, and the bad part is that for each select those events have to be reassigned, or checked, again (the isCached attached property).
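To picture the pattern, here is a rough sketch of the invalidation idea as I understand it, with hypothetical names rather than NWMatcher's exact code: the first mutation expires the cache and detaches the listeners, so a burst of N mutations triggers a single invalidation, but every cached select has to re-arm them.
// rough sketch, hypothetical names, not NWMatcher's exact code
var cachedResults = {}, isCached = false;
function expireCache() {
    cachedResults = {};
    // detach until the next select() re-arms the listeners,
    // so a burst of N mutations triggers a single invalidation
    document.removeEventListener('DOMNodeInserted', expireCache, false);
    document.removeEventListener('DOMNodeRemoved', expireCache, false);
    isCached = false;
}
function armCache() {
    // re-checked on every cached select(): this is the per-call overhead
    if (!isCached && document.addEventListener) {
        document.addEventListener('DOMNodeInserted', expireCache, false);
        document.addEventListener('DOMNodeRemoved', expireCache, false);
        isCached = true;
    }
}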
Talking about performance, we need to consider the real-world scenario, rather than a static benchmark over the same selection, or a couple of selections: a scenario where nodes are, as I have said, constantly changed via innerHTML, innerText, appendChild, removeChild, insertBefore, insertAfter, all operations probably present in every single function of our JavaScripted page.
Now, is it truly worth it to cache results? If it was the key, why are browser vendors not caching results when we perform a getWhateverByWhatever or querySelectorAll?
Is our library truly so slow that a couple of searches performed consecutively are such a big problem that we need to increase library complexity, side effects, and browser problems/behaviors, for something that, on a daily basis, will be used I guess 5% of the time, due to constant DOM manipulation?
5 comments:
Thanks for the writeup Andrea. A couple of notes: he uses a more complete check of the `slice` method because, as he points out on line 90, native slice will bug out in at least Opera 9.27 when the nodeList contains an element with the id="length".
I think you have a valid point about memory consumption and more tests will have to be done. I would not be against having some kind of API to manage the compiled function cache.
Thanx
JDD
Andrea,
thank you for the nice post, I am currently changing the "new Snapshot()" stuff as for your good suggestions (saving memory).
The "slice()" feature testing has to remain the way it is since a bug in older Opera browsers will fold the test you suggest.
I am currently rewriting the core iteration/recursion of the select() method to better parse and maybe gain some extra milliseconds.
Just some notes ... the results caching system is only used in the select() method and can easily be turned off through the setCache(true / false) method I made available there.
The match() method only saves the compiled functions for reuse, no caching necessary there.
It is also very easy to completely remove the caching system from my code; it is currently used only in Firefox and Opera browsers, the ones currently supporting Mutation Events.
Diego
Hi Diego, about match and cache I meant the pre-compiled lambdas. I think it should be worth it, especially with the new memory tools, to keep a session under control. Maybe there could be a timer, every 5 minutes, able to empty the internal object with the cached selector-related lambdas.
What do you think? Simple and easy to implement, and it won't change performance, while it will preserve memory usage during long sessions.
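Something as simple as this, with hypothetical names just to show the idea, not your actual internals:
// hypothetical names: periodically drop the compiled lambdas
var compiledSelectors = {};
setInterval(function () {
    // they will simply be rebuilt on demand at the next match()/select()
    compiledSelectors = {};
}, 5 * 60 * 1000); // every 5 minutes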
Thank you for the good job, I'll stay tuned.
Regards
I've always wondered why selector engines don't usually cache functions...great job!
@Diego
Since it's all about speed, why not cache "parts.length" in compileGroup (line 419) before iterating?
Oh, "i++" on line 1043 should probably be "++i". Its faster and since you are not using the value of "i" before incrementing there is no reason to return it...right?. Or do you know something I don't?
I look forward to seeing more of your work.
David,
your suggestions are both correct, but the code has changed in my local development branch ...
I am lazily committing all the patches collected until now thanks to users' suggestions/feedback.
In the current work, I have changed that "for loop" into a "while loop" (419), while in the other part you suggested (1043) the byTags() has disappeared from the current code (it was too ugly) but will probably be re-introduced later (rewritten).
Thank you for your interest; be aware that I recently added "callback" functionality to NWMatcher, and I believe it to be a really new and useful approach.
Examples:
NW.Dom.select("table > tr td", document, [ ], function(element) { element.style.border = '1px solid black'; });
NW.Dom.select("div", document, [ ], Element.extend);
Regards,
Diego