
Saturday, August 13, 2011

How To JSONP A Static File

Update: I have discussed this topic separately and I agree that the url could be used as the unique id as well.
In this case the server should use the static url as the unique id:

StaticJSONP.notify("http://cdn.com/static/article/id.js",{..data..});

So that on client side we can use the simplified signature:

StaticJSONP.request(
"http://cdn.com/static/article/id.js",
function (uid, data) {
}
);

The callback will receive the uid in any case, so we can create a single callback and handle each response accordingly.
The script has been updated to accept 2 arguments but, if necessary, the explicit unique id is still supported.
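
As a minimal sketch of the "single callback" idea, assuming the uid is the static url: the two urls, the handlers table and the onStaticData name below are hypothetical and only there for illustration, they are not part of the StaticJSONP script.

```javascript
// Hypothetical handlers, one per static url used as unique id.
var handlers = {
  "http://cdn.com/static/article/1.js": function (data) {
    return "article: " + data.title;
  },
  "http://cdn.com/static/author/1.js": function (data) {
    return "author: " + data.name;
  }
};

// A single callback shared by every request: it receives the uid
// first, so it can decide what to do with the data.
function onStaticData(uid, data) {
  var known = Object.prototype.hasOwnProperty.call(handlers, uid);
  return known ? handlers[uid](data) : "unhandled: " + uid;
}

// Intended usage, assuming StaticJSONP is available in the page:
// StaticJSONP.request("http://cdn.com/static/article/1.js", onStaticData);
```

The same function can be passed to every request, since the uid tells it which file has just been notified.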



Under the list of "incomplete and never posted stuff" I found this article, which I have finally reviewed.
I know it's "not that compact" but I would really like you to follow the reasoning that led me to a solution for a not so common, but quite nasty, problem.

Back in 2001, my early attempts to include callbacks remotely were based on server side runtime compilation of some JavaScript data passed through a single function.

<?php // demo purpose only code

// do something meaningful with server data

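// create the output data at runtime
$pairs = array();
foreach ($data as $key => $value) {
  // escape quotes and backslashes so the value stays a valid JavaScript string
  $pairs[] = $key.':"'.addslashes($value).'"';
}
// separate the key:value pairs with a comma
$output = '{'.implode(',', $pairs).'}';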

echo 'jsCallback('.$output.')';

?>

The technique above became obsolete a few years ago thanks to the widely adopted JSON format and its implementations, native or coded, in hundreds of programming languages.
Moreover, it became the wrong way to do it thanks to JSONP, which has been a definitely better solution since the very beginning.
Here is an example of what JSONP services do today:

<?php // still demo purpose only code

echo $_GET['callback'].'('.json_encode($data).')';

?>


JSONP Advantages

The callback parameter is defined on the client side, which means it can be "namespaced" or it can be unique for each JSONP request.
If we consider the first example, every script in the page has to rely on a single global jsCallback function.
At that time I was using my code and my code only, so problems like conflicts, or the possibility that another library would define a different jsCallback in the global scope, did not exist.
Today I still use "my code and my code only" :D when it comes to my personal projects, but I am more aware than ever of the conflicts between multiple libraries that the primordial technique may cause, even if all these libraries are my own.
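
To make the "unique per each JSONP request" point concrete, here is a minimal sketch. The prepareJSONP helper, the __jsonp_ prefix and the scope argument (which would be window in a browser) are my assumptions for illustration, not a specific library API.

```javascript
// Hypothetical helper: registers a uniquely named callback on `scope`
// (window in a browser) and returns the url a JSONP request would use.
var jsonpCounter = 0;

function prepareJSONP(scope, url, callback) {
  var name = "__jsonp_" + (jsonpCounter++);
  scope[name] = function (data) {
    // one-shot: remove the name once the response has arrived
    delete scope[name];
    callback(data);
  };
  var separator = url.indexOf("?") < 0 ? "?" : "&";
  return url + separator + "callback=" + encodeURIComponent(name);
}
```

Since each request gets its own callback name, two libraries asking for the same service never overwrite each other, but this only works when the server reads the callback parameter.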

JSONP Disadvantages

Well, the same reason that makes JSONP a powerful and more suitable technique is the one that could make JSONP the wrong solution.
If we still consider the first code example, nobody could stop me from being "really smart" and precompiling that file into a static one.

// static_service.js by cronjob 2011-08-14T10:00:00.000Z
jsCallback({category:'post',author:'WebReflection',title:'JSONP Limits'});

While precompiled static content may or may not be what we need for our application/service, it is clear that if no server side language is involved the common JSONP approach will fail: a static file cannot read a callback parameter, so every callback in the main page depends on "the single exit point" hardcoded in the file, the jsCallback function.
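
A minimal sketch of the "single exit point" problem, assuming two hypothetical libraries that both need the static file. A plain object stands in for the global scope so the snippet can run anywhere.

```javascript
// A plain object stands in for the global scope of the page.
var page = {};
var received = [];

// library A defines the only callback name the static file knows
page.jsCallback = function (data) {
  received.push("A: " + data.title);
};

// library B, unaware of A, defines the same name and replaces A's
page.jsCallback = function (data) {
  received.push("B: " + data.title);
};

// what static_service.js does once it has been loaded
page.jsCallback({category: "post", author: "WebReflection", title: "JSONP Limits"});

// received is now ["B: JSONP Limits"]: library A was never notified
```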

Advantages Of Precompiled Static Files

The fastest way to serve a file from a generic domain is to serve a static one.
A static file can be cached either at disk level, rather than being sought and retrieved each time, or directly in the server RAM.
Also, a static file does not require any programming language at all: the only code executed is the one in charge of serving the file over the network, aka the HTTP Server.
The most common real world example of static files is a generic CDN, whose purpose is indeed to support as many requests per second as possible and where static files are most likely the solution.
The only extra code involved would be the one in charge of statistics on the HTTP Server layer, but every file can be easily mirrored or stored in any sort of RAID configuration and be served as fast as possible.

Another real world example could be a system like blogger.com, where pages do not necessarily need to be served dynamically.
Most of the content in any blog system can be precompiled, and many services/blog applications do exactly that.

The same goes for any other application/service that does not require realtime data computation, where different cron jobs behind the scenes are in charge of refreshing the content every N minutes or more.
If we think about any big traffic website we could do this basic analysis:

# really poor/basic web server performances analysis

# cost of realtime computation
1% of average CPU + RAM + DISK ACCESS per each user
# performances
MAX_USERS = 100;
AVERAGE_MAX_USERS = 100;

# cost of a threaded cron job
20% of average CPU + RAM + DISK ACCESS per iteration
# cost of static file serving
0.1% of CPU + RAM + DISK ACCESS per user
# performances
MAX_USERS_NOCRON = 1000;
MAX_USERS_WHILECRON = 800; # MAX_USERS_NOCRON - 20%
AVERAGE_MAX_USERS = 900;
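
The figures above can be reproduced with a few lines of arithmetic. This is only a sketch of the same rough model: costs are expressed in tenths of a percent to keep whole numbers, the variable names are mine, and the average is a plain mean of the "cron running" and "cron idle" cases, as the figures above seem to assume.

```javascript
// All costs are expressed in tenths of a percent of the server
// resources, so that the arithmetic stays in whole numbers.
var TOTAL = 1000;       // 100% of CPU + RAM + DISK ACCESS
var DYNAMIC_COST = 10;  // 1% per user with realtime computation
var STATIC_COST = 1;    // 0.1% per user with static files
var CRON_COST = 200;    // 20% per cron job iteration

var maxUsersDynamic = TOTAL / DYNAMIC_COST;                     // 100
var maxUsersNoCron = TOTAL / STATIC_COST;                       // 1000
var maxUsersWhileCron = (TOTAL - CRON_COST) / STATIC_COST;      // 800
var averageMaxUsers = (maxUsersNoCron + maxUsersWhileCron) / 2; // 900
```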


If we consider that we may choose to delegate the cron job to a separate server inside the intranet, and that the only operation for each changed static file will be LOCK FILE $f EXCLUSIVE, WRITE NEW CONTENT INTO $f, UNLOCK FILE $f EXCLUSIVE, so that basically only the DISK ACCESS is involved, we can even increase AVERAGE_MAX_USERS to 950 or more.
I know this is a somewhat off topic and virtual/conceptual analysis but please bear with me, I will get to the point soon.

Static Content And RESTful APIs

There is a huge number of services out there based on JSONP. Many of them require realtime data but many probably do not. Especially in the latter case, I bet nobody is implementing the technique I am going to describe.

A Real World Example

Let's imagine I work for Amazon and I am in charge of the RESTful API able to provide any sort of article related data.
If we think about it, a generic online shopping cart article is nothing more than a group of static info that will rarely change much during the day, the week, the month, or even the year.
Do online users really need to be notified in realtime, on each request, about current user rating, reviews, related content, article description, author, and any other "doesn't change so frequently" info related to the article itself? NO.
The only field that should be as up to date as possible is the price but still, does the price change so frequently during the lifecycle of an Amazon article? NO.
Can my infrastructure be so smart that if, and only if, a single field of this article changes, the related static file is updated so that everybody instantly receives the new info? YES.
... but how can we do that if JSONP does not scale with static files?

My StaticJSONP Proposal

The only difference from a normal JSONP request is that, by passing through a single callback call, any sort of library should be able to be notified.
Since the client side library is in charge of creating the requested url, and the same library knows in advance what it is going to ask for and what it is going to receive, all this library needs is to be synchronized with the unique id the static server file will invoke. I am going to tell you more but, as a quick preview, this is how the static server file will look:

StaticJSONP.notify("unique_request_id", {the:response_data});


Server Side Structure Example

Let's say we would like to keep the folder structure as clear as possible. In this Amazon example we can think about splitting articles by categories.

# / as web server root

/book/102304.js # the book id
/book/102311.js
/book/102319.js

/gadgets/1456.js
/gadgets/4567.js

A well organized folder structure results in both better readability for humans and easier access for most common filesystems.
Every precompiled file in the list will contain a call to the global StaticJSONP object, e.g.

// book id 102311
StaticJSONP.notify("amazon_apiv2_info_book_102311",{...data...});
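
As a rough sketch of how a cron job might produce the content of such a file: the buildStaticFile name, its arguments and the sample data are assumptions for illustration, and writing the result to disk is left out.

```javascript
// Hypothetical build step: returns the content of a precompiled file.
// The function name and its arguments are assumptions for illustration.
function buildStaticFile(prefix, category, id, data) {
  var uid = [prefix, category, id].join("_");
  return "StaticJSONP.notify(" +
    JSON.stringify(uid) + "," +
    JSON.stringify(data) +
  ");";
}

var content = buildStaticFile("amazon_apiv2_info", "book", 102311, {
  title: "JSONP Limits",
  author: "WebReflection"
});
// content is:
// StaticJSONP.notify("amazon_apiv2_info_book_102311",{"title":"JSONP Limits","author":"WebReflection"});
```

Using JSON.stringify for both the uid and the data avoids concatenating the output by hand, which is what made the very first PHP example fragile.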


The StaticJSONP Object

The main, and only, purpose of this tiny piece of script that almost fits in a tweet once minzipped (282 bytes) is to:

  • let any library, framework, custom code, be able to request a static file

  • avoid multiple scripts injection / concurrent JSONP for the same file if this has not been notified yet

  • notify any registered callback with the result
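
The three points above can be sketched as follows. This is not the actual 282 bytes script, just a minimal version under my own assumptions: createStaticJSONP and injectScript are hypothetical names, and the loader is passed as an argument so the logic can be tried without a browser.

```javascript
// Hypothetical sketch of the logic, not the original script.
function createStaticJSONP(loadScript) {
  var pending = {}; // uid -> callbacks waiting to be notified
  return {
    request: function (uri, uid, callback) {
      // 2 arguments signature: the uri is used as unique id
      if (typeof uid === "function") {
        callback = uid;
        uid = uri;
      }
      if (Object.prototype.hasOwnProperty.call(pending, uid)) {
        // same file already requested and not notified yet:
        // queue the callback, no second script injection
        pending[uid].push(callback);
      } else {
        pending[uid] = [callback];
        loadScript(uri);
      }
    },
    notify: function (uid, data) {
      var callbacks = pending[uid] || [];
      delete pending[uid];
      for (var i = 0; i < callbacks.length; i++) {
        callbacks[i](uid, data);
      }
    }
  };
}

// A possible browser loader (only defined here, never called)
function injectScript(uri) {
  var script = document.createElement("script");
  script.src = uri;
  document.getElementsByTagName("head")[0].appendChild(script);
}

// In a page: var StaticJSONP = createStaticJSONP(injectScript);
```

Once a uid has been notified its queue is dropped, so a later request for the same file injects the script again and relies on the browser cache.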


Here is an example of a StaticJSONP interaction on the client side:

var
  // just as example
  result = [],

  // library 1
  client1 = function (uri, uid, delay) {
    function exec() {
      StaticJSONP.request(uri, uid, function (uid, evt) {
        result.push("client1: " + evt.data);
      });
    }
    delay ?
      setTimeout(exec, delay) :
      exec()
    ;
  },

  // library 2
  client2 = function (uri, uid, delay) {
    function exec() {
      StaticJSONP.request(uri, uid, function (uid, evt) {
        result.push("client2: " + evt.data);
      });
    }
    delay ?
      setTimeout(exec, delay) :
      exec()
    ;
  }
;
// library 1 does its business
client1("static/1.js", "static_service_1", 250);
// so does library 2
client2("static/2.js", "static_service_2", 250);

setTimeout(function () {
  // suddenly both requires same service/file
  client1("static/3.js", "static_service_3", 0);
  client2("static/3.js", "static_service_3", 0);

  setTimeout(function () {
    alert(result.join("\n"));
  }, 500);
}, 1000);

It is possible to test the live demo ... just wait a little bit and you will see this alert:

// order may differ depending on
// the response time of each file
client1: 1
client2: 2
client1: 3
client2: 3

If you monitor network traffic you will see that static/3.js is downloaded only once.
If the response is really big and the connection is not so good (3G or worse), the same file may be requested again while the first request has not finished yet.
Since the whole purpose of StaticJSONP is to simplify life on the server side, any redundant request is avoided on the client side.

The Unique ID ...

StaticJSONP can be easily integrated with a normal JSONP service.
As an example, if we need to obtain the list of best sellers, assuming this list is not static due to too frequent changes, we can do something like this:

// this code is for example purposes only
// it won't work anywhere

// JSONP callback to best sellers
JSONP("http://amazon/restful/books/bestSellers", function (evt) {
  // the evt contains a data property
  var data = evt.data;

  // data is a list of book titles and ids
  for (var i = 0, li = []; i < data.length; i++) {
    li[i] = '<a href="javascript:getBookInfo(' + data[i].id + ')">' + data[i].title + '</a>';
  }

  // show the content
  document.body.innerHTML = '<ul><li>' + li.join('</li><li>') + '</li></ul>';

});

// the function to retrieve more info
function getBookInfo(book_id) {
  StaticJSONP.request(

    // the url to call
    "http://amazon/restful/static/books/" + book_id + ".js",

    // the unique id, according to the current RESTful API
    "amazon_apiv2_info_book_" + book_id,

    // the callback to execute once the server responds
    function (uid, evt) {
      // evt contains all book related data
      // we can show it wherever we want
    }
  );
}

Now just imagine how many users in the world are performing similar requests right now to the same list of books, being best sellers ...

Unique ID Practices

It is really important to understand the reason StaticJSONP requires a unique id.
First of all it is neither possible nor convenient to "magically retrieve it from the url", because any RESTful API out there may have a "different shape".
The unique id is a sort of trusted, pre-agreed, and aligned piece of information the client side library must be aware of, since there is no way to change it on the server side once the file has been created statically.
It is also important to prefix the id so that debugging will be easier on the client side.
However, the combination used to generate the unique id may already be ... well, unique, so it's up to us, on both client and server side, to define it in a consistent way.
The reason I did not use the whole uri + id info in the StaticJSONP request method is simple:
if both gadgets/102.js and books/102.js contain a unique 102 id, there is no way on the client side to understand which article has been requested, and both gadgets and books registered callbacks will be notified, one out of two surely with the wrong data.
It's really not complicated to namespace a unique id with a prefix, and this should be the way to go imho.
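
Here is a small sketch of the clash described above, using a minimal stand-in registry rather than the real StaticJSONP object. The createRegistry helper and the sample data are assumptions for illustration.

```javascript
// Minimal stand-in for a uid -> callbacks registry (an assumption,
// just enough to compare the two id styles).
function createRegistry() {
  var callbacks = {};
  return {
    on: function (uid, callback) {
      (callbacks[uid] || (callbacks[uid] = [])).push(callback);
    },
    notify: function (uid, data) {
      (callbacks[uid] || []).forEach(function (callback) {
        callback(uid, data);
      });
    }
  };
}

// bare ids: gadgets/102.js and books/102.js both use "102"
var bare = createRegistry();
var bareLog = [];
bare.on("102", function (uid, data) { bareLog.push("gadget got " + data.kind); });
bare.on("102", function (uid, data) { bareLog.push("book got " + data.kind); });
bare.notify("102", {kind: "book"});
// bareLog is ["gadget got book", "book got book"]

// namespaced ids: each registered callback receives only its own data
var spaced = createRegistry();
var spacedLog = [];
spaced.on("amazon_apiv2_info_gadgets_102", function (uid, data) { spacedLog.push("gadget got " + data.kind); });
spaced.on("amazon_apiv2_info_book_102", function (uid, data) { spacedLog.push("book got " + data.kind); });
spaced.notify("amazon_apiv2_info_book_102", {kind: "book"});
// spacedLog is ["book got book"]
```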

Conclusion

It's usually really difficult to agree unanimously on a solution for a specific problem, and I am not expecting that from tomorrow everyone will adopt this technique to speed up server side file serving over common "JSONP queries". I hope, though, that you understood the reason this approach may be needed, and also how to properly implement a solution that does not cause client side conflicts, that scales, that does not increase the final application size in any relevant way, and that is ready to go for the day when, and if, you are going to need it. Enjoy

7 comments:

Aadaam said...

I see Andreas may have asked your help on some projects he inherited... :)

Just 3 notes, perhaps usable for all readers:
- I always feared that the reason the callback isn't static is some kind of XSS attack; I mentioned it in every place possible, that if anyone knows about such tell me and once I even asked you

- A big disadvantage of JSONP is that a script inclusion will pretty much block a javascript-heavy webapp; that's why I worked hard to try to minimize request numbers in exchange for static file generation time and CDN space which seemingly wasn't followed later and effectively blocks the site.

- Cache issues could - and should! - be solved by using client version (eg. buildnumbers) as query parameters, which will be ignored by the server but may still affect cache/proxy behaviour

And, as always, you can use ESI to make dynamic JSON handling over static files on a CDN but it'll cost a fortune each week, and we all know how reliable ESI is and how akamai likes to maintain it...

lifesinger said...

If you use a module loader such as seajs.com, the jsonp is very simple:

// http://path/to/a.js
define({ name: 'John' });

var data = require('http://path/to/a.js');
alert(data.name);

the uri is the unique resource id. It is simple and DRY.

Andrea Giammarchi said...

hey @Aadaam, to be honest Andreas has really nothing to do with this.
I have personally experimented with this technique in UK already in order to avoid DB queries on an admin panel over the most complex DB I have ever seen.

I simply never finished the post but yes, it turned out I discussed this with another dev recently, that's why I decided to finish.

About your points:
- thank the lord the JSONP callback is not static, or any page that uses a REST API would not be able to have two independent components/libraries based on the same API (who defines the callback name? how can script 2 be notified if the callback has already been defined?)
This approach solves all these conflict problems but does not make sense if the file is not static and you can define your own callback

- a JavaScript injection is non blocking so I don't understand your second point. That's again the whole advantage of JSONP and the reason you need a callback to be notified ... JSONP does not slow down while synchronous XHR does ( asynchronous XHR does not slow down, same as JSONP )

- the cache issue does not exist. The whole reason you would create static files is to use the client cache as much as possible. The matter being completely server side, cache should never be solved on the client, since the server can send the correct headers, including the file creation time, which is already unique.

For static files I would never suggest invalidating the cache on the client or the whole thing won't make sense anymore, just think about current CDNs and the jquery library ... those are static and if you ask for those files with a random query string to avoid cache you basically destroy all the advantages of using the CDN across the web.

Same would be with repeated queries to generic REST services across the web ( amazon, twitter, etc )

If something changes on the server a new URL with the api version will be the right solution, not on client side.

About ESI once again any extra layer is not necessary or the whole point of having static files will be meaningless.

This apart, hope you are doing well mate :)

Andrea Giammarchi said...

@lifesinger that is so DRY that it won't work at all. Please read again the post and try to understand that synchronous blocking operations are not even mentioned.

Moreover, try to imagine that this content:
define({ name: 'John' });

is present in both
domain1.com/a.js
and
domain2.com/a.js

now if you found a solution to notify more than a callback without messing up with data because the file content and the callback name used there is the same ... that would be cool, but I bet you did not and please prove me wrong, thanks.

Andrea Giammarchi said...

@lifesinger I pressed reject by mistake via phone ... here your reply:

-------------------------

@Andrea:

It is possible to work properly.

// http://domain1.com/a.js
define({ name: 'a' });

// http://domain2.com/a.js
define({ name: 'a' });

// biz.js
define(function(require, exports, module) {
var data = require('http://domain1.com/a.js');
// do sth. with data
});


// biz2.js
define(function(require, exports, module) {
var data2 = require('http://domain2.com/a.js');
// do sth. with data2
});

All is ok. You can try it with sea.js or requirejs. It is simple enough and has been used in some large projects.

-------------------------

So it does NOT work because it IS synchronous, then blocking, then not suitable as JSONP alternative over static files on the *client* side, got it?

lifesinger said...

@Andrea:

Actually, it is asynchronous and non-blocking.

data.json and biz.js can be downloaded in parallel, and the module loader can ensure the execution order.

/* init.js */
define(function(require, exports) {
var data = require('path/to/data.json');
var biz = require('path/to/biz.js');

biz.doSomeThingWith(data);
});

Once the init.js is loaded, the data.json and biz.js will be downloaded in parallel. When all required modules are downloaded, then the init.js will execute. All scripts are downloaded asynchronously, and executed according to the dependencies tree.

Andrea Giammarchi said...

I don't understand ... if this is possible:

var data = require('http://domain1.com/a.js');
data.whatever();

it cannot be asynchronous ... I will look at the lib later in any case, cheers.