Sunday, August 19, 2012

Why JSON Won ... And Is Good As It Is

I keep seeing developers complain about one thing or another in the JSON protocol and, don't get me wrong, I've been the first one to try implementing all sorts of alternatives, starting from JSOMON and many others ... OK?

Well, after so many years of client/server development, it's not that I've given up on thinking "something could be better or different", it's just that I have learned the hard way all the reasons JSON is damn good as it is, and here are just a few of them.

Reliable Serialization ?

No, 'cause YAGNI. There are a few serialization processes I know of that kinda work as expected and always have; PHP serialize is a good example.
Recursion is not a problem: solving it is part of the serialization process, and the same goes for classes with their protected and private properties. You can save almost any object together with its state, even if the object you get back won't be, as a reference, the same one you serialized ... and I would say: of course!
There are also two handy methods, __sleep and __wakeup, that let you save an object's state in a meaningful way and retrieve it back, or perform some action during deserialization.

Are these things available in JSON? Thank gosh NO! JSON should not take care of recursive objects ... or better, it's freaking OK if it's not compatible 'cause recursion is a developer matter or issue, not a protocol one!
All JSON can do is provide a way to intercept serialization, so that any object with a .toJSON() method can return its own state and, any time JSON.parse() is performed, a reviver can bring back, if truly necessary, its recursive property.
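
To make that concrete, here is a minimal sketch, not a recipe: the Child constructor, the parent/children shape, and the relink reviver are all made up for the example. The toJSON method drops the circular back reference, and the reviver passed to JSON.parse() restores it only because this example wants it back.

```javascript
// Hypothetical parent/child pair with a circular reference:
// child.parent points back at the object that owns it.
function Child(name, parent) {
  this.name = name;
  this.parent = parent;
}

// Intercept serialization: return a plain copy without the back reference.
Child.prototype.toJSON = function () {
  return { name: this.name };
};

var parent = { name: "root", children: [] };
parent.children.push(new Child("a", parent), new Child("b", parent));

// Without toJSON this call would throw a TypeError (circular structure).
var json = JSON.stringify(parent);
// '{"name":"root","children":[{"name":"a"},{"name":"b"}]}'

// Intercept parsing: re-link each child to its owner, only if truly needed.
function relink(key, value) {
  if (key === "children" && Array.isArray(value)) {
    for (var i = 0; i < value.length; i++) {
      value[i].parent = this; // `this` is the object holding "children"
    }
  }
  return value;
}

var restored = JSON.parse(json, relink);
restored.children[0].parent === restored; // true
```

Any other language receiving that string sees plain data; the recursion stays a local decision.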

So, at the end of the day, JSON implementations might already provide something similar to __sleep and __wakeup for objects, but it should be the JSON string owner, the service, the developer, who takes care of these problems, and simply because ....

Universal Compatibility

JSON is a protocol and as a protocol it should be as compatible as possible with all languages, not only the C-like ones or others with similar comments ... there won't ever be comments in JSON, 'cause the moment you need comments, you don't need a transport protocol: programming languages have always ignored developers' comments ... and also, for compatibility reasons, not all programming languages would like to have // or /* */ or even # as inline or multiline comments ... why would they?

Especially in the .NET world, most documentation is written in a pseudo XML. Can you imagine bothering to write such a redundant markup language for something often ignored by developers? Would you like to have that "crap" as part of the data you are sending or receiving via JSON, as part of the protocol? I personally don't ... thanks! 'cause I believe a transport protocol should be as compact as possible and free of problems.
Here JSON wins once again 'cause it's compatible, with its few universal rules, with basically everything.
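
A quick way to see those few universal rules at work, at least in JavaScript: the native parser simply refuses any comment style. The isValidJSON helper below is a name invented for this sketch, it is not part of any API.

```javascript
// Hypothetical helper: true if the native parser accepts the text.
function isValidJSON(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false; // JSON.parse throws a SyntaxError on invalid input
  }
}

isValidJSON('{"a": 1}');             // true
isValidJSON('{"a": 1} // note');     // false
isValidJSON('{/* note */ "a": 1}');  // false
isValidJSON('# note\n{"a": 1}');     // false
```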

Different Environments

This is the best goal a protocol has ever reached: every programming language can somehow represent what JSON transports.
Lists, Arrays, Dictionaries, Objects, Maps, Hashes, call them what you want: these are the most used, cross-language entities we all deal with on a daily basis, together with booleans, strings, and numbers.

OK, OK, numbers especially are quite generic, but you must admit that the world is still OK with a generic Int32 or Float32 number. In 64-bit compatible environments these numbers could be of a different type, but only if you will never deal with 32-bit environments ... make your choice ... you want a truly big number? Go for it, and lose the possibility to "talk" with any 32-bit env ... not a big deal if you own your data, kinda pointless memory and CPU consumption if you deserialize everything as 64 bits ... but I am pretty sure you know what you are doing so ... JSON is good in that case too.
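
As a rough illustration of that trade-off, and assuming the receiving side is a JavaScript engine (where every JSON number is parsed as an IEEE 754 double), small integers travel untouched while a truly big one silently changes. The id field is just an example name.

```javascript
// Assumption: the consumer is a JavaScript engine, where every JSON
// number becomes an IEEE 754 double.

// Values within the signed 32-bit range survive the trip everywhere.
var small = JSON.parse('{"id": 2147483647}').id;
small === 2147483647; // true

// 2^53 + 1 cannot be represented as a double: the last digit changes.
var big = JSON.parse('{"id": 9007199254740993}').id;
String(big); // "9007199254740992"

// If you own the data, sending the value as a string keeps every digit.
var asText = JSON.parse('{"id": "9007199254740993"}').id;
asText; // "9007199254740993"
```

The protocol accepted all three payloads; what each side does with the number is, once again, up to whoever owns the data.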

No Classes

And again, thank gosh! You don't want a protocol that deals with classes, trust me, 'cause you cannot write a class in all possible programming languages, can you? Even if you can, think of those programming languages where classes never existed: classes are simply an abstract concept represented by the word "class", but representable in a billion ways in other languages (e.g. via just objects in JavaScript).
Class and namespace issues, if you want them, are there in any case.
The good part of JSON, once again, is the ability to intercept the serialize and unserialize processes so that, if you'd like to send instances rather than just objects, you can use all the tools provided by the implementation. Here is a JavaScript example:

function MyClass() {
  // doesn't matter what we do here
  // for post purpose, we do something
  this.initialized = true;
}
MyClass.prototype.toJSON = function () {
  this.__class__ = "window.MyClass";
  return this;
};

var myClassObject = JSON.stringify(new MyClass);
// '{"initialized":true,"__class__":"window.MyClass"}'

Once we send this serialized version of our instance to any other client, the .__class__ property can be ignored or simply used to understand what kind of object it was.

Still in JavaScript, we can easily deserialize the string this way:

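function myReviver(key, value) {
  // the reviver is invoked last, with an empty key, for the root value
  if (!key) {
    var instance = myReviver.instance;
    // no __class__ was found: return the parsed value untouched
    if (!instance) return value;
    delete instance.__class__;
    delete myReviver.instance;
    return instance;
  }
  if (key == "__class__") {
    // `this` is the object that holds the __class__ property
    myReviver.instance = myReviver.createInstance(
      this, this.__class__
    );
  }
  return value;
}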

myReviver.createInstance = "__proto__" in {} ?
  function (obj, className) {
    obj.__proto__ = myReviver.getPrototype(className);
    return obj;
  } :
  function (Bridge) {
    return function (obj, className) {
      Bridge.prototype = myReviver.getPrototype(className);
      return new Bridge(obj);
    };
  }(function (obj) {
    for (var key in obj) this[key] = obj[key];
  })
;

myReviver.getPrototype = function (global) {
  return function (className) {
    for (var
      Class = global,
      nmsp = className.split("."),
      i = 0; i < nmsp.length; i++
    ) {
      // simply throws an error if it does not exist
      Class = Class[nmsp[i]];
    }
    return Class.prototype;
  };
}(this);

JSON.parse(myClassObject, myReviver) instanceof MyClass;
// true

Just imagine that __class__ could be any property name: it could be prefixed, as @class is, or carry your own namespace value, as in @my.name.Space ... so no conflicts if more than one JSON user is performing the same operations, right?
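
One possible way to sketch that idea, with a few loud assumptions: createTyped, its tag and reviver members, and the Point constructor are all invented for this example. It also swaps two techniques compared with the code above: class names are looked up in an explicit registry rather than walked through the global namespace, and instances are created through Object.create rather than __proto__. In exchange, nested instances are revived too.

```javascript
// Hypothetical helper, not part of the post: the marker name and the
// registry of known constructors are both chosen by whoever owns the data.
function createTyped(marker, registry) {
  var hasOwn = Object.prototype.hasOwnProperty;
  return {
    // Build a plain, tagged copy so the live instance is never modified.
    tag: function (instance, className) {
      var copy = {};
      copy[marker] = className;
      Object.keys(instance).forEach(function (key) {
        copy[key] = instance[key];
      });
      return copy;
    },
    // Reviver: turn every tagged object, nested or not, into an instance.
    reviver: function (key, value) {
      if (!value || typeof value !== "object" || !hasOwn.call(value, marker)) {
        return value;
      }
      var className = value[marker];
      // Unknown class names are left as plain data instead of throwing.
      if (!hasOwn.call(registry, className)) return value;
      var instance = Object.create(registry[className].prototype);
      Object.keys(value).forEach(function (prop) {
        if (prop !== marker) instance[prop] = value[prop];
      });
      return instance;
    }
  };
}

function Point(x, y) {
  this.x = x;
  this.y = y;
}
var typed = createTyped("@my.name.Space", { Point: Point });
Point.prototype.toJSON = function () {
  return typed.tag(this, "Point");
};

var text = JSON.stringify({ origin: new Point(0, 0) });
// '{"origin":{"@my.name.Space":"Point","x":0,"y":0}}'

var back = JSON.parse(text, typed.reviver);
back.origin instanceof Point; // true
```

Two libraries using two different markers can share the same payload without stepping on each other, and any consumer that knows neither marker still reads plain data.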

Simulating __wakeup Call

Since the last example is about __sleep, which in JavaScript is easily implemented through the .toJSON() method, you might decide to implement a __wakeup mechanism too, and here is what you could add to the proposed reviver:

function myReviver(key, value) {
  if (!key) {
    var instance = myReviver.instance;
    // no __class__ was found: return the parsed value untouched
    if (!instance) return value;
    delete instance.__class__;
    delete myReviver.instance;
    // this is basically last call before the return
    // if __wakeup was set during serialization
    if (instance.__wakeup) {
      // we can remove the prototype shadowing
      delete instance.__wakeup;
      // and invoke it
      instance.__wakeup();
    }
    return instance;
  }
  if (key == "__class__") {
    myReviver.instance = myReviver.createInstance(
      this, this.__class__
    );
  }
  return value;
}

Confused? Oh well, it's easier than it looks ...

// JSON cannot bring functions
// a prototype can have methods, of course!
MyClass.prototype.__wakeup = function () {
  // do what you need to do here
  alert("Good Morning!");
};

// slightly modified toJSON method
MyClass.prototype.toJSON = function () {
  this.__class__ = "window.MyClass";
  // add __wakeup own property
  this.__wakeup = true;
  return this;
};

Once again, any other environment can understand what's traveling in terms of data, but we can recreate a proper instance whenever we want.

How To Serialize

This is a good question you should ask yourself. Do you want to obtain exactly the same object once unserialized? Is that important for the purpose of your application? Yes? Follow my examples ... no? Don't bother: the less you preprocess when serializing and unserializing objects, the faster, easier, and slimmer the data will be.

If you use weird objects and you expect your own thing to happen ... just use the tools you have to intercept before and after JSON serialization and put there everything you want. Otherwise, just try to deal with things that any other language could understand, or you risk thinking JSON is your own protocol that's missing this or that, while you are probably, and simply, overcomplicating whatever you are doing.
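
For the "don't bother" case, one of the tools already there is the second argument of JSON.stringify(), which can be a plain list of the keys worth sending. A small sketch, where the session object and its fields are made up for the example:

```javascript
// Hypothetical in-memory object: only id and name are worth sending.
var session = {
  id: 7,
  name: "Ada",
  lastRender: function () {},      // functions are skipped by JSON anyway
  cache: { hits: 120, misses: 4 }  // local bookkeeping, useless elsewhere
};

// The second argument of JSON.stringify can be a list of allowed keys.
var slim = JSON.stringify(session, ["id", "name"]);
// '{"id":7,"name":"Ada"}'

// Any other environment gets plain data, no reviver required.
var received = JSON.parse(slim);
```

No markers, no reviver, nothing to agree on with the other side except the two property names.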

You Own Your Logic

The last chapter simply demonstrates that with a tiny effort we can achieve basically everything we want to ... and the cool part is that JSON, as it is, does not stop us from creating more complex structures to pass once stringified, or to recreate once parsed. This is the beauty of this protocol so please, if you think there's something missing, think twice before proposing yet another JSON alternative: it works, everywhere, properly, and it's a protocol, not a JS protocol, not an X-language protocol ... just, a bloody, protocol!

Thanks for your patience

19 comments:

Kyle Simpson said...

FWIW, this is why I was trying to suggest to Crockford that JSON.stringify() should ignore circular references (just like it ignores functions) if it encounters them while serializing an object.

JSON-js issue #39

I have written a generic Object.prototype.toJSON() which does exactly this task: prefilter an object to remove/ignore any object references which create circular references.

Object.prototype.toJSON()

Andrea Giammarchi said...

Kyle ... 6th June 2005 ... and I still agree.

There's no magic behind serialization ... we, as developers, must understand it and avoid it when it comes to data transports ... then we can recreate it and use it to create any sort of memory leak in our code without caring ... but no, I don't like anyone creating any sort of Object.prototype.toJSON method ... EVER! I know what I am doing and I would never implement such a solution for my lack of logic, architecture, or time 'cause I could not use my own toJSON() for my own objects if it's about removing circular references ...

Last, but not least, if I don't know I have circular references in my code, I am doing it wrong, whatever it is.
If I know it ... I know how to avoid them, adding some extra logic that does not need such generic solution for everyone else.

Sorry but this is my take on any Object.prototype thingy

Andrea Giammarchi said...

There's no magic behind serialization .. I meant, behind serialization over recursion

Kyle Simpson said...

It's easily avoided by running your for-loops with the, ironically insisted on by crockford-jslint, `hasOwnProperty()` guard, as I do, always, if you do a for-in style loop.

for (var i in obj) {
  if (obj.hasOwnProperty(i)) {
    // do stuff
  }
}

I think any responsible developer, especially one who writes code that may be used in other environments besides their own, is already aware of this issue and this simple fix. They're especially aware of it if they use a linter which warns about it.

> If I know it ... I know how to avoid them, adding some extra logic that does not need such generic solution for everyone else.

I explained my two use cases for having a generic `toJSON` to use in conjunction with `JSON.stringify()`, in this comment.

Andrea Giammarchi said...

the hasOwnProperty() forced check for each bloody for/in loop is the reason I don't like libraries that pollute Object.prototype and never used them ... hasOwnProperty per each check and per each object in an application slows down everything 100X and I write mainly for mobile ... I don't want this crap within my code, if I have Object.prototype.something anywhere, I drop that code 'cause it cannot be good no matter what it does, imho

Andrea Giammarchi said...

I forgot ... I don't personally use JSLint either ... and NO, I don't personally want to write hasOwnProperty per each loop ... I discourage everyone from using any library that pollutes Object.prototype ... even if some example, for posting purposes, could have been shown here.

Kyle Simpson said...

I similarly don't agree with, in general, extending the Object prototype (or any other native). I've spoken out against it too many times to count.

But I think `toJSON()` is a very special case, and as such that general position does not necessarily apply. Blindly applying any "rule" (no matter how well intended that rule is) without thinking about the specific circumstances is, IMHO, short-sighted.

You're so concerned about having to write that guard into all your for-in loops? Easily fixed:

Object.defineProperty(Object.prototype,"toJSON",{enumerable:false});

I just don't see it being such a big deal in this one specific special use-case. `toJSON()` only exists for the purpose of JSON serialization (by `JSON.stringify()`).

It's not at all the same thing as suggesting to add methods to `String` or whatever.

Andrea Giammarchi said...

the defineProperty hint is something I don't want to deal with, eventually something your proposal should do by default but again, I might not want your proposal to work with my generic objects that I did not, on purpose, decide to make safe ... all I am saying is that toJSON is indeed a very special case and, I am sorry, I don't want to give you the ownership (you as your code) to deal with my objects ... maybe I want recursion to be there 'cause I don't want that object to be used with any transport protocol? Maybe I want a different behavior? I would agree with your code if I could choose whether I need it ... if it's a must have 'cause I don't care about recursion ... then I don't want it, 'cause I do care and maybe that recursion was a mistake, a bug, a problem, I want to deal with it, and not have it solved magically 'cause someone introduced that Object.prototype.toJSON method.

Leave objects as they are, reuse that method whenever you want ... you need 1 callback, and the ability to attach it to any object that would like to do not care about recursion during serialization.

All objects ... oh well, that's a bit too much, some dev knows what he's doing ;-)

Andrea Giammarchi said...

also, as you already introduced defineProperty over an Object.prototype, think how many would feel OK introducing not enumerable, writable, and configurable, Object.prototype.whatever so that everyone else is screwed if that method does not provide what is expected or is too obtrusive as toJSON could be, since toJSON is as obtrusive as toString, and valueOf ... DO NOT TOUCH IT, thanks!

Kyle Simpson said...

> Iwould agree with your code if I could chose if I need it...

I started that whole run-in with Crockford because I specifically wanted to add the "drop recursion" behavior natively to `JSON.stringify()`, and I said it should probably be opt-in with a flag. So I agree with that perspective. But it's clear what Crockford thinks about that.

So, is it a lesser evil to have to use your own non-standard serializer instead of the built-in `JSON.stringify()`, or to monkey-patch objects so that the built-in standard parser behaves as desired?

Tough call, but I think the latter is slightly better.

Andrea Giammarchi said...

actually, I wonder if my next post should be Object.freeze(Object.prototype) as the mandatory very first script you write in any web page

Andrea Giammarchi said...

for performance? sure that would be but again, it's a protocol, not a "forget you are a developer, bring it on, I'll decide what's good or not for you" method ... throwing errors natively on recursion rather than implementing your own recursion checker ... that's cool for me already :D

Anonymous said...

I don't mean to sound facetious, but I am interested to know why you refer to JSON as a protocol?

I realize that you could make it fit the definition of 'protocol' but I wonder what value that adds?

And given you say JSON is a protocol, therefore this and that, I suppose it is important to understand why you use that term.

Thanks.

Andrea Giammarchi said...

as transport data protocol? as data schema to transport data? as "it could be json:{valid_json}" ?

Anonymous said...

...JSON doesn't support binary data. Having to Base64 encode/decode data or use alternative pathways sucks. A 'real protocol' should be data agnostic.

Andrea Giammarchi said...

it is data agnostic, the fact you need base64 is because http urls expect that, not because JSON is missing something. You know what I mean?

It's like saying that http is not a protocol because it does not support binary on query strings, cookies, etc etc ... don't you think?

Anonymous said...

I don't agree ;) since http supports binary transfers yet JSON doesn't, since the attribute values (with binary data) would corrupt the json payload. And query strings are just sort of pointers to data, like a filename. I don't follow you on why the payload has anything to do with the format of the query strings. JSON is part of the http payload and not the urls, right? ;)

Andrea Giammarchi said...

if you could communicate binary content in advance, JSON is compatible ... this is what I was saying. The problem is that through current web technologies it does not scale as binary capable protocol but it could do it already and without problems.

Andrea Giammarchi said...

if not that visible, there was a link:
https://developer.mozilla.org/en-US/docs/DOM/The_structured_clone_algorithm