Tuesday, August 16, 2011

Last Version Of JSON Hpack

Update

created github repository with (currently) JavaScript, PHP5 and Python versions.

Update after quick chat on twitter with @devongovett who pointed out there is a similar standard called JSONDB I have created a JSONH(Flat) version. It looks slightly faster on mobile so I may opt for this one rather than Array of keys at index 0.
The whole array is flat and it changes from [{a:"A"},{a:"B"}] to [1,"a","A","B"] where the empty collection would be [0] rather than [[]].

Also more details here on how to JSONH Hybrid JS Objects.



A while ago I proposed a homogeneous collections optimizer nick named JSON.hpack.

What I wasn't expecting is that actually different projects and developers adopted this technique to shrink down JSON size.

Basic Advantage Of JSON.hpack

Gzip and deflate work really good with repeated chunks of strings and this is why homogeneous collections have really good compression ratio there.
However, gzip and deflate compression does not come for free!
If we compress everything on server side we can easily test the CPU overload compared with uncompressed data.
Via JSON.hpack we can still serve small or huge amount of dynamic and static data without necessarily use realtime compression.

Basic Notions On Compressors

There is no ideal algorithm yet for compressed data, any of them has pros and cons.
A really good compression ratio may cost a lot and the algorithm is efficient if at least decompression is fast. 7-Zip is one example, it takes more than normal zip to create a single file, but final ratio is usually much better and decompression extremely fast.
An incremental compressor as GIF is is both fast to encode and fast to decode. However, it's average compression ratio is really poor compared with PNG, which again is not fast as GIF to encode, but almost as fast as GIF to decode and capable of bringing much more complex data inside.
On the client side we may like a truly fast compressor in order to send data to the server where more horses power can decompress in a reasonable time. Still servers have not unlimited resources.

My Latest Improvements Over JSON.hpack


On the web is all about network layer latency, completely unpredictable specially in these smartphones/pads days.
We also need to consider high traffic, if things go really well, and most important mobile platforms computation power, basically the equivalent of a Pentium 3 with a GeForce card from 2001.

Which Is The Best Compromise

The original version of JSON.hpack is able to understand which compression level is the best one for the current collection of objects. Unfortunately this is slow both on the server side and even more on the client side.
In my opinion an intermediate layer as JSON.hpack is should bring advantages as fast as possible in both client and server.
I probably failed the first time because I was more focused on coolness rather than efficiency.
As example, if it takes 3X CPU load to save 5% of bytes compared with the most basic compression ratio, something is wrong because it's simply not worth it.
As summary, the best compromise for the latest version of this compressor is to be freaking fast with small overhead and providing a good average compression ratio.

Welcome JSONH

In a single call this object is able to pack and unpack homogeneous collections faster than native JSON and specially on mobile platforms.

How Is It Possible

To be honest I have no idea and I was surprised as well. All I could think about is the fact that JSONH makes data flat which means no recursions per each object in the original list.
This seems to boost up performances while packing and make JSON.parse life easier while unpacking.
The extreme simplification of the algorithm may have helped a lot as well.

JSONH Source Code

now on github!
And I had no time to create the equivalent C#, PHP, and Python version.
In any case you can see how simple is the logic and I bet anybody can easily reproduce those couple of loops in whatever programming language it is.
The minzipped size is 323 bytes but advantages over network calls can be massive. As example, if we check the console and the converted size in the test page, we can see the JSONH version of the same collection is 54% smaller ... and for a faster stringify and parse? ... it cannot be that good, isn't it :)

JSONH Is Suitable For


  • any RESTful API that returns homogenous collections

  • gzip on the fly costs too much due high traffic

  • map applications and routes, [{"latitude":1.23,"longitude":5.67},{"latitude":2.23,"longitude":6.67}]
    will be [["latitude","longitude"],1.23,5.67,2.23,6.67]

  • any other case I am not thinking about right now



As Summary

It is good to take old projects created a while ago and think what could be done better in current days. It's both about re-thinking with different skills and experience over real world cases. I am not sure I made everybody happy with this latest version but I am pretty sure I won't ask client or server side to be slower than native JSON + native gzip compression since at that point all advantages will be simply lost.
This revisited version of JSONH is surprisingly faster, smaller, and easier to implement/maintain than precedent one so ... enjoy if you need it ;)

2 comments:

Anonymous said...

Is there any way to compress this well?
{a: {a:{a:1}}}

Andrea Giammarchi said...

that's not an homogenous collection ... homogenous collections are typical database result sets:

{col1:colValue,col2:colValue,col3:colValue},
{col1:colValue,col2:colValue,col3:colValue},
{col1:colValue,col2:colValue,col3:colValue},
{col1:colValue,col2:colValue,col3:colValue},
...

your data makes almost no sense or at least I have never seen an object "doing" this

o.a.a.a ... I guess that's === WTF :D