
Wednesday, September 09, 2009

double tweet - up to 280-char tweets!

Update
At least one other person did the same before this post; here is the proof.
The algorithm seems to be almost the same, except that the length is probably padded to an even number (a \x00 at the end), so you can use the double-tweet gadget to decode gareth's message as well :)
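A minimal sketch of why the double-tweet decoder also handles a zero-padded message. The padding step (`padPack`, a hypothetical name) only models the guess in the update above, that odd-length input gets a trailing \x00 before packing; it is not gareth's actual code. ASCIIPack and ASCIIUnpack are copied from the post below so the block runs on its own.

```javascript
// copies of the post's ASCIIPack / ASCIIUnpack
function ASCIIPack(s) {
  for (var r = [], i = 0, length = s.length, c; i < length; ++i) {
    c = s.charCodeAt(i);
    r.push(++i < length ? (c << 8) + s.charCodeAt(i) : c);
  }
  return String.fromCharCode.apply(String, r);
}
function ASCIIUnpack(s) {
  for (var r = [], i = 0, length = s.length, c; i < length; ++i) {
    c = s.charCodeAt(i);
    0xff < c ? r.push(c >> 8, c & 0xff) : r.push(c);
  }
  return String.fromCharCode.apply(String, r);
}

// hypothetical: pad odd-length input with \x00 so every char has a partner
function padPack(s) {
  return ASCIIPack(s.length % 2 ? s + '\x00' : s);
}

var packed = padPack('hi!');               // 'hi' -> one char, '!\x00' -> 0x2100
var decoded = ASCIIUnpack(packed);         // 0x2100 > 0xFF, so it splits into '!' and '\x00'
var clean = decoded.replace(/\x00+$/, ''); // drop the padding
```

The only extra step compared to decoding a double-tweet message is stripping the trailing \x00.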


Using the encode template and the same bookmarklet link, I would like to introduce my latest simple experiment: WebReflection::double-tweet (one click to give it a try, another one to remove it)

The Concept

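Twitter lets us type messages with a maximum of 140 characters. Fortunately, Twitter accepts Unicode characters and, still fortunately, it is extremely easy to pack two ASCII characters into a single Unicode one: an ASCII code fits in one byte while a JavaScript char holds 16 bits, so the first char goes into the high byte and the second into the low byte. Accordingly, 280 ASCII characters can be packed into 140.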
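To make the arithmetic concrete, this snippet packs and unpacks the single pair "Hi" by hand. The variable names are only illustrative.

```javascript
var hi = 'H'.charCodeAt(0);              // 72  (0x48)
var lo = 'i'.charCodeAt(0);              // 105 (0x69)
var code = (hi << 8) + lo;               // 18537 (0x4869): one char holds both
var packed = String.fromCharCode(code);  // a single char
// unpacking reverses it: high byte is the first char, low byte the second
var back = String.fromCharCode(code >> 8, code & 0xff); // 'Hi'
```

Since ASCII codes never exceed 0x7F, every packed code stays below 0x8000, well clear of the UTF-16 surrogate range (0xD800 to 0xDFFF).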

A Fast ASCII pack / unpack

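function ASCIIPack(s){
  // WebReflection Mit Style License
  // packs two one-byte chars into a single 16-bit char: (first << 8) + second
  // assumes ASCII input: char codes above 0xFF would corrupt the pair
  for(var r = [], i = 0, length = s.length, c; i < length; ++i){
    c = s.charCodeAt(i);
    // with an odd length, the last char is kept as is (code <= 0xFF)
    r.push(++i < length ? (c << 8) + s.charCodeAt(i) : c);
  }
  // one fromCharCode call for the whole array of codes
  return String.fromCharCode.apply(String, r);
}

function ASCIIUnpack(s){
  // WebReflection Mit Style License
  for(var r = [], i = 0, length = s.length, c; i < length; ++i){
    c = s.charCodeAt(i);
    // codes above 0xFF are pairs: high byte first, low byte second;
    // anything else is a single, unpacked char (so a pair starting
    // with \x00 cannot be told apart from a single char)
    0xff < c ? r.push(c >> 8, c & 0xff) : r.push(c);
  }
  return String.fromCharCode.apply(String, r);
}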
The main reason my code could be faster than other common encoders/decoders is that it uses apply to make a single String.fromCharCode call for the whole string, instead of one call per character. You can try to pack and unpack massive documents, but please do not forget I am using this for Twitter ;)
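One caveat on applying this to massive documents: engines limit how many arguments a single call can receive, and passing a very large array to apply can throw a RangeError. A minimal sketch, assuming a chunk size of 8192 (an arbitrary, conservative choice, not a documented limit) and a hypothetical helper name, keeps a handful of apply calls instead of one per character:

```javascript
// hypothetical helper: String.fromCharCode.apply over slices of `codes`
function fromCharCodes(codes) {
  var CHUNK = 8192, out = [];
  for (var i = 0; i < codes.length; i += CHUNK) {
    out.push(String.fromCharCode.apply(String, codes.slice(i, i + CHUNK)));
  }
  return out.join('');
}
```

With it, the last line of both ASCIIPack and ASCIIUnpack could become `return fromCharCodes(r);`, keeping most of the apply speed-up while staying safe on huge inputs.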

Safe Pack - What Is It

I had already tested this technique with a 280-character tweet, but I instantly realized that, except for the odd geek able to decode the message, nobody could receive, understand, or be involved in that tweet. In a few words, I somehow killed Twitter's beauty and concept, and that is why I added a safe option.
Basically, targets (specified via @target), keys (specified via #key), and URLs are preserved by default. It is still possible to disable this feature and send a 280 ASCII character tweet, but I think it does not make much sense.
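A minimal sketch of the safe option, assuming @targets, #keys and URLs can be recognised with a simple regular expression. The pattern and the name `safePack` are hypothetical, not the bookmarklet's actual code. Only the text between preserved tokens is packed. Preserved chars, like the unpaired last char of a segment, stay at or below 0xFF, so the plain ASCIIUnpack can still decode the whole tweet in one pass.

```javascript
// copies of the post's ASCIIPack / ASCIIUnpack
function ASCIIPack(s) {
  for (var r = [], i = 0, length = s.length, c; i < length; ++i) {
    c = s.charCodeAt(i);
    r.push(++i < length ? (c << 8) + s.charCodeAt(i) : c);
  }
  return String.fromCharCode.apply(String, r);
}
function ASCIIUnpack(s) {
  for (var r = [], i = 0, length = s.length, c; i < length; ++i) {
    c = s.charCodeAt(i);
    0xff < c ? r.push(c >> 8, c & 0xff) : r.push(c);
  }
  return String.fromCharCode.apply(String, r);
}

// hypothetical pattern: @target, #key or an http(s) URL
var KEEP = /(@\w+|#\w+|https?:\/\/\S+)/;

function safePack(s) {
  // split() with a capturing group puts the kept tokens at odd indexes
  var parts = s.split(KEEP);
  for (var i = 0; i < parts.length; i += 2) parts[i] = ASCIIPack(parts[i]);
  return parts.join('');
}

var tweet = 'hello @you see #js at http://x.y/z ok';
var packed = safePack(tweet); // '@you', '#js' and the URL stay readable
var decoded = ASCIIUnpack(packed);
```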

About Searches Or Search Engines

Since a tweet is inevitably short, all we need to search for is a combination of the clean word plus the packed one, trying both with and without a preceding space, to be sure the word has not been packed with a different pairing (a word starting at an odd offset is paired with the char before it). Moreover, if for some reason Twitter ever implemented this search option, something I honestly do not expect at all, its internal conversion would still be fast, assuming the search is performed via binary match and the unpack step is this simple and fast (I know, we are talking about billions of records, which is why I think they'll never do it).
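The search idea can be sketched as a list of "needles" per word: the clean word, the packed word, and the packed word with a leading space. The helper names are hypothetical and ASCIIPack is copied from the post. As a limitation of this heuristic, a word at an odd offset that is not preceded by a space will not be found.

```javascript
// copy of the post's ASCIIPack
function ASCIIPack(s) {
  for (var r = [], i = 0, length = s.length, c; i < length; ++i) {
    c = s.charCodeAt(i);
    r.push(++i < length ? (c << 8) + s.charCodeAt(i) : c);
  }
  return String.fromCharCode.apply(String, r);
}

// hypothetical helper: the strings worth looking for when searching `word`
function searchNeedles(word) {
  var needles = [word]; // the clean word (e.g. preserved text)
  [word, ' ' + word].forEach(function (w) {
    // a trailing unpaired char is paired with whatever follows it
    // in the tweet, so only the complete pairs can be matched
    var even = w.length - (w.length % 2);
    if (even > 0) needles.push(ASCIIPack(w.slice(0, even)));
  });
  return needles;
}

function tweetMatches(tweet, word) {
  return searchNeedles(word).some(function (n) {
    return tweet.indexOf(n) !== -1;
  });
}

var tweet = ASCIIPack('I love javascript a lot');
```

Here "love" starts at an even offset and matches its plain packed form, while "javascript" starts at an odd offset and only matches through the " javascrip" variant.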

Conclusion

If I am not too late with this double-tweet idea, I guess we can add a sixth way to send more than 140 characters via Twitter :)

P.S. If you pass a link with some text after the anchor (#), that text will automatically be put in the text area.

P.S.

As somebody suggested, with a few changes the function could theoretically pack 3 ASCII characters into a single char in the range 0 - 0xFFFFFF (3 bytes per char).
The problem is that JavaScript chars are 16 bits wide (0 - 0xFFFF) and String.fromCharCode simply drops the higher bits, so there is no point in even writing that function ;)
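A quick check of those limits in a current engine. String.fromCodePoint arrived later, with ES2015, and even it stops at 0x10FFFF, the last Unicode code point, storing such chars as two 16-bit units. So a 3-byte "char" is not available either way.

```javascript
// fromCharCode keeps only the low 16 bits of each argument
var truncated = String.fromCharCode(0xfffffe).charCodeAt(0); // 0xfffe, i.e. 65534
// fromCodePoint accepts up to 0x10FFFF, stored as two UTF-16 code units
var astral = String.fromCodePoint(0x10ffff);
var astralLength = astral.length; // 2
// anything above 0x10FFFF, such as 0xFFFFFF, is rejected
var rejected = false;
try {
  String.fromCodePoint(0xffffff);
} catch (e) {
  rejected = e instanceof RangeError;
}
```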

24 comments:

Anonymous said...

This is pretty awesome

Anonymous said...

This is actually very nice. I like it.

Good job.

V1 said...

If you really wanna go psycho you can use it to pack your javascript ;D and unpack it. Big overhead, but doable ;D

Andrea Giammarchi said...

V1, unfortunately the number of bytes will still be the same.
In ASCII each char takes one byte (0 - 0xFF); via double-tweet each char goes up to 0xFFFF, two bytes, and we need to serve it as Unicode (generally UTF-8). Moreover, the Huffman tree (I am talking about gzip or deflate) could find it difficult to spot char redundancy, since pairs are forced into a single char.
I am not even sure about performance, but I'll try to pack and unpack ExtJS straight away :D
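A quick way to check the byte argument, assuming a runtime with TextEncoder (modern browsers and Node.js). Packed pairs of printable ASCII land between 0x2020 and 0x7E7E, where UTF-8 needs 3 bytes per char. So the UTF-8 payload grows by about half, while the UTF-16 size matches the plain one-byte ASCII size. ASCIIPack is copied from the post, and `utf8Bytes` is a hypothetical helper.

```javascript
// copy of the post's ASCIIPack
function ASCIIPack(s) {
  for (var r = [], i = 0, length = s.length, c; i < length; ++i) {
    c = s.charCodeAt(i);
    r.push(++i < length ? (c << 8) + s.charCodeAt(i) : c);
  }
  return String.fromCharCode.apply(String, r);
}

// hypothetical helper: size of a string once encoded as UTF-8
function utf8Bytes(s) {
  return new TextEncoder().encode(s).length;
}

var source = 'var x = 1;';
var packed = ASCIIPack(source);
var plainBytes = utf8Bytes(source);  // 10: one byte per ASCII char
var packedUtf8 = utf8Bytes(packed);  // 15: five packed chars, 3 bytes each
var packedUtf16 = packed.length * 2; // 10: same as the plain ASCII bytes
```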

Andrea Giammarchi said...

... indeed, IE got stuck just packing it, no idea how long unpacking would take ...

Lars Gunther (itpastorn) said...

About the idea:

Neat, but...

What about those of us who use non-ASCII characters? (We do in Sweden...)

About the execution:

How about using 3-byte characters? That could let you expand a US-ASCII-only tweet to 420 characters....
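On the non-ASCII question: ASCIIPack assumes every char code fits in one byte, and codes above 0xFF corrupt the pair. A minimal guard, with the hypothetical names `canPack` and `maybePack`, could check the input first and leave the tweet as plain text otherwise. It accepts only 7-bit ASCII. Latin-1 letters such as å (0xE5) or Ü (0xDC) do fit in a byte, but as the first char of a pair they would produce Private Use (0xE5xx) or lone surrogate (0xDCxx) code units, which are risky to send.

```javascript
// copy of the post's ASCIIPack
function ASCIIPack(s) {
  for (var r = [], i = 0, length = s.length, c; i < length; ++i) {
    c = s.charCodeAt(i);
    r.push(++i < length ? (c << 8) + s.charCodeAt(i) : c);
  }
  return String.fromCharCode.apply(String, r);
}

// hypothetical guard: true when every char code is 7-bit ASCII (0 - 0x7F)
function canPack(s) {
  return /^[\x00-\x7f]*$/.test(s);
}

// non-ASCII text is left untouched instead of being corrupted
function maybePack(s) {
  return canPack(s) ? ASCIIPack(s) : s;
}
```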

Andrea Giammarchi said...

sorry Lars, the actual problem is that JavaScript does not support char codes above 0xFFFF ...

String.fromCharCode(0xfffffe).charCodeAt(0)
65534

that's why ;)
so I guess I should create a server-side version, but then goodbye bookmark ...

Lars Gunther (itpastorn) said...

Hmmm. You learn something new everyday! Perhaps this should be reported as a bug. I assume you're testing in the Fox.

Andrea Giammarchi said...

I think it's JavaScript, man, not Firefox; even JSON does not encode anything above 0xFFFF in JavaScript:
'\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
It should be specified somewhere in ECMAScript 3rd Edition; probably ES5 will support it, and I already have the function :D
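The same 16-bit limit is visible from the escaping side. The expression in the comment above works on one char code at a time, so a code point above 0xFFFF comes out as two \uXXXX escapes, one per UTF-16 code unit (a surrogate pair). The sketch below uses a hypothetical `escapeAll` helper built from that same expression, plus String.fromCodePoint, which needs an ES2015 engine.

```javascript
// hypothetical helper: apply the '\\u' + 4-hex-digit expression to every code unit
function escapeAll(s) {
  var out = '';
  for (var i = 0; i < s.length; i++) {
    out += '\\u' + ('0000' + s.charCodeAt(i).toString(16)).slice(-4);
  }
  return out;
}

var face = String.fromCodePoint(0x1f600); // one code point, two code units
var escaped = escapeAll(face);            // '\\ud83d\\ude00'
```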

Andrea Giammarchi said...

P.S. Actually, IE took only a few milliseconds to unpack, so we should probably give this pack a try with compressors

Andrea Giammarchi said...

ExtJS All: 168.09 via gzip (434ms), versus 233.61 via gzip over packed ASCII, with 9 seconds to inflate it in the browser!!!

Shea Frederick (VinylFox) said...

Seriously Andrea, you have too much free time ;) - nicely done.

Andrea Giammarchi said...

well, the code took 10 minutes on top of what was already in encode ... not a big deal ;)

and ideas usually come on the tube, on my way to work or back ... that is the most productive moment of my day :D

Ricardo Tomasi said...

You're almost a month late :D

http://maxitweet.com/ (check the XX5 option)

Andrea Giammarchi said...

Ricardo, I do not see how maxiTweet is more efficient; also, both the normal and the XX5 modes encode targets, keys, and URLs ... well, I am not that late then :P

.mario said...

Maybe here's some more inspiration...

http://sla.ckers.org/forum/read.php?24,29866

Andrea Giammarchi said...

.mario, I'll leave the link as long as you guys do not think you invented the bit shift. I provided a pack/unpack function, which is not there; my concept has "specs", and I leave targets, keys, and URLs clean.

In a few words, I did not use anything on that page to create double-tweet, so to me that page is kinda unrelated, but still interesting :)

.mario said...

That's why I used the term 'inspiration'. No irony attached.

Andrea Giammarchi said...

I'll give you another one, then: in an RGB color you can store 3 ASCII characters. I have a function able to create PNGs as messages with a fake size compression, just like this technique (the number of bytes is unchanged in both cases) :)

if the table is standard ASCII, there is also the alpha channel, up to 127 ;)
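A minimal sketch of the RGB idea. It assumes the pixels live in a plain RGBA byte array, like the one canvas ImageData exposes, and leaves out PNG encoding. The helper names are hypothetical. Three ASCII chars go into R, G and B with alpha fixed at 255. The variant described above would also use alpha for a fourth char.

```javascript
// hypothetical: 3 ASCII chars per pixel, stored as R, G, B; alpha fixed at 255
function textToPixels(s) {
  var pixels = new Uint8ClampedArray(Math.ceil(s.length / 3) * 4);
  for (var i = 0, p = 0; i < s.length; i += 3, p += 4) {
    pixels[p] = s.charCodeAt(i);
    // 0 pads the last pixel when the length is not a multiple of 3
    pixels[p + 1] = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
    pixels[p + 2] = i + 2 < s.length ? s.charCodeAt(i + 2) : 0;
    pixels[p + 3] = 255;
  }
  return pixels;
}

function pixelsToText(pixels) {
  var codes = [];
  for (var p = 0; p < pixels.length; p += 4) {
    codes.push(pixels[p], pixels[p + 1], pixels[p + 2]);
  }
  // drop the \x00 padding added to the last pixel
  return String.fromCharCode.apply(String, codes).replace(/\x00+$/, '');
}

var pixels = textToPixels('hello'); // 2 pixels, 8 bytes
```

As the comment says, nothing is really compressed: "hello" still costs one byte per char, plus the alpha bytes.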

.mario said...

Haha - that is nice indeed!

Andrea Giammarchi said...

.mario, and gareth if he is reading: I honestly did not take any inspiration for double-tweet from that thread. These techniques have been used for ages; the simple JSON encoder/decoder in PHP uses byte compression/decompression to transform multibyte sequences into a single Unicode char. What I mean is: the fact that somebody created a competition about a JavaScript alert without the ASCII table does not mean he invented compression, or that he should be mentioned for every totally different idea which simply uses what every programming language has always had.

This is just to make things as clear as possible; I have never had a problem giving proper credit when and where necessary.

Best Regards

.mario said...

Hey Andrea - I never said that ;) No need to be offended. And I really like the idea with the image. I'm currently playing with same-domain images and the JS/DOM canvas methods.

Unknown said...

I think gareth is talking about:
https://twitter.com/garethheyes/status/3362857317
https://twitter.com/garethheyes/status/3362912100

From 1 month ago

Anyway, we all agree that the bookmark is 100% your code, congrats! It works really well, as I already said.

Andrea Giammarchi said...

OK, I may have missed that one, and now I know why gareth was pissed off. Well, I'll edit this post linking that tweet, but if gareth does not create an entry, a packer/unpacker, or a post on his blog, how can we follow "his ideas"?

The algorithm he used gave me a \x00 at the end, though. Still, a tweet can easily get lost among many others, so there is no reason to blame anyone, especially without links; at least you provided them.

Post updated, Regards