Friday, November 10, 2006

PHP and JavaScript encoding comparison table

PHP and JavaScript are used together every day for synchronous or asynchronous interactions.

I often read functions or libraries that use the JS escape function, and I often read forum discussions where someone writes that to solve char problems you need to use the escape function.

That's wrong, as is the usage of urlencode instead of rawurlencode.

What's wrong with the PHP urlencode function?
urlencode works almost perfectly with different charsets but has a big problem, especially with one char: the space char " ".
This char is not percent-escaped but is replaced with a "+" sign.
If you write a URL with this sign instead of a space there are no problems, but if you write a response for JavaScript with this char, problems arise.

As you can see from the comparison table, JavaScript escape encodes the space char perfectly but doesn't escape the plus sign "+", which it treats as a literal char.
This means that when you unescape a urlencoded string, the plus signs are not converted back into spaces.

To solve this simple but common problem, when you need to encode a PHP string, always use the raw version: rawurlencode.
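Here is a minimal sketch of the difference as seen from the JavaScript side. The two string literals are what urlencode and rawurlencode return for "hello world"; the variable names are only illustrative.

```javascript
// What PHP would send for the string "hello world":
const fromUrlencode = "hello+world";      // urlencode("hello world")
const fromRawurlencode = "hello%20world"; // rawurlencode("hello world")

// JavaScript decoders leave "+" untouched, so only the raw version round-trips.
const decodedPlus = decodeURIComponent(fromUrlencode);
const decodedRaw = decodeURIComponent(fromRawurlencode);

console.log(decodedPlus); // "hello+world"
console.log(decodedRaw);  // "hello world"
```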

What's wrong with the JavaScript escape function?
It has many more problems than the PHP urlencode function because it is an old function (too old), with the best cross-browser compatibility but without Unicode support.
The first point is that it doesn't escape the plus sign, so if you send an escaped string with a simple addition like "1 + 2 = 3" to a PHP page, it will receive a string like "1   2 = 3", because the PHP urldecode function converts the plus sign into a space.
However, the really big problem with escape is that it doesn't support Unicode and it converts "correctly" only ASCII chars.
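The lost plus sign can be reproduced without a PHP page. In this sketch, phpStyleUrldecode is a hypothetical helper that only imitates what urldecode does with "+"; it is not a full port of the PHP function.

```javascript
// Hypothetical helper imitating PHP's urldecode: "+" becomes a space
// before the percent sequences are decoded.
function phpStyleUrldecode(s) {
  return decodeURIComponent(s.replace(/\+/g, " "));
}

const sum = "1 + 2 = 3";
const escapedSum = escape(sum);             // "1%20+%202%20%3D%203"
const encodedSum = encodeURIComponent(sum); // "1%20%2B%202%20%3D%203"

console.log(phpStyleUrldecode(escapedSum)); // "1   2 = 3" (plus sign lost)
console.log(phpStyleUrldecode(encodedSum)); // "1 + 2 = 3"
```

encodeURIComponent turns the plus sign into "%2B", so the receiving side cannot mistake it for a space.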

For example, at the end of the comparison table you can see that escape percent-encodes only chars in the range 0x00 - 0xFF, and that it doesn't convert the chars in the range 0x80 - 0xFF in a UTF-8 compatible way: it produces, for example, a string like "%E0", which is not the correct multibyte UTF-8 representation of char #224 ("%C3%A0").
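A short sketch with char #224 shows the mismatch. The variable names are illustrative, and the try/catch is there only to show that a UTF-8 aware decoder rejects the single byte.

```javascript
const aGrave = "\u00E0"; // char #224

const escapedGrave = escape(aGrave);             // "%E0", one Latin-1 style byte
const encodedGrave = encodeURIComponent(aGrave); // "%C3%A0", two UTF-8 bytes

// "%E0" on its own is not a valid UTF-8 sequence, so decoding it fails.
let decodeFailed = false;
try {
  decodeURIComponent(escapedGrave);
} catch (e) {
  decodeFailed = e instanceof URIError;
}

console.log(escapedGrave, encodedGrave, decodeFailed);
```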

If you read further down the table, you can see that escape cannot produce output for Unicode chars that is compatible across languages.

I mean that every multibyte char in the range 0x0100 - 0xFFFF will be converted into a fake JSON-like representation, %u0100 - %uFFFF, which is absolutely not a correctly encoded char and isn't compatible with any other language or with URL string specs either.
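As a quick check, here is the first char of that range, U+0100, next to the euro sign. This is only a sketch; the names are illustrative.

```javascript
const aMacron = "\u0100"; // first char of the 0x0100 - 0xFFFF range
const euro = "\u20AC";    // euro sign

// escape emits the non-standard %uXXXX form...
const escapedMacron = escape(aMacron); // "%u0100"
const escapedEuro = escape(euro);      // "%u20AC"

// ...while encodeURIComponent emits percent-encoded UTF-8 bytes.
const encodedMacron = encodeURIComponent(aMacron); // "%C4%80"
const encodedEuro = encodeURIComponent(euro);      // "%E2%82%AC"

// The UTF-8 form round-trips through decodeURIComponent.
const roundTrip = decodeURIComponent(encodedEuro) === euro;

console.log(escapedMacron, escapedEuro, encodedMacron, encodedEuro, roundTrip);
```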

You can use escape only if you send and receive simple strings like [0-9a-zA-Z] with some extra chars ( \w ), but it is absolutely not a portable and correct way to create ISO or UTF-8 interactions between PHP and JavaScript.


kentaromiura said...

there is a typo in the title "encodind" instead of encoding.

Andrea Giammarchi said...

oops ... thank You :E

UNV said...

So use encodeURIComponent() instead of escape().

Anonymous said...

Do you know of a function for encoding in PHP that does the same conversion as Javascript?


Andrea Giammarchi said...

use rawurlencode with PHP and decodeURIComponent with JS

Anonymous said...

uhmmmmmm yeah it does work when displayed, but still the "+" sign appears if I try to display it on a textfield or textarea

devrim said...

saved at least a few hrs - cheers andrea!