Introduction
PHP has several in-core callbacks that help us with daily deployment, debugging, and improvements. At the same time, PHP is loaded with intrinsic "gotchas", too often hard to understand, hard to explain, or simply hard to manage. One common problem is about debugging, caching, or freezing, and the way we would like to debug, cache, or freeze variables. By freezing, I mean those procedures able to store a variable and its status and regenerate it later, in order to reuse that variable, to understand what happened at that moment with that variable, or just to speed up expensive tasks already completed.
The Problem
One of the most common procedures to freeze variables is serialization, performed in core via the well-known serialize function. Please consider this example:
$person = new Employee('Mr. Lucky Me');
// ... do some useful task
myCompanyFreezer($person);

// the myCompanyFreezer function
function myCompanyFreezer(Employee $p){
    $company = Company::getInstance();
    // note that this company has exclusive
    // control over the employee work (reference)
    $company->employees[] = &$p;
    // on the other hand employee
    // has finally a company to work with
    // but no control over the company
    $p->company = $company;
    // update and freeze the employee status
    $company->add(serialize($p));
}
So, while the company has an exclusive contract, and each employee is totally under company control, the employee has nothing to do with company decisions, but can proudly say: "Look at me, I work for Company::getInstance()!".
But since serialization is recursive, we will find the company instance present as the employee's "company" property.
The problem is that the company instance has an "employees" property which contains one or more employees, including the employee Mr. Lucky Me.
And so on and on until infinite recursion, a massive waste of resources and ... HALT! serialize is clever enough to understand when there are recursions and, rather than going on with nested serializations, it simply puts a reference to the already serialized object.
Got a headache already?
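To make the cycle concrete, here is a minimal runnable sketch. The Employee and Company classes below are hypothetical stand-ins (the post never shows them), and the employee is stored by plain assignment rather than by reference, to keep the example small.

```php
<?php
// Hypothetical minimal classes, only for illustration.
class Company
{
    public $employees = array();
    private static $instance;

    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }
}

class Employee
{
    public $name;
    public $company;

    public function __construct($name)
    {
        $this->name = $name;
    }
}

$person = new Employee('Mr. Lucky Me');
$company = Company::getInstance();
$company->employees[] = $person; // company -> employee
$person->company = $company;     // employee -> company (the cycle)

// serialize does not loop forever: the second visit becomes a pointer
$frozen = serialize($person);
echo $frozen, PHP_EOL;

// the cycle is rebuilt on the way back
$copy = unserialize($frozen);
var_dump($copy->company->employees[0] === $copy); // bool(true)
```

The serialized string ends the nested array with r:1; instead of a second copy of the employee, which is the pointer described in the next section.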
Two Different Kind Of Recursions
Since the main purpose of serialize is to freeze a variable's status, and PHP is still a bit hybrid about references and shallow copies, serialize can produce two kinds of pointer: r and R. The lowercase "r" marks a recursion (the same object again), while the uppercase "R" marks a recursion by reference.
// serialized recursion - the ABC
$o = new stdClass;
// recursion
$o->normal = $o;
// recursion by reference
$o->reference = &$o;
echo serialize($o);
// O:8:"stdClass":2:{s:6:"normal";r:1;s:9:"reference";R:1;}
We should focus on r:1; and R:1;.
While the "r", or the "R", means there is a recursion, the unsigned integer indicates the exact value that "caused" that recursion.
When we perform an unserialize operation, the parser obviously cannot just de-serialize as it reads, because if we have an instance or an array, its internal values have to be unserialized first, ready to be assigned.
This simply means that the number after the r or R is not a sequential position in the string, and there is no relation with the length of the string, only with the de-serialization process: it counts values in the order they are processed.
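A small sketch may help to read those numbers. It uses two arrays of my own (not from the example above): one holding the same object twice, and one holding a PHP reference between two elements. In both cases the container itself counts as value 1, so the pointer targets value 2.

```php
<?php
// Same object stored twice: the second slot becomes "r" (object pointer).
$shared = new stdClass;
$list = array($shared, $shared);
echo serialize($list), PHP_EOL;
// a:2:{i:0;O:8:"stdClass":0:{}i:1;r:2;}

// Two elements bound by reference: the second slot becomes "R".
$pair = array('x' => 1);
$pair['y'] = &$pair['x'];
echo serialize($pair), PHP_EOL;
// a:2:{s:1:"x";i:1;s:1:"y";R:2;}

// Both relations survive a round trip.
$listCopy = unserialize(serialize($list));
var_dump($listCopy[0] === $listCopy[1]); // bool(true)

$pairCopy = unserialize(serialize($pair));
$pairCopy['y'] = 5;          // writing through the restored reference
var_dump($pairCopy['x']);    // int(5)
```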
What Is Wrong With Serialize
First of all, PHP serialization is not human readable the way JSON, for example, or XML is. If we use this format to debug our application we will definitely need an extra layer able "to introduce" the object to us as it is. In a few words, what we need is something that is not serialized.
Moreover, serialize and unserialize try to be as reliable as possible, and for this reason these functions are 3 times slower than json_encode or json_decode.
The truth is that JSON, as is, cannot compete with serialize and unserialize, because the simplicity of its protocol makes it unable to store class names, lambdas, or the public, protected, and private visibility of instance properties.
Last, but not least, PHP's JSON encoder is a bit ambiguous, because an array that is not fully populated is converted into an object:
define('MY_WELCOME_STRING', 1);
$a = array();
$a[MY_WELCOME_STRING] = 'Hello World';
echo json_encode($a);
//{"1":"Hello World"}
// in JavaScript would have been
// [null,"Hello World"]
// where square brackets mean Array, and not Object
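The two JSON limits mentioned above (lost class information and the array/object ambiguity) can be checked with a short sketch. The Person class is a hypothetical example of mine; array_values is shown only as one possible way to force a list, not as a recommendation from the post.

```php
<?php
// Hypothetical class with one property per visibility level.
class Person
{
    public $name = 'Mr. Lucky Me';
    protected $salary = 1000;
    private $badge = 'A1';
}

// Only public properties are encoded, and the class name is gone.
$json = json_encode(new Person);
echo $json, PHP_EOL;            // {"name":"Mr. Lucky Me"}
$back = json_decode($json);
echo get_class($back), PHP_EOL; // stdClass

// Sequential keys starting at 0 give a JSON array, anything else an object.
echo json_encode(array('a', 'b')), PHP_EOL;               // ["a","b"]
echo json_encode(array(1 => 'b')), PHP_EOL;               // {"1":"b"}
echo json_encode(array_values(array(1 => 'b'))), PHP_EOL; // ["b"]
```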
So again, another serializer is not worth it to freeze variables. What is left for us?
var_export
var_export() gets structured information about the given variable. It is similar to var_dump() with one exception: the returned representation is valid PHP code.
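As a quick illustration of what "valid PHP code" means here, this sketch exports a plain array (sample data of mine, without recursion) and reads it back with eval. In a real project the exported code would more likely be written to a file and loaded with include; eval is used only to keep the example in one piece.

```php
<?php
// Sample data without any recursion.
$settings = array(
    'name'   => 'Mr. Lucky Me',
    'tasks'  => array(1, 2, 3),
    'active' => true,
);

// With true as second argument var_export returns the code instead of printing it.
$code = var_export($settings, true);
echo $code, PHP_EOL;

// The exported string is a PHP expression, so it can be evaluated back.
$restored = eval('return ' . $code . ';');
var_dump($restored === $settings); // bool(true)
```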
EUREKA! There is a core-level function whose aim is to serialize PHP into valid PHP. How could we ask for something more efficient? I mean: "native performance to serialize and native performance to get it back, it must be the solution"!
It's not!
$o = new stdClass;
$o->normal = $o;
var_export($o);
Fatal error: Nesting level too deep - recursive dependency?
Nice one! From bogus bug 39116 and Derick's reply:
We can't change this by adding some text when this happens, as that
would not result in valid PHP code in that case (which is the purpose of
this function).
Let me summarize:
- serialize/unserialize understand recursions almost without problems, but unserialize is slow and both are PHP-specific
- json_encode is not compatible with recursion and, as a general-purpose PHP serializer, it loses too much PHP information
- var_export would be perfect, but in PHP we cannot manually write a recursion that is valid and correctly parsed
- var_dump is magic but its output is not reliable: *RECURSION* won't be recognized as a valid PHP value
- I already had a headache at line 10 of this post, and now I am still here to see there are no solutions?
How To Remove Recursion Without Losing It
Well, there are different solutions but, performance-wise, we do not have too many choices. A first solution could be a maximum nesting level, where an object cannot serialize its properties "forever" and after N times it has to stop! This technique has more cons than pros, for these reasons:
- it could require a manual parser, slower and, due to the nature of the problem, not that simple to maintain or debug
- it could be extremely redundant, causing a lot of wasted resources, due to its artificial stupidity: a recursion should never be serialized again, being indeed a recursion, so doing it is a waste of time, references, and resources
- as mentioned 5 words ago, in this way we are losing the recursion, so we should stop saying we are serializing ...
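The depth-limit approach criticized in the list above can be sketched as follows. The limited_copy function and its marker string are hypothetical names of mine, and the sketch only reads public properties; it is here to show the redundancy, not to suggest an implementation.

```php
<?php
// Hypothetical helper: copies a value into plain arrays and gives up
// once $maxDepth nested containers have been opened.
function limited_copy($value, $maxDepth)
{
    if (is_object($value)) {
        // only public properties are visible from here
        $value = get_object_vars($value);
    }
    if (!is_array($value)) {
        return $value;
    }
    if ($maxDepth <= 0) {
        return '*MAX DEPTH*';
    }
    $copy = array();
    foreach ($value as $key => $item) {
        $copy[$key] = limited_copy($item, $maxDepth - 1);
    }
    return $copy;
}

$o = new stdClass;
$o->name = 'loop';
$o->self = $o;

// The same object is copied three times before the limit kicks in,
// and the final marker no longer says which object it pointed to.
print_r(limited_copy($o, 3));
```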
// compare plain serialize with serialize after remove_recursion
$o = new stdClass;
$o->n = $o;
$o->r = &$o;
echo serialize($o), "\n",
    serialize(remove_recursion($o));
Produced output:
O:8:"stdClass":2:{s:1:"n";r:1;s:1:"r";R:1;}
O:8:"stdClass":2:{s:1:"n";s:12:"?recursion_1";s:1:"r";s:12:"?Recursion_1";}
Et voilà! Problem solved! ... but what is that?
The remove_recursion function has been introduced in the latest Formaldehyde Project version, 1.05, and its purpose is to make any kind of trace, backtrace, or logged information debuggable.
The resulting var_export will be something like this:
stdClass::__set_state(array(
'n' => '' . "\0" . 'recursion_1',
'r' => '' . "\0" . 'Recursion_1',
))
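How such a replacement could work is sketched below. This is NOT the Formaldehyde implementation, only a naive approximation of mine under a loud assumption: no string inside the variable contains text that looks like a serialized pointer (for example ";r:1;"), otherwise the pattern would match in the wrong place. The name naive_remove_recursion is hypothetical.

```php
<?php
// Naive sketch: turn every r:N; or R:N; pointer of the serialized form
// into a string marker "\0recursion_N" or "\0Recursion_N".
function naive_remove_recursion($value)
{
    $flat = preg_replace_callback(
        // a pointer always follows the end of a key or of another value
        '#(?<=[;{}])(r|R):([0-9]+);#',
        function ($match) {
            $marker = "\0" . $match[1] . 'ecursion_' . $match[2];
            return 's:' . strlen($marker) . ':"' . $marker . '";';
        },
        serialize($value)
    );
    return unserialize($flat);
}

$o = new stdClass;
$o->n = $o;

$flat = naive_remove_recursion($o);
var_dump($flat->n === "\0recursion_1"); // bool(true)

// No recursion is left, so other exporters accept the result.
$asPhp  = var_export($flat, true);
$asJson = json_encode($flat);
```

No loop or recursive call walks the structure here: serialize does the traversal once, and the rest is a single pass over a string.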
The chosen form to store a recursion is exactly the same used by PHP for lambdas:
// note: create_function() is deprecated since PHP 7.2 and removed in PHP 8.0
echo serialize(create_function('',''));
// s:9:"?lambda_1";
In PHP a lambda is stored as a "protected" string, one that starts with a NUL byte (printed as "?" above), and the number at the end of the string "lambda_" indicates its reference. Until we restart our webserver, lambda functions will persist in the entire PHP context, which is why it is possible to serialize lambda functions and unserialize them, as long as the environment does not change or restart.
The additional distinction between "r" and "R" in the recursion marker is necessary to avoid losing information about references.
On the other hand, recursions are truly useless when we debug or export variables, but they can always be present.
PHP will not understand my chosen syntax but, if and only if necessary, we can always use a function like this to recreate correct recursions:
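function recreate_recursion($o){
    return unserialize(
        preg_replace(
            // matches the "\0recursion_N" and "\0Recursion_N" string markers
            '#s:[0-9]+:"\x00(r|R)ecursion_([0-9]+)";#',
            // and turns them back into r:N; or R:N; pointers
            '\1:\2;',
            serialize($o)
        )
    );
}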
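A short usage sketch, with markers written by hand instead of coming from remove_recursion (which is not reproduced here). The function body is the one from the post; the two sample variables are mine.

```php
<?php
function recreate_recursion($o)
{
    return unserialize(
        preg_replace(
            '#s:[0-9]+:"\x00(r|R)ecursion_([0-9]+)";#',
            '\1:\2;',
            serialize($o)
        )
    );
}

// An object whose property holds the marker for "value 1", the object itself.
$flat = new stdClass;
$flat->n = "\0recursion_1";
$restored = recreate_recursion($flat);
var_dump($restored->n === $restored); // bool(true)

// An array whose second element holds the marker for a reference to value 2,
// which is the first element.
$flatPair = array('x' => 1, 'y' => "\0Recursion_2");
$pair = recreate_recursion($flatPair);
$pair['y'] = 5;        // writes through the recreated reference
var_dump($pair['x']);  // int(5)
```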
Pros
- we can finally forget every kind of recursion problem, letting PHP understand them via serialize, without doing anything
- performance and produced size will be better than any other nesting-based parser, thanks to a simple parser which ... surprise!!! ... does not use recursion at all!
- once we pass a variable through formaldehyde_remove_recursion we can transform that variable into whatever format, including var_export, JSON and XML, forgetting recursion headaches
Cons
- being based on serialize and unserialize, the transformation could implicitly call, if present, both __sleep and __wakeup events. This is going to happen in any case if we use serialize/unserialize, but if we serialize a transformed variable __sleep will be called twice
- it could require extra effort to regenerate internal recursions, but in any case it is better than losing them forever, as most of us have done until now
- the conversion assumes that a serialized string will not contain an exact match of the marker, such as a manually written string. This is actually the same assumption PHP developers made about serialized lambdas.
Conclusion
With a lightweight function, and after this post, I hope we can better understand recursion problems and the related serializations. My suggestion is to give Formaldehyde a try but, as long as the MIT-style license is respected, you can extract its internal formaldehyde_remove_recursion. Any questions? :)
extremely interesting, really i need to cleanup often my objects as i like to inspect them via var_dump(). tnks for sharing this inspired idea!
Andrea, as always you write "waist" (girovita) instead of "waste" (spreco)...
er ... LOL, one was correct though. Any other comment? :D
That has to be the dumbest use of serialize I've ever seen. Serialize wasn't meant to be human readable or to "freeze state", it was meant as a simple way to store PHP values and restore them with unserialize mainly with databases. And as for trying Formaldehyde I'll stick with set_error_handler with a FirePHP wrapper.
Shawn, I take that as a compliment, since I have simply solved an extremely common problem via something native, going beyond normal conventions - I call it a strike, usually, so thanks.
About Formaldehyde, I have contacted the FirePHP author but the point is that FirePHP is a logger, not a debugger, so I am not sure how we will integrate these two projects, but I am still waiting for a reply, otherwise I am going to comment on your loved FirePHP and everything I did not like about it: the reason I have created Formaldehyde, which does not suffer, thanks to my new and fresh idea, from recursions (nor from redundant code via nested encoding limits)
Regards
Your Formaldehyde isn't a debugger either. It's simply logging errors which is exactly what is possible with wrapping FirePHP's error method within set_error_handler. I don't see anywhere in Formaldehyde where you can set breakpoints or do anything else that constitutes debugging instead of just logging.
FirePHP fails with most problematic errors, it is in the source code.
Formaldehyde does not fail.
FirePHP requires manual logging implementation, Formaldehyde does not.
I think you are missing the whole point about what is Formaldehyde and what is FirePHP as well, so I can suggest this page hoping this will make things more clear.
I personally asked the FirePHP author to integrate Formaldehyde just because they are different but if you want to go on, that's fine, it's a sort of habit here for people that discover this blog "a bit late"
Regards
P.S. Shawn, you did not get the meaning of this post and Formaldehyde has a dedicated one, please keep talking about this post, if you have questions.
I would like to underline that nobody said serialize was created to be human readable, I wrote about human readability because if we want to debug something, which is frozen as a string, whatever you say, serialize is not "comfortable" for debug purposes because it requires to be unserialized to be understood.
So, if you read again this post and formaldehyde sections I'll be more than happy to answer your question.
If you are here to say: why you write bullshit, without any code, argument, and talking for third parts, please feel free to leave ASAP this blog and do not come back, thanks for understanding, and see you next trick.
As far as I know serialization is used in $_SESSION storage between requests... Or am I wrong?
yes Giorgio, only if no session handler has been defined.
But how does session cope with this post?
Because using sessions is using serialization transparently (unless another storage is provided), so recursion "issues" are present. For instance, say you memorize in session an entity loaded from an ORM like Doctrine 2 or Zend_Db_Mapper and this entity has a lazy loading collection as a property... Since it has a reference to the entity manager every single object will be serialized and this has to be taken into account with a detached/merge process like Doctrine does.
Giorgio ... still, so what? This post is about a technique to serialize without recursion problems and without losing recursions, when and if necessary, in order to be able to transform variables into JSON, XML, whatever ... still without losing recursion properties and avoiding useless nested serializations.
So, what is your concern about SESSION, which is not affected at all for a debug purpose like debug_backtrace, Exception->getTrace, or other?
As for your serialization comments: It's not supposed to be comfortable to debug because you're not supposed to bloody debug it.
Correct, FirePHP does not automatically log errors that's why you do something like
function logPhpError($code, $error, $file, $line, $context)
{
    if(!($code & ini_get('error_reporting'))) return;
    $logItem = array(
        'data' => $error,
        'type' => 'error',
        'file' => $file,
        'line' => $line
    );
    fb($logItem, 'error');
}
set_error_handler('logPhpError');
Then in your .htaccess you'd do
php_value auto_prepend_file "/path/to/file/blah.php"
.htaccess, which requires a proper parsing for each page call? prepend file? which means you cannot escape from fb ?
I think you are still confused about what Formaldehyde is ... it's for Ajax calls, and it is not supposed to be in every page or goodbye performance.
Your main error is to compare these two files completely different. I send you back again to that page.
serialize has been used to debug since PHP 3, I am not sure why you have to insist with a pointless comment but you can go on, I do not mind, but I'll stop to reply off topic comments.
You're debugging, if you're stupid enough to leave the auto_prepend_file in when you launch to production then you deserve the performance hit.
And they aren't completely different. The code I provided acts almost identically to Formaldehyde, AJAX call or not save the fact that with Formaldehyde you have to do a console.log() in your javascript and FirePHP is done automatically since it's sent through the HTTP headers.
so you missed the second image, and you are talking without even knowing FirePHP which requires a plug-in specific for Firefox indeed ... ah ah ah, you are so funny, please go on!
Indeed, it does require a Firefox specific add-on which happens to be widely used, has a large community, works extremely well and only requires that you use PHP to log, has libraries for almost all of the most widely used frameworks and has excellent documentation.
As a moot point I don't know any PHP developer that doesn't use Firefox while at least coding (Though of course we have to switch to IE to fix their bullshit)
Shawn, stop complain here and open your mind: Formaldehyde JS - THe Circle Is Close
You are not stats, you are a single developer and you do not know how many configurations, situations, and problems are related with Ajax debug. I am simply trying to solve my problems making my code public, if you do not like it you do not have to use it and please find a new blog to complain 'cause I am kinda bored here.
Regards
Well im not convinced that i would like to use that solution. I think i know where you are getting with it but seems a bit too complicated to me. I like simple solutions that just work. Having to know hacks about internal implementations of each method or combining them or doing it manually is kind of adding complexity and increasing risk that someone else reading the code will never figure out what was the intention and what are the key aspects of the new method.
On the other hand if that is what you need and it solves your problems then cool, i dont have a problem with that.
Personally i just like to keep stuff simple. If its exceptional errors logging just use serialize as i dont expect 10 errors per second ;- ) otherwise it means site is screwed up any way. If its for storage ... then again ... the less magic calls are involved or hacks the easier to follow the 'default'.
But thanks for the article it was actually interesting and detailed. I liked it :- )
THANKS!
Hello Andrea, it's a smart solution to avoid the recursion problem of other core functions' output.
The first application that comes into my mind (and the one I was searching for before landing here :) is the object storage for caching purposes: has already been implemented some class to cache objects which makes use of this solution?