Web Reflection: XML To HTML Snippet

Sunday, November 29, 2009

XML To HTML Snippet

This John Resig post about nodeName demonstrates once again how treacherous edge cases can be in JavaScript.
I must agree 100% with @jdalton: frameworks or selector libraries should not be concerned about these cases.

First of all, there is no universal solution, so any effort that slows libraries down won't be perfect anyway. Why bother, then?

Secondly, I cannot even understand why on earth somebody would need to adopt XML nodes in that way.
Granted, importNode and adoptNode should not be that buggy, but I have always used XSL(T) to inject XML into HTML and I have never had problems.

Different Worlds

In an XML document a tag is just a tag. It does not matter which name we choose or which JavaScript event we attach: XML is simply a data protocol, or transport, and nothing else.
A link, a div, a head, even the html node itself, means nothing special in XML, so again: why do we need to import nodes in that way?
In Internet Explorer, XML nodes expose an xml property. For HTML represented inside XML documents, this property is the fastest and simplest way to move a node and everything it contains into an element, via innerHTML.
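Here is a minimal sketch of that shortcut, using a hypothetical getOuterXml helper. It reads the non-standard xml property when it is present (old IE / MSXML). Otherwise it falls back to the standard XMLSerializer, so the resulting markup can be assigned to innerHTML.

```javascript
// Hypothetical helper (not from the post): returns the markup of an XML node.
function getOuterXml(node) {
  // old Internet Explorer (MSXML) exposes the serialized node as `xml`
  if (typeof node.xml === "string") return node.xml;
  // other browsers provide the standard XMLSerializer
  if (typeof XMLSerializer !== "undefined") {
    return new XMLSerializer().serializeToString(node);
  }
  throw new Error("no way to serialize this node");
}

// Usage in a browser, assuming `container` is an HTML element
// and `section` is an XML element node:
// container.innerHTML = getOuterXml(section);
```

Keep in mind that innerHTML runs the string through the HTML parser. XML-only syntax such as a self-closing `<div/>` is therefore not treated as an empty element.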
Moreover, namespaces are a problem: we cannot easily represent them in an HTML document. In any case, we need to be sure about the data we are representing.
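Before importing, a quick scan can tell whether a fragment relies on namespaces at all. The sketch below is a hypothetical helper, findNamespaceUsage. It works on anything exposing the standard DOM nodeType, nodeName, attributes and childNodes properties. It only looks at qualified names (prefixes and xmlns declarations), not at namespaceURI, so treat it as a rough check.

```javascript
// Hypothetical helper (not from the post): collects prefixed element names
// and namespace-related attribute names found in an XML element tree.
function findNamespaceUsage(node, found) {
  found = found || [];
  if (node.nodeType !== 1) return found; // only elements carry names/attributes
  if (node.nodeName.indexOf(":") !== -1) found.push(node.nodeName);
  var attributes = node.attributes || [];
  for (var i = 0; i < attributes.length; ++i) {
    var name = attributes[i].name;
    // default namespace declaration, prefix declaration, or prefixed attribute
    if (name === "xmlns" || name.indexOf(":") !== -1) found.push("@" + name);
  }
  var children = node.childNodes || [];
  for (var j = 0; j < children.length; ++j) {
    findNamespaceUsage(children[j], found); // depth-first, document order
  }
  return found;
}
```

If the returned array is not empty, the fragment probably deserves a dedicated transformation (XSL(T), for instance) rather than a plain node-by-node import.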

A Better Import

Rather than asking every selector library to handle these edge cases, we could simply adopt a better import strategy.
This XML to HTML transformer is already a valid alternative, but we can write something a bit better, or better suited to common cases.
For example, a very common case in this kind of transformation is a CDATA section inside a script node.
With CDATA we can put almost anything inside the node, but 99.9% of the time what we need there is JavaScript code rather than a comment.
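To make the CDATA case concrete, here is a hypothetical xmlScriptSource helper that collects the code of an XML script element. It concatenates text and CDATA children in document order and ignores comments. The result can then be assigned to an HTML script element's text property. This is an illustrative sketch only.

```javascript
// Hypothetical helper (not from the post): returns the source code held by
// an XML <script> element, joining text and CDATA sections.
function xmlScriptSource(xmlScript) {
  var TEXT_NODE = 3, CDATA_SECTION_NODE = 4;
  var source = "";
  var children = xmlScript.childNodes;
  for (var i = 0; i < children.length; ++i) {
    var type = children[i].nodeType;
    // comments (nodeType 8) and any other node types are skipped
    if (type === TEXT_NODE || type === CDATA_SECTION_NODE) {
      source += children[i].nodeValue;
    }
  }
  return source;
}

// Usage in a browser, assuming `xmlScript` is the XML node:
// var script = document.createElement("SCRIPT");
// script.text = xmlScriptSource(xmlScript);
```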
Another thing to consider: if we need to import or adopt an XML node, 99.9% of the time we need to import its full content, not an empty node. Otherwise we should ask ourselves why we are representing data like that, no?
I bet the deep argument will basically always be true, so here is my alternative proposal. It should be a bit faster, since it avoids an implicit boolean cast for each attribute or node, and it takes into account what I said a few lines ago:


function XML2HTML(xml){
    // WebReflection Suggestion - MIT Style License
    // create the HTML element and copy every XML attribute into it
    for(var
        nodeName = xml.nodeName.toUpperCase(),
        html = document.createElement(nodeName),
        attributes = xml.attributes || [],
        i = 0, length = attributes.length,
        tmp;
        i < length; ++i
    )
        html.setAttribute((tmp = attributes[i]).name, tmp.value)
    ;
    // then import every child node (deep import is always assumed)
    for(var
        childNodes = xml.childNodes,
        i = 0, length = childNodes.length;
        i < length; ++i
    ){
        switch((tmp = childNodes[i]).nodeType){
            case 1: // ELEMENT_NODE: import recursively
                html.appendChild(XML2HTML(tmp));
                break;
            case 3: // TEXT_NODE: copy as an HTML text node
                html.appendChild(document.createTextNode(tmp.nodeValue));
                break;
            case 4: // CDATA_SECTION_NODE
            case 8: // COMMENT_NODE
                // inside a script the content is code, elsewhere a comment
                // assuming .text works in every browser
                nodeName === "SCRIPT" ?
                    html.text = tmp.nodeValue :
                    html.appendChild(document.createComment(tmp.nodeValue))
                ;
                break;
        }
    }
    return html;
}

I have tried to post the above snippet as a comment on John's post as well, but for some reason it is still not there (maybe it is waiting for approval).
We can test the above snippet with this piece of code:


var data = '<section data="123"><script><![CDATA[\nalert(123)\n]]></script><option>value</option></section>';
var xml;
try{
    // Internet Explorer: MSXML via ActiveX
    xml = new ActiveXObject("Microsoft.XMLDOM");
    xml.loadXML(data);
}catch(e){
    // every other browser
    xml = new DOMParser().parseFromString(data, "text/xml");
}
var section = xml.getElementsByTagName("section")[0];
onload = function(){
    document.body.appendChild(XML2HTML(section));
    alert(document.body.innerHTML);
};

We can use this last snippet to test the other function as well. As soon as I can, I will try to compare the solutions and provide some benchmarks.
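Some readers prefer explicit settings over MSXML defaults. For them, the parsing step of the test code can be wrapped in a small helper. This is a sketch only, and parseXmlString is a hypothetical name. It sets async and validateOnParse explicitly instead of relying on defaults. It treats a false return value from loadXML, or a parsererror element produced by DOMParser, as a failure. The parsererror check is a common heuristic, not a guarantee.

```javascript
// Hypothetical helper (not from the post): parses an XML string in old IE
// (MSXML via ActiveX) or in any browser with DOMParser.
function parseXmlString(data) {
  var doc = null;
  try {
    doc = new ActiveXObject("Microsoft.XMLDOM");
  } catch (e) {
    doc = null; // no ActiveX available: use DOMParser below
  }
  if (doc) {
    doc.async = false;           // explicit rather than trusting the default
    doc.validateOnParse = false; // avoid validation surprises (e.g. xsi:nil)
    if (!doc.loadXML(data)) {    // loadXML returns false when parsing fails
      throw new Error("loadXML could not parse the string");
    }
    return doc;
  }
  doc = new DOMParser().parseFromString(data, "text/xml");
  // heuristic: browsers report parse errors through a <parsererror> element
  if (doc.getElementsByTagName("parsererror").length) {
    throw new Error("DOMParser could not parse the string");
  }
  return doc;
}

// Usage: var section = parseXmlString(data).getElementsByTagName("section")[0];
```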

11 comments:

a.in.the.k (@ainthek), 01 December, 2009 10:12
missing line: xml.async=false;

Andrea Giammarchi, 01 December, 2009 10:25
not really, it's not a page, it's a string. Which IE gave you problems?

a.in.the.k (@ainthek), 01 December, 2009 21:29
It does not depend on IE, it depends on the ActiveX used ;-)
Do we need failing examples to code by specs/docs ?
Try loading extra large strings...
If you do not believe me, try believing MSDN:
http://msdn.microsoft.com/en-us/library/ms754585(VS.85).aspx

I had to fix this in many implementations; the last one failing was this project: http://code.google.com/p/xmlhttprequest/issues/detail?id=18&can=1&q=async

P.S. No need to publish this post, feel free to reject, just fix the sample please

Andrea Giammarchi, 01 December, 2009 22:51
Do we need failing examples to code by specs/docs ?
Hell no, but which part of the linked page says loadXML with a string requires async = false?

I am just asking: AFAIK loadXML is synchronous (and its behavior demonstrates it), and the page you linked may simply have been copied and pasted from other DOMDocument examples.

In short, the async property's own spec says:
Specifies whether asynchronous download is permitted
so unless you are not able to prove I am wrong, I don't think I am providing a bad example, or one that does not respect standards. Thanks in any case for the link.

Andrea Giammarchi, 01 December, 2009 22:56
P.S. the other link about the bug is not related to loadXML ... or maybe I did not get the problem there ... but XHR is another story. Here we are talking about loadXML(stringAlreadyThere), and there is no need to set async to false, imho.
If there were, they should update the async property specs and mark it as necessary.
Regards

Andrea Giammarchi, 01 December, 2009 23:00
Moreover, it returns true on success, so if it were asynchronous, how could it return such a value?
In any case, if the string is already there and loadXML acted asynchronously, it would mean IE could handle something like WebWorkers, since it would have to run a separate, lower-priority parsing over content that is already available ... let me know if you find a single case where my code fails, cheers.

a.in.the.k (@ainthek), 02 December, 2009 09:54
Funny argumentation:

0. "so unless you are not able to prove I am wrong"
CURRENTLY: I AM NOT !

With your PROGID expanded to Microsoft.XMLDOM.1.0 (on my machine ;-)
the sample seems to work OK.
However, I'm experimenting with versions 2, 3, 4 and 6 of MSXML,
and with "the type of JS code from where the call is made",
and will let you know my results.

Coding with MSXML since the msxml2.dll dark ages (back in 1998?),
I would BET my dirty keyboard
that I have experienced async loadXML behaviour several times
(that's why I keep fixing all code to set async = false).

1. "download":
just bad MSDN wording; load also accepts IStream, which means load(resource) does not always involve networking.
Readers beware, a URL is not the only signature of this method...

2. "Moreover, it returns true":
but both loadXML() and load() return true ;-)
boolValue = oXMLDOMDocument.load(xmlSource);
How would you interpret bool retVal on async load(url) ?
http://msdn.microsoft.com/en-us/library/ms762722(VS.85).aspx

4. Just a warning: when using the more recent MSXML2.DomDocument.4.0
there is another missing "damned default reset": xml.validateOnParse=false;
otherwise <root xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><a xsi:nil='true'></a></root>
will fail with: "xsi:nil attribute on element 'a' is invalid."

5. onreadystatechange is fired even in sync mode ;-), as well as readyState; if you consult MSDN, this is
mentioned for async=true only.

6. Maybe it's just my paranoid approach,
I hate defaults,
I do not trust them,
I always tend to be explicit,
I do not want to rewrite code if I change (upgrade/downgrade) the PROGID of a component,
(please do not discuss version independent vs. version dependent progid strategy here ;-))

7. Let readers decide: an extra line, or "blind trust in unclear MSDN".

RESPECT,
you are still my favorite JS guru ;-)
I'm just trying to prevent "another imperfect sample" on the Web, not to disrespect your code, knowledge or ego...

Andrea Giammarchi, 02 December, 2009 10:18
Dude, being paranoid is part of our work, it is in our DNA, etc. etc. ... all I am saying is that this method cannot be asynchronous.

First of all:
Implemented in: MSXML 3.0 and later
I am sure you had bad experience with a version unable to support this method properly ... but

You can use this method to check if the loaded XML document is well-formed
it has to be synchronous to check if the string is well formed and return the value.

With load the value returned means something different:
True if the load succeeded.
The load method can also take any object that supports IStream and the Microsoft® Internet Information Services (IIS) Request object.
load is about the sent content, but AFAIK loadXML does NOT support streams as an argument, so it cannot be logically asynchronous.

The point of this whole conversation is to show you that it is not me being ignorant about the missing async = false; it is me being more analytic than a copy-and-paste example on MSDN.

If loadXML is asynchronous, it is buggy as it is, because the string is already there: there is nothing async to handle, and in this version the argument cannot be a stream.

I do agree about being paranoid, but my snippet won't fail in any IE with DOMDocument support.

Regards

a.in.the.k (@ainthek), 02 December, 2009 10:29
0. Test case proving that, so far, loadXML is SYNCHRONOUS: http://ainthek.blogspot.com/2009/12/blog-post.html

1. "so it cannot be logically asynchronous.": it can be, if someone decides to provide a symmetric API, with a non-blocking call and callback possibilities.

2. I said load() supports IStream; considering an IStream over a memory buffer (not the network), what asynchrony would you expect?

So, using an "analytic" approach,
I decide to quit this academic discussion over a muddily designed and even more muddily documented component. Thanks for listening ;-)

a.in.the.k (@ainthek), 10 December, 2009 09:37
loadXML MAY involve network activity; anyway, even then you are right: the runtime behavior is still SYNC!

http://ainthek.blogspot.com/2009/12/loadxml-may-involve-network-activities.html

Anonymous, 19 December, 2009 03:40
Dear Author webreflection.blogspot.com !
It is a pity, that now I can not express - I am late for a meeting. I will return - I will necessarily express the opinion.