
Monday, November 23, 2009

On element.dataset And data-* Attribute

Why on earth? I mean, why should we put a data-whatever attribute into our layout? Where is the good old MVC? How can you consider data-* more semantic? More semantic than what? Why would we like to kill truly semantic pages and graceful enhancements? Why do we need redundant JavaScript info inside node attributes when we cannot understand a script tag in the middle of the page? Why, in the era where performance matters, would we like to let users download stuff that they will probably never use?

What Am I Talking About

We can find the "magic" data attribute description in the W3C "Semantics, structure, and APIs for HTML documents" page. These days there is a page that is going around a bit too much ... finally somebody realized that this data-thingy is nothing different from what we could have used for ages: XML.

Why We Are Doing Wrong

If we are enthusiastic about a custom attribute able to carry whatever information, we should ask ourselves why on earth we are not simply using XML plus XSL to transform nodes with custom attributes. Each node can be easily and quickly transformed, on the client side as well and in a cross-browser way, via one or more cached XSLT files able to create whatever we need at runtime.
Do we need to sort the music file list via the data-length attribute? No problem at all: we can reorder via DOM every node we need and put the XML fragment transformed via XSLT into a single HTML node in that page. Moreover, we can use modular XSL to transform branches or specific cases ... but that is too clean and professional, isn't it?
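To make the idea a bit more concrete, here is a minimal sketch of a client-side transformation. Everything named here is an assumption made for illustration: the `<songs>`/`<song seconds="...">` source format, the `SORT_BY_LENGTH_XSL` stylesheet and the `renderSortedSongs` helper are all hypothetical, and the code only does something in browsers that expose `DOMParser` and `XSLTProcessor` (old IE would need its own ActiveX-based path, not shown).

```javascript
// Hypothetical stylesheet: turns <songs><song seconds="131">Title</song></songs>
// into an <ol>, with the <li> nodes sorted numerically by the custom attribute.
var SORT_BY_LENGTH_XSL = [
  '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">',
  '<xsl:output method="html"/>',
  '<xsl:template match="/songs">',
  '<ol>',
  '<xsl:for-each select="song">',
  '<xsl:sort select="@seconds" data-type="number"/>',
  '<li><xsl:value-of select="."/></li>',
  '</xsl:for-each>',
  '</ol>',
  '</xsl:template>',
  '</xsl:stylesheet>'
].join("");

// Transforms the XML text and appends the result to a single HTML node.
// Returns false when the browser APIs are missing, true otherwise.
function renderSortedSongs(xmlText, target) {
  if (typeof XSLTProcessor === "undefined" || typeof DOMParser === "undefined") {
    return false;
  }
  var parser = new DOMParser();
  var xml = parser.parseFromString(xmlText, "application/xml");
  var xsl = parser.parseFromString(SORT_BY_LENGTH_XSL, "application/xml");
  var processor = new XSLTProcessor();
  processor.importStylesheet(xsl);
  // The fragment is created for the document that owns the target node.
  target.appendChild(processor.transformToFragment(xml, target.ownerDocument));
  return true;
}
```

In a page it could be called as `renderSortedSongs(xmlText, document.getElementById("playlist"))`, where the "playlist" id is, again, just a placeholder.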

Data Used Via Selectors

Let's say we have 3 songs with the same length, taking the example from the W3C page, OK?

<ol>
<li data-length="2m11s">Beyond The Sea</li>
<li data-length="2m11s">Beside The Sea</li>
<li data-length="2m11s">Be The Sea</li>
</ol>

First of all, that example is quite hilarious: to make a sort as fast as possible, that kind of representation is not ideal, is it?
It does not matter. The point is that as soon as this stuff is implemented, all jQuery users will start to think so "semantic" that everything will become a:

$("[data-whatever=whatevervalue]").each(...)

Seriously ... if we want to make the web the worst semantic place ever, just put this kind of weapon into the hands of medium-skilled developers, and in a few months we'll be screaming "HTML5 Failed!"
Who will care anymore about the appropriate element when layouts that are totally dirty, and useless for non-JS users, can easily pass W3C validation?

The Namespace War Has Started Already

I can imagine lots of new devs starting to use data-* as if it were their own attribute, ignoring the conflict problems we've always had with the global namespace. Move this problem into the DOM, and the Cactus Jam is ready to eat.
On the other hand, how much of a best practice will a DOM loaded with nodes carrying multiple data attributes be?

<div
data-jquery-pluginname-good-stuff="$(this).whatever()"
data-dojo-loadsync="thisNodeFile"
data-prototype-valueof="Object.prototype.valueOf=this"
/>

... I mean ... seriously, is this the HTML5 we are talking about? An empty element there just to make the anti semantic magic happen?

Dozens Of Different Ways

First of all, querySelectorAll.
The document method we were all waiting for is finally 90% here and, rather than use semantic and logical selectors to retrieve what we need when we need it, we prefer to make the DOM that dirty? Are we truly such monkeys that we cannot spot a list of songs and order them by their length, which 99.9% of the time will be part of the song info and accordingly present in the DOM as context?
Where are classes? Where are external resources, totally ignored by those users whose aim is simply to surf quickly, or to crawl info? data-whatever?

Why We Don't Need data-*

First of all, classes. If we have a list of songs, the class songlist on the single outer UL or OL node is everything we need to retrieve that list and order it by title, duration, or anything else present in that list, since in whatever grid we have ever used, we order by columns whose content we know.
How can a user even decide to order by length if this length is not displayed?
How can they order by something that is not even shown?
It's like me in a shop asking to order a list of sound systems by palm trees ... I think they would look at me like a mad person and they would be right, don't you agree?
So, the semantic part of this attribute does not exist. The example shown in the W3C draft is ridiculous for the reason I have already given. If the song length info is already in the DOM and properly shown, we don't need redundant info for every user; we just need a good sort function for that list of songs and nothing else.

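$.fn.sort = function(){
// proof of concept, not even tested
// written directly here in the blogger textarea
// don't try at home but please try to understand the logic
var li = Array.prototype.slice.call(this);
li.sort(function(a, b){
// read both values first: a bare "return" followed by a
// new line would return undefined and sort nothing
var x = $(a).find(".length").text(),
y = $(b).find(".length").text();
// plain text comparison, longest text first
return x < y ? 1 : x > y ? -1 : 0;
});
// re-append every node in its new order
return $(li).each(function(i, li){
li.parentNode.appendChild(li);
});
};
$(".songs li").sort();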

Is the above proof of concept that different from a $(".songs [data-length]").sort() ? It's even shorter!
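One caveat about the proof of concept: it compares the displayed length as plain text, and text like "10m0s" sorts before "2m11s". A small sketch of a numeric comparison follows, assuming the "2m11s" format used in the W3C example; `toSeconds` and `byLength` are hypothetical helper names, not part of jQuery or of the snippet above.

```javascript
// Converts a displayed duration such as "2m11s" or "59s" into seconds.
// Returns NaN when the text does not look like a duration.
function toSeconds(text) {
  var match = /^\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*$/.exec(text);
  if (!match || (!match[1] && !match[2])) {
    return NaN;
  }
  return parseInt(match[1] || "0", 10) * 60 + parseInt(match[2] || "0", 10);
}

// Comparator for Array.prototype.sort, shortest song first.
function byLength(a, b) {
  return toSeconds(a) - toSeconds(b);
}

var lengths = ["10m0s", "2m11s", "3m21s", "59s"];
// Default text sort keeps "10m0s" first, which is wrong for durations.
var asText = lengths.slice().sort();
// Numeric sort: "59s", "2m11s", "3m21s", "10m0s".
var asNumbers = lengths.slice().sort(byLength);
```

Inside the sort callback of the proof of concept, the same helper could be fed with the text of each ".length" node instead of comparing the two strings directly.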

Map The DOM If Necessary

If we are struggling that much with this "so much missed data-*", we can still use the class to map common info ... how?

<ol class="songs">
<li class="map-0">Beyond The Sea</li>
<li class="map-0">Beside The Sea</li>
<li class="map-1">Be The Sea</li>
</ol>

If we need to attach properties to a DOM node we can still use the class attribute.
It is semantic, since it classifies the node as a mapped one; not redundant, since it lets us provide the same kind of info more than once for different nodes; and lightweight, since the data description is provided by the mapped object rather than by the markup.
In a few words, and always via JavaScript, we can attach a file with this content:

var map=[{
length:"3m21s",
artist:"Nature"
},{
length:"2m59s",
artist:"You"
}];

For each node, when necessary, we could simply retrieve the associated map in this way:

function nodeInfo(node){
return map[
(
/(?:^|\s)map-(\d+)(?:\s|$)/.exec(node.className) ||
[0,-1]
)[1]
];
};

And that's it. Each mapped node will return an object, or undefined, when the nodeInfo function is called. Again, all this stuff is a proof of concept, but can we agree that data-* is just the wrong answer to a JavaScript problem that should be solved everywhere except in the HTML?
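A quick way to see the mapping at work without a page is to pass stand-in objects that only carry a className. The sketch below repeats the map and a slightly more explicit variation of nodeInfo so it can run on its own; the class names "song" and "selected" are made up for the example.

```javascript
// Same data as the map file above.
var map = [{
  length: "3m21s",
  artist: "Nature"
}, {
  length: "2m59s",
  artist: "You"
}];

// Variation of nodeInfo: same regular expression, explicit undefined
// when the node has no map-N class.
function nodeInfo(node) {
  var match = /(?:^|\s)map-(\d+)(?:\s|$)/.exec(node.className);
  return match ? map[match[1]] : undefined;
}

// Stand-ins for DOM nodes: only className is read.
var first = nodeInfo({className: "map-0"});
var second = nodeInfo({className: "song map-1 selected"});
var none = nodeInfo({className: "song"});

// Two nodes mapped to the same index share one object, nothing is duplicated.
var shared = nodeInfo({className: "song map-0"}) === first;
```

Here `first.artist` is "Nature", `second.length` is "2m59s", `none` is undefined and `shared` is true.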

18 comments:

kkll2 said...

Forbidding data-length="2m" will only make people stay with current class="length:2m".

data- isn't RDFa alternative. It's aimed at people who abuse class, title and create custom attributes that completely lack any namespace.

Andrea Giammarchi said...

The namespace is associated with the node, and I think people will abuse data-* if the problem is that they are already abusing something else.
My question is: why do we need to abuse the DOM rather than use alternatives?

A.In.The.K said...

Bit off topic but:
"Each node can be easily and quickly transformed, in the client side as well and in a cross browser way, via one or more cached XSLT".

I've been using XSLT on the client heavily for the last 8 years (MSIE), but cross-browser implementation is big trouble. XSLTProcessor in Safari does not support xsd:import and xsd:include, and Opera's engine still crashes on duplicate XSLT expressions {$}.

I do not believe world focuses on client side XSLT anymore.

Andrea Giammarchi said...

It depends on what you are using XSL for. In my example I was talking about simple HTML transformations that bring info where necessary and create relations with data, but I do agree that XSL requires skills and a bit of experience, as should be the case for everything in IT ;)

Azat Razetdinov said...

Both approaches are interesting, but do not scale well. What about passing a complex object to the initialization function? There’s another way of delivering data to js, which is heavily used in our company.

<div onclick="return {foo: 'bar', baz: [1, 2, 3]}">...</div>

The data is accessible in one line of code:

var data = element.onclick();

That’s all! Yes, this is not semantic. But it’s 1) simple, 2) fast and 3) scales well.

Andrea Giammarchi said...

That is a data-* behavior ... still a dirty DOM not everyone would like to download. Which part does not scale?

If you can attach an object inside an attribute, you can attach just its index, which will point into a list of objects. In the latter case, the user won't download JavaScript in the DOM but separately, AND we can reuse objects rather than create a new one for each onclick() or data-whatever.

Azat Razetdinov said...

We come to the broader question of bundling data, views and controllers. Again there are different approaches:

1. Views are delivered as html, data is stored in attributes (data-*, class, onclick, whatever). Controllers reside in static js-files and are used to find views and initialize them using provided data.

2. Data is delivered in dynamic js-files, views are generated on the fly by controllers using provided data and can be reused for new objects.

3. Mixed approach: views are delivered as html, data is brought in a dynamic js-file, controllers bind views with data through indexes, ids, whatever.

All three approaches are suitable for different situations. If we have a simple list of items (like songlist), the 3rd way will suffice.

If we have plenty of objects (we call them blocks) in different parts of the page, and the presence of each block depends on complex factors, then keeping the view and data together becomes reasonable. That way we don’t have to worry about synchronizing data and html generation logic.

Andrea Giammarchi said...

But in any case data-* will simply move the problem from one attribute to another, causing other side effects. The mapped relation is, in my opinion, always suitable. The map-N could have a different prefix. Agreed that it could stay in a dedicated attribute, but we'll see how "well" this data-* will be used.
Anyway, interesting comments (eventually in this blog), thanks everybody for posting here. Regards

Brett said...

It seems the powers-that-be are deliberately trying to throw a wrench into the development of the semantic web. There is a 10+ year bug in FF to support external DTDs, and even someone from NASA commented in favor of this bug! Some people seem to have an axe to grind with XML, and it becomes a political debate rather than both sides seeking to accommodate.

Despite my favoring furthering XML support and approaches, I don't think the web is only for IT people, so I for one do not look down on anyone who just wants to make things work. On the other hand, I do think it is the responsibility of the spec editors to foster good practices by the intrinsic design of the language, and data-* does seem to be a hack which can cause more headaches than the conveniences it offers.

In any case, here I think there are other existing solutions as you mention. But why has Node.setUserData() not been mentioned as one of them? Although its data is not as discoverable as namespaced attributes say by search engines which only look at the text, and without a handler being attached, is not serialized for later reuse, it unobtrusively yet formally and easily attaches user data to an element or even other node.

Andrea Giammarchi said...

Brett, info in a page should be for the *user*, not for the crawler.
Remember the technique of putting a massive list of keywords in a comment node just to let crawlers parse those keywords?
Awful!
What kind of useful info for the user is stored in such a way that the user cannot even see it?
A Web for those who do not care, who are not in IT? Then why did we move to the Web Standards approach rather than simply going on with FrontPage and Dreamweaver?
data-* should never contain info for crawlers, just parameters for some JavaScript-related stuff that may bring a better user experience, but it's not about info.
data-* is the wrong answer, but if web standards are based on the assumption that the weekend developer should be able to do stuff, we can stop talking about best practices, web standards, semantics, etc. ... let it work, whatever it is. I don't like it!

Brett said...

Maybe you misunderstood, but I actually agree with what you're saying about data-* (I just meant that a language should be easy enough to use flexibly--not that it should allow hacks which encourage bad practices).

If there is namespaced data for the sake of a crawler, I have no problem with that--a crawler ends up getting used by a user eventually, so it's good to be useful for a crawler too, but data-* would not be a good way since there is no concept of namespaces.

What do you say about Node.setUserData() for one-off stuff? Why isn't it being promoted or used for this?

Andrea Giammarchi said...

Brett, I used a few points just to let people think about the problem and the wrong solution we have here. Sorry, it was not strictly directed at you. Cheers

coder-zone said...

This data-* idea is yet another one to help developers to pollute the net with non-semantic content.

JavaScript, HTML and CSS excel in their original purpose. HTML is for semantic content, period.

I'm still fighting against the "javascript protocol" in links. So many wars outstanding, so few warriors in our army...

Nice blog, by the way!

Azat Razetdinov said...

coder-zone, since when has object data not been considered semantic? The purpose of html is to provide data. The point is, some types of data do not need a DOM representation, but are necessary for proper behavior. We are not talking about bringing behavior back to html using javascript protocol or inline event listeners. All we need is a way to provide static pieces of data for the initialization script, which itself is completely separated from html.

Breton Slivka said...

It seems to me that the main legitimate usage for data- attributes is the hCalendar microformat. The uF community is divided 3 ways on the issue of how to represent a date in HTML that's both machine readable and human readable. The current solution involves a weird abuse of the title attribute to hold a machine readable date.

How would you solve that?

Keep in mind that whatever solution you come up with, 2/3rds of the uF community will probably think you're absolutely wrong. It's rather frustrating to watch.

Andrea Giammarchi said...

I would solve that with a timestamp attribute ;-)

coder-zone said...

@Azat Razetdinov: Not sure I fully understand what you meant with "object data", but, IMHO, HTML must carry information relevant to the end user. If a piece of data is only needed at script initialization, then probably it's not as relevant to the user as we may think in advance, therefore it should not be in DOM nor in HTML.
Maybe I'm complaining too much instead of collaborating with ideas, but that's how I feel now.

@Breton Slivka: No matter how much effort you put on a design, 66% of the community will think it's wrong.

Anonymous said...

In my opinion, you are mistaken.