Life in the kind of bleak future of HTML data

Evan Prodromou wrote in 2006:

I think that if microformats.org and the RDFa effort continue moving forward without coordinating their effort, the future looks kind of bleak.

I blogged about this at the time (then forgot and blogged about it again five months later). I recalled it upon reading a draft HTML Data Guide announced today, while trying to come up with a tl;dr summary I could at least microblog.

That’s difficult. The guide is intended to help publishers and consumers of HTML data choose among three syntaxes (all mostly focused on annotating data inline with HTML meant for display) and a variety of vocabularies, with heavy dependencies between the two. Since 2006, people working on microformats and RDFa have done much to address the faults of those specifications: microformats-2 allows generic (rather than per-format) parsing, and RDFa 1.1 makes namespaces needed less often, less ugly when they are needed, and usable in HTML5, and it also specifies a Lite subset. In 2009 a third syntax and model, microdata, was launched, and in 2011 it was chosen as the syntax for schema.org (which subsequently announced it would also support RDFa 1.1 Lite).

I find the added existence of microdata and schema.org suboptimal (optimal might be something like the microformats process for some super widely useful vocabularies, with a relatively simple syntax that still permits generic parsing and distributed extensibility; very much like what Prodromou wanted in 2006), but when is anything optimal? I also wonder how much credit microdata ought to get for microformats-2 and RDFa 1.1, for providing competitive pressure. And how much credit should schema.org get for invigorating metadata-enhanced web-scale search and vocabulary design (for example, the last related thing I was involved in, at the beginning anyway)?

Hope springs eternal for getting these different but overlapping technologies and communities to play well together. I haven’t followed closely in a long time, but I gather that Jeni Tennison is one of the main people working on that, and you should really subscribe to her blog if you care. Which brings us back to the HTML Data Guide, of which Tennison is the editor.

My not-really-a-summary:

  1. Delay making any decisions about HTML data; you probably don’t want it anyway (metadata is usually a cost center), and things will probably be clearer when you’re forced to check back due to…
  2. If someone wants data from you as annotated HTML, or you need data from someone, and this makes business sense, do whatever the other party has already decided on, or better yet implemented (assuming their decision isn’t nonsensical; but if it is, why are you doing business with them?)
  3. Use a validator to test your data in whatever format. An earlier wiki version of some of the guide materials includes links to validators. In my book, Any23 is cute.

(Yes, CC REL needs updating to reflect some of these developments, RDFa 1.1 at the least. Some license vocabulary work done by SPDX should also be looked at.)
