Post Microformats

Semantic Technology Conference wrap

Saturday, March 11th, 2006

The 2006 Semantic Technology Conference was more interesting than I expected. The crowd was older and much more formally dressed, and there was far less emphasis on open source solutions, than at any conference I’ve attended in a long time, but it wasn’t merely a vendor schmoozefest.

James Hendler and Ora Lassila’s Semantic Web @5 keynote claimed that Semantic Web technologies have made great strides over the past five years. They pointed out that middle levels of the Semantic Web layer cake are mature and higher levels are subjects of funded research (in 2001 lower and middle levels were mature and research respectively). Near the end they made a strong call to “share; give it away!” — open source tools, datasets, and harvesters are needed to grow the Semantic Web.

My presentation on Semantic Search on the Public Web with Creative Commons went fairly well apart from some audio problems. I began with a hastily added segue (not in the slides) from the keynote, highlighting Science Commons’ database licensing FAQ and Uniprot. Questions were all over the map, befitting the topic.

I think Uche Ogbuji’s Microformats: Partial Bridge from XML to the Semantic Web is the first talk on the subject that I’ve heard from a non-cheerleader. It was a pretty good introduction to the upsides and downsides of microformats and how one can leverage microformats for officious Semantic Web purposes. My opinion is that the value in microformats hype is in encouraging people to take advantage of XHTML semantics, in however conventional and non-rigorous a fashion they may. It is a pipe dream to think that most pages containing microformats will include the correct profile references to allow a spec-following crawler to extract much useful data via GRDDL. Add some convention-following heuristics and a crawler may get lots of interesting data from microformatted pages. The big search engines are great at tolerating ambiguity and non-conformance, as they must.
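To make the contrast between a spec-following crawler and a heuristic one concrete, here is a minimal sketch in Python. It assumes a strict crawler would only act on pages whose head element carries a profile attribute, while a heuristic crawler simply looks for well-known microformat root class names anywhere in the page. The names MicroformatSniffer, KNOWN_ROOTS, and sniff are made up for illustration, and the list of root classes is deliberately tiny.

```python
from html.parser import HTMLParser

# Root class names of a few well-known microformats
# (hCard, hCalendar events, hReview). Illustrative, not exhaustive.
KNOWN_ROOTS = {"vcard", "vevent", "hreview"}


class MicroformatSniffer(HTMLParser):
    """Record declared profiles and any microformat root classes seen."""

    def __init__(self):
        super().__init__()
        self.profiles = []  # URIs listed in <head profile="...">
        self.roots = set()  # known root class names found on any element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            # What a spec-following crawler would look at first.
            self.profiles = (attrs.get("profile") or "").split()
        # What a heuristic crawler looks at: class names, profile or not.
        classes = set((attrs.get("class") or "").split())
        self.roots |= classes & KNOWN_ROOTS


def sniff(html):
    """Return the declared profiles and the root classes found in html."""
    sniffer = MicroformatSniffer()
    sniffer.feed(html)
    return {"profiles": sniffer.profiles, "roots": sniffer.roots}
```

A page with an hCard but no profile reference yields an empty profiles list and a roots set containing "vcard". That is the case where the strict crawler gives up and the heuristic one still finds something to parse.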

Ogbuji’s talk was the ideal lead-in to Ben Adida’s Interoperable Metadata for a Bottom-Up Semantic Web, which hammered home five principles of metadata interoperability: publisher independence, data reuse, self-containment, schema modularity, and schema evolvability. Microformats, GRDDL, and RDF/A were evaluated against the principles. It is no surprise that RDF/A came out looking best, as Adida has been chairing the relevant W3C taskforce. I think RDF/A has great promise (it feels like microformats minus annoyances, or microformats with a model) but may say otherwise. The oddest response to the talk came from someone of the opinion that [X]HTML is irrelevant: everything should be custom XML rendered with custom XSLT when necessary.

I was somewhat surprised by the strong focus of most talks and vendors on RDF and friends rather than any other “semantic technologies.” Doug Lenat was one exception. He apparently claimed last year that by this year Cyc would be growing primarily through machine learning rather than input by knowledge engineers. A questioner called Lenat on this prediction. Lenat claimed the prediction came true but did not offer any quantitative measure. From the slides (unavailable), it looked like Cyc can have databases and similar described to it and may access same (e.g., via JDBC), giving it access to an arbitrary number of “facts.”

If there was a theme that flowed through the conference it was, first, integrating heterogeneous data sources (I don’t recall who, but someone characterized semantic technologies as liberating enterprises from vendors) and, second, multiplying the value of that data through linking and inference.

Mills Davis’ closing keynote blew up these themes, claiming outrageous productivity improvements are coming very shortly due to semantic technologies, including a slide. The conference hotel fire alarm went off during the keynote, serving as a hype alert to any willing to hear.

SemTech06 reinforces my confidence in what I said in the SemWeb, AI, Java: The Ontological Parallels mini-rant given at SXSW last year. Too bad they rejected my proposal for this year:

Semantic Web vs. Tags Deathmatch: Tags are hot, but are they a dead end? The Semantic Web is still a research project, but will it awaken in August, 2009? People in the trenches fight over the benefits and limits of tags, the viability of officious Semantic Web technologies, what the eventual semantic web will look like, and how to bridge the gap between the two.

I’m off to SXSW tomorrow anyway. My schedule.

Search 2006

Saturday, January 14th, 2006

I’m not going to make new predictions for search this year — it’s already underway, and my predictions for 2005 mostly did not come true. I predict that most of them will, in the fullness of time:

Metadata-enhanced search. Yahoo! and Google opened Creative Commons windows on their web indices. Interest in semantic markup (e.g., microformats) increased greatly, but search that really takes advantage of this is a future item. (NB I consider the services enabled by tags more akin to browse than search and as far as I know they don’t allow combining tag and keyword queries.)

Proliferation of niche web scale search engines. Other than a few blog search services, which are very important, I don’t know of anything that could be called “web scale” — and I don’t know if blog search could really be called niche. One place to watch is public search engines using Nutch. Mozdex is attempting to scale up, but I don’t know that they really have a niche, unless “using open source software” is one. Another place is Wikipedia’s list of internet search engines.

On the other hand, weblications (as Web 2.0) did take off.

I said lots of desktop search innovation was a near certainty, but if so, it wasn’t very visible. I predicted slow progress on making multimedia work with the web, and I guess there was very slow progress. If there was forward progress on usable security it was slow indeed. Open source did slog toward world domination (e.g., Firefox is the exciting platform for web development, but barely made a dent in Internet Explorer’s market share) with Apple’s success perhaps being a speed bump. Most things did get cheaper and more efficient, with the visible focus of the semiconductor industry swinging strongly in that direction (they knew about it before 2005).

Last year I riffed on John Battelle’s predictions. He has a new round for 2006, one of which was worth noting at Creative Commons.

Speaking of predictions, of course Google began using prediction markets internally. Yahoo!’s Tech Buzz Game has some markets relevant to search but I don’t know how to interpret the game’s prices.

Going overboard with Wikipedia tags

Thursday, January 12th, 2006

A frequent correspondent recently complained that my linking to Wikipedia articles about organizations rather than the home pages of organizations is detrimental to the usability of this site, probably spurred by my linking to a stub article about Webjay.

I do so for roughly two reasons. First, I consider a Wikipedia link more usable than a link to an organization home page. An organization article will link directly to an organization home page, if the latter exists. The reverse is almost never true (though doing so is a great idea). An organization article at Wikipedia is more likely to be objective, succinct, and informational than an organizational home page (not to mention there is no chance of encountering window resizing or other annoying distractions, or less charitably, attempts to control my browser, at Wikipedia). When I hear about something new these days, I nearly always check for a Wikipedia article before looking for an actual website. Finally, I have more confidence that the content of a Wikipedia article will be relevant to the content of my post many years from now.

Webjay (link to webjay.org) is actually a good example of these usability issues. Perhaps I have an unusually strong preference for words, but I think its still very brief Wikipedia article should allow one to understand exactly what Webjay is in under a minute.1 If I were visiting the Webjay site for the first time, I’d need to click around awhile to figure the service out, and Webjay’s interface is very to the point, unlike many other sites. Years from now I’d expect webjay.org to be a yet another site, or since the Yahoo! acquisition, to redirect to some Yahoo! property, or the property of whatever entities own Yahoo! in the future. (Smart browser integration with the Internet Archive’s Wayback Machine could mitigate this problem.)

Anyway, I predict that in the foreseeable future your browser will be able to convert a Wikipedia article link into a home page link if that is your preference, aided by Semantic Mediawiki annotations or similar.

The second reason I link to Wikipedia preferentially2 is that Wikipedia article URLs conveniently serve as “tags,” as specified by the rel-tag microformat. If Technorati and its competitors happen to index this blog this month, it will show up in their tag-based searches, the names of the various Wikipedia articles I’ve linked to serving to name tags. I’ve never been enthusiastic about the overall utility of author applied tags, but I figure linking to Wikipedia is not as bad as linking to a tagreggator.
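Assuming the convention in play is the rel-tag microformat, where the tag name is the last path segment of a link marked rel="tag", here is a minimal sketch in Python of how an indexer might turn Wikipedia links into tag names. The names RelTagParser, tag_from_url, and extract_tags are made up for illustration. Treating underscores as spaces is a Wikipedia-specific assumption of mine, not part of rel-tag.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


def tag_from_url(url):
    """Return the last path segment of url, percent-decoded."""
    path = urlparse(url).path.rstrip("/")
    segment = path.rsplit("/", 1)[-1]
    # Assumption: Wikipedia article URLs use "_" where the title has a space.
    return unquote(segment).replace("_", " ")


class RelTagParser(HTMLParser):
    """Collect tag names from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").split()
        href = attrs.get("href")
        # Only links explicitly marked rel="tag" count as tags.
        if "tag" in rel_values and href:
            self.tags.append(tag_from_url(href))


def extract_tags(html):
    """Return the tag names found in html, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags
```

For example, tag_from_url("http://en.wikipedia.org/wiki/Semantic_Web") gives "Semantic Web", so a post linking there with rel="tag" would be filed under that tag. Links without rel="tag" are ignored.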

Also, Wikipedia serves as a tag disambiguator. Some tagging service is going to use Wikipedia data to disambiguate, cluster, merge, and otherwise enhance tags. I think this is pretty low hanging fruit — I’d work on it if I had concentration to spare.

Update: Chris Masse responds (see bottom of page). Approximate answer to his question: 14,000 links to www.tradesports.com, 17 links to en.wikipedia.org/wiki/Tradesports (guess where from). I’ll give Masse convention.

In the same post Masse claims that his own “following of Jakob Nielsen’s guidelines is responsible for the very high intergalactic popularity of my Internet presence.” How very humble of Masse to attribute the modest success of his site to mere guideline following rather than his own content and personality. Unfortunately I think there’s a missing counterfactual.

1 I would think that, having written most of the current Webjay article.

2 Actually my first link preference is for my past posts to this blog. I figure that if someone is bothering to read my ramblings, they may be interested in my past related ramblings — and I can use the memory aid.