Post Semantic Web

Most important software project

Sunday, December 10th, 2006

I don’t have a whole lot more to say about Semantic MediaWiki than I said over a year ago. The summary is to turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.

Flip through Denny Vrandecic’s recent presentations on Semantic MediaWiki (a smaller pdf version not directly linked in that post). There’s some technical content, but flip past that and you should still get the idea and be very excited.

I predict that Semantic MediaWiki will also be the killer application for the Semantic Web that so many have been skeptical of.

Yaron Koren also says that Semantic MediaWiki is “the technology that will revolutionize the web” and has built DiscourseDB using the software. DiscourseDB catalogs political opinion pieces. See Koren’s post on aggregating analysis using DiscourseDB. Unsurprisingly, this analysis shows the political experts making bad calls.

Koren has also created Betocracy, another play-money prediction market where users create claims. It looks like Betocracy is going for a blog-like interface, but I can’t say more, as attempting to register yields a database error.

One prediction market and Semantic MediaWiki connection is that making data more accessible makes prediction markets more feasible. Obtaining data necessary to create and judge prediction market contracts is expensive.

On that note Swivel also looks interesting. Some have called it data porn. Speaking of porn, see Strange Maps.

Microformats are worse

Sunday, October 22nd, 2006

I almost entirely agree with Mark Birbeck’s comparison of RDFa and microformats. The only thing to be said in defense of microformats is that a few of the problems Birbeck calls out are also features, from the microformats perspective.

But .

I will reveal what this means later.

Another quip: My problem with microformats is the “s.”

Evan Prodromou provided a still-good RDFa vs Microformats roundup (better title: “RDFa and Microformats, please work together”) in May. I somehow missed it until now.

Ah, metadata.

Update 20061204: I didn’t miss Prodromou’s roundup in May, I blogged about it. And forgot.

Meta those who can’t

Thursday, October 12th, 2006

I’ve been meaning to write a version of the aphorism “those who can, do; those who can’t, teach” to say something derogatory about metadata. Something like “those who can, code; those who can’t, twitter about standards.” But it doesn’t flow and, like the “teach” version, is highly contestable. Plus, I’m projecting.

However, I realized that the general pattern of this aphorism is that those who can’t, do something “meta” relative to those who can, e.g., an extension of “teach” is “… those who can’t teach, administrate.”

Presumably those who can’t administrate, run for school board. And those who can’t write a pointed aphorism, write about aphorisms.

All of the “can’t” statements are metadata about a subject who does something more meta than the thing he can’t do.

So I got my metadata out of it after all.

Google whenever

Sunday, September 3rd, 2006

For years I’ve heard speculation that Google is building a web archive. Now there are domain name purchases to fuel the speculation. The Internet Archive has been providing an invaluable service with the Wayback Machine and has set up mirrors in multiple jurisdictions, but recording the web is too important to rely on any single organization, no matter how good or robust. So I hope Google and others are maintaining web archives and will make them available to the public.

Via Tim Finin, who also notes an interesting paper about using article and user history to assign trust levels to Wikipedia article fragments and a Semantic Web archive.

Archives are important for establishing provenance in many situations, though one I’m particularly interested in is citing that a particular work was offered under a Creative Commons license at a particular time. This and other uses (e.g., citation in general, which is often of the form “http://example.com accessed 2005-03-10”, though who knows if a copy of the content as it existed on that date exists) would be enhanced if on-demand archiving were available. The Internet Archive does offer Archive-It.org, but this service is for institutional use and uses periodic crawls rather than immediate archiving of individual pages.
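The “accessed” citation form mentioned above can be sketched in a few lines. This is purely illustrative: the function name and output format are my own, not drawn from any citation standard.

```python
# Sketch of a web citation of the form "http://example.com accessed 2005-03-10".
# The function name and format are illustrative, not from any standard.
from datetime import date

def access_citation(url, accessed=None):
    """Format a citation recording when a URL was consulted."""
    accessed = accessed or date.today()  # default to today's date
    return f"{url} accessed {accessed.isoformat()}"

print(access_citation("http://example.com", date(2005, 3, 10)))
# → http://example.com accessed 2005-03-10
```

Of course, as the post notes, such a citation is only as good as the archive backing it: without an archived copy, the string records a claim, not evidence.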

Update, 2 minutes later: I should read a bit more before posting: a service already does exactly what I want. However, I hate that it uses opaque identifiers, and as such is nearly as evil as TinyURL.

Wordcamp and wiki mania

Monday, August 7th, 2006

In lieu of attending maybe the hottest conference ever I did a bit of wiki twiddling this weekend. I submitted a tiny patch (well that was almost two weeks ago — time flies), upgraded a private MediaWiki installation from 1.2.4 to 1.6.8 and a public installation from 1.5.6 to 1.6.8 and worked on a small private extension, adding to some documentation before running into a problem.

1.2.4->1.6.8 was tedious (basically four successive major version upgrades) but trouble-free, as that installation has almost no customization. The 1.5.6->1.6.8 upgrade, although only a single upgrade, took a little fiddling to make a custom skin and permissions code account for small changes in MediaWiki code (example). I’m not complaining — clean upgrades are hard and the MediaWiki developers have done a great job of making them relatively painless.

Saturday I attended part of WordCamp, a one-day unconference for WordPress users. Up until the day before, the tentative schedule looked pretty interesting, but it seems lots of lusers signed up, so the final schedule didn’t have much meat for developers. Matt Mullenweg’s “State of the Word” and Q&A hit on clean upgrades of highly customized sites from several angles. Some ideas include better and better-documented plugin and skin APIs with more metadata and less coupling (e.g., widgets should help many common cases that previously required throwing junk in templates).

Beyond the purely practical, ease of customization and upgrade is important for openness.

Now listening to the Wikimania Wikipedia and the Semantic Web panel…

Free software needs P2P

Friday, July 28th, 2006

Luis Villa on my constitutionally open services post:

It needs a catchier name, but his thinking is dead on- we almost definitely need a server/service-oriented list of freedoms which complement and extend the traditional FSF Four Freedoms and help us think more clearly about what services are and aren’t good to use.

I wasn’t attempting to invent a name, but Villa is right about my aim — I decided not to mention the four freedoms because I felt my thinking too muddled to be dignified with such a mention.

Kragen Sitaker doesn’t bother with catchy names in his just-posted draft essay The equivalent of free software for online services. I highly recommend reading the entire essay, which is as incisive as it is historically informed, but I’ve pulled out the problem:

So far, all this echoes the “open standards” and “open formats” discussion from the days when we had to take proprietary software for granted. In those days, we spent enormous amounts of effort trying to make sure our software kept our data in well-documented formats that were supported by other programs, and choosing proprietary software that conformed to well-documented interfaces (POSIX, SQL, SMTP, whatever) rather than the proprietary software that worked best for our purposes.

Ultimately, it was a losing game, because of the inherent conflict of interest between software author and software user.

And the solution:

I think there is only one solution: build these services as decentralized free-software peer-to-peer applications, pieces of which run on the computers of each user. As long as there’s a single point of failure in the system somewhere outside your control, its owner is in a position to deny service to you; such systems are not trustworthy in the way that free software is.

This is what excited me about decentralized systems long before P2P filesharing.

Luis Villa also briefly mentioned P2P in relation to the services platforms of Amazon, eBay, Google, Microsoft and Yahoo!:

What is free software’s answer to that? Obviously the ’spend billions on centralized servers’ approach won’t work for us; we likely need something P2P and/or semantic-web based.

Wes Felter commented on the control of pointers to data:

I care not just about my data, but the names (URLs) by which my data is known. The only URLs that I control are those that live under a domain name that I control (for some loose value of control as defined by ICANN).

I hesitated to include this point because I hesitate to recommend that most people host services under a domain name they control. What is the half-life of http://blog.john.smith.name vs. http://johnsmith.blogspot.com or js@john.smith.name vs. johnsmith@gmail.com? Wouldn’t it suck to be John Smith if everything in his life pointed at john.smith.name and the domain was hijacked? I think Wes and I discussed exactly this outside CodeCon earlier this year. Certainly it is preferable for a service to allow hosting under one’s own domain (as Blogger and several others do), but I wish I felt a little more certain of the long-term survivability of my own [domain] names.

This post could be titled “freedom needs P2P” but for the heck of it I wanted to mirror “free culture needs free software.”

Long tail of metadata

Monday, May 29th, 2006

Ben Adida notes that people are writing about RDFa, which is great, and envisioning conflict with microformats, which is not. As Ben says:

Microformats are useful for expressing a few, common, well-defined vocabularies. RDFa is useful for letting publishers mix and match any vocabularies they choose. Both are useful.

In other words RDFa is a technology.

Evan Prodromou thinks the future is bleak without cooperation. I like his proposed way forward (strikeout added for obvious reasons):

  1. RDFa gets acknowledged and embraced by microformats.org as the future of semantic-data-in-XHTML
  2. The RDFa group makes an effort to encompass existing microformats with a minimum of changes
  3. microformats.org leaders join in on the RDFa authorship process
  4. microformats.org becomes a focus for developing real-world RDFa vocabularies

I see little chance of points one and three occurring. However, I don’t see this as a particularly bad thing. Point two will occur almost by default: the simplest and most widely deployed microformats (e.g., rel-tag, rel-nofollow, and rel-license) are also valid RDFa — the predicate (e.g., tag, nofollow, license) appearing in the default namespace to an RDFa application. More complex microformats may be handled by hGRDDL, which is no big deal, as a microformat-aware application needs to parse each microformat it cares about anyway. From an RDF perspective any well-crafted metadata is a plus (and the microformats group does very careful work), as RDF’s killer app is integrating heterogeneous data sources.
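The claim that the simplest microformats read directly as RDF-style statements can be sketched with a toy extractor. This is a hypothetical sketch, not a real RDFa processor: the page URL, the class name, and the `xhv:` prefix standing in for the default namespace are all my own illustrative choices.

```python
# Toy sketch: read elemental-microformat rel values ("tag", "nofollow",
# "license") as predicates in a default namespace, roughly as an
# RDFa-aware application could. Names and the xhv: prefix are illustrative.
from html.parser import HTMLParser

ELEMENTAL_RELS = {"tag", "nofollow", "license"}

class RelTripleExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from <a rel=...> links."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url  # the subject of every triple
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        for rel in (attrs.get("rel") or "").split():
            if rel in ELEMENTAL_RELS and href:
                # The predicate lands in a default namespace.
                self.triples.append((self.page_url, "xhv:" + rel, href))

extractor = RelTripleExtractor("http://example.com/post")
extractor.feed(
    '<a rel="license" href="http://creativecommons.org/licenses/by/2.5/">CC BY</a>'
)
print(extractor.triples)
# → [('http://example.com/post', 'xhv:license', 'http://creativecommons.org/licenses/by/2.5/')]
```

A real microformat-aware application would need a parser per microformat anyway, which is why the hGRDDL route for complex microformats is “no big deal.”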

From a microformats perspective RDFa might well be ignored. While transformation of any microformat to RDF is relatively straightforward, transformation of RDF (which is a model, not a format) to microformats is nonsensical (well, I suppose such a transformation could have an endpoint, though I’m not sure what the point would be). Microformats, probably wisely, is not reinventing RDF (as many do, usually badly).

So why would RDFa be of interest to developers? In a word, laziness. There is no process to follow for developing an RDF vocabulary (ironic), you can freely reuse existing vocabularies and tools, avoid writing your own parsers, and trust that really smart people are figuring out the hard stuff for you (I believe the formal background of the Semantic Web is a long-term win). Or you might just want to, as Ben says, “express metadata about other documents (embedded images)”, which is trivial for RDF as images have URIs.

Addendum 20060601: The “simplest” microformats mentioned above have a name: elemental microformats.

RDFa.info

Wednesday, May 24th, 2006

I’ve mentioned RDFa a couple of times in passing.

Ben Adida has been doing an awesome job leading the standards effort over the last year and a half, which will pay off handsomely over the next six months. A few days ago he launched RDFa.info, the place to watch for interoperable web metadata tools, examples, and news.

Wikiforms

Thursday, May 11th, 2006

Brad Templeton writes about overly structured forms, one of my top UI peeves. The inability to copy and paste an IP address into a form with four separate fields has annoyed me, oh, probably hundreds of times. Date widgets annoy me slightly less. Listen to Brad when designing your next form, on the web or off.
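The four-field IP widget complaint above can be sketched as a single freeform field backed by real validation. This is an illustrative sketch (the function name is mine); the standard library already does the checking a four-box widget tries to enforce.

```python
# Sketch of the UI point: one freeform field that accepts a pasted IP
# address, instead of four separate octet boxes that defeat copy/paste.
# The stdlib ipaddress module does the validation. Function name is illustrative.
import ipaddress

def parse_ip_field(text):
    """Return a normalized IP string, or None if the paste isn't an address."""
    try:
        return str(ipaddress.ip_address(text.strip()))
    except ValueError:
        return None

print(parse_ip_field(" 192.168.0.1 "))   # pasted with stray whitespace
print(parse_ip_field("not an address"))  # rejected, but nothing blocked the paste
```

The user pastes whatever they have; the program, not the form layout, decides whether it is valid.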

The opposite of an overly structured form would be a freeform editing widget populated with unconstrained fields, blank or filled with example data, or even a completely empty editing widget with suggested structure documented next to it — a wiki editing form. This isn’t as strange as it seems — many forms are distributed as word processor or plain text documents that recipients are expected to fill in by editing directly and return.

I don’t think “wikiforms” are appropriate for many cases where structured forms are used, but it’s useful to think of opposites, and I imagine the niche for wikiforms and hybrids could increase. (Think of a “rich” wiki editor with autocompletion. I haven’t really thought this through, but I imagine it is deja vu for anyone who has used mainframe-style data entry applications.)

Ironically, the current number one use of the term wiki forms denotes adding structured forms to wikis!

On a marginally related note the Semantic MediaWiki appears to be making good progress.

Bitzi as Tagging 1.0 Metacrap

Sunday, March 12th, 2006

On the Tagging 2.0 panel Thomas Vander Wal just cited Bitzi as (more or less) a non-successful predecessor to Tagging 2.0 applications, saying something like “things like Bitzi (mumble) Cory Doctorow called metacrap.”

Vander Wal recently explained in a comment at Joho the Blog:

The big thing that was different, from say Bitzi, was people tagging information in their own vocabulary for their own reuse. Tagging information for others as a priority seems to make it far less accurate as a person may not understand the terms they are using (well understand them as other may).

He’s right. There’s too little private benefit to “tagging” at Bitzi, largely because the interfaces to what you have individually contributed are lame to the extent they exist at all. The Bitzi use case is rather different from those of Tagging 2.0 applications, but it can learn a lot from them.