Archive for the ‘Semantic Web’ Category

creativecommons.opportunities

Monday, March 19th, 2007

If working for a new project of a startup-like nonprofit in San Francisco involving [open] education, [copyright] law, and [semantic web] technology appeals to you, perhaps you should look into applying for Executive Director of CC Learn. I could imagine an education, legal, or technology person with some expertise in and much passion for the other two working out.

Student programmers, Creative Commons is participating in Google Summer of Code as a mentoring organization.

It is too late to apply for a summer technology or “free culture” internship, but keep CC in mind for next summer and (possibly) this fall.

Update 20070409: There are three open positions in addition to CC Learn ED above:

SXSW: Growth of Microformats

Saturday, March 17th, 2007

Monday afternoon’s packed The Growth and Evolution of Microformats didn’t strike me as terribly different from last year’s Microformats: Evolving the Web. Last year’s highlight was a Flock demo, this year’s was an Operator demo.

My capsule summary of the growth and (not much) evolution of Microformats over the past twelve months: a jillion names, addresses, and events have been marked up with hCard and hCalendar formatting.

SXSW: Semantic Web 2.0 and Scientific Publishing

Saturday, March 10th, 2007

Web 2.0 and Semantic Web: The Impact on Scientific Publishing, probably the densest panel I attended today (and again expertly moderated by Science Commons’ John Wilbanks), covered new business models for scientific publishers and how web technologies can help with these and data problems, but kept coming back to how officious Semantic Web technologies and controlled ontologies (which are not the same at all, but are often lumped together) and microformats and tagging (also distinct) complement each other (all four of ‘em!), even within a single application. I agree.

Nearly on point, this comment elsewhere by Denny Vrandecic of the Semantic MediaWiki project:

You are supposed to change the vocabulary in a wiki-way, just as well as the data itself. Need a new relation? Invent it. Figured out it’s wrong? Rename. Want a new category of things? Make it.

Via Danny Ayers, originally posted to O’Reilly Radar, which doesn’t offer permalinks for comments. This just needs a catchy name. Web 2.0 ontology engineering? Fonktology?

Perils of a too cool name

Wednesday, February 14th, 2007

I’ve seen lots of confusion about microformats, but Jon Udell takes the cake in describing XMP:

It’s a bit of a mish-mash, to say the least. There’s RDF (Resource Description Framework) syntax, Adobe-style metadata syntax, and Microsoft-style metadata syntax. But it works. And when I look at this it strikes me that here, finally, is a microformat that has a shot at reaching critical mass.

How someone as massively clued-in as Jon Udell could be so misled as to describe XMP as a microformat is beyond me.

XMP, which is basically a constrained RDF/XML serialization following super-ugly conventions that may be embedded in a number of file formats (most prominently PDF and JPEG, but potentially almost anything), is about as far from a microformat as one could possibly get. Off the top:

  • XMP is RDF/XML and as such is arbitrarily extensible; each microformat covers a specific use case and goes to great lengths to favor interoperability among publishers of each microformat (sometime I will write about how microformat and RDF people mean completely different things by “interoperability”) at the expense of extensibility.
  • XMP is embedded in a binary file, completely opaque to nearly all users; microformats put a premium on (practically require) colocation of metadata with human-visible HTML.
  • XMP would be extremely painful to write by hand and there are very few tools that support publishing it; microformats, to a fault, put a premium on publisher ease–anyone with a passing familiarity with HTML could be writing microformats.
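To make the first two points concrete, here is a rough Python sketch of digging an XMP packet out of a file's raw bytes and reading one property from the RDF/XML inside. It assumes the packet is stored uncompressed and delimited by `<x:xmpmeta` and `</x:xmpmeta>`; the sample bytes and the function names `extract_xmp_packet` and `dc_titles` are made up for illustration, and real files may well need format-specific handling.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_xmp_packet(data: bytes):
    """Return the first XMP packet found in raw file bytes, or None."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return None
    end_tag = b"</x:xmpmeta>"
    end = data.find(end_tag, start)
    if end == -1:
        return None
    return data[start:end + len(end_tag)]


def dc_titles(packet: bytes):
    """List the text of every rdf:li nested under a dc:title element."""
    root = ET.fromstring(packet)
    titles = []
    for title in root.iter("{%s}title" % DC_NS):
        for li in title.iter("{%s}li" % RDF_NS):
            titles.append(li.text)
    return titles


# Hypothetical stand-in for an image file: opaque bytes with a packet inside.
SAMPLE = (
    b"\xff\xd8\xff\xe1 binary junk "
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Sunset</rdf:li></rdf:Alt></dc:title>'
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    b" more junk \xff\xd9"
)

packet = extract_xmp_packet(SAMPLE)
titles = dc_titles(packet)
print(titles)
```

Nothing in the surrounding bytes is visible to a person looking at the image, and the title sits three elements deep in RDF/XML, which is roughly the opposite of marking up a name in visible HTML.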

I don’t agree with everything the microformats folk have done, but they do have a pretty self-consistent approach, if one bothers to try to understand it. XMP ain’t it.

XMP is by far the most promising embedded metadata format for “media” files — which is mostly a testament to how terribly useless to non-existent the alternatives are (by some definitions there are none).

Addendum: I’m really only picking on one word from Udell’s post, the remainder of which is recommended. From it I learned that “There’s also good support in .NET Framework 3.0 for reading and writing XMP metadata.”

Update 20070215: Udell explains:

Now there is, as Mike points out, a big philosophical difference between XMP, which aims for arbitrary extensibility, and fixed-function microformats that target specific things like calendar events. But in practice, from the programmer’s perspective, here’s what I observe.

Hand me an HTML document containing a microformat instance and I will cast about in search of tools to parse it, find a variety of ones that sort of work, and then wrestle with the details.

Hand me an image file containing an XMP fragment and, lo and behold, it’s the same story!

Yes, for 99% of the .01% of the world that cares at all, microformats and XMP are the same: metadata, embedded data, or even just data. That’s what I was hinting at in the title of this post — in the minds of 99% of the .01%, microformats are becoming synonymous with metadata, i.e., genericized. This is either a marketing and naming coup or disaster, depending on one’s perspective (I don’t particularly care).

I considered this headline: If XMP is a microformat, RDFa sure the heck is a microformat.

“Querying Wikipedia like a Database”

Tuesday, January 23rd, 2007

I’ve mentioned Semantic MediaWiki several times as having the potential to tremendously increase the value of Wikipedia by unlocking (in the sense of making queryable) all of the data in the encyclopedia.

dbpedia.org has taken a different approach to “Querying Wikipedia like a Database” (their excellent tagline): extract datasets from Wikipedia, presumably with a manual mapping of relevant categories and data populating infoboxes to triples (described in What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content).
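The general idea of turning infobox fields into triples can be sketched in a few lines of Python. This is not dbpedia.org's code, only a guess at the shape of the approach: the sample infobox, the `ex:` prefix, the regular expression, and the function name `infobox_to_triples` are all assumptions for illustration, and real infobox markup is far messier (nested templates, links, multi-line values).

```python
import re

# Matches simple "| key = value" lines; a deliberate simplification of wikitext.
FIELD = re.compile(r"^\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def infobox_to_triples(subject, wikitext):
    """Turn each infobox field into a (subject, predicate, object) tuple."""
    triples = []
    for key, value in FIELD.findall(wikitext):
        predicate = "ex:" + key.replace(" ", "_")
        triples.append((subject, predicate, value))
    return triples


# Hypothetical infobox text, not copied from any actual article.
SAMPLE = """{{Infobox city
| name = Innsbruck
| country = Austria
| region = Tyrol
}}"""

triples = infobox_to_triples("ex:Innsbruck", SAMPLE)
for triple in triples:
    print(triple)
```

The manual part I presume they need is deciding which field names map to which predicates; the sketch just reuses the field name.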

I suspect a Wikipedia implementation of Semantic MediaWiki would only help dbpedia.org, but the latter is already impressive, requiring no changes at Wikipedia. In addition to making some of the data in Wikipedia queryable they’re exposing non-Wikipedia datasets.

The Semantic Web is so here, now. Doubters repent! ;-) Like I said before:

Once people get hooked on access to a semantic encyclopedia, perhaps they’ll want similar access to the entire web.

Wikipedia and Linking 2.0

Monday, January 22nd, 2007

Tim Bray has reasons for linking to a Wikipedia article about an organization rather than the organization’s site:

[A] lot of institutional sites are pathetic self-serving fluff served up in anodyne marketing-speak with horrible URIs that are apt to vanish.

Linking to the Wikipedia instead is tempting, and I’ve succumbed a lot recently. In fact, that’s what I did for the Canada Line. After all, the train is still under construction and there’s no real reason to expect today’s links to last; on top of which, the Line’s own site is mostly about selling the project to the residents and businesses who (like me) are getting disrupted by it, and the taxpayers who (like me) are paying for it.

Wikipedia entries, on the other hand, are typically in stable locations, have a decent track record for outliving transient events, are pretty good at presenting the essential facts in a clear, no-nonsense way, and tend to be richly linked to relevant information, including whatever the “official” Web site might currently happen to be.

I wrote something similar about a year ago:

I consider a Wikipedia link more usable than a link to an organization home page. An organization article will link directly to an organization home page, if the latter exists. The reverse is almost never true (though doing so is a great idea). An organization article at Wikipedia is more likely to be objective, succinct, and informational than an organizational home page (not to mention there is no chance of encountering Flash, window resizing, or other annoying distractions — less charitably, attempts to control my browser — at Wikipedia). When I hear about something new these days, I nearly always check for a Wikipedia article before looking for an actual website. Finally, I have more confidence that the content of a Wikipedia article will be relevant to the content of my post many years from now.

Why not preferentially link to Wikipedia? Bray feels bad about not linking directly to original content and says Wikipedia could go off the rails, though later provides a reason to not worry about the latter:

I’d be willing to bet that if Wikipedia goes off the rails and some new online reference resource comes along to compete, there’ll be an automated mapping between Wikipedia links and the new thing; so the actual URIs may retain some value.

Indeed; and the first argument explains why linking to Wikipedia is superior to linking to an institution. But what about “original content”? If the content isn’t simply a home page (of an organization, person, or product significant enough to be in Wikipedia), Wikipedia doesn’t help. For example, I linked to Bray’s post “On Linking”; only providing a link to his Wikipedia article would have been unhelpful. The Wikipedia article link in this case is merely supplementary.

So what to do to help with broken and crappy links to items not described in Wikipedia? Bray suggests “multi-ended links”. I think he’s on the right track, but this is not something a web content creator should need to worry about — robust linking need not involve choosing several typed (e.g., official, reference, search) links. The content creator’s CMS and the user’s browser ought to be able to figure this stuff out; the content creator should just use the best link available, as always.

Last year I wrote:

I predict that in the foreseeable future your browser will be able to convert a Wikipedia article link into a home page link if that is your preference, aided by Semantic Mediawiki annotations or similar.

In the case of non-Wikipedia links (and those too), combatting linkrot and providing alternate and related (e.g., reference, reply, archival) links is an obvious feature add for social bookmarking services and can be made available to a CMS or browser via the usual web API/feed/scraping mechanisms.
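As a rough illustration of the prediction quoted above, here is a Python sketch of swapping an article link for a home page link when the article text carries a Semantic MediaWiki style `[[property::value]]` annotation. The property name `homepage`, the sample text, the URLs, and the function names are hypothetical; a real browser or CMS would more plausibly query exported data than scrape wikitext.

```python
import re

# [[property::value]] or [[property::value|label]]; plain [[links]] do not match.
ANNOTATION = re.compile(r"\[\[([^:\[\]|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def annotations(wikitext):
    """Map each property name to the list of values annotated in the text."""
    found = {}
    for prop, value in ANNOTATION.findall(wikitext):
        found.setdefault(prop.strip().lower(), []).append(value.strip())
    return found


def preferred_link(article_url, wikitext, prefer_homepage=True):
    """Return the annotated home page if wanted and present, else the article."""
    pages = annotations(wikitext).get("homepage", [])
    if prefer_homepage and pages:
        return pages[0]
    return article_url


# Hypothetical article text and URLs.
TEXT = (
    "The '''Example Line''' is a rail project in [[located in::Vancouver]]. "
    "See [[Category:Rail]] and its site, [[homepage::http://example.org/line]]."
)
ARTICLE = "https://en.wikipedia.org/wiki/Example_Line"

print(preferred_link(ARTICLE, TEXT))
print(preferred_link(ARTICLE, TEXT, prefer_homepage=False))
```

The content creator still just links to the article; the preference lives with the reader's software.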

NSFW as liberal content rating

Friday, December 29th, 2006

An observation I’ve wanted to make for a while, given the right occasion, is that the common practice of noting that something is NSFW is the bottom-up, liberal, mature, and responsible analog of content rating (e.g., MPAA ratings).

NSFW is a friend telling you that viewing a link may not be appropriate in some contexts, but use your judgement. Content rating is a bureaucracy telling you that viewing of some content by certain people is prohibited and perhaps enforced legally.

Of course content rating may be used to aid in making an informed choice and NSFW hints could in theory be enforced, but nevertheless I think each’s common use and source is illustrative of something.

The occasion for mentioning this is someone proposing machine-readable NSFW annotation. I don’t have an opinion of the utility of this yet, but it is fun to see a much improved (technically) proposal come just five hours after the first.

Via Tim Lee.

Most important software project

Sunday, December 10th, 2006

I don’t have a whole lot more to say about Semantic MediaWiki than I said over a year ago. The summary is to turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.

Flip through Denny Vrandecic’s recent presentations on Semantic MediaWiki (a smaller pdf version not directly linked in that post). There’s some technical content, but flip past that and you should still get the idea and be very excited.

I predict that Semantic MediaWiki also will be the killer application for the Semantic Web that so many have been skeptical of.

Yaron Koren also says that Semantic MediaWiki is “the technology that will revolutionize the web” and has built DiscourseDB using the software. DiscourseDB catalogs political opinion pieces. See Koren’s post on aggregating analysis using DiscourseDB. Unsurprisingly this analysis shows the political experts making bad calls.

Koren also has created Betocracy, another play money prediction market where users create claims. It looks like Betocracy is going for a blog-like interface, but I can’t say more as registration returns a database error.

One prediction market and Semantic MediaWiki connection is that making data more accessible makes prediction markets more feasible. Obtaining data necessary to create and judge prediction market contracts is expensive.

On that note Swivel also looks interesting. Some have called it data porn. Speaking of porn, see Strange Maps.

Microformats are worse

Sunday, October 22nd, 2006

I almost entirely agree with Mark Birbeck’s comparison of RDFa and microformats. The only thing to be said in defense of microformats is that a few of the problems Birbeck calls out are also features, from the microformats perspective.

But .

I will reveal what this means later.

Another quip: My problem with microformats is the s.

Evan Prodromou provided a still-good RDFa vs Microformats roundup (better title: “RDFa and Microformats, please work together”) in May. I somehow missed it until now.

Ah, metadata.

Update 20061204: I didn’t miss Prodromou’s roundup in May, I blogged about it. And forgot.

Meta those who can’t

Thursday, October 12th, 2006

I’ve been meaning to write a version of the aphorism “those who can, do; those who can’t, teach” to say something derogatory about metadata. Something like “those who can, code; those who can’t, twitter about standards.” But it doesn’t flow and like the “teach” version, is highly contestable. Plus, I’m projecting.

However, I realized that the general pattern of this aphorism is that those who can’t, do something “meta” relative to those who can, e.g., an extension of “teach” is “… those who can’t teach, administrate.”

Presumably those who can’t administrate, run for school board. And those who can’t write a pointed aphorism, write about aphorisms.

All of the “can’t” statements are metadata about a subject that does something more meta than the things he can’t do.

So I got my metadata out of it after all.