Post Semantic Web

Peer producing think tank transparency

Wednesday, October 31st, 2007

Hack, Mash & Peer: Crowdsourcing Government Transparency looks like a reasonable exhortation for the U.S. federal government to publish data in open formats so that government activities may be more easily scrutinized. The paper’s first paragraph:

The federal government makes an overwhelming amount of data publicly available each year. Laws ranging from the Administrative Procedure Act to the Paperwork Reduction Act require these disclosures in the name of transparency and accountability. However, the data are often only nominally publicly available. First, this is the case because it is not available online or even in electronic format. Second, the data that can be found online is often not available in an easily accessible or searchable format. If government information was made public online and in standard open formats, the online masses could be leveraged to help ensure the transparency and accountability that is the reason for making information public in the first place.

That’s great. But if peer produced (a more general and less inflammatory term than crowdsourced; I recommend it) scrutiny of government is great, why not of think tanks? Let’s rewrite that paragraph:

Think tanks produce an overwhelming number of analyses and policy recommendations each year. It is in the interest of the public and the think tanks that these recommendations be of high quality. However, the data and methodology used to produce these positions are often not publicly available. First, this is the case because the data is not available online or even in electronic format. Second, the analysis that can be found online is often not available in an easily accessible or searchable format. Third, nearly everything published by think tanks is copyrighted. If think tank data and analysis was made public online in standard open formats and under open licenses, the online masses could be leveraged to help ensure the quality and public benefit of the policy recommendations that are the think tanks’ reason for existing in the first place.

Think tanks should lead by example, and improve their product to boot. Note the third point above: unlike works of the federal government, the output of think tanks (and everyone else) is restricted by copyright. So think tanks need to take an extra step, such as open licensing, to ensure openness.

(Actually think tanks only need to lead in their domain of political economy — by following the trails blazed by the open access movement in scientific publishing.)

This is only the beginning of leading by example for think tanks. When has a pro-market think tank ever subjected its policy recommendations to market evaluation?

Via Reason.

Creative Commons accounting

Monday, October 8th, 2007

As usual, I don’t speak for Creative Commons nor any other organization here, not even remotely. Follow the links if you want officialdom.

CC recently launched its fall fundraising campaign (that time of year again) and site revamp. Boing Boing picked it up and noted one of the best bits:

Creative Commons has launched a site redesign to go with its fall fundraising campaign, featuring a new emphasis on the work being done by CC teams globally, backed by sweet open source code, including OpenLayers mapping and Semantic MediaWiki. For bloggers there are new map-themed “Support CC” buttons to help spread the word.

Yes, CC is finally using what I called the most important software project (to a much greater extent on the intranet; not mentioned above are other semantic technologies I’ve blogged about here), ironically now that I’m no longer CTO. Nathan Yergler has been doing a great job in that role, and 2007 will probably see as much visible progress on CC technology fronts as the previous four years.

The CC Salon in San Francisco this Wednesday evening should be excellent, featuring researcher Giorgos Cheliotis (on counting CC-licensed works; actually more than that, but the description works with this post’s title) and some very positive announcements, while Jon Phillips will be giving a brief talk on CC at the EFF Bootcamp during the day in Mountain View.

It took a long time to hire a new General Counsel, in no small part because Mia Garlick, the previous GC, set the bar very high.

And CC is hiring an accountant, full time in San Francisco.

Ridiculous simplicity

Monday, May 21st, 2007

Wikiclock is so ridiculous I’m not surprised it took so long for someone to invent it. But it is a thing of sublime beauty. Reminds me of some of the projects at last weekend’s event.

pageoftext.com, which hosts wikiclock, is only ridiculous in its simplicity. Why didn’t I think of that?

Both projects via Evan Prodromou reporting on RoCoCo. I’m sad that I couldn’t make it to Montreal but glad to hear it’s coming to the SF Bay Area next year.

creativecommons.opportunities

Monday, March 19th, 2007

If working for a new project of a startup-like nonprofit in San Francisco involving [open] education, [copyright] law, and [semantic web] technology sounds interesting, perhaps you should look into applying for Executive Director of CC Learn. I could imagine an education, legal, or technology person with some expertise and much passion for the other two working out.

Student programmers, Creative Commons is participating in Google Summer of Code as a mentoring organization.

It is too late to apply for a summer technology or “free culture” internship, but keep CC in mind for next summer and (possibly) this fall.

Update 20070409: There are three open positions in addition to the CC Learn ED position above.

SXSW: Growth of Microformats

Saturday, March 17th, 2007

Monday afternoon’s packed The Growth and Evolution of Microformats didn’t strike me as terribly different from last year’s Microformats: Evolving the Web. Last year’s highlight was a Flock demo, this year’s was an Operator demo.

My capsule summary of the growth and (not much) evolution of Microformats over the past twelve months: a jillion names, addresses, and events have been marked up with hCard and hCalendar formatting.
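
Concretely, hCard is just agreed-upon class names layered on ordinary HTML, which is why both publishing and consuming it is cheap. Here is a minimal Python sketch (standard library only, nowhere near a conforming microformats parser, and the sample markup is invented) that pulls the formatted name out of hCard-style markup:

```python
# A minimal sketch, not a conforming microformats parser: it extracts the
# formatted name ("fn") from hCard markup using only the standard library,
# and assumes tidy, properly nested HTML with no unclosed elements.
from html.parser import HTMLParser

class HCardNames(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # class lists of currently open elements
        self.names = []   # collected fn values
        self._buf = []

    def handle_starttag(self, tag, attrs):
        self.stack.append((dict(attrs).get("class") or "").split())

    def handle_endtag(self, tag):
        if not self.stack:
            return
        classes = self.stack.pop()
        in_vcard = "vcard" in classes or any("vcard" in c for c in self.stack)
        if "fn" in classes and in_vcard:
            self.names.append("".join(self._buf).strip())
            self._buf = []

    def handle_data(self, data):
        if any("fn" in c for c in self.stack):
            self._buf.append(data)

markup = ('<div class="vcard">'
          '<a class="url fn" href="http://example.org/">Jane Doe</a>, '
          '<span class="org">Example Org</span></div>')
parser = HCardNames()
parser.feed(markup)
print(parser.names)   # ['Jane Doe']
```

A real consumer like Operator handles far more of the spec; the low barrier on both the publishing and consuming side is the point.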

SXSW: Semantic Web 2.0 and Scientific Publishing

Saturday, March 10th, 2007

Web 2.0 and Semantic Web: The Impact on Scientific Publishing, probably the densest panel I attended today (and again expertly moderated by Science Commons’ John Wilbanks), covered new business models for scientific publishers and how web technologies can help with these and with data problems, but kept coming back to how officious Semantic Web technologies and controlled ontologies (which are not the same at all, but are often lumped together) and microformats and tagging (also distinct) complement each other (all four of ’em!), even within a single application. I agree.

Nearly on point, this comment elsewhere by Denny Vrandecic of the Semantic MediaWiki project:

You are supposed to change the vocabulary in a wiki-way, just as well as the data itself. Need a new relation? Invent it. Figured out it’s wrong? Rename. Want a new category of things? Make it.

Via Danny Ayers, originally posted to O’Reilly Radar, which doesn’t offer permalinks for comments. This just needs a catchy name. Web 2.0 ontology engineering? Fonktology?
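
For the curious, the wiki-way vocabulary evolution Vrandecic describes is possible because Semantic MediaWiki annotations live inline in article text as [[Property::value]] links. A toy Python sketch (the article snippet and property names are invented, and a real installation parses far more carefully than a regex can):

```python
# A toy sketch of Semantic MediaWiki's inline [[Property::value]] annotation
# syntax: relations are invented, renamed, and fixed by editing the page
# text itself. Sample wikitext and properties below are made up.
import re

article = """
'''Innsbruck''' is the capital of [[located in::Tyrol]] and has a
population of [[population::117,916]]. It lies on the [[crossed by::Inn]].
"""

# Extract (property, value) pairs from the inline annotations.
annotations = re.findall(r"\[\[(.+?)::(.+?)\]\]", article)
for prop, value in annotations:
    print(f"{prop!r} -> {value!r}")
# 'located in' -> 'Tyrol', 'population' -> '117,916', 'crossed by' -> 'Inn'
```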

Perils of a too cool name

Wednesday, February 14th, 2007

I’ve seen lots of confusion about microformats, but Jon Udell takes the cake in describing XMP:

It’s a bit of a mish-mash, to say the least. There’s RDF (Resource Description Framework) syntax, Adobe-style metadata syntax, and Microsoft-style metadata syntax. But it works. And when I look at this it strikes me that here, finally, is a microformat that has a shot at reaching critical mass.

How someone as massively clued-in as Jon Udell could be so misled as to describe XMP as a microformat is beyond me.

XMP, which is basically a constrained RDF/XML serialization following super-ugly conventions that may be embedded in a number of file formats (most prominently PDF and JPEG, but potentially almost anything), is about as far from a microformat as one could possibly get. Off the top (a minimal extraction sketch follows the list):

  • XMP is RDF/XML and as such is arbitrarily extensible; each microformat covers a specific use case and goes to great lengths to favor interoperability among publishers of each microformat (sometime I will write about how microformat and RDF people mean completely different things by “interoperability”) at the expense of extensibility.
  • XMP is embedded in a binary file, completely opaque to nearly all users; microformats put a premium on (practically require) colocation of metadata with human-visible HTML.
  • XMP would be extremely painful to write by hand and there are very few tools that support publishing it; microformats, to a fault, put a premium on publisher ease–anyone with a passing familiarity with HTML could be writing microformats.
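
To make the second bullet concrete, here is the promised sketch: the XMP packet is RDF/XML buried inside the binary file, bracketed by xpacket processing instructions, so nothing is visible without tooling. This naive Python byte scan assumes a single uncompressed packet and ignores the format-specific embedding rules (JPEG APP1 segments, PDF metadata streams) that real readers follow:

```python
# A naive sketch of pulling the XMP packet out of a file: the RDF/XML sits
# inside the binary, delimited by <?xpacket ...?> processing instructions.
# Assumes a single uncompressed packet; real readers follow the
# format-specific embedding rules instead of scanning raw bytes.
import sys

def extract_xmp(path):
    data = open(path, "rb").read()
    start = data.find(b"<?xpacket begin=")
    if start == -1:
        return None                        # no packet marker found
    end = data.find(b"<?xpacket end=", start)
    if end == -1:
        return None
    close = data.find(b"?>", end)
    if close == -1:
        return None
    return data[start:close + 2].decode("utf-8", errors="replace")

if __name__ == "__main__":
    packet = extract_xmp(sys.argv[1])      # e.g. a JPEG or PDF with XMP
    print(packet if packet else "no XMP packet found")
```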

I don’t agree with everything the microformats folk have done, but they do have a pretty self-consistent approach, if one bothers to try to understand it. XMP ain’t it.

XMP is by far the most promising embedded metadata format for “media” files — which is mostly a testament to how terribly useless to non-existent the alternatives are (by some definitions there are none).

Addendum: I’m really only picking on one word from Udell’s post, the remainder of which is recommended. It is good to learn that “There’s also good support in .NET Framework 3.0 for reading and writing XMP metadata.”

Update 20070215: Udell explains:

Now there is, as Mike points out, a big philosophical difference between XMP, which aims for arbitrary extensibility, and fixed-function microformats that target specific things like calendar events. But in practice, from the programmer’s perspective, here’s what I observe.

Hand me an HTML document containing a microformat instance and I will cast about in search of tools to parse it, find a variety of ones that sort of work, and then wrestle with the details.

Hand me an image file containing an XMP fragment and, lo and behold, it’s the same story!

Yes, for 99% of the .01% of the world that cares at all, microformats and XMP are the same: metadata, embedded data, or even just data. That’s what I was hinting at in the title of this post — in the minds of 99% of the .01%, microformats are becoming synonymous with metadata, i.e., genericized. This is either a marketing and naming coup or disaster, depending on one’s perspective (I don’t particularly care).

I considered this headline: If XMP is a microformat, RDFa sure the heck is a microformat.

“Querying Wikipedia like a Database”

Tuesday, January 23rd, 2007

I’ve mentioned Semantic MediaWiki several times as having the potential to tremendously increase the value of Wikipedia by unlocking (in the sense of making queryable) all of the data in the encyclopedia.

dbpedia.org has taken a different approach to “Querying Wikipedia like a Database” (their excellent tagline) — extract datasets from Wikipedia, presumably with a manual mapping of relevant categories and infobox data to triples (described in What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content).

I suspect a Wikipedia implementation of Semantic MediaWiki would only help dbpedia.org, but the latter is already impressive, requiring no changes at Wikipedia. In addition to making some of the data in Wikipedia queryable, they’re exposing non-Wikipedia datasets.
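
As a rough illustration of what “querying Wikipedia like a database” can look like from a script, here is a minimal Python sketch against dbpedia.org’s public SPARQL endpoint. It assumes the endpoint accepts the standard query and format parameters and that the example class and property URIs are present in the dataset:

```python
# A minimal sketch of querying dbpedia.org's public SPARQL endpoint using
# only the standard library. The endpoint URL, parameters, and the
# class/property URIs below are assumptions about the current service.
import json
import urllib.parse
import urllib.request

query = """
SELECT ?city ?population WHERE {
  ?city a <http://dbpedia.org/ontology/City> ;
        <http://dbpedia.org/ontology/populationTotal> ?population .
  FILTER (?population > 1000000)
}
LIMIT 10
"""

url = "http://dbpedia.org/sparql?" + urllib.parse.urlencode({
    "query": query,
    "format": "application/sparql-results+json",
})
with urllib.request.urlopen(url) as response:
    results = json.load(response)

for row in results["results"]["bindings"]:
    print(row["city"]["value"], row["population"]["value"])
```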

The Semantic Web is so here, now. Doubters repent! ;-) Like I said before:

Once people get hooked on access to a semantic encyclopedia, perhaps they’ll want similar access to the entire web.

Wikipedia and Linking 2.0

Monday, January 22nd, 2007

Tim Bray has reasons for linking to a Wikipedia article about an organization rather than the organization’s site:

[A] lot of institutional sites are pathetic self-serving fluff served up in anodyne marketing-speak with horrible URIs that are apt to vanish.

Linking to the Wikipedia instead is tempting, and I’ve succumbed a lot recently. In fact, that’s what I did for the Canada Line. After all, the train is still under construction and there’s no real reason to expect today’s links to last; on top of which, the Line’s own site is mostly about selling the project to the residents and businesses who (like me) are getting disrupted by it, and the taxpayers who (like me) are paying for it.

Wikipedia entries, on the other hand, are typically in stable locations, have a decent track record for outliving transient events, are pretty good at presenting the essential facts in a clear, no-nonsense way, and tend to be richly linked to relevant information, including whatever the “official” Web site might currently happen to be.

I wrote something similar about a year ago:

I consider a Wikipedia link more usable than a link to an organization home page. An organization article will link directly to an organization home page, if the latter exists. The reverse is almost never true (though doing so is a great idea). An organization article at Wikipedia is more likely to be objective, succinct, and informational than an organizational home page (not to mention there is no chance of encountering Flash, window resizing, or other annoying distractions — less charitably, attempts to control my browser — at Wikipedia). When I hear about something new these days, I nearly always check for a Wikipedia article before looking for an actual website. Finally, I have more confidence that the content of a Wikipedia article will be relevant to the content of my post many years from now.

Why not preferentially link to Wikipedia? Bray feels bad about not linking directly to original content and says Wikipedia could go off the rails, though later provides a reason to not worry about the latter:

I’d be willing to bet that if Wikipedia goes off the rails and some new online reference resource comes along to compete, there’ll be an automated mapping between Wikipedia links and the new thing; so the actual URIs may retain some value.

Indeed; and the first argument explains why linking to Wikipedia is superior to linking to an institution. But what about “original content”? If the content isn’t simply a home page (of an organization, person, or product significant enough to be in Wikipedia), Wikipedia doesn’t help. For example, I linked to Bray’s post “On Linking”; only providing a link to his Wikipedia article would have been unhelpful. The Wikipedia article link in this case is merely supplementary.

So what to do to help with broken and crappy links to items not described in Wikipedia? Bray suggests “multi-ended links”. I think he’s on the right track, but this is not something a web content creator should need to worry about — robust linking need not involve choosing several typed (e.g., official, reference, search) links. The content creator’s CMS and the user’s browser ought to be able to figure this stuff out; the content creator should just use the best link available, as always.

Last year I wrote:

I predict that in the foreseeable future your browser will be able to convert a Wikipedia article link into a home page link if that is your preference, aided by Semantic Mediawiki annotations or similar.

In the case of non-Wikipedia links (and those too), combatting linkrot and providing alternate and related (e.g., reference, reply, archival) links is an obvious feature add for social bookmarking services and can be made available to a CMS or browser via the usual web API/feed/scraping mechanisms.
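
A hedged sketch of the link-conversion idea above: map the Wikipedia article to its DBpedia resource and ask for a foaf:homepage triple, falling back to the original link when none is found. The official_homepage helper, the article-title to resource-URI mapping, and the presence of the triple are all illustrative assumptions here, not an existing API:

```python
# A hedged sketch of converting a Wikipedia article link into a home page
# link by asking DBpedia for a foaf:homepage triple. The URI mapping and
# the presence of the triple are assumptions for illustration.
import json
import urllib.parse
import urllib.request

def official_homepage(wikipedia_url):
    title = wikipedia_url.rsplit("/", 1)[-1]
    resource = "http://dbpedia.org/resource/" + title
    query = ("SELECT ?home WHERE { <%s> "
             "<http://xmlns.com/foaf/0.1/homepage> ?home }" % resource)
    url = "http://dbpedia.org/sparql?" + urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(url) as response:
        bindings = json.load(response)["results"]["bindings"]
    # Fall back to the original link when no homepage is recorded.
    return bindings[0]["home"]["value"] if bindings else wikipedia_url

print(official_homepage("https://en.wikipedia.org/wiki/Creative_Commons"))
```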

NSFW as liberal content rating

Friday, December 29th, 2006

An observation I’ve wanted to make for a while, given the right occasion, is that the common practice of noting that something is NSFW is the bottom-up, liberal, mature, and responsible analog of content rating (e.g., MPAA ratings).

NSFW is a friend telling you that viewing a link may not be appropriate in some contexts, but use your judgement. Content rating is a bureaucracy telling you that viewing of some content by certain people is prohibited, perhaps enforced legally or otherwise.

Of course content rating may be used to aid in making an informed choice, and NSFW hints could in theory be enforced, but nevertheless I think the common use and source of each is illustrative of something.

The occasion for mentioning this is someone proposing machine-readable NSFW annotation. I don’t have an opinion on the utility of this yet, but it is fun to see a much improved (technically) proposal come just five hours after the first.

Via Tim Lee.