Archive for the ‘Semantic Web’ Category

Table selection, HSA, LugRadio, Music, Photographers, New Media

Monday, April 21st, 2008

A few observations and things learned from the last eight days.

Go to a page with a table, for example this one (sorry, semi-nsfw). Hold down the control key and select cells. How could I not have known about this!? Unfortunately, copy & paste seems to produce tab-separated values in a single row even when pasting from multiple rows in the HTML table (tried with Firefox and Epiphany). Still really useful when you only want to copy one column of a table, but if you want all of the columns, don’t hold down the control key and row boundaries get newlines as they should rather than tabs. (Thanks Asheesh.)
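If a multi-row selection has already been pasted as one long tab-separated row, it can be folded back into rows as long as the number of selected columns is known. A minimal Python sketch, assuming that column count is supplied by hand (the function name `reshape_flat_paste` is made up for illustration):

```python
def reshape_flat_paste(text, columns):
    """Split one tab-separated row back into rows of `columns` cells each."""
    cells = text.rstrip("\n").split("\t")
    if len(cells) % columns != 0:
        raise ValueError("cell count is not a multiple of the column count")
    # Slice the flat list of cells into consecutive chunks, one per table row.
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


# Two table rows of three cells each, flattened into a single pasted row.
pasted = "a1\tb1\tc1\ta2\tb2\tc2"
rows = reshape_flat_paste(pasted, 3)
print("\n".join("\t".join(row) for row in rows))
```

This only works when every selected row contributed the same number of cells; an irregular control-click selection cannot be recovered this way.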

I feel really stupid about this one. I’ve assumed that a (US) Health Savings Account was a spend-within-the-year-or-lose-your-contributions arrangement, but that’s what a Flexible Spending Account is (I have no predictable medical expenses, so such an account makes no sense for me). A HSA is an investment account much like an IRA, except you can spend from it without penalty upon incurring medical expenses rather than old age. You can only contribute to a HSA while enrolled in a high deductible health insurance plan, which I’ll try to switch to next year. (Thanks Ahrash.)

I saw a few presentations at LugRadio Live USA, in addition to giving one. Miguel de Icaza’s on (content roughly corresponding to this post) and Ian Murdock’s on were both in part about software packaging. Taken together, they make a good case for open source facilitating cross-pollination of ideas and code across operating system platforms.

Aaron Bockover and Gabriel Burt did a presentation/demo on Banshee, showing off some cool track selection/playlist features and talking about more coming. I may have to try switching back to Banshee as my main audio player (from Rhythmbox, with occasional use of Songbird for web-heavy listening or checking on how the program is coming along). Banshee runs on Mono, and both are funded by Novell, which also (though I don’t know how their overall investment compares) has an .

John Buckman gave an entertaining talk on open source and open content (including the slide at right). My talk probably was not entertaining at all, but used the question ‘how far behind [free/open source software] is free/open culture?’ to string together selected observations about the field.

Benjamin Mako Hill did a presentation on Revealing Errors (meant both ways). I found myself wanting to be skeptical of the power of technical errors to expose political/power relationships, but I imagine the concept could use a little hype — there’s definitely something there. The talk made me more sensitive to errors in any case. For example, when I transferred funds from a money market account to checking to pay taxes, an email notice included this (emphasis in original):

Your confirmation number is 0.

Zero? Really? The transaction did go through.

Tuesday I attended the Media Web Meetup V: The Gulf Between NorCal and SoCal, is it so big?, the idea being (in this context pushed by Songbird founder Rob Lord; I presented at the first Media Web Meetup and have attended a few others) that in Northern California entrepreneurs are trying to build new services around music, nearly all stymied by protectionist copyright holders in Southern California. I really did not need to listen to yet another panel asking how the heck is the music recording distribution industry going to use technology to make money, but this was a pretty good one as those go. One of the panelists kept urging technologists to “fix [music] metadata” as if doing so were the key to enabling profit from digital music. I suppressed the urge to sound a skeptical note, as investing more in metadata is one of the least harmful things the industry might do. Not that I don’t think metadata is great or anything.


Wendy Seltzer / CC BY

Thursday evening I was on a ‘Copyright 2.0′ panel put on by the American Society of Media Photographers Northern California. I thought my photo selection for my first slide was pretty clever. No, copyright expansion is not always good for the interests of professional photographers. The other panelists and the audience were actually more open minded (both meanings) than I expected, and certainly realistic. The photographer on the panel even stated the obvious (my paraphrase from memory): new technology has allowed lots of people to explore their photography talents who would otherwise have been unable to, and maybe some professional photographers just aren’t that good and should find other work. My main takeaway from the panel is that it is very difficult for an independent photographer to successfully pursue unauthorized users in court. With the sometime exception of one, the other panelists all strongly advised photographers to avoid going to court except as a last resort, and even then, first doing a rational calculation of what the effort is likely to cost and gain. The best advice was probably to try to turn unauthorized users into clients.

Friday evening I went to San Jose to be on a panel about New Media Artists and the Law. Unlike Thursday’s panel, this one was mostly about how to use and re-use rather than how to prevent use. This (and some nostalgia) made me miss living in Silicon Valley — I lived in Sunnyvale two years (2003-2005) and San Jose (2005-2006) before moving back to San Francisco. Nothing really new came up, but I did enjoy the enthusiasm of the other panelists and the audience (as I did the previous day).

Saturday I went to Ubuntu Restaurant in Napa, which apparently does vegetable cuisine but does not market itself as vegetarian. I think that’s a good idea. The food was pretty good.

I’ve been listening to Hazard Records 59 and 60: Calida Construccio by various and Unhazardous Songs by Xmarx. Lovely Hell (mp3) from the latter is rather poppy.

Uberfact

Monday, February 18th, 2008

There are a number of fun things about a sketch of Uberfact: the ultimate social verifier. The first is that the post could be written without mentioning . The second is that the proposed project is a nice would-be example of political desires sublimated entirely into creating useful and voluntary tools. Third, Mencius Moldbug is a fun writer.

Something like Uberfact should absolutely be built, though I’m far from certain it would hit a sweet spot. It may be too decentralized or too centralized or both. All points from enhancing Wikipedia to the Semantic Web (with Uberfact somewhere between) are complementary and well worth pursuing, particularly if that pursuit displaces malinvestment in politics.

Relatedly, but no time to explain why:

Requirements for community funding of open source

Saturday, November 24th, 2007

Last month another site for aggregating donation pledges to open source software projects launched.

I’m not sure there’s anything significant that sets Cofundos apart from microPledge featurewise. Possibly a step where bidders (pledgers) vote on which developer bid to accept. However I’m not certain how a developer is chosen on microPledge — their FAQ says “A quote will be chosen that delivers the finished and paid product to the pledgers most quickly based on their current pledging rate (not necessarily the shortest quote).” microPledge’s scheme for in progress payments may set it apart.
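One way to read the quoted FAQ rule is that each quote is scored by how long it will be until the work is both finished and fully paid for, with funding accruing at the current pledging rate while the developer works. The Python sketch below illustrates that reading only; it is not microPledge's actual algorithm, and the `Quote` fields, the function names, and the assumption that building and funding run in parallel are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    developer: str
    price: float          # amount the developer asks for (hypothetical field)
    days_to_build: float  # time the developer says the work will take


def days_until_finished_and_paid(quote, pledged, pledge_rate_per_day):
    """Days until the work is done AND the quoted price is fully pledged."""
    shortfall = max(quote.price - pledged, 0.0)
    if shortfall == 0:
        days_to_fund = 0.0
    elif pledge_rate_per_day <= 0:
        days_to_fund = float("inf")  # never funded at the current rate
    else:
        days_to_fund = shortfall / pledge_rate_per_day
    # Assumption: building and funding happen in parallel, so the later
    # of the two finish dates decides when pledgers get a paid product.
    return max(quote.days_to_build, days_to_fund)


def choose_quote(quotes, pledged, pledge_rate_per_day):
    """Pick the quote with the earliest finished-and-paid date."""
    return min(
        quotes,
        key=lambda q: days_until_finished_and_paid(q, pledged, pledge_rate_per_day),
    )


quotes = [
    Quote("fast but pricey", price=1000.0, days_to_build=10),
    Quote("slow but cheap", price=400.0, days_to_build=30),
]
best = choose_quote(quotes, pledged=200.0, pledge_rate_per_day=10.0)
print(best.developer)
```

With these made-up numbers the 10-day quote would take 80 days to fund, so the 30-day quote wins, which matches the FAQ's "not necessarily the shortest quote" caveat.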

In terms of marketing and associations, Cofundos comes from the Agile Knowledge Engineering and Semantic Web research group at the University of Leipzig, producers of , about which I’ve written. Many of the early proposed projects are directly related to AKSW research. Their copyright policy is appreciated.

microPledge is produced by three Christian siblings who don’t push their religion.

Cofundos lists 61 proposed projects after one month, microPledge lists about 160 after about three and a half months. I don’t see any great successes on either site, but both are young, and perhaps I’m not looking hard enough.

Cofundos and microPledge are both welcome experiments, though I don’t expect either to become huge. On the other hand, even modest success would set a valuable precedent. In that vein, although I’ve been pretty skeptical about the chances of Fundable, they seem to have attracted a steady stream of users. Although most projects seem to be uninteresting (pledges for bulk purchases, group trips, donations to an individual’s college fund, etc.), some production of public goods does seem to be getting funded, including several film projects in the small thousands of dollars range. Indeed, “My short film” is the default project name in their form for starting a project.

It seems to me that creating requirements and getting in front of interested potential donors are the main challenges for sites focused on funding open source software like Cofundos and microPledge (both say they are only starting with software). Requirements are just hard, and there’s little incentive for anyone to visit an aggregator who hasn’t aggregated anything of interest.

I wonder if integrating financial donations into project bug tracking systems would address both challenges? Of course doing so would have risks: increasing bureaucracy around processing bugs and feature requests, the necessity of implementing new features (and bugs) in the relevant bug tracking software, and altering the incentives of volunteer contributors.

Via Open Knowledge Foundation Blog.

bar : sex :: social networking site : spam

Thursday, November 22nd, 2007

Brad Templeton on Facebook apps that aggressively request access to your private data (relatedly Templeton on the economics of privacy and identity is a must read) and spam your friends:

Apps are not forced to do this. A number of good apps will let people see the data, even put it in feeds, without you having to “install” and thus give up all your privacy to the app. What I wish is that more of us had pushed back against the bad ones. Frankly, even if you don’t care about privacy, this approach results in lots of spam which is trying to get you to install apps. Everybody thinks having an app with lots of users is going to mean bucks down the road, with Facebook valued as highly as it is.

But a lot of it is plain old spam, but we’re tolerating it because it’s on Facebook. (Which itself is no champion. They have an extremely annoying email system which sends you an e-mail saying, “You got a message on facebook, click to read it” rather than just including the text of the message. To counter this, there is an “E-mail me instead” application which tries to make it easier for people to use real E-mail. And I recently saw one friend add the text “Use E-mail not facebook message” in her profile picture.)

The title of this post was my first Facebook status message earlier this year. In other words, social networking sites are all about lowering social boundaries. I am completely comfortable sending messages to people I barely know (if that) on Facebook that I would only consider sending (and often not even then) to close friends and regular correspondents via email or instant messaging.

Ironically social networks could be used to fight spam and otherwise bootstrap reputation systems. I am mildly surprised that although trust is perhaps the most interesting feature of social networks, as far as I know nobody has done anything interesting with them (at least social networking sites) in this respect. An occasional correspondent even suggested recently that reputation is a kind of anti-feature for social networking sites, and reputation features tend to be hidden or turned off.

My other (unoriginal, but older) observation about social networking sites is that at first blush the sector should be winner-take-all, driven by network effects, but instead we’ve already seen a few leaders surpassed, and I highly doubt Facebook will take all. I have two explanations. First, the sites don’t have much power to lock users in, even though it is hard to export data: users have contact information for remotely valuable contacts outside the site, in address books, buddy lists, and email archives, and can recreate their network on a new site relatively easily. Second, social networking sites don’t yet have a killer application. Although Facebook has allowed many third party apps on its platform, I have yet to see one that I would miss, and very few I return to. I doubt I’d miss Facebook (or any other social networking site) much, period, if I were banned from it (I know that many students would disagree about Facebook and musicians about MySpace).

Semantic Web Web Web

Wednesday, November 21st, 2007

The W3C and particularly its Semantic Web efforts do great, valuable work. I have one massive complaint, particularly about the latter: they ignore the Web at their peril. Yes, it’s true, as far as I can tell (but mind that I’m one or two steps removed from actually working on the problems), that the W3C and Semantic Web activities do not appreciate the importance of nor dedicate appropriate resources to the Web. Not just the theoretical Web of URIs, but the Web that billions of people use and see.

I’m reminded of this by Ian Davis’ post Is the Semantic Web Destined to be a Shadow?:

My belief is that trust must be considered far earlier and that it largely comes from usage and the wisdom of the crowds, not from technology. Trust is a social problem and the best solution is one that involves people making informed judgements on the metadata they encounter. To make an effective evaluation they need to have the ability to view and explore metadata with as few barriers as possible. In practice this means that the web of data needs to be as accessible and visible as the web of documents is today and it needs to interweave transparently. A separate, dry, web of data is unlikely to attract meaningful attention, whereas one that is a full part of the visible and interactive web that the majority of the population enjoys is far more likely to undergo scrutiny and analysis. This means that HTML and RDF need to be much more connected than many people expect. In fact I think that the two should never be separate and it’s not enough that you can publish RDF documents, you need to publish visible, browseable and engaging RDF that is meaningful to people. Tabular views are a weak substitute for a rich, readable description.

Peer producing think tank transparency

Wednesday, October 31st, 2007

Hack, Mash & Peer: Crowdsourcing Government Transparency from the looks like a reasonable exhortation for the U.S. jurisdiction government to publish data in so that government activities may be more easily scrutinized. The paper’s first paragraph:

The federal government makes an overwhelming amount of data publicly available each year. Laws ranging from the Administrative Procedure Act to the Paperwork Reduction Act require these disclosures in the name of transparency and accountability. However, the data are often only nominally publicly available. First, this is the case because it is not available online or even in electronic format. Second, the data that can be found online is often not available in an easily accessible or searchable format. If government information was made public online and in standard open formats, the online masses could be leveraged to help ensure the transparency and accountability that is the reason for making information public in the first place.

That’s great. But if peer produced (a more general and less inflammatory term than crowdsourced; I recommend it) scrutiny of government is great, why not of think tanks? Let’s rewrite that paragraph:

Think tanks produce an overwhelming number of analyses and policy recommendations each year. It is in the interest of the public and the think tanks that these recommendations be of high quality. However, the data and methodology used to produce these positions are often not publicly available. First, this is the case because the data is not available online or even in electronic format. Second, the analysis that can be found online is often not available in an easily accessible or searchable format. Third, nearly everything published by think tanks is copyrighted. If think tank data and analysis was made public online in standard open formats and under open licenses, the online masses could be leveraged to help ensure the quality and public benefit of the policy recommendations that are the think tanks’ reason for existing in the first place.

Think tanks should lead by example, and improve their product to boot. Note the third point above: unlike , the output of think tanks (and everyone else) is restricted by copyright. So think tanks need to take an to ensure openness.

(Actually think tanks only need to lead in their domain of political economy — by following the trails blazed by the movement in scientific publishing.)

This is only the beginning of leading by example for think tanks. When has a pro-market think tank ever subjected its policy recommendations to market evaluation?

Via Reason.

Creative Commons accounting

Monday, October 8th, 2007

As usual, I don’t speak for Creative Commons nor any other organization here, not even remotely. Follow the links if you want officialdom.

CC recently launched its fall fundraising campaign (that time of year again) and site revamp. Boing Boing picked it up and noted one of the best bits:

Creative Commons has launched a site redesign to go with its fall fundraising campaign, featuring a new emphasis on the work being done by CC teams globally, backed by sweet open source code, including OpenLayers mapping and Semantic MediaWiki. For bloggers there are new map-themed “Support CC” buttons to help spread the word.

Yes, CC is finally using what I called the most important software project (to a much greater extent on the intranet; and not mentioned, other semantic technologies I’ve blogged about here), ironically now that I’m no longer CTO. Nathan Yergler has been doing a great job in that role, and 2007 will probably see as much visible progress on CC technology fronts as the previous four years.

The CC Salon in San Francisco this Wednesday evening should be excellent, featuring researcher Giorgos Cheliotis (on counting CC licensed works–actually more than that, but the description works with this post’s title) and some very positive announcements, while Jon Phillips will be giving a brief talk on CC at the EFF Bootcamp during the day in Mountain View.

It took a long time to hire a new General Counsel, in no small part because Mia Garlick, the previous GC, set the bar very high.

And CC is hiring an accountant, full time in San Francisco.

Ridiculous simplicity

Monday, May 21st, 2007

Wikiclock is so ridiculous I’m not surprised it took so long for someone to invent it. But it is a thing of sublime beauty. Reminds me of some of the projects at last weekend’s .

pageoftext.com, which hosts wikiclock, is only ridiculous in its simplicity. Why didn’t I think of that?

Both projects via Evan Prodromou reporting on RoCoCo. I’m sad that I couldn’t make it to Montreal but glad to hear it’s coming to the SF Bay Area next year.

creativecommons.opportunities

Monday, March 19th, 2007

If working for a new project of a startup-like nonprofit in San Francisco involving [open] education, [copyright] law, and [semantic web] technology appeals to you, perhaps you should look into applying for Executive Director of CC Learn. I could imagine an education, legal, or technology person with some expertise and much passion for the other two working out.

Student programmers, Creative Commons is participating in Google Summer of Code as a mentoring organization.

It is too late to apply for a summer technology or “free culture” internship, but keep CC in mind for next summer and (possibly) this fall.

Update 20070409: There are three open positions in addition to CC Learn ED above:

SXSW: Growth of Microformats

Saturday, March 17th, 2007

Monday afternoon’s packed The Growth and Evolution of Microformats didn’t strike me as terribly different from last year’s Microformats: Evolving the Web. Last year’s highlight was a Flock demo, this year’s was an Operator demo.

My capsule summary of the growth and (not much) evolution of Microformats over the past twelve months: a jillion names, addresses, and events have been marked up with hCard and hCalendar formatting.

SXSW: Semantic Web 2.0 and Scientific Publishing

Saturday, March 10th, 2007

Web 2.0 and Semantic Web: The Impact on Scientific Publishing, probably the densest panel I attended today (and again expertly moderated by Science Commons’ John Wilbanks), covered , new business models for scientific publishers, and how web technologies can help with these and data problems, but kept coming back to how officious Semantic Web technologies and controlled ontologies (which are not the same at all, but are often lumped together) and microformats and tagging (also distinct) complement each other (all four of ‘em!), even within a single application. I agree.

Nearly on point, this comment elsewhere by Denny Vrandecic of the Semantic MediaWiki project:

You are supposed to change the vocabulary in a wiki-way, just as well as the data itself. Need a new relation? Invent it. Figured out it’s wrong? Rename. Want a new category of things? Make it.

Via Danny Ayers, originally posted to O’Reilly Radar, which doesn’t offer permalinks for comments. This just needs a catchy name. Web 2.0 ontology engineering? Fonktology?

Perils of a too cool name

Wednesday, February 14th, 2007

I’ve seen lots of confusion about microformats, but Jon Udell takes the cake in describing XMP:

It’s a bit of a mish-mash, to say the least. There’s RDF (Resource Description Framework) syntax, Adobe-style metadata syntax, and Microsoft-style metadata syntax. But it works. And when I look at this it strikes me that here, finally, is a microformat that has a shot at reaching critical mass.

How someone as massively clued-in as Jon Udell could be so misled as to describe XMP as a microformat is beyond me.

XMP, which is basically a constrained RDF/XML serialization following super-ugly conventions that may be embedded in a number of file formats (most prominently PDF and JPEG, but potentially almost anything), is about as far from a microformat as one could possibly get. Off the top:

  • XMP is RDF/XML and as such is arbitrarily extensible; each microformat covers a specific use case and goes to great lengths to favor interoperability among publishers of each microformat (sometime I will write about how microformat and RDF people mean completely different things by “interoperability”) at the expense of extensibility.
  • XMP is embedded in a binary file, completely opaque to nearly all users; microformats put a premium on (practically require) colocation of metadata with human-visible HTML.
  • XMP would be extremely painful to write by hand and there are very few tools that support publishing it; microformats, to a fault, put a premium on publisher ease–anyone with a passing familiarity with HTML could be writing microformats.
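To make the contrast in the list above concrete, here is a rough Python sketch that reads a tiny hand-written XMP packet and a tiny hCard using only the standard library. Both samples are invented for illustration, the `ex:mood` property and its namespace are made up to show that XMP accepts arbitrary vocabularies, and the hCard reader handles only two flat fields rather than the full hCard parsing rules.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# XMP: RDF/XML that normally sits inside a binary file such as a JPEG.
XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="">
   <dc:format>image/jpeg</dc:format>
   <ex:mood>sunny</ex:mood>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

# hCard: fixed class names layered on HTML that a person can read.
HCARD = ('<div class="vcard"><span class="fn">Jane Doe</span>, '
         '<span class="org">Example Org</span></div>')


def xmp_properties(packet):
    """Return {qualified property name: text} for simple XMP properties."""
    props = {}
    for description in ET.fromstring(packet).iter(RDF + "Description"):
        for child in description:
            props[child.tag] = (child.text or "").strip()
    return props


class HCardParser(HTMLParser):
    """Collect text of elements whose class is a known hCard field."""
    FIELDS = {"fn", "org"}  # deliberately tiny subset of hCard

    def __init__(self):
        super().__init__()
        self.current = None
        self.card = {}

    def handle_starttag(self, tag, attrs):
        for name in (dict(attrs).get("class") or "").split():
            if name in self.FIELDS:
                self.current = name

    def handle_data(self, data):
        if self.current:
            self.card[self.current] = self.card.get(self.current, "") + data

    def handle_endtag(self, tag):
        self.current = None  # simplification: no nested markup in a field


parser = HCardParser()
parser.feed(HCARD)
print(xmp_properties(XMP))
print(parser.card)
```

The XMP reader accepts whatever namespaced property it finds, while the hCard reader only knows the fixed vocabulary it was written for.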

I don’t agree with everything the microformats folk have done, but they do have a pretty self-consistent approach, if one bothers to try to understand it. XMP ain’t it.

XMP is by far the most promising embedded metadata format for “media” files — which is mostly a testament to how terribly useless to non-existent the alternatives are (by some definitions there are none).

Addendum: I’m really only picking on one word from Udell’s post, the remainder of which is recommended. It is to learn that “There’s also good support in .NET Framework 3.0 for reading and writing XMP metadata.”

Update 20070215: Udell explains:

Now there is, as Mike points out, a big philosophical difference between XMP, which aims for arbitrary extensibility, and fixed-function microformats that target specific things like calendar events. But in practice, from the programmer’s perspective, here’s what I observe.

Hand me an HTML document containing a microformat instance and I will cast about in search of tools to parse it, find a variety of ones that sort of work, and then wrestle with the details.

Hand me an image file containing an XMP fragment and, lo and behold, it’s the same story!

Yes, for 99% of the .01% of the world that cares at all, microformats and XMP are the same: metadata, embedded data, or even just data. That’s what I was hinting at in the title of this post — in the minds of 99% of the .01%, microformats are becoming synonymous with metadata, i.e., genericized. This is either a marketing and naming coup or disaster, depending on one’s perspective (I don’t particularly care).

I considered this headline: If XMP is a microformat, RDFa sure the heck is a microformat.

“Querying Wikipedia like a Database”

Tuesday, January 23rd, 2007

I’ve mentioned Semantic MediaWiki several times as having the potential to tremendously increase the value of Wikipedia by unlocking (in the sense of making queryable) all of the data in the encyclopedia.

dbpedia.org has taken a different approach to “Querying Wikipedia like a Database” (their excellent tagline): extract datasets from Wikipedia, presumably with a manual mapping of relevant categories and data populating infoboxes to triples (described in What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content).
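As a toy illustration of the infobox-to-triples idea, the Python sketch below turns `| key = value` lines from a made-up infobox into (subject, predicate, object) tuples. It is not dbpedia.org's extraction code: the sample wikitext, the regular expression, and the function name are assumptions, and a real extractor would map keys and values to URIs and cope with nested templates.

```python
import re

# Matches infobox lines such as "| country = Germany".
INFOBOX_FIELD = re.compile(r"^\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")


def infobox_to_triples(subject, wikitext):
    """Return one (subject, key, value) triple per non-empty infobox field."""
    triples = []
    for line in wikitext.splitlines():
        match = INFOBOX_FIELD.match(line)
        if match and match.group(2):  # skip fields left blank
            triples.append((subject, match.group(1), match.group(2)))
    return triples


# Invented sample; real infoboxes are messier.
sample = """{{Infobox city
| name = Leipzig
| country = Germany
| population =
}}"""

for triple in infobox_to_triples("Leipzig", sample):
    print(triple)
```

Once the data is in this shape, "querying like a database" amounts to filtering the tuples, for example selecting every subject whose `country` is `Germany`.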

I suspect Wikipedia implementation of Semantic MediaWiki would only help dbpedia.org, but the latter is already impressive, requiring no changes at Wikipedia. In addition to making some of the data in Wikipedia queryable they’re exposing non-Wikipedia datasets.

The Semantic Web is so here, now. Doubters repent! ;-) Like I said before:

Once people get hooked on access to a semantic encyclopedia, perhaps they’ll want similar access to the entire web.

Wikipedia and Linking 2.0

Monday, January 22nd, 2007

Tim Bray has reasons for linking to a Wikipedia article about an organization rather than the organization’s site:

[A] lot of institutional sites are pathetic self-serving fluff served up in anodyne marketing-speak with horrible URIs that are apt to vanish.

Linking to the Wikipedia instead is tempting, and I’ve succumbed a lot recently. In fact, that’s what I did for the Canada Line. After all, the train is still under construction and there’s no real reason to expect today’s links to last; on top of which, the Line’s own site is mostly about selling the project to the residents and businesses who (like me) are getting disrupted by it, and the taxpayers who (like me) are paying for it.

Wikipedia entries, on the other hand, are typically in stable locations, have a decent track record for outliving transient events, are pretty good at presenting the essential facts in a clear, no-nonsense way, and tend to be richly linked to relevant information, including whatever the “official” Web site might currently happen to be.

I wrote something similar about a year ago:

I consider a Wikipedia link more usable than a link to an organization home page. An organization article will link directly to an organization home page, if the latter exists. The reverse is almost never true (though doing so is a great idea). An organization article at Wikipedia is more likely to be objective, succinct, and informational than an organizational home page (not to mention there is no chance of encountering Flash, window resizing, or other annoying distractions — less charitably, attempts to control my browser — at Wikipedia). When I hear about something new these days, I nearly always check for a Wikipedia article before looking for an actual website. Finally, I have more confidence that the content of a Wikipedia article will be relevant to the content of my post many years from now.

Why not preferentially link to Wikipedia? Bray feels bad about not linking directly to original content and says Wikipedia could go off the rails, though later provides a reason to not worry about the latter:

I’d be willing to bet that if Wikipedia goes off the rails and some new online reference resource comes along to compete, there’ll be an automated mapping between Wikipedia links and the new thing; so the actual URIs may retain some value.

Indeed; and the first argument explains why linking to Wikipedia is superior to linking to an institution. But what about “original content”? If the content isn’t simply a home page (of an organization, person, or product significant enough to be in Wikipedia), Wikipedia doesn’t help. For example, I linked to Bray’s post “On Linking”; only providing a link to his Wikipedia article would have been unhelpful. The Wikipedia article link in this case is merely supplementary.

So what to do to help with broken and crappy links to items not described in Wikipedia? Bray suggests “multi-ended links”. I think he’s on the right track, but this is not something a web content creator should need to worry about — robust linking need not involve choosing several typed (e.g., official, reference, search) links. The content creator’s CMS and the user’s browser ought to be able to figure this stuff out; the content creator should just use the best link available, as always.

Last year I wrote:

I predict that in the foreseeable future your browser will be able to convert a Wikipedia article link into a home page link if that is your preference, aided by Semantic Mediawiki annotations or similar.

In the case of non-Wikipedia links (and those too), combatting linkrot and providing alternate and related (e.g., reference, reply, archival) links is an obvious feature add for social bookmarking services and can be made available to a CMS or browser via the usual web API/feed/scraping mechanisms.

NSFW as liberal content rating

Friday, December 29th, 2006

An observation I’ve wanted to make for a while, given the right occasion, is that the common practice of noting that something is NSFW is the bottom-up, liberal, mature, and responsible analog of content rating (e.g., MPAA ratings).

NSFW is a friend telling you that viewing a link may not be appropriate in some contexts, but use your judgement. Content rating is a bureaucracy telling you that viewing of some content by certain people is prohibited and perhaps enforced legally or .

Of course content rating may be used to aid in making an informed choice and NSFW hints could in theory be enforced, but nevertheless I think each’s common use and source is illustrative of something.

The occasion for mentioning this is someone proposing machine-readable NSFW annotation. I don’t have an opinion of the utility of this yet, but it is fun to see a much improved (technically) proposal come just five hours after the first.

Via Tim Lee.

Most important software project

Sunday, December 10th, 2006

I don’t have a whole lot more to say about Semantic MediaWiki than I said over a year ago. The summary is to turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.

Flip through Denny Vrandecic’s recent presentations on Semantic MediaWiki (a smaller pdf version not directly linked in that post). There’s some technical content, but flip past that and you should still get the idea and be very excited.

I predict that Semantic MediaWiki also will be the killer application for the Semantic Web that so many have been skeptical of.

Yaron Koren also says that Semantic MediaWiki is “the technology that will revolutionize the web” and has built DiscourseDB using the software. DiscourseDB catalogs political opinion pieces. See Koren’s post on aggregating analysis using DiscourseDB. Unsurprisingly this analysis shows the political experts making bad calls.

Koren also has created Betocracy, another play money prediction market where users create claims. It looks like Betocracy is going for a blog-like interface, but I can’t say more as registration produces a database error.

One prediction market and Semantic MediaWiki connection is that making data more accessible makes prediction markets more feasible. Obtaining data necessary to create and judge prediction market contracts is expensive.

On that note Swivel also looks interesting. Some have called it data porn. Speaking of porn, see Strange Maps.

Microformats are worse

Sunday, October 22nd, 2006

I almost entirely agree with Mark Birbeck’s comparison of RDFa and microformats. The only thing to be said in defense of microformats is that a few of the problems Birbeck calls out are also features, from the microformats perspective.

But .

I will reveal what this means later.

Another quip: My problem with microformats is the s.

Evan Prodromou provided a still-good RDFa vs Microformats roundup (better title: “RDFa and Microformats, please work together”) in May. I somehow missed it until now.

Ah, metadata.

Update 20061204: I didn’t miss Prodromou’s roundup in May, I blogged about it. And forgot.

Meta those who can’t

Thursday, October 12th, 2006

I’ve been meaning to write a version of the aphorism “those who can, do; those who can’t, teach” to say something derogatory about metadata. Something like “those who can, code; those who can’t, twitter about standards.” But it doesn’t flow and like the “teach” version, is highly contestable. Plus, I’m projecting.

However, I realized that the general pattern of this aphorism is that those who can’t, do something “meta” relative to those who can, e.g., an extension of “teach” is “… those who can’t teach, administrate.”

Presumably those who can’t administrate, run for school board. And those who can’t write a pointed aphorism, write about aphorisms.

All of the “can’t” statements are metadata about a subject that does something more meta than the things he can’t do.

So I got my metadata out of it after all.

Google whenever

Sunday, September 3rd, 2006

For years I’ve heard speculation that Google is building a web archive. Now there are domain name purchases to fuel the speculation. The Internet Archive has been providing an invaluable service with the and has set up mirrors in multiple jurisdictions, but recording the web is too important to rely on any single organization, no matter how good or robust. So I hope Google and others are maintaining web archives and will make them available to the public.

Via Tim Finin, who also notes an interesting paper about using article and user history to assign trust levels to Wikipedia article fragments and a Semantic Web archive.

Archives are important for establishing provenance in many situations, though one I’m particularly interested in is citing that a particular work was offered under a Creative Commons license at a particular time. This and other uses (e.g., citation in general, which is often of the form “http://example.com accessed 2005-03-10”, though who knows if a copy of the content as it existed on that date exists) would be enhanced if on-demand archiving were available. The Internet Archive does offer Archive-It.org, but this service is for institutional use and uses periodic crawls rather than immediate archiving of individual pages.
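
To make the citation idea concrete, here is a minimal sketch of the record such a citation might carry. The `Citation` class and its field names are hypothetical, not the format of any real archiving service, and the license URL in the usage example is only an illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical provenance record: what was cited, when, and under which license."""

    url: str
    accessed: date
    license_url: Optional[str] = None  # license the work was offered under, if known
    archive_url: Optional[str] = None  # snapshot taken at access time, if any

    def render(self) -> str:
        # Start with the common "URL accessed DATE" form, then append what is known.
        text = f"{self.url} accessed {self.accessed.isoformat()}"
        if self.license_url:
            text += f", offered under {self.license_url}"
        if self.archive_url:
            text += f", archived at {self.archive_url}"
        return text


plain = Citation(url="http://example.com", accessed=date(2005, 3, 10))
print(plain.render())

licensed = Citation(
    url="http://example.com",
    accessed=date(2005, 3, 10),
    license_url="http://creativecommons.org/licenses/by/2.5/",
)
print(licensed.render())
```

Without an `archive_url` the record is just a claim. With one pointing at a copy made on the access date, the license statement becomes something a third party could check.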

Update, 2 minutes later: I should read a bit more before posting: does exactly what I want. However, I hate that it uses opaque identifiers, and as such is nearly as evil as TinyURL.

Wordcamp and wiki mania

Monday, August 7th, 2006

In lieu of attending maybe the hottest conference ever I did a bit of wiki twiddling this weekend. I submitted a tiny patch (well that was almost two weeks ago — time flies), upgraded a private MediaWiki installation from 1.2.4 to 1.6.8 and a public installation from 1.5.6 to 1.6.8 and worked on a small private extension, adding to some documentation before running into a problem.

1.2.4->1.6.8 was tedious (basically four successive major version upgrades) but trouble-free, as that installation has almost no customization. The 1.5.6->1.6.8 upgrade, although only a single upgrade, took a little fiddling to make a custom skin and permissions account for small changes in MediaWiki code (example). I’m not complaining: clean upgrades are hard and the MediaWiki developers have done a great job of making them relatively painless.

Saturday I attended part of WordCamp, a one-day unconference for WordPress users. Up until the day before, the tentative schedule looked pretty interesting, but it seems lots of lusers signed up, so the final schedule didn’t have much meat for developers. Matt Mullenweg’s “State of the Word” and Q&A hit on clean upgrade of highly customized sites from several angles. Some ideas include better and better-documented plugin and skin APIs with more metadata and less coupling (e.g., widgets should help many common cases that previously required throwing junk in templates).

Beyond the purely practical, ease of customization and upgrade is important for openness.

Now listening to the Wikimania Wikipedia and the Semantic Web panel…