Archive for the ‘Semantic Web’ Category

Free software needs P2P

Friday, July 28th, 2006

Luis Villa on my constitutionally open services post:

It needs a catchier name, but his thinking is dead on- we almost definitely need a server/service-oriented list of freedoms which complement and extend the traditional FSF Four Freedoms and help us think more clearly about what services are and aren’t good to use.

I wasn’t attempting to invent a name, but Villa is right about my aim: I decided to not mention the four freedoms because I felt my thinking too muddled to be dignified with such a mention.

Kragen Sitaker doesn’t bother with catchy names in his just posted draft essay The equivalent of free software for online services. I highly recommend reading the entire essay, which is as incisive as it is historically informed, but I’ve pulled out the problem:

So far, all this echoes the “open standards” and “open formats” discussion from the days when we had to take proprietary software for granted. In those days, we spent enormous amounts of effort trying to make sure our software kept our data in well-documented formats that were supported by other programs, and choosing proprietary software that conformed to well-documented interfaces (POSIX, SQL, SMTP, whatever) rather than the proprietary software that worked best for our purposes.

Ultimately, it was a losing game, because of the inherent conflict of interest between software author and software user.

And the solution:

I think there is only one solution: build these services as decentralized free-software peer-to-peer applications, pieces of which run on the computers of each user. As long as there’s a single point of failure in the system somewhere outside your control, its owner is in a position to deny service to you; such systems are not trustworthy in the way that free software is.

This is what has excited me about decentralized systems since long before P2P filesharing.

Luis Villa also briefly mentioned P2P in relation to the services platforms of Amazon, eBay, Google, Microsoft and Yahoo!:

What is free software’s answer to that? Obviously the ’spend billions on centralized servers’ approach won’t work for us; we likely need something P2P and/or semantic-web based.

Wes Felter commented on the control of pointers to data:

I care not just about my data, but the names (URLs) by which my data is known. The only URLs that I control are those that live under a domain name that I control (for some loose value of control as defined by ICANN).

I hesitated to include this point because I hesitate to recommend that most people host services under a domain name they control. What is the half-life of http://blog.john.smith.name vs. http://johnsmith.blogspot.com or js@john.smith.name vs. johnsmith@gmail.com? Wouldn’t it suck to be John Smith if everything in his life pointed at john.smith.name and the domain was hijacked? I think Wes and I discussed exactly this outside CodeCon earlier this year. Certainly it is preferable for a service to allow hosting under one’s own domain (as Blogger and several others do), but I wish I felt a little more certain of the long-term survivability of my own [domain] names.

This post could be titled “freedom needs P2P” but for the heck of it I wanted to mirror “free culture needs free software.”

Long tail of metadata

Monday, May 29th, 2006

Ben Adida notes that people are writing about RDFa, which is great, and envisioning conflict with microformats, which is not. As Ben says:

Microformats are useful for expressing a few, common, well-defined vocabularies. RDFa is useful for letting publishers mix and match any vocabularies they choose. Both are useful.

In other words RDFa is a technology.

Evan Prodromou thinks the future is bleak without cooperation. I like his proposed way forward (strikeout added for obvious reasons):

  1. RDFa gets acknowledged and embraced by microformats.org as the future of semantic-data-in-XHTML
  2. The RDFa group makes an effort to encompass existing microformats with a minimum of changes
  3. microformats.org leaders join in on the RDFa authorship process
  4. microformats.org becomes a focus for developing real-world RDFa vocabularies

I see little chance of points one and three occurring. However, I don’t see this as a particularly bad thing. Point two will occur, almost by default: the simplest and most widely deployed microformats (e.g., rel-tag, rel-nofollow, and rel-license) are also valid RDFa, with the predicate (e.g., tag, nofollow, license) appearing in the default namespace to an RDFa application. More complex microformats may be handled by hGRDDL, which is no big deal as a microformat-aware application needs to parse each microformat it cares about anyway. From an RDF perspective any well-crafted metadata is a plus (and the microformats group do very careful work) as RDF’s killer app is integrating heterogeneous data sources.
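
The idea that a simple rel value can be read as a predicate can be sketched in a few lines of Python. This is only an illustration, not an RDFa parser: it assumes the subject is the page's own URL, treats each rel value on a and link elements as a predicate and the href as the object, and the names RelTripleParser and extract_rel_triples are made up for the example.

```python
from html.parser import HTMLParser


class RelTripleParser(HTMLParser):
    """Collect (subject, predicate, object) triples from rel attributes."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url  # assumed subject of every triple
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = attributes.get("rel")
        if not href or not rel:
            return
        # rel may hold several space-separated values, one predicate each
        for predicate in rel.split():
            self.triples.append((self.page_url, predicate, href))


def extract_rel_triples(page_url, markup):
    """Return the rel-based triples found in a string of (X)HTML."""
    parser = RelTripleParser(page_url)
    parser.feed(markup)
    return parser.triples


SAMPLE = (
    '<p>Licensed under <a rel="license" '
    'href="http://creativecommons.org/licenses/by/2.5/">CC BY</a>, filed under '
    '<a rel="tag" href="http://example.org/tags/rdfa">rdfa</a>.</p>'
)
triples = extract_rel_triples("http://example.org/post", SAMPLE)
```

A real RDFa application would also resolve namespaces and relative URLs; the sketch only shows why no per-microformat parser is needed for these simple cases.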

From a microformats perspective RDFa might well be ignored. While transformation of any microformat to RDF is relatively straightforward, transformation of RDF (which is a model, not a format) to microformats is nonsensical (well, I suppose the endpoint of such a transformation could be , though I’m not sure what the point would be). Microformats, probably wisely, is not reinventing RDF (as many do, usually badly).

So why would RDFa be of interest to developers? In a word, laziness. There is no process to follow for developing an RDF vocabulary (ironic), you can freely reuse existing vocabularies and tools, not write your own parsers, and trust that really smart people are figuring out the hard stuff for you (I believe the formal background of the Semantic Web is a long-term win). Or you might just want to, as Ben says “express metadata about other documents (embedded images)” which is trivial for RDF as images have URIs.

Addendum 20060601: The “simplest” microformats mentioned above have a name: elemental microformats.

RDFa.info

Wednesday, May 24th, 2006

I’ve mentioned RDFa a couple of times in passing.

Ben Adida has been doing an awesome job leading the standards effort the last year and a half, which will pay off handsomely over the next six months. A few days ago he launched RDFa.info, the place to watch for interoperable web metadata tools, examples, and news.

Wikiforms

Thursday, May 11th, 2006

Brad Templeton writes about overly structured forms, one of my top UI peeves. The inability to copy and paste an IP address into a form with four separate fields has annoyed me, oh, probably hundreds of times. Date widgets annoy me slightly less. Listen to Brad when designing your next form, on the web or off.
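
For the IP address case, a single free-text field is easy to validate after the fact. A minimal sketch in Python, assuming the form hands over whatever the user pasted as one string (parse_ip_field is a made-up name for the example):

```python
import ipaddress


def parse_ip_field(text):
    """Return an IPv4Address or IPv6Address for one pasted field, or None."""
    try:
        # strip() forgives the stray whitespace that comes with copy and paste
        return ipaddress.ip_address(text.strip())
    except ValueError:
        return None
```

The user pastes once, and the form complains only if the whole value fails to parse.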

The opposite of overly structured forms would be a freeform editing widget populated with unconstrained fields blank or filled with example data, or even a completely empty editing widget with suggested structure documented next to the widget — a wiki editing form. This isn’t as strange as it seems — many forms are distributed as word processor or plain text documents that recipients are expected to fill in by editing directly and return.

I don’t think “wikiforms” are appropriate for many cases where structured forms are used, but it’s useful to think of opposites, and I imagine the niche for them and for hybrids could grow. (Think of a “rich” wiki editor with autocompletion. I haven’t really, but I imagine this is deja vu for anyone who has used mainframe-style data entry applications.)

Ironically, the current number one use of the term wiki forms denotes adding structured forms to wikis!

On a marginally related note the Semantic MediaWiki appears to be making good progress.

Bitzi as Tagging 1.0 Metacrap

Sunday, March 12th, 2006

On the Tagging 2.0 panel, Vander Wal just cited Bitzi as (more or less) a non-successful predecessor to Tagging 2.0 applications, saying something like “things like Bitzi (mumble) Cory Doctorow called metacrap.”

Vander Wal recently explained in a comment at Joho the Blog:

The big thing that was different, from say Bitzi, was people tagging information in their own vocabulary for their own reuse. Tagging information for others as a priority seems to make it far less accurate as a person may not understand the terms they are using (well understand them as other may).

He’s right. There’s too little private benefit to “tagging” at Bitzi, largely because the interfaces to what you have individually contributed are lame, to the extent they exist. The Bitzi use case is rather different from that of Tagging 2.0 applications, but it can learn a lot from them.

Semantic Technology Conference wrap

Saturday, March 11th, 2006

The 2006 Semantic Technology Conference was more interesting than I expected. The crowd was older and much more formally dressed, and there was far less emphasis on open source solutions, than at any conference I’ve attended in a long time, but it wasn’t merely a vendor schmoozefest.

James Hendler and Ora Lassila’s Semantic Web @5 keynote claimed that Semantic Web technologies have made great strides over the past five years. They pointed out that middle levels of the Semantic Web layer cake are mature and higher levels are subjects of funded research (in 2001 lower and middle levels were mature and research respectively). Near the end they made a strong call to “share; give it away!” — open source tools, datasets, and harvesters are needed to grow the Semantic Web.

My presentation on Semantic Search on the Public Web with Creative Commons went fairly well apart from some audio problems. I began with a hastily added segue (not in the slides) from the keynote, highlighting Science Commons’ database licensing FAQ and Uniprot. Questions were all over the map, befitting the topic.

I think Uche Ogbuji’s Microformats: Partial Bridge from XML to the Semantic Web is the first talk on microformats I’ve heard from a non-cheerleader, and it was a pretty good introduction to the upsides and downsides of microformats and how they can be leveraged for officious Semantic Web purposes. My opinion is that the value in microformats hype is in encouraging people to take advantage of XHTML semantics in however conventional and non-rigorous a fashion they may. It is a pipe dream to think that most pages containing microformats will include the correct profile references to allow a spec-following crawler to extract much useful data via GRDDL. Add some convention-following heuristics and a crawler may get lots of interesting data from microformatted pages. The big search engines are great at tolerating ambiguity and non-conformance, as they must.

Ogbuji’s talk was the ideal lead-in to Ben Adida’s Interoperable Metadata for a Bottom-Up Semantic Web, which hammered home five principles of metadata interoperability: publisher independence, data reuse, self-containment, schema modularity, and schema evolvability. Microformats, GRDDL, and RDF/A, among others, were evaluated against the principles. It is no surprise that RDF/A came out looking best, as Adida has been chairing the relevant W3C taskforce. I think RDF/A has great promise (it feels like microformats minus annoyances, or microformats with a model), but others may say otherwise. The oddest response to the talk came from someone of the opinion that [X]HTML is irrelevant and that everything should be custom XML rendered with custom XSLT when necessary.

I was somewhat surprised by the strong focus of most talks and vendors on RDF and friends rather than any other “semantic technologies.” Doug Lenat was one exception. He apparently claimed last year that by this year Cyc would be growing primarily through machine learning rather than input by knowledge engineers. A questioner called Lenat on this prediction. Lenat claimed the prediction came true but did not offer any quantitative measure. From the slides (unavailable) it looked like Cyc can have databases and similar described to it and may access same (e.g., via JDBC), giving it access to an arbitrary number of “facts.”

If there was a theme that flowed through the conference it was first integrating heterogeneous data sources (I don’t recall who, but someone characterized semantic technologies as liberating enterprises from vendors) and second multiplying the value of that data through linking and inference.

Mills Davis’ closing keynote blew up these themes, claiming outrageous productivity improvements are coming very shortly due to semantic technologies, including a slide. The conference hotel fire alarm went off during the keynote, serving as a hype alert to any willing to hear.

SemTech06 reinforces my confidence in what I said in the SemWeb, AI, Java: The Ontological Parallels mini-rant given at SXSW last year. Too bad they rejected my proposal for this year:

Semantic Web vs. Tags Deathmatch: Tags are hot, but are they a dead end? The Semantic Web is still a research project, but will it awaken in August, 2009? People in the trenches fight over the benefits and limits of tags, the viability of officious Semantic Web technologies, what the eventual semantic web will look like, and how to bridge the gap between the two.

I’m off to SXSW tomorrow anyway. My schedule.

Creative Commons Salon San Francisco

Thursday, March 2nd, 2006

Next Wednesday evening in San Francisco, come to the Creative Commons Salon, sadly not featuring avant-drone-noise electric violin music played on a stage behind white sheets (apropos of nothing apart from listening to that now and not wanting anything else), but it should be pretty excellent anyway.

I’ve wanted to do a one-time event like this for a long time, but Eric Steuer and Jon Phillips, who are curating the event and series-to-be, are doing a far better job than I ever could have.

Also next week I’ll be presenting at the 2006 Semantic Technology Conference, then on to SXSW for a panel on digital preservation and blogs and silly parties, but leaving too soon to see the great Savage Republic perform on Friday (in two weeks). Perhaps I should change my flight and find somewhere to crash for two days?

My partner and I are also looking for a new apartment. Know of a great place in San Francisco around $2,000/month and not in the far west or south?

Update 20060303:

CC salon invite

Note Shine’s 1337 address.

Update 20060311: Success.

content.exe is evil

Thursday, February 16th, 2006

I occasionally run into people who think users should download content (e.g., music or video) packaged in an executable file, usually for the purpose of wrapping the content with DRM where the content format does not directly support DRM (or the proponent’s particular DRM scheme). Never mind the general badness of Digital Restrictions Management; requiring users to run a new executable for each content file is evil.

Most importantly, every executable is a potential malware vector. There is no good excuse for exposing users to this risk. Even if your executable content contains no malware and your servers are absolutely impenetrable such that your content can never be replaced with malware, you are teaching users to download and run executables. Bad, bad, bad!

Another problem is that executables are usually platform-specific and buggy. Users have enough problems getting the correct codec installed. Why take a chance that they might not run Windows (and the specific versions and configurations you have tested, sure to not exist in a decade or much less)?

I wouldn’t bother to mention this elementary topic at all, but very recently I ran into someone well intentioned who wants users to download content wrapped in a jar file, if I understand correctly for the purposes of ensuring users can obtain content metadata (most media players do a poor job of exposing content metadata and some file formats do a poor job of supporting embedded metadata, not that hardly anyone cares; this is tilting at windmills) and so that content publishers can track use (this is highly questionable), all from a pretty cross platform GUI. A jar file is an executable Java package, so the platform downside is different (Windows is not required, but a Java installation, of some range of versions and configurations, is), but it is still an executable that can do whatever it wants with the computer it is running on. Bad, bad, bad!

The proponent of this scheme said that it was ok, the jar file could be signed. This is no help at all. Anyone can create a certificate and sign jar files. Even if a creator did have to have their certificate signed by an established authority it would be of little help, as malware purveyors have plenty of resources that certificate authorities are happy to take. The downsides are many: users get a security prompt (“this content signed by…”) for content, which is annoying, is misleading as described above, and conditions the user to not pay attention when they install things that really do need to be executable; and a barrier is raised for small content producers.

If you really want to package arbitrary file formats with metadata, put everything in a zip file and include your UI in the zip as HTML. This is exactly what one P2P vendor’s Packaged Media File format is. You could also make your program (which users download only once) look for specific files within the zip to build a content-specific (and safe) interface within your program. I believe this describes Kapsules, though I can’t find any technical information.
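
The zip approach takes very little code. What follows is a rough Python sketch, not the layout of any existing package format: the function name build_package, the index.html file name, and the page contents are all assumptions made for illustration.

```python
import html
import io
import zipfile


def build_package(content_name, content_bytes, title, page_url):
    """Return the bytes of a zip holding one content file plus an HTML page."""
    # The "UI" is plain HTML, so nothing in the package needs to be executed.
    index_html = (
        "<!DOCTYPE html>\n"
        "<html><head><title>{title}</title></head><body>\n"
        "<h1>{title}</h1>\n"
        '<p><a href="{content}">Open the file</a> or visit '
        '<a href="{url}">its web page</a>.</p>\n'
        "</body></html>\n"
    ).format(
        title=html.escape(title),
        content=html.escape(content_name, quote=True),
        url=html.escape(page_url, quote=True),
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(content_name, content_bytes)
        package.writestr("index.html", index_html)
    return buffer.getvalue()
```

Any unzip tool and any browser can open the result, on any platform, with no security prompt.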

Better yet put your content on the web, where users can find and view it (in the web design of your choice), you get reasonable statistics, and the don’t get fed. You can even push this to 81/19 by including minimal but accurate metadata embedded in your files if they support it: a name users can search for or a URL for your page related to the content.

Most of the pushers of executable content I encounter, when faced with security concerns, say it is an “interesting and hard problem.” No, it is a stupid and impossible problem. In contrast to the web, executable content is a 5/95/-1000 solution; that last number is a .

If you really want an interesting and hard problem, executable content security is the wrong level. Go work on platform security. We can now run sophisticated applications within a web browser with some degree of safety (due to Java applet and Flash sandboxes, JavaScript security). Similar sandboxing could be pushed down to the desktop, so that executables by default have no more rights to tamper with your system than do web pages. is an aggressive approach to this problem. If that sounds too hard and not interesting enough (you really wanted to distribute “media”), go the web way as above; it is subsuming the desktop anyhow.

CodeCon Sunday

Monday, February 13th, 2006

Dido. I think this provides AGI, or a way to script voice response systems using and a voice template system analogous to scripting and HTML templates for web servers, though questioners focused on a controversial feature to reorder menus based on popularity. The demo didn’t really work, except as a demonstration of everyone’s frustration with IVRs, as an audience member pointed out.

Deme. Kitchen sink collaboration web app. They aren’t done putting dishes in the sink. They’re thinking about taking all of the dishes out of the sink, replacing the sink, and putting the dishes back in (PHP to something cooler). Let’s vote on what kind of vote to put this to.

Monotone. Elegant distributed revision control system, uses SHA1 hashes to identify files and repository states. Hash of previous repository state included in current repository state, making lineage cryptographically provable. used to quickly determine file level differences between repositories (for sync). Storage and (especially) merge and diff are loosely coupled. Presentation didn’t cover day to day use, probably a good decision in terms of interestingness. The revision control presentations have been some of the best every year at CodeCon. They should consider having two or three next year. Monotone may be the only project presented this year that had a Wikipedia article before the conference.
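
The hash chaining idea can be shown with a toy model in Python. This is not Monotone's actual data format: the manifest layout and the names file_id, state_id, commit and verify_lineage are invented for the sketch, which only demonstrates why including the parent's hash makes history tamper-evident.

```python
import hashlib


def file_id(content):
    """Identify a file by the SHA1 hash of its bytes."""
    return hashlib.sha1(content).hexdigest()


def state_id(parent_id, files):
    """Hash the parent state's id together with every (file id, path) pair."""
    digest = hashlib.sha1()
    digest.update(("parent %s\n" % (parent_id or "")).encode("utf-8"))
    for path in sorted(files):
        digest.update(("%s %s\n" % (file_id(files[path]), path)).encode("utf-8"))
    return digest.hexdigest()


def commit(parent_id, files):
    """Record a repository state: its parent, its files and its own id."""
    return {"parent": parent_id, "files": files, "id": state_id(parent_id, files)}


def verify_lineage(states):
    """Check that each state names its predecessor and hashes to its own id."""
    parent_id = None
    for state in states:
        if state["parent"] != parent_id:
            return False
        if state["id"] != state_id(parent_id, state["files"]):
            return False
        parent_id = state["id"]
    return True
```

Changing any file in an old state changes that state's id, which no longer matches the parent recorded by its successor, so the whole lineage fails to verify.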

Rhizome. Unlike Gordon (and perhaps most people), hearing the triplet doesn’t make my eyes glaze over, but I’m afraid this presentation did. Some of the underlying code ( etc) might be interesting, but was the second to last presentation, and the top level project, Rhizome, amounts to yet another idiosyncratic , with the idiosyncratic dial turned way up.

Elkhound/Elsa/Oink/Cqual++. Parser generator that handles ambiguous grammars in a straightforward manner, C++ parser and tools built on top of same. Can find with a reasonable false positive rate. Expressed confidence that future work would lead to the compiler catching far more bugs than usually thought possible (as opposed to only at runtime). Cool and important stuff, too bad I only grok it at a high level. Co-presenter Dan Wilkerson (and sole presenter on Saturday of Delta) is with the Open Source Quality Project at UC Berkeley.

Saturday
Sunday 2005

[Hot]link policy

Sunday, January 15th, 2006

I’m out of the loop. Until very recently (upon reading former Creative Commons intern Will Frank’s writeup of a brief hotlink war) I thought ‘hotlink’ was an anachronistic way to say ‘link’ used back when the mere fact that links led to a new document, perhaps on another server, was exciting. It turns out ‘hotlink’ is now vernacular for inline linking: displaying or playing an image, audio file, video, or other media from another website.

Lucas Gonze, who has lots of experience dealing with hotlink complaints due to running Webjay, has a new post on problems with complaint forms as a solution to hotlinks. One thing missing from the post is a distinction between two completely different sets of complainers who will have different sets of solutions beyond complaining.

One sort of complainer wants a link to a third party site to go away. I suspect the complainer usually really wants the content on the third party site to go away (typically claiming the third party site has no right to distribute the content in question). Removing a link to that content from a link site works as a partial solution by making the third party hosted content more obscure. A solution in this case is to tell the complainer that the link will go away when it no longer works. In effect, the linking site ignores complaints and it is the responsibility of the complainer to directly pursue the third party site via legal and other threats. This allows the linking site to completely automate the removal of links: those removed as a result of threatened or actual legal action look exactly the same as any other link gone bad and can be tested for and culled using the same methods. Presumably such a hands-off policy only pisses off complainers to the extent that they become more than a minor nuisance, at least on a Webjay-like site, though it must be an option for some.

Creative Commons has guidelines very similar to this policy concerning how to consider license information in files distributed off the web — don’t believe it unless a web page (which can be taken down) has matching license information concerning the file in question.

Another sort of complainer wants a link to content on their own site to go away, generally for one or two reasons. The first reason is that hotlinking uses bandwidth and other resources on the hotlinked site which the site owner may not be able to afford. The second reason, often coupled with the first, is that the site owner does not want their content to be available outside of the context of their own site (i.e., they want viewers to have to come to the source site to view the content).

With a bit of technical savvy the complainer who wants a link to their own site removed has several options for self help. Those merely concerned with cost could redirect requests without the relevant referrer (from their own site) or maybe cookie (e.g., for a logged in user) to the free or similar, which should drastically reduce originating site bandwidth, if hotlinks are actually generating many requests (if they are not there is no problem).

A complainer who does not want their content appearing in third party sites can return a small “visit my site if you want to view this content” image, audio file, or video as appropriate in the absence of the desired referrer or cookie. Hotlinking sites become not an annoyance, but free advertising. Many sites take this strategy already.

Presumably many publishers do not have any technical savvy, so some Webjay-like sites find it easier to honor their complaints than to ignore them.
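
The referrer check described above is a one-line decision. A minimal Python sketch, assuming the server passes in the request's Referer header and that the hostnames and file paths are whatever the publisher chooses (choose_response and its parameters are hypothetical names):

```python
from urllib.parse import urlparse


def choose_response(referer, own_hosts, media_path, placeholder_path):
    """Serve the real media only when the request was referred by our own site."""
    host = urlparse(referer).hostname if referer else None
    if host in own_hosts:
        return media_path
    # Missing or foreign referrer: serve the small "visit my site" stand-in.
    return placeholder_path
```

In practice this logic usually lives in web server configuration rather than application code, and the same test could pick a redirect target instead of a placeholder.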

There is a potential for technical means of saying “don’t link to me” that could be easily implemented by publishers and link sites with any technical savvy. One is to interpret robots.txt exclusions to mean “don’t link to me” as well as “don’t crawl and index me.” This has the nice effect that those stupid enough to not want to be linked to also become invisible to search engines.
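
A link site could apply the broader reading of robots.txt with Python's standard library parser. This sketch assumes the site has already fetched the publisher's robots.txt text; the agent name ExampleLinkSite and the function may_link are made up.

```python
from urllib.robotparser import RobotFileParser


def may_link(robots_txt, url, agent="ExampleLinkSite"):
    """Treat a robots.txt exclusion as "don't link to me" for the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Nothing here is specific to linking: the same rules a crawler obeys are simply consulted before a link is accepted.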

Another solution is to imitate rel=nofollow, perhaps with rel=nolink, though the attribute would need to be available on img, object, and other elements in addition to a, or simply to apply rel=nofollow to those additional elements a la the broader interpretation of robots.txt above.

I don’t care for rel=nolink as it might seem to give some legitimacy to brutally bogus link policies (without the benefit of search invisibility), but it is an obvious option.

The upshot of all this is that if a link site operator is not as polite as Lucas Gonze there are plenty of ways to ignore complainers. I suppose it largely comes down to customer service, where purely technical solutions may not work as well as social solutions. Community sites with forums have similar problems. Apparently Craig Newmark spends much of his time tending to customer service, which I suspect has contributed greatly to making Craigslist such a success. However, a key difference, I suspect, is that hotlink complainers are not “customers” of the linking site, while most people who complain about behavior on Craigslist are “customers,” that is, participants in the Craigslist community.

Search 2006

Saturday, January 14th, 2006

I’m not going to make new predictions for search this year — it’s already underway, and my predictions for 2005 mostly did not come true. I predict that most of them will, in the fullness of time:

Metadata-enhanced search. Yahoo! and Google opened Creative Commons windows on their web indices. Interest in semantic markup (e.g., microformats) increased greatly, but search that really takes advantage of this is a future item. (NB I consider the services enabled by tagging more akin to browse than search and as far as I know they don’t allow combining tag and keyword queries.)

Proliferation of niche web scale search engines. Other than a few blog search services, which are very important, I don’t know of anything that could be called “web scale” — and I don’t know if blog search could really be called niche. One place to watch is public search engines using Nutch. Mozdex is attempting to scale up, but I don’t know that they really have a niche, unless “using open source software” is one. Another place is Wikipedia’s list of internet search engines.

On the other hand, weblications (as Web 2.0) did take off.

I said lots of desktop search innovation was a near certainty, but if so, it wasn’t very visible. I predicted slow progress on making multimedia work with the web, and I guess there was very slow progress. If there was forward progress on usable security it was slow indeed. Open source did slog toward world domination (e.g., Firefox is the exciting platform for web development, but barely made a dent in Internet Explorer’s market share) with Apple’s success perhaps being a speed bump. Most things did get cheaper and more efficient, with the visible focus of the semiconductor industry swinging strongly in that direction (they knew about it before 2005).

Last year I riffed on John Battelle’s predictions. He has a new round for 2006, one of which was worth noting at Creative Commons.

Speaking of predictions, of course Google began using prediction markets internally. Yahoo!’s Tech Buzz Game has some markets relevant to search but I don’t know how to interpret the game’s prices.

CodeCon 2006 Program

Thursday, January 12th, 2006

The 2006 program has been announced and it looks fantastic. I highly recommend attending if you’re near San Francisco Feb 10-12 and any sort of computer geek. There’s an unofficial CodeCon wiki.

My impressions of last year’s CodeCon: Friday, Saturday, and Sunday.

Via Wes Felter

Going overboard with Wikipedia tags

Thursday, January 12th, 2006

A frequent correspondent recently complained that my linking to Wikipedia articles about organizations rather than the home pages of organizations is detrimental to the usability of this site, probably spurred by my linking to a stub article about Webjay.

I do so for roughly two reasons. First, I consider a Wikipedia link more usable than a link to an organization home page. An organization article will link directly to an organization home page, if the latter exists. The reverse is almost never true (though doing so is a great idea). An organization article at Wikipedia is more likely to be objective, succinct, and informational than an organizational home page (not to mention there is no chance of encountering , window resizing, or other annoying distractions — less charitably, attempts to control my browser — at Wikipedia). When I hear about something new these days, I nearly always check for a Wikipedia article before looking for an actual website. Finally, I have more confidence that the content of a Wikipedia article will be relevant to the content of my post many years from now.

Webjay (link to webjay.org) is actually a good example of these usability issues. Perhaps I have an unusually strong preference for words, but I think its still very brief Wikipedia article should allow one to understand exactly what Webjay is in under a minute.1 If I were visiting the Webjay site for the first time, I’d need to click around awhile to figure the service out, and Webjay’s interface is very to the point, unlike many other sites. Years from now I’d expect webjay.org to be yet another site or, since the Yahoo! acquisition, to redirect to some Yahoo! property, or the property of whatever entities own Yahoo! in the future. (Smart browser integration with the Internet Archive’s Wayback Machine could mitigate this problem.)

Anyway, I predict that in the foreseeable future your browser will be able to convert a Wikipedia article link into a home page link if that is your preference, aided by Semantic Mediawiki annotations or similar.

The second reason I link to Wikipedia preferentially2 is that Wikipedia article URLs conveniently serve as “tags,” as specified by the rel-tag microformat. If Technorati and its competitors happen to index this blog this month, it will show up in their tag-based searches, the names of the various Wikipedia articles I’ve linked to serving to name tags. I’ve never been enthusiastic about the overall utility of author applied tags, but I figure linking to Wikipedia is not as bad as linking to a tagreggator.
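
How an indexer might turn a linked Wikipedia article into a tag name can be sketched in Python. The sketch assumes the tag is the last path segment of the URL and that Wikipedia's underscores stand for spaces; tag_from_url is a made-up name, and real indexers may well normalize differently.

```python
from urllib.parse import unquote, urlparse


def tag_from_url(url):
    """Derive a tag name from the last path segment of a link target."""
    path = urlparse(url).path.rstrip("/")
    segment = path.rsplit("/", 1)[-1]
    # Decode percent-escapes, then apply the assumed underscore convention.
    return unquote(segment).replace("_", " ")
```

Called on http://en.wikipedia.org/wiki/Semantic_Web, this yields the tag "Semantic Web".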

Also, Wikipedia serves as a tag disambiguator. Some tagging service is going to use Wikipedia data to disambiguate, cluster, merge, and otherwise enhance tags. I think this is pretty low hanging fruit — I’d work on it if I had concentration to spare.

Update: Chris Masse responds (see bottom of page). Approximate answer to his question: 14,000 links to www.tradesports.com, 17 links to en.wikipedia.org/wiki/Tradesports (guess where from). I’ll give Masse convention.

In the same post Masse claims that his own “following of Jakob Nielsen’s guidelines is responsible for the very high intergalactic popularity of my Internet presence.” How very humble of Masse to attribute the modest success of his site to mere guideline following rather than his own content and personality. Unfortunately I think there’s a missing counterfactual.

1 I would think that, having written most of the current Webjay article.

2 Actually my first link preference is for my past posts to this blog. I figure that if someone is bothering to read my ramblings, they may be interested in my past related ramblings — and I can use the memory aid.

XTech 2006 CFP deadline

Tuesday, January 3rd, 2006

I mentioned elsewhere that I’m on the program committee for XTech 2006, the leading web technology conference in Europe, to be held in Amsterdam May 16-19.

Presentation, tutorial and panel proposals are due in less than a week–January 9. If you’re building an extraordinary Web 2.0 application or doing research that Web 2.0 (very broadly construed) developers and entrepreneurs need to hear about, please consider submitting a proposal.

See the CFP and track descriptions.

Best tech, policy, and idea blogs of 2005

Saturday, December 31st, 2005

Only one of each, according to me, highly subjective:

Technology: Danny Ayers’ Raw provides one stop for very well done semantic web (and nearby) news and analysis, written at a level perfect for me. He also has a knack for posting about obscure (to me) topics I’ve wondered about recently, or will soon, most recently about accounting for whether something is known.

Policy: Ronnie Horesh doesn’t post all that often and his Social Policy Bonds Blog is mostly about one topic. Regardless of what you think of his proposed implementation, Horesh’s mantra, that policies should be subordinated to outcomes, is so simple, obvious, and rarely followed, that it needs to be heard around the world. Here’s to a great new year.

Ideas: Brad Templeton posts (mostly good) Brad Ideas. Many are moderately ambitious, few are crazy. Executives with more ambition than imagination (especially airline executives), please read Templeton’s blog. The most recent Brad Idea, that crash avoidance technology could be financially justified by lower insurance rates, is less concrete than most.

Sorry, no recommendations for celebrity gossip, sex, photo, conspiracy, spam/seo/marketing, or war blogs.

Darkfox

Tuesday, December 27th, 2005

I hate to write about software that could be vaporware, but AllPeers (via Asa Dotzler) looks like a seriously interesting darknet/media sharing/BitTorrent/and more Firefox extension.

It’s sad, but simply sending a file between computers with no shared authority nor intermediary (e.g., web or ftp server) is still a hassle. IM transfers often fail in my experience, traditional filesharing programs are too heavyweight and are configured to connect to and share with any available host, and previous attempts at clients (e.g., ) were not production quality. Merely solving this problem would make AllPeers very cool.

Assuming AllPeers proves a useful mechanism for sharing media, perhaps it could also become a lightnet bridge– as a Firefox extension.

Do check out AllPeers CTO Matthew Gertner’s musings on the AllPeers blog. I don’t agree with everything he writes, but his is a very well informed and well written take on open source, open content, browser development and business models.

Songbird Media Player looks to be another compelling application built on the Mozilla platform (though run as a separate program rather than as a Firefox extension), to be released real soon now. 2006 should be another banner year for Firefox and Mozilla technology generally.

Lucas Gonze’s original lightnet post is now near the top of results for ‘lightnet’ on Google, Yahoo!, and MSN, and related followups fill up much of the next few dozen results, having displaced most of the new age and lighting sites that use the same term.

Annotating Wikipedia

Saturday, September 3rd, 2005

The Semantic MediaWiki proposal looks really promising.

Anyone who knows how to edit articles should find the syntax simple and usable:

Berlin is the capital of [[is capital of::Federal Republic of Germany|Germany]].

Berlin has about [[Population:=3.390.444|3.4 Mio]] inhabitants.
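To make the two annotation forms above concrete, here is a minimal sketch of how a tool might pull them out of article text. It assumes only the syntax shown in the two example sentences (`::` for a relation to another article, `:=` for an attribute value, with an optional `|label`); the regular expression and the function names are my own illustration, not anything taken from the Semantic MediaWiki code.

```python
import re

# Matches [[name::value|label]] and [[name:=value|label]]; the label is optional.
# This pattern is an assumption based only on the two examples in the post.
ANNOTATION = re.compile(
    r"\[\[([^\[\]|:]+)(::|:=)([^\[\]|]+)(?:\|([^\[\]]*))?\]\]"
)


def extract_annotations(wikitext):
    """Return (kind, name, value, label) tuples for each annotation found."""
    results = []
    for match in ANNOTATION.finditer(wikitext):
        name, separator, value, label = match.groups()
        kind = "relation" if separator == "::" else "attribute"
        # Fall back to the raw value when no display label is given.
        results.append((kind, name.strip(), value.strip(), (label or value).strip()))
    return results


def render_plain(wikitext):
    """Replace each annotation with the text a reader would see."""
    return ANNOTATION.sub(lambda m: (m.group(4) or m.group(3)).strip(), wikitext)


if __name__ == "__main__":
    text = (
        "Berlin is the capital of "
        "[[is capital of::Federal Republic of Germany|Germany]]. "
        "Berlin has about [[Population:=3.390.444|3.4 Mio]] inhabitants."
    )
    for annotation in extract_annotations(text):
        print(annotation)
    print(render_plain(text))
```

The point of the sketch is that the reader still sees ordinary prose while a program gets typed statements about Berlin for free, which is what makes the syntax attractive for an encyclopedia.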

All that fantastic data, unlocked. (I’ve been meaning to write a post on why explicit metadata is democratic.) Wikipedia database dump downloads will skyrocket.

There are interesting proposals under Wikidata as well (though big forms make me uneasy), but those mostly seem more applicable to new data-centric projects, while the Semantic MediaWiki proposal looks just right for the encyclopedia. Gordon Mohr’s Flexible Fields for MediaWiki proposal could probably serve both roles.

Once people get hooked on access to a semantic encyclopedia, perhaps they’ll want similar access to the entire web.

Via Danny Ayers.

Ontology is Underrated

Monday, August 8th, 2005

A couple months ago I checked to see if anyone had written the exact and obvious words “ontology is underrated” or “ontologies are underrated” in response to Clay Shirky’s somewhat overrated Ontology is Overrated. Nothing, and amazingly, still nothing (according to Google and Yahoo).

I don’t feel up to writing a real Ontology is Underrated essay, not least because I don’t have strong feelings either way, apart from seeing mischaracterization (link only tangentially relevant to subject of this post) put to rest.

Peter Merholz’s Clay Shirky’s Viewpoints are Overrated would be a pretty good start on a definitive Ontology is Underrated.

Ugly metadata deployed

Friday, June 3rd, 2005

Peter Saint-André, a good person for preferring the public domain and much else, writes about Creative Commons metadata:

It’d be cool if smart search engines could automagically find web pages that are offered under one of the Creative Commons licenses.

I agree, which is why we (I work for Creative Commons, though I do not speak for them in this publication) built a prototype in early 2004 and a more robust beta based on Nutch later that year. March this year brought Yahoo! Search for Creative Commons, very recently also added to Yahoo! Advanced Search. I predict more and better for CC and other potentially metadata-enhanced searches.

For reasons unknown to mere mortals like me, CC recommends placing some RDF in an HTML comment as the proper way to “tag” a web page (Uche explains more here). Well, gosh, who thought that up? Are we not in possession of fine XHTML metadata technologies like the <meta/> tag?

Aaron Swartz thought it up, for good reasons. You can find a brief explanation I believe written by Aaron here (linked at the Wayback machine for reference as the current documentation may change). However, this doesn’t capture the most important reason, which I’ve had the pleasure of explaining a gazillion times, e.g., here

A separate RDF file is a nonstarter for CC. After selecting a license a user gets a block of HTML to put in their web page. That block happens to include RDF (unfortunately embedded in comments). Users don’t have to know or think about metadata. If we need to explain to them that you need to create a separate file, link to it in the head of the document, and by the way the separate file needs to contain an explicit URI in rdf:about … forget about it.

and here

Requiring metadata placed in the HEAD of an HTML page will dramatically decrease metadata adoption. The only reason so much CC metadata is out there now is that including it is a zero-cost operation. When the user selects a license and copies&pastes the HTML with a license statement and button into their web page, they get the embedded RDF without having to know anything about it. Getting people to take extra steps to include or produce metadata is very hard, perhaps futile. I tend to believe that good metadata must either be a side effect of some other process (e.g., selecting a license) or a collaborative effort by an interested community (e.g., Amazon book reviews, Bitzi, DMoz, MusicBrainz) (leaving out the case of $$$ for knowledge workers).

in reply to people who want CC metadata included with web documents in various fashions. On that, see my recent reply to someone else suggesting the same method Peter proposes:

There are zillions of options for sticking metadata into a [X]HTML document. If you must, use whatever you prefer. It is my concern to encourage dominant uses so that software can reliably find metadata. IMO there are now three fairly widely deployed schemes for CC licenses, not all mutually exclusive:

1. Embed RDF in HTML comment
2. rel="license" attribute on <a href="license-uri">
3. <link> to an external RDF file

#1 is our legacy format, the default produced by licensing engine, very widely deployed
#2 is also now produced by licensing engine, has support of small-s semantic web/semantic XHTML people, and will be RDF-compatible via GRDDL eventually
#3 is used by other RDF apps and is only non-controversial means of including RDF with an XHTML document. Wikipedia publishes CC compatible metadata using this method

In the future we’ll probably add a fourth, which will replace #1 and #2 in license engine output, when it gets baked into a W3C standard, which is ongoing — http://www.formsplayer.com/notes/rdf-a.html

Yes, RDF embedded in HTML comments is a horribly ugly hack. Eventually it’ll be superseded. In the meantime, massive deployment wins. Sorry.
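For anyone writing software that has to find this metadata, the three schemes listed above can be checked for in one pass over a page. What follows is a rough sketch using only Python's standard `html.parser`; the class and function names are made up for illustration, and the test for scheme #3 (a `<link>` whose `type` is `application/rdf+xml`) is my assumption about how pages typically point at an external RDF file, not something the licensing engine guarantees.

```python
from html.parser import HTMLParser


class LicenseMetadataFinder(HTMLParser):
    """Collects (scheme, value) pairs for the three CC metadata schemes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_comment(self, data):
        # Scheme 1: an RDF block embedded in an HTML comment.
        if "<rdf:RDF" in data:
            self.found.append(("rdf-in-comment", None))

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        rel_values = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        if tag == "a" and "license" in rel_values and href:
            # Scheme 2: rel="license" on an ordinary link.
            self.found.append(("rel-license", href))
        elif tag == "link" and link_type == "application/rdf+xml" and href:
            # Scheme 3 (assumed form): <link> to an external RDF file.
            self.found.append(("linked-rdf", href))


def find_license_metadata(html_text):
    """Return the schemes found in html_text, in document order."""
    finder = LicenseMetadataFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


if __name__ == "__main__":
    sample = (
        '<a rel="license" href="http://creativecommons.org/licenses/by/2.0/">'
        "Some rights reserved</a>"
    )
    print(find_license_metadata(sample))
```

A real crawler would go on to parse the RDF itself; this only shows that all three deployed schemes are cheap to detect, which is the practical argument for sticking with them.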

H C

Wednesday, March 23rd, 2005

This music had every cell and fiber in my body on heavy sizzle mode.

Thurston Moore on mixtapes, could be describing me listening to early Sonic Youth or one of my many ecstasy-inducing 120 minute cassettes that I’m mostly afraid to touch, really need to digitize. Yes, Moore relates it all to MP3, P2P, etc., sounding like he’s from the EFF:

Once again, we’re being told that home taping (in the form of ripping and burning) is killing music. But it’s not: It simply exists as a nod to the true love and ego involved in sharing music with friends and lovers. Trying to control music sharing - by shutting down P2P sites or MP3 blogs or BitTorrent or whatever other technology comes along - is like trying to control an affair of the heart. Nothing will stop it.

[Via Lucas Gonze.]

I’d like little more right now than to have Sonic Youth or one of Moore’s many avant projects release some crack under a Creative Commons license. Had they already you could maybe find it via the just released Yahoo! Search for Creative Commons. (How’s that for a lame segue?)