Archive for March, 2005

BlogPulse Conversation Tracker

Tuesday, March 29th, 2005

BlogPulse Conversation Tracker comes closer to fulfilling my wish for a blogversation interface than anything I’ve seen before. Missing: the ability to see the presence of a blogthread outside the context of the conversation tracker.

I built a similar tool on top of Technorati’s API: DeepCosmos. It’s slower than BlogPulse (it was last night, and ought to be, though BlogPulse must be getting a flood of traffic now), since it must recursively query Technorati, and it’s harder to use, since it requires obtaining a Technorati API key.

BlogPulse queries matching the two examples I gave in my DeepCosmos post:

I’m not exactly sure when BlogPulse launched its conversation tracker, but it’s been getting lots of attention since the release of BlogPulse 2.0 yesterday. Technorati, PubSub, Feedster, et al: BlogPulse just raised the bar several notches. Do or die.

[Via Danny Ayers.]

Economic Neanderthals

Sunday, March 27th, 2005

What is wrong with this headline?

Did Use of Free Trade Cause Neanderthal Extinction?

See the release at the U of Wyoming and a shorter version at Newswise and in Dutch, with pictures. Richard Horan, Erwin Bulte, Jason Shogren: “How trade saved humanity from biological exclusion: an economic theory of Neanderthal extinction” in the Journal of Economic Behavior & Organization is apparently not online yet, though you’ll eventually be able to buy an outrageously priced copy here.

The claim is that Homo neanderthalensis were economic numbskulls, failing to trade or use division of labor, i.e., failing to cooperate. Homo sapiens traded, specialized, and out-competed Neanderthals in their hunting grounds: end of story for the Neanderthals. Sounds interesting and plausible.

So what is wrong with the headline above? The word free. That modifier only makes sense in contradistinction to protectionism and mercantilism. The humans just traded while the Neanderthals were presumably too stupid to trade. I doubt there was a Neanderthal Ross Perot, Pat Buchanan, or Dick Gephardt clamoring that to save jobs Neanderthal camps ought not trade with each other.

The best things to come from this are two new epithets:

  • Anti-trade economics is Neanderthal Economics.
  • Anyone who advocates restriction on trade is an Economic Neanderthal.

Update 20050331: Arnold Kling points to an online copy of the paper:

The paper has a high ratio of superfluous math to convincing evidence.

H C

Wednesday, March 23rd, 2005

This music had every cell and fiber in my body on heavy sizzle mode.

That’s Thurston Moore on mixtapes. He could be describing me listening to early Sonic Youth or to one of my many ecstasy-inducing 120-minute cassettes, which I’m mostly afraid to touch and really need to digitize. Yes, Moore relates it all to MP3, P2P, etc., sounding like he’s from the EFF:

Once again, we’re being told that home taping (in the form of ripping and burning) is killing music. But it’s not: It simply exists as a nod to the true love and ego involved in sharing music with friends and lovers. Trying to control music sharing – by shutting down P2P sites or MP3 blogs or BitTorrent or whatever other technology comes along – is like trying to control an affair of the heart. Nothing will stop it.

[Via Lucas Gonze.]

I’d like little more right now than for Sonic Youth or one of Moore’s many avant projects to release some crack under a Creative Commons license. Had they done so already, you could perhaps find it via the just-released Yahoo! Search for Creative Commons. (How’s that for a lame segue?)

SemWeb, AI, Java: The Ontological Parallels

Friday, March 18th, 2005

“The Semantic Web: Promising Future or Utter Failure”, the panel I took part in at SXSW, shed little light on the topic. Each panelist (including me) brought their own idiosyncratic views to bear and largely talked past each other. The overall SXSW Interactive crowd seemed to tend toward web designers and web marketers; I’m not sure about the audience for this panel. Some people, e.g., Chet Campbell, and others in person, apparently left with the impression that all of the panelists agreed that the semantic web is an utter failure (not my view at all).

Sam Felder and Josh Knowles have posted loose transcripts and Christian Bradford a photo of the panel.

The approximate (with links and a few small corrections) text of my introductory statement follows. I got a few laughs.

I want to draw some parallels between semantic web technologies and artificial intelligence and between semantic web technologies and Java.

AI was going to produce intelligent machines. It didn’t, and since the late 80s we’ve been in an “AI winter.” That’s nearly twenty years, so web people who suffered and whined in 2001-3, your cup is more than half full. Anyway, since then AI techniques have been used in all sorts of products, but once deployed the technology isn’t seen as AI. I mean, where are the conscious robots?

Semantic web technologies have a shorter history, but may play out similarly: widely used but not recognized as such. Machine “agents” aren’t inferring a perfect date for me from my FOAF profile. Or something. This problem is magnified because there’s a loose connection between semantic web grand visions and AI. People work on both at MIT, after all.

Now Java. Applets were going to revolutionize the web. In 1996! Applets didn’t work very well, but lots of people learned Java, and it turns out Java is a pretty good solution on the server side. Java is hugely successful as the 21st century’s COBOL. Need some “business logic?” You won’t get fired for implementing it in Java, preferably using JDBC, JSP, JMX, JMS, EJB, JAXB, JDO and other buzzword-compliant APIs.

Semantic web technologies may be following a similar path. Utter failure to live up to initial hype in a sexy domain, but succeeding in the enterprise where the money is anyway. I haven’t heard anyone utter the word enterprise at this conference, so I won’t repeat it.

It turns out that semantic web technologies are really useful for data integration when you have heterogeneous data, as many people do these days. Just one example: Oracle will support a “Network Data Model” in the next release of their database. That may sound like a throwback if you know database history, but it basically means explicit support for storing and querying graphs, which are the data model of RDF and the semantic web.

If you talk to a few of the people trying to build intelligent machines today, who may use the term Artificial General Intelligence to distinguish themselves from AI, you may get a feeling that AI research hasn’t really moved us toward the goal of building an AGI.

Despite Java’s success on the server, it is no closer to being important on the web client than it was in 1996. It is probably further away. If what you care about is sexy web browser deployment, all Java’s server success has accomplished is to keep the language alive.

Semantic web technologies may be different. Usefulness in behind-the-scenes data integration may help these technologies gain traction on the web. Why? Because for someone trying to make use of data on the web, the web is one huge heterogeneous data integration problem.

An example of a project that uses RDF for data integration that you can see is mSpace. You can read a paper about how they use RDF inside the application, but it won’t be obvious to an end user that mSpace is a semantic web application, and that’s as it should be.

One interesting thing about mSpace is that they’re using a classical music ontology developed by someone else and found on SchemaWeb. SchemaWeb is a good place to look for semantic web schemas that can be reused in your project. Similarly, rdfdata.org is a good place to look for RDF datasets to reuse. There are dozens of schemas and datasets listed on these sites contributed by people and organizations around the world, covering beer, wine, vegetarian food, and lots of stuff you don’t put in your mouth.

I intended to close my statement with a preemption of the claim that use of semantic web technologies mandates hashing everything out in committees before deployment (wrong), but I trailed off with something I don’t recall. The committee myth came up again during the discussion anyway.

Perhaps I should’ve stolen Eric Miller’s The Semantic Web is Here slides.

Collective Market Intelligence

Wednesday, March 16th, 2005

At Etech yesterday morning Gary Flake of Yahoo! Labs said his organization has four research areas. I only remember three: collective intelligence, machine learning and (fairly obviously) text mining. After pointing people to Yahoo! Next, Flake launched the Tech Buzz Game. It’s a prediction market where participants bet funny money on future search traffic for keywords associated with a technology relative to keywords associated with competing technologies (e.g., the programming language market includes C, C#, C++, Java, and several others).

Either I or many game participants horribly misunderstand how buzz scores are calculated. The game FAQ says:

The buzz score of a stock is the number of searches on any of the stock’s buzz words over the past seven days, as a percentage of all the stocks in the same market.

Yesterday Ruby was worth fifty percent more than any other language. I suspected that participants think the buzz score is a measure of relative change rather than of quantity. However, now I suspect people are voting for their favorite technologies rather than betting on results. Those players will lose on (every) Friday when prices are adjusted to reflect actual buzz score.
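To make the distinction concrete, here is a minimal Python sketch of the FAQ definition quoted above, assuming each stock’s searches have already been summed over all of its buzz words for the seven-day window. The function names and the search counts are hypothetical. A stock’s score is its share of the market’s total searches, so a technology whose searches doubled can still have a small score.

```python
def buzz_scores(searches: dict[str, int]) -> dict[str, float]:
    """Buzz score per the FAQ: each stock's searches as a percentage of all
    searches for stocks in the same market over the same seven days."""
    total = sum(searches.values())
    if total == 0:
        return {stock: 0.0 for stock in searches}
    return {stock: 100.0 * count / total for stock, count in searches.items()}


def weekly_change(previous: int, current: int) -> float:
    """Relative change in one stock's searches (what the buzz score is NOT)."""
    return 100.0 * (current - previous) / previous


# Hypothetical weekly search counts for one market
scores = buzz_scores({"Java": 5000, "C#": 3000, "Ruby": 2000})
print(scores)                     # {'Java': 50.0, 'C#': 30.0, 'Ruby': 20.0}
print(weekly_change(1000, 2000))  # 100.0
```

With these made-up counts, Ruby’s buzz score is 20 percent even if its searches doubled from 1,000 the week before, a 100 percent change. Anyone pricing Ruby as if the second number were the score would lose at revaluation.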

I wonder how weekly revaluation will affect the ability to gauge long-term predictions. If participants expect a technology to become more popular over the next year (with an attendant increase in search traffic), how will the corresponding security behave week to week? Would traders consistently bidding the price of a security up and losing money at each weekly revaluation be the predictor of a long-term increase in search traffic?

Though it feels toy-like, I’m gratified that this prediction market is considered a collective intelligence application. I often hear people saying that humanity needs to increase intelligence to have any hope of surviving whatever dangers are supposedly near, usually accompanied by complete ignorance of markets’ role as a distributed discovery mechanism and the potential for markets designed explicitly for information discovery.

In other idea futures news, check out Zocalo, open source market infrastructure to be developed at CommerceNet Labs, and its motivating proposal.

Update 20050318: I was correct about scoring and revaluation. I made a 150% funny money profit after today’s revaluation, before which I had a loss. I made no trades after becoming fully invested. Will be interesting to see what happens in the next week. Will the Buzz Game merely be a “day trading” and game-rules-ignorance-arbitrage phenomenon? I suspect so. Too bad. A market structured to make predictions about technology success would be really interesting.

Snap Associative Decision Recall

Sunday, March 13th, 2005

Malcolm Gladwell gave an interesting afternoon keynote at SXSW today. Many others have already published extensive notes, including Matt May, Liz Lawly, Scott Benish, Tony, and Nancy White.

My two point summary of Gladwell:

  • Snap decisions play a much greater role than you’d think.
  • More information does not make for better snap decisions.

I can’t help but think there is some connection between the importance of snap judgements (and our ability to make them) and Jeff Hawkins’ claims in On Intelligence for the primary importance of auto-associative memory and prediction recall (as opposed to computation). A brain that works as Hawkins describes should be fantastic at making snap decisions, and a brain should do lots of whatever it excels at.

I’m only halfway through On Intelligence (excellent so far) and haven’t looked at Gladwell’s Blink at all.

SemWeb not by committee

Sunday, March 13th, 2005

At SXSW today Eric Meyer gave a talk on Emergent Semantics. He humorously described emergent as a fancy way of saying grassroots, ground-up (from the bottom or like ground beef), or evolutionary. The talk was about adding rel attributes to XHTML <a> elements, or the lowercase semantic web, or Semantic XHTML, of which I am a fan.

Unfortunately Eric made some incorrect statements about the uppercase Semantic Web, or RDF/RDFS/OWL, of which I am also a fan. First, he implied that the lowercase semantic web is to the Semantic Web as evolution is to intelligent design, the current last redoubt of apologists for theism.

Very much related to this analogy, Eric stressed that use of Semantic XHTML is ad hoc and easy to experiment with, while the Semantic Web requires getting a committee to agree on an ontology.

Not true! Just using rel="foo" is equivalent to using an http://example.com/foo RDF property (though the meaning of the RDF property is better defined: it applies to a URI, while the application of the implicit rel property is loose).
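The equivalence can be sketched in a few lines of Python using only the standard library’s HTMLParser. This is not GRDDL and not any real vocabulary: the http://example.com/ namespace is the placeholder from the sentence above, and taking the page’s own URI as the subject of each triple is an assumption, which is exactly the looseness of the implicit rel property. Each space-separated rel value on an <a> element becomes one triple.

```python
from html.parser import HTMLParser

# Hypothetical namespace for rel values, as in the rel="foo" example
BASE = "http://example.com/"


class RelExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from <a rel="..." href="...">."""

    def __init__(self, page_uri: str):
        super().__init__()
        self.page_uri = page_uri  # assumed subject: the page containing the link
        self.triples: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href, rel = attr.get("href"), attr.get("rel")
        if not href or not rel:
            return
        # rel may hold several space-separated values; each becomes a property
        for value in rel.split():
            self.triples.append((self.page_uri, BASE + value, href))


def rel_to_triples(page_uri: str, markup: str) -> list[tuple[str, str, str]]:
    parser = RelExtractor(page_uri)
    parser.feed(markup)
    parser.close()
    return parser.triples


page = '<p><a rel="friend met" href="http://example.net/alice">Alice</a></p>'
for s, p, o in rel_to_triples("http://example.org/me", page):
    print(f"<{s}> <{p}> <{o}> .")  # one N-Triples-style line per rel value
```

Nobody had to approve http://example.com/friend before this ran; minting the property URI is as ad hoc as picking the rel value.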

In the case of more complex formats, an individual can define something like hCard (lowercase) or vCard-RDF (uppercase).

No committee approval is required in any of the above examples. vCard-RDF happens to have been submitted to the W3C, but doing so is absolutely not required, as I know from personal experience at Bitzi and Creative Commons, both of which use RDF never approved by committee.

At best there may be a tendency for people using RDF to try to get consensus on vocabulary before deployment, while there may be a tendency for people using Semantic XHTML to throw keywords at the wall and see if they stick (however, Eric mentioned that the XFN (lowercase) core group debated whether to include rel="me" in the first release of their spec). Neither technology mandates either approach. If either of these tendencies does exist, it must be cultural.

I think there is value both in the ad hoc culture of the lowercase semantic web (and, more importantly, in the closeness of Semantic XHTML assertions to human-readable markup) and in the rigor of the uppercase Semantic Web.

It may be useful to transform rel="" assertions to RDF assertions via GRDDL or a GRDDL-inspired XMDP transformation.

I will find it useful to bring RDF into XHTML, probably via RDF/A, which I like to call Hard Core Semantic XHTML.

Marc Canter as usual expressed himself from the audience (and on his blog). Among other things Marc asked why Eric didn’t use the word metadata. I don’t recall Eric’s answer, but I commend him for not using the term. I’d be even happier if we could avoid the word semantic as well. Those are rants for another time.

Addendum: I didn’t make it to the session this afternoon, but Tantek Çelik’s slides for The Elements of Meaningful XHTML are an excellent introduction to Semantic XHTML for anyone familiar with [X]HTML.

Addendum 20050314: Eric Meyer has posted his slides.

SXSW & Etech

Saturday, March 12th, 2005

I’m in Austin now through Monday for SXSW and in San Diego Tuesday through Thursday for Etech. I’m sad that I won’t be around for any music showcases this year and that I have to leave Austin for one of my less favorite places, but Etech is the better conference.

I’m helping Matt Haughey with a SXSW panel, The Semantic Web: Promising Future or Utter Failure (I’ll be the SemWeb technologies advocate) and an Etech session, Remixing Culture with RDF: Running a Semantic Web Search in the Wild.

Creative Commons will have other events and a party at SXSW.

Open Source P2P: No Malware, EULA

Wednesday, March 9th, 2005

Ben Edelman asks what P2P programs install what spyware and answers with a Comparison of Unwanted Software Installed by P2P Programs. Of the five programs analyzed, four (eDonkey, iMesh, Kazaa, and Morpheus) install malware or even more malware and come with voluminous End User License Agreements. LimeWire installs no additional software and has no EULA.

The comparison currently doesn’t note that only one of the five programs is open source: LimeWire. Note that LimeWire, like the others, is produced by a company that pays developers, so being commercial is no excuse for the others.

What about other open source P2P applications? I installed the current versions of BitTorrent, eMule, Phex, and Shareaza. No bundled software. BitTorrent has no installation interface to speak of, and no EULA. The others ask the user to agree to the GNU General Public License, which concerns freedoms associated with the program source code, not obtaining permission for the program to do whatever it wants with the user’s computer and data.

Each of the open source programs (excepting BitTorrent, which is a different kind of P2P app) has the same features as the proprietary P2P apps listed above. All of the open source programs lack the spyware anti-features of their proprietary equivalents.

Notice a trend?

If you want to keep control of your computer and your data, stick to open source. The threat is very real. I’ve seen friends’ computers (particularly those used by teenagers) with proprietary P2P programs that had dozens of distinct malware programs installed and were completely unusable (browsing porn sites with Internet Exploder, which teens are apparently really keen on doing, doesn’t help either; get Firefox already).

[Via Boing Boing.]

Bitcollider-PHP

Saturday, March 5th, 2005

Here you’ll find a little PHP API that wraps the single file metadata extraction feature of Bitzi’s bitcollider tool. Bitcollider can also submit file metadata to Bitzi; this PHP API doesn’t provide access to the submission feature.

Other possibly useful code provided with Bitcollider-PHP:

  • function hex_to_base_32($hex) converts hexadecimal input to Base32.
  • function magnetlink($sha1, $filename, $fileurl, $treetiger, $kzhash) returns a MAGNET link for the provided input.
  • magnetlink.php [file ...] is a command line utility that outputs MAGNET links for the files specified, using the bitcollider if available (if it is not, kzhash and tree:tiger are not included in the MAGNET links).
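For readers not using PHP, the Base32 conversion and a stripped-down MAGNET link can be sketched in Python. This is not the Bitcollider-PHP code: the function names are hypothetical, Base32 is assumed to be the RFC 4648 alphabet used in urn:sha1: identifiers, and the link carries only an exact topic (xt) and a display name (dn), omitting the file URL, tree:tiger, and kzhash parameters the PHP function accepts.

```python
import base64
import hashlib
from urllib.parse import quote


def hex_to_base32(hex_digest: str) -> str:
    """Convert a hexadecimal digest to RFC 4648 Base32 (uppercase, no padding)."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii").rstrip("=")


def sha1_urn(data: bytes) -> str:
    """urn:sha1: identifier: the Base32 form of the content's SHA-1 digest."""
    return "urn:sha1:" + hex_to_base32(hashlib.sha1(data).hexdigest())


def magnet_link(data: bytes, filename: str) -> str:
    """Minimal MAGNET link: exact topic (xt) plus URL-escaped display name (dn).
    URL, tree:tiger, and kzhash parameters are deliberately omitted here."""
    return "magnet:?xt=" + sha1_urn(data) + "&dn=" + quote(filename)


if __name__ == "__main__":
    print(sha1_urn(b""))
    print(magnet_link(b"", "empty file.txt"))
```

For an empty file, sha1_urn(b"") returns urn:sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ. A 20-byte SHA-1 digest is exactly 160 bits, so its Base32 form is 32 characters with no padding to strip.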

Versions of this code are deployed on a few sites to produce MAGNET links or urn:sha1: identifiers for RDF along these lines; CC Mixter uses both.

Criticism welcome.