Post Bitzi


Saturday, March 5th, 2005

Here you’ll find a little PHP API that wraps the single-file metadata extraction feature of Bitzi’s bitcollider tool. Bitcollider can also submit file metadata to Bitzi, but this PHP API doesn’t provide access to the submission feature.

Other possibly useful code provided with Bitcollider-PHP:

  • function hex_to_base_32($hex) converts hexadecimal input to Base32.
  • function magnetlink($sha1, $filename, $fileurl, $treetiger, $kzhash) returns a MAGNET link for the provided input.
  • magnetlink.php [file ...] is a command line utility that outputs MAGNET links for the files specified, using the bitcollider if available (if not, kzhash and tree:tiger are omitted from the MAGNET links).
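
To give a feel for what the two helper functions do, here is a minimal Python sketch. It is not the Bitcollider-PHP code. The function names mirror the PHP ones, but the link layout is an assumption based on common MAGNET usage: xt for the hash URN, dn for the display name, xs for the source URL, plus the urn:bitprint: and urn:kzhash: forms. The PHP version may emit something different.

```python
import base64
import hashlib
from urllib.parse import quote


def hex_to_base32(hex_digest):
    """Convert a hexadecimal digest to unpadded Base32 (RFC 4648 alphabet)."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def magnetlink(sha1_b32, filename=None, fileurl=None,
               treetiger_b32=None, kzhash_hex=None):
    """Build a MAGNET link. The parameter layout is an assumption,
    not the documented output of the PHP magnetlink() function."""
    if treetiger_b32:
        # A bitprint joins the SHA-1 and the Tiger tree root with a dot.
        params = ["xt=urn:bitprint:%s.%s" % (sha1_b32, treetiger_b32)]
    else:
        params = ["xt=urn:sha1:%s" % sha1_b32]
    if kzhash_hex:
        params.append("xt=urn:kzhash:%s" % kzhash_hex)
    if filename:
        params.append("dn=" + quote(filename, safe=""))
    if fileurl:
        params.append("xs=" + quote(fileurl, safe=""))
    return "magnet:?" + "&".join(params)


if __name__ == "__main__":
    # SHA-1 of empty input, just to have a digest to work with.
    digest = hashlib.sha1(b"").hexdigest()
    print(magnetlink(hex_to_base32(digest), filename="empty.txt"))
```

A 20-byte SHA-1 digest always comes out as exactly 32 Base32 characters, so stripping padding is a no-op for SHA-1 and only matters for other digest lengths.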

Versions of this code are deployed on a few sites in service of producing MAGNET links or urn:sha1: identifiers for RDF along these lines, both in the case of CC Mixter.

Criticism welcome.

Ordinary Submissions

Thursday, November 25th, 2004

Two bloggers on using Bitzi.

Neil Turner recommends “Wonderful Life” by Ordinary People feat. Tina Cousins. Apparently the track is hard to find. Having lost a copy once due to disk troubles, Neil submitted the file to Bitzi. Now others can use the Bitzi ticket to find the file he recommends (and Neil will have an easier time in the event of another storage failure). I was pleasantly surprised to find that after a few tries I could download the exact file from two Gnutella hosts. Unfortunately dance music and electronica aren’t my thing.

Fareed of Cairo and “survivor of a car crash” writes about checking file integrity:

Have you ever downloaded files and you were not sure of it’s integrity. Now their is a way to be sure, a site called Bitzi allows you to check file integrity. Users can submit using a program called Bit Collider file hashes to the site or verify that the integrity is ok.

So Bitzi can help identify good and bad files. Good luck avoiding future hard disk and car crashes!
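
The integrity check Fareed describes boils down to hashing the file you have and comparing the result with a hash you trust. Here is a minimal sketch of that comparison, assuming the trusted value is a hexadecimal SHA-1. The function names are made up for illustration and no Bitzi lookup is performed.

```python
import hashlib


def file_sha1_hex(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    hasher = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_expected(path, expected_hex):
    """True if the file's SHA-1 equals the trusted hexadecimal value."""
    return file_sha1_hex(path) == expected_hex.strip().lower()
```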

MusicBrainz Discovery (II)

Friday, October 15th, 2004

Continuation of MusicBrainz Discovery (I).

One notable thing about MusicBrainz is that Rob Kaye and a small number of core developers and supporters have pursued a consistent vision for roughly six years with very little funding or even understanding outside this small group. It isn’t easy to really “get” MusicBrainz (I think it took me two years), though I think that at some point in the next few years everyone will “get” MusicBrainz more or less all at once.

If you’re a geek it’s hard not to get hung up on MusicBrainz’s use of acoustic fingerprint-based technology. Acoustic fingerprinting is fragile in three ways: it is subject to false positives and false negatives, there is no open source implementation of the concept, and the technology MusicBrainz uses, Relatable TRM, is proprietary and requires a centralized server. Indeed, many of the technology questions at Tuesday’s music metadata panel concerned acoustic fingerprints.

It is important to understand that while MusicBrainz uses acoustic fingerprints, it does not rely on them. TRM matching is just one mechanism for track identification. File metadata included in the file (e.g., ID3 tags) or accompanying it (the filename) can be, and I believe is, used to match existing records, as could track duration and file hashes (for example, by checking whether Bitzi or a P2P network has any metadata for the file in question). Additionally, file identification is only one component of MusicBrainz.

If you’re not a geek, you won’t notice acoustic fingerprints, because you wouldn’t, and because you’re not likely to get that far. So what the heck does MusicBrainz do? Here’s an attempt:

  • MusicBrainz can organize your music collection. Download the tagger.
  • MusicBrainz uniquely identifies artists, albums, and songs, facilitating rich and precise music applications, all on a level playing field.
    • Not at all speculative potential: include a MusicBrainz song identifier in a blog post, cover art (with your Amazon referrer of course) automagically appears in blog post, blog aggregator publishes top n lists and personalized recommendations.
    • Another: publish a playlist of MusicBrainz identifiers and others can recreate the experience so defined with no file transfer involved.
    • There are several others, some that could be offered by MusicBrainz itself, outlined in MusicBrainz tomorrow. I have to quote one because it’s fun:

      Music Genealogy: MusicBrainz may keep track of which artists/performers/engineers contributed to a piece of music, and when these contributions took place. Combining this contribution data with data on how artists influenced each other will create a genealogy of modern music. Imagine being able to track Britney Spears back to Beethoven!

  • The MusicBrainz database, created by the community, will remain free, unlike others.

Having been around for a while, MusicBrainz has run into many of the technical and social problems inherent in music metadata and an evolving community website, and produced much good documentation on solutions, realized and potential. Here’s a sampling:

By the way, as of Wednesday MusicBrainz has a blog.

MusicBrainz Discovery (I)

Wednesday, October 13th, 2004

Earlier this evening I gave a brief introduction (slides PDF) to MusicBrainz at SDForum’s Emerging Technology SIG meeting on music metadata in the stead of MusicBrainz founder and leader Rob Kaye, who couldn’t make it up to Palo Alto. (I’m fairly familiar with MusicBrainz, having worked with Rob at Bitzi and getting updates when we cross paths in this small world.)

If I could pick a theme for the meeting (which included two other very interesting speakers — Stephen Bronstein of the Independent Online Distribution Alliance and David Marks of Loomia), and for recent months in general, it would be that in case you haven’t noticed, it’s clearly now a discovery problem, not a delivery problem.

SIG leader William Grosso led off with some quotes from the much-discussed Wired magazine article The Long Tail, which seems to have captured this zeitgeist. (Grosso also had a presentation technique that was new to me: a slideshow of potentially relevant slides plays while he speaks, and if a slide happens to be relevant to the current sentence, he uses the slide to augment the point. Is there a name for this?)

Obviously there was tremendous interest in Creative Commons in this context, and several people seemed to be happy to learn of CC’s search engine and the great services and products offered by the Internet Archive (free hosting for CC-licensed audio and video, built in format conversion), Magnatune (all CC-licensed music label) and more.

Unfortunately, in the eleven years I’ve been in the SF bay area I only definitively recall attending two previous SDForum events: a 1994 talk by Atari Jaguar developers in San Jose and, in 2001, an evening with Phil Zimmermann in San Francisco (I suspect others who were there would deem the “an evening with” cliche appropriate in this case). This evening’s meeting was a total geekfest. I hung around for well over an hour commiserating on all manner of software development topics (I think that’s what “SD” stands for) with a number of hardcore geeks (no whatever-Dilbert’s-boss’s-name-is there) while two guys were laughing their asses off whiteboarding issues with Unicode encoding (as far as I could tell). I’ll have to go back.

More about what I’ve learned about MusicBrainz over the years and in preparing for the evening in a future post.

Update: part 2

Morpheus with Bitzi “anti-spoofing”

Thursday, October 7th, 2004

Morpheus, a popular filesharing client at version 4.5:

Includes free Bitzi anti-spoofing look-ups to download only the files you want.

That’s an accurate description (and has been true since at least Morpheus 4.1.1). If you see something you’re interested in downloading in search results, you’re a right-click away from a Bitzi lookup. If Bitzi users have judged the file to be spam, virus-laden, or corrupted, you’ll get a response similar to this:

bad file

Thirty Bitzi users have judged this file as dangerous or misleading (one bozo recommended the file). A few of the judgement notes tell the story:

Every time I write something,it will come up in my searching files. The size is 105,2 kb every time.


stupid spam, coming from….discuises itself under many wma files

If Bitzi users have judged the file you’re interested in to be worthy, you’d see something similar to this:

good file

That file happens to be the current release for Windows of Bitzi’s bitcollider file metadata collection tool.

Derek Slater notes that Bitzi metadata is itself subject to spoofing, and

even if Bitzi helps people sort out spoofs, the technological arms race will continue.

Very true. Bitzi is dependent upon community policing, and a concerted effort to create dangerous files and submit fraudulent judgements to Bitzi would work, at least for a while. There are steps Bitzi can take to militate against such attacks should they become a problem. Unfortunately, as I noted recently, development proceeds at a bear in winter’s pace.

It should also be noted that Morpheus is just one of several applications that enable Bitzi lookups or submissions, though most of these send users to a Bitzi web page rather than integrating raw data from Bitzi lookups into their user interface as Morpheus has (see screenshots above).

Best Bitizen

Monday, October 4th, 2004

After over three and a half years I am finally the best bitizen, as defined by a formula that takes into account the amount of file metadata contributed to Bitzi and the quality of that metadata, as rated by other “bitizens” (Bitzi users):

How did I obtain this dubious (as a Bitzi cofounder) honor?

I’ve more or less consistently reported some of the random junk I may have encountered (though around 30 Bitzi users have reported more, all over a shorter active period) and more importantly, have occasionally taken care to add accurate metadata to reports.

On the negative side, I’ve more or less consistently failed to put much effort into adding new features since Bitzi went into hibernation. If I had, doubtless many users more prolific than I would’ve stuck around and I’d be nowhere near numero uno. (Nearly everything I download is from the web; file sharing networks, especially post-Napster (really post-AudioGalaxy), are still practically useless in my estimation, unless you have lots of time to kill, i.e., you’re a bored teenager.)

Download the bitcollider (file metadata reporting tool) and knock me off my throne!

A pin maybe found in a haystack

Monday, August 16th, 2004

Ed Felten spreads a SHA-1 hash collision rumor.

Eric Rescorla does a good job of explaining that even if true, a collision is not of great practical import.

Why? Most uses of SHA-1, including Bitzi, rely on the practical impossibility of finding content that will generate a specific hash.

A collision merely means that two pieces of content (“messages” in crypto-speak) have been found that generate the same arbitrary hash.

For reasons that aren’t all that intuitive, it is much harder to find a specific match than an arbitrary collision.

I think in day-to-day experience a good analogue would be this: it’s pretty easy to find odd coincidences if you look. If you know how to conjure up specific odd coincidences on demand, tell me.
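
The gap between the two problems is easy to see on a toy scale. The sketch below truncates SHA-1 to a handful of bits (an assumption made purely so the searches finish quickly) and then counts how many hashes it takes to stumble on any collision versus how many it takes to hit one specific target value. It says nothing about the rumored attack itself.

```python
import hashlib


def truncated_sha1(message, bits):
    """Return the top `bits` bits of SHA-1(message) as an integer."""
    digest = int.from_bytes(hashlib.sha1(message).digest(), "big")
    return digest >> (160 - bits)


def find_collision(bits):
    """Hash distinct counter messages until any two share a truncated hash."""
    seen = {}
    attempts = 0
    while True:
        message = b"collision-%d" % attempts
        attempts += 1
        value = truncated_sha1(message, bits)
        if value in seen:
            return seen[value], message, attempts
        seen[value] = message


def find_match(target, bits, max_attempts):
    """Search for a message whose truncated hash equals one specific target."""
    for attempts in range(1, max_attempts + 1):
        message = b"match-%d" % attempts
        if truncated_sha1(message, bits) == target:
            return message, attempts
    return None, max_attempts


if __name__ == "__main__":
    bits = 16
    first, second, tries = find_collision(bits)
    print("arbitrary collision after", tries, "hashes")
    target = truncated_sha1(b"the file I actually want", bits)
    message, tries = find_match(target, bits, max_attempts=1000000)
    print("specific match after", tries,
          "hashes" if message else "hashes (gave up)")
```

For an n-bit hash, brute force finds some collision after roughly 2^(n/2) tries (the birthday bound) but needs roughly 2^n tries to match a given value. For SHA-1's 160 bits that is about 2^80 versus 2^160.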

All that said, Bitzi also uses the Tiger hash, which is not from the same family as SHA-1, as an insurance policy among other things.

Disclaimer: I am not a crypto expert. If true this rumor may be huge news for crypto theorists.


Wednesday, June 23rd, 2004

Limacat discovers and understands Bitzi, calls it “the most important stuff” (um, found in one browsing session) and says “it is possible that partial support for Bitzi will land in PMX.” Limacat didn’t provide a link for PMX, so I had to dig: Ladies and Gentlemen: Personal Metadata Exchanger! Looks like vapor so far.

DirectConnect increment[al download verification]

Thursday, March 4th, 2004

Slyck reports on a major DirectConnect upgrade. DirectConnect hasn’t seen much interest from the press or technologists, but it does have a significant userbase, with 215,880 users currently online according to the Slyck home page, slightly fewer than Gnutella’s 234,618. I have no idea how Slyck obtains those numbers.

Anyhow, it is good to see that DirectConnect has adopted file hashing, specifically THEX (Tree Hash EXchange) using the Tiger hash. This allows DirectConnect clients to find exact alternate download sources and to verify downloads as they progress, and opens the door to future MAGNET and Bitzi lookup support.
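
Incremental verification works because a tree hash has a separate hash for every small segment of the file, all rolled up into one root. Here is a minimal sketch of that structure. Python's hashlib has no Tiger, so SHA-1 is swapped in as a stand-in and the roots it produces are not real tree:tiger values. The 1024-byte segments and the 0x00/0x01 prefixes are my understanding of the THEX draft, so treat them as assumptions.

```python
import hashlib

SEGMENT_SIZE = 1024  # leaf segment size, assumed from the THEX draft


def _h(data):
    # Stand-in hash: real THEX deployments use Tiger, not SHA-1.
    return hashlib.sha1(data).digest()


def leaf_hashes(data):
    """Hash each segment with a 0x00 prefix to mark it as a leaf."""
    if not data:
        return [_h(b"\x00")]
    return [_h(b"\x00" + data[i:i + SEGMENT_SIZE])
            for i in range(0, len(data), SEGMENT_SIZE)]


def tree_root(data):
    """Combine leaves pairwise (0x01 prefix) until one root remains."""
    level = leaf_hashes(data)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            parents.append(_h(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2:
            parents.append(level[-1])  # unpaired node is promoted unchanged
        level = parents
    return level[0]


def verify_segment(segment, expected_leaf):
    """Check one downloaded segment against its trusted leaf hash."""
    return _h(b"\x00" + segment) == expected_leaf
```

A client that trusts the root and has fetched the leaf hashes can reject a bad segment as soon as it arrives, rather than hashing the whole file at the end.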

Here’s a MAGNET link that includes the tiger tree root (second component of the bitprint) and a corresponding Bitzi info lookup by urn:tree:tiger. (631.9MB)

Creative Commons Moving Image Contest Winners

Saturday, February 28th, 2004

Announced today. Copied from the Creative Commons home page:

We’re very happy to announce the winners of the GET CREATIVE! Moving Images Contest: First Place goes to Justin Cone, for the inspired and powerful short film “Building on the Past,” which uses all sorts of Prelinger Archives footage to great effect. Second Place: Sheryl Seibert, for “Mix Tape,” which perfectly captures the found-art ethos of Creative Commons and uses the Creative Commons-licensed song “Mix Tape” by Jim’s Big Ego. Third Place: Kuba and Alek Tarkowski, for “CCC,” a historical look at free culture. Check them out, download them, mirror them, share them with friends. Thanks to all of you who made submissions!

The first place entry is really good, though my favorite scene is midway through the third placer — “a mutation of the system, if you will.”

MAGNET/Bitzi links for easy sharing and info: (7.0MB) (31.8MB)
Kuba_and_Alek_Tarkowski_-_CCC.mpg (14.9MB)