Archive for December, 2004

Deployment Matters

Thursday, December 30th, 2004

Most popular descriptions of why BitTorrent works so well are off the mark, The BitTorrent Effect in the current Wired Magazine included. Excerpts:

The problem with P2P file-sharing networks like Kazaa, he reasoned, is that uploading and downloading do not happen at equal speeds. Broadband providers allow their users to download at superfast rates, but let them upload only very slowly, creating a bottleneck: If two peers try to swap a compressed copy of Meet the Fokkers – say, 700 megs – the recipient will receive at a speedy 1.5 megs a second, but the sender will be uploading at maybe one-tenth of that rate.

Paradoxically, BitTorrent’s architecture means that the more popular the file is the faster it downloads – because more people are pitching in. Better yet, it’s a virtuous cycle. Users download and share at the same time; as soon as someone receives even a single piece of Fokkers, his computer immediately begins offering it to others. The more files you’re willing to share, the faster any individual torrent downloads to your computer. This prevents people from leeching, a classic P2P problem in which too many people download files and refuse to upload, creating a drain on the system. “Give and ye shall receive” became Cohen’s motto, which he printed on T-shirts and sold to supporters.

Sites like Kazaa and Morpheus are slow because they suffer from supply bottlenecks. Even if many users on the network have the same file, swapping is restricted to one uploader and downloader at a time.

Most home and many business broadband connections are asymmetric — the downstream pipe is much fatter than the upstream pipe. That’s a problem any net application that requires significant upstream bandwidth has to contend with. There is no protocol solution. A BitTorrent client can’t upload any faster than a Gnutella client.

Kazaa, eDonkey and various Gnutella clients (e.g., LimeWire) have incorporated multisource/swarming downloads for three years, and the latter two also use partial file sharing (I’m not sure about Kazaa and PFS). These two key features — download from multiple peers, and begin uploading parts of a file before you’ve completed downloading — don’t set BitTorrent apart, though it may be slightly ahead in pushing the state of the art (I haven’t examined the protocols side by side).

So why does BitTorrent work so well? Deployment.

Gnutella et al. users start a client and typically download and share files in one or more local directories. All files a user has collected are shared simultaneously. A client connects to random peers and accepts random queries and requests to download any files the client is sharing. (There are significant refinements, e.g., ultrapeers and supernodes.) Downloads will be spread across a huge number of files, and peers downloading the same file won’t necessarily know about each other (and thus won’t be able to upload to each other while downloading). Again, there are significant refinements — Gnutella peers maintain a list of other peers sharing a given file, known as an alternate location download mesh.

For BitTorrent such refinements are superfluous. A BitTorrent user finds a “torrent” for a file the user wants to download; a BitTorrent client is launched and connects to the tracker specified by the torrent. All clients connecting to a tracker via a single torrent are by definition downloading the same file, and they know about each other — the ideal situation for swarming distribution. And here’s the key to BitTorrent’s success in spite of typically limited upload rates: Because users are sharing only one or a few files — the one(s) they’re downloading — their precious upstream bandwidth is used to enhance a virtuous cycle in which everyone downloading the same file downloads faster.

This ideal situation also allows BitTorrent to utilize tit-for-tat (PDF) leech resistance — “Give and ye shall receive” above. A typical filesharing client can’t effectively use this strategy as it is unlikely to have multiple interactions with the same peer, let alone simultaneous mutually beneficial interactions.
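
To make the tit-for-tat idea concrete, here is a minimal sketch of the reciprocation logic, not Cohen’s actual implementation: every so often a peer unchokes (agrees to upload to) the few peers that have recently uploaded to it the fastest, plus one random “optimistic” pick so that newcomers with nothing to trade yet can get their first pieces. The peer ids, slot count, and rate bookkeeping below are assumptions for illustration.

```python
import random

def choose_unchoked(interested_peers, download_rate, regular_slots=3):
    """Pick which peers to upload to during the next period.

    interested_peers: peer ids that currently want pieces from us.
    download_rate: peer id -> bytes/sec received from that peer over the
    last rating period (hypothetical bookkeeping kept by the client).
    """
    # Reciprocate: favor the peers that have recently sent us the most data.
    by_generosity = sorted(interested_peers,
                           key=lambda p: download_rate.get(p, 0),
                           reverse=True)
    unchoked = set(by_generosity[:regular_slots])

    # Optimistic unchoke: give one other peer a chance, so a peer that has
    # nothing to offer yet (or just joined the swarm) can bootstrap.
    remaining = [p for p in interested_peers if p not in unchoked]
    if remaining:
        unchoked.add(random.choice(remaining))
    return unchoked

# Peer "d" has uploaded nothing to us, so it is only served if it happens
# to win the optimistic slot.
rates = {"a": 50_000, "b": 20_000, "c": 5_000, "d": 0}
print(choose_unchoked(list(rates), rates))
```

The optimistic slot is what keeps a scheme like this from starving brand-new peers, and it only pays off because everyone a client talks to is working on the same file at the same time.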

There are technologies (possibly Distributed Hash Tables), tweaks (e.g., a client giving preference to uploading files it has recently downloaded), and practices (encouraging users to initiate downloads from editorially controlled lists of files rather than via ad hoc peer searches) that typical filesharing clients could adopt to make the average user’s download experience better, perhaps someday approaching the average BitTorrent user’s experience when downloading a popular file.

There’s also lots of talk about decentralizing BitTorrent. See eXeem, supposedly to be released in a few weeks. It appears that eXeem will attempt to keep BitTorrent’s beneficial characteristics by limiting files shared to those a user has obtained or created a .torrent file for — perhaps similar to a hypothetical Gnutella client that only shared files for which it had alternate sources. I don’t have high hopes for the decentralized bits of eXeem, whatever they turn out to be. It may serve as a decent standard BitTorrent client, but there’s no need for another of those, and eXeem will supposedly be riddled with spyware.

Flip a coin, don’t recount, revote, and litigate

Thursday, December 30th, 2004

Votes for Washington state governor cast in November have now been counted three times. One candidate won the first two counts (first by 261 votes, then by 42); the other won the second recount by 129 votes. Dino Rossi (Republican), the candidate who won the first two counts, wants a revote and is threatening litigation. Christine Gregoire (Democrat), who won the second recount, says a revote would waste $4 million, according to Rossi urges revote to fix “mess” in the Seattle Times.

Why recount, revote, or litigate over any of it at all? Change whatever rule triggers a recount so that it mandates a coin flip instead.

It turns out that many people have suggested flipping a coin in this election and in the past (e.g., Ralph Nader on Florida in 2000), including several stories and columns from the Seattle Times, but I don’t gather that anyone is seriously considering the option. “Counting every vote” is too sacred.

However, “counting every vote” is itself subject to random errors. If an election is close enough for a recount, the recount won’t necessarily be any more accurate than a coin flip.
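
A rough Monte Carlo sketch of that claim, with invented numbers: roughly 2.8 million ballots (about the size of the Washington race), a made-up 1% per-ballot misrecording rate, and the actual reported margins plugged in as the “true” margin. None of the error figures come from the real election; the point is only that when counting noise is comparable to the margin, a single count names the true winner not much more reliably than chance.

```python
import numpy as np

def count_accuracy(n_ballots=2_800_000, true_margin=129,
                   error_rate=0.01, trials=20_000, seed=0):
    """Estimate how often one noisy count names the true winner.

    Assumptions (invented for illustration): each ballot is independently
    recorded for the wrong candidate with probability error_rate, and each
    misread shifts the observed margin by 2 votes.
    """
    rng = np.random.default_rng(seed)
    a_true = (n_ballots + true_margin) // 2   # true winner's ballots
    b_true = n_ballots - a_true
    a_to_b = rng.binomial(a_true, error_rate, size=trials)  # A read as B
    b_to_a = rng.binomial(b_true, error_rate, size=trials)  # B read as A
    observed_margin = true_margin - 2 * a_to_b + 2 * b_to_a
    return (observed_margin > 0).mean()

# The narrower the true margin relative to the counting noise, the closer
# any single count (or recount) gets to a 50/50 guess.
print(count_accuracy(true_margin=129))
print(count_accuracy(true_margin=42))
```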

Apparently many jurisdictions allow flipping a coin or drawing lots to determine the winner of close elections in low-stakes contests. Given that recounts are no more accurate than coin flips, chance should be allowed to determine the outcome of any close election, especially “high stakes” elections, as those will be the most costly to recount, revote, and litigate.

I’d be happy to replace most elections entirely with random selection, which in my view would be far more democratic and far less costly than the current system.

Individual Rights Management

Wednesday, December 29th, 2004

Cory Doctorow correctly lambastes those soft on DRM for the umpteenth time. The following excerpt sparked a thought:

DRM isn’t protection from piracy. DRM is protection from competition.

Reminds me of airport “security” and similar. In the essay IDs and the illusion of security Bruce Schneier makes a case (not nearly as forcefully as can be done) that

Identification and profiling don’t provide very good security, and they do so at an enormous cost.

I’d argue that most measures justified by “security” actually make us less secure, in part because of their enormous cost. Another time.

Anyway, I think there’s a nice (ugly) symmetry in the arguments of apologists for Digital Restrictions Management and the national security state. Both are really much about restricting competition.

[Schneier link via Anton Sherwood.]

Lexus, Mercedes, Porsche

Wednesday, December 29th, 2004

Tyler Cowen cites a Harper’s Index factoid:

Number of American five-year-olds named Lexus: 353

One of them works at Raisins, featured in the first South Park episode I ever watched and still my sentimental favorite. Every kid should watch this episode. If it is available on DVD I can’t find it, but search for “South Park 714” or “South Park Raisins” on any filesharing network — South Park episodes are among the most shared content.

Also see Christian Hard Rock, which tackles filesharing. Almost every episode is well worth watching for kids and adults. Skip the movie, it sucks ass.

Don’t Forget Your Turmeric

Wednesday, December 29th, 2004

Betterhumans cites a study that found curcumin (the active compound in turmeric, the spice that makes curry yellow) inhibits the accumulation of destructive beta-amyloid plaques and breaks up existing plaques in genetically altered mice.

Perhaps this helps explain the possibly very low incidence of Alzheimer’s in India. Much more study is needed.

For years I have eaten lots of turmeric, which may be added to just about any food. Yum. I doubt my father’s mother has eaten any apart from tiny amounts used in food coloring. Unfortunately she’s far past being helped by any crude dietary intervention. Happy holidays to the husk of Victoria Ulakey.

ccPublisher 1.0

Monday, December 27th, 2004

Nathan Yergler just cut ccPublisher 1.0, a Windows/Mac/Linux desktop app that helps you license, tag, and distribute your audio and video works. I’m very biased, but I think it’d be a pretty neat little application even if it weren’t Creative Commons centric.

  • It’s written in Python with a wxPython UI, but is distributed as a native Windows installer or Mac disk image with no dependencies. Install and run like any other program on your platform, no implementation leakage. Drag’n’drop works.
  • Also invisible to the end user, it uses the Internet Archive’s XML contribution interface, FTP, and CC’s nascent web services.
  • RDF metadata is generated (hidden from the user if published at IA, or available for self-publishing) and ties into CC’s search and P2P strategies; see the sketch below.
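
For flavor, here is a minimal sketch of what generating that kind of license metadata can look like in Python using rdflib. The property names and the old web.resource.org vocabulary are assumptions for illustration, not necessarily the exact RDF ccPublisher 1.0 emits.

```python
from rdflib import RDF, Graph, Literal, Namespace, URIRef

# Namespaces in the style of early Creative Commons metadata (assumed here
# for illustration, not copied from ccPublisher's output).
CC = Namespace("http://web.resource.org/cc/")
DC = Namespace("http://purl.org/dc/elements/1.1/")

def license_rdf(work_url, title, license_url):
    """Build an RDF graph asserting that a work is under a given license."""
    g = Graph()
    g.bind("cc", CC)
    g.bind("dc", DC)
    work = URIRef(work_url)
    g.add((work, RDF.type, CC.Work))
    g.add((work, DC.title, Literal(title)))
    g.add((work, CC.license, URIRef(license_url)))
    return g

g = license_rdf(
    "http://example.org/song.ogg",                     # hypothetical work URL
    "My Song",
    "http://creativecommons.org/licenses/by-sa/2.0/",  # CC BY-SA 2.0
)
g.serialize(destination="song-license.rdf", format="pretty-xml")
```

Metadata along these lines is the sort of thing CC’s search engine and P2P strategies can key off.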

Python and friends did most of the work, but the 90/10-10/90 rule applied (making a cross-platform app work well still isn’t trivial, integration is always messy, and anything involving ID3v2 sucks). Props to Nathan.

Version 2 will be much slicker, support more media types, and be extensible for use by other data repositories.

Addendum 2005-01-12: Check out Nathan’s 1.0 post mortem and 2.0 preview.

N-level blog entry references

Monday, December 27th, 2004

Dear LazyWeb,

Bloglines, Technorati and probably others do a passable job of presenting direct references to a blog entry. (Minor complaints: With Bloglines you have to subscribe to a feed or preview with a “siteid” internal to Bloglines; if your blog has multiple duplicative feeds (e.g., rdf/rss2/atom) direct entry references only appear for one of the feeds; Bloglines makes no attempt to consolidate or allow feed owners to consolidate; Technorati appears to use screen scraping and picks up some garbage along the way.)

So here’s my LazyWeb request:

I want to know, without lots of extra clicking, not just resources that directly cite blog entry A, but resources that cite resources that cite blog entry A, and so on. In the context of Bloglines, instead of “2 references” I want to see “2 direct references, 3 level1 indirect references, 1 level2 indirect reference”, “6 references, 2 direct” or similar, and I want to be able to see all references, direct and indirect, on a single page. In the context of Technorati, indirect references could optionally be part of a “watchlist” feed.
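
A sketch of the bookkeeping I have in mind, assuming a caller-supplied direct_references(url) function (hypothetical; it could wrap a Technorati-style API or a local link index) that returns the URLs citing a given entry:

```python
from collections import deque

def reference_levels(entry_url, direct_references, max_level=3):
    """Group resources citing entry_url by how indirectly they cite it.

    Level 0 holds direct references, level 1 holds references to those
    references, and so on. direct_references maps a URL to the URLs that
    cite it (a hypothetical lookup supplied by the caller).
    """
    levels = {}                    # level -> URLs first discovered there
    seen = {entry_url}
    frontier = deque([(entry_url, 0)])
    while frontier:
        url, level = frontier.popleft()
        if level >= max_level:
            continue
        for citer in direct_references(url):
            if citer not in seen:
                seen.add(citer)
                levels.setdefault(level, set()).add(citer)
                frontier.append((citer, level + 1))
    return levels

# Toy link graph standing in for the blogosphere.
graph = {
    "A": ["B", "C"],        # B and C cite entry A directly
    "B": ["D", "E", "F"],   # D, E, F cite B, hence cite A indirectly
    "C": [],
    "D": ["G"],
}
print(reference_levels("A", lambda u: graph.get(u, [])))
# i.e., 2 direct references, 3 level1 indirect, 1 level2 indirect
```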

Will Bloglines, Technorati, or some up and coming aggregation service please do this?

I’m not terribly interested in visualization of social networks implied by blogs (blogosphere visualization, blogversation maps?) or even blogthread visualization and the like here. Neat, but too heavyweight to use daily. I just want a small feature increment.

Search 2005

Thursday, December 23rd, 2004

Many of John “Searchblog” Battelle’s predictions for 2005 seem like near certainties, e.g., a fractious year for the blogosphere and trouble for those who expect major revenues from blogging.

Two trends missing from Battelle’s predictions that I hope 2005 bears out:

Metadata-enhanced search. Will be ad hoc and pragmatic, pulling useful bits from private sources and people following officious Semantic Web and lowercase semantic web practices.

Proliferation of niche web-scale search engines. Anyone can be a small-scale Google, crawling just the portions of the web they care about and offering search options specific to a niche. The requisite hardware and bandwidth are supercheap and the Nutch open source search engine makes implementation trivial.

The Creative Commons search engine is a harbinger of both trends.
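
On the metadata-enhanced side, the pragmatic version needs nothing exotic. Here is a minimal sketch, using only the Python standard library, that pulls a page’s rel="license" links (one lowercase-semantic-web convention; whether any particular engine keys off exactly this attribute is an assumption) so a niche index could offer, say, a “CC-licensed results only” option:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class LicenseLinkParser(HTMLParser):
    """Collect href targets of <a> or <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        if "license" in rel_values and attrs.get("href"):
            self.licenses.append(attrs["href"])

def declared_licenses(url):
    """Fetch a page and return any license URLs it declares via rel="license"."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = LicenseLinkParser()
    parser.feed(html)
    return parser.licenses

# A crawler could store this next to the indexed text and let searchers
# restrict results by license (hypothetical page URL below).
print(declared_licenses("http://example.org/"))
```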

Battelle’s look ahead spans the web, not just web search. Possibly the biggest trend missing from his list is the rise of weblications. Egads, I have to learn DHTML, and it isn’t 1997!

A few of my near certainties: lots of desktop search innovation, very slow progress on making multimedia work with the web and usable security, open source slogs toward world domination, and most things get cheaper and more efficient.

Center for Decentralization

Wednesday, December 15th, 2004

This evening I had the pleasure of attending an open house for the CommerceNet Labs center for decentralization, or Zlab. I’ve been meaning to write about Zlab for a while, and I’m taking advantage of tonight’s event to write without having anything to say.

If I may boil down Zlab’s aim to one paraphrase: Make software that works the way a fully decentralized society would work.

Check out The Now Economy for a flurry of deep items concerning decentralized commerce and net infrastructure, lab projects that abet the above aim, and publications. Of personal interest, see Nutch: A Flexible and Scalable Open-Source Web Search Engine, which uses the Creative Commons search engine to demonstrate how a Nutch plugin is implemented.

Calorie Restriction vs. Accelerating Change

Tuesday, December 14th, 2004

Over a month ago I attended Accelerating Change 2004. I agree with Peter McCluskey’s take: an unexpected but mostly well done and welcome focus on current developments and lots of excitement about virtual worlds, Second Life in particular. Virtual worlds offer a low-cost platform for economic, social and even physical object experimentation, prototyping and more. Virtual worlds are the future! Pity I never was much of an enthusiast for MUDs or video games, so I have a couple decades’ worth of catching up to do.

Had it not been held across the continent (South Carolina) I would’ve preferred to attend the conflicting 3rd Annual Calorie Restriction Society Conference. The first two CR conferences were lots of fun, with a good mix of learning from CRONies (CRON is Calorie Restriction with Optimal Nutrition) far more serious than me and talks by academics studying CR and aging mechanisms.

Dean Pomerleau, whose site is well worth visiting, took notes on approximately every CR 2004 talk: day 1, day 2, and day 3.