Archive for the ‘P2P’ Category

Supply-side anti-censorship

Friday, February 17th, 2006

Brad Templeton explains why a censor should want an imperfect filter: it should be good enough to keep verboten information from most users, but easy enough to circumvent to tempt dissidents, so they can be tracked and, when desired, put away.

In the second half of the post, Templeton suggests some anti-censor techniques: ubiquitous and . Fortunately he says these are “far off” and “does not scale”, respectively. To say the least, I’d add.

Cyber-activists have long dreamed that strong encryption would thwart censorship. is an example of a project that uses this as its raison d’être. While I’m a huge fan of ubiquitous encryption and decentralization (please install , now!), these seem like terribly roundabout means of fighting censorship: the price of obtaining information, which includes the chance of being caught, is lowered. But someone has to seek out or have the information pushed to them in the first place. If information is only available via hidden channels, how many people will encounter it regardless of lower risk?

An alternative, perhaps less sexy because it involves no technology adoption, is supply-side anti-censorship: make verboten information ubiquitous. Anyone upset about google.cn should publish information the Communist Party wants censored (my example is pathetic, need to work on that). This is of course not mutually exclusive with continuing to carp and dream of techno-liberation.

I guess I’m calling for projects. Or one of those chain letters (e.g., “four things”) that plague the blogosphere.

content.exe is evil

Thursday, February 16th, 2006

I occasionally run into people who think users should download content (e.g., music or video) packaged in an executable file, usually for the purpose of wrapping the content with DRM where the content format does not directly support DRM (or the proponent’s particular DRM scheme). Never mind the general badness of Digital Restrictions Management, requiring users to run a new executable for each content file is evil.

Most importantly, every executable is a potential malware vector. There is no good excuse for exposing users to this risk. Even if your executable content contains no malware and your servers are absolutely impenetrable such that your content can never be replaced with malware, you are teaching users to download and run executables. Bad, bad, bad!

Another problem is that executables are usually platform-specific and buggy. Users have enough problems having the correct codec installed. Why take a chance that they might not run Windows (and the specific versions and configurations you have tested, sure to not exist in a decade or much less)?

I wouldn’t bother to mention this elementary topic at all, but very recently I ran into someone well intentioned who wants users to download content wrapped in a jar file, if I understand correctly for the purposes of ensuring users can obtain content metadata (most media players do a poor job of exposing content metadata and some file formats do a poor job of supporting embedded metadata, not that hardly anyone cares; this is tilting at windmills) and so that content publishers can track use (this is highly questionable), all from a pretty cross platform GUI. A jar file is an executable Java package, so the platform downside is different (Windows is not required, but a Java installation, of some range of versions and configurations, is), but it is still an executable that can do whatever it wants with the computer it is running on. Bad, bad, bad!

The proponent of this scheme said that it was ok, the jar file could be signed. This is no help at all. Anyone can create a certificate and sign jar files. Even if a creator did have to have their certificate signed by an established authority it would be of little help, as malware purveyors have plenty of resources that certificate authorities are happy to take. The downsides are many: users get a security prompt (“this content signed by…”) for content, which is annoying, misleading as described above, and conditions the user to not pay attention when they install things that really do need to be executable; and a barrier is raised for small content producers.

If you really want to package arbitrary file formats with metadata, put everything in a zip file and include your UI in the zip as HTML. This is exactly what P2P vendor ’s Packaged Media File format is. You could also make your program (which users download only once) look for specific files within the zip to build a content-specific (and safe) interface within your program. I believe this describes ’s Kapsules, though I can’t find any technical information.
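As a rough illustration of the zip approach, here is a minimal Python sketch. It is not the Packaged Media File layout or the Kapsules format, neither of which is documented here; the file names `metadata.json` and `index.html` and both function names are assumptions made up for the example.

```python
import io
import json
import zipfile
from html import escape


def package_media(content_name, content_bytes, metadata):
    """Bundle a media file, its metadata, and a static HTML page into one zip.

    Nothing in the archive is executable: the "UI" is plain HTML and the
    metadata is plain JSON. The member names are hypothetical.
    """
    title = escape(str(metadata.get("title", content_name)))
    page = (
        "<html><body><h1>%s</h1>"
        '<p><a href="%s">%s</a></p></body></html>'
        % (title, escape(content_name, quote=True), escape(content_name))
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(content_name, content_bytes)
        archive.writestr("metadata.json", json.dumps(metadata, indent=2))
        archive.writestr("index.html", page)
    return buffer.getvalue()


def read_metadata(package_bytes):
    """What a one-time-download player could do: read data, run nothing."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return json.loads(archive.read("metadata.json"))
```

A player that users install once only ever parses these files, so a replaced or malicious package can at worst show wrong text.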

Better yet, put your content on the web, where users can find and view it (in the web design of your choice), you get reasonable statistics, and the don’t get fed. You can even push this to 81/19 by including minimal but accurate metadata embedded in your files if they support it: a name users can search for or a URL for your page related to the content.

Most of the pushers of executable content I encounter, when faced with security concerns, say it is an “interesting and hard problem.” No, it is a stupid and impossible problem. In contrast to the web, executable content is a 5/95/-1000 solution; that last number is a .

If you really want an interesting and hard problem, executable content security is the wrong level. Go work on platform security. We can now run sophisticated applications within a web browser with some degree of safety (due to Java applet and Flash sandboxes, JavaScript security). Similar could be pushed down to the desktop, so that executables by default have no more rights to tamper with your system than do web pages. is an aggressive approach to this problem. If that sounds too hard and not interesting enough (you really wanted to distribute “media”), go the web way as above — it is subsuming the desktop anyhow.

CodeCon Extra

Monday, February 13th, 2006

A few things I heard about at CodeCon outside the presentations.

Vesta was presented at CodeCon 2004, the only one I’ve missed. It is an integrated revision control and build system that guarantees build repeatability, in part by ensuring that every file used by the build is under revision control. I can barely keep my head around the few revision control and build systems I occasionally use, but I imagine that if I were starting (or saving) some large mission-critical project that found everyday tools inadequate it would be well worth considering Vesta. About its commercial equivalents, I’ve mostly heard second hand complaining.

Allmydata is where Zooko now works. The currently Windows-only service allows data backup to “grid storage”, presumably a as used by . Dedicate 10GB of local storage to the service and you can back up 1GB, free. Soon you’ll be able to pay for better ratios, including $30/month for 1TB of space. I badly want this service. Please make it available, and for Linux! Distributed backup has of course been a dream P2P application forever. Last time I remember the idea getting attention was a Cringely column in 2004.

Some people were debating whether the Petname Tool does anything different from what specify and whether either would make substantially harder. The former is debated in comments on Bruce Schneier’s recent post on petnames, inconclusively AFAICT.

The Petname Tool works well and simply for what it does (Firefox only), which is to allow a user to assign a name to an https site if it is using strong encryption. If the user visits the site again and it is using the same certificate, the user will see the assigned name in a green box. Any other site, including one that merely looks like the original (in content or URL), or that has even hijacked DNS and appears to be “secure” but uses a different certificate, will appear as “untrusted” in a yellow box. That’s great as far as it goes (see phollow the phlopping phish for a good description of the attack this would save a reasonable user from), though the naming seems the least important part: a checkbox to begin trusting a site would be nearly as good.

I wonder though how many users have any idea that some pages are secure and others are not. The petname tool doesn’t do anything for non-https pages, so the user becomes inured to seeing it doing nothing, then does not see it. Perhaps it should be invisible when not on a secure site. Indicators like PageRank, Alexa rank (via the Google and Alexa toolbars) and similar, , and whether the visitor has visited the site in question before would all help warn the user that any site may not be what they expect. Nearly everyone, including me, confers a huge amount of trust on non-https sites, even if I never engage in a financial transaction on such a site. I imagine a four-part security indicator in a prominent place in the browser, with readings of site popularity (rank), danger as measured by the likes of SiteAdvisor, the user’s relationship with the site (petname) and whether the connection is strongly encrypted.
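The petname behavior described above (same certificate, same name; anything else untrusted) can be sketched in a few lines of Python. This is not the Petname Tool’s code: the class and method names are invented for illustration, and keying on a SHA-256 fingerprint of the certificate bytes is an assumption.

```python
import hashlib


class PetnameStore:
    """Map certificate fingerprints to names the user chose (illustrative only)."""

    def __init__(self):
        self._names = {}  # certificate fingerprint -> petname

    @staticmethod
    def fingerprint(cert_der):
        # Assumption: identify a site by a hash of its certificate bytes.
        return hashlib.sha256(cert_der).hexdigest()

    def assign(self, cert_der, petname):
        self._names[self.fingerprint(cert_der)] = petname

    def indicator(self, is_https, cert_der=None):
        """Return (box colour, label) for the page being viewed."""
        if not is_https or cert_der is None:
            return ("none", None)  # nothing to say about non-https pages
        name = self._names.get(self.fingerprint(cert_der))
        if name is None:
            return ("yellow", "untrusted")
        return ("green", name)
```

Note that the URL and page content never enter the lookup, which is why a lookalike site or hijacked DNS still lands in the yellow box.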

Someone claimed that three letter agencies want to mandate geolocation for every net access device. No doubt some agency types dream of this. Anyway, the person said we should be ready to fight this if it were to become a real push for such a law, because what would happen to anonymity? No doubt such a mandate should be fought tooth and nail, but preserving anonymity seems like exactly the wrong battle cry. How about privacy, or even mere freedom? On that note, someone briefly showed a tiny computer attached to and powered by what could only be called a solar flap. This could be slapped on the side of a bus and would connect to wifi networks whenever possible and route as much traffic as possible.

CodeCon Saturday

Sunday, February 12th, 2006

Delta. Arbitrarily large codebase triggers specific bug. Run delta, which attempts to provide you with only the code that triggers the bug (usually a page or so, no matter the size of the codebase) via a like algorithm (the evaluation function requires triggering the bug and considers code size). Sounds like a big productivity and quality booster where it can be used.
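To make the idea concrete, here is a deliberately simple greedy minimizer in Python. It is a sketch of the general technique (keep deleting chunks while the bug still triggers), not the algorithm the Delta tool actually uses; `minimize` and `triggers_bug` are hypothetical names, and `triggers_bug` stands in for whatever test the user supplies.

```python
def minimize(lines, triggers_bug):
    """Greedily shrink `lines` while `triggers_bug(lines)` stays true.

    Tries to delete chunks of decreasing size; a deletion is kept only if
    the remaining lines still trigger the bug.
    """
    assert triggers_bug(lines), "the full input must trigger the bug"
    chunk = max(len(lines) // 2, 1)
    while True:
        i, removed_any = 0, False
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and triggers_bug(candidate):
                lines, removed_any = candidate, True  # keep the smaller input
            else:
                i += chunk  # this chunk is needed; move past it
        if chunk > 1:
            chunk //= 2
        elif not removed_any:
            return lines  # no single line can be removed any more
```

For example, if the bug needs only two particular lines out of twenty, the function returns just those two.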

Djinni. Framework for approximation of problems, supposedly faster and easier to use than more academic oriented approximation frameworks. An improved simulated annealing algorithm is or will be in the mix, including an analog of “pressure” in . Super annoying presentation style. Thank you for letting us know that CodeCon is where the rubber meets the road.

iGlance. Instant Messaging with audio and video, consistent with the IM metaphor (recipient immediately hears and sees initiator) rather than telephone metaphor (recipient must pick up call). Very low bitrate video buddy lists. Screen and window sharing with single control and dual pointers so that remote user can effectively point over your shoulder. Impressive for what seems to be a one person spare time project. Uses OVPL and OVLPL licenses, very similar to GPL and LGPL, but apparently easier to handle contributor agreements, so project owner can move code between application and library layers. Why not just make the entire application ?

Overlay Anycast Service InfraStructure. Locality-aware server selection (to be) used by , easy to implement for your service. Network locality correlates highly with geographic locality due to the speed of light bound. Obvious, but the graph was neat. OpenDHT was also mentioned, another hosted service. OpenDHT clients can use OASIS to find a gateway. Super easy to play with a with around 200 nodes. Someone has built fileshare using OpenDHT, see Octopod. As Wes Felter says, this stuff really needs to be moved to a non-research network.

Query By Example. Find and rank rows [dis]similar to others in SQL using extension for , which uses a for classification (last is not visible to user). Sounds great for data mining engagements.

Friday
Saturday 2005

CodeCon Friday

Saturday, February 11th, 2006

This year Gordon Mohr had the devious idea to do preemptive reviews of CodeCon presentations. I’ll probably link to his entries and have less to say here than last year.

Daylight Fraud Prevention. I missed most of this presentation but it seems they have a set of non-open source Apache modules each of which could make phishers and malware creators work slightly harder.

SiteAdvisor. Tests a website’s evilness by downloading and running software offered by the site and filling out forms requesting an email address on the site. If virtual Windows machine running downloaded software becomes infected or email address set up for test is inundated with spam the site is considered evil. This testing is mostly automated and expensive (many Windows licenses). Great idea, surprising it is new (to me). I wonder how accurate evil readings one could obtain at much lower cost by calculating a “SpamRank” for sites based on links found in email classified as spam and links found on pages linked to in spams? (A paper has already taken the name SpamRank, though at a five second glance it looks to propose tweaks to make PageRank more spam-resistant rather than trying to measure evil.) Fortunately SiteAdvisor says that both bitzi.com and creativecommons.org are safe to use. SiteAdvisor’s data is available for use under the most restrictive Creative Commons license — Attribution-NonCommercial-NoDerivs 2.5.

VidTorrent/Peers. Streaming joke. Peers, described as a “toolkit for P2P programming with continuation passing style” I gather works syntactically as a Python code preprocessor, could be interesting. I wish they had compared Peers to other P2P toolkits, e.g., .

Localhost. A global directory shared with a modified version of the BitTorrent client. I tried about a month ago. Performance was somewhere between abysmal and nonexistent. BitTorrent is fantastic for large popular files. I’ll be surprised if localhost’s performance, which depends on transferring small XML files, ever reaches mediocrity. They’re definitely going away from BitTorrent’s strengths by uploading websites into the global directory as lots of small files (I gather). The idea of a global directory is interesting, though tags seem a more fruitful navigation method than localhost’s hierarchy.

Truman. A “sandnet” for investigating suspected malware in. Faux services (e.g., DNS, websites) can be scripted to elicit the suspected malware’s behavior, and more.

[Hot]link policy

Sunday, January 15th, 2006

I’m out of the loop. Until very recently (upon reading former Creative Commons intern Will Frank’s writeup of a brief hotlink war) I thought ‘hotlink’ was an anachronistic way to say ‘link’ used back when the mere fact that links led to a new document, perhaps on another server, was exciting. It turns out ‘hotlink’ is now vernacular for inline linking: displaying or playing an image, audio file, video, or other media from another website.

Lucas Gonze, who has lots of experience dealing with hotlink complaints due to running Webjay, has a new post on problems with complaint forms as a solution to hotlinks. One thing missing from the post is a distinction between two completely different sets of complainers who will have different sets of solutions beyond complaining.

One sort of complainer wants a link to a third party site to go away. I suspect the complainer usually really wants the content on the third party site to go away (typically claiming the third party site has no right to distribute the content in question). Removing a link to that content from a link site works as a partial solution by making the third party hosted content more obscure. A solution in this case is to tell the complainer that the link will go away when it no longer works. In effect, the linking site ignores complaints and it is the responsibility of the complainer to directly pursue the third party site via and other threats. This allows the linking site to completely automate the removal of links: those removed as a result of threatened or actual legal action look exactly the same as any other link gone bad and can be tested for and culled using the same methods. Presumably such a hands-off policy only pisses off complainers to the extent that they become more than a minor nuisance, at least on a Webjay-like site, though it must be an option for some.
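The automated culling could look something like the following Python sketch. It assumes a caller-supplied `fetch_status` function (hypothetical; it might wrap an HTTP HEAD request) so that the policy itself stays independent of any particular HTTP library.

```python
def cull_dead_links(links, fetch_status):
    """Return the links that still resolve; drop everything else.

    fetch_status(url) is a hypothetical caller-supplied function returning
    an HTTP status code, or None when the request fails outright. A link
    taken down after a legal threat and a link that simply rotted are
    treated identically.
    """
    live = []
    for url in links:
        status = fetch_status(url)
        if status is not None and 200 <= status < 400:
            live.append(url)
    return live
```

The point of the design is that the link site never has to judge a complaint: it only has to notice that a URL stopped working.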

Creative Commons has guidelines very similar to this policy concerning how to consider license information in files distributed off the web — don’t believe it unless a web page (which can be taken down) has matching license information concerning the file in question.

Another sort of complainer wants a link to content on their own site to go away, generally for one or two reasons. The first reason is that hotlinking uses bandwidth and other resources on the hotlinked site which the site owner may not be able to afford. The second reason, often coupled with the first, is that the site owner does not want their content to be available outside of the context of their own site (i.e., they want viewers to have to come to the source site to view the content).

With a bit of technical savvy the complainer who wants a link to their own site removed has several options for self help. Those merely concerned with cost could redirect requests without the relevant referrer (from their own site) or maybe cookie (e.g., for a logged in user) to the free or similar, which should drastically reduce originating site bandwidth, if hotlinks are actually generating many requests (if they are not there is no problem).

A complainer who does not want their content appearing in third party sites can return a small “visit my site if you want to view this content” image, audio file, or video as appropriate in the absence of the desired referrer or cookie. Hotlinking sites become not an annoyance, but free advertising. Many sites take this strategy already.

Presumably many publishers do not have any technical savvy, so some Webjay-like sites find it easier to honor their complaints than to ignore them.
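For publishers who do have some technical savvy, the referrer-based self help described above amounts to a small decision made per request. A minimal Python sketch, with hypothetical names and return values (a real site would more likely express this as web server configuration):

```python
from urllib.parse import urlsplit


def respond_to_media_request(referer, own_host, policy="placeholder"):
    """Decide how to answer a request for a media file.

    Returns an (action, detail) pair. The action names and the "mirror" and
    "visit-my-site" details are placeholders invented for this sketch.
    """
    host = urlsplit(referer).hostname if referer else None
    if host == own_host:
        return ("serve", None)  # request came from the publisher's own pages
    if policy == "redirect":
        return ("redirect", "mirror")  # cost concern: send traffic elsewhere
    return ("placeholder", "visit-my-site")  # context concern: advertise instead
```

One caveat: some browsers and proxies omit the referrer entirely, so a strict check like this one will also turn away a few legitimate visitors.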

There is a potential for technical means of saying “don’t link to me” that could be easily implemented by publishers and link sites with any technical savvy. One is to interpret robots.txt exclusions to mean “don’t link to me” as well as “don’t crawl and index me.” This has the nice effect that those stupid enough to not want to be linked to also become invisible to search engines.
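A link site could apply that broader reading with very little code. The sketch below uses Python’s standard `urllib.robotparser`; the function name `may_link_to` is made up, and treating a crawl exclusion as a link exclusion is the proposal above, not an existing convention.

```python
from urllib.robotparser import RobotFileParser


def may_link_to(robots_txt, url, agent="*"):
    """Treat a robots.txt crawl exclusion as a "don't link to me" request.

    robots_txt is the text of the target site's robots.txt, fetched however
    the link site prefers; url is the media URL a user wants to link to.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With a robots.txt containing `User-agent: *` and `Disallow: /media/`, a link to anything under `/media/` would be refused while the rest of the site stays linkable.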

Another solution is to imitate rel=nofollow, perhaps with rel=nolink, though the attribute would need to be available on img, object, and other elements in addition to a, or simply to apply rel=nofollow to those additional elements a la the broader interpretation of robots.txt above.

I don’t care for rel=nolink as it might seem to give some legitimacy to brutally bogus link policies (without the benefit of search invisibility), but it is an obvious option.

The upshot of all this is that if a link site operator is not as polite as Lucas Gonze there are plenty of ways to ignore complainers. I suppose it largely comes down to customer service, where purely technical solutions may not work as well as social solutions. Community sites with forums have similar problems. Apparently Craig Newmark spends much of his time tending to customer service, which I suspect has contributed greatly to making Craigslist such a success. However, a key difference, I suspect, is that hotlink complainers are not “customers” of the linking site, while most people who complain about behavior on Craigslist are “customers”: participants in the Craigslist community.

CodeCon 2006 Program

Thursday, January 12th, 2006

The 2006 program has been announced and it looks fantastic. I highly recommend attending if you’re near San Francisco Feb 10-12 and any sort of computer geek. There’s an unofficial CodeCon wiki.

My impressions of last year’s CodeCon: Friday, Saturday, and Sunday.

Via Wes Felter

Lightnet!

Monday, January 9th, 2006

Congratulations to Lucas Gonze on the Webjay/Yahoo! merger. (Via Kevin Burton.)

Yahoo! made a very wise decision to be acquired by the light side rather than the dark side.

My favorite Gonze post: Totally fucking bored with Napster (more at CC).

Also have a listen to the best track on ccMixter (if you share my taste, probably not), also a Gonze creation.

I could gonze on, but enough of this!

Darkfox

Tuesday, December 27th, 2005

I hate to write about software that could be vaporware, but AllPeers (via Asa Dotzler) looks like a seriously interesting darknet/media sharing/BitTorrent/and more Firefox extension.

It’s sad, but simply sending a file between computers with no shared authority nor intermediary (e.g., web or ftp server) is still a hassle. IM transfers often fail in my experience, traditional filesharing programs are too heavyweight and are configured to connect to and share with any available host, and previous attempts at clients (e.g., ) were not production quality. Merely solving this problem would make AllPeers very cool.

Assuming AllPeers proves a useful mechanism for sharing media, perhaps it could also become a lightnet bridge– as a Firefox extension.

Do check out AllPeers CTO Matthew Gertner’s musings on the AllPeers blog. I don’t agree with everything he writes, but his is a very well informed and well written take on open source, open content, browser development and business models.

Songbird Media Player looks to be another compelling application built on the (though run as a separate program rather than as a Firefox extension), to be released real soon now. 2006 should be another banner year for Firefox and Mozilla technology generally.

Lucas Gonze’s original lightnet post is now near the top of results for ‘lightnet’ on Google, Yahoo!, and MSN, and related followups fill up much of the next few dozen results, having displaced most of the new age and lighting sites that use the same term.

Redefining light and dark

Monday, November 28th, 2005

The wily Lucas Gonze is at it again, defining ‘lightnet’ and ‘darknet’ by example, without explanation. The explanation is so simple that it probably only subtracts from Gonze’s [re]definition, but I’ll play the fool anyhow.

Usually darknet refers to (largely unstoppable) friend-to-friend information sharing. As the name implies, a darknet is underground, or at least under the radar of those who want to prohibit certain kinds of information sharing. (A BlackNet doesn’t require friends and the radar doesn’t work, to horribly abuse that analogy.)

Lightnet, as far as I know, is undefined in this context.*

Anyway, Lucas’ definition-by-example lumps prohibited sharing (friend to friend as well as over filesharing networks) and DRM together as Darknet. Such content is dark to the web. It can’t be linked to, or if it can be, the link will be to a name,** not a location, thus you may not be able to obtain the content (filesharing), or you won’t be able to view the content (DRM).

Lightnet content is light to the web. It can be linked to, retrieved, and viewed in the ways you expect (and by extension, searched for in the way you expect), no law breaking or bad law making required.

* Ross Mayfield called iTunes a lightnet back in 2003. Lucas includes iTunes on the dark side. I agree with Lucas’ categorization, though Ross had a good point, and in a slightly different way was contrasting iTunes with both darknets and hidebound content owners.

** Among other things, I like to think of magnet links and as attempting to bridge the gap between the web and otherwise shared content. Obviously that work is unfinished. As is making multimedia work on the web. I think that’s the last time I linked to Lucas Gonze, but he’s had plenty of crafty posts between then and now that I highly recommend following.

EFF15

Monday, August 1st, 2005

The Electronic Frontier Foundation is 15 and wants “to hear about your ‘click moment’–the very first step you took to stand up for your digital rights.”

I don’t remember. It mustn’t have been a figurative “click moment.” Probably not a literal “click moment” either–I doubt I used a mouse.

A frequent theme of other EFF15 posts seems to be “how I became a copyfighter” or “how I became a digital freedom activist.” I’ve done embarrassingly little (the occasional letter to a government officeholder, Sklyarov protests, the odd mailing list or blog post, running non-infringing P2P nodes, a more often lapsed than not EFF membership), but that’s the tack I’ll take here.

As a free speech absolutist I’ve always found the concept of “digital rights” superfluous. Though knowledge of computers may have helped me understand “the issues,” I needed none to oppose crypto export laws, the clipper chip, CDA, DMCA, perpetual copyright extension and the like. Still, I hold “digital rights,” for lack of a better term, near and dear. So how I became a copyfighter of sorts: four “click themes,” one with a “click moment.” All coalesced around 1988-1992, happily matching my college years, which otherwise were a complete waste of time.

First, earliest, and most important, I’d had an ear for “experimental” music since before college. At college I scheduled and skipped classes and missed sleep around WEFT’s schedule. Nothing was better than great music, and from my perspective, big record companies provided none of it. There was and is more mind-blowingly ecstatic music made for peanuts than I could hope to experience in many lifetimes. I didn’t have the terms just yet, but it was intuitively obvious that there was no public goods provisioning problem for art, at least not for anything I appreciated, while there was a massive oversupply of abominable anti-art.

Second, somewhere between reading libertarian tracts and studying economics, I hit upon the idea that “intellectual property” may be neither. Those are likely sources anyway–I don’t remember where I first came across the idea. I kept an eye out for confirmation and somewhere, also forgotten, I found a reference to Tom Palmer’s Intellectual Property: A Non-Posnerian Law and Economics Approach. Finding and reading the article, which describes intellectual property as a state-granted monopoly privilege developed through rent seeking by publishers and non-monopoly means of producing intangible goods, at my university’s law library was my “click moment.”

Third, I saw great promise in the nascent free software movement, and I wanted to run UNIX on my computer. I awaited 386BSD with bated breath and remember when Torvalds announced Linux on Usenet. I prematurely predicted world domination a few times, but regardless, free software was and is the most concrete, compelling and hopeful sign that large scale non-monopoly production of non-rivalrous goods is possible and good, and that the net facilitates such production, and that freedom on the net and free software together render each other more useful, important, and defensible.

Fourth, last, and least important, I followed the cypherpunks list for some time, where the ideas of crypto anarchy and BlackNet were developed. In the ten years or so since the net has not turned inside out nor overturned governments and corporations, yet we are very early in its history. Cypherpunk outcomes may remain vaporware indefinitely, but nonetheless are evocative of the transformational potential of the net. I do not know what ends will occur, but I’ll gladly place my bets on, and defend, the means of freedom and decentralization rather than control and protectionism.

The EFF has done an immense amount of great work over the past 15 years. You should join, and I will update my membership. However, my very favorite thing about the EFF is indirect–I’ve seen co-founder and board member John Gilmore at both drug war and DMCA protests. If you care about digital rights or any rights at all and do not understand the destruction of individuals, rights, and societies wreaked by the drug war, there’s no time like the present to learn–the first step needed in order to stand up for your rights.


H C

Wednesday, March 23rd, 2005

This music had every cell and fiber in my body on heavy sizzle mode.

Thurston Moore on mixtapes, could be describing me listening to early Sonic Youth or one of my many ecstasy-inducing 120 minute cassettes that I’m mostly afraid to touch, really need to digitize. Yes, Moore relates it all to MP3, P2P, etc., sounding like he’s from the EFF:

Once again, we’re being told that home taping (in the form of ripping and burning) is killing music. But it’s not: It simply exists as a nod to the true love and ego involved in sharing music with friends and lovers. Trying to control music sharing - by shutting down P2P sites or MP3 blogs or BitTorrent or whatever other technology comes along - is like trying to control an affair of the heart. Nothing will stop it.

[Via Lucas Gonze.]

I’d like little more right now than to have Sonic Youth or one of Moore’s many avant projects release some crack under a Creative Commons license. Had they already you could maybe find it via the just released Yahoo! Search for Creative Commons. (How’s that for a lame segue?)

Open Source P2P: No Malware, EULA

Wednesday, March 9th, 2005

Ben Edelman asks what P2P programs install what spyware and answers with a Comparison of Unwanted Software Installed by P2P Programs. Of the five programs analyzed, four (eDonkey, iMesh, Kazaa, and Morpheus) install malware or even more malware and come with voluminous End User License Agreements. LimeWire installs no additional software and has no EULA.

The comparison currently doesn’t note that only one of the five programs is open source: LimeWire. Note that LimeWire, like the others, is produced by a company that pays developers, so being commercial is no excuse for the others.

What about other open source P2P applications? I installed the current versions of BitTorrent, eMule, Phex, and Shareaza. No bundled software. BitTorrent has no installation interface to speak of, and no EULA. The others ask the user to agree to the GNU General Public License, which concerns freedoms associated with the program source code, not obtaining permission for the program to do whatever it wants with the user’s computer and data.

Each of the open source programs (excepting BitTorrent, which is a different kind of P2P app) has the same features as the proprietary P2P apps listed above. All of the open source programs lack the spyware anti-features of their proprietary equivalents.

Notice a trend?

If you want to keep control of your computer and your data, stick to open source. The threat is very real. I’ve seen friends’ computers (particularly those used by teenagers) with proprietary P2P programs that had dozens of distinct malware programs installed and were completely unusable (browsing porn sites with Internet Exploder, which teens are apparently really keen on doing, doesn’t help either; get Firefox already).

[Via Boing Boing.]

Bitcollider-PHP

Saturday, March 5th, 2005

Here you’ll find a little PHP API that wraps the single file metadata extraction feature of Bitzi’s bitcollider tool. Bitcollider also can submit file metadata to Bitzi. This PHP API doesn’t provide access to the submission feature.

Other possibly useful code provided with Bitcollider-PHP:

  • function hex_to_base_32($hex) converts hexadecimal input to Base32.
  • function magnetlink($sha1, $filename, $fileurl, $treetiger, $kzhash) returns a MAGNET link for the provided input.
  • magnetlink.php [file ...] is a command line utility that outputs MAGNET links for the files specified, using the bitcollider if available (if not, kzhash and tree:tiger are not included in MAGNET links).
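For readers who want to see what the two helper functions roughly do, here is a Python approximation. It is not the PHP code itself: the function names are Python-style stand-ins, the Tiger tree hash is assumed to arrive already Base32-encoded, and the exact parameter layout of the resulting link (`xt`, `dn`, `xs`) is an assumption based on common MAGNET conventions rather than on Bitcollider-PHP’s actual output.

```python
import base64
from urllib.parse import quote


def hex_to_base32(hex_digest):
    """Convert a hexadecimal digest to Base32 (RFC 4648 alphabet)."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")


def magnet_link(sha1_hex, filename=None, file_url=None, tree_tiger=None, kzhash=None):
    """Build a MAGNET link from a hex SHA-1 and optional extras.

    Assumed layout: urn:bitprint when a Tiger tree hash is given, urn:sha1
    otherwise; dn carries the file name and xs a source URL.
    """
    sha1_b32 = hex_to_base32(sha1_hex)
    if tree_tiger:
        parts = ["xt=urn:bitprint:%s.%s" % (sha1_b32, tree_tiger)]
    else:
        parts = ["xt=urn:sha1:%s" % sha1_b32]
    if kzhash:
        parts.append("xt=urn:kzhash:%s" % kzhash)
    if filename:
        parts.append("dn=" + quote(filename, safe=""))
    if file_url:
        parts.append("xs=" + quote(file_url, safe=""))
    return "magnet:?" + "&".join(parts)
```

A link built this way names the content by its hash rather than by where it lives, which is what lets any copy of the file satisfy it.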

Versions of this code are deployed on a few sites in service of producing MAGNET links or urn:sha1: identifiers for RDF along these lines, both in the case of CC Mixter.

Criticism welcome.

CodeCon Sunday

Tuesday, February 15th, 2005

I say CodeCon was 3/4 (one abstention) on Sunday.

Wheat. An environment (including a language) for developing web applications. Objects are arranged in a tree with some filesystem-like semantics. Every object has a URL (not necessarily in a public portion of the tree). Wheat’s web object publishing model and templating seem clearly reminiscent of Zope. In response to the first of several mostly redundant questions regarding Wheat and Zope, Mark Lentczner said that he used Zope a few years ago and was discouraged by the need to use external scripts and the lack of model-view separation in templates (I suspect Mark used DTML — Wheat’s TinyTemplates reminded me of DTML’s replacement, Zope Page Templates, currently my favorite and implemented in several languages). I’m not sure Wheat is an environment I’d like to develop in, but I suspect the world might learn something from pure implementations of URL-object identity (not just mapping) and a web domain specific language/environment (I understand that Wheat has no non-web interface). Much of the talk used these slides.

Incoherence. I find it hard to believe that nobody has done exactly this audio visualization method before (x = left/right, y = frequency, point intensity and size = volume), but as an audio-ignoramus I’ll take the Incoherence team’s word. I second Wes Felter’s take: “I learned more about stereo during that talk than in the rest of my life.”

i-Brokers. This is where XNS landed and where it might go. However, the presentation barely mentioned technology and left far more questions than answers. There was talk of Zooko’s Triangle (“Names: Decentralized, Secure, Human-Memorizable: Choose Two”). 2idi and idcommons seem to have chosen the last two, temporarily. It isn’t clear to me why they brought it up, as i-names will be semi-decentralized (like DNS). In theory i-names provide privacy (you provide only your i-name to an i-name enabled site, always logging in via your i-broker, and access to your data is provided through your i-broker — never enter your password or credit card anywhere else — you set the policies for who can access your data) and persistence (keep an i-name for life, and i-names may be transparently aliased or gatewayed should you obtain others). These benefits, if they exist in the future, are subtler than the claims. Having sites access your data via a broker rather than via you typing it in does little to protect your privacy by itself. You make a decision in both cases whether you want a site to have your credit card number. Once the site has your credit card… Possibly over the long term if lots of people and sites adopt i-names sites will collect or keep less personal information. Users, via their i-brokers, may be on more equal terms with sites, as i-broker access will presumably be governed by some you-have-no-rights-at-all terms of service. Some sites may decide (for new applications) they don’t want to have to worry about the security of customer information and access the same via customers’ i-names. However, once a user has provided their i-broker with lots of personal information, it becomes easy for sites to ask for it all. Persistence is also behavioral. Domain names and URLs can last a long time; good ones don’t change. Similarly an i-name will go away if the owner stops paying for it.
Can the i-name ecology be structured so that i-names tend to be longer lived than domain names or URLs? Probably, but that’s a different story. In the short term 2idi is attempting to get adoption in the convention registration market. Good luck, but I wish Fen and Victor had spent their time talking about XRI resolution or other code behind the 2idi broker.

SciTools. A collection of free to use web applications for genetic design and analysis. Integrated DNA Technologies, the company that offers SciTools, makes its money selling (physical) synthesized nucleic acids. I was a cold, tired, bio-ignoramus, so have little clue whether this is novel. (Ted Leung seems to think so and also has interesting things to say about the other presentations.)

OzymanDNS. DNS can route and move data, is deployed everywhere and is not filtered, so with a little cleverness we can tunnel arbitrary streams over DNS. Dan Kaminsky is clearly the crowd pleaser, both for his showmanship and for the audacity of his hacks (streaming anime over DNS this time). More than a few in the crowd wanted to put DNS hacks to work, e.g., on aspects of supposed syndication problems. PPT slides of an older version of the talk.
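
To make the tunneling idea concrete, here is a rough sketch of the upstream half: packing arbitrary bytes into the labels of a DNS name that a resolver will happily forward. This is not OzymanDNS's actual wire format, which I haven't examined; the function names, the Base32 choice and the tunnel domain are all assumptions for illustration. The only hard facts used are the DNS limits of 63 octets per label and 253 characters per name.

```python
import base64

MAX_LABEL = 63   # DNS limit on a single label
MAX_NAME = 253   # DNS limit on a full name in text form


def encode_chunk(payload: bytes, domain: str) -> str:
    """Pack payload bytes into Base32 labels under a (hypothetical) tunnel domain."""
    text = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    name = ".".join(labels + [domain])
    if len(name) > MAX_NAME:
        raise ValueError("payload too large for one DNS name")
    return name


def decode_chunk(name: str, domain: str) -> bytes:
    """Recover the payload from a name produced by encode_chunk."""
    suffix = "." + domain
    if not name.endswith(suffix):
        raise ValueError("name is not under the tunnel domain")
    text = name[: -len(suffix)].replace(".", "").upper()
    text += "=" * (-len(text) % 8)   # restore the padding stripped by the encoder
    return base64.b32decode(text)


if __name__ == "__main__":
    query = encode_chunk(b"hello", "t.example.com")
    print(query)
    print(decode_chunk(query, "t.example.com"))
```

The downstream direction (answers carrying data back, e.g., in TXT records) and any sequencing or retransmission are left out entirely.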

Yesterday.

CodeCon Saturday

Sunday, February 13th, 2005

CodeCon is 5/5 today.

The Ultra Gleeper. A personal web page recommendation system. Promise of collaborative filtering unfulfilled, in dark ages since Firefly was acquired and shut down in the mid-90s. Presenter believes we’re about to experience a renaissance in recommendation systems, citing Audioscrobbler recommendations (I would link to mine, but personal recommendations seem to have disappeared since last time I looked; my audioscrobbler page) as a useful example (I have found no automated music recommendation system useful) and blogs as a use case for recommendations (I have far too much very high quality manually discovered reading material, including blogs, to desire automated recommendations for more and I don’t see collaborative filtering as a useful means of prioritizing my lists). The Ultra Gleeper crawls pages you link to, treating links as positive ratings, and pages that link to you (via Technorati CosmosQuery and Google API), and presents suggested pages to rate in a web interface. Uses a number of tricks to avoid showing obvious recommendations (does not recommend pages that are too popular) and pages you’ve already seen (including those linked to in feeds you subscribe to). Some problems faced by typical recommendation systems (new users get crummy recommendations until they enter lots of data, early adopters get doubly crummy recommendations due to lack of existing data to correlate with) obviated by bootstrapping from data in your posts and subscriptions. I suppose if lots of people run something like Gleeper robot traffic increases, more people complain about syndication bandwidth-like problems (I’m skeptical about this being a major problem). I don’t see lots of people running Gleepers as automated recommendation systems are still fairly useless and will remain so for a long time. Interesting software and presentation nonetheless.
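
The filtering tricks described above are easy to sketch, even if the real scoring is surely more involved. The following is my own toy reading of the talk, not the Gleeper's code: every name, the popularity cutoff and the vote-counting score are assumptions. Pages you link to count as liked, pages they link to become candidates, and anything already seen or too popular is dropped.

```python
from collections import Counter


def recommend(liked_pages, outlinks, seen, inbound_counts,
              max_inbound=1000, top_n=5):
    """Rank pages linked from liked pages, skipping seen and too-popular ones.

    liked_pages:    set of URLs treated as positive ratings
    outlinks:       dict mapping a URL to the URLs it links to
    seen:           set of URLs the user has already encountered
    inbound_counts: dict mapping a URL to a rough inbound link count
    """
    votes = Counter()
    for page in liked_pages:
        for target in set(outlinks.get(page, ())):
            votes[target] += 1

    candidates = [
        (count, url) for url, count in votes.items()
        if url not in seen
        and url not in liked_pages
        and inbound_counts.get(url, 0) <= max_inbound   # skip the obvious
    ]
    # Most votes first; ties broken alphabetically so results are stable.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [url for _, url in candidates[:top_n]]
```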

H2O. Primarily a discussion system tuned to facilitate professor-assigned discussions. Posts may be embargoed and professor may assign course participants specific messages or other participants to respond to. Discussions may include participants from multiple courses, e.g., to facilitate a MIT engineering-Harvard law exchange. Anyone may register at H2O and create their own group, acting as professor for the created group. Some of the constraints that may be imposed by H2O are often raised in mailing list meta discussions following flame wars, in particular posting delays. I dislike web forums but may have to try H2O out. Another aspect of H2O is syllabus management and sharing, which is interesting largely because syllabi are typically well hidden. Professors in the same school of the same university may not be aware of what each other are teaching.

Jakarta Feedparser. Kevin Burton gave a good overview of syndication and related standards and the many challenges of dealing with feeds in the wild, which are broken in every conceivable way. Claims SAX (event) based Jakarta FeedParser is an order of magnitude faster than DOM (tree) based parsers. Nothing new to me, but very useful code.
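
For anyone unfamiliar with the SAX/DOM distinction: an event-based parser hands you elements as they stream past and never builds the whole tree in memory. Jakarta FeedParser is Java; the sketch below uses Python's standard xml.sax module only to show the style, with a handler class and element names of my own choosing for a bare-bones RSS-like feed.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collect the <title> of each <item> as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_item = False
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
        elif name == "title" and self._in_item:
            self._in_title = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several pieces, so buffer it.
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title" and self._in_title:
            self.titles.append("".join(self._buf).strip())
            self._in_title = False
        elif name == "item":
            self._in_item = False


def item_titles(feed_xml: bytes):
    """Return item titles from a feed without building a document tree."""
    handler = TitleHandler()
    xml.sax.parseString(feed_xml, handler)
    return handler.titles
```

Note that this strict parser will choke on the broken feeds Burton described; coping with those is exactly the hard part his library takes on.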

MAPPR. Uses Flickr tags, GNS to divine geographic location of photos. REST web services modeled on Flickr’s own. Flash front end, which you could spend many hours playing with.

Photospace. Personal image annotation and search service, focus on geolocation. Functionality available as library, web front end provided. Photospace publishes RDF which may be consumed by RDFMapper.

Note above two personal web applications that crawl or use services of other sites (The Ultra Gleeper is the stronger example of this). I bet we’ll see many more of increasing sophistication enabled by ready and easily deployable software infrastructure like Jakarta FeedParser, Lucene, SQLite and many others. A personal social networking application is an obvious candidate. Add in user hosted or controlled authentication (e.g., LID, perhaps idcommons) …

Yesterday.

CodeCon Friday

Saturday, February 12th, 2005

CodeCon requires presenters to be active developers of the projects presented and projects must have demonstrably running code. There’s an emphasis on open source and decentralization. This generally makes for interesting presentations. Today was 4/5.

Aura. Case study in how not to give a CodeCon presentation. Talk for a long time about motivations for and very high level problems of reputation systems, which all attendees are surely familiar with. Give almost no specifics about Aura, apparently a peer-to-peer reputation system, including nothing on what differentiates it from other work nor on how or why I’d use it in my own code. Demo stumbles due to display problems, fails due to ill prepared data. One mostly irrelevant bit about Aura’s implementation: it uses SQLite, an embedded, zero configuration, endian neutral SQL database that many projects have started to use recently and tons more will in the near future. I’m certain that SQLite is in my future.

ArX. Very useful presentation on Walter Landry’s ArX, which began as a fork of the GNU Arch distributed revision control system (both are pronounced ‘arc’). Lists good, bad and ugly of active open source, distributed revision control systems (I agree that any system that does not have those attributes is strictly non-interesting), including GNU Arch/tla, ArX, monotone (also uses SQLite), Darcs, svk, and Codeville. I’ve tried tla a few times but have gotten hung up on what seems to me like unnecessary complexity and strange conventions. I’d pretty much settled on using Darcs going forward, but now I’m a little concerned by its reordering of patches in order to solve merge conflicts, which apparently can be very slow and may make the repository’s view of its state at a point in time inaccurate. Not sure whether this is pragmatic, evil, or both, nor am I sure I understand it. See also Zooko’s notes (Darcs row, decentralization column).

Apache CA. A certification authority motivated by the needs of the Apache Software Foundation, which has around 900 developers with commit access working on around 100 projects. Program managers can add committers, but small admin team needs to create shell accounts, add to various text files, creating bottleneck. Solution: all services (most importantly source control — migrate to subversion) eventually use SSL, check for permission based on group membership noted in personal certificates and managed via email by program managers. Sounds like a long term project. “Open CA” feature is an interesting extension — allows anyone who can sign an email with GPG to create groups in the form of user@example.com/groupname. Not sure what the ASF motivation is for Open CA, but I’m sure interesting applications can be built on it.

Off-the-Record Messaging. Messaging using the PGP model (sign with sender’s private key, encrypt with recipient’s public key) can be attacked: the “bad guys” can intercept and store your messages. In the future they can break into your computer, obtain your private key, decrypt your messages and prove that you are the author. Very briefly, OTR obtains “perfect forward secrecy” through the use of short lived encryption keys and “refutable authentication” using shared MAC keys — compromise of your long term keys doesn’t allow your messages to be decrypted, and it can’t be proved that you wrote your messages. A toolkit for forging transcripts is even provided to enhance deniability. Details here. This presentation seems to match the one given at CodeCon. They have a GAIM plugin, which I’m now running, and a standalone proxy for other AIM clients. Cool stuff.
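
The “refutable authentication” half is the less intuitive one, so here is a tiny illustration of why a shared MAC key authenticates without proving authorship. This is not the OTR protocol: real OTR negotiates its keys through a key exchange and does considerably more; the function names and the choice of HMAC-SHA256 here are my assumptions.

```python
import hashlib
import hmac


def tag(mac_key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag that anyone holding mac_key can produce."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify(mac_key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(tag(mac_key, message), received_tag)


if __name__ == "__main__":
    shared_key = b"k" * 32   # known to both Alice and Bob

    # Alice sends a message; Bob is convinced it came from Alice,
    # because only the two of them hold the key and he didn't write it.
    msg = b"meet at noon"
    assert verify(shared_key, msg, tag(shared_key, msg))

    # But Bob can produce an equally valid tag on anything, so a
    # transcript proves nothing about authorship to a third party.
    forged = b"I never said this"
    assert verify(shared_key, forged, tag(shared_key, forged))
```

Contrast with a PGP signature, which only the holder of the private key could have made and which therefore convinces third parties too.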

RPOW. Reusable Proofs of Work is a system for sequential reuse of hashcash mediated by a server written by the great signal-to-noise enhancer Hal Finney. RPOW has many potential uses — apparently initially motivated by a desire to implement “P2Poker” with interesting “chips” and currently being experimented with in a modified BitTorrent client in which downloaders can pay for priority with RPOW tokens, possibly encouraging people to leave clients running after completing a download (serving as seeds in BT lingo) in order to earn tokens which may be spent on future downloads. As the BTRP page notes, people could acquire RPOWs out of band, and not contribute more upload bandwidth, or even contribute less. The net effect is hard to predict. If buying download priority with RPOWs proves useful, I expect non-BT filesharing clients, which have far less reason to cooperate, would benefit more than BT clients. Perhaps the most interesting thing about the RPOW system is its great effort to ensure that there can be no cheating, in particular by the server operator. The RPOW server will zero all data if it is physically tampered with, it is possible for anyone to verify the code it is running, and that code can verify that its database in its untrusted host has not been tampered with, using a Merkle hash tree to verify (the secure board only has two megabytes of memory). The RPOW server may be the world’s first transparent server, which could facilitate a world of distributed, cooperating RPOW servers. Presentation slides.
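
The Merkle tree trick is what lets a tiny trusted memory vouch for a large untrusted database: the secure side keeps only the root hash, and the host must supply a short proof with every record it hands back. Below is a generic sketch of that technique, not RPOW's actual layout; the hash function, the prefix bytes and the duplicate-last-node padding rule are all my assumptions.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _leaf_level(records):
    if not records:
        raise ValueError("need at least one record")
    # Distinct prefixes keep leaf hashes from being confused with interior ones.
    return [_hash(b"\x00" + r) for r in records]


def _parent_level(level):
    if len(level) % 2:
        level = level + [level[-1]]   # pad odd levels by repeating the last node
    return [_hash(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)]


def merkle_root(records):
    """The one value the trusted side has to remember."""
    level = _leaf_level(records)
    while len(level) > 1:
        level = _parent_level(level)
    return level[0]


def merkle_proof(records, index):
    """Built by the untrusted host: sibling hashes from leaf up to the root."""
    level = _leaf_level(records)
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _parent_level(level)
        index //= 2
    return proof


def verify_proof(record, proof, root):
    """Run by the trusted side using only the record, the proof and the root."""
    node = _hash(b"\x00" + record)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _hash(b"\x01" + pair)
    return node == root
```

The proof grows with the logarithm of the number of records, which is how a board with two megabytes can check a database of arbitrary size.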

Saturday.

Shallow thinking about filesharing

Monday, February 7th, 2005

Tyler Cowen “cannot accept the radical anti-copyright position” and so proffers apologia for the radical intellectual protectionist position. (NB no anti-copyright position is being argued in MGM v. Grokster.) Regarding Cowen’s three arguments:

1. In ten year’s time, what will happen to the DVD and pay-for-view trades? BitTorrent allows people to download movies very quickly.

BitTorrent downloads tend to be faster than those on typical file sharing networks but are still very slow. Netflix is a far superior option unless you place a very low value on your time. In addition to waiting many hours in the case of BitTorrent to weeks in the case of eDonkey for a download to complete, you also need to spend time finding active torrents or hash links and dealing with low quality, mislabeled and overdubbed copies, which often means starting over, even after you’ve learned how to deal with all of these. I pity the computer semi-literate who just wants to snag some “free” movies.

Note that DVDs already account for more than half of Hollywood domestic revenue. Furthermore the process will be eased when TVs and computers can “talk” to each other more readily. Yes, I am familiar with Koleman Strumpf’s excellent work showing that illegal file-sharing has not hurt music sales. But a song download can be a loss leader for an entire CD or a concert tour. Downloading an entire movie does not prompt a person to spend money in comparable fashion.

Radical protectionists made similar arguments about the VCR, as have those in countless businesses faced with new technology. In the case of the VCR, entrepreneurs figured out how to use the new technology to make billions. Similarly, it should be up to entrepreneurs to figure out how to thrive in the environment of ubiquitous networking, rather than up to lawmakers to ensure existing businesses survive technological change.

2. Perhaps we can make file-sharing services identify (and block) illegally traded files. After all, the listeners can find the illegal files and verify they have what they wanted. Grokster, sooner or later, will be able to do the same. Yes, fully decentralized and “foreign rogue” systems may proliferate, and any identification system will be imperfect. But this is one way to heed legitimate copyright suits without passing the notorious “Induce Act.”

Fully decentralized filesharing systems have proliferated. LimeWire is #2 at download.com and several other decentralized filesharing clients make the top 50 downloads list.

The imperfections of an identification and blocking system will include invasion of privacy and censorship.

3. I question the almost universal disdain for the “Micky Mouse” copyright extension act. OK, lengthening the copyright extension does not provide much in the way of favorable incentives. Who innovates with the expectation of reaping copyright revenues seventy-five years from now? But this is a corporate rather than an individual issue. Furthermore economic research indicates that current cash flow is a very good predictor of investment. So the revenue in fact stimulates additional investment in creative outputs. If I had my finger on the button, I still would have pushed “no” on the Mickey Mouse extension, if only because of the rule of law. Privileges of this kind should not be extended repeatedly due to special interest pressures. But we are fooling ourselves if we deny that the extension will benefit artistic output, at least in the United States.

The paper Cowen links to above (Cash Flow and Outcomes: How the Availability of Cash Impacts the Likelihood of Investing Wisely) is hardly encouraging regarding the efficacy of additional investments correlated with increased cash flow.

Eric Rescorla points out that subsidizing organizations that happen to hold copyright to work created 70 years ago is hardly the best way to subsidize new content creation, should one wish to do that.

Deployment Matters

Thursday, December 30th, 2004

Most popular descriptions of why BitTorrent works so well are off the mark, The BitTorrent Effect in the current Wired Magazine included. Excerpts:

The problem with P2P file-sharing networks like Kazaa, he reasoned, is that uploading and downloading do not happen at equal speeds. Broadband providers allow their users to download at superfast rates, but let them upload only very slowly, creating a bottleneck: If two peers try to swap a compressed copy of Meet the Fokkers - say, 700 megs - the recipient will receive at a speedy 1.5 megs a second, but the sender will be uploading at maybe one-tenth of that rate.

Paradoxically, BitTorrent’s architecture means that the more popular the file is the faster it downloads - because more people are pitching in. Better yet, it’s a virtuous cycle. Users download and share at the same time; as soon as someone receives even a single piece of Fokkers, his computer immediately begins offering it to others. The more files you’re willing to share, the faster any individual torrent downloads to your computer. This prevents people from leeching, a classic P2P problem in which too many people download files and refuse to upload, creating a drain on the system. “Give and ye shall receive” became Cohen’s motto, which he printed on T-shirts and sold to supporters.

Sites like Kazaa and Morpheus are slow because they suffer from supply bottlenecks. Even if many users on the network have the same file, swapping is restricted to one uploader and downloader at a time.

Most home and many business broadband connections are asymmetric — the downstream pipe is much fatter than the upstream pipe. That’s a problem any net application that requires significant upstream bandwidth has to contend with. There is no protocol solution. A BitTorrent client can’t upload any faster than a Gnutella client.
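
Some back-of-the-envelope arithmetic makes the point. I'm reading Wired's “1.5 megs a second” as 1.5 megabits per second, which is my assumption (the article doesn't say), and the function names are mine. Whatever the protocol, the bits downloaded across a swarm have to be uploaded by someone.

```python
def transfer_hours(size_megabytes: float, rate_megabits_per_s: float) -> float:
    """Hours needed to move a file at a steady rate."""
    return size_megabytes * 8 / rate_megabits_per_s / 3600


def average_download_rate(upload_rates, downloaders: int) -> float:
    """Best-case average download rate when all upload capacity is shared evenly."""
    return sum(upload_rates) / downloaders


if __name__ == "__main__":
    # The 700 MB file from the Wired excerpt.
    print(round(transfer_hours(700, 1.5), 2))    # limited by the downstream pipe
    print(round(transfer_hours(700, 0.15), 2))   # limited by one peer's upstream pipe

    # Ten peers, each uploading at 0.15 Mbit/s, all downloading the same file:
    # on average nobody can receive faster than 0.15 Mbit/s, BitTorrent or not.
    print(average_download_rate([0.15] * 10, 10))
```

Seeds (peers that upload without downloading) raise the average, which is a matter of who stays connected rather than of protocol.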

Kazaa, eDonkey and various Gnutella clients (e.g., LimeWire) have incorporated multisource/swarming downloads for three years, and the latter two also use partial file sharing (I’m not sure about Kazaa and PFS). These two key features — download from multiple peers, and begin uploading parts of a file before you’ve completed downloading — don’t set BitTorrent apart, though it may be slightly ahead in pushing the state of the art (I haven’t examined the protocols side by side).

So why does BitTorrent work so well? Deployment.

Gnutella et al users start a client and typically download and share files in one or more local directories. All files a user has collected are shared simultaneously. A client connects to random peers and accepts random queries and requests to download any files the client is sharing. (There are significant refinements, e.g., ultrapeers and supernodes.) Downloads will be spread across a huge number of files and peers downloading the same file won’t necessarily know about each other (and thus won’t be able to upload to each other while downloading). Again, there are significant refinements — Gnutella peers maintain a list of other peers sharing a given file, known as an alternate location download mesh.

For BitTorrent such refinements are superfluous. A BitTorrent user finds a “torrent” for a file the user wants to download, a BitTorrent client is launched and connects to a tracker specified by the torrent. All clients connecting to a tracker via a single torrent are by definition all downloading the same file, and they know about each other — the ideal situation for swarming distribution. And here’s the key to BitTorrent’s success in spite of typically limited upload rates: Because users are sharing only one or a few files — the one(s) they’re downloading — their precious upstream bandwidth is used to enhance a virtuous cycle in which everyone downloading the same file downloads faster.
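
The tracker's role is almost embarrassingly simple, which is rather the point. A toy version, with names of my own invention (real trackers speak HTTP and return bencoded peer lists, none of which is modeled here):

```python
from collections import defaultdict


class ToyTracker:
    """Introduce peers to each other, one swarm per torrent."""

    def __init__(self):
        # Maps a torrent's identifier to the set of peers working on it.
        self._swarms = defaultdict(set)

    def announce(self, info_hash: str, peer: str):
        """Register a peer and return everyone else in the same swarm."""
        others = sorted(self._swarms[info_hash] - {peer})
        self._swarms[info_hash].add(peer)
        return others


if __name__ == "__main__":
    tracker = ToyTracker()
    tracker.announce("fokkers", "alice")
    tracker.announce("fokkers", "bob")
    # Carol learns about exactly the peers who want the same file.
    print(tracker.announce("fokkers", "carol"))
```

Everyone a peer hears about is by construction interested in the same file, with no searching or mesh maintenance required.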

This ideal situation also allows BitTorrent to utilize tit-for-tat (PDF) leech resistance — “Give and ye shall receive” above. A typical filesharing client can’t effectively use this strategy as it is unlikely to have multiple interactions with the same peer, let alone simultaneous mutually beneficial interactions.
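
A rough sketch of the tit-for-tat idea: periodically upload only to the peers who have recently sent you the most, plus one random extra so that newcomers with nothing to offer yet get a chance. The slot count, the names and the bookkeeping are my assumptions and are not lifted from any BitTorrent client.

```python
import random


def choose_unchoked(received_bytes, slots=3, optimistic=True, rng=random):
    """Pick the peers to upload to in the next round.

    received_bytes: dict mapping peer id to bytes recently received from it
    slots:          number of peers rewarded for uploading to us
    optimistic:     also pick one other peer at random
    """
    # Best recent uploaders first; ties broken by peer id for stable results.
    ranked = sorted(received_bytes, key=lambda p: (-received_bytes[p], p))
    unchoked = ranked[:slots]
    rest = ranked[slots:]
    if optimistic and rest:
        unchoked.append(rng.choice(rest))
    return unchoked


if __name__ == "__main__":
    recent = {"a": 10, "b": 50, "c": 0, "d": 30}
    print(choose_unchoked(recent, slots=2, optimistic=False))
```

This only works because the same peers keep meeting each other for the duration of a download, which is the situation the tracker sets up and a general-purpose filesharing network does not.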

There are technologies (possibly Distributed Hash Tables), tweaks (a client giving preference to uploads of files the client has recently downloaded has been proposed) and practices (encourage filesharing client users to initiate downloads from editorially controlled lists of files rather than via ad hoc peer searches) that can be implemented in typical filesharing clients to make the average user’s download experience better, perhaps someday approaching the average BitTorrent user’s experience when downloading a popular file.

There’s also lots of talk about decentralizing BitTorrent. See eXeem, supposedly to be released in a few weeks. It appears that eXeem will attempt to keep BitTorrent’s beneficial characteristics by limiting files shared to those a user has obtained or created a .torrent file for — perhaps similar to a hypothetical Gnutella client that only shared files for which it had alternate sources. I don’t have high hopes for the decentralized bits of eXeem, whatever they turn out to be. It may serve as a decent standard BitTorrent client, but there’s no need for another of those, and eXeem will supposedly be riddled with spyware.

Lexus, Mercedes, Porsche

Wednesday, December 29th, 2004

Tyler Cowen cites a Harper’s Index factoid:

Number of American five-year-olds named Lexus: 353

One of them works at Raisins, featured in the first South Park episode I ever watched and still my sentimental favorite. Every kid should watch this episode. If it is available on DVD I can’t find it, but search for “South Park 714” or “South Park Raisins” on any filesharing network — South Park episodes are among the most shared content.

Also see Christian Hard Rock, which tackles filesharing. Almost every episode is well worth watching for kids and adults. Skip the movie, it sucks ass.