Post Programming

Emergent Robustness in a Walnut

Sunday, April 16th, 2006

Mark Miller just published his dissertation: Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control, informed by his years of work on the capability-secure E programming language. Great stuff, very relevant to the future of highly distributed, concurrent and secure computing, i.e., the future of computing, and pretty readable too — I blinked and momentarily misread the heading “Reference Graph Dynamics” (numbered page 66) as “Reference Graphs for Dummies.” I’ve only skimmed the document, but Part III, Concurrency Control, looks the most interesting and hardest, while Part IV, Emergent Robustness, should be accessible and thought provoking to anyone with marginal technical literacy.

Marc Stiegler also very recently announced a draft of Emily in a Walnut, a gentle introduction for imperative programmers to a secure variant of OCaml. Using Objective Caml for something interesting has been somewhere down my list for several years and will probably remain there for several more.

LimeWire Filtering & Blog

Wednesday, March 29th, 2006

Just noticed that the current beta (4.11.0) includes optional copyright filtering. See the features history and brief descriptions for users and copyright owners:

In the Filtering System, copyright owners identify files that they don’t want shared and submit them for inclusion in a public list. LimeWire then consults this list and stops users from downloading the identified files, “filtering” them from the sharing process.

If you sign up for an account as a copyright owner you can submit files (with file name, file size, SHA1 hash, creator, collection, description) for filtering. Users can turn the filter on and off via a preference.
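Those fields are all cheap to compute locally; here is a rough Python sketch (the function and dict layout are mine, purely illustrative, not LimeWire’s actual submission format):

```python
import hashlib
import os

def filter_entry(path, creator="", collection="", description=""):
    """Gather the fields a copyright-owner submission needs for one file.

    The field list (name, size, SHA1, creator, collection, description)
    comes from LimeWire's description; everything else here is illustrative.
    """
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Hash in chunks so large media files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
    return {
        "file_name": os.path.basename(path),
        "file_size": os.path.getsize(path),
        "sha1": sha1.hexdigest(),
        "creator": creator,
        "collection": collection,
        "description": description,
    }
```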

LimeWire.org now features a blog with pretty random content. I notice that another PHP Base32 function (which makes a whole lot more sense than the one included in Bitcollider-PHP — I swear PHP’s bitwise operators weren’t giving correct results and I worked around that, but I was probably insane) is available with a hint that someone is building an “open source Gnutella Server in PHP5.”
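For reference, Base32 is nothing but bit shifting and masking, which is presumably where PHP’s operators and I fell out; a minimal Python encoder (RFC 3548 alphabet, unpadded, as in Gnutella’s urn:sha1: identifiers):

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def b32encode(data):
    """Encode bytes as RFC 3548 Base32 without padding.

    Accumulate bits in an integer buffer and emit five at a time -- exactly
    the sort of shifting and masking a PHP port has to get right.
    """
    out, buf, bits = [], 0, 0
    for byte in data:
        buf = (buf << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append(ALPHABET[(buf >> bits) & 0x1F])
    if bits:  # left-align whatever bits remain into a final symbol
        out.append(ALPHABET[(buf << (5 - bits)) & 0x1F])
    return "".join(out)

# Sanity check against the standard library.
assert b32encode(b"foobar") == base64.b32encode(b"foobar").decode().rstrip("=")
```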

Remember that LimeWire is Open Source P2P and thus pretty trustworthy — and you can always fork.

content.exe is evil

Thursday, February 16th, 2006

I occasionally run into people who think users should download content (e.g., music or video) packaged in an executable file, usually for the purpose of wrapping the content with DRM where the content format does not directly support DRM (or the proponent’s particular DRM scheme). Never mind the general badness of Digital Restrictions Management; requiring users to run a new executable for each content file is evil.

Most importantly, every executable is a potential malware vector. There is no good excuse for exposing users to this risk. Even if your executable content contains no malware and your servers are absolutely impenetrable, such that your content can never be replaced with malware, you are still teaching users to download and run executables. Bad, bad, bad!

Another problem is that executables are usually platform-specific and buggy. Users have enough problems just getting the correct codec installed. Why take a chance that they might not run Windows (and the specific versions and configurations you have tested, which are sure not to exist in a decade or much less)?

I wouldn’t bother to mention this elementary topic at all, but very recently I ran into someone well intentioned who wants users to download content wrapped in a jar file, if I understand correctly for the purposes of ensuring users can obtain content metadata (most media players do a poor job of exposing content metadata and some file formats do a poor job of supporting embedded metadata, though hardly anyone cares — this is tilting at windmills) and so that content publishers can track use (this is highly questionable), all from a pretty cross-platform GUI. A jar file is an executable Java package, so the platform downside is different (Windows is not required, but a Java installation, of some range of versions and configurations, is), but it is still an executable that can do whatever it wants with the computer it is running on. Bad, bad, bad!

The proponent of this scheme said that it was ok, the jar file could be signed. This is no help at all. Anyone can create a certificate and sign jar files. Even if a creator did have to have their certificate signed by an established authority it would be of little help, as malware purveyors have plenty of resources that certificate authorities are happy to take. And the downsides are many: users get a security prompt (“this content signed by…”) for mere content, which is annoying and misleading as described above, conditions the user to not pay attention when they install things that really do need to be executable, and raises a barrier for small content producers.

If you really want to package arbitrary file formats with metadata, put everything in a zip file and include your UI in the zip as HTML. This is exactly what one P2P vendor’s Packaged Media File format is. You could also make your program (which users download only once) look for specific files within the zip to build a content-specific (and safe) interface within your program. I believe this describes another vendor’s Kapsules, though I can’t find any technical information.
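A minimal sketch of the zip-plus-HTML idea in Python; the layout and the metadata.json name are my inventions, not the Packaged Media File spec:

```python
import json
import zipfile

def package_content(output_path, content_files, metadata):
    """Bundle media files, a metadata file, and an HTML 'UI' into a zip.

    Nothing in the package is executable; a browser or any unzip tool can
    open index.html directly.
    """
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("metadata.json", json.dumps(metadata, indent=2))
        links = "".join(
            '<li><a href="{0}">{0}</a></li>'.format(name) for name in content_files
        )
        z.writestr(
            "index.html",
            "<html><body><h1>{0}</h1><ul>{1}</ul></body></html>".format(
                metadata.get("title", ""), links
            ),
        )
        for name in content_files:
            z.write(name)  # add the actual media files unmodified
```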

Better yet, put your content on the web, where users can find and view it (in the web design of your choice), you get reasonable statistics, and the malware purveyors don’t get fed. You can even push this to 81/19 by including minimal but accurate embedded metadata in your files if they support it — a name users can search for or a URL for your page related to the content.

Most of the pushers of executable content I encounter, when faced with security concerns, say it is an “interesting and hard problem.” No, it is a stupid and impossible problem. In contrast to the web, executable content is a 5/95/-1000 solution — that last number is a negative externality.

If you really want an interesting and hard problem, executable content security is the wrong level. Go work on platform security. We can now run sophisticated applications within a web browser with some degree of safety (due to Java applet and Flash sandboxes and JavaScript security). Something similar could be pushed down to the desktop, so that executables by default have no more rights to tamper with your system than web pages do. Capability security is an aggressive approach to this problem. If that sounds too hard and not interesting enough (you really wanted to distribute “media”), go the web way as above — it is subsuming the desktop anyhow.

CodeCon Extra

Monday, February 13th, 2006

A few things I heard about at CodeCon outside the presentations.

Vesta was presented at CodeCon 2004, the only one I’ve missed. It is an integrated revision control and build system that guarantees build repeatability, in part by ensuring that every file used by the build is under revision control. I can barely keep my head around the few revision control and build systems I occasionally use, but I imagine that if I were starting (or saving) some large mission-critical project that found everyday tools inadequate, it would be well worth considering Vesta. About its commercial equivalents I’ve mostly heard secondhand complaining.

Allmydata is where Zooko now works. The currently Windows-only service allows data backup to “grid storage.” Dedicate 10GB of local storage to the service and you can back up 1GB, free. Soon you’ll be able to pay for better ratios, including $30/month for 1TB of space. I badly want this service. Please make it available, and for Linux! Distributed backup has of course been a dream P2P application forever. The last time I remember the idea getting attention was a Cringely column in 2004.

Some people were debating whether the Petname Tool does anything different from what existing petname proposals specify and whether either would make phishing substantially harder. The former is debated in comments on Bruce Schneier’s recent post on petnames, inconclusively AFAICT. The Petname Tool works well and simply for what it does (Firefox only), which is to allow a user to assign a name to an https site if it is using strong encryption. If the user visits the site again and it is using the same certificate, the user will see the assigned name in a green box. Any other site, including one that merely looks like the original (in content or URL), or even one that has hijacked DNS and appears “secure” but uses a different certificate, will appear as “untrusted” in a yellow box. That’s great as far as it goes (see phollow the phlopping phish for a good description of the attack this would save a reasonable user from), though the naming seems the least important part — a checkbox to begin trusting a site would be nearly as good.

I wonder though how many users have any idea that some pages are secure and others are not. The Petname Tool doesn’t do anything for non-https pages, so the user becomes inured to seeing it do nothing, then stops seeing it at all. Perhaps it should be invisible when not on a secure site. Indicators like PageRank, Alexa rank (via the Google and Alexa toolbars) and similar, and whether the visitor has previously visited the site in question would all help warn the user that any site may not be what they expect — nearly everyone, including me, confers a huge amount of trust on non-https sites, even if I never engage in a financial transaction on such a site. I imagine a four-part security indicator in a prominent place in the browser, with readings of site popularity (rank), danger as measured by the likes of SiteAdvisor, the user’s relationship with the site (petname) and whether the connection is strongly encrypted.
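The heart of what the Petname Tool checks fits in a few lines; a toy Python version (not the extension’s actual code, and a real implementation would hook the browser’s TLS session rather than fetching the certificate separately):

```python
import hashlib
import ssl

PETNAMES = {}  # petname -> certificate fingerprint; a real tool persists this

def cert_fingerprint(host, port=443):
    # Fetch the site's certificate and hash it.
    pem = ssl.get_server_certificate((host, port))
    return hashlib.sha1(ssl.PEM_cert_to_DER_cert(pem)).hexdigest()

def assign_petname(name, host):
    # The user names a strongly encrypted site they have decided to trust.
    PETNAMES[name] = cert_fingerprint(host)

def check_site(name, host):
    # Same certificate as when the user named the site: green box.
    # Anything else, lookalikes and hijacked DNS included: yellow box.
    if PETNAMES.get(name) == cert_fingerprint(host):
        return "trusted (green)"
    return "untrusted (yellow)"
```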

Someone claimed that three-letter agencies want to mandate geolocation for every net access device. No doubt some agency types dream of this. Anyway, the person said we should be ready to fight if there is ever a real push for such a law, because what would happen to anonymity? No doubt such a mandate should be fought tooth and nail, but preserving anonymity seems like exactly the wrong battle cry. How about privacy, or even mere freedom? On that note, someone briefly showed a tiny computer attached to and powered by what could only be called a solar flap. This could be slapped on the side of a bus and would connect to wifi networks whenever possible and route as much traffic as possible.

CodeCon Sunday

Monday, February 13th, 2006

Dido. I think this provides AGI (Asterisk Gateway Interface) scripting of interactive voice response (IVR) systems, with a voice template system analogous to scripting and HTML templates for web servers, though questioners focused on a controversial feature to reorder menus based on popularity. The demo didn’t really work, except as a demonstration of everyone’s frustration with IVRs, as an audience member pointed out.

Deme. Kitchen sink collaboration web app. They aren’t done putting dishes in the sink. They’re thinking about taking all of the dishes out of the sink, replacing the sink, and putting the dishes back in (PHP to something cooler). Let’s vote on what kind of vote to put this to.

Monotone. Elegant distributed version control system; uses SHA1 hashes to identify files and repository states. The hash of the previous repository state is included in the current repository state, making lineage cryptographically provable. Merkle trees are used to quickly determine file-level differences between repositories (for sync). Storage and (especially) merge and diff are loosely coupled. The presentation didn’t cover day-to-day use, probably a good decision in terms of interestingness. The revision control presentations have been some of the best every year at CodeCon; they should consider having two or three next year. Monotone may be the only project presented this year that had a Wikipedia article before the conference.
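The lineage property is easy to illustrate: each state id hashes the previous one, so history cannot be altered without changing every descendant id. A toy Python version (the exact bytes monotone hashes differ, of course):

```python
import hashlib

def state_id(parent_id, file_hashes):
    """Identify a repository state by hashing its parent's id plus the
    SHA1 of every file, in a canonical order."""
    h = hashlib.sha1()
    h.update(parent_id.encode())
    for name in sorted(file_hashes):
        h.update(name.encode() + b"\0" + file_hashes[name].encode())
    return h.hexdigest()

root = state_id("", {"README": hashlib.sha1(b"hello").hexdigest()})
child = state_id(root, {"README": hashlib.sha1(b"hello, world").hexdigest()})
# child commits to root: tampering with any ancestor changes every
# descendant id, which is what makes lineage provable.
```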

Rhizome. Unlike Gordon (and perhaps most people), hearing the RDF triplet doesn’t make my eyes glaze over, but I’m afraid this presentation did. Some of the underlying code might be interesting, but this was the second-to-last presentation, and the top-level project, Rhizome, amounts to yet another idiosyncratic wiki, with the idiosyncratic dial turned way up.

Elkhound/Elsa/Oink/Cqual++. Parser generator that handles ambiguous grammars in a straightforward manner, plus a C++ parser and tools built on top of it. Can find bugs with a reasonable false positive rate. Expressed confidence that future work would lead to the compiler catching far more bugs than usually thought possible (as opposed to catching them only at runtime). Cool and important stuff; too bad I only grok it at a high level. Co-presenter Dan Wilkerson (and sole presenter on Saturday, of Delta) is with the Open Source Quality Project at UC Berkeley.


CodeCon Saturday

Sunday, February 12th, 2006

Delta. An arbitrarily large codebase triggers a specific bug. Run Delta, which attempts to provide you with only the code that triggers the bug (usually a page or so, no matter the size of the codebase) via a search algorithm whose evaluation function requires triggering the bug and considers code size. Sounds like a big productivity and quality booster wherever it can be used.
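The core idea fits in a screenful of Python; a toy ddmin-style reducer (Delta itself is surely smarter about chunking and caching):

```python
def minimize(lines, triggers_bug):
    """Greedily drop chunks of the input while the bug still reproduces.

    `triggers_bug` is the caller's test: it must return True when the
    candidate still exhibits the bug. This sketches the idea only.
    """
    n = 2  # number of chunks to try removing
    while len(lines) >= 2:
        chunk = max(1, len(lines) // n)
        for start in range(0, len(lines), chunk):
            candidate = lines[:start] + lines[start + chunk:]
            if candidate and triggers_bug(candidate):
                lines, n = candidate, max(2, n - 1)  # keep shrinking
                break
        else:
            if chunk == 1:
                break  # no single line can be removed; done
            n = min(len(lines), n * 2)  # try finer-grained removal
    return lines
```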

Djinni. Framework for approximation of NP-complete problems, supposedly faster and easier to use than more academically oriented approximation frameworks. An improved simulated annealing algorithm is or will be in the mix, including an analog of “pressure” in the annealing process. Super annoying presentation style. Thank you for letting us know that CodeCon is where the rubber meets the road.
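For anyone who has not seen it, textbook simulated annealing is only a few lines; Djinni’s improvements, including the “pressure” analog, are not reproduced in this sketch:

```python
import math
import random

def anneal(state, energy, neighbor, t0=1.0, cooling=0.995, steps=10000):
    """Plain simulated annealing: always accept improvements, accept worse
    neighbors with probability exp(-dE/T) while temperature T decays."""
    best = current = state
    t = t0
    for _ in range(steps):
        candidate = neighbor(current)
        d = energy(candidate) - energy(current)
        if d <= 0 or random.random() < math.exp(-d / t):
            current = candidate
            if energy(current) < energy(best):
                best = current
        t *= cooling
    return best
```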

iGlance. Instant messaging with audio and video, consistent with the IM metaphor (recipient immediately hears and sees initiator) rather than the telephone metaphor (recipient must pick up the call). Very low bitrate video buddy lists. Screen and window sharing with single control and dual pointers, so that a remote user can effectively point over your shoulder. Impressive for what seems to be a one-person spare-time project. Uses the OVPL and OVLPL licenses, very similar to the GPL and LGPL but apparently easier for handling contributor agreements, so the project owner can move code between application and library layers. Why not just make the entire application LGPL?

Overlay Anycast Service InfraStructure. Locality-aware server selection, (to be) used by various services and easy to implement for your own. Network locality correlates highly with geographic locality due to the speed-of-light bound. Obvious, but the graph was neat. OpenDHT was also mentioned, another hosted service. OpenDHT clients can use OASIS to find a gateway. Super easy to play with a DHT with around 200 nodes. Someone has built a fileshare using OpenDHT; see Octopod. As Wes Felter says, this stuff really needs to be moved to a non-research network.
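From the client side, the crudest form of locality-aware selection is just measuring round trips and picking the winner; a Python sketch (OASIS does much better than this, server-side and without per-client probing):

```python
import socket
import time

def nearest(hosts, port=80, timeout=1.0):
    """Return the host with the lowest TCP connect time, a rough proxy
    for network locality."""
    best, best_rtt = None, float("inf")
    for host in hosts:
        try:
            start = time.monotonic()
            socket.create_connection((host, port), timeout=timeout).close()
            rtt = time.monotonic() - start
        except OSError:
            continue  # unreachable replicas simply lose
        if rtt < best_rtt:
            best, best_rtt = host, rtt
    return best
```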

Query By Example. Find and rank rows [dis]similar to others in SQL, using an extension for PostgreSQL that uses a support vector machine for classification (the latter is not visible to the user). Sounds great for data mining engagements.


CodeCon Friday

Saturday, February 11th, 2006

This year Gordon Mohr had the devious idea to do preemptive reviews of CodeCon presentations. I’ll probably link to his entries and have less to say here than last year.

Daylight Fraud Prevention. I missed most of this presentation, but it seems they have a set of non-open-source Apache modules, each of which could make phishers and malware creators work slightly harder.

SiteAdvisor. Tests a website’s evilness by downloading and running software offered by the site and filling out forms on the site that request an email address. If the virtual Windows machine running the downloaded software becomes infected, or the email address set up for the test is inundated with spam, the site is considered evil. This testing is mostly automated and expensive (many Windows licenses). Great idea; surprising it is new (to me). I wonder how accurate an evil rating one could obtain at much lower cost by calculating a “SpamRank” for sites based on links found in email classified as spam and links found on pages linked to in spam. (A paper has already taken the name SpamRank, though at a five-second glance it looks to propose tweaks to make PageRank more spam-resistant rather than trying to measure evil.) Fortunately SiteAdvisor says that both bitzi.com and creativecommons.org are safe to use. SiteAdvisor’s data is available for use under the most restrictive Creative Commons license — Attribution-NonCommercial-NoDerivs 2.5.
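The lazy version of that SpamRank idea is a few lines of Python: count domains linked from mail already classified as spam. Following the links to score second-hop pages, and any normalization, are left out of this sketch:

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r'https?://[^\s"\'<>]+')

def spam_rank(spam_messages):
    """Score domains by how often they appear in spam-classified mail."""
    counts = Counter()
    for message in spam_messages:
        for url in URL_RE.findall(message):
            counts[urlparse(url).netloc.lower()] += 1
    return counts.most_common()
```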

VidTorrent/Peers. Streaming joke. Peers, described as a “toolkit for P2P programming with continuation passing style,” which I gather works syntactically as a Python code preprocessor, could be interesting. I wish they had compared Peers to other P2P toolkits.

Localhost. A global directory shared with a modified version of the BitTorrent client. I tried it about a month ago. Performance was somewhere between abysmal and nonexistent. BitTorrent is fantastic for large popular files. I’ll be surprised if localhost’s performance, which depends on transferring small XML files, ever reaches mediocrity. They’re definitely working against BitTorrent’s strengths by uploading websites into the global directory as lots of small files (I gather). The idea of a global directory is interesting, though tags seem a more fruitful navigation method than localhost’s hierarchy.

Truman. A “sandnet” in which to investigate suspected malware. Faux services (e.g., DNS, websites) can be scripted to elicit the suspected malware’s behavior, and more.

Alexa Grapher

Wednesday, January 18th, 2006

Many people look at Alexa graphs to see how the traffic rank of a site of interest is faring (usually not well — there are always more websites, so even maintaining ordinal rank is an uphill battle). People who don’t twiddle URLs out of habit probably don’t realize that Alexa can be asked to graph data back to late 2001 or that graphs may be arbitrarily sized.

I’d been meaning to put together an Alexa graph generating utility for months (well, one more accessible than URL editing, which I’ve always used, e.g., the graphs at the bottom of a post on blog search) and I finally got started last night.

Funny thing, then, that I read via Brad Neuberg that Joe Walker just published an “Ajax” Alexa grapher, so I guess I’ll just publish my own ultra-crufty Alexa grapher rather than cleaning it up first. I’m not sure what is Ajaxian about Walker’s (I haven’t looked), but mine doesn’t qualify, I think — it is just plain old JavaScript, with the graph updated by setting innerHTML. No asynchronous XML communication with a server.

I was going to write up a bunch of caveats about how Alexa graphs should be interpreted, but in the interest of completing this post I’ll just point out one oddity I discovered: the url parameter of Alexa’s traffic detail page (click on the graph on my Alexa grapher to get to such a page) must be the last querystring parameter; otherwise every parameter after it gets interpreted as part of the url parameter. Some kind of odd URL parsing going on there at Alexa. (Never mind that they really want a domain name, not a URL, for that parameter.)
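So if you generate these links yourself, just put url last. A Python sketch (the path and the other parameter names are illustrative, not a documented Alexa API):

```python
from urllib.parse import urlencode

BASE = "http://www.alexa.com/data/details/traffic_details"  # illustrative path

def traffic_detail_url(site, size="large", range_="3y"):
    # Keep `url` as the final querystring parameter: Alexa's parser treats
    # everything after it as part of the url value.
    params = [("size", size), ("range", range_), ("url", site)]
    return BASE + "?" + urlencode(params)

print(traffic_detail_url("example.com"))
```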

CodeCon 2006 Program

Thursday, January 12th, 2006

The 2006 program has been announced and it looks fantastic. I highly recommend attending if you’re near San Francisco Feb 10-12 and are any sort of computer geek. There’s an unofficial CodeCon wiki.

My impressions of last year’s CodeCon: Friday, Saturday, and Sunday.

Via Wes Felter

XTech 2006 CFP deadline

Tuesday, January 3rd, 2006

I mentioned elsewhere that I’m on the program committee for XTech 2006, the leading web technology conference in Europe, to be held in Amsterdam May 16-19.

Presentation, tutorial and panel proposals are due in less than a week, on January 9. If you’re building an extraordinary Web 2.0 application or doing research that Web 2.0 (very broadly construed) developers and entrepreneurs need to hear about, please consider submitting a proposal.

See the CFP and track descriptions.