Archive for the ‘Programming’ Category

Copyright turns us into technology idiots

Saturday, October 28th, 2006

Or do copyright enforcement technologies attract people who would be kooks anyway?

Obvious case in point: DRM.

Now this from Paul Hoffert, apparently associated with “Noank Media”, commenting on Rob Kaye’s blog:

The Noank counting system is unique. We count usage by ALL players. Players can be time-based, such as iTunes, Windows Media, open source, our own Noank player, or your own favorite. They can be Microsoft Word, Acrobat Reader, Photoshop, or any other application program. The Noank client reports consumption of all content within our catalog on Windows, Mac, Unix, or recent cell phone devices.

Rob’s response is too polite:

This is nothing but empty hand-waving, I’m sorry. If you were to hire me to implement this system, I would have to politely tell you that this is impossible. I could not code such a thing and I have over a decade of client application programming experience. Please do elaborate on how you’re going to do this. If you’ve solved this I assume that you’ve already filed for some patents, right? What are your patent application numbers? I’d like to look up these exciting details — this is got to be amazing stuff you’re working on!

To which Hoffert responds:

Our tracking system is operational now and we are scaling it for large numbers of users.

Uh huh.

Voluntary collective licensing may have a role to play but I’m afraid I’m going to have to completely write off “Noank Media” before they even have a website.

Copyright mania hass the side effect of reducing perpetual motion research, who knew?

Addendum 20061031: Lucas Gonze writes that collective licensing will never happen. I think I buy his argument:

Users and businesses are moving away from filesharing networks and to the web, where DMCA safe harbor allows many disputes to be resolved peacefully. User-created content has become a substantial part of the media ecosystem over the last few years, and it doesn’t need collective licensing to exist.

Update 20071126: Noank does have a website now and a how it works page that leaves out lots of details but is not implausible. When more details are available I hope to post a retraction. Hoffert’s language was just too easy to make fun of, and that urge turned me into a technology idiot!

Wordcamp and wiki mania

Monday, August 7th, 2006

In lieu of attending maybe the hottest conference ever I did a bit of wiki twiddling this weekend. I submitted a tiny patch (well that was almost two weeks ago — time flies), upgraded a private MediaWiki installation from 1.2.4 to 1.6.8 and a public installation from 1.5.6 to 1.6.8 and worked on a small private extension, adding to some documentation before running into a problem.

1.2.4->1.6.8 was tedious (basically four successive major version upgrades) but trouble-free, as that installation has almost no customization. The 1.5.6->1.6.8 upgrade, although only a single upgrade, took a little fiddling make a custom skin and permissions account for small changes in MediaWiki code (example). I’m not complaining — clean upgrades are hard and the MediaWiki developers have done a great job of making them relatively painless.

Saturday I attended part of , a one day unconference for WordPress users. Up until the day before the tentative schedule looked pretty interesting but it seems lots of lusers signed up so the final schedule didn’t have much meat for developers. Matt Mullenweg’s “State of the Word” and Q&A hit on clean upgrade of highly customized sites from several angles. Some ideas include better and better documented plugin and skin APIs with more metadata and less coupling (e.g., widgets should help many common cases that previously required throwing junk in templates).

Beyond the purely practical, ease of customization and upgrade is important for openness.

Now listening to the Wikimania Wikipedia and the Semantic Web panel…

Constitutionally open services

Thursday, July 6th, 2006

Luis Villa provokes, in a good way:

Someone who I respect a lot told me at GUADEC ‘open source is doomed’. He believed that the small-ish apps we tend to do pretty well will migrate to the web, increasing the capital costs of delivering good software and giving next-gen proprietary companies like Google even greater advantages than current-gen proprietary companies like MS.

Furthermore:

Seeing so many of us using proprietary software for some of our most treasured possessions (our pictures, in flickr) has bugged me deeply this week.

These things have long bugged me, too.

I think Villa has even understated the advantage of web applications — no mention of security — and overstated the advantage of desktop applications, which amounts to low latency, high bandwidth data transfer — let’s see, , including video editing, is the hottest thing on the web. Low quality video, but still. The two things client applications still excel at are very high bandwidth, very low latency data input and output, such as rendering web pages as pixels. :)

There are many things that can be done to make client development and deployment easier, more secure, more web-like and client applications more collaboration-enabled. Fortunately they’ve all been tried before (e.g., , , , others of varying relevance), so there’s much to learn from, yet the field is wide open. Somehow it seems I’d be remiss to not mention , so there it is. Web applications on the client are also a possibility, though typical only address ease of development and not manageability at all.

The ascendancy of web applications does not make the desktop unimportant any more than GUIs made filesystems unimportant. Another layer has been added to the stack, but I am still very happy to see any move of lower layers in the direction of freedom.

My ideal application would be available locally and over the network (usually that means on the web), but I’ll prefer the latter if I have to choose, and I can’t think of many applications that don’t require this choice (fortunately is one of them, or close enough).

So what can be done to make the web application dominated future open source in spirit, for lack of a better term?

First, web applications should be super easy to manage (install, upgrade, customize, secure, backup) so that running your own is a real option. Applications like and have made large strides, especially in the installation department, but still require a lot of work and knowledge to run effectively.

There are some applications that centralizaton makes tractable or at least easier and better, e.g., web scale search, social aggregation — which basically come down to high bandwidth, low latency data transfer. Various P2P technologies (much to learn from, field wide open) can help somewhat, but the pull of centralization is very strong.

In cases were one accepts a centralized web application, should one demand that application be somehow constitutionally open? Some possible criteria:

  • All source code for the running service should be published under an open source license and developer source control available for public viewing.
  • All private data available for on-demand export in standard formats.
  • All collaboratively created data available under an open license (e.g., one from Creative Commons), again in standard formats.
  • In some cases, I am not sure how rare, the final mission of the organization running the service should be to provide the service rather than to make a financial profit, i.e., beholden to users and volunteers, not investors and employees. Maybe. Would I be less sanguine about the long term prospects of Wikipedia if it were for-profit? I don’t know of evidence for or against this feeling.

Consider all of this ignorant speculation. Yes, I’m just angling for more freedom lunches.

Long tail of metadata

Monday, May 29th, 2006

Ben Adida notes that people are writing about RDFa, which is great, and envisioning conflict with microformats, which is not. As Ben says:

Microformats are useful for expressing a few, common, well-defined vocabularies. RDFa is useful for letting publishers mix and match any vocabularies they choose. Both are useful.

In other words RDFa is a technology.

Evan Prodromou thinks the future is bleak without cooperation. I like his proposed way forward (strikeout added for obvious reasons):

  1. RDFa gets acknowledged and embraced by microformats.org as the future of semantic-data-in-XHTML
  2. The RDFa group makes an effort to encompass existing microformats with a minimum of changes
  3. microformats.org leaders join in on the RDFa authorship process
  4. microformats.org becomes a focus for developing real-world RDFa vocabularies

I see little chance of points one and three occuring. However, I don’t see this as a particularly bad thing. Point three will occur, almost by default: the simplest and most widely deployed microformats (e.g., , and rellicense) are also valid RDFa — the predicate (e.g., tag, nofollow, license) appearing in the default namespace to a RDFa application. More complex microformats may be handled by hGRDDL, which is no big deal as a microformat-aware application needs to parse each microformat it cares about anyway. From an RDF perspective any well-crafted metadata is a plus (and the microformats group do very careful work) as RDF’s killer app is integrating heterogenous data sources.

From a microformats perspecitve RDFa might well be ignored. While transformation of any microformat to RDF is relatively straightforward, transformation of RDF (which is a model, not a format) to microformats is nonsensical (well, I suppose the endpoint of such a transformation could be , though I’m not sure what the point would be). Microformats, probably wisely, is not reinventing RDF (as many do, usually badly).

So why would RDFa be of interest to developers? In a word, laziness. There is no process to follow for developing an RDF vocabulary (ironic), you can freely reuse existing vocabularies and tools, not write your own parsers, and trust that really smart people are figuring out the hard stuff for you (I believe the formal background of the Semantic Web is a long-term win). Or you might just want to, as Ben says “express metadata about other documents (embedded images)” which is trivial for RDF as images have URIs.

Addendum 20060601: The “simplest” microformats mentioned above have a name: elemental microformats.

Wikiforms

Thursday, May 11th, 2006

Brad Templeton writes about overly structured forms, one of my top UI peeves. The inability to copy and paste an IP address into a form with four separate fields has annoyed me, oh, probably hundreds of times. Date widgets annoy me slightly less. Listen to Brad when designing your next form, on the web or off.

The opposite of overly structured forms would be a freeform editing widget populated with unconstrained fields blank or filled with example data, or even a completely empty editing widget with suggested structure documented next to the widget — a wiki editing form. This isn’t as strange as it seems — many forms are distributed as word processor or plain text documents that recipients are expected to fill in by editing directly and return.

I don’t think “wikiforms” are appropriate for many cases where structured forms are used, but it’s useful to think of opposites and I imagine their (and hybrids — think a “rich” wiki editor with autocompletion — I haven’t really, but I imagine this is deja vu for anyone who has used mainframe-style data entry applications) niche could increase.

Ironically the currently number one use of the term wiki forms denotes adding structured forms to wikis!

On a marginally related note the Semantic MediaWiki appears to be making good progress.

Lazyweb: guess source and taget languages for translation

Monday, April 24th, 2006

I use and Google Translate fairly often and am annoyed that both require me to specify both source (text to be translated) and destination languages. The former could be guessed at from the input text and the latter trivially obtained from browser settings (Google at least defaults to English destination at google.com and Spanish at google.es).

, failing AltaVista and Google fixing this, someone should write a script that does.

Comments at this article point to various language detection techniques.

Emergent Robustness in a Walnut

Sunday, April 16th, 2006

just published his dissertation: Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control informed by his years of work on the capability-secure . Great stuff, very relevant to the future of highly distributed, concurrent and secure computing, i.e., the future of computing, and pretty readable too — I blinked and momentarily misread the heading “Reference Graph Dynamics” (numbered page 66) as “Reference Graphs for Dummies.” I’ve only skimmed the document, but Part III, Concurrency Control, looks the most interesting and hardest, while Part IV, Emergent Robustness should be accessible and thought provoking to anyone with marginal technical literacy.

Also very recently announced a draft of Emily in a Walnut, a gentle introduction for imperative programmers to a secure variant of . Using Objective Caml for something interesting has been somewhere down my list for several years and will probably remain for several more.

LimeWire Filtering & Blog

Wednesday, March 29th, 2006

Just noticed that the current beta (4.11.0) includes optional copyright filtering. See the features history and brief descriptions for users and copyright owners:

In the Filtering System, copyright owners identify files that they don’t want shared and submit them for inclusion in a public list. LimeWire then consults this list and stops users from downloading the identified files “filtering” them from the sharing process.

If you sign up for an account as a copyright owner you can submit files (with file name, file size, SHA1 hash, creator, collection, description) for filtering. Users can turn the filter on and off via a preference.

LimeWire.org now features a blog with pretty random content. I notice that another PHP Base32 function (which makes a whole lot more sense than the one included in Bitcollider-PHP — I swear PHP’s bitwise operators weren’t giving correct results and worked around that, but was probably insane) is available with a hint that someone is building an “open source Gnutella Server in PHP5.”

Remember that LimeWire is Open Source P2P and thus pretty trustworthy — and you can always fork.

content.exe is evil

Thursday, February 16th, 2006

I occasionally run into people who think users should download content (e.g., music or video) packaged in an executable file, usually for the purpose of wrapping the content with where the content format does not directly support DRM (or the proponent’s particular DRM scheme). Nevermind the general badness of Digital Restrictions Management, requiring users to run a new executable for each content file is evil.

Most importantly, every executable is a potential vector. There is no good excuse for exposing users to this risk. Even if your executable content contains no malware and your servers are absolutely impenetrable such that your content can never be replaced with malware, you are teaching users to download and run executables. Bad, bad, bad!

Another problem is that executables are usually platform-specific and buggy. Users have enough problem having the correct codec installed. Why take a chance that they might not run Windows (and the specific versions and configurations you have tested, sure to not exist in a decade or much less)?

I wouldn’t bother to mention this elementary topic at all, but very recently I ran into someone well intentioned who wants users to download content wrapped in , if I understand correctly for the purposes of ensuring users can obtain content metadata (most media players do a poor job of exposing content metadata and some file formats do a poor job of supporting embedded metadata, not that hardly anyone cares — this is tilting at windmills) and so that content publishers can track use (this is highly questionable), all from a pretty cross platform GUI. A jar file is an executable Java package, so the platform downside is different (Windows is not required, but a Java installation, of some range of versions and configurations, is), but it is still an executable that can do whatever it wants with the computer it is running on. Bad, bad, bad!

The proponent of this scheme said that it was ok, the jar file could be . This is no help at all. Anyone can create a certificate and sign jar files. Even if a creator did have to have their certificate signed by an established authority it would be of little help, as malware purveyors have plenty of resources that certificate authorities are happy to take. The downsides are many: users get a security prompt (“this content signed by…”) for content, which is annoying, misleading as described above and conditions the user to not pay attention when they install things that really do need to be executable, and a barrier is raised for small content producers.

If you really want to package arbitrary file formats with metadta, put everything in a zip file and include your UI in the zip as HTML. This is exactly what P2P vendor ‘s Packaged Media File format is. You could also make your program (which users download only once) look for specific files within the zip to build a content-specific (and safe) interface within your program. I believe this describes ‘s Kapsules, though I can’t find any technical information.

Better yet put your content on the web, where users can find and view it (in the web design of your choice), you get reasonable statistics, and the don’t get fed. You can even push this to 81/19 by including minimal but accurate embedded in your files if they support it — a name users can search for or a URL for your page related to the content.

Most of the pushers of executable content I encounter when faced with security concerns say it is an “interersting and hard problem.” No, it is a stupid and impossible problem. In contrast to web, executable content is a 5/95/-1000 solution — that last number is a .

If you really want an interesting and hard problem, executable content security is the wrong level. Go work on platform security. We can now run sophisticated applications within a web browser with some degree of safety (due to Java applet and Flash sandboxes, JavaScript security). Similar could be pushed down to the desktop, so that executables by default have no more rights to tamper with your system than do web pages. is an aggressive approach to this problem. If that sounds too hard and not interesting enough (you really wanted to distribute “media”), go the web way as above — it is subsuming the desktop anyhow.

CodeCon Extra

Monday, February 13th, 2006

A few things I heard about at outside the presentations.

Vesta was presented at CodeCon 2004, the only one I’ve missed. It is an integrated revision control and build system that guarantees build repeatability, in part by ensuring that every file used by the build is under revision control. I can barely keep my head around the few revision control and build systems I occasionally use, but I imagine that if I were starting (or saving) some large mission-critical project that found everyday tools inadequare it would be well worth considering Vesta. About its commercial equivalents, I’ve mostly heard second hand complaining.

Allmydata is where Zooko now works. The currently Windows-only service allows data backup to “grid storage” presumably a as used by . Dedicate 10Gb of local storage to the service, you can back up 1Gb, free. Soon you’ll be able to pay for better ratios, including $30/month for 1Tb of space. I badly want this service. Please make it available, and for Linux! Distributed backup has of course been a dream P2P application forever. Last time I remember the idea getting attention was a Cringely column in 2004.

Some people were debating whether the Petname Tool does anything different from what specify and whether either would make substantially harder. The former is debated in comments on Bruce Schneier’s recent post on petnames, inconclusively AFAICT. The Petname Tool works well and simply for what it does (Firefox only), which is to allow a user to assign a name to a https site if it is using strong encryption. If the user visits the site again and it is using the same certificate, the user will see the assigned name in a green box. Any other site, including one that merely looks like the original (in content or URL), or even has hijacked DNS, appears to be “secure” but uses a different certificate, will appear as “untrusted” in a yellow box. That’s great as far as it goes (see phollow the phlopping phish for a good description of the attack this would save reasonable user from), though the naming seems the least important part — a checkbox to begin trusting a site would be nearly as good. I wonder though how many users have any idea that some pages are secure and others are not. The petname tool doesn’t do anything for non-https pages, so the user becomes inured to seeing it doing nothing, then does not see it. Perhaps it should be invisible when not on a secure site. Indicators like PageRank, Alexa rank (via the Google and Alexa toolbars) and similar, , and whether the visitor has previously visited the site in question before would all help warn the user that any site may not be what they expect — nearly everyone, including me, confers a huge amount of trust on non-https sites, even if I never engage in a financial transaction on such a site. I imagine a four-part security indicator in a prominent place in the browser, with readings of site popularity (rank), danger as measured by the likes of SiteAdvisor, the user’s relationship with the site (petname) and whether the connection is strongly encrypted.

Someone claimed that three letter agencies want to mandate geolocation for every net access device. No doubt some agency types dream of this. Anyway, the person said we should be ready to fight this if it were to become a real push for such a law, because what would happen to anonymity? No doubt such a mandate should be fought tooth and nail, but preserving anonymity seems like exactly the wrong battle cry. How about privacy, or even mere freedom? On that note, someone briefly showed a tiny computer attached to and powered by what could only be called a solar flap. This could be slapped on the side of a bus and would connect to wifi networks whenever possible and route as much traffic as possible.