Archive for the ‘Programming’ Category

Web Data Common[s] Crawl Attribution Metadata

Monday, January 23rd, 2012

Via I see Web Data Commons, which has “extracted structured data out of 1% of the currently available Common Crawl corpus dating October 2010”. WDC publishes the extracted data as N-Quads (the fourth item denotes the immediate provenance of each subject/predicate/object triple: the URL the triple was extracted from).

I thought it would be easy and fun to run some queries on the WDC dataset to get an idea of how annotations associated with Creative Commons licensing are used. Notes below on exactly what I did. The biggest limitation is that the license statement itself is not part of the dataset — not as xhv:license in the RDFa portion, and for some reason rel=license microformat has zero records. But cc:attributionName, cc:attributionURL, and cc:morePermissions are present in the RDFa part, as are some Dublin Core properties that the Creative Commons license chooser asks for (I only looked at dc:source) but are probably widely used in other contexts as well.

Dataset                                  URLs                Distinct objects
Common Crawl 2010 corpus                 5,000,000,000 [a]
1% sampled by WDC                        ~50,000,000
with RDFa                                158,184 [b]
with a cc: property                      26,245 [c]
cc:attributionName                       24,942 [d]          990 [e]
cc:attributionURL                        25,082 [f]          3,392 [g]
dc:source                                7,235 [h]           574 [i]
cc:morePermissions                       4,791 [j]           253 [k]
cc:attributionURL = dc:source            5,421 [l]
cc:attributionURL = cc:morePermissions   1,880 [m]
cc:attributionURL = subject              203 [n]

Some quick takeaways:

  • Low ratio of distinct attributionURLs probably indicates HTML from license chooser deployed without any parameterization. Often the subject or current page will be the most useful attributionURL (but 203 above would probably be much higher with canonicalization). Note all of the CC licenses require that such a URL refer to the copyright notice or licensing information for the Work. Unless one has set up a site-wide license notice somewhere, a static URL is probably not the right thing to request in terms of requiring licensees to provide an attribution link; nor is a non-specific attribution link as useful to readers as a direct link to the work in question. As (and if) support for attribution metadata gets built into Creative Commons-aware CMSes, the ratio of distinct attributionURLs ought to increase.
  • 79% of subjects with both dc:source and cc:attributionURL (6,836 [o]) have the same values for both properties. This probably means people are merely entering their URL into every form field requesting a URL without thinking, not self-remixing.
  • 47% of subjects with both cc:morePermissions and cc:attributionURL (3,977 [p]) have the same values for both properties. It is unclear why this ratio is so much lower than the previous one; it ought to be higher, as the same value for both often makes sense. It is unsurprising that cc:morePermissions is the least provided property; in my experience few people understand it.
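The percentages in the takeaways follow directly from the counts in the table. A quick arithmetic check, as a sketch: the variable names and the rounding helper are mine, and the numbers are the ones reported in this post.

```python
# Counts copied from this post; the letters match the labeled queries in the notes.
both_source_and_attr_url = 6836       # o: rows with both dc:source and cc:attributionURL
same_source_and_attr_url = 5421       # l: rows where the two values are identical
both_more_perms_and_attr_url = 3977   # p: rows with both cc:morePermissions and cc:attributionURL
same_more_perms_and_attr_url = 1880   # m: rows where the two values are identical
attr_url_statements = 25082           # f: all cc:attributionURL statements
attr_url_distinct = 3392              # g: distinct cc:attributionURL objects


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


source_match = percent(same_source_and_attr_url, both_source_and_attr_url)
more_perms_match = percent(same_more_perms_and_attr_url, both_more_perms_and_attr_url)
distinct_attr_url_share = percent(attr_url_distinct, attr_url_statements)

print(source_match, more_perms_match, distinct_attr_url_share)
```

The last figure is the "low ratio of distinct attributionURLs" from the first bullet expressed as a percentage.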

I did not look at the provenance item at all. It’d be interesting to see what kind of assertions are being made across authority boundaries (e.g. a page on example.com makes statements with an example.net URI as the subject) and when to discard such. I barely looked directly at the raw data at all; just enough to feel that my aggregate numbers could possibly be accurate. More could probably be gained by inspecting smaller samples in detail, informing other aggregate queries.
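I did not run anything like this, but the cross-authority check could be sketched in a few lines of Python. The regular expression is a deliberately simplified stand-in for a real N-Quads parser (it skips blank-node subjects and malformed lines), the function name is hypothetical, and comparing hostnames exactly is an assumption: www.example.com and example.com would count as different authorities.

```python
import re
from urllib.parse import urlsplit

# Simplified N-Quads pattern: IRI subject, IRI predicate, any object, IRI graph.
# Blank-node subjects and malformed lines do not match and are skipped.
QUAD = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s+<([^>]*)>\s*\.\s*$')


def crosses_authority(line: str) -> bool:
    """True if the subject's host differs from the host of the provenance URL."""
    match = QUAD.match(line)
    if match is None:
        return False
    subject, _predicate, _obj, provenance = match.groups()
    subject_host = urlsplit(subject).hostname
    provenance_host = urlsplit(provenance).hostname
    return (subject_host is not None
            and provenance_host is not None
            and subject_host != provenance_host)


# Made-up example lines: a page on example.com describing an example.net subject,
# and the same page describing one of its own URLs.
cross = ('<http://example.net/a> <http://creativecommons.org/ns#attributionName> '
         '"Alice" <http://example.com/page> .')
same = ('<http://example.com/a> <http://creativecommons.org/ns#attributionURL> '
        '<http://example.com/> <http://example.com/page> .')

print(crosses_authority(cross), crosses_authority(same))
```

Counting how many lines of the extract satisfy this predicate would be a first step toward deciding when such statements should be discarded.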

I look forward to future extracts. Thanks indirectly to Common Crawl for providing the crawl!

Please point out any egregious mistakes made below…

# a I don't really know if the October 2010 corpus is the
# entire 5 billion Common Crawl corpus

# download RDFa extract from Web Data Commons
wget -c https://s3.amazonaws.com/ccrdf1p/data/ccrdf.html-rdfa.nq

# Matches number stated at
# http://page.mi.fu-berlin.de/muehleis/ccrdf/stats1p.html#html-rdfa
wc -l ccrdf.html-rdfa.nq
1047250

# Includes easy to use no-server triplestore
apt-get install redland-utils

# sanity check
grep '<http://creativecommons.org/ns#attributionName>' ccrdf.html-rdfa.nq |wc -l
26404 

# Import rejects a number of triples for syntax errors
rdfproc xyz parse ccrdf.html-rdfa.nq nquads

# d Perhaps syntax errors explain fewer triples than above grep might
# indicate, but close enough
rdfproc xyz query sparql - 'select ?o where { ?s <http://creativecommons.org/ns#attributionName> ?o}' |wc -l
24942

# These replicated below with 4store because...
rdfproc xyz query sparql - 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionName> ?o}' |wc -l
990
rdfproc xyz query sparql - 'select ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o}' |wc -l
25082
rdfproc xyz query sparql - 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o}' |wc -l
3392
rdfproc xyz query sparql - 'select ?o where { ?o <http://creativecommons.org/ns#attributionURL> ?o }' |wc -l
203
rdfproc xyz query sparql - 'select ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o}' |wc -l
4791
rdfproc xyz query sparql - 'select distinct ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o}' |wc -l
253
rdfproc xyz query sparql - 'select ?o where { ?o <http://creativecommons.org/ns#morePermissions> ?o }' |wc -l
12

# ...this query takes forever, hours, and I have no idea why
rdfproc xyz query sparql - 'select ?s, ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o ; <http://creativecommons.org/ns#attributionURL> ?o }'

# 4store has a server, but is lightweight
apt-get install 4store

# 4store couldn't import with syntax errors, so export good triples from
# previous store first
rdfproc xyz serialize > ccrdf.export-rdfa.rdf

# import into 4store
curl -T ccrdf.export-rdfa.rdf 'http://localhost:8080/data/wdc'

# egrep is to get rid of headers and status output prefixed by ? or #
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionName> ?o}' |egrep -v '^[\?\#]' |wc -l
24942

#f
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o}' |egrep -v '^[\?\#]' |wc -l
25082

#j
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o}' |egrep -v '^[\?\#]' |wc -l
4791

#h
#Of course please use http://purl.org/dc/terms/source instead.
#Should be more widely deployed soon.
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://purl.org/dc/elements/1.1/source> ?o}' |egrep -v '^[\?\#]' |wc -l
7235

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://purl.org/dc/terms/source> ?o}' |egrep -v '^[\?\#]' |wc -l
4

#e
4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionName> ?o}' |egrep -v '^[\?\#]' |wc -l
990

#g
4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o}' |egrep -v '^[\?\#]' |wc -l
3392

#k
4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o}' |egrep -v '^[\?\#]' |wc -l
253

#i
4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://purl.org/dc/elements/1.1/source> ?o}' |egrep -v '^[\?\#]' |wc -l
574

#n
4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://creativecommons.org/ns#attributionURL> ?o}' |egrep -v '^[\?\#]' |wc -l
203

4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://creativecommons.org/ns#morePermissions> ?o}' |egrep -v '^[\?\#]' |wc -l
12

4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://purl.org/dc/elements/1.1/source> ?o}' |egrep -v '^[\?\#]' |wc -l
120

#m
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://creativecommons.org/ns#morePermissions> ?o }' |egrep -v '^[\?\#]' |wc -l
1880

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://creativecommons.org/ns#morePermissions> ?o }' |egrep -v '^[\?\#]' |wc -l
122

4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://creativecommons.org/ns#attributionURL> ?o ; <http://creativecommons.org/ns#morePermissions> ?o }' |egrep -v '^[\?\#]' |wc -l
8

#l
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?o }' |egrep -v '^[\?\#]' |wc -l
5421

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?o }' |egrep -v '^[\?\#]' |wc -l
358

4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?o }' |egrep -v '^[\?\#]' |wc -l
11

#p
4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://creativecommons.org/ns#morePermissions> ?n }' |egrep -v '^[\?\#]' |wc -l
3977

#o
4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?n }' |egrep -v '^[\?\#]' |wc -l
6836

4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n, ?m where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?n ; <http://creativecommons.org/ns#morePermissions> ?m }' |egrep -v '^[\?\#]' |wc -l
2946
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?o ; <http://creativecommons.org/ns#morePermissions> ?o }' |egrep -v '^[\?\#]' |wc -l
1604

#c
4s-query wdc -s '-1' -f text 'select distinct ?s where { { ?s <http://creativecommons.org/ns#attributionURL> ?o } UNION { ?s <http://creativecommons.org/ns#attributionName> ?n } UNION { ?s <http://creativecommons.org/ns#morePermissions> ?m }  }' |egrep -v '^[\?\#]' |wc -l
26245

4s-query wdc -s '-1' -f text 'select distinct ?s where { { ?s <http://creativecommons.org/ns#attributionURL> ?o } UNION { ?s <http://creativecommons.org/ns#attributionName> ?n }}' |egrep -v '^[\?\#]' |wc -l
25433

#b note subjects not the same as pages data extracted from (158,184)
4s-query wdc -s '-1' -f text 'select distinct ?s where { ?s ?p ?o }'  |egrep -v '^[\?\#]' |wc -l
264307

# Probably less than 1047250 claimed due to syntax errors
4s-query wdc -s '-1' -f text 'select ?s where { ?s ?p ?o }'  |egrep -v '^[\?\#]' |wc -l
968786

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?p ?s }'  |egrep -v '^[\?\#]' |wc -l
2415

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?s ?s }'  |egrep -v '^[\?\#]' |wc -l
0

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?s ?o }'  |egrep -v '^[\?\#]' |wc -l
0
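The #n query above compares subject and cc:attributionURL as exact strings, which is why I suspect 203 understates the real overlap. A sketch of the kind of light canonicalization that might be applied before comparing; the specific rules (drop a leading "www.", the fragment, and a trailing slash) are my assumptions rather than any standard, and I have not rerun the counts with them.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Lowercase scheme and host; drop a leading 'www.', the fragment, and a trailing slash."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def same_resource(subject: str, attribution_url: str) -> bool:
    """Compare two URLs after canonicalization instead of as raw strings."""
    return canonicalize(subject) == canonicalize(attribution_url)


print(same_resource("http://WWW.Example.com/post/#top", "http://example.com/post"))
```

Whether stripping "www." or the fragment is appropriate depends on the site, so any count produced this way would be an upper-ish estimate rather than a replacement for the exact-match figure.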

Faded sidebar

Sunday, January 22nd, 2012

I’ve occasionally mucked with my blog’s theme with a general aim of removing superfluous crap that makes reading posts harder. A couple weeks ago something Parker Higgins wrote inspired me to try a little harder:

How awesome is @niemanlab’s “faded” sidebar? And zen mode? That’s a site that cares about its readers.

You can view an archived version of the Nieman Lab site should their design change. The top of the sidebar isn’t faded, but overall I think what fading there is makes it a little easier to concentrate on a post without switching to “zen” mode (removing all navigation).

For my theme, I made the sidebar always faded except when hovered over, and float to the right as far away from the main content area as possible. All “header” content is in the sidebar so that there’s nothing preceding a post’s title.

I intended to remove anything hardcoded* for my blog, anything I don’t understand or that isn’t used, and anything that doesn’t validate, but I didn’t get very far on any of those. I doubt this will be useful to anyone, but patches welcome.

*Yes it is also a little ironic I’ve never bothered to publish the modified source used to run this blog until now.

CSS text overlay image, e.g. for attribution and license notice

Sunday, January 8th, 2012

A commenter called me on providing inadequate credit for a map image I used on this blog. I’ve often seen map credits overlaid on the bottom right of maps, so I decided to try that. I couldn’t find an example of using CSS to overlay text on an image that only showed the absolute minimum needed to achieve the effect, and explained why. Below is my attempt.

Example 1

The above may be a good example of when to not use a text overlay (there is already text at the bottom of the image), but the point is to demonstrate the effect, not to look good. I have an image and I want to overlay «Context+Colophon» at the bottom right of the image. Here’s the minimal code:

<div style="position:relative;z-index:0;width:510px">
  <img src="http://gondwanaland.com/i/young-obama-pirate-hope.png"/>
  <div style="position:absolute;z-index:1;right:0;bottom:0">
    <a href="http://identi.ca/conversation/69446510">Context</a>+<a href="http://registry.gimp.org/node/14291">Colophon</a>
  </div>
</div>

Explanation

The outer div creates a container which the text overlay will be aligned with. A position is necessary to enable z-index, which specifies how objects will stack. Here position:relative as I want the image and overlay to flow with the rest of the post, z-index:0 as the container is at the bottom of the stack. I specify width:510px as that’s how wide the image is, and without hardcoding the size of the div, the overlay as specified will float off to the right rather than align with the image. There’s nothing special about the img; it inherits from the outer div.

The inner div contains and styles the text I want to overlay. position:absolute as I will specify an absolute offset from the container, right:0;bottom:0, and z-index:1 to place above the image. Finally, I close both divs.

That’s it. I know precious little CSS; please tell me what I got wrong.
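If the same overlay is needed for many images, the two-div structure can be generated rather than typed. A minimal sketch in Python: the function name and parameters are hypothetical, and it emits exactly the inline styles explained above, nothing more.

```python
from html import escape


def overlay_markup(image_url: str, width_px: int, overlay_html: str) -> str:
    """Wrap an image and overlay text in the two-div structure described above.

    overlay_html is inserted as-is so it may contain links; image_url is escaped.
    """
    return (
        f'<div style="position:relative;z-index:0;width:{width_px}px">\n'
        f'  <img src="{escape(image_url, quote=True)}"/>\n'
        f'  <div style="position:absolute;z-index:1;right:0;bottom:0">\n'
        f'    {overlay_html}\n'
        f'  </div>\n'
        f'</div>'
    )


# Placeholder URL and credit text, for illustration only.
snippet = overlay_markup("http://example.org/image.png", 510, "Credit text")
print(snippet)
```

The width still has to match the image, for the reason given in the explanation: without it the overlay aligns with the container rather than the picture.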

Example 2

Above is the image that prompted this post, with added attribution and license notice. Code:

<div style="z-index:0;position:relative;width:560px"
     xmlns:cc="http://creativecommons.org/ns#"
     about="http://gondwanaland.com/i/OpenStreetMap-Oakland-980.png">
  <a href="http://www.openstreetmap.org/?lat=37.8134&amp;lon=-122.2776&amp;zoom=14&amp;layers=Q">
    <img src="http://gondwanaland.com/i/OpenStreetMap-Oakland-980.png"/></a>
  <div style="position:absolute;z-index:1;right:0;bottom:0;">
    <small>
      © <a rel="cc:attributionURL"
           property="cc:attributionName"
           href="http://www.openstreetmap.org/?lat=37.8134&amp;lon=-122.2776&amp;zoom=14&amp;layers=Q">OpenStreetMap contributors</a>,
        <a rel="license"
           href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>
    </small>
  </div>
</div>

Explanation

With respect to achieving the text overlay, there’s nothing in this example not in the first. Below I explain the annotations I added; they support (but are not required by) fulfillment of OpenStreetMap/CC-BY-SA attribution and license notice.

xmlns:cc declares the cc: prefix, and even that may be superfluous, given cc: as a default prefix.

about sets the subject of subsequent annotations.

small isn’t an annotation, but does now seem appropriate for legal notices, and is usually rendered nicely.

rel="cc:attributionURL" says that the value of the href property is the link to use for attributing the subject. property="cc:attributionName" says that the text (“OpenStreetMap contributors”) is the name to use for attributing the subject. rel="license" says the value of its href property is the subject’s license.
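To make the reading of those attributes concrete, here is a toy lister in Python that pulls out just the three kinds of annotation used in the example. It is emphatically not an RDFa processor (no prefix expansion, no nesting rules); the class name and the sample markup with example.org URLs are made up for illustration, and a real check should use a tool such as those mentioned below.

```python
from html.parser import HTMLParser


class AnnotationLister(HTMLParser):
    """Collect (subject, predicate, value) for the few attributes used above.

    Handles only `about`, `rel` with `href`, and `property` with element text.
    """

    def __init__(self):
        super().__init__()
        self.subject = None
        self.pending_property = None  # property waiting for its element text
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "rel" in attrs and "href" in attrs:
            self.statements.append((self.subject, attrs["rel"], attrs["href"]))
        if "property" in attrs:
            self.pending_property = attrs["property"]

    def handle_data(self, data):
        if self.pending_property and data.strip():
            self.statements.append((self.subject, self.pending_property, data.strip()))
            self.pending_property = None


SAMPLE = '''<div about="http://example.org/map.png">
  <a rel="cc:attributionURL" property="cc:attributionName"
     href="http://example.org/source">Example contributors</a>,
  <a rel="license" href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>
</div>'''

lister = AnnotationLister()
lister.feed(SAMPLE)
for statement in lister.statements:
    print(statement)
```

Each printed tuple corresponds to one sentence of the explanation: the attribution link, the attribution name, and the license, all about the image named in `about`.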

If you’re bad and not using HTTPS-Everywhere (referrer not sent due to protocol change; actually I’m bad for not serving this blog over https), clicking on BY-SA above might obtain a snippet of HTML with credits for others to use. Or you can copy and paste the above code into RDFa Distiller or checkrdfa to see that the annotations are as I’ve said.

Addendum: If you’re reading this in a feed reader or aggregator, there’s a good chance inline CSS is stripped — text intended to overlay images will appear under rather than overlaying images. Click through to the post in order to see the overlays work.

End of the 2011 world

Saturday, December 31st, 2011


I took the above photo near the beginning of 2011. It has spent most of the year near the top (currently #2) of my photos hosted at Flickr ranked by their interestingness metric. Every other photo in the 200 they rank (sadly I don’t think anyone not logged in as me can see this list) has some combination of being on other people’s lists of favorites, comments, or large number of views. The above photo has none of that. Prior to this post it has only been viewed 33 times by other people, according to Flickr, and I don’t think that number has changed in some time. Their (not revealed) code must find something about the image itself interesting. Is their algorithm inaccurate? In any case the image is appropriate as the world of 2011 is ending, and in 2012 I absolutely will migrate my personal media hosting to something autonomous, as since last year someone (happens to be a friend and colleague) has taken on the mantle of building media sharing for the federated social web.

My employer’s office moved from San Francisco to Mountain View in April, contributing to a number of people leaving or transitioning out, which has been a bummer. I’ve been working exclusively from home since May. Still, there have been a number of good developments, which I won’t attempt to catalog here. My favorites include agreement with the Free Software Foundation regarding use of CC0 for public domain software, small improvements in the CC legal user interface, the return and great work of a previous colleague, retirement of two substandard licenses, research, and a global summit/launch of a process toward version 4.0 of the CC licenses, which I hope over the next year prove at least a little bit visionary, long-standing, and have some consideration for how they can make the world a better place.

Speaking of which, I’ve spent more time thinking about social science-y stuff in 2011 than I have in at least several years. I’ll probably have plenty to say regarding this on a range of topics next year, but for now I’ll state one narrow “professionally-related” conclusion: free/libre/open software/culture/etc advocates (me included) have done a wholly inadequate job of characterizing why our preferences matter, both to the general public and to specialists in every social science.

Apart from silly peeves, two moderate ideas unrelated to free/libre/open stuff that I first wrote about in 2011 and I expect I’ll continue to push for years to come: increasing the minimum age and education requirement for soldiers and tearing down highway 980.

I haven’t done much programming in several years, and not full time in about a decade. This has been making me feel like my brain is rotting, and contributes to my lack of prototyping various services that I want to exist. Though I’d been fiddling (that may be generous) with Scala for a couple years, I was never really super excited about tying myself to the JVM. I know and deeply respect lots of people who are doing great things with Python, and I’ve occasionally used it for scripts over the past several years because of that, but it leaves me totally non-enthused. I’ve done enough programming in languages that are uglier but more or less the same; time for something new. For a couple months I’ve been learning and doing some prototyping using the Yesod web framework (apparently I had heard of Haskell in 2005 but I didn’t look at it closely until last year). I haven’t made as much progress as I’d like, mostly due to unrelated distractions. The biggest substantive hurdle has not been Haskell (and the concepts it stands for), but a lack of Yesod examples and documentation. This seems to be a common complaint. Yesod is rapidly moving to a 1.0 release, documentation is prioritized, and I expect to be really productive with it over the coming year. Thanks to the people who make Yesod and those who have been making Haskell for two decades.

This year I appreciated three music projects that I hadn’t paid much attention to before, much to my detriment: DNA, Moondog, and especially Harry Partch. I also listened a lot again to one of my favorite bands I discovered in college, Violence and the Sacred, which amazingly has released some of its catalog under the CC BY-SA license. Check them out!

Finally, in 2011 I had the pleasure of getting to know just a little bit some people working to make my neighborhood a better place, attending a conference with my sister, seeing one of my brothers start a new job and the other a new gallery, and, with my wife, continuing to grow up (in that respect, the “better half” cliche definitely applies). Now for this world to end!

Mozilla $300m/year for freedom

Thursday, December 22nd, 2011

More Mozilla ads by Henrik Moltke / CC BY

Congratulations to Mozilla on their $300m/year deal with Google, which will more than double current annual revenue. I’ve always thought people predicting doom for Mozilla if Google failed to renew were all wrong: others would be happy to pay for the default search position (probably less, since Microsoft, Yahoo, and others make less than Google per ad view, but it’d still be a very substantial amount), and the linked article hints that a Microsoft bid drove the price up.

There’s always a risk that Mozilla won’t spend the money well, but I’m pretty confident that they will. Firefox is excellent, and in 2011 has gotten more excellent, faster, and I think many of the other projects they’re doing are really important, and on the right track (insofar as I’m qualified to discern, which is not much), for example BrowserID. Even in small and hopelessly annoying things, like licensing, I think Mozilla is doing good. (Bias: Mozilla has donated to my employer.)

I’m no longer enthused about the possibility of huge resources for progress toward Wikimedia’s vision from advertising on Wikipedia. Since I was last on that bandwagon, it has become even less of a possibility in anything but the distant future: Wikimedia’s donation campaigns have gone very well, adequately funding its operating mission, and lack of advertising has become even more part of Wikimedia’s messaging; I’ve also become more concerned (not in particular to Wikimedia) about the institutional corruption risks previously blogged by Peter McCluskey and Timothy B. Lee. (Note these objections don’t apply to Mozilla: its significant revenue has always been advertising-based; very roughly its revenues are already 10x those of Wikimedia’s; and it is also building up an individual donor program, which I agree is often the healthiest revenue for a nonprofit.)

But I still very much think freedom needs massive, ongoing resource infusions, in the right institutional framework. I celebrate the tremendous benefits the FLOSS community achieves without massive, concentrated, ongoing resource infusions, but I also admit that the web likely would be much worse, much less webby, and much less free without concentrated resources at Mozilla over the last several years.

Thank you Mozillians, and congratulations. I have very high expectations for your contributions over the next years to the web and society, in particular where more freedom and security are obviously needed such as mobile and software services. Such would be just a start. As computation permeates everything, and digital freedom becomes the most important political issue, the resources of many Mozillas are needed. More on that, soon.

Creative Commons hiring CTO

Monday, July 11th, 2011

See my blog post on the CC site for more context.

Also thanks to Nathan Yergler, who held the job for four years. I really miss working with Nathan. His are big shoes to fill, but also his work across operations, applications, standards, and relationships set the foundation for the next CTO to be very successful.

5 years of posts as wordles

Saturday, January 3rd, 2009

Composition of wordles / CC BY

Unsatisfying, or perhaps this blog is just that uninteresting. Code used to produce yearly wordlists. Some possible improvements:

  • Rewrite as WordPress plugin OR abstract from WordPress
  • Case insensitivity
  • Suppress common words (used Wordle menu for this, but it isn’t very aggressive), perhaps using a word frequency dataset
  • Use free software alternative to Wordle to generate wordclouds (suggestions?)
  • Automate generation of wordclouds (very difficult using Wordle, would involve browser automation, thus previous bullet)
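Two of these improvements, case insensitivity and suppressing common words, are small enough to sketch. This is not the code linked above; the function name is hypothetical and the stopword list is a tiny placeholder for a real word frequency dataset.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a word frequency dataset.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def word_counts(text: str, stopwords=STOPWORDS) -> Counter:
    """Count words case-insensitively, skipping stopwords.

    Only ASCII letters and straight apostrophes are treated as word characters.
    """
    words = re.findall(r"[a-z']+", text.casefold())
    return Counter(word for word in words if word not in stopwords)


counts = word_counts("The Commons and the commons: a Commons of code.")
print(counts.most_common(2))
```

The resulting counts could be fed to any wordcloud generator, which would also help with the automation bullet since the filtering no longer depends on Wordle's menu.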

I started doing this in part to see five years of topic changes on this blog, but mostly because if it worked well, I’d use it on the Creative Commons blog, which is a 6+ year mass of around 2,500 almost completely uncategorized/untagged posts. In that vein, I intend to look into automated term extraction and user tagging code.

Uberfact

Monday, February 18th, 2008

There are a number of fun things about a sketch of Uberfact: the ultimate social verifier. The first is that the post could be written without mentioning . The second is that the proposed project is a nice would-be example of political desires sublimated entirely into creating useful and voluntary tools. Third, Mencius Moldbug is a fun writer.

Something like Uberfact should absolutely be built, though I’m far from certain it would hit a sweet spot. It may be too decentralized or too centralized or both. All points from enhancing Wikipedia to the Semantic Web (with Uberfact somewhere between) are complementary and well worth pursuing, particularly if that pursuit displaces malinvestment in politics.

Relatedly, but no time to explain why:

Steps toward better software and content

Saturday, December 1st, 2007

The Wikimedia Foundation board has passed a resolution that is a step toward Wikipedia migrating to the Creative Commons Attribution-ShareAlike license. I have an uninteresting interest in this due to working at Creative Commons (I do not represent them on this blog), but as someone who wants to see free knowledge “win” and achieve revolutionary impact, I declare this an important step forward. The current fragmentation of the universe of free content along the lines of legally incompatible but similar in spirit licenses delays and endangers the point at which that universe reaches critical mass — when any given project decides to use a copyleft license merely because then being able to include content from the free copyleft universe makes that decision make sense. This has worked fairly well in the software world with the GPL as the copyleft license.

Copyleft was and is a great hack, and useful in many cases. But practically it is a major barrier to collaboration in some contexts and politically it is still based on censorship. So I’m always extremely pleased by any expansion of the public domain. There could hardly be a more welcome expansion than Daniel J. Bernstein’s release of his code (most notably ) into the public domain. Most of the practical benefit (including his code in free software distributions) could have been achieved by release under any free software license, including the GPL. But politically, check out this two minute video of Bernstein pointing out some of the problems of copyright and announcing that his code is in the public domain.

Bernstein (usually referred to as ‘djb’) also recently doubled the reward for finding a security hole in qmail to US$1,000. I highly recommend his Some thoughts on security after ten years of qmail 1.0, also available as something approximating slides (also see an interesting discussion of the paper on cap-talk).

gOS: the web takes and gives

Saturday, November 24th, 2007

I imagine thousands of bloggers have commented on gOS, a Linux distribution featuring shortcuts to Google web applications on the desktop and preloaded on a PC sold (out) for $200 at Wal-Mart. Someone asked me to blog about it and I do find plenty interesting about it, so thus this post.

I started predicting that Linux would take over the desktop in 1994 and stopped predicting that a couple years later. The increasing dominance of web-based applications may have me making that prediction again in a couple more years, and gOS is a harbinger of that. Obviously web apps make users care less about what desktop operating system they’re running — the web browser is the desktop platform of interest, not the OS.

gOS also points to a new and better (safer) take on a PC industry business model — payment for placement of shortcuts to web applications on the desktop (as opposed to preloading a PC with crapware) — although as far as I know Google isn’t currently paying anything to the gOS developers or , which makes the aforementioned cheap PC.

This is highly analogous to the Mozilla business model with a significant difference: distribution is controlled largely by hardware distributors, not the Mozilla/Firefox site, and one would expect end distributors to be the ones in a position to make deals with web application companies. However, this difference could become muted if lots of hardware vendors started shipping Firefox. This model will make the relationship of hardware vendors to software development, and particularly open source, very interesting over the next years.

One irony (long recognized by many) is that while web applications pose a threat to user freedoms gained through desktop free and open source software, they’ve also greatly lowered the barriers to desktop adoption.

By the way, the most interesting recent development in web application technology: Caja, or Capability Javascript.