Post Creative Commons

Web data formats Δ 2009-2012

Saturday, March 31st, 2012

In January I used WebDataCommons’ first data publication (triples, or actually quads, since the URL at which each triple was found is retained, extracted from a small subset of a 2009-2010 Common Crawl corpus) to analyze use of Creative Commons-related metadata. On March 23 WDC announced a much, much larger data publication: all triples extracted from the entire Common Crawl 2009/2010 and 2012 corpora.

I’m not going to have time to perform a similar Creative Commons-related analysis on these new datasets (I really hope someone else does; perhaps see my previous post for inspiration) but aggregate numbers about formats found by the two extractions are interesting enough, and not presented in an easy to compare format on the WDC site, for me to write the remainder of this post.

Notes:

  • 2009/10 data taken from https://s3.amazonaws.com/webdatacommons/stats/stats.html and 2012 data taken from https://s3.amazonaws.com/webdatacommons-2/stats/stats.html. Calculated values italicized. Available as a spreadsheet.
  • The next points, indeed all comparisons, should be treated with great skepticism — it is unknown how comparable the two Common Crawl corpora are.
  • Publication of structured data on the web is growing rapidly.
  • Microdata barely existed in 2009/2010, so it is hardly surprising that it has grown tremendously.
  • Overall microformats adoption seems to have stagnated, but microformats still hold the vast majority of extracted data. It is possible, however, that the large decreases in hlisting and hresume use and the increase in hrecipe use are due to one or a few large sites, or to CMSes run by many sites. This (indeed everything) bears deeper investigation. What about deployment of microformats-2 with prefixed class names, which I don’t think would be matched by the WebDataCommons extractor?
  • Perhaps the most generally interesting item below doesn’t bear directly on HTML data — the proportion of URLs in the Common Crawl corpora parsed as HTML declined by 4 percentage points. Is this due to more non-HTML media or more CSS and JS?
2009/10 2012 Change (% points or per URL)
Total Data (compressed) 28.9 Terabyte 20.9 Terabyte
Total URLs 2,804,054,789 1,700,611,442
Parsed HTML URLs 2,565,741,671 1,486,186,868
Domains with Triples 19,113,929 65,408,946
URLs with Triples 251,855,917 302,809,140
Typed Entities 1,546,905,880 1,222,563,749
Triples 5,193,276,058 3,294,248,652
% Total URLs parsed as HTML 91.50% 87.39% -4.11%
% HTML URLs with Triples 9.82% 20.37% 10.56%
Typed Entities/HTML URL 0.60 0.82 0.22
Triples/HTML URL 2.02 2.22 0.19
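The calculated rows of the summary table can be reproduced from the raw counts. What follows is a minimal sketch with the counts copied from the table; the dictionary keys and the function name `derived` are my own labels for illustration, not anything published by WDC.

```python
# Raw counts copied from the summary table (2009/10 and 2012 extractions).
RAW = {
    "2009/10": {
        "total_urls": 2_804_054_789,
        "html_urls": 2_565_741_671,
        "urls_with_triples": 251_855_917,
        "typed_entities": 1_546_905_880,
        "triples": 5_193_276_058,
    },
    "2012": {
        "total_urls": 1_700_611_442,
        "html_urls": 1_486_186_868,
        "urls_with_triples": 302_809_140,
        "typed_entities": 1_222_563_749,
        "triples": 3_294_248_652,
    },
}


def derived(counts):
    """Return the four calculated rows for one extraction."""
    html = counts["html_urls"]
    return {
        "pct_urls_parsed_as_html": 100 * html / counts["total_urls"],
        "pct_html_urls_with_triples": 100 * counts["urls_with_triples"] / html,
        "typed_entities_per_html_url": counts["typed_entities"] / html,
        "triples_per_html_url": counts["triples"] / html,
    }


old, new = derived(RAW["2009/10"]), derived(RAW["2012"])
for key in old:
    # The change column is a plain difference: percentage points for the
    # two percentage rows, and a per-URL difference for the two ratio rows.
    print(f"{key:30s} {old[key]:8.2f} {new[key]:8.2f} {new[key] - old[key]:+8.2f}")
```

The change column is a simple difference between the two extractions, which is why its header reads "% points or per URL" rather than a relative percentage.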
2009/10 Extractor Domains with Triples URLs with Triples Typed Entities Triples % HTML URLs % HTML URLs with Triples % Domains with Triples % Typed Entities % Triples
html-rdfa 537,820 14,314,036 26,583,558 293,542,991 0.56% 5.68% 2.81% 1.72% 5.65%
html-microdata 3,930 56,964 346,411 1,197,115 0.00% 0.02% 0.02% 0.02% 0.02%
html-mf-geo 244,838 5,051,622 7,326,516 28,831,795 0.20% 2.01% 1.28% 0.47% 0.56%
html-mf-hcalendar 226,279 2,747,276 21,289,402 65,727,393 0.11% 1.09% 1.18% 1.38% 1.27%
html-mf-hcard 12,502,500 83,583,167 973,170,050 3,226,066,019 3.26% 33.19% 65.41% 62.91% 62.12%
html-mf-hlisting 31,871 1,227,574 25,660,498 88,146,122 0.05% 0.49% 0.17% 1.66% 1.70%
html-mf-hresume 10,419 387,364 1,501,009 12,640,527 0.02% 0.15% 0.05% 0.10% 0.24%
html-mf-hreview 216,331 2,836,701 8,234,850 84,411,951 0.11% 1.13% 1.13% 0.53% 1.63%
html-mf-species 3,244 25,158 152,621 391,911 0.00% 0.01% 0.02% 0.01% 0.01%
html-mf-hrecipe 13,362 115,345 695,838 1,228,925 0.00% 0.05% 0.07% 0.04% 0.02%
html-mf-xfn 5,323,335 37,526,630 481,945,127 1,391,091,386 1.46% 14.90% 27.85% 31.16% 26.79%
html-mf-* (microformats total): Typed Entities 1,519,975,911; Triples 4,898,536,029; % Typed Entities 98.26%; % Triples 94.32%
2012 Extractor Domains with Triples URLs with Triples Typed Entities Triples % HTML URLs % HTML URLs with Triples % Domains with Triples % Typed Entities % Triples
html-rdfa 16,976,232 67,901,246 49,370,729 456,169,126 4.57% 22.42% 25.95% 4.04% 13.85%
html-microdata 3,952,674 26,929,865 90,526,013 404,413,915 1.81% 8.89% 6.04% 7.40% 12.28%
html-mf-geo 897,080 2,491,933 4,787,126 11,222,766 0.17% 0.82% 1.37% 0.39% 0.34%
html-mf-hcalendar 629,319 1,506,379 27,165,545 65,547,870 0.10% 0.50% 0.96% 2.22% 1.99%
html-mf-hcard 30,417,192 61,360,686 865,633,059 1,837,847,772 4.13% 20.26% 46.50% 70.80% 55.79%
html-mf-hlisting 69,569 197,027 8,252,632 20,703,189 0.01% 0.07% 0.11% 0.68% 0.63%
html-mf-hresume 9,890 20,762 92,346 432,363 0.00% 0.01% 0.02% 0.01% 0.01%
html-mf-hreview 615,681 1,971,870 7,809,088 50,475,411 0.13% 0.65% 0.94% 0.64% 1.53%
html-mf-species 4,109 14,033 139,631 224,847 0.00% 0.00% 0.01% 0.01% 0.01%
html-mf-hrecipe 127,381 422,289 5,516,036 5,513,030 0.03% 0.14% 0.19% 0.45% 0.17%
html-mf-xfn 11,709,819 26,004,925 163,271,544 441,698,363 1.75% 8.59% 17.90% 13.35% 13.41%
html-mf-* (microformats total): Typed Entities 1,082,667,007; Triples 2,433,665,611; % Typed Entities 88.56%; % Triples 73.88%
2009/10 – 2012 Change (% points)
Extractor % HTML URLs % HTML URLs with Triples % Domains with Triples % Typed Entities % Triples
html-rdfa 4.01% 16.74% 23.14% 2.32% 8.20%
html-microdata 1.81% 8.87% 6.02% 7.38% 12.25%
html-mf-geo -0.03% -1.18% 0.09% -0.08% -0.21%
html-mf-hcalendar -0.01% -0.59% -0.22% 0.85% 0.72%
html-mf-hcard 0.87% -12.92% -18.91% 7.89% -6.33%
html-mf-hlisting -0.03% -0.42% -0.06% -0.98% -1.07%
html-mf-hresume -0.01% -0.15% -0.04% -0.09% -0.23%
html-mf-hreview 0.02% -0.48% -0.19% 0.11% -0.09%
html-mf-species 0.00% -0.01% -0.01% 0.00% 0.00%
html-mf-hrecipe 0.02% 0.09% 0.12% 0.41% 0.14%
html-mf-xfn 0.29% -6.31% -9.95% -17.80% -13.38%
html-mf-* (microformats total): % Typed Entities -9.70%; % Triples -20.45%
2009/10 – 2012 Change (%%, that is, relative change in each percentage)
Extractor % HTML URLs % HTML URLs with Triples % Domains with Triples % Typed Entities % Triples
html-rdfa 718.95% 294.55% 822.40% 134.99% 144.98%
html-microdata 81515.61% 39220.30% 29290.79% 32965.42% 53156.82%
html-mf-geo -14.84% -58.97% 7.07% -17.33% -38.64%
html-mf-hcalendar -5.34% -54.39% -18.73% 61.45% 57.22%
html-mf-hcard 26.74% -38.94% -28.91% 12.55% -10.19%
html-mf-hlisting -72.29% -86.65% -36.21% -59.31% -62.97%
html-mf-hresume -90.75% -95.54% -72.26% -92.22% -94.61%
html-mf-hreview 20.01% -42.18% -16.83% 19.99% -5.73%
html-mf-species -3.70% -53.61% -62.99% 15.76% -9.55%
html-mf-hrecipe 532.05% 204.50% 178.58% 903.02% 607.21%
html-mf-xfn 19.63% -42.36% -35.72% -57.13% -49.94%
html-mf-* (microformats total): % Typed Entities -9.87%; % Triples -21.68%
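
The two change tables differ only in how the difference is expressed: percentage points (new share minus old share) versus relative change (that difference divided by the old share). A small sketch for the "% HTML URLs" column, using three extractors as examples; the names `share` and `changes` are hypothetical helpers of mine, and the relative figures appear to match the published ones only when computed from unrounded shares.

```python
# Counts copied from the per-extractor tables ("URLs with Triples" column)
# and from the summary table ("Parsed HTML URLs" row).
HTML_URLS = {"2009/10": 2_565_741_671, "2012": 1_486_186_868}
URLS_WITH_TRIPLES = {
    "html-rdfa": {"2009/10": 14_314_036, "2012": 67_901_246},
    "html-microdata": {"2009/10": 56_964, "2012": 26_929_865},
    "html-mf-hcard": {"2009/10": 83_583_167, "2012": 61_360_686},
}


def share(extractor, crawl):
    """Percentage of parsed HTML URLs in which the extractor found triples."""
    return 100 * URLS_WITH_TRIPLES[extractor][crawl] / HTML_URLS[crawl]


def changes(extractor):
    """Return (percentage-point change, relative change in percent)."""
    old, new = share(extractor, "2009/10"), share(extractor, "2012")
    return new - old, 100 * (new - old) / old


for name in URLS_WITH_TRIPLES:
    points, relative = changes(name)
    print(f"{name:16s} {points:+7.2f} points {relative:+10.2f} %")
```

The microdata row shows why both views are useful: a gain of under two percentage points is also a relative increase of several hundred-fold, because the starting share was nearly zero.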

no copyright law in the universe is going to stop me [from demanding compliance with various UN human rights and cultural diversity declarations]

Saturday, March 3rd, 2012

Currently the first autocompletion result upon typing “no copyright” into YouTube’s search is “no copyright law in the universe is going to stop me”, which is apparently a string used in the description of 108 videos on YouTube, and the title of at least one. It seems this phrase is primarily an anti-SOPA expression rather than an admonition to not take down whatever video is described.

Andy Baio pointed out late last year that disclaimers of intent to infringe others’ copyrights and claims of fair use frequently appear in the descriptions of videos on YouTube. He noted 489,000 and 664,000 results for the queries "no copyright" and "copyright" "section 107". Those numbers may have grown significantly in the last nearly 3 months, but should be taken with a huge grain of salt. Yesterday for me, “no copyright” obtained 906,000, while today YouTube has said both 972,000 and 3,850,000 to the same query. For “copyright” “section 107”, yesterday 771,000, today 418,000. I don’t know how many videos were on YouTube 3 months ago, but yesterday an empty query claimed 567,000,000; today I’ve seen 537,000,000 and 550,000,000 — maybe on the order of 1% of videos have some sort of copyright disclaimer. But there are variations that might not be picked up by the queries Baio used, including for example two of the descriptions I posted a few days ago.
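
For what it’s worth, the "order of 1%" guess can be bracketed with the result counts quoted above. This is only a rough sketch: it assumes the two queries never match the same video (surely false, so the sums overstate), and it takes YouTube’s wildly varying result counts at face value.

```python
# Result counts quoted in the paragraph above (yesterday's and today's values).
no_copyright = [906_000, 972_000, 3_850_000]
section_107 = [771_000, 418_000]
all_videos = [567_000_000, 537_000_000, 550_000_000]

# Assumption: no overlap between the two queries, so the counts simply add.
low = (min(no_copyright) + min(section_107)) / max(all_videos)
high = (max(no_copyright) + max(section_107)) / min(all_videos)

print(f"share of videos with a disclaimer: {low:.2%} to {high:.2%}")
```

Both ends of that range fall below 1%, so the guess is best read as a generous ceiling for these two queries, before counting the variations they miss.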

Although they’re probably completely useless in preventing automated takedowns and in court (though it’s not entirely clear they ought be useless in either case), as expression they’re at the very least interesting, and perhaps more. Though they can be seen as “voodoo charms”, so can the ubiquitous “all rights reserved”, and even meaningful public copyright licenses can be seen as such to the extent they are misunderstood or totemic. My main objection to the disclaimers Baio brought attention to is that they’re clutter to the extent they crowd out writing or reading other information about works; and just about anything else is more useful, from provenance to expressions of appreciation, eg “In my opinion, one of the greatest songs of the ’80s.”

But my first reaction to such disclaimers is the wish that they would be more expressive, even substantial. Regarding the latter, in many cases the uploader has added something to or rearranged the work in question — e.g., where the work is a song, the addition of images, or the performance of a cover. How often does the uploader grant permissions to use whatever expression they’ve added? (I don’t know; one aggregate tool for exploring such might be the addition of &creativecommons=1 to the aforementioned queries, which will limit results to those marked as CC-BY.) One fairly well known case of something like this is Girl Talk’s All Day:

All Day by Girl Talk is licensed under a Creative Commons Attribution-Noncommercial license. The CC license does not interfere with the rights you have under the fair use doctrine, which gives you permission to make certain uses of the work even for commercial purposes. Also, the CC license does not grant rights to non-transformative use of the source material Girl Talk used to make the album.

Too bad with the NonCommercial condition, and I really don’t like Girl Talk’s music (for something kind of similar that I prefer aesthetically and in terms of permissions granted, check out xmarx), but otherwise that’s a great statement.

Over the past few months someone or some people have made me aware of another example, one that replaces disclaimers with demands. You can see some of this on my English Wikipedia user talk page (start at “Common IP” — unfortunately webcitation.org doesn’t pass through internal links, so you’ll have to scroll down). It may appear that my correspondent is religious and communicating poorly through automated translation between Russian and English, but there’s a kernel of something interesting there. If I understand correctly, they think that without listening to the Beatles, one cannot develop morally (that comes from elsewhere, not on my talk page) and that per a variety of United Nations declarations concerning human rights and especially cultural diversity, anyone has the legal right and moral duty to share Beatles material. As far as I know they started this campaign at beatles1.ru and moved on to other sites, including Wikipedia. It is pretty clear that they’re not looking for links to beatles1.ru or some other site they control — I think they’re sincerely promoting something they believe in, not a money-making scam.

The flaws in their campaign are legion, not least of which is that there could hardly be a worse body of work than that of the Beatles around which to plead for rights to share in the name of cultural diversity. As the Beatles are one of if not the most popular acts ever, the most obvious conclusion is that more Beatles exposure must lower global cultural diversity. On the related issue of cultural preservation, super-famous material like that of the Beatles is going to survive for a long time in spite of copyright restrictions, even vigorously enforced (see James Joyce).

As to their persistent requests for some kind of permission from me to proceed with their campaign, I say two things:

  1. As far as the copyright regime is concerned, the permissions I have to grant to you are nil.
  2. As far as demands made in the name of human rights, no human requires permission from any other to pursue those. Godspeed to you, or perhaps I should say, Beatlespeed!

I want to thank my correspondent for causing me to look at the and subsequent documents. The way they address “intellectual property”, to the extent that they do, is more curious than I would’ve thought. I leave that to a future post.

p.s. My favorite Beatles.

Pinterest Exclusion Protocol

Tuesday, February 28th, 2012
<meta name="pinterest" content="nopin"/>

Weirdly vendor-specific and coarse at the same time. Will other sites follow this directive, which could mean something like “don’t repost images referenced in this page”, which does differ a bit from:

<meta name="robots" content="noimageindex"/>

Not to mention actually using the Robots Exclusion Protocol, and perish the thought, POWDER, or even annotating individual images with microdata/formats/RDFa.
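
To make the comparison concrete, here is a minimal sketch of how a reposting service or crawler might detect both directives on a page, using only Python’s standard `html.parser`. The class and function names are hypothetical, and what a service should then do with each flag is a policy question the sketch does not answer.

```python
from html.parser import HTMLParser


class MetaDirectiveParser(HTMLParser):
    """Collect comma-separated tokens from selected <meta name=...> tags."""

    def __init__(self):
        super().__init__()
        self.tokens = {"pinterest": set(), "robots": set()}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name in self.tokens:
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.tokens[name].add(token)


def image_directives(html_text):
    """Report whether a page asks not to be pinned or image-indexed."""
    parser = MetaDirectiveParser()
    parser.feed(html_text)
    return {
        "nopin": "nopin" in parser.tokens["pinterest"],
        "noimageindex": "noimageindex" in parser.tokens["robots"],
    }


page = '<html><head><meta name="pinterest" content="nopin"/></head></html>'
print(image_directives(page))
```

Keeping the two flags separate reflects the point above: the vendor-specific directive and the robots one express related but not identical preferences.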

Then there’s the Spam Pinterest Spam Protocol, I mean “pin this” button. I have not been following web actions/activities/intents development beyond the headlines, but please rid us of the so-called NASCAR effect.

Not entirely orthogonal to these vendor-specific exclusion and beg-for-inclusion protocols, are images released under public licenses — not entirely orthogonal as nopin seems to be aimed at countering copyright threats (supplementing DMCA takedown compliance), which public licenses, at least free ones, waive conditionally; and releasing work under a public license is a more general invitation to include.

As far as I can tell Pinterest relies on no public license, and thus complies with no public license condition (ie license notice and attribution). As it probably should not, given its strategy appears to be relying on safe harbors and making it possible for those who want to make an effort to opt-out entirely to do so: public licenses are superfluous. Obviously Pinterest could have taken a very different strategy, and relied on public copyright licenses and public domain materials — at a very high cost: pintereters(?) would need to know about copyright, which is hugely less fun than pinning(?) whatever looks cool.

These (exclusion, inclusion, copyright mitigation strategy) are all fine examples of simple-ugly-but-works vs. general-elegant-but-but-but…

I’m generally a fan of general-elegant-but-but-but, but let’s see what’s good and hopeful about reality:

  • “Don’t repost images referenced in this page” is a somewhat understandable preference; let’s assume people get something out of expressing and honoring it. nopin helps people realize some of this something, using a <meta> tag is not ridiculous, and if widely used, maybe provides some in-the-wild examples to inform more sophisticated approaches.
  • I can’t think of anything good about site-specific “share” buttons. But of the three items in this list, I have by far the highest hope for a general-elegant mechanism “winning” in the foreseeable future.
  • Using copyright exceptions and limitations is crucial to maintaining them, and this is wholly good. Now it’d be nice to pass along the availability of a public license, even if one is not relying on such, as a feature for users who may wish to rely on the license for other purposes, but admittedly providing this feature is not cost-free. But I also want to see more projects and services (preferably also free and federated, but putting that issue aside) that do take the strategy of relying on public licenses (which does not preclude also pushing on exceptions and limitations), as such have rather different characteristics which I think have longer-term and large benefits for collaboration, policy, and preservation.


5 years of version 3.0 of Creative Commons licenses

Saturday, February 25th, 2012


Version 3.0 of the main Creative Commons licenses was released 2007-02-23, that is 5 years and 2 days ago, and has remained current much longer than previous versions* (1.0: 17 months; 2.0: 12 months; 2.5: 20 months). Hopefully the eventual version 4.0 licenses will remain current much longer still.

Probably the most important developments that 3.0 contributed to were the adoption of CC-BY-SA as the primary license used by Wikimedia projects and the use of various CC licenses (most often CC-BY) by governments and other policy-making entities. 3.0 was probably not absolutely necessary and certainly only a small part of these developments, but it surely helped, e.g., by addressing some of the Debian community’s concerns (which overlap with the views of many Wikimedians) and through further internationalization.

Congratulations to everyone who worked on 3.0 5-6 years ago, especially CC’s General Counsel at the time, Mia Garlick, CC affiliates, and especially (and with some overlap) people who participated in public discussions!

* A further bit of trivia: depending on how one ‘counts’ (eg, not if one counts each edit on Wikipedias!), version 2.0 licenses probably remain the most used, as those are baked into Flickr. During 4.0’s long tenure, I very much hope to see new tools and forms which make it the most used CC license version, even if Flickr does not change at all and continues to grow.

Permissions are job 0 for public licenses

Saturday, February 25th, 2012

Copyright permission is the only mechanism that almost unambiguously is required to maximize social value realized from sharing and collaboration around intangible goods (given that copyright exists):

  • Some people think the addition of conditions that are in effect non-copyright regulation is also required, but others disagree, and given widespread ignorance about and noncompliance with copyleft regulation, I put it in the class of probably important (is there anyone conducting serious research around this question?) rather than that of unambiguously required. In any case, current copyleft conditions would be nonsensical if not layered on top of permissions.
  • I’ve heard the argument made that no mechanism is needed: culture aided by the net will route around copyright and other restrictions, just ignore them. I can’t find a good example, but some exhortations and the like of copyheart and kopimi are a subset of the genre. But unless one can make the case that the participation of wealthy litigation targets (any significant organization, from IBM to Wikimedia) is a net negative (and that’s only the first hurdle for such an argument to clear), a mechanism for permissions that appear legally sound to the copyright regime seems unambiguously necessary.
  • There are lots of other real and potential restrictions that it is or may be possible to grant permission around, but so much progress has been made with only copyright permissions explicitly granted, and how other restrictions will play out is so much a matter of speculation, that I put other permissions also in the class of probably important rather than unambiguously required.

Each of these merits much more experimentation and critique, but while any progress on the first two will inevitably be controversial, progress on the third ought be celebrated and demanded. (For completeness’ sake, progressive changes in social policy must also be celebrated and demanded, but are out of scope for this post.) I see few excuses for new licenses and dedications to not aggressively grant every permission that might be possible or needed, nor for new projects to use instruments that are not so aggressive (with the gigantic constraints that use of existing works and the non-existence of perfect instruments impose), nor for communities that vet instruments to give a stamp of approval to such instruments. Indeed, if politics and path dependencies were not issues, such communities ought to push non-aggressive instruments to some kind of legacy status.

In this context I am happy with the outcome of the submission of CC0 to the Open Source Initiative for approval: due to not only lack of, but explicit exclusion of patent permissions, Creative Commons has withdrawn the submission. Richard Fontana’s and Bruce Perens’ contributions to the thread are instructive.

I still think that CC0 is the best thing Creative Commons has ever done. Indeed I think that largely because of the above considerations; I don’t know of an instrument that makes as thorough an attempt to grant permission around all copyright, related, and neighboring restrictions (patents aren’t in any of those classes). I also remain very happy that the Free Software Foundation considers CC0 to be GPL-compatible (I put GPL-incompatibility in a class of avoidable failure separate from but not wholly unlike not granting all permissions that may be possible, unless one is experimenting with some really novel copyleft regulation).

From the OSI submission thread, I also highly recommend Carl Boettiger’s plea for a public domain instrument appropriate for heterogeneous (code/data/other) products. It will (and ought to) take Creative Commons a long time to vet any potential new version of CC0, but fortunately as I’ve pointed out before, there is plenty of room for experimentation with public domain mechanisms, especially around branding (as incompatibility is less of an issue; compare with copyleft (although if one made explicit compatibility a requirement, there is plenty of potentially beneficial exploration to be done there, too)). An example of such that attempts to include a patent waiver is the Ampify Unlicense (background post).

I hope that the CC0/OSI discussion prompts a race to the bottom for public domain instruments, as new ones attempt to carve out every possible permission. This also ought beneficially affect future permissive and copyleft licenses, which also ought grant every permission possible, whatever conditions they layer on top. Note that adding one such permission, around sui generis database restrictions, is probably the most pressing reason for Creative Commons to have started working on version 4.0 of its licenses. I also hope that the discussion leads to increased collaboration and knowledge sharing (at the very least) across domains in which public licenses are used, taking into account Boettiger’s plea and the realities that such licenses are very often used across several domains (a major point of my recent FOSDEM talk, see especially slides 8-11) and that knowledge concerning commons governance is very thin in every domain.

But keep in mind that most of this post concerns very small potential gains relative to merely granting copyright permission (assuming no non-free conditions are present) and even those are quite a niche subject.☻

FOSDEM 2012 and computational diversity

Saturday, February 11th, 2012

I spent day 1 of FOSDEM in the legal devroom and day 2 mostly talking to a small fraction of the attendees I would’ve liked to meet or catch up with. I didn’t experience the thing I find in concept most interesting about FOSDEM: devrooms (basically 1-day tracks in medium sized classrooms) dedicated to things that haven’t been hyped in ~20 years but are probably still doing very interesting things technically and otherwise, eg microkernels and Ada.

Ada has an interesting history that I’d like to hear more about, with the requirement of highly reliable software (I suspect an undervalued characteristic; I have no idea whether Ada has proven successful in this regard, would appreciate pointers) and fast execution (on microbenchmarks anyway), and even an interesting free software story in that history, some of which is mentioned in a FOSDEM pre-interview.

I suppose FOSDEM’s low cost (volunteer run, no registration) and largeness (5000 attendees) allows for such seemingly surprising, retro, and maybe important tracks — awareness of computational diversity is good at least for fun, and for showing that whatever systems people are attached to or are hyping at the moment are not preordained.

I also wanted to mention one lightning talk I managed to see — Mike Sheldon on Libre.fm [update 20120213: video], which I think is one of the most important software projects for free culture — because it facilitates not production or sharing of “content”, but of popularity (I’ve mentioned as “peer production of [free] cultural relevance”). Sheldon (whose voice you can hear on the occasional Libre.fm podcast) stated that GNU FM (the software libre.fm runs) will support sharing of listener tastes across installations, so that a user of libre.fm or a personal instance might tell another instance (say one set up for a local festival) to recommend music that instance knows about based on a significant history. Sounds neat. You can see what libre music I listen to at alpha.libre.fm/user/mlinksva and more usefully get recommendations for yourself.

Addendum: In preemptive defense of this post’s title, of course I realize neither microkernels nor Ada are remotely weird, retro, alternative, etc. and that there are many other not quite mainstream but still relevant and modern systems and paradigms (hmm, free software desktops)…


It started snowing as soon as I arrived in Brussels, and was rather cold.


I got on the wrong train to the airport and got to see the Leuven train station. I made it to the airport half an hour before my flight, and arrived at the gate during pre-boarding. Try that in a US airport.

FOSDEM 2012 Legal Issues DevRoom

Thursday, February 9th, 2012

I attended and spoke at the FOSDEM 2012 Legal Issues DevRoom (Update 20120217: slides, blog posts) organized by Tom Marble, Bradley Kuhn, Karen Sandler, and Richard Fontana. I understand the general idea was to gather people for advanced discussions of free/libre/open source software legal and policy issues, bypassing the “what is copyright?” panel that apparently afflicts such conferences (I haven’t noticed, but don’t go to many FLOSS conferences; I bet presenters usually get the answer only superficially correct). I thought the track mostly succeeded (consider this high praise) — presentations did cover contemporary issues that mostly only people following FLOSS policy would have heard of, but I wished for just a bit more that would be news or really provocative to such people. In part I think 30 minute time slots were to blame — long enough for presenters to belabor background points, short enough for no substantive discussion. Given only 30 minutes, I personally probably would have benefited from a 15 minute speaking limit, thus being forced to state only important points, and leaving a little time for participants to tear those apart. Yes, I should have imposed that discipline on myself, but did not think of it until now.

Philippe Laurent gave an overview of cases involving “Open Licences before European Courts”. He did not list one recent “open content” case, Gerlach vs. DVU.

Ambjörn Elder on “The Methods of FOSS Activism” spoke about political activism; a worthy topic, but I hope for more discussion of activism for software freedom, rather than against ever worse policy.

In place of Armijn Hemel’s “Goes into an Executable? Identifying a Binary’s Sources by Tracing Build Processes” (missed flight) Kuhn and Sandler excerpted from a presentation on, and took questions regarding, nonprofit homes for free software projects. Writing this reminded me to make a donation to Software Freedom Conservancy, of which Kuhn and Sandler are respectively ED and Secretary. Somewhat tangentially, I don’t find the topic boring, but I do find the lack of information, informed-ness (including mine), and tools regarding it boring. I don’t know of any libre documentation on running a nonprofit; I’d love to see a series of FLOSS Manuals on this. OneClickOrgs is a fairly new free software project to handle some aspects of governing a small organization, but I don’t know how useful it is at this point. Related to lack of documentation, some of the Q&A emphasized how little people know of these topics across jurisdictions: never mind rule minutiae, even the existence of relevant “home” organizations.

Dave Neary on “Grey Areas of Software Licensing” questioned whether one could legally do various things, using examples largely drawn from GIMP development. The answer is always maybe. Fortunately developers sometimes take that as yes.

Allison Randal gave an overview of FLOSS history with a focus on legal arrangements in “FLOSSing for Good Legal Hygiene: Stories from the Trenches”.

Michael Meeks on “Risks vs. Benefits on Copyright Assignment” made the case that assignment (and some non-assignment contributor agreements) is harmful to participation, and proprietary re-licensing has not proven a good business, so a corporate sponsored software project ought to either be free (sans assignment and potential for proprietary relicensing) or proprietary, and fully enjoy the benefits of one or the other, rather than neither. He also indicated that permissive licensing can be better than copyleft for a free software project with copyrights held by a corporation, as the former gives all effectively equal rights, while the latter abets proprietary relicensing and ridiculous claims that the corporate sponsor will protect the community. Meeks repeatedly called on the FSF to abandon assignment, as for-profits disingenuously cite FSF’s practice in support of their own (FSF ED John Sullivan responded that they are getting corrections made where FSF practice is inappropriately cited and will work on explaining their practice better). Finally, Meeks requested an “ALGPL” which would require sharing of modified sources used to provide a network service, like the AGPL, but allow modifications that only link to the ALGPL codebase, or the equivalent, to not be shared. I don’t know whether he wants GPL or LGPL behavior if such modifications are distributed. I was somewhat chagrined (but understanding; just not enough time, and maybe nobody submitted a decent proposal) that this was the only¹ discussion of network services!

Loïc Dachary on “Can for-profit companies enforce copyleft without becoming corrupt like MySQL AB?” said yes, if they aren’t the sole copyright holders; on projects he is hired to work on, he seeks out additional contributors who will hold copyright independently.

John Sullivan in “Is copyleft being framed?” presented some new data, apparently replicable (based on Debian package metadata), showing that GPL-family licenses are used in the vast majority (did I hear 87%?) of Debian packages. Update 20120217: I did hear 87%, in 2009, and 93% in 2011. Note some software available under multiple licenses. Slides.

Richard Fontana on “The (possible) decline of the GPL, and what to do about it” suggested the need to start thinking about GPLv4, but I’m not sure for what issues². That doesn’t matter; if the particulars of licenses can make a big difference, requirements for the next version of important ones should always be a relevant topic, even if there is no expectation of creating another version for many years. Fontana also indicated that perhaps the next (massively adopted, presumably) copyleft might not be created by an existing steward³ (meaning the FSF, or obviously CC in many non-software fields), which I take as an indication that license innovation is possibly more important than compatibility and non-proliferation.

I don’t remember much of panels with Hugo Roy, Giovanni Battista Gallus, Bradley Kuhn, Richard Fontana on application stores and Ciarán O’Riordan, Benjamin Henrion, Deb Nicholson, Karen Sandler on software patents, as I was probably preparing for my talk, but I trust that free software is still important if mode of delivery changes slightly and that software patents ought be abolished.

I spoke on “⊂ (FLOSS legal/policy ∩ CC [4.0])” (slides: odp, pdf, slideshare). Contrary to my apology I didn’t blog much of the talk beforehand. I will get to all of the topics eventually.

Most of the slides from the day should be available soon on the DevRoom’s page. Some audio might be available as well eventually.

Kuhn demonstrated his qualifications for another fallback career: crowd control. Fontana blogged a summary of the devroom. Sandler gave the most important talk on FLOSS policy (but not at FOSDEM). Marble apparently did almost all the organizing. Thanks to all! There will be another legal/policy devroom next year.

Addendum 20120210: Richard Fontana offered these corrections:

1“re network services, I mentioned rise as factor in possible GPL decline, coupled with AGPL pwned by dual-license hucksters”

2“main reason for GPLv4 right now is GPLv3 is needlessly complex, limiting popularity of strong copyleft.”

3“growing concern that anti-license-proliferationism concentrates power in privileged Establishment organizations”

8 year Refutation Blog

Saturday, February 4th, 2012

I first posted to this blog exactly 8 years ago, after a few years of dithering over which blog software to use (WordPress was the first that made me not feel like I had to write my own; maybe I was waiting for 1.0, released January 2004).

A little over two years ago I had the idea for a “refutation blog”: after some number of years, a blogger might attempt to refute whatever they wrote previously. In some cases they may believe they were wrong and/or stupid; in all cases, every text and idea is worthy of all-out attack, given enough resources to carry out such, and the passing of time might allow attacks to be carried out a bit more honestly. I have little doubt this has been done before, and analogously for pre-blog forms; I’d love pointers.

The last two Februaries have passed without adequate time to start refuting. In order to get started (I could also write software to manage and render refutations, and figure out what vocabulary to use to annotate them, and though that is unlikely I might in the fullness of time, but I won’t accept it as an excuse for years more of delay right now) I’m lowering my sights from “all-out attack” to a very brief attack on the substance of a previous post, and will do my best to avoid snarky asides.

I have added a refutation category. I will probably continue non-refutation posts here (and hope to refute those 8 years after posting). I may eventually move my current blogging or something similar to another site.

Back to that first post, See Yous at Etech. “Alpha geeks” indeed. With all the unintended at the time, but fully apparent in the name, implication of status seeking and vaporware over deep technical substance and advancement. The “new CC metadata-enhanced application” introduced there was a search prototype. The enhancement was a net negative. Metadata is costly, and usually crap. Although implemented elsewhere since then, as far as I can tell a license filter added to text-based search has never been very useful. I never use it, except as a curiosity. I do search specific collections, where metadata, including license, is a side effect of other collection processes. Maybe as and if sites automatically add annotations to curated objects, aggregation via search with a license and other filters will become useful.

Copyleft regulates

Tuesday, January 31st, 2012

Copyleft as a pro-software-freedom regulatory mechanism, of which more are needed.

Existing copyleft licenses include conditions that would not exist (unless otherwise implemented) if copyright were abolished. In other words, copyleft does not merely neutralize copyright. But I occasionally1 see claims that copyleft merely neutralizes copyright.

A copyleft license which only neutralized copyright would remove all copyright restrictions on only one condition: that works building upon a copyleft licensed work (usually as “adaptations” or “derivative works”, though other scopes are possible) be released under terms granting the same freedoms. Existing copyleft licenses have additional conditions. Here is a summary of some of those added by the most important (and some not so important) copyleft licenses:

License Provide modifiable form2 Limit DRM Attribution Notify upstream3
BY-SA y y
FDL y y y
EPL y y
EUPL y y
GPL (including LGPL and AGPL) y y
LAL y
MPL (and derivatives) y y
ODbL y y y
OFL y
OSL y y
OHL y y y

I’ve read each of the above licenses at some point, but could easily misremember or misunderstand; please correct me.

There’s a lot more variation among them than is captured above, including how each condition is implemented. But my point is just that these coarse conditions would not be present in a purely copyright neutralizing license. To answer two obvious objections: “attribution”4 in each license above goes beyond the bare minimum license notice that would be required to satisfy the condition of releasing under sufficient terms, and “limit DRM” refers only to conditions prohibiting DRM or requiring parallel distribution (which all of those requiring modifiable form do in a way, indirectly; I’ve only called out those that explicitly mention DRM), not permissions5 granted to circumvent.

I’m not sure there’s a source for the idea that copyleft only neutralizes copyright. Probably it is just an intuitive reading of the term that has been arrived at independently many times. The English Wikipedia article on copyleft doesn’t mention it, and probably more to the point, none of the main FSF articles on copyleft do either. The last includes the following:

Proprietary software developers use copyright to take away the users’ freedom; we use copyright to guarantee their freedom. That’s why we reverse the name, changing “copyright” into “copyleft.”

Copyleft is a way of using of the copyright on the program. It doesn’t mean abandoning the copyright; in fact, doing so would make copyleft impossible. The “left” in “copyleft” is not a reference to the verb “to leave”—only to the direction which is the inverse of “right”.

Copyleft is a general concept, and you can’t use a general concept directly; you can only use a specific implementation of the concept.

This is very clear — the point of copyleft is to promote and protect (“guarantee” is an exaggeration) users’ freedom, and that includes their access to source. The major reason I like to frame copyleft as regulation6 is that if access to source is important to software freedom (or otherwise socially valuable), it probably makes sense to look for additional regulatory mechanisms which might (and appreciate ones that do) contribute to promoting and protecting access to source, as well as other aspects of software freedom. Such mechanisms mostly aren’t/wouldn’t be “copyleft” (though at this point, some of them would simply mandate a copyleft license), but the point is not a relationship with copyright, but promoting and protecting software freedom.

If software freedom is important, surely it makes sense to look for additional mechanisms to promote and protect it. As others have said, licenses are difficult to enforce and/or few people are interested in doing it, and copyleft can be made irrelevant through independent non-copyleft implementation, given enough desire and resources (which the largest corporations have), not to mention the vast universe of cases in which there is no free software alternative, copyleft or not. I leave description and speculation about such mechanisms for a future post.


1For example, yesterday Rob Myers wrote:

Copyleft is a general neutralization of copyright (rather than a local neutralization, like permissive licences). Nothing more.

Only slightly more ambiguously, late last year Jason Self wrote:

Copyright gives power to restrict what other people can do with their own copies of things. Copyleft is about restoring those rights: It takes this oppressive law, which normally restricts people and takes their rights away, and make those rights inalienable.

Well said…but not exactly. I point these out merely as examples, not to make fun of Myers, who is one of the sharpest libre thinkers there is, or Self, who as far as I can tell is an excellent free software advocate.

2Note it is possible to have copyleft that doesn’t require source. As far as I know, such only exists in licenses not intended for software. But I think source for non-software is very interesting. The other obvious permutations (a copyleft license for software that does not include a source requirement, and a non-copyleft license that does include a source requirement) are curiosities that do not seem to exist at all, probably for the better, although one can imagine questionable use cases (e.g., self-modifying object code and transparency as the only objective).

3As I’ve mentioned previously, requiring upstream notification likely makes the TAPR OHL non-free/open. But I list the license and condition here because it is an interesting regulation.

4One could further object that one ought to consider so-called “economic” and “moral” aspects of copyright separately, and only neutralize the former; attribution perhaps being the best known and least problematic of the latter.

5Although existing copyleft licenses don’t only neutralize restrictions (one that did would be another curiosity; perhaps the Licence Art Libre/Free Art License currently comes closest), it is important that copyright and other restrictions are adequately neutralized. In particular, modern public software licenses include patent grants, and GPLv3 permits DRM circumvention (made illegal by some copyright-related legislation such as the DMCA), while version 4.0 of CC licenses will probably grant permissions around “sui generis” restrictions on databases. Such neutralization is only counter-regulatory (if one sees copyright as a regulation), not pro-regulatory, as are source and other conditions discussed above.

6Regulation in the broadest sense, including at a minimum typical “government” and “market” regulation, as I’ve said before. By the way, it could be said that those who advocate only permissive licenses are anti-regulatory, and I imagine that if lots of people thought about copyleft as regulation, this claim would be made — but it would be a problematic claim, as permissive licenses don’t do much (or only do so “locally”, as Myers obliquely put it in the quote above) against the background regulation of copyright restrictions.

Someday knowing the ins and outs of copyright will be like knowing the intricate rules of internal passports in Communist East Germany

Thursday, January 26th, 2012

Said Evan Prodromou, who I keep quoting.

I repeat Evan as a reminder and apology. I’ve blogged many times about copyright licenses in the past, and will have a few detailed posts on the subject soon in preparation for a short talk at FOSDEM.

Given current malgovernance of the intellectual commons, public copyright licenses are important for freedom. They’re probably also important trials for post-copyright regulation (meant in the broadest sense, including at least “market” and “government” regulatory mechanisms), e.g., of the ability to inspect and modify complete and corresponding source.

At the same time, the totemic and contentious role copyright licenses (and sometimes assignment or contributor agreements, and sometimes covering related wrongs and patents) play in free/libre/open works, projects, and communities often seems an unfortunate misdirection of energy at best, and probably looks utterly ridiculous to casual observers. I suspect copyright also takes at least some deserved limelight, and perhaps much more, from other aspects of governance, plain old getting things done, and activism around other issues (regarding the first, some good recent writings include those by Simon Phipps and Bradley Kuhn, but the prominence of copyright arrangements therein reinforces my point). But this all amounts to an additional reason it is important to get the details of public copyright licenses right, in particular compatibility between them where it can be achieved, so as to minimize the amount of time and energy projects put into considering and arguing about the options.

Obviously the energy put into public licenses is utterly insignificant against that spent on other copyright/patent/trademark complex activities. But I’m not going to write about that in the near future, so it isn’t part of my apology and rationalization.

Someday I hope that knowing the ins and outs of both Internal Passports of the mind and international passports will be like knowing the rules of internal passports in Communist East Germany (presumably intricate; I did not look for details, but hopefully they exist not many hops from a Wikipedia article on Eastern Bloc emigration and defection).