Post Wikipedia

Call for mini-essays on “the cost of freedom” in free knowledge movements in honor of Bassel Khartabil

Thursday, October 29th, 2015

Dear friends,

I’m helping organize a book titled “The Cost of Freedom” in honor of Bassel Khartabil, a contributor to numerous free/open knowledge projects worldwide and in Syria, where he’s been a political prisoner since 2012, missing and in grave danger since October 3. You can read about Bassel at and lots more at and

Much of the book is going to be created at a face-to-face Book Sprint in Marseille Nov 2-6; some info about that and the theme/title generally at

We’re also asking people like yourself who have been fighting in the trenches of various free knowledge movements (culture, software, science, etc.) to contribute brief essays for inclusion in the book. One form an essay might take is a paragraph on each of:

* An issue you’ve faced that was challenging to you in your free knowledge work, through the lens of “cost”; perhaps a career or time opportunity cost, or the cost of dealing with unwelcoming or worse participants, or the cost of “peeling off layer upon layer the proprietary way of life” as put in
* How you addressed this challenge, or perhaps have yet to do so completely
* Advice to someone starting out in free knowledge; perhaps along the lines of had you understood the costs, what would you have done differently

But feel free to be maximally creative within the theme. We don’t have a minimum or a maximum required length for contributed essays, but especially do not be shy about concision or form. If all we get is haiku that might be a problem, or there might be a message in that of some sort.

Other details: The book will be PUBLISHED on Nov 6. We need your contribution no later than Thursday, Nov 5 at 11:00 UTC (Paris: noon; New York: 6AM; Tokyo: 8PM) to be included. The book will be released under CC0; giving up the “right” to sue anyone for any use whatsoever of your contribution is a cost of entry…or one of those proprietary layers to be peeled back. Send contributions to

Feel free to share this with other people who you know have something to say on this topic. We’re especially looking for voices underrepresented in free knowledge movements.


p.s. Please spread the word about #freebassel even if you can’t contribute to the book!

AcaWiki non-summary

Sunday, October 25th, 2015

Six years ago I helped launch AcaWiki, a site based on Semantic MediaWiki (software for which I had very high expectations, mostly transferred to Wikidata) for summarizing academic research.

A substantial community failed to materialize. I’ve probably been the only semi-consistent contributor over its entire six years. The best contributions have come from Jodi Schneider, who summarized a bunch of papers related to her research on the semantic web and online discourse, Benjamin Mako Hill, who summarized his PhD qualification exam readings, and Nate Matias, who did the same and added a bunch of summaries related to online harassment. Students of an archaeology course taught by Ben Marwick summarized many papers as part of the class. Thank you Jodi, Mako, Nate, Ben, and a bunch of people who have each contributed one or a few summaries.

I’m not going to try to enumerate the deficiencies of AcaWiki here. They boil down to lack of time dedicated to outreach and to improving the site, and zero effort to raise funds to support such work, following a small startup grant obtained by AcaWiki’s founder Neeru Paharia, who has since been busy earning a doctorate and becoming a professor. With Neeru I’ve been the organization’s other long-term director so bear responsibility for this lack of effort. In retrospect dedicating more time to AcaWiki these last years at a cost to non-collaborative activity (e.g., this blog) would have been wise. I haven’t moved to take the other obvious course of shutting down the site, because I still believe something like it is badly needed, not least by me, as I wrote in 2009:

This could be seen as an end-run around access and copyright restrictions (the Open Access movement has made tremendous progress though there is still much to be done), but AcaWiki is a very partial solution to that problem — sometimes an article summary (assuming AcaWiki has one) would be enough, though often a researcher would still need access to the full paper (and the full dataset, but that’s another battle).

More interesting to me is the potential for AcaWiki summaries to increase the impact of research by making it more accessible in another way — comprehensible to non-specialists and approachable by non-speedreaders. I read a fair number of academic papers and many more get left on my reading queue unread. A “human readable” distillation of the key points of articles (abstracts typically convey next to nothing or are filled with jargon) would really let me ingest more.

This has held true even given AcaWiki’s tiny size to date: I regularly look back at summaries I’ve written to remember what I’ve read, and wish I had summarized much more of what I’ve read, because most of it I’ve almost totally forgotten! I recommend summarizing papers even though it is hard.

Much harder still and more valuable are literature reviews. These were envisioned to be a part of AcaWiki, but I now think that every Wikipedia article should effectively be a literature review (and more). A year ago I blogged about an example of Wikipedia article as literature review led by James Heilman. Earlier this year Heilman wrote a call to action around a genre of literature review, Open Access to a High-Quality, Impartial, Point-of-Care Medical Summary Would Save Lives: Why Does It Not Exist? (which of course I summarized on AcaWiki). I have a partially written commentary on this piece but for now I can only urge you to read Heilman, or start with and improve my summary.

This brings me to one of my excuses for not dedicating more time to AcaWiki: hope that it would be superseded by a project directly under the Wikimedia umbrella, benefiting from that organization’s and movement’s scale. But, I’ve done almost nothing to make this happen, either. I imagine the current effort that could lead in that direction is WikiProject Open Signalling OA-ness, as I’ve noted at the top of a page on AcaWiki listing similar projects. By far the best project on the list is Journalist’s Resource, also launched in 2009, with vastly greater resources. The projects listed so far as “similar” must be only the tip of an iceberg of efforts to summarize academic research, for it’s widely recognized (yes, citation needed; I just created a placeholder on AcaWiki for gathering these) that summarization in various forms is valuable and much more is needed.

If this hasn’t been enough of a ramble already, I’ll close with miscellaneous notes about and unsorted to-dos for AcaWiki:

  • Very brief summaries, perhaps 140 characters or not much longer, would be useful complements to longer summaries. It would be easy to add a short summary field to AcaWiki.
  • For summaries of articles which are themselves freely licensed, it might be useful to include the author’s abstract in AcaWiki. Again, it would be easy to add a field.
  • There’s lots of research on automated summarization, some of it producing open source tools. These could be applied to initialize summaries, either for human summaries, or en masse bot summary creation.
  • I have added a field for an article’s Wikidata identifier. AcaWiki is one of a handful of sites potentially using Wikidata for authority control. There will be many more. But it’d be far more useful to do something with that identifier, most obviously to ingest article metadata from Wikidata and create Wikidata items/push metadata to Wikidata where items corresponding to summarized articles do not exist. I’ve not yet seriously looked into how much of this can be currently accomplished using Wikibase Client.
  • Last month there was debate about a program giving some Wikipedia contributors gratis access to closed academic journals. Does this program help improve Wikipedia as a free resource, or promote non-free literature? It must do some of both; which is the bigger impact on long-term free knowledge outcomes probably depends on one’s perspective. My bias is that improving and promoting free resources is vastly more important than suppressing non-free ones. But I also think that free academic summaries could help in both respects. For Wikipedia readers, a reference with an immediately available summary would be more useful than one without. The summary would also reduce the need to access the original non-free article. AcaWiki in its current state is inadequate, but perhaps the debate ought motivate more work on free academic summaries, here or elsewhere.
  • Has any closed access publisher freed only article abstracts (including a free license; abstracts are almost always gratis access)? This would be useful to a site like AcaWiki at the least, especially if abstracts were more consistently useful.
  • Should the scope of AcaWiki be explicitly expanded to include summarizing material that is somehow academic but is not in the form of a peer-reviewed paper published in an academic journal? Some of the summaries I’ve contributed are for books or grey literature.
  • Periodically it’s been suggested to change the default license for AcaWiki summaries from CC-BY to CC-BY-SA. I should add updated thoughts at the link.
  • Some time ago in order to put a stop to the creation of spam accounts, I enabled the ConfirmAccount extension, which forces users who want to contribute to fill out an account request form. I admit this is hugely annoying. I have done zero research into it, but I would love to have an extension which auto-enables account creation based on some external authentication and reputation, e.g., Wikimedia wiki accounts or even users followed/subscribed to/endorsed by existing AcaWiki users on other sites, e.g., social networks.
  • Upgrade site to https when Let’s Encrypt becomes generally available. Alternatively, see if it is possible to move hosting (currently a $10/month Digital Ocean VPS) to Miraheze, which mandates https.
  • I intended to write an update on AcaWiki for Open Access Week (October 19-25). I only realized after beginning that AcaWiki recently turned 6 years old.
  • I’m going to ping the people who have contributed to AcaWiki so far to look at this post and provide feedback. What would it take for them to feel good about recommending others do what they’ve done, e.g., summarizing PhD or research program readings, or assigning their classes to contribute or improve AcaWiki summaries? Or if something else entirely should be done to push forward free summarization of academic literature, what is that something?
  • For some time Fabricatorz did a bit of work on and hosted AcaWiki. From my email correspondence I see that Bassel Khartabil did some of that. As I’ve blogged before (1, 2, 3), Bassel has been detained by the Syrian government since 2012. Recently he has gone missing and presumably is in grave danger. Props to his Fabricatorz and many other friends who have done more to raise awareness of Bassel’s plight than I would have imagined possible when writing those previous posts. See for info and links, and spread the word. I’ll add a note about #freebassel to the AcaWiki home page (which badly needs a general revamp) shortly.

If any of this interests you, get in touch or merely watch for updates on the acawiki-general mailing list, AcaWiki on, Twitter, or Facebook, or blog comments below, or the AcaWiki site.

Democratizing Wikimedia Innovation

Wednesday, May 27th, 2015

Through the end of this month the Wikimedia community is electing 3 members of the Wikimedia Foundation board. You qualify to vote if you’ve made at least 300 edits before April 15 and 20 between October 15 and April 15 to any Wikimedia project.
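The eligibility rule above can be sketched as a small check. This is only an illustration under stated assumptions: the function name, the years attached to the cutoff dates, and the handling of the date boundaries are not from the election rules themselves, which remain the authoritative source.

```python
from datetime import date

# Thresholds as stated in the post.
MIN_TOTAL_EDITS = 300
MIN_RECENT_EDITS = 20

# Assumption: the post gives only month and day; the years are inferred
# from the post's date (May 2015).
RECENT_START = date(2014, 10, 15)
CUTOFF = date(2015, 4, 15)


def qualifies_to_vote(edit_dates):
    """Return True if the edit dates meet both thresholds.

    Assumption: "before April 15" excludes the cutoff day itself, and the
    recent window includes October 15. The real rules may differ.
    """
    total = sum(1 for d in edit_dates if d < CUTOFF)
    recent = sum(1 for d in edit_dates if RECENT_START <= d < CUTOFF)
    return total >= MIN_TOTAL_EDITS and recent >= MIN_RECENT_EDITS
```

Note that both conditions must hold: a long-time editor with no recent activity does not qualify under this reading, and neither does a newcomer with only a few dozen edits.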

If you don’t qualify to vote, it won’t be hard to do so for next time if you get started now: Log in or create an account and be bold when you see a typo, incorrect or missing information in a Wikipedia article. Familiarize yourself with Wikipedia’s sibling projects; edits to any of them count. Play the Wikidata Game. I heartily recommend doing these things as a matter of learning and sharing knowledge regardless of desire to vote in Wikimedia elections or lower threshold and more fun votes such as for the Wikimedia Commons Picture of the Year. The current election is just an excuse for inserting this Public Service Announcement. ;-)

If you do qualify to vote, please do. I voted for Denny Vrandečić and give him the strongest possible endorsement. I also voted for and endorse James Heilman.

The election uses approval/disapproval ratio to determine winners, so disapproval votes are powerful. I made a few but don’t want to publish because frankly all of the candidates are excellent and extremely qualified for a Wikimedia Foundation community board seat.
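To see why disapproval votes are powerful under a ratio rule, here is a minimal sketch. It assumes the score is support / (support + oppose) with neutral votes ignored, which ranks candidates the same way as a plain support/oppose ratio; the candidate labels and tallies are hypothetical, not results from this election.

```python
def approval_ratio(support, oppose):
    """Share of non-neutral votes that were supportive (assumed formula)."""
    total = support + oppose
    return support / total if total else 0.0


def winners(tallies, seats=3):
    """Rank candidates by approval ratio and return the top `seats` names."""
    ranked = sorted(
        tallies,
        key=lambda name: approval_ratio(*tallies[name]),
        reverse=True,
    )
    return ranked[:seats]


# Hypothetical tallies: name -> (support votes, oppose votes).
example_tallies = {
    "A": (1000, 500),  # most support, but also the most opposition
    "B": (800, 200),
    "C": (900, 400),
    "D": (600, 250),
}

print(winners(example_tallies))
```

In this made-up example candidate A receives the most support votes yet misses all three seats, because the 500 oppose votes pull A's ratio below everyone else's.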

[Image: community-centered theory of change]

The central issue in this election is evident in the Candidate statements, discussion, structured Q&A (1, 2, 3, 4), in a series of blog posts by Pete Forsyth (who was briefly a candidate but stepped aside), and outside the context of the current election, in blog posts by Lane Raspberry and Nimish Gautam, and the one message I’ve sent on the issue, which the first paragraph of Vrandečić’s candidate statement sums up:

Wikimedia is a modern wonder – and yet, it must change: most of our projects, as they are today, cannot truly succeed. To achieve our mission, we must increase the effectivity of every single contributor. At the same time, the communities are often seen as change resistant – but falsely so: they do welcome change, done right, as I have shown with Wikidata.

Along these lines, I especially commend Vrandečić’s and Heilman’s answers to the following Q&A topics: Use of Superprotect and respect for community consensus, Retaining current volunteers versus recruiting new ones, Improving content, and Diversity and scope.

It’s commonplace for central organizations (of which I am a fan) to neglect or denigrate communities they serve, whether the relationship is one of collaboration, constituency, or consumption. Sometimes a version of neglect is even the right behavior, e.g., a product or project with some users may need to be EOL’d. But most organizations could do much better. It is essential that the Wikimedia Foundation do so, as the people who edit or otherwise contribute to the various Wikimedia projects are its key competitive advantage. If Wikimedia and other commons-based peer production projects are to stay relevant, never mind helping achieve world liberation, they need to figure out how to become more effective, starting with embracing the idea that most of the vision and innovation needed to do so will come from the community, not the central organizations, and implementation done in partnership with the community.

Unrelated to the community issue, I’ve previously blog cheered Vrandečić’s and Heilman’s work on Wikidata and Wikipedia/medical journal collaboration respectively.

Tangential ex-Wikimedia Foundation links:

I was very sad to read that Erik Moeller recently left the foundation, where he was Deputy Director. Though he seemed to endorse the organization/community vision dichotomy (my one message linked above is a mailing list reply to him), in my view he is perhaps the best example in the Wikimedia universe of community vision: before becoming an employee, he had written about and in many cases prototyped most of the innovations the foundation is still working on implementing many years later.

Moeller has since started a podcast, interviewing another ex-Wikimedia Foundation person, Sumana Harihareswara, for the first episode.

Harihareswara has two recent posts on Crooked Timber, Codes of conduct and the trade-offs of copyleft and Where are the women in the history of open source? I found them both very interesting and left comments.

Former Wikimedia Foundation Executive Director Sue Gardner is now “developing a strategic plan for and with the Tor Project” and separately researching “the broader state of ‘freedom tech’ — all the tools and technologies that enable free speech, free assembly, and freedom of the press.” That’s great news; Tor and other ‘freedom tech’ tools are incredibly exciting and important. But, a moment of critical cheering: as I noted around the time Gardner stepped down as WMF ED, I’m inclined to think that re-routing the knowledge economy is even more important than tools that can route around censorship for a good future. The former is what Wikimedia projects do.

The Killing of Abu Sayyaf (according to unreliable, one-sided, and conflicted sources)

Saturday, May 16th, 2015

Read The Killing of Osama bin Laden or a summary on the English Wikipedia entry for Seymour Hersh.

Then read Abu Sayyaf, an ISIS Leader, Killed in Syria by Special Forces, U.S. Says. The part after the last comma is backed up by the article:

Pentagon officials said
One American military official described
the Pentagon’s description
A Defense Department official said
The official said
(The accounts of the raid came from military and government officials and could not be immediately verified through independent sources.)
officials said
American officials said
The White House rejected initial reports
said Bernadette Meehan, the National Security Council spokeswoman
Defense Secretary Ashton B. Carter said
Officials said
Defense Department officials said
a Defense Department official said
the official said
the official said
the Defense Department official said
Defense Department officials said
officials acknowledged
officials said
Mr. Carter said
the senior United States official said

Why bother to publish this story? Why is the disclaimer of verifiability buried in a parenthetical instead of a banner at the top of the article highlighting multiple issues, a la Wikipedia?

The article closes with a conjecture from a former C.I.A. analyst that anyone could have made.

I’m not complaining about anything new; recently reading the Hersh article made me want to skim the article on the apparent killing of Abu Sayyaf, and the opportunity to update the title of Hersh’s article made me want to write this blog post.

Great Crimes, Again and Again

Friday, April 24th, 2015

Title refers to Medz Yeghern (Armenian: Մեծ Եղեռն, “Great Crime”), a name for the Armenian Genocide (April 24 is remembrance day), and the empty slogan never again.

I recommend the English Wikipedia article on the Armenian Genocide. It’s a good long read; I learned a fair bit from it that should stick with me. I did not realize that the vast majority of Armenians in the Ottoman Empire lived a helot existence (I only knew that there were prominent Armenian elites in the Empire; indeed the remembrance day is the anniversary of the rounding up of 250 Armenian intellectuals in Constantinople), that there was a mass expulsion of Muslims from the Balkans in the years prior to the genocide, that the genocide was widely reported in the West as it was in progress, and that it was witnessed directly by many Germans (allies within the Central Powers), possibly creating a direct line to some elements of the Holocaust.

I’ve only done naive searches for and skimming of genocide prevention material but my general impression is that it all takes an international perspective. That’s necessary and fine, but given how abysmal and nationalistic international governance is (including with regard to remembering genocides), I’d love to read more about how potential perpetrator and victim groups within jurisdictions have attempted to prevent genocide or its direct preconditions. I know when they have failed (documented genocides), but am almost completely ignorant of what attempts have been made, including any that have been successful, and how such attempts might inform the actions of people under threat today. I’m not talking about simplistic hypotheticals (e.g., what if someone killed Hitler before the war), nor heroic actions to save some people during a genocide. I’m wondering, for example, about the extent of Turk liberal and Armenian elite efforts toward equal rights for all, Armenian elite efforts to protect Armenian helots, Armenian helot efforts to organize, and how such efforts could have been made more effective.

Previously regarding the Armenian-Assyrian-Greek genocide.

“Within jurisdictions” implies “improve yours” (in my case, the U.S.), which indeed I take as highly effective and necessary. A few past posts: Stop Killing Them and Invasion Ethics (present), Robot Gang Memorial Day (future), and Independence′ Day (remembrance).

Addendum 20150501: The English Wikipedia Signpost’s traffic report for April 19-25:

And much more sobering, but also in the Report for the first time, is the Armenian Genocide (#10 added: 631,960 views), which commenced 100 years ago this week. Farther down the list on the Top 25, it is worth noting that Adolf Hitler (#23), who famously asked who remembered the Armenian Genocide, also appears in the Top 25 for the first time. While World War II related topics often make the charts, for some reason Hitler himself has not since the Top 25’s debut in January 2013.

Annual thematic doubt 2

Tuesday, February 17th, 2015

My second annual thematic doubt post, expressing doubts I have about themes I blogged about during 2014.

commons ⇄ freedom, equality ⇄ good future

Same as last year, my main topic has been “protecting and promoting intellectual freedom, in particular through the mechanisms of free/open/knowledge commons movements, and in reframing information and innovation policy with freedom and equality outcomes as top.”

Rather than repeating the three doubts I expressed last year under the heading “intellectual freedom” (my evaluation of these has not much changed), I will take the subject from a different angle: the “theory of change” I have been espousing. This theory is not new to me. Essentially it is what attracted me to following the free software movement circa 1990 — its potential of extensive, pro-freedom socio-economic reform from the bottom up. That and wanting to run a unix-like on my computer — a want satisfied without respect to freedom as soon as I could use a Sun workstation at work, and for many years now would have been satisfied by OS X. I never cared very much about being able to read, modify, and share all of the software on my computer — the socio-economic implications of those capabilities make them interesting, to me. The claimed ends of the theory are in the ‘for a good future’ slogan I’ve occasionally used at least since 1998. I occasionally included the theory in blog posts (2006) and presentations (2008). Much of my ‘critical cheering’ last year (doubt) and before has largely been about my perhaps unreasonable wish that ‘free/open’ organizations and movements would take the theory I do and act as I think follows. I could easily be wrong on the theory or best actions it implies. Accordingly, I ratcheted down critical cheering in 2014; hopefully most but not all of what remained was relatively fun or novel. Instead I focused more sharply on the theory, e.g., in Sleepwalking past Freedom’s Commons, or how peer production could increase democracy, equality, freedom, and innovation, all of them!

The theory could be attacked from a number of angles — I’d love to see that done and learn of new vulnerabilities. For example, commons might not significantly affect freedom and equality, these may not be the right values, and one might consider a ‘good future’ to be one with maximum hierarchy, spectacle, even war (I repeatedly argue that future tech and culture will be marvels in most plausible futures, and that is a reason to reject ones that do not have freedom and equality as top values, but also something that makes it hard to see how a future — or present — could be different or better with more knowledge economy/policy-driven freedom and equality). But this isn’t a cheap refutation post (see below) and I don’t have very practical doubts about those values and what they imply constitutes a good future.

But I do have practical doubts about the first leg of the theory. Summary of that leg before getting to doubts: Commons-based knowledge production simultaneously destroys rents dependent on freedom infringing regimes, diminishing the constituency for those regimes, grows the constituency and policy imagination for freedom respecting regimes, and not least, directly increases freedom and equality.


  • Effects could be too small to matter, or properly attributed to generational or other competition among firms, not commons-based production. Consider Wikipedia, a success of commons-based production if there is one. Such success may not be possible in other sectors, especially ones that command top policy attention (drugs and movies), in which case policy imagination has not been increased. The traditional encyclopedia industry was already mostly destroyed by Microsoft Encarta when Wikipedia came along. The encyclopedia industry was not a significant constituency for freedom infringing regimes, so its destruction matters not for future policy. Encyclopedias were readily accessible at libraries, vastly more useful info of the sort found in encyclopedias is accessible online now, excluding Wikipedia, and ‘freedoms’ to modify and distribute are just not relevant to nearly all humans.
  • I claim that the best knowledge policy reform is that which favors commons and that the reforms traditionally proposed by copyright and patent reformers are relatively futile because such proposals if implemented would not significantly change the knowledge economy to produce freedom and equality nor grow the constituencies for such changes — rather they are just about who, how, and for how much the outputs of production under freedom infringing regimes may be used — so-called balance, not the tilt I demand. But perhaps the usual set of reform proposals is the best that can be hoped for, especially given decades of discourse and organization-building around those proposals, and almost none about commons-favoring reform. Further, perhaps the usual set of reform proposals is best without qualification — commons-based production is a culturally marginal (in software; wholly irrelevant in most other sectors) arrangement that ought be totally ignored by policy.
  • Various (sometimes semi-) free/open movements within various sectors (e.g., software, education, research publication) are having some policy successes, without (as far as I know) usually considering themselves to be as or more central to shaping knowledge policy as usual things fitting under ‘copyright reform’ and ‘patent reform’ but this could be just what needs to happen. The important thing is that commons-based knowledge production entities act to further their interests with minimal distance from current policy discourse, not that they have any distracting and possibly discrediting theory about doing so relative to overall knowledge policy.

Only the first of these gives me serious pause, though my discounting the last two might be a matter of (dis)taste — my feeling is that most of the people involved thoroughly identify with the trivia of copyright, patent, and similar law, even if they think those laws need serious reform, and act as if commons-based production is something to be protected from reform in the bad direction, but not at all central. Sadly if my feeling is accurate, the second and third doubts probably ought give me more pause than they do.

Despite these doubts, the potential huge win-win (freedom and equality, without conflict) of reorienting the knowledge economy and policy around commons-based production makes robust discourse (at the least) on this possibility urgent, even if tilt probability is low. One of the things that makes me favor this approach is that reform can be very incremental — indeed, it is by far the most feasible reform of any proposed — we just need a lot more of it. Push-roll towards tilt!

The most damning observation is perhaps that I’m only talking, and mostly on this very blog. I should change my ways, but again, this is not a cheap refutation post.

Software Freedom/Futurism/Science Fantasy

I recently wrote that “it’s much easier to take software freedom as a serious issue of top importance if one has a ‘futurist’ bent. This will also figure in a forthcoming post from me casting doubt on everything in this post and the rest from 2014.”

How important are computers to human arrangements, and how large is the range of plausible computer-involved arrangements, and how much can those realized be changed? Should anyone besides programmers and enthusiasts care about software specifically, any more or less than they care about the conditions under which any tool is created and distributed? (Contrast with other tools would be good here, but I’ll leave for another time.)

The vast majority of people seem to treat software as any other tool — they want it to work as well as possible, and to be as cheap as possible, the only difference being that their intuitions about quality and cost of software may be worse than their intuitions for the quality and cost of, for example, bridges. Arguably nearly everyone has been and perhaps still is correct.

But one doesn’t need to be much of a futurist to see software getting much more important — organizations good at using software ‘eating’ the lunches of those less good at using software, software embedded in everything or designing everything (and anything else being obsolete), regulating and mediating every sort of arrangement — with lots of plausible variation as to how this happens.

Now the doubt: does future-motivated interest in software freedom share more with interest in science fiction (i.e., moralistic fantasy) or with interest in future studies and the many parts of various social sciences that aim to improve systems going forward in addition to understanding current and past ones? If the latter, why is software freedom ignored by all of these fields? Possibly most people who do think software is becoming very important are not convinced that software freedom is an important dimension to consider. If so (I would love to see some kind of a review on the matter) it would be most reasonable to follow the academic consensus (even if it is one of omission; that consensus being that software freedom is not interesting or important enough to investigate) and if one cares about the ethical dimensions of software, focus instead on the ones the consensus says are important.

Two additional posts last year in which I claim software freedom is of outsized and underappreciated importance (of course I don’t usually restrict myself to only software, but consider software a large and growing part of knowledge embodying cumulative innovation, and of the knowledge economy leading to more such accumulation) and some of many from years past (2006, 2006, 2007, 2007). The first from 2006 highlights the most obvious problem with the future. I had forgotten about that post when mentioning displacement of movies by some other form as the height of culture in 2013 — one has to squint to see such displacement even beginning yet. The second isn’t about the future but is closely related: alternative history.

Uncritical Cheering

I feared that many of my posts last year were uncritical cheering (see critical cheering above and last year). Looking back at posts where I’m promoting something, I have usually included or at least hinted at some amount of criticism (e.g., 1 2). I don’t feel too bad. But know that most of the things I promote on my blog are very likely to fail or otherwise be inconsequential — if they were sufficiently mainstream and established they’d be sufficiently covered elsewhere, and I likely wouldn’t bother blogging about them.

One followup: I cheered the publication of the first formally peer-reviewed and edited Wikipedia article in Open Medicine — a journal which has since ceased publishing.

Freeway 980

I continue to blog about removing freeway 980, which cuts through the oldest parts of Oakland. Doubt: I don’t know whether full removal would be better (at least when considering feasibility) than capping the portion of 980 which is below grade. I intended to read about freeway capping, come to some informed opinion, and blog about it. I have not, but supposedly Oakland mayor Libby Schaaf has mentioned removing 980. Hopefully that will spur much more qualified people to publish analyses of various options for my reading pleasure. ConnectOakland is a website dedicated to one removal/fill scenario.


I’m satisfied enough with the doubt in my two posts about Mozilla’s leadership debacle, but I’ll note apparent tension between fostering ideological diversity and shunning people who would deny some people basic freedoms. I think this one was fairly clear cut, but there are doubtless far more difficult cases in the world.

Instead of doubt, I’d like to clarify my intention with two other posts: thought experiment/provocation, serious demand.


I fell further behind, producing no new dedicated collections of refutations of my 8+ year old posts. My very next post will be one, but as with previous such posts, the refutations will be cheap — flippant rather than drilling down on doubts I may have gained over the years. Again these observations (late, cheap) are what led me last year to initiate a thematic doubt post covering the immediately previous year. How was this one?

Happy UTC+0 New Year

Wednesday, December 31st, 2014

With apologies for the projection.

Smattering of followups on mostly-recent posts, posted at 2015-01-01 00:00:00 UTC. Does anyone celebrate UTC+0 New Year except by coincidence of being in UTC+0 time zone? Yes.

Software Freedom Conservancy released a video with me endorsing them (my recent blog endorsement). I self-recorded the footage and acknowledge total videography incompetence, need of a haircut, and need to be still.

PLOS Biology published a perspective by Daniel Mietchen on The Transformative Nature of Transparency in Research Funding. Riffing on his tweet, that’s early theory; practice is the Wikidata for Research proposal that he is leading creation of in the open (my recent blog endorsement).’s one-time crowdfunding campaign (my recent blog endorsement and others) is wrapping up very successfully. Looking forward to seeing launch in early 2015.

Free Software Foundation’s call for input on updating its high priority projects list (my blog post) has resulted in over 100 emails to, most of them very thoughtful and containing numerous suggestions. Some are mirrored in public posts: Antoine Amarilli, Christopher Allan Webber, d3vid seaward, Denver Gingerich, Ingegnue. Please send your feedback! I especially enjoy seeing public posts and explanations of how suggestions are on critical path toward achieving goal of software freedom for everyone.

Speaking of the FSF, they recently released a new video making the case that software freedom is important for everyone. I agree with Christopher Allan Webber’s assessment of good progress. The video also ties into a free software futurist dinner that Webber said raised money for Software Freedom Conservancy, and some statements I make in the video above: I suspect it’s much easier to take software freedom as a serious issue of top importance if one has a “futurist” bent. This will also figure in a forthcoming post from me casting doubt on everything in this post and the rest from 2014 (last year’s version).

There’s some overlap between the above and OpenHatch’s year-end newsletter (my year-ago blog endorsement).

Finally, check out Don Marti’s below the fold announcement about Aloodo, a project to (if I understand correctly) help sites protect themselves from the long-term damage of being associated with pervasive tracking and door-to-door-like incentives (everything to make immediate conversion, nothing to build trust). I still have not gotten around to blogging other ideas for “fixing” online advertising, but very much look forward to seeing how Marti’s project plays out.


Sunday, December 21st, 2014

Recently I’ve uncritically cheered for Wikidata as “rapidly fulfilling” hopes to “turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.” In April I uncritically cheered for Daniel Mietchen’s open proposal for research on opening research proposals.

Let’s combine the two: an open proposal for work toward establishing Wikidata (including its community, data, ontologies, practices, software, and external tools) as a “collaborative hub around research data” responding to a European Commission call on e-infrastructures. That would be Wikidata for Research (WD4R), instigated by Mietchen, who has already assembled an impressive set of partner institutions and an outline of work packages. The proposal is being drafted in public (you can help) and will be submitted January 14.


The proposal will be strong on its own merits, and very well aligned with the stated desired outcomes from the EC call, and the open proposal dogfood angle is also great. I added for all to this post’s title because I suspect WD4R will be great for pushing Wikidata toward realizing aforementioned “universal database” hopes (which again means not just the data, but community, tools, etc.; “virtual research environment” is one catch-all term) and will make Wikidata much more useful for “research” most broadly construed (e.g., by students, journalists, knowledge workers, anyone), potentially much faster than would happen otherwise.

My suspicion has two bases (please correct me if I’m wrong about either):

  1. A database or virtual environment “for research” might give the impression of someplace to dump data from or perform experiments. Maybe that would be appropriate for Wikidata in some instance, but the overwhelming research-supporting use would seem to be mass collaboration in consolidating, annotating, and correcting data and ontologies which many researchers (and researchers-broadly-construed, everyone) can benefit from, either querying or referencing directly, or extracting and using elsewhere. The pre-existing Gene Wiki project which is beginning to use Wikidata is an example of such useful-to-all collections (as referenced in the WD4R pages).
  2. One of the proposed work packages is to identify and work on features needed for research but not on, or not prioritized on, the Wikidata development plan. I suspect other Wikimedia projects can tremendously benefit from Wikidata integration without Wikidata itself or external tools supporting complex queries and reporting that would be called for by a virtual research environment — and also called for to realize “universal database” hopes. Wikidata’s existing plan looks good to me; here I’m just saying WD4R might help it be even better, faster.

The previously linked Gene Wiki post includes:

For more than a decade many different groups have proposed and many have implemented solutions to this challenge using standards and techniques from the Semantic Web. Yet, today, the vast majority of biological data is still accessed from individual databases such as Entrez Gene that make no attempt to use any component of the Semantic Web or to otherwise participate in the Linked Open Data movement. With a few notable exceptions, the data silos have only gotten larger and problems of fragmentation worse.
Now, we are working to see if Wikidata can be the bridge between the open community-driven power of Wikipedia and the structured world of semantic data integration. Can the presence of that edit button on a centralized knowledge base associated with Wikipedia help the semantic web break through into everyday use within our community?

I agree that massive centralized commons-oriented resources are needed for decentralization to progress (link analogous but not the same — linked open data : federation :: data silos : messaging silos).

Check out Mietchen’s latest WD4R blog post and the WD4R project page.

Monday, December 1st, 2014

Last month the Free Software Foundation and Software Freedom Conservancy launched, “a collaborative project to create and disseminate useful information, tutorial material, and new policy ideas regarding all forms of copyleft licensing.” The main feature of the project now is a 157 page tutorial on the GPL which assembles material developed over the past 10 years and a new case study. I agreed to write a first draft of material covering CC-BY-SA, the copyleft license most widely used for non-software works. My quote in the announcement: “I’m glad to bring my knowledge about the Creative Commons copyleft licenses as a contribution to improve further this excellent tutorial text, and I hope that as a whole can more generally become a central location to collect interesting ideas about copyleft policy.”

I tend to offer apologia to copyleft detractors and criticism to copyleft advocates, and cheer whatever improvements to copyleft licenses can be mustered (I hope to eventually write a cheery post about the recent compatibility of CC-BY-SA and the Free Art License), but I’m far more interested in copyleft licenses as prototypes for non-copyright policy.

For now, below is that first draft. It mostly stands alone, but might be merged in pieces as the tutorial is restructured to integrate material about non-GPL and non-software copyleft licenses. Your patches and total rewrites welcome!

Detailed Analysis of the Creative Commons Attribution-ShareAlike Licenses

This tutorial gives a comprehensive explanation of the most popular free-as-in-freedom copyright licenses for non-software works, the Creative Commons Attribution-ShareAlike (“CC-BY-SA”, or sometimes just “BY-SA”) – with an emphasis on the current version 4.0 (“CC-BY-SA-4.0”).

Upon completion of this part of the tutorial, readers can expect to have learned the following:

  • The history and role of copyleft licenses for non-software works.
  • The differences between the GPL and CC-BY-SA, especially with respect to copyleft policy.
  • The basic differences between CC-BY-SA versions 1.0, 2.0, 2.5, 3.0, and 4.0.
  • An understanding of how CC-BY-SA-4.0 implements copyleft.
  • Where to find more resources about CC-BY-SA compliance.

FIXME this list should be more aggressive, but material is not yet present

WARNING: As of November 2014 this part is brand new, and badly needs review, referencing, expansion, error correction, and more.

Freedom as in Free Culture, Documentation, Education…

Critiques of copyright’s role in concentrating power over and making culture inaccessible have existed throughout the history of copyright. Few contemporary arguments about “copyright in the digital age” have not already been made in the 1800s or before. Though one can find the occasional ad hoc “anti-copyright”, “no rights reserved”, or pro-sharing statement accompanying a publication, use of formalized public licenses for non-software works seems to have begun only after the birth of the free software movement and of widespread internet access among elite populations.

Although they have much older antecedents, contemporary movements to create, share, and develop policy encouraging “cultural commons”, “open educational resources”, “open access scientific publication” and more, have all come of age in the last 10-15 years – after the huge impact of free software was unmistakable. Additionally, these movements have tended to emphasize access, with permissions corresponding to the four freedoms of free software and the use of fully free public licenses as good but optional.

It’s hard not to observe that it seems the free software movement arose more or less shortly after it became desirable (due to changes in the computing industry and software becoming unambiguously subject to copyright in the United States by 1983), but non-software movements for free-as-in-freedom knowledge only arose after they became more or less inevitable, and only begrudgingly at that. Had a free culture “constructed commons” movement been successful prior to the birth of free software, the benefits to computing would have been great – consider the burdens of privileged access to proprietary culture for proprietary software through DRM and other mechanisms, toll access to computer science literature, and development of legal mechanisms and policy through pioneering trial-and-error.

Alas, counterfactual optimism does not change the present – but might embolden our visions of what freedom can be obtained and defended going forward. Copyleft policy will surely continue to be an important and controversial factor, so it’s worth exploring the current version of the most popular copyleft license intended for use with non-software works, Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA-4.0), the focus of this tutorial.

Free Definitions

When used to filter licenses, the Free Software Definition and Open Source Definition have nearly identical results. For licenses primarily intended for non-software works, the Definition of Free Cultural Works and Open Definition similarly have identical results, both with each other and with the software definitions which they imitate. All copyleft licenses for non-software works must be “free” and “open” per these definitions.

There are various other definitions of “open access”, “open content”, and “open educational resources” which are more subject to interpretation or do not firmly require the equivalent of all four freedoms of the free software definition. These definitions are not pertinent to circumscribing the concept of copyleft – which is about enforcing all four freedoms, for everyone – but copyleft licenses for non-software works are usually considered “open” per these other definitions, if they are considered at all.

The open access to scientific literature movement, for example, seems to have settled into advocacy for non-copyleft free licenses (CC-BY) on one hand, and acceptance of highly restrictive licenses or access without other permissions on the other. This creates practical problems: for example, nearly all scientific literature either may not be incorporated into Wikipedia (which uses CC-BY-SA) or may not incorporate material developed on Wikipedia – both of which do happen, when the licenses allow it. This tutorial is not the place to propose solutions, but let this problem be a motivator for encouraging more widespread understanding of copyleft policy.

Non-software Copylefts

Copyleft is a compelling concept, so unsurprisingly there have been many attempts to apply it to non-software works – starting with use of GPLv2 for documentation, then occasionally for other texts, and art in various media. Although the GPL was and is perfectly usable for any work subject to copyright, several factors were probably important in preventing it from being the dominant copyleft outside of software:

  • the GPL is clearly intended first as a software license, thus requiring some perspective to think of applying to non-software works;
  • the FSF’s concern is software, and the organization has not strongly advocated for using the GPL for non-software works;
  • further due to the (now previous) importance of its hardcopy publishing business and desire to retain the ability to take legal action against people who might modify its statements of opinion, FSF even developed a non-GPL copyleft license specifically for documentation, the Free Documentation License (FDL; which ceases to be free and thus is not a copyleft if its “invariant sections” and similar features are used);
  • a large cultural gap and lack of population overlap between free software and other movements has limited knowledge transfer and abetted reinvention and relearning;
  • the question of what constitutes source (“preferred form of the work for making modifications”) for many non-software works.

As a result, several copyleft licenses for non-software works were developed, even prior to the existence of Creative Commons. These include the aforementioned FDL (1998), Design Science License (1999), Open Publication License (1999; like the FDL it has non-free options), Free Art License (2000), Open Game License (2000; non-free options), EFF Open Audio License (2001), LinuxTag Green OpenMusic License (2001; non-free options) and the QING Public License (2002). Additionally several copyleft licenses intended for hardware designs were proposed starting in the late 1990s if not sooner (the GPL was then and is now also commonly used for hardware designs, as is now CC-BY-SA).1

At the end of 2002 Creative Commons launched with 11 1.0 licenses and a public domain dedication. The 11 licenses consisted of every non-mutually exclusive combination of at least one of the Attribution (BY), NoDerivatives (ND), NonCommercial (NC), and ShareAlike (SA) conditions (ND and SA are mutually exclusive; NC and ND are non-free). Three of those licenses were free (as was the public domain dedication), two of them copyleft: CC-SA-1.0 and CC-BY-SA-1.0.

Creative Commons licenses with the BY condition were more popular, so the 5 without (including CC-SA) were not included in version 2.0 of the licenses. Although CC-SA had some advocates, all of whom felt very strongly in favor of free-as-in-freedom, its incompatibility with CC-BY-SA (meaning had CC-SA been widely used, the copyleft pool of works would have been further fragmented) and general feeling that Creative Commons had created too many licenses led copyleft advocates who hoped to leverage Creative Commons to focus on CC-BY-SA.
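The counts in the two paragraphs above can be checked by brute force. What follows is a minimal Python sketch, not anything published by Creative Commons: the function and variable names are made up for illustration, and the "free" and "copyleft" tests simply encode the rules stated above (NC and ND are non-free; a free license with SA is a copyleft).

```python
from itertools import combinations

# The four license conditions offered at the 1.0 launch.
CONDITIONS = ("BY", "NC", "ND", "SA")

def valid_combinations():
    """Every non-empty set of conditions, skipping those with both ND and SA."""
    result = []
    for size in range(1, len(CONDITIONS) + 1):
        for combo in combinations(CONDITIONS, size):
            if "ND" in combo and "SA" in combo:
                continue  # ND and SA are mutually exclusive
            result.append(combo)
    return result

def is_free(combo):
    # Assumption from the text: NC and ND make a license non-free.
    return "NC" not in combo and "ND" not in combo

def is_copyleft(combo):
    # Assumption from the text: a copyleft is a free license with SA.
    return is_free(combo) and "SA" in combo

licenses = valid_combinations()
free = [c for c in licenses if is_free(c)]
copyleft = [c for c in licenses if is_copyleft(c)]
without_by = [c for c in licenses if "BY" not in c]

print(len(licenses), len(free), len(copyleft), len(without_by))
```

Run as written, this prints `11 3 2 5`: eleven licenses, three of them free, two of those copyleft (SA and BY-SA), and five lacking the BY condition.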

Creative Commons began with a small amount of funding and notoriety, but its predecessors had almost none (FSF and EFF had both, but their entries were not major focuses of those organizations), so Creative Commons licenses (copyleft and non-copyleft, free and non-free) quickly came to dominate the non-software public licensing space. The author of the Open Publication License came to recommend using Creative Commons licenses, and the EFF declared version 2.0 of the Open Audio License compatible with CC-BY-SA and suggested using the latter. Still, at least one copyleft license for “creative” works was released after Creative Commons launched: the Against DRM License (2006), though it did not achieve wide adoption. Finally a font-specific copyleft license (SIL Open Font License) was introduced in 2005 (again the GPL, with a “font exception”, was and is now also used for fonts).

Although CC-BY-SA was used for licensing “databases” almost from its launch, and still is, copyleft licenses specifically intended to be used for databases were proposed starting from the mid-2000s. The most prominent of those is the Open Database License (ODbL; 2009). Just as public software licenses followed the subjection of software to copyright, interest in public licenses for databases followed the EU database directive mandating “sui generis database rights”, which began to be implemented in member state law starting from 1998. How CC-BY-SA versions address databases is covered below.

Aside on share-alike non-free therefore non-copylefts

Many licenses intended for use with non-software works include the “share-alike” aspect of copyleft: if adaptations are distributed, to comply with the license they must be offered under the same terms. But some (excluding those discussed above) do not grant users the equivalent of all four software freedoms. Such licenses aren’t true copylefts, as they retain a prominent exclusive property right aspect for purposes other than enforcing all four freedoms for everyone. What these licenses create are “semicommons” or mixed private property/commons regimes, as opposed to the commons created by all free licenses, and protected by copyleft licenses. One reason non-free public licenses might be common outside software, but rare for software, is that software more obviously requires ongoing maintenance.2 Without control concentrated through copyright assignment or highly asymmetric contributor license agreements, multi-contributor maintenance quickly creates an “anticommons” – e.g., nobody has adequate rights to use commercially.

These non-free share-alike licenses often aggravate freedom and copyleft advocates as the licenses sound attractive, but typically are confusing, probably do not help and perhaps stymie the cause of freedom. There is an argument that non-free licenses offer conservative artists, publishers, and others the opportunity to take baby steps, and perhaps support better policy when they realize total control is not optimal, or to eventually migrate to free licenses. Unfortunately no rigorous analysis of any of these conjectures exists. The best that can be done might be to promote education about and effective use of free copyleft licenses (as this tutorial aims to do) such that conjectures about the impact of non-free licenses become about as interesting as the precise terms of proprietary software EULAs – demand freedom instead.

In any case, some of these non-free share-alike licenses (also watch out for aforementioned copyleft licenses with non-free and thus non-copyleft options) include: Open Content License (1998), Free Music Public License (2001), LinuxTag Yellow, Red, and Rainbow OpenMusic Licenses (2001), Open Source Music License (2002), Creative Commons NonCommercial-ShareAlike and Attribution-NonCommercial-ShareAlike Licenses (2002), Common Good Public License (2003), and Peer Production License (2013). CC-BY-NC-SA is by far the most widespread of these, and has been versioned with the other Creative Commons licenses, through the current version 4.0 (2013).

Creative Commons Attribution-ShareAlike

The remainder of this tutorial exclusively concerns the most widespread copyleft license intended for non-software works, Creative Commons Attribution-ShareAlike (CC-BY-SA). But, there are actually many CC-BY-SA licenses – 5 versions (6 if you count version 2.1, a bugfix for a few jurisdiction “porting” mistakes), ports to 60 jurisdictions – 96 distinct CC-BY-SA licenses in total. After describing CC-BY-SA and how it differs from the GPL at a high level, we’ll have an overview of the various CC-BY-SA licenses, then a section-by-section walkthrough of the most current and most clear of them – CC-BY-SA-4.0.

CC-BY-SA allows anyone to share and adapt licensed material, for any purpose, subject to providing credit and releasing adaptations under the same terms. The preceding sentence is a severe abridgement of the “human readable” license summary or “deed” provided by Creative Commons at the canonical URL for one of the CC-BY-SA licenses – the actual license or “legalcode” is a click away. But this abridgement, and the longer summary provided by Creative Commons, are accurate in that they convey CC-BY-SA is a free, copyleft license.

GPL and CC-BY-SA differences

FIXME this section ought reference GPL portion of tutorial extensively

There are several differences between the GPL and CC-BY-SA that are particularly pertinent to their analysis as copyleft licenses.

The most obvious such difference is that CC-BY-SA does not require offering works in source form, that is their preferred form for making modifications. Thus CC-BY-SA makes a huge tradeoff relative to the GPL – CC-BY-SA dispenses with a whole class of compliance questions which are more ambiguous for some creative works than they are for most software – but in so doing it can be seen as a much weaker copyleft.

Copyleft is sometimes described as a “hack” or “judo move” on copyright, but the GPL makes two moves, though it can be hard to notice they are conceptually different moves, without the contrast provided by a license like CC-BY-SA, which only substantially makes one move. The first move is to neutralize copyright restrictions – adaptations, like the originally licensed work, will effectively not be private property (of course they are subject to copyright, but nobody can exercise that copyright to prevent others’ use). If copyright is a privatized regulatory system (it is), the first move is deregulatory. The second move is regulatory – the GPL requires offer of source form, a requirement that would not hold if copyright disappeared, absent a different regulatory regime which mandated source revelation (one can imagine such a regime on either “pragmatic” grounds, e.g., in the interest of consumer protection, or on the grounds of enforcing software freedom as a universal human right).

FIXME analysis of differences in copyleft scope (eg interplay of derivative works, modified copies, collections, aggregations, containers) would be good here but might be difficult to avoid novel research

CC-BY-SA makes the first move3 but adds the second in a limited fashion. It does not require offer of preferred form for modification nor any variation thereof (e.g., the FDL requires access to a “transparent copy”). CC-BY-SA does prohibit distribution with “effective technical measures” (i.e., digital restrictions management or DRM) if doing so limits the freedoms granted by the license. We can see that this is regulatory because absent copyright and any regime specifically limiting DRM, such distribution would be perfectly legal. Note the GPL does not prohibit distribution with DRM, although its source requirement makes DRM superfluous, and somewhat analogously, of course GPLv3 carefully regulates distribution of GPL’d software with locked-down devices – to put it simply, it requires keys rather than prohibiting locks. Occasionally a freedom advocate will question whether CC-BY-SA’s DRM prohibition makes CC-BY-SA a non-free license. Few if any questioners come down on the side of CC-BY-SA being non-free, perhaps for two reasons: first, overwhelming dislike of DRM, thus granting the possibility that CC-BY-SA’s approach could be appropriate for a license largely used for cultural works; second, the DRM prohibition in CC-BY-SA (and all CC licenses) seems to be mainly expressive – there are no known enforcements, despite the ubiquity of DRM in games, apps, and media which utilize assets under various CC licenses.

Another obvious difference between the GPL and CC-BY-SA is that the former is primarily intended to be used for software, and the latter for cultural works (and, with version 4.0, databases). Although those are the overwhelming majority of uses of each license, there are areas in which both are used, e.g., for hardware design and interactive cultural works, where there is not a dominant copyleft practice or the line between software and non-software is not absolutely clear.

This brings us to the third obvious difference, and provides a reason to mitigate it: the GPL and CC-BY-SA are not compatible, and have slightly different compatibility mechanisms. One cannot mix GPL and CC-BY-SA works in a way that creates a derivative work and comply with either of them. This could change – CC-BY-SA-4.0 introduced4 the possibility of Creative Commons declaring CC-BY-SA-4.0 one-way (as a donor) compatible with another copyleft license – the GPL is an obvious candidate for such compatibility. Discussion is expected to begin in late 2014, with a decision sometime in 2015. If this one-way compatibility were to be enacted, one could create an adaptation of a CC-BY-SA work and release the adaptation under the GPL, but not vice-versa – which makes sense given that the GPL is the stronger copyleft.

The GPL has no mechanism for externally declaring compatibility with other licenses (and note no action from the FSF would be required for CC-BY-SA-4.0 to be made one-way compatible with the GPL). The GPL’s compatibility mechanism for later versions of itself differs from CC-BY-SA’s in two ways: the GPL’s is optional, and allows for use of the licensed work and adaptations under later versions; CC-BY-SA’s is non-optional, but only allows for adaptations under later versions.

Fourth, using slightly different language, the GPL and CC-BY-SA’s coverage of copyright and similar restrictions should be identical for all intents and purposes (GPL explicitly notes “semiconductor mask rights” and CC-BY-SA-4.0 “database rights” but neither excludes any copyright-like restrictions). But on patents, the licenses are rather different. CC-BY-SA-4.0 explicitly does not grant any patent license, while previous versions were silent. GPLv3 has an explicit patent license, while GPLv2’s patent license is implied (see [gpl-implied-patent-grant] and [GPLv3-drm] for details). This difference ought give serious pause to anyone considering use of CC-BY-SA for works potentially subject to patents, especially any potential licensee if CC-BY-SA licensor holds such patents. Fortunately Creative Commons has always strongly advised against using any of its licenses for software, and that advice is usually heeded; but in the space of hardware designs Creative Commons has been silent, and unfortunately from a copyleft (i.e., use mechanisms at disposal to enforce user freedom) perspective, CC-BY-SA is commonly used (all the more reason to enable one-way compatibility, allowing such projects to migrate to the stronger copyleft).

The final obvious difference pertinent to copyleft policy between the GPL and CC-BY-SA is purpose. The GPL’s preamble makes it clear its goal is to guarantee software freedom for all users, and even without the preamble, it is clear that this is the Free Software Foundation’s driving goal. CC-BY-SA (and other CC licenses) state no purpose, and (depending on version) are preceded with a disclaimer and neutral “considerations for” licensors and licensees to think about (the CC0 public domain dedication is somewhat of an exception; it does have a statement of purpose, but even that has more of a feel of expressing yes-I-really-mean-to-do-this than a social mission). Creative Commons has always included elements of merely offering copyright holders additional choices and of purposefully creating a commons. While CC-BY-SA (and initially CC-SA) were just among the 11 non-mutually exclusive combinations of “BY”, “NC”, “ND”, and “SA”, freedom advocates quickly adopted CC-BY-SA as “the” copyleft for non-software works (surpassing previously existing non-software copylefts mentioned above). Creative Commons has at times recognized the special role of CC-BY-SA among its licenses, e.g., in a statement of intent regarding the license made in order to assure Wikimedians considering changing their default license from the FDL to CC-BY-SA that the latter, including its steward, was acceptably aligned with the Wikimedia movement (itself probably more directly aligned with software freedom than any other major non-software commons).

FIXME possibly explain why purpose might be relevant, eg copyleft instrument as totemic expression, norm-setting, idea-spreading

FIXME possibly mention that CC-BY-SA license text is free (CC0)

There are numerous other differences between the GPL and CC-BY-SA that are not particularly interesting for copyleft policy, such as the exact form of attribution and notice, and how license translations are handled. Many of these have changed over the course of CC-BY-SA versioning.

CC-BY-SA versions

FIXME section ought explain jurisdiction ports

This section gives a brief overview of changes across the main versions (1.0, 2.0, 2.5, 3.0, and 4.0) of CC-BY-SA, again focused on changes pertinent to copyleft policy. Creative Commons maintains a page detailing all significant changes across versions of all of its CC-BY* licenses, in many cases linking to detailed discussion of individual changes.

As of late 2014, versions 2.0 (the one called “Generic”; there are also 18 jurisdiction ports) and 3.0 (called “Unported”; there are also 39 ports) are by far the most widely used. 2.0 solely because it is the only version that the proprietary web image publishing service Flickr has ever supported. It hosts 27 million CC-BY-SA-2.0 photos 5 and remains the go-to general source for free images (though it may eventually be supplanted by Wikimedia Commons, some new proprietary service, or a federation of free image sharing sites, perhaps powered by GNU MediaGoblin). 3.0 both because it was the current version far longer (2007-2013) than any other and because it has been adopted as the default license for most Wikimedia projects.

However, apart from the brief notes on each version, we will focus on 4.0 for the section-by-section walkthrough in the next section. 4.0 is improved in several ways, including understandability, and should eventually become the most widespread version: it is intended to remain the current version for the indefinite, long future, and it would be reasonable to predict that Wikimedia projects will make CC-BY-SA-4.0 their default license in 2015 or 2016.

FIXME subsections might not be the right structure or formatting here

1.0 (2002-12-16)

CC-BY-SA-1.0 set the expectation for future versions. But its most notable copyleft policy feature (apart from the high level differences with GPLv2, such as not requiring source) was the absence of any measure for compatibility with future versions (or with CC-SA-1.0, also a copyleft license, or with pre-existing copyleft licenses such as GPL, FDL, FAL, and others, or with CC jurisdiction ports, of which there were 3 for 1.0).

2.0 (2004-05-25)

CC-BY-SA-2.0 made itself compatible with future versions and CC jurisdiction ports of the same version. Creative Commons did not version CC-SA, leaving CC-BY-SA-2.0 as “the” CC copyleft license. CC-BY-SA-2.0 also added the only clarification of what constitutes a derivative work, making “synchronization of the Work in timed-relation with a moving image” subject to copyleft.

2.5 (2005-06-09)

CC-BY-SA-2.5 made only one change: allowing the licensor to designate another party to receive attribution. This does not seem interesting for copyleft policy, but the context of the change is: it was prompted by the desire to make attribution of mass collaborations easy (and on the other end of the spectrum, to make it possible to clearly require giving attribution to a publisher, e.g., of a journal). There was a brief experiment in branding CC-BY-SA-2.5 as the “CC-wiki” license. This was an early step toward Wikimedia adopting CC-BY-SA-3.0, four years later.

3.0 (2007-02-23)

CC-BY-SA-3.0 introduced a mechanism for externally declaring bilateral compatibility with other licenses. This mechanism to date has not been used for CC-BY-SA-3.0, in part because another way was found for Wikimedia projects to change their default license from FDL to CC-BY-SA: the Free Software Foundation released FDL 1.3, which gave a time-bound permission for mass collaboration sites to migrate to CC-BY-SA. While not particularly pertinent to copyleft policy, it’s worth noting for anyone wishing to study old versions in depth that 3.0 is the first version to substantially alter the text of most of the license, motivated largely by making the text use less U.S.-centric legal language. The 3.0 text is also considerably longer than previous versions.

4.0 (2013-11-25)

CC-BY-SA-4.0 added to 3.0’s external compatibility declaration mechanism by allowing one-way compatibility. After the release of CC-BY-SA-4.0, bilateral compatibility was reached with FAL-1.3. As previously mentioned, one-way compatibility with GPLv3 will soon be discussed.

4.0 also made a subtle change in that an adaptation may be considered to be licensed solely under the adapter’s license (currently CC-BY-SA-4.0 or FAL-1.3, in the future potentially GPLv3 or a hypothetical CC-BY-SA-5.0). In previous versions licenses were deemed to “stack”: if a work under CC-BY-SA-2.0 were adapted and released under CC-BY-SA-3.0, users of the adaptation would need to comply with both licenses. In practice this is an academic distinction, as compliance with any compatible license would tend to mean compliance with the original license. But for a licensee that uses a large number of works and wishes to be extremely rigorous, stacking would be a large burden, for it would mean understanding every license (including those of jurisdiction ports not in English) in detail.
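The difference between stacking and the adapter’s-license rule can be sketched as a small function. This is only an illustrative model of the paragraph above, not legal advice and not anything defined by Creative Commons: the function name `licenses_to_comply_with`, the representation of an adaptation history as a plain list of license names, and the prefix test on the version string are all assumptions made for the example.

```python
def licenses_to_comply_with(chain):
    """Return the licenses a user of the final adaptation must comply with.

    `chain` lists the license of the original work first, followed by the
    license chosen for each successive adaptation.

    Simplified rule taken from the text above (an assumption, not legal advice):
    - material released under CC-BY-SA-4.0 lets the downstream adapter's
      license stand alone, so it drops out once the work is adapted;
    - material released under anything else in the chain "stacks", so its
      license keeps applying to every later adaptation.
    """
    required = []
    last_index = len(chain) - 1
    for index, license_name in enumerate(chain):
        is_final = index == last_index
        stacks = not license_name.startswith("CC-BY-SA-4.0")
        # The final adapter's license always applies; earlier ones only if they stack.
        if (is_final or stacks) and license_name not in required:
            required.append(license_name)
    return required


# Pre-4.0 example from the text: both licenses apply.
print(licenses_to_comply_with(["CC-BY-SA-2.0", "CC-BY-SA-3.0"]))

# 4.0 original adapted under a compatible license: only the adapter's license.
print(licenses_to_comply_with(["CC-BY-SA-4.0", "FAL-1.3"]))
```

The model ignores jurisdiction ports entirely; under stacking, each port in the history would count as one more license to understand, which is the burden the paragraph describes.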

The new version is also an even more complete rewrite of 3.0 than 3.0 was of previous versions, completing the “internationalization” of the license, and actually decreasing in length and increasing in readability.

Additionally, 4.0 consistently treats database rights (licensing them like other copyright-like rights) and moral rights (waiving them to the extent necessary to exercise granted freedoms); in previous versions some jurisdiction ports treated these differently. It also tentatively eliminates the need for jurisdiction ports. Official linguistic translations are underway (Finnish is the first completed) and no legal ports are planned.

4.0 is the first version to explicitly exclude a patent (and less problematically, trademark) license. It also adds two features akin to those found in GPLv3: waiver of any right the licensor may have to enforce anti-circumvention if DRM is applied to the work, and reinstatement of rights after termination if non-compliance is corrected within 30 days.

Finally, 4.0 streamlines the attribution requirement, possibly of some advantage to massive long-term collaborations which historically have found copyleft licenses a good fit.

The 4.0 versioning process was much more extensively researched, public, and documented than previous CC-BY-SA versionings; see for the record and for a summary of final decisions.

CC-BY-SA-4.0 International section-by-section

FIXME arguably this section ought be the substance of the tutorial, but is very thin and weak now

FIXME formatted/section-referenced copy of license should be added to license-texts.tex and referenced throughout

The best course of action at this juncture would be to read the license itself: the entire text is fairly easy to read, and should be quickly understood if one has the benefit of study of other public licenses and of copyleft policy.

The following walk-through will simply call out portions of each section one may wish to study especially closely due to their pertinence to copyleft policy issues mentioned above.

FIXME subsections might not be the right structure or formatting here

1 – Definitions

The first three definitions (“Adapted Material”, “Adapter’s License”, and “BY-SA Compatible License”) are crucial to understanding copyleft scope and compatibility.

2 – Scope

The license grant is what makes all four freedoms available to licensees. This section is also where the waiver of any right to enforce DRM anti-circumvention and the patent and trademark exclusions are found.

3 – License Conditions

This section contains the details of the attribution and share-alike requirements; the latter, read closely with the aforementioned definitions, describes the copyleft aspect of CC-BY-SA-4.0.

4 – Sui Generis Database Rights

This section describes how the previous grant and condition sections apply in the case of a database subject to sui generis database rights. This is an opportunity to go down a rabbit-hole of trying to understand sui generis database rights. Generally, this is a pointless exercise. You can comply with the license in the same way you would if the work were subject only to copyright – and determining whether a database is subject to copyright and/or sui generis database rights is another pit of futility. You can license databases under CC-BY-SA-4.0 and use databases subject to the same license as if they were any other sort of work.

5 – Disclaimer of Warranties and Limitation of Liability

Unsurprisingly, this section does its best to serve as an “absolute disclaimer and waiver of all liability.”

6 – Term and Termination

This section is similar to GPLv3, but without special provision for cases in which the licensor wishes to terminate even cured violations.

7 – Other Terms and Conditions

Though it uses different language, like the GPL, CC-BY-SA-4.0 does not allow additional restrictions not contained in the license. Unlike the GPL, CC-BY-SA-4.0 does not have an explicit additional permissions framework, although effectively a licensor can offer any other terms if they are the sole copyright holder (the license is non-exclusive), including the sorts of permissions that would be structured as additional permissions with the GPL. Creative Commons has sometimes given the offering of separate terms (whether additional permissions or “proprietary relicensing”) the confusing name “CC+”; however, where this is encountered at all it is usually in conjunction with one of the non-free CC licenses. Perhaps CC-BY-SA is not a strong enough copyleft to sometimes require additional permissions, or be used to gain commercially valuable asymmetric rights, in contrast with the GPL.

8 – Interpretation

Nothing surprising here. Note that CC-BY-SA does not “reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.” This is a point that Creative Commons has always been eager to make about all of its licenses. GPLv3 also “acknowledges your rights of fair use or other equivalent”. This may be a wise strategy, but should not be viewed as mandatory for any copyleft license. Indeed, the ODbL attempts (somewhat self-contradictorily; it also acknowledges fair use or other rights to use) to make its conditions apply even for works potentially subject to neither copyright nor sui generis database rights.


There are only a small number of court cases involving any Creative Commons license. Creative Commons lists these and some related cases at

Only two of those cases concern enforcing the terms of a CC-BY-SA license (Gerlach v. DVU in Germany, and No. 71036 N. v. Newspaper in a private Rabbinical tribunal); each hinged on attribution, not share-alike.

Further research could uncover out-of-compliance uses being brought into compliance without lawsuit; however, no such research, nor any hub for conducting such compliance work, is known. Editors of Wikimedia Commons document some external uses of Commons-hosted media, including whether users are compliant with the relevant license for the media (often CC-BY-SA), resulting in a category listing non-compliant uses (which seem to almost exclusively concern attribution).

Compliance Resources

FIXME this section is just a stub; ideally there would also be an additional section or chapter on CC-BY-SA compliance

Creative Commons has a page on ShareAlike interpretation as well as an extensive Frequently Asked Questions for licensees which addresses compliance with the attribution condition.

English Wikipedia’s and Wikimedia Commons’ pages on using material outside of Wikimedia projects provide valuable information, as the majority of material on those sites is CC-BY-SA licensed, and their practices are high-profile.

FIXME there is no section on business use of CC-BY-SA; there probably ought to be as there is one for GPL, though there’d be much less to put.

Wikidata II

Thursday, October 30th, 2014

Wikidata went live two years ago, but the II in the title is also a reference to the first page called Wikidata, which for years collected ideas for first class data support in Wikipedia. I had linked to Wikidata I when writing about the most prominent of those ideas, Semantic MediaWiki (SMW), which I later (8 years ago) called the most important software project and said would “turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.”

SMW was and is very interesting and useful on some wikis, but turned out to be not revolutionary (the bigger story is wikis turned out to be not revolutionary, or only revolutionary on a small scale, except for Wikipedia) and not quite a fit for Wikipedia and its sibling projects. While I’d temper “most” and “universal” now (and should have 8 years ago), the actual Wikidata project (created by many of the same people who created SMW) is rapidly fulfilling general wikidata hopes.

One “improving the encyclopedia” hope that Wikidata will substantially deliver on over the next couple years, and whose importance I only recently realized, is increasing trans-linguistic collaboration and availability of the sum of knowledge in many languages. When facts are embedded in free text, adding, correcting, and making available facts happens on a one-language-at-a-time basis. When facts about a topic are in Wikidata, they can be exposed in every language so long as labels are translated, even if nothing has ever been written about the topic in, or translated into, many of those languages. Reasonator is a great demonstrator.
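A toy sketch of why this works: a fact is stored once as language-independent identifiers, and only the labels vary by language. This is not Wikidata’s actual data format or API; the dictionaries and the `label` and `render` helpers are hypothetical, and the labels shown are illustrative and may not match the labels currently on Wikidata.

```python
# Labels per entity and language (illustrative values only).
LABELS = {
    "Q42": {"en": "Douglas Adams", "de": "Douglas Adams", "fr": "Douglas Adams"},
    "P31": {"en": "instance of", "de": "ist ein(e)"},  # no French label in this toy data
    "Q5": {"en": "human", "de": "Mensch", "fr": "être humain"},
}

# One fact, stored once, with no natural-language text in it.
STATEMENT = ("Q42", "P31", "Q5")


def label(entity_id, lang):
    """Return the label in `lang`, falling back to the bare identifier."""
    return LABELS.get(entity_id, {}).get(lang, entity_id)


def render(statement, lang):
    """Render a (subject, property, value) statement in the given language."""
    return " ".join(label(part, lang) for part in statement)


for lang in ("en", "de", "fr"):
    print(lang, "->", render(STATEMENT, lang))
```

Adding the missing French label for the property would make the fact readable in French without anyone writing or translating an article, which is the one-fact, many-languages point made above.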

Happy 2nd to all Wikidatians and Wikidata, by far the most important project for realizing Wikimedia’s vision. You can and should edit the data and edit and translate the schema. Browse Wikidata WikiProjects to find others working to describe topics of interest to you. I imagine some readers of this blog might be interested in WikiProjects Source MetaData (for citations) and Structured Data for Commons (the media repository).

For folks concerned about intellectual parasites, Wikidata has done the right thing — all data dedicated to the public domain with CC0.