Post Open Access

Research Ideas, Inputs, Impacts and Outcomes, Outputs

Saturday, December 26th, 2015

I’ve previously cheered Daniel Mietchen’s efforts to promote open access research proposals. Mietchen, with fellow researcher/OA activist Ross Mounce and ecologist/academic publisher Lyubomir Penev, has recently launched Research Ideas and Outcomes (I sometimes misremember the name; the title of this post may help others afflicted find RIO), a new open access mega journal that provides a venue for publishing the entire research cycle for almost any field of research.

The Wikidata for Research proposal that Mietchen also spearheaded and I again cheered is one of the first artifacts published in RIO.

I encourage everyone to read RIO’s opening editorial, Publishing the research process. I want to especially highlight the “highlighting social impact” section by making a copy:

Given that much of research is publicly funded and that public funding is limited, there is a growing interest in assessing the impact that research has on society beyond academia and in having this criterion influence decisions on whether and how public funds are to be spent on specific lines or fields of research (Roy 1985, Bornmann 2012, Reich and Myhrvold 2013).

Despite past criticisms of similar initiatives (e.g. Wright 2002), some researchers have called for support from the scientific community for the United Nations’ Sustainable Development Goals, seeing their role in “help[ing] to integrate monitoring and evaluation mechanisms into policy-making at all levels and ensure that information about our planet is easily available to all.” (Lu et al. 2015)

RIO addresses societal impact in several ways: (i) it is free to read, so that anyone interested can actually access it, (ii) it is openly licensed (CC BY 4.0 by default, with an option for CC0/Public Domain), so as to encourage the dissemination and reuse of its materials in other contexts, (iii) it is available in XML, which facilitates reuse by automated tools and integration with other platforms, (iv) it encourages authors to map their research to societal challenges it helps to address (and allows users to search and browse the journal by societal challenges they are interested in).

While the first three of these publishing practices are on the way to becoming standard in a growing range of disciplines, we are not aware of other journals to engage in the fourth one, but we encourage them to do so.

As another way to achieve societal impact, it has been suggested that researchers engage more in writing overview papers that summarize the state of knowledge in their field in a way that is accessible (in multiple senses of the word) to a broader audience, and that research evaluators should take such activities into account (Bornmann and Marx 2013). With that in mind, RIO offers the possibility to publish such overview papers as Policy Briefs.

When thinking of impact outside academia, another useful strategy is to bring research to places where non-academics might look for information. RIO will thus facilitate the creation of Wikipedia articles (Butler 2008, Logan et al. 2010), both on topics that have just been created through advances of scholarship (i.e. new methods or objects of study; e.g. RNA families, as in Daub et al. 2008) or on topics that have been studied for a while but not yet found decent coverage on the English Wikipedia (as pioneered for computational biology; Wodak et al. 2012).

Finally, RIO’s policies have been written with societal benefits in mind: they default to open sharing of all data and code underlying the research reported here and require public justification for exceptions to the open default. The primary effect of such an open default is an increase in the reproducibility and replicability and thus the reliability of research: the more of research workflows is being shared and the earlier the sharing occurs, the harder it will be for mistakes, systematic errors or fraud to go unnoticed. A welcome side effect of this is an increased educational value of the research and its documentation, and over time, we expect learners and educators, practitioners, journalists, artists, makers and others to engage with the research reported in RIO and with the associated data, code and materials.

RIO has a blog post on emphasizing research contribution to, e.g., the UN Sustainable Development Goals. I wholly endorse this emphasis, but the above excerpt is far richer, as it additionally tackles the social impact of academic publishing, which affects the social impact of all research. Not only does the section cover the (should be) obvious open (free access, free permission, to/for forms suitable for modification) dimensions, but also the huge opportunity to make research more accessible through summarization and cooperation with Wikipedians.

The only way the section could be improved would be for it to also mention macro impacts of commoning the knowledge economy, e.g., on equality and security. But I can’t blame the authors as I don’t know of great citations on these topics. I love Copyright and Inequality but it isn’t about research publications. I’ve got nothing on academic publishing and security, though the recently widely discussed The Moral Character of Cryptographic Work is related. Please help correct my ignorance by pointing me at more on-point citations for these topics or by creating ones…why not start by publishing a proposal for such research in RIO?

Well, there is one other way the section could be improved: a mention of commoning academic publishing infrastructure. But, the software that runs RIO is not open source. I’d love to see a proposal published in RIO for funding whatever work would be needed to make the RIO platform open source.

If you’re interested in getting involved in RIO, you can apply to be a subject editor or editorial apprentice (see links on the RIO home page). If you’re working on any of the research or proposals mentioned in the two previous paragraphs and there’s any way I might be able to help, feel free to get in touch. I’m not an academic but am very keen to see progress in these areas!

Call for mini-essays on “the cost of freedom” in free knowledge movements in honor of Bassel Khartabil

Thursday, October 29th, 2015

Dear friends,

I’m helping organize a book titled “The Cost of Freedom” in honor of Bassel Khartabil, a contributor to numerous free/open knowledge projects worldwide and in Syria, where he’s been a political prisoner since 2012, missing and in grave danger since October 3. You can read about Bassel at and lots more at and

Much of the book is going to be created at a face-to-face Book Sprint in Marseille Nov 2-6; some info about that and the theme/title generally at

We’re also asking people like yourself who have been fighting in the trenches of various free knowledge movements (culture, software, science, etc.) to contribute brief essays for inclusion in the book. One form an essay might take is a paragraph on each of:

* An issue you’ve faced that was challenging to you in your free knowledge work, through the lens of “cost”; perhaps a career or time opportunity cost, or the cost of dealing with unwelcoming or worse participants, or the cost of “peeling off layer upon layer the proprietary way of life” as put in
* How you addressed this challenge, or perhaps have yet to do so completely
* Advice to someone starting out in free knowledge; perhaps along the lines of had you understood the costs, what would you have done differently

But feel free to be maximally creative within the theme. We don’t have a minimum or a maximum required length for contributed essays, but especially do not be shy about concision or form. If all we get is haiku that might be a problem, or there might be a message in that of some sort.

Other details: The book will be PUBLISHED on Nov 6. We need your contribution no later than Thursday, Nov 5 at 11:00 UTC (Paris: noon; New York: 6AM; Tokyo: 8PM) to be included. The book will be released under CC0; giving up the “right” to sue anyone for any use whatsoever of your contribution is a cost of entry…or one of those proprietary layers to be peeled back. Send contributions to

Feel free to share this with other people who you know have something to say on this topic. We’re especially looking for voices underrepresented in free knowledge movements.


p.s. Please spread the word about #freebassel even if you can’t contribute to the book!

AcaWiki non-summary

Sunday, October 25th, 2015

Six years ago I helped launch AcaWiki, a site based on Semantic MediaWiki (software for which I had very high expectations, mostly transferred to Wikidata) for summarizing academic research.

A substantial community failed to materialize. I’ve probably been the only semi-consistent contributor over its entire six years. The best contributions have come from Jodi Schneider, who summarized a bunch of papers related to her research on the semantic web and online discourse, Benjamin Mako Hill, who summarized his PhD qualification exam readings, and Nate Matias who did the same and added a bunch of summaries related to online harassment. Students of an archaeology course taught by Ben Marwick summarized many papers as part of the class. Thank you Jodi, Mako, Nate, Ben, and a bunch of people who have each contributed one or a few summaries.

I’m not going to try to enumerate the deficiencies of AcaWiki here. They boil down to lack of time dedicated to outreach and to improving the site, and zero effort to raise funds to support such work, following a small startup grant obtained by AcaWiki’s founder Neeru Paharia, who has since been busy earning a doctorate and becoming a professor. With Neeru I’ve been the organization’s other long-term director so bear responsibility for this lack of effort. In retrospect dedicating more time to AcaWiki these last years at a cost to non-collaborative activity (e.g., this blog) would have been wise. I haven’t moved to take the other obvious course of shutting down the site, because I still believe something like it is badly needed, not least by me, as I wrote in 2009:

This could be seen as an end-run around access and copyright restrictions (the Open Access movement has made tremendous progress though there is still much to be done), but AcaWiki is a very partial solution to that problem — sometimes an article summary (assuming AcaWiki has one) would be enough, though often a researcher would still need access to the full paper (and the full dataset, but that’s another battle).

More interesting to me is the potential for AcaWiki summaries to increase the impact of research by making it more accessible in another way — comprehensible to non-specialists and approachable by non-speedreaders. I read a fair number of academic papers and many more get left on my reading queue unread. A “human readable” distillation of the key points of articles (abstracts typically convey next to nothing or are filled with jargon) would really let me ingest more.

This has held true even given AcaWiki’s tiny size to date: I regularly look back at summaries I’ve written to remember what I’ve read, and wish I summarized much more of what I’ve read, because most of it I’ve almost totally forgotten! I recommend summarizing papers even though it is hard.

Much harder still and more valuable are literature reviews. These were envisioned to be a part of AcaWiki, but I now think that every Wikipedia article should effectively be a literature review (and more). A year ago I blogged about an example of Wikipedia article as literature review led by James Heilman. Earlier this year Heilman wrote a call to action around a genre of literature review, Open Access to a High-Quality, Impartial, Point-of-Care Medical Summary Would Save Lives: Why Does It Not Exist? (which of course I summarized on AcaWiki). I have a partially written commentary on this piece but for now I can only urge you to read Heilman, or start with and improve my summary.

This brings me to one of my excuses for not dedicating more time to AcaWiki: hope that it would be superseded by a project directly under the Wikimedia umbrella, benefiting from that organization’s and movement’s scale. But I’ve done almost nothing to make this happen, either. I imagine the current effort that could lead in that direction is WikiProject Open Signalling OA-ness, as I’ve noted at the top of a page on AcaWiki listing similar projects. By far the best project on the list is Journalist’s Resource, also launched in 2009, with vastly greater resources. The projects listed so far as “similar” must be only the tip of an iceberg of efforts to summarize academic research, for it’s widely recognized (yes, citation needed; I just created a placeholder on AcaWiki for gathering these) that summarization in various forms is valuable and much more is needed.

If this hasn’t been enough of a ramble already, I’ll close with miscellaneous notes about and unsorted to-dos for AcaWiki:

  • Very brief summaries, perhaps 140 characters or not much longer, would be useful complements to longer summaries. It would be easy to add a short summary field to AcaWiki.
  • For summaries of articles which are themselves freely licensed, it might be useful to include the author’s abstract in AcaWiki. Again, it would be easy to add a field.
  • There’s lots of research on automated summarization, some of it producing open source tools. These could be applied to initialize summaries, either for human summaries, or en masse bot summary creation.
  • I have added a field for an article’s Wikidata identifier. AcaWiki is one of a handful of sites potentially using Wikidata for authority control. There will be many more. But it’d be far more useful to do something with that identifier, most obviously to ingest article metadata from Wikidata and create Wikidata items/push metadata to Wikidata where items corresponding to summarized articles do not exist. I’ve not yet seriously looked into how much of this can be currently accomplished using Wikibase Client.
  • Last month there was debate about a program giving some Wikipedia contributors gratis access to closed academic journals. Does this program help improve Wikipedia as a free resource, or promote non-free literature? It must do some of both; which is the bigger impact on long-term free knowledge outcomes probably depends on one’s perspective. My bias is that improving and promoting free resources is vastly more important than suppressing non-free ones. But I also think that free academic summaries could help in both respects. For Wikipedia readers, a reference with an immediately available summary would be more useful than one without. The summary would also reduce the need to access the original non-free article. AcaWiki in its current state is inadequate, but perhaps the debate ought to motivate more work on free academic summaries, here or elsewhere.
  • Has any closed access publisher freed only article abstracts (including a free license; abstracts are almost always gratis access)? This would be useful to a site like AcaWiki at the least, especially if abstracts were more consistently useful.
  • Should the scope of AcaWiki be explicitly expanded to include summarizing material that is somehow academic but is not in the form of a peer-reviewed paper published in an academic journal? Some of the summaries I’ve contributed are for books or grey literature.
  • Periodically it’s been suggested to change the default license for AcaWiki summaries from CC-BY to CC-BY-SA. I should add updated thoughts at the link.
  • Some time ago in order to put a stop to the creation of spam accounts, I enabled the ConfirmAccount extension, which forces users who want to contribute to fill out an account request form. I admit this is hugely annoying. I have done zero research into it, but I would love to have an extension which auto-enables account creation based on some external authentication and reputation, e.g., Wikimedia wiki accounts or even users followed/subscribed to/endorsed by existing AcaWiki users on other sites, e.g., social networks.
  • Upgrade site to https when Let’s Encrypt becomes generally available. Alternatively, see if it is possible to move hosting (currently a $10/month Digital Ocean VPS) to Miraheze, which mandates https.
  • I intended to write an update on AcaWiki for Open Access Week (October 19-25). I only realized after beginning that AcaWiki was recently 6 years old.
  • I’m going to ping the people who have contributed to AcaWiki so far to look at this post and provide feedback. What would it take for them to feel good about recommending others do what they’ve done, e.g., summarizing PhD or research program readers, or assigning contributing or improving AcaWiki summaries to their classes? Or if something else entirely should be done to push forward free summarization of academic literature, what is that something?
  • For some time Fabricatorz did a bit of work on and hosted AcaWiki. From my email correspondence I see that Bassel Khartabil did some of that. As I’ve blogged before (1, 2, 3), Bassel has been detained by the Syrian government since 2012. Recently he has gone missing and presumably is in grave danger. Props to his Fabricatorz and many other friends who have done more to raise awareness of Bassel’s plight than I would have imagined possible when writing those previous posts. See for info and links, and spread the word. I’ll add a note about #freebassel to the AcaWiki home page (which badly needs a general revamp) shortly.
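To make the Wikidata identifier to-do above slightly more concrete, here is a minimal sketch of the "ingest article metadata" half. It assumes the entity JSON has already been fetched (for example via Wikidata's `wbgetentities` API) and only picks out an English label and a DOI (Wikidata property P356). The function names, the output field names, and the sample item are all hypothetical; nothing here reflects how AcaWiki or Wikibase Client actually works.

```python
from typing import Any, Dict, Optional

DOI_PROPERTY = "P356"  # Wikidata's property for DOI


def first_claim_value(entity: Dict[str, Any], prop: str) -> Optional[Any]:
    """Return the value of the first statement for `prop`, or None."""
    for statement in entity.get("claims", {}).get(prop, []):
        # "novalue"/"somevalue" snaks carry no datavalue, so skip them.
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        if datavalue is not None:
            return datavalue.get("value")
    return None


def summary_metadata(entity: Dict[str, Any], lang: str = "en") -> Dict[str, Optional[str]]:
    """Map a Wikidata entity dict to hypothetical summary-page fields."""
    label = entity.get("labels", {}).get(lang, {}).get("value")
    return {
        "wikidata_id": entity.get("id"),
        "title": label,
        "doi": first_claim_value(entity, DOI_PROPERTY),
    }


# Hand-written sample in the shape of Wikidata entity JSON (not a real item).
sample_entity = {
    "id": "Q0",
    "labels": {"en": {"language": "en", "value": "An example article"}},
    "claims": {
        "P356": [
            {"mainsnak": {"snaktype": "value", "property": "P356",
                          "datavalue": {"value": "10.1234/example", "type": "string"}}}
        ]
    },
}

metadata = summary_metadata(sample_entity)
print(metadata)
```

The reverse direction (creating items or pushing metadata to Wikidata) needs authenticated write access and is left out of this sketch.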

If any of this interests you, get in touch or merely watch for updates on the acawiki-general mailing list, AcaWiki on, Twitter, or Facebook, or blog comments below, or the AcaWiki site.

Democratizing Wikimedia Innovation

Wednesday, May 27th, 2015

Through the end of this month the Wikimedia community is electing 3 members of the Wikimedia Foundation board. You qualify to vote if you’ve made at least 300 edits before April 15 and 20 between October 15 and April 15 to any Wikimedia project.

If you don’t qualify to vote, it won’t be hard to do so for next time if you get started now: Log in or create an account and be bold when you see a typo, incorrect or missing information in a Wikipedia article. Familiarize yourself with Wikipedia’s sibling projects; edits to any of them count. Play the Wikidata Game. I heartily recommend doing these things as a matter of learning and sharing knowledge regardless of desire to vote in Wikimedia elections or lower threshold and more fun votes such as for the Wikimedia Commons Picture of the Year. The current election is just an excuse for inserting this Public Service Announcement. ;-)

If you do qualify to vote, please do. I voted for Denny Vrandečić and give him the strongest possible endorsement. I also voted for and endorse James Heilman.

The election uses approval/disapproval ratio to determine winners, so disapproval votes are powerful. I made a few but don’t want to publish because frankly all of the candidates are excellent and extremely qualified for a Wikimedia Foundation community board seat.
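A small sketch of why disapproval votes carry so much weight under a ratio rule. I am assuming here that each candidate is scored as support / (support + oppose), with neutral votes ignored; the post above only says "approval/disapproval ratio", so treat the exact formula, the function names, and the made-up tallies as illustration rather than the official method.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def support_ratio(support: int, oppose: int) -> Fraction:
    """Share of non-neutral votes that are support votes (assumed formula)."""
    total = support + oppose
    if total == 0:
        return Fraction(0)
    return Fraction(support, total)


def rank(tallies: Dict[str, Tuple[int, int]], seats: int) -> List[str]:
    """Return the `seats` candidates with the highest support ratio."""
    ordered = sorted(
        tallies,
        key=lambda name: support_ratio(*tallies[name]),
        reverse=True,
    )
    return ordered[:seats]


# Hypothetical (support, oppose) tallies, not real election data.
tallies = {
    "A": (1000, 100),
    "B": (1500, 500),
    "C": (600, 40),
}

winners = rank(tallies, seats=2)
print(winners)  # ['C', 'A']
```

In this made-up example candidate B has by far the most support votes but still loses to both A and C, because B's 500 oppose votes pull the ratio down to 3/4.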

The central issue in this election is evident in the candidate statements, discussion, structured Q&A (1, 2, 3, 4), in a series of blog posts by Pete Forsyth (who was briefly a candidate but stepped aside), and outside the context of the current election, in blog posts by Lane Raspberry and Nimish Gautam, and the one message I’ve sent on the issue, which the first paragraph of Vrandečić’s candidate statement sums up:

Wikimedia is a modern wonder – and yet, it must change: most of our projects, as they are today, cannot truly succeed. To achieve our mission, we must increase the effectivity of every single contributor. At the same time, the communities are often seen as change resistant – but falsely so: they do welcome change, done right, as I have shown with Wikidata.

Along these lines, I especially commend Vrandečić’s and Heilman’s answers to the following Q&A topics: Use of Superprotect and respect for community consensus, Retaining current volunteers versus recruiting new ones, Improving content, and Diversity and scope.

It’s commonplace for central organizations (of which I am a fan) to neglect or denigrate communities they serve, whether the relationship is one of collaboration, constituency, or consumption. Sometimes a version of neglect is even the right behavior, e.g., a product or project with some users may need to be EOL’d. But most organizations could do much better. It is essential that the Wikimedia Foundation do so, as the people who edit or otherwise contribute to the various Wikimedia projects are its key competitive advantage. If Wikimedia and other commons-based peer production projects are to stay relevant, never mind helping achieve world liberation, they need to figure out how to become more effective, starting with embracing the idea that most of the vision and innovation needed to do so will come from the community, not the central organizations, with implementation done in partnership with the community.

Unrelated to the community issue, I’ve previously blog cheered Vrandečić’s and Heilman’s work on Wikidata and Wikipedia/medical journal collaboration respectively.

Tangential ex-Wikimedia Foundation links:

I was very sad to read that Erik Moeller recently left the foundation, where he was Deputy Director. Though he seemed to endorse the organization/community vision dichotomy (my one message linked above is a mailing list reply to him), in my view he is perhaps the best example in the Wikimedia universe of community vision — he had written about and in many cases prototyped most of the innovations the foundation is still working on implementing, many years later, before becoming an employee.

Moeller has since started a podcast, interviewing another ex-Wikimedia Foundation person, Sumana Harihareswara, for the first episode.

Harihareswara has two recent posts on Crooked Timber, Codes of conduct and the trade-offs of copyleft and Where are the women in the history of open source? I found them both very interesting and left comments.

Former Wikimedia Foundation Executive Director Sue Gardner is now “developing a strategic plan for and with the Tor Project” and separately researching “the broader state of ‘freedom tech’ — all the tools and technologies that enable free speech, free assembly, and freedom of the press.” That’s great news; Tor and other ‘freedom tech’ tools are incredibly exciting and important. But, a moment of critical cheering: as I noted around the time Gardner stepped down as WMF ED, I’m inclined to think that re-routing the knowledge economy is even more important than tools that can route around censorship for a good future. The former is what Wikimedia projects do.

Hello World Intellectual Freedom Organization

Saturday, April 25th, 2015

Today I’m soft launching an initiative that I’ve been thinking about for 20 years, obtained a domain name for in 1998, blogged about once in 2004, and the last few years have been exploring on this blog without naming it. See the first items in my annual thematic doubt posts for 2013 and 2014: “protecting and promoting intellectual freedom, in particular through the mechanisms of free/open/knowledge commons movements, and in reframing information and innovation policy with freedom and equality outcomes as top.”

I call it the World Intellectual Freedom Organization (WIFO).

Read about its theory, why a new organization, proposed activities, and how you can help/get involved.

Why today? Because April 26 is World Intellectual Freedom Day, occupying and displacing World Intellectual Property Day, just as intellectual freedom must occupy and displace intellectual property for a good future. Consider this 0th World Intellectual Freedom Day another small step forward, following last year’s Without Intellectual Property Day.

Why a soft launch? Because I’m eager to be public about WIFO, but there’s tons of work to do before it can properly be considered launched. I’ve been getting feedback from a handful of people on a quasi-open fellowship proposal for WIFO (that’s where the activities link above points to) and apologize to the many other people I should’ve reached out to. Well, now I’m doing that. I want your help in this project of world liberation!

Video version of my proposal at the Internet Archive or YouTube. My eyes do not lie, I am reading in an attempt to fit too much material in 5 minutes.

I’ll probably blog much less here about “IP” and commons/free/libre/open issues from now on, especially after opening a WIFO blog (for now there’s a Discourse forum; most of the links above point there). Not to worry, I am overflowing with idiosyncratic takes on everything else, and will continue to post accordingly here, as much as time permits. ☻

Be sure to celebrate the 0th World Intellectual Freedom Day, even if only momentarily and with your lizard brain.


Sunday, December 21st, 2014

Recently I’ve uncritically cheered for Wikidata as “rapidly fulfilling” hopes to “turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.” In April I uncritically cheered for Daniel Mietchen’s open proposal for research on opening research proposals.

Let’s combine the two: an open proposal for work toward establishing Wikidata (including its community, data, ontologies, practices, software, and external tools) as a “collaborative hub around research data” responding to a European Commission call on e-infrastructures. That would be Wikidata for Research (WD4R), instigated by Mietchen, who has already assembled an impressive set of partner institutions and an outline of work packages. The proposal is being drafted in public (you can help) and will be submitted January 14.


The proposal will be strong on its own merits, and very well aligned with the stated desired outcomes from the EC call, and the open proposal dogfood angle is also great. I added “for all” to this post’s title because I suspect WD4R will be great for pushing Wikidata toward realizing the aforementioned “universal database” hopes (which again means not just the data, but community, tools, etc.; “virtual research environment” is one catch-all term) and will make Wikidata much more useful for “research” most broadly construed (e.g., by students, journalists, knowledge workers, anyone), potentially much faster than would happen otherwise.

My suspicion has two bases (please correct me if I’m wrong about either):

  1. A database or virtual environment “for research” might give the impression of someplace to dump data from or perform experiments. Maybe that would be appropriate for Wikidata in some instances, but the overwhelming research-supporting use would seem to be mass collaboration in consolidating, annotating, and correcting data and ontologies which many researchers (and researchers-broadly-construed, everyone) can benefit from, either querying or referencing directly, or extracting and using elsewhere. The pre-existing Gene Wiki project, which is beginning to use Wikidata, is an example of such useful-to-all collections (as referenced in the WD4R pages).
  2. One of the proposed work packages is to identify and work on features needed for research but not on, or not prioritized on, the Wikidata development plan. I suspect other Wikimedia projects can tremendously benefit from Wikidata integration without Wikidata itself or external tools supporting complex queries and reporting that would be called for by a virtual research environment — and also called for to realize “universal database” hopes. Wikidata’s existing plan looks good to me; here I’m just saying WD4R might help it be even better, faster.

The previously linked Gene Wiki post includes:

For more than a decade many different groups have proposed and many have implemented solutions to this challenge using standards and techniques from the Semantic Web. Yet, today, the vast majority of biological data is still accessed from individual databases such as Entrez Gene that make no attempt to use any component of the Semantic Web or to otherwise participate in the Linked Open Data movement. With a few notable exceptions, the data silos have only gotten larger and problems of fragmentation worse.

Now, we are working to see if Wikidata can be the bridge between the open community-driven power of Wikipedia and the structured world of semantic data integration. Can the presence of that edit button on a centralized knowledge base associated with Wikipedia help the semantic web break through into everyday use within our community?

I agree that massive centralized commons-oriented resources are needed for decentralization to progress (link analogous but not the same — linked open data : federation :: data silos : messaging silos).

Check out Mietchen’s latest WD4R blog post and the WD4R project page.

Monday, December 1st, 2014

Last month the Free Software Foundation and Software Freedom Conservancy launched, “a collaborative project to create and disseminate useful information, tutorial material, and new policy ideas regarding all forms of copyleft licensing.” The main feature of the project now is a 157 page tutorial on the GPL which assembles material developed over the past 10 years and a new case study. I agreed to write a first draft of material covering CC-BY-SA, the copyleft license most widely used for non-software works. My quote in the announcement: “I’m glad to bring my knowledge about the Creative Commons copyleft licenses as a contribution to improve further this excellent tutorial text, and I hope that as a whole can more generally become a central location to collect interesting ideas about copyleft policy.”

I tend to offer apologia to copyleft detractors and criticism to copyleft advocates, and cheer whatever improvements to copyleft licenses can be mustered (I hope to eventually write a cheery post about the recent compatibility of CC-BY-SA and the Free Art License), but I’m far more interested in copyleft licenses as prototypes for non-copyright policy.

For now, below is that first draft. It mostly stands alone, but might be merged in pieces as the tutorial is restructured to integrate material about non-GPL and non-software copyleft licenses. Your patches and total rewrites welcome!

Detailed Analysis of the Creative Commons Attribution-ShareAlike Licenses

This tutorial gives a comprehensive explanation of the most popular free-as-in-freedom copyright licenses for non-software works, the Creative Commons Attribution-ShareAlike licenses (“CC-BY-SA”, or sometimes just “BY-SA”) – with an emphasis on the current version 4.0 (“CC-BY-SA-4.0”).

Upon completion of this part of the tutorial, readers can expect to have learned the following:

  • The history and role of copyleft licenses for non-software works.
  • The differences between the GPL and CC-BY-SA, especially with respect to copyleft policy.
  • The basic differences between CC-BY-SA versions 1.0, 2.0, 2.5, 3.0, and 4.0.
  • An understanding of how CC-BY-SA-4.0 implements copyleft.
  • Where to find more resources about CC-BY-SA compliance.

FIXME this list should be more aggressive, but material is not yet present

WARNING: As of November 2014 this part is brand new, and badly needs review, referencing, expansion, error correction, and more.

Freedom as in Free Culture, Documentation, Education…

Critiques of copyright’s role in concentrating power over and making culture inaccessible have existed throughout the history of copyright. Few contemporary arguments about “copyright in the digital age” have not already been made in the 1800s or before. Though one can find the occasional ad hoc “anti-copyright”, “no rights reserved”, or pro-sharing statement accompanying a publication, use of formalized public licenses for non-software works seems to have begun only after the birth of the free software movement and of widespread internet access among elite populations.

Although they have much older antecedents, contemporary movements to create, share, and develop policy encouraging “cultural commons”, “open educational resources”, “open access scientific publication” and more, have all come of age in the last 10-15 years – after the huge impact of free software was unmistakable. Additionally, these movements have tended to emphasize access, with permissions corresponding to the four freedoms of free software and the use of fully free public licenses as good but optional.

It’s hard not to observe that the free software movement seems to have arisen more or less shortly after it became desirable (due to changes in the computing industry and software becoming unambiguously subject to copyright in the United States by 1983), but non-software movements for free-as-in-freedom knowledge only arose after they became more or less inevitable, and only begrudgingly at that. Had a free culture “constructed commons” movement been successful prior to the birth of free software, the benefits to computing would have been great – consider the burdens of privileged access to proprietary culture for proprietary software through DRM and other mechanisms, toll access to computer science literature, and development of legal mechanisms and policy through pioneering trial-and-error.

Alas, counterfactual optimism does not change the present – but might embolden our visions of what freedom can be obtained and defended going forward. Copyleft policy will surely continue to be an important and controversial factor, so it’s worth exploring the current version of the most popular copyleft license intended for use with non-software works, Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA-4.0), the focus of this tutorial.

Free Definitions

When used to filter licenses, the Free Software Definition and Open Source Definition have nearly identical results. For licenses primarily intended for non-software works, the Definition of Free Cultural Works and Open Definition similarly have identical results, both with each other and with the software definitions which they imitate. All copyleft licenses for non-software works must be “free” and “open” per these definitions.

There are various other definitions of “open access”, “open content”, and “open educational resources” which are more subject to interpretation or do not firmly require the equivalent of all four freedoms of the free software definition. These definitions are not pertinent to circumscribing the concept of copyleft – which is about enforcing all four freedoms, for everyone. But copyleft licenses for non-software works are usually considered “open” per these other definitions, if they are considered at all.

The open access to scientific literature movement, for example, seems to have settled into advocacy for non-copyleft free licenses (CC-BY) on one hand, and acceptance of highly restrictive licenses or access without other permissions on the other. This creates practical problems: for example, nearly all scientific literature either may not be incorporated into Wikipedia (which uses CC-BY-SA) or may not incorporate material developed on Wikipedia – both of which do happen, when the licenses allow it. This tutorial is not the place to propose solutions, but let this problem be a motivator for encouraging more widespread understanding of copyleft policy.

Non-software Copylefts

Copyleft is a compelling concept, so unsurprisingly there have been many attempts to apply it to non-software works – starting with use of GPLv2 for documentation, then occasionally for other texts, and art in various media. Although the GPL was and is perfectly usable for any work subject to copyright, several factors were probably important in preventing it from being the dominant copyleft outside of software:

  • the GPL is clearly intended first as a software license, thus requiring some perspective to think of applying to non-software works;
  • the FSF’s concern is software, and the organization has not strongly advocated for using the GPL for non-software works;
  • further due to the (now previous) importance of its hardcopy publishing business and desire to retain the ability to take legal action against people who might modify its statements of opinion, FSF even developed a non-GPL copyleft license specifically for documentation, the Free Documentation License (FDL; which ceases to be free and thus is not a copyleft if its “invariant sections” and similar features are used);
  • a large cultural gap and lack of population overlap between free software and other movements has limited knowledge transfer and abetted reinvention and relearning;
  • the question of what constitutes source (“preferred form of the work for making modifications”) for many non-software works.

As a result, several copyleft licenses for non-software works were developed, even prior to the existence of Creative Commons. These include the aforementioned FDL (1998), Design Science License (1999), Open Publication License (1999; like the FDL it has non-free options), Free Art License (2000), Open Game License (2000; non-free options), EFF Open Audio License (2001), LinuxTag Green OpenMusic License (2001; non-free options) and the QING Public License (2002). Additionally several copyleft licenses intended for hardware designs were proposed starting in the late 1990s if not sooner (the GPL was then and is now also commonly used for hardware designs, as is now CC-BY-SA).1

At the end of 2002 Creative Commons launched with 11 1.0 licenses and a public domain dedication. The 11 licenses consisted of every non-mutually exclusive combination of at least one of the Attribution (BY), NoDerivatives (ND), NonCommercial (NC), and ShareAlike (SA) conditions (ND and SA are mutually exclusive; NC and ND are non-free). Three of those licenses were free (as was the public domain dedication), two of them copyleft: CC-SA-1.0 and CC-BY-SA-1.0.
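The count of 11 can be checked mechanically. The following is a minimal sketch, not anything Creative Commons published: it enumerates every non-empty combination of the four conditions, drops those containing both ND and SA, and then applies the rules stated above (NC and ND make a license non-free; a free license with SA is a copyleft). The function and variable names are made up for illustration, and the generated names simply join the condition codes in a fixed order.

```python
from itertools import combinations

CONDITIONS = ("BY", "NC", "ND", "SA")
NON_FREE = {"NC", "ND"}  # per the text: NC and ND are non-free


def cc_1_0_licenses():
    """Every non-empty combination of conditions, minus those with both ND and SA."""
    licenses = []
    for size in range(1, len(CONDITIONS) + 1):
        for combo in combinations(CONDITIONS, size):
            if "ND" in combo and "SA" in combo:
                continue  # ND and SA are mutually exclusive
            licenses.append(combo)
    return licenses


def name(combo):
    """Illustrative label only, e.g. ('BY', 'SA') -> 'CC-BY-SA'."""
    return "CC-" + "-".join(combo)


licenses = cc_1_0_licenses()
free = [c for c in licenses if not NON_FREE & set(c)]
copyleft = [c for c in free if "SA" in c]
without_by = [c for c in licenses if "BY" not in c]

print(len(licenses))                # 11
print([name(c) for c in free])      # ['CC-BY', 'CC-SA', 'CC-BY-SA']
print([name(c) for c in copyleft])  # ['CC-SA', 'CC-BY-SA']
print(len(without_by))              # 5
```

The last figure is the number of combinations lacking the BY condition.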

Creative Commons licenses with the BY condition were more popular, so the 5 without (including CC-SA) were not included in version 2.0 of the licenses. Although CC-SA had some advocates, all who felt very strongly in favor of free-as-in-freedom, its incompatibility with CC-BY-SA (meaning had CC-SA been widely used, the copyleft pool of works would have been further fragmented) and general feeling that Creative Commons had created too many licenses led copyleft advocates who hoped to leverage Creative Commons to focus on CC-BY-SA.

Creative Commons began with a small amount of funding and notoriety, but its predecessors had almost none (FSF and EFF had both, but their entries were not major focuses of those organizations), so Creative Commons licenses (copyleft and non-copyleft, free and non-free) quickly came to dominate the non-software public licensing space. The author of the Open Publication License came to recommend using Creative Commons licenses, and the EFF declared version 2.0 of the Open Audio License compatible with CC-BY-SA and suggested using the latter. Still, at least one copyleft license for “creative” works was released after Creative Commons launched: the Against DRM License (2006), though it did not achieve wide adoption. Finally a font-specific copyleft license (SIL Open Font License) was introduced in 2005 (again the GPL, with a “font exception”, was and is now also used for fonts).

Although CC-BY-SA was used for licensing “databases” almost from its launch, and still is, copyleft licenses specifically intended to be used for databases were proposed starting from the mid-2000s. The most prominent of those is the Open Database License (ODbL; 2009). Just as public software licenses followed the subjection of software to copyright, interest in public licenses for databases followed the EU database directive mandating “sui generis database rights”, which began to be implemented in member state law starting from 1998. How CC-BY-SA versions address databases is covered below.

Aside on share-alike non-free therefore non-copylefts

Many licenses intended for use with non-software works include the “share-alike” aspect of copyleft: if adaptations are distributed, to comply with the license they must be offered under the same terms. But some (excluding those discussed above) do not grant users the equivalent of all four software freedoms. Such licenses aren’t true copylefts, as they retain a prominent exclusive property right aspect for purposes other than enforcing all four freedoms for everyone. What these licenses create are “semicommons” or mixed private property/commons regimes, as opposed to the commons created by all free licenses, and protected by copyleft licenses. One reason non-free public licenses might be common outside software, but rare for software, is that software more obviously requires ongoing maintenance.2 Without control concentrated through copyright assignment or highly asymmetric contributor license agreements, multi-contributor maintenance quickly creates an “anticommons” – e.g., nobody has adequate rights to use commercially.

These non-free share-alike licenses often aggravate freedom and copyleft advocates as the licenses sound attractive, but typically are confusing, probably do not help and perhaps stymie the cause of freedom. There is an argument that non-free licenses offer conservative artists, publishers, and others the opportunity to take baby steps, and perhaps support better policy when they realize total control is not optimal, or to eventually migrate to free licenses. Unfortunately no rigorous analysis of any of these conjectures exists. The best that can be done might be to promote education about and effective use of free copyleft licenses (as this tutorial aims to do) such that conjectures about the impact of non-free licenses become about as interesting as the precise terms of proprietary software EULAs – demand freedom instead.

In any case, some of these non-free share-alike licenses (also watch out for aforementioned copyleft licenses with non-free and thus non-copyleft options) include: Open Content License (1998), Free Music Public License (2001), LinuxTag Yellow, Red, and Rainbow OpenMusic Licenses (2001), Open Source Music License (2002), Creative Commons NonCommercial-ShareAlike and Attribution-NonCommercial-ShareAlike Licenses (2002), Common Good Public License (2003), and Peer Production License (2013). CC-BY-NC-SA is by far the most widespread of these, and has been versioned with the other Creative Commons licenses, through the current version 4.0 (2013).

Creative Commons Attribution-ShareAlike

The remainder of this tutorial exclusively concerns the most widespread copyleft license intended for non-software works, Creative Commons Attribution-ShareAlike (CC-BY-SA). But there are actually many CC-BY-SA licenses – 5 versions (6 if you count version 2.1, a bugfix for a few jurisdiction “porting” mistakes), ports to 60 jurisdictions – 96 distinct CC-BY-SA licenses in total. After describing CC-BY-SA and how it differs from the GPL at a high level, we’ll have an overview of the various CC-BY-SA licenses, then a section-by-section walkthrough of the most current and most clear of them – CC-BY-SA-4.0.

CC-BY-SA allows anyone to share and adapt licensed material, for any purpose, subject to providing credit and releasing adaptations under the same terms. The preceding sentence is a severe abridgement of the “human readable” license summary or “deed” provided by Creative Commons at the canonical URL for one of the CC-BY-SA licenses – the actual license or “legalcode” is a click away. But this abridgement, and the longer summary provided by Creative Commons, are accurate in that they convey CC-BY-SA is a free, copyleft license.

GPL and CC-BY-SA differences

FIXME this section ought reference GPL portion of tutorial extensively

There are several differences between the GPL and CC-BY-SA that are particularly pertinent to their analysis as copyleft licenses.

The most obvious such difference is that CC-BY-SA does not require offering works in source form, that is their preferred form for making modifications. Thus CC-BY-SA makes a huge tradeoff relative to the GPL – CC-BY-SA dispenses with a whole class of compliance questions which are more ambiguous for some creative works than they are for most software – but in so doing it can be seen as a much weaker copyleft.

Copyleft is sometimes described as a “hack” or “judo move” on copyright, but the GPL makes two moves, though it can be hard to notice they are conceptually different moves, without the contrast provided by a license like CC-BY-SA, which only substantially makes one move. The first move is to neutralize copyright restrictions – adaptations, like the originally licensed work, will effectively not be private property (of course they are subject to copyright, but nobody can exercise that copyright to prevent others’ use). If copyright is a privatized regulatory system (it is), the first move is deregulatory. The second move is regulatory – the GPL requires offer of source form, a requirement that would not hold if copyright disappeared, absent a different regulatory regime which mandated source revelation (one can imagine such a regime on either “pragmatic” grounds, e.g., in the interest of consumer protection, or on the grounds of enforcing software freedom as a universal human right).

FIXME analysis of differences in copyleft scope (eg interplay of derivative works, modified copies, collections, aggregations, containers) would be good here but might be difficult to avoid novel research

CC-BY-SA makes the first move3 but adds the second in a limited fashion. It does not require offer of preferred form for modification nor any variation thereof (e.g., the FDL requires access to a “transparent copy”). CC-BY-SA does prohibit distribution with “effective technical measures” (i.e., digital restrictions management or DRM) if doing so limits the freedoms granted by the license. We can see that this is regulatory because absent copyright and any regime specifically limiting DRM, such distribution would be perfectly legal. Note the GPL does not prohibit distribution with DRM, although its source requirement makes DRM superfluous, and somewhat analogously, of course GPLv3 carefully regulates distribution of GPL’d software with locked-down devices – to put it simply, it requires keys rather than prohibiting locks. Occasionally a freedom advocate will question whether CC-BY-SA’s DRM prohibition makes CC-BY-SA a non-free license. Few if any questioners come down on the side of CC-BY-SA being non-free, perhaps for two reasons: first, overwhelming dislike of DRM, thus granting the possibility that CC-BY-SA’s approach could be appropriate for a license largely used for cultural works; second, the DRM prohibition in CC-BY-SA (and all CC licenses) seems to be mainly expressive – there are no known enforcements, despite the ubiquity of DRM in games, apps, and media which utilize assets under various CC licenses.

Another obvious difference between the GPL and CC-BY-SA is that the former is primarily intended to be used for software, and the latter for cultural works (and, with version 4.0, databases). Although those are the overwhelming majority of uses of each license, there are areas in which both are used, e.g., for hardware design and interactive cultural works, where there is not a dominant copyleft practice or the line between software and non-software is not absolutely clear.

This brings us to the third obvious difference, and provides a reason to mitigate it: the GPL and CC-BY-SA are not compatible, and have slightly different compatibility mechanisms. One cannot mix GPL and CC-BY-SA works in a way that creates a derivative work and comply with either of them. This could change – CC-BY-SA-4.0 introduced4 the possibility of Creative Commons declaring CC-BY-SA-4.0 one-way (as a donor) compatible with another copyleft license – the GPL is an obvious candidate for such compatibility. Discussion is expected to begin in late 2014, with a decision sometime in 2015. If this one-way compatibility were to be enacted, one could create an adaptation of a CC-BY-SA work and release the adaptation under the GPL, but not vice-versa – which makes sense given that the GPL is the stronger copyleft.

The GPL has no mechanism for externally declaring compatibility with other licenses (and note no action from the FSF would be required for CC-BY-SA-4.0 to be made one-way compatible with the GPL). The GPL’s compatibility mechanism for later versions of itself differs from CC-BY-SA’s in two ways: the GPL’s is optional, and allows for use of the licensed work and adaptations under later versions; CC-BY-SA’s is non-optional, but only allows for adaptations under later versions.

Fourth, using slightly different language, the GPL and CC-BY-SA’s coverage of copyright and similar restrictions should be identical for all intents and purposes (GPL explicitly notes “semiconductor mask rights” and CC-BY-SA-4.0 “database rights” but neither excludes any copyright-like restrictions). But on patents, the licenses are rather different. CC-BY-SA-4.0 explicitly does not grant any patent license, while previous versions were silent. GPLv3 has an explicit patent license, while GPLv2’s patent license is implied (see [gpl-implied-patent-grant] and [GPLv3-drm] for details). This difference ought give serious pause to anyone considering use of CC-BY-SA for works potentially subject to patents, especially any potential licensee if the CC-BY-SA licensor holds such patents. Fortunately Creative Commons has always strongly advised against using any of its licenses for software, and that advice is usually heeded; but in the space of hardware designs Creative Commons has been silent, and unfortunately from a copyleft (i.e., use mechanisms at disposal to enforce user freedom) perspective, CC-BY-SA is commonly used (all the more reason to enable one-way compatibility, allowing such projects to migrate to the stronger copyleft).

The final obvious difference pertinent to copyleft policy between the GPL and CC-BY-SA is purpose. The GPL’s preamble makes it clear its goal is to guarantee software freedom for all users, and even without the preamble, it is clear that this is the Free Software Foundation’s driving goal. CC-BY-SA (and other CC licenses) state no purpose, and (depending on version) are preceded with a disclaimer and neutral “considerations for” licensors and licensees to think about (the CC0 public domain dedication is somewhat of an exception; it does have a statement of purpose, but even that has more of a feel of expressing yes-I-really-mean-to-do-this than a social mission). Creative Commons has always included elements of merely offering copyright holders additional choices and of purposefully creating a commons. While CC-BY-SA (and initially CC-SA) were just among the 11 non-mutually exclusive combinations of “BY”, “NC”, “ND”, and “SA”, freedom advocates quickly adopted CC-BY-SA as “the” copyleft for non-software works (surpassing previously existing non-software copylefts mentioned above). Creative Commons has at times recognized the special role of CC-BY-SA among its licenses, e.g., in a statement of intent regarding the license made in order to assure Wikimedians considering changing their default license from the FDL to CC-BY-SA that the latter, including its steward, was acceptably aligned with the Wikimedia movement (itself probably more directly aligned with software freedom than any other major non-software commons).

FIXME possibly explain why purpose might be relevant, eg copyleft instrument as totemic expression, norm-setting, idea-spreading

FIXME possibly mention that CC-BY-SA license text is free (CC0)

There are numerous other differences between the GPL and CC-BY-SA that are not particularly interesting for copyleft policy, such as the exact form of attribution and notice, and how license translations are handled. Many of these have changed over the course of CC-BY-SA versioning.

CC-BY-SA versions

FIXME section ought explain jurisdiction ports

This section gives a brief overview of changes across the main versions (1.0, 2.0, 2.5, 3.0, and 4.0) of CC-BY-SA, again focused on changes pertinent to copyleft policy. Creative Commons maintains a page detailing all significant changes across versions of all of its CC-BY* licenses, in many cases linking to detailed discussion of individual changes.

As of late 2014, versions 2.0 (the one called “Generic”; there are also 18 jurisdiction ports) and 3.0 (called “Unported”; there are also 39 ports) are by far the most widely used. Version 2.0 is widely used solely because it is the only version that the proprietary web image publishing service Flickr has ever supported. Flickr hosts 27 million CC-BY-SA-2.0 photos 5 and remains the go-to general source for free images (though it may eventually be supplanted by Wikimedia Commons, some new proprietary service, or a federation of free image sharing sites, perhaps powered by GNU MediaGoblin). Version 3.0 is widely used both because it was the current version far longer (2007-2013) than any other and because it has been adopted as the default license for most Wikimedia projects.

However, apart from the brief notes on each version below, we will focus on 4.0 for the section-by-section walkthrough that follows, as 4.0 is improved in several ways, including understandability, and should eventually become the most widespread version: 4.0 is intended to remain the current version for the long and indefinite future, and it would be reasonable to predict that Wikimedia projects will make CC-BY-SA-4.0 their default license in 2015 or 2016.

FIXME subsections might not be the right structure or formatting here

1.0 (2002-12-16)

CC-BY-SA-1.0 set the expectation for future versions. But the most notable copyleft policy feature (apart from the high level differences with GPLv2, such as not requiring source) was the absence of any measure for compatibility with future versions (nor with CC-SA-1.0, also a copyleft license, nor with pre-existing copyleft licenses such as GPL, FDL, FAL, and others, nor with CC jurisdiction ports, of which there were 3 for 1.0).

2.0 (2004-05-25)

CC-BY-SA-2.0 made itself compatible with future versions and CC jurisdiction ports of the same version. Creative Commons did not version CC-SA, leaving CC-BY-SA-2.0 as “the” CC copyleft license. CC-BY-SA-2.0 also adds the only clarification of what constitutes a derivative work, making “synchronization of the Work in timed-relation with a moving image” subject to copyleft.

2.5 (2005-06-09)

CC-BY-SA-2.5 makes only one change, to allow the licensor to designate another party to receive attribution. This does not seem interesting for copyleft policy, but the context of the change is: it was prompted by the desire to make attribution of mass collaborations easy (and on the other end of the spectrum, to make it possible to clearly require giving attribution to a publisher, e.g., of a journal). There was a brief experiment in branding CC-BY-SA-2.5 as the “CC-wiki” license. This was an early step toward Wikimedia adopting CC-BY-SA-3.0, four years later.

3.0 (2007-02-23)

CC-BY-SA-3.0 introduced a mechanism for externally declaring bilateral compatibility with other licenses. This mechanism to date has not been used for CC-BY-SA-3.0, in part because another way was found for Wikimedia projects to change their default license from FDL to CC-BY-SA: the Free Software Foundation released FDL 1.3, which gave a time-bound permission for mass collaboration sites to migrate to CC-BY-SA. While not particularly pertinent to copyleft policy, it’s worth noting for anyone wishing to study old versions in depth that 3.0 is the first version to substantially alter the text of most of the license, motivated largely by making the text use less U.S.-centric legal language. The 3.0 text is also considerably longer than previous versions.

4.0 (2013-11-25)

CC-BY-SA-4.0 added to 3.0’s external compatibility declaration mechanism by allowing one-way compatibility. After release of CC-BY-SA-4.0 bilateral compatibility was reached with FAL-1.3. As previously mentioned, one-way compatibility with GPLv3 will soon be discussed.

4.0 also made a subtle change in that an adaptation may be considered to be licensed solely under the adapter’s license (currently CC-BY-SA-4.0 or FAL-1.3, in the future potentially GPLv3 or a hypothetical CC-BY-SA-5.0). In previous versions licenses were deemed to “stack” – if a work under CC-BY-SA-2.0 were adapted and released under CC-BY-SA-3.0, users of the adaptation would need to comply with both licenses. In practice this is an academic distinction, as compliance with any compatible license would tend to mean compliance with the original license. But for a licensee using a large number of works that wished to be extremely rigorous, this would be a large burden, for it would mean understanding every license (including those of jurisdiction ports not in English) in detail.
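To make the “stacking” distinction concrete, here is a toy model, offered only as a sketch of the paragraph above and not as a statement of what any license text requires. The function name and the idea of representing an adaptation history as a simple list are assumptions made for illustration; the model does not check whether the licenses in a chain are actually compatible.

```python
def licenses_to_understand(chain, stacking):
    """Licenses a rigorous downstream user must study, in this toy model.

    chain    -- licenses in adaptation order, original work first
                (a hypothetical representation, for illustration only)
    stacking -- True for the pre-4.0 behaviour described above,
                False for the 4.0 behaviour
    """
    if not chain:
        raise ValueError("chain must contain at least the original license")
    if stacking:
        # Pre-4.0: every license along the chain still applies.
        return set(chain)
    # 4.0: the adaptation may be treated as under the adapter's license alone.
    return {chain[-1]}


# The example from the text: a 2.0 work adapted and released under 3.0.
print(sorted(licenses_to_understand(["CC-BY-SA-2.0", "CC-BY-SA-3.0"], stacking=True)))
# ['CC-BY-SA-2.0', 'CC-BY-SA-3.0']

# A 4.0 work adapted and released under FAL-1.3.
print(sorted(licenses_to_understand(["CC-BY-SA-4.0", "FAL-1.3"], stacking=False)))
# ['FAL-1.3']
```

For a licensee drawing on many works, the first case grows with every distinct license and port anywhere in the history, which is the burden the paragraph above describes.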

The new version is also an even more complete rewrite of 3.0 than 3.0 was of previous versions, completing the “internationalization” of the license, and actually decreasing in length and increasing in readability.

Additionally, 4.0 consistently treats database rights (licensing them like other copyright-like rights) and moral rights (waiving them to the extent necessary to exercise granted freedoms) – in previous versions some jurisdiction ports treated these differently – and tentatively eliminates the need for jurisdiction ports. Official linguistic translations are underway (Finnish is the first completed) and no legal ports are planned.

4.0 is the first version to explicitly exclude a patent (and less problematically, trademark) license. It also adds two features akin to those found in GPLv3: waiver of any right the licensor may have to enforce anti-circumvention if DRM is applied to the work, and reinstatement of rights after termination if non-compliance is corrected within 30 days.

Finally, 4.0 streamlines the attribution requirement, possibly of some advantage to massive long-term collaborations which historically have found copyleft licenses a good fit.

The 4.0 versioning process was much more extensively researched, public, and documented than previous CC-BY-SA versionings; Creative Commons has published both the record of the process and a summary of final decisions.

CC-BY-SA-4.0 International section-by-section

FIXME arguably this section ought be the substance of the tutorial, but is very thin and weak now

FIXME formatted/section-referenced copy of license should be added to license-texts.tex and referenced throughout

The best course of action at this juncture would be to read the license text itself – the entire text is fairly easy to read, and should be quickly understood if one has the benefit of study of other public licenses and of copyleft policy.

The following walk-through will simply call out portions of each section one may wish to study especially closely due to their pertinence to copyleft policy issues mentioned above.

FIXME subsections might not be the right structure or formatting here

1 – Definitions

The first three definitions – “Adapted Material”, “Adapter’s License”, and “BY-SA Compatible License” are crucial to understanding copyleft scope and compatibility.

2 – Scope

The license grant is what makes all four freedoms available to licensees. This section is also where the waiver of DRM anti-circumvention is to be found, as well as the patent and trademark exclusions.

3 – License Conditions

This section contains the details of the attribution and share-alike requirements; the latter, read closely with the aforementioned definitions, describe the copyleft aspect of CC-BY-SA-4.0.

4 – Sui Generis Database Rights

This section describes how the previous grant and condition sections apply in the case of a database subject to sui generis database rights. This is an opportunity to go down a rabbit-hole of trying to understand sui generis database rights. Generally, this is a pointless exercise. You can comply with the license in the same way you would if the work were subject only to copyright – and determining whether a database is subject to copyright and/or sui generis database rights is another pit of futility. You can license databases under CC-BY-SA-4.0 and use databases subject to the same license as if they were any other sort of work.

5 – Disclaimer of Warranties and Limitation of Liability

Unsurprisingly, this section does its best to serve as an “absolute disclaimer and waiver of all liability.”

6 – Term and Termination

This section is similar to GPLv3, but without special provision for cases in which the licensor wishes to terminate even cured violations.

7 – Other Terms and Conditions

Though it uses different language, like the GPL, CC-BY-SA-4.0 does not allow additional restrictions not contained in the license. Unlike the GPL, CC-BY-SA-4.0 does not have an explicit additional permissions framework, although effectively a licensor can offer any other terms if they are the sole copyright holder (the license is non-exclusive), including the sorts of permissions that would be structured as additional permissions with the GPL. Creative Commons has sometimes called offering of separate terms (whether additional permissions or “proprietary relicensing”) the confusing name “CC+”; however where this is encountered at all it is usually in conjunction with one of the non-free CC licenses. Perhaps CC-BY-SA is not a strong enough copyleft to sometimes require additional permissions, or be used to gain commercially valuable asymmetric rights, in contrast with the GPL.

8 – Interpretation

Nothing surprising here. Note that CC-BY-SA does not “reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.” This is a point that Creative Commons has always been eager to make about all of its licenses. GPLv3 also “acknowledges your rights of fair use or other equivalent”. This may be a wise strategy, but should not be viewed as mandatory for any copyleft license – indeed, the ODbL attempts (somewhat self-contradictorily; it also acknowledges fair use or other rights to use) to make its conditions apply even for works potentially subject to neither copyright nor sui generis database rights.


There are only a small number of court cases involving any Creative Commons license. Creative Commons maintains a list of these and some related cases.

Only two of those cases concern enforcing the terms of a CC-BY-SA license (Gerlach v. DVU in Germany, and No. 71036 N. v. Newspaper in a private Rabbinical tribunal), and each hinged on attribution, not share-alike.

Further research could uncover out-of-compliance uses being brought into compliance without lawsuit; however, no such research, nor any hub for conducting such compliance work, is known. Editors of Wikimedia Commons document some external uses of Commons-hosted media, including whether users are compliant with the relevant license for the media (often CC-BY-SA), resulting in a category listing non-compliant uses (which seem to almost exclusively concern attribution).

Compliance Resources

FIXME this section is just a stub; ideally there would also be an additional section or chapter on CC-BY-SA compliance

Creative Commons has a page on ShareAlike interpretation as well as an extensive Frequently Asked Questions for licensees which addresses compliance with the attribution condition.

English Wikipedia’s and Wikimedia Commons’ pages on using material outside of Wikimedia projects provide valuable information, as the majority of material on those sites is CC-BY-SA licensed, and their practices are high-profile.

FIXME there is no section on business use of CC-BY-SA; there probably ought to be as there is one for GPL, though there’d be much less to put.

Wikidata II

Thursday, October 30th, 2014

Wikidata went live two years ago, but the II in the title is also a reference to the first page called Wikidata, which for years collected ideas for first-class data support in Wikipedia. I had linked to Wikidata I when writing about the most prominent of those ideas, Semantic MediaWiki (SMW), which I later (8 years ago) called the most important software project and said would “turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.”

SMW was and is very interesting and useful on some wikis, but turned out to be not revolutionary (the bigger story is wikis turned out to be not revolutionary, or only revolutionary on a small scale, except for Wikipedia) and not quite a fit for Wikipedia and its sibling projects. While I’d temper “most” and “universal” now (and should have 8 years ago), the actual Wikidata project (created by many of the same people who created SMW) is rapidly fulfilling general wikidata hopes.

One “improving the encyclopedia” hope that Wikidata will substantially deliver on over the next couple of years, and whose importance I only recently realized, is increasing trans-linguistic collaboration and availability of the sum of knowledge in many languages. When facts are embedded in free text, adding, correcting, and making available facts happens on a one-language-at-a-time basis. When facts about a topic are in Wikidata, they can be exposed in every language so long as labels are translated, even if for many topics nothing has ever been written in, nor translated into, many of those languages. Reasonator is a great demonstrator.
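To make the labels point concrete, here is a minimal sketch (not Wikidata's actual API or storage format) of a fact stored once under language-independent identifiers and rendered through per-language labels. The identifiers Q42, P31 and Q5 are real Wikidata IDs, but the label table is typed in by hand for illustration, and the names `STATEMENTS`, `LABELS` and `render` are hypothetical.

```python
# One fact, stored once, using language-independent identifiers.
STATEMENTS = [("Q42", "P31", "Q5")]

# Hand-written illustrative labels; a real client would fetch these from Wikidata.
LABELS = {
    "Q42": {"en": "Douglas Adams", "de": "Douglas Adams"},
    "P31": {"en": "instance of", "de": "ist ein(e)"},
    "Q5": {"en": "human", "de": "Mensch"},
}


def render(statement, lang):
    """Return the statement as text in `lang`, or None if any label is missing."""
    parts = []
    for entity_id in statement:
        label = LABELS.get(entity_id, {}).get(lang)
        if label is None:
            # The fact exists, but cannot be shown until the label is translated.
            return None
        parts.append(label)
    return " ".join(parts)
```

Adding French labels to `LABELS` would make the same fact available in French without anyone writing a French article, which is the whole appeal.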

Happy 2nd to all Wikidatians and Wikidata, by far the most important project for realizing Wikimedia’s vision. You can and should edit the data and edit and translate the schema. Browse Wikidata WikiProjects to find others working to describe topics of interest to you. I imagine some readers of this blog might be interested in WikiProjects Source MetaData (for citations) and Structured Data for Commons (the media repository).

For folks concerned about intellectual parasites, Wikidata has done the right thing — all data dedicated to the public domain with CC0.

Non-citizens should decide elections

Monday, October 27th, 2014

Do non-citizens vote in U.S. elections? (tax funded but $19.95 to read; how can that be good for democratic discourse?) and Washington Post post by two of the paper’s authors Could non-citizens decide the November election? Yes and yes — assuming pertinent elections are very close and we take citizen votes as a given. Most interesting:

Unlike other populations, including naturalized citizens, education is not associated with higher participation among non-citizens. In 2008, non-citizens with less than a college degree were significantly more likely to cast a validated vote, and no non-citizens with a college degree or higher cast a validated vote. This hints at a link between non-citizen voting and lack of awareness about legal barriers.

The authors suggest raising awareness of legal barriers might further reduce non-citizen voting. But non-citizen voting is not the problem that ought be addressed. Instead the problem is non-voting by educated non-citizens, whose input is lost. If we can begin to disentangle nationalism and democracy, clearly the former ought be discarded (it is after all the modern distillation of the worst tendencies of humanity) and franchise further expanded — a win whether treating democracy as a collective intelligence system (more diverse, more disinterested input) or as a collective representation/legitimacy system (non-citizens are also taxed, regulated, and killed).

Further expanding franchise presents challenges (I went over some of them previously in a post on extra-jurisdictional voting), but so does enforcing the status quo. Anyone not in the grip of nationalism or with a commitment to democracy ought want to meet any challenges faced by expanded franchise, not help enforce the status quo, even by means of “soft” informational campaigns.


Thursday, October 2nd, 2014

The first wiki[pedia]2journal article has been published: Dengue fever Wikipedia article, peer-reviewed version (PDF). Modern medicine comes online: How putting Wikipedia articles through a medical journal’s traditional process can put free, reliable information into as many hands as possible is the accompanying editorial (emphasis added):

As a source of clinical information, how does Wikipedia differ from UpToDate or, for that matter, a textbook or scholarly journal? Wikipedia lacks three main things. First, a single responsible author, typically with a recognized academic affiliation, who acts as guarantor of the integrity of the work. Second, the careful eye of a trained editorial team, attuned to publication ethics, who ensure consistency and accuracy through the many iterations of an article from submission to publication. Third, formal peer review by at least one, and often many, experts who point out conflicts, errors, redundancies, or gaps. These form an accepted ground from which publication decisions can be made with confidence.

In this issue of Open Medicine, we are pleased to publish the first formally peer-reviewed and edited Wikipedia article. The clinical topic is dengue fever. It has been submitted by the author who has made the most changes, and who has designated 3 others who contributed most meaningfully. It has been peer reviewed by international experts in infectious disease, and by a series of editors at Open Medicine. It has been copy-edited and proofread; once published, it will be indexed in MEDLINE. Although by the time this editorial is read the Wikipedia article will have changed many times, there will be a link on the Wikipedia page that can take the viewer back to the peer-reviewed and published piece on the Open Medicine website. In a year’s time, the most responsible author will submit the changed piece to an indexed journal, so it can move through the same editorial process and continue to function as a valid, reliable, and evolving free and complete reference for everyone in the world. Although there may be a need for shorter, more focused clinical articles published elsewhere as this one expands, it is anticipated that the Wikipedia page on dengue will be a reference against which all others can be compared. While it might be decades before we see an end to dengue, perhaps the time and money saved on exhaustive, expensive, and redundant searches about what yet needs to be done will let us see that end sooner.

I love that this is taking Wikipedia and commons-based peer production into a challenging product area, which if wildly successful, could directly challenge and ultimately destroy the proprietary competition. The editorial notes:

Some institutions pay UpToDate hundreds of thousands of dollars per year for that sense of security. This has allowed Wolters Kluwer, the owners of UpToDate, to accrue annual revenues of hundreds of millions of dollars and to forecast continued double-digit growth as “market conditions for print journals and books … remain soft.”

See the WikiProject Medicine collaborative publication page for more background on the process and future developments. Note that at least 7 articles have been published in journal2wiki[pedia] fashion; see PLOS Computational Biology and corresponding Wikipedia articles. Ideally these 2 methods would converge on wiki↔journal, as the emphasized portion of the quote above seems to indicate.

Peer review of Wikipedia articles and publication in another venue in theory could minimize dependencies and maximize mutual benefit between expert authoring (which has historically failed in the wiki context, see Nupedia and Citizendium) and mass collaboration (see challenges noted by editorial above). But one such article only demonstrates the concept; we’ll see whether it becomes an important method, let alone a market-dominating one.

One small but embarrassing obstacle to wiki↔journal is license incompatibility. PLOS journals use CC-BY-4.0 (donor-only relative to following; the version isn’t important for this one) and Wikipedia CC-BY-SA-3.0 (recipient-only relative to previous…and following) and Open Medicine CC-BY-SA-2.5-Canada (donor-only relative to the immediately previous) — meaning if all contributors to the Dengue fever Wikipedia article did not sign off, the journal version is technically not in compliance with the upstream license. Clearly nobody should care about this second issue, except for license stewards, who should mitigate the problem going forward: all previous versions (2.0 or greater due to lack of a “later versions” provision in 1.0) of CC-BY-SA should be added to CC-BY-SA-4.0’s compatibility list, allowing contributions to go both ways. The first issue unfortunately cannot be addressed within the framework of current licenses (bidirectional use could be avoided, or contributors could all sign off, either of which would be outside the license framework).
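The donor/recipient relationships above can be read as a small directed graph. The following is a rough sketch that encodes only the directions described in this post; it is not legal analysis, and the names `FLOWS`, `can_flow` and `bidirectional` are made up for illustration.

```python
# An edge (a, b) means material under license a may be incorporated
# into a work under license b, per the description in the post.
FLOWS = {
    ("CC-BY-4.0", "CC-BY-SA-3.0"),        # PLOS -> Wikipedia
    ("CC-BY-SA-2.5-CA", "CC-BY-SA-3.0"),  # Open Medicine -> Wikipedia
}


def can_flow(source, target):
    """True if material under `source` can be used in a work under `target`."""
    return source == target or (source, target) in FLOWS


def bidirectional(a, b):
    """True only if material can move both ways, as wiki<->journal would need."""
    return can_flow(a, b) and can_flow(b, a)
```

With these edges, `can_flow("CC-BY-SA-3.0", "CC-BY-SA-2.5-CA")` is false, which is the Wikipedia-to-Open-Medicine direction the journal version depends on.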

Daniel Mietchen (who is a contributor to the aforementioned journal2wiki effort, and just about everything else relating to Wikipedia and Open Access) has another version of his proposal to open up research funding proposals up at the Knight News Challenge: Libraries site. Applaud and comment there if you like, as I do (endorsement of previous version).

Near the beginning of the above editorial:

New evidence pours in to the tune of 12 systematic reviews per day, and accumulating the information and then deciding how to incorporate it into one’s practice is an almost impossible task. A study published in BMJ showed that if one hoped to take account of all that has been published in the relatively small discipline of echocardiography, it would take 5 years of constant reading—by which point the reader would be a year behind.

A similar avalanche of publishing can be found in any academic discipline. It is conceivable that copyright helps, providing an incentive for services like UpToDate. My guess is that it gets in the way, both by propping up arrangements oriented toward pumping out individual articles, and by putting up barriers (the public license incompatibility mentioned above is inconsequential compared to the paywalled, unmitigated copyright, and/or PDF-only case which dominates) to collaborative distillation, by humans and machines, of the states of the art. As I wrote about entertainment, do not pay copyright holders, for a good future.