Post Open Access

OA mandate, FLOSS contrast

Friday, February 22nd, 2013

The Obama administration:

has directed Federal agencies with more than $100M in R&D expenditures to develop plans to make the published results of federally funded research freely available to the public within one year of publication and requiring researchers to better account for and manage the digital data resulting from federally funded scientific research

A similar policy has been in place for NIH funded research for several years, and more are in the works around the world.

Peter Suber, as far as I can tell the preeminent chronicler of the Open Access (OA) movement, and one of its primary activists, seems to have the go-to summary post.

Congratulations and thanks to all OA activists. I want to use this particular milestone to make some exaggerated contrasts between OA and free/libre/open source software (FLOSS). I won’t bother with cultural, educational, and other variants, but assume they’re somewhere between and lagging overall.

  • OA is far more focused on end products (papers), FLOSS on modifiable forms (source)
  • OA is far more focused on gratis access (available on-line at no cost), FLOSS on removing legal restrictions (via public licenses)
  • OA has a fairly broad conception of info governance, FLOSS focused on class of public licenses, selection within that class
  • OA is far more focused on public and institutional policy (eg mandates like today’s), FLOSS on individual developer and user choices
  • OA is more focused on global ethics (eg access to knowledge in poor regions), FLOSS on individual developer and user ethics

If you’ve followed either movement you can think of exceptions. I suspect the above generalizations are correct as such, but tell me I’m wrong.

Career arrangements are an obvious motivator of some of these differences: science is more institutional and tracked, less varied relative to programming. Thus where acting on individual ethics alone with regard to publishing is often characterized as suicidal for a scientist, for a programmer it is welcome, but neither extraordinary nor a cause for concern. At the same time, FLOSS people might overestimate the effectiveness of individual choices, merely because they are relatively easy to make and expressive.

One can imagine a universe in which facts are different enough that the characteristics of movements for something like open research and software are reversed, eg no giant institutions and centralized funding, but radical individual ethics for science, dominance of amazing mainframes and push for software escrow for programming. Maybe our universe isn’t that bad, eh?

I do not claim one approach is superior to the other. Indeed I think there’s plenty each can learn from the other. Tip-of-the-iceberg examples: I appreciate those making FLOSS-like demands of OA, think those working on government and institutional policy in FLOSS should be appreciated much more, and the global ethical dimension of FLOSS, in particular with regard to A2K-like equality implications, badly needs to be articulated.

Beyond much needed learning and copying of strategies, some of those involved in OA and FLOSS (and the fields in between and lagging) might better appreciate each other’s objectives and their commonalities, and actively collaborate. All ignore the computational dominance of everything at their peril, and software people self-limit, self-marginalize, even self-refute by limiting their ethics and action to software.

“Commoning the noosphere” sounds anachronistic, but is yet to be, and I suspect involves much more than a superset of OA and FLOSS strategy and critique.

Open Knowledge Foundation

Wednesday, February 13th, 2013

I used to privately poke fun at the Open Knowledge Foundation for what seemed like a never-ending stream of half-baked projects (and domains, websites, lists, etc). I was wrong.

(I have also criticized OKF’s creation of a database-specific copyleft license, but recognize its existence is mostly Creative Commons’ fault, just as I criticize some of Creative Commons’ licenses but recognize that their existence is mostly due to a lack of vision on the part of free software activists.)

Some of those projects have become truly impressive (e.g. the Public Domain Review and CKAN, the latter being a “data portal” deployed by numerous governments in direct competition with proprietary “solutions”; hopefully my local government will eventually adopt the instance OpenOakland has set up). Some projects once deemed important seem relatively stagnant, but were way ahead of their time, if only because the non-software free/open universe painfully lags software (e.g. KnowledgeForge). I haven’t kept track of most OKF projects, but whichever ones haven’t succeeded wildly don’t seem to have caused overall problems.

Also, in the past couple years, OKF has sprouted local groups around the world.

Why has the OKF succeeded, despite what seemed to me for a time chaotic behavior?

  • It knows what it is doing. Not necessarily in terms of having a solid plan for every project it starts, but in the more fundamental sense of knowing what it is trying to accomplish, grounded by its own definition of what open knowledge is (unsurprisingly it is derived from the Open Source Definition). I’ve been on the advisory council for that definition for most of its existence, and this year I’m its chair. I wrote a post for the OKF blog today reiterating the foundational nature of the definition and its importance to the success of OKF and the many “open” movements in various fields.
  • It has been a lean organization, structured to be able to easily expand and contract in terms of paid workers, allowing it to pursue on-mission projects rather than be dominated by permanent institutional fundraising.
  • It seems to have mostly brought already committed open activists/doers into the organization and its projects.
  • The network (eg local groups) seems to have grown fairly organically, rather than from a top-down vision to create an umbrella that all would attach themselves to, a vision I would view with great skepticism.

OKF is far from perfect (in particular I think it is too detached from free/open source software, to the detriment of open data and of my confidence that it will continue, through action and recruitment, to stay on a fully Open course; one of their more ironic practices at this moment is the Google map at the top of their local groups page [Update: already fixed, see comments]). But it is an excellent organization, at this point probably the single best connection to all things Open, irrespective of field or geography.

Check them out online, join or start a local group, and if you’re interested in the minutiae of whether particular licenses for intended-to-be-open culture/data/education/government/research works are actually open, help me out with OKF’s OpenDefinition.org project.

Future of culture & IP & beating of books in San Jose, Thursday

Tuesday, November 13th, 2012

I’m looking forward to this “in conversation” event with artist Stephanie Syjuco. The ZERO1 Garage is a neat space, and Syjuco’s installation, FREE TEXTS: An Open Source Reading Room, is just right.

For background on my part of the conversation, perhaps read my bit on the future of copyright and my interview with Lewis Hyde, author of at least one of the treated FREE TEXTS (in the title of this post “beating of books” is a play on “beating of bounds”; see the interview, one of my favorite posts ever to the Creative Commons blog).

One of the things that makes FREE TEXTS just right is that “IP” makes for a cornucopia of irony (Irony Ponies for all?), and one of the specialty fruits therein is literature extolling the commons and free expression and problematizing copyright … subject to unmitigated copyright and expensive in time and/or money to access, let alone modify.

Even when a text is in-theory offered under a public license, thus mitigating copyright (but note, it is rare for any such mitigation to be offered), access to a digital copy is often frustrated, and access to a conveniently modified copy, almost unknown. The probability of these problems occurring reaches near certainty if a remotely traditional publisher is involved.

Two recent examples that I’m second-hand familiar with (I made small contributions): all chapters of Wealth of the Commons (Levellers Press, 2012), with one exception, are released under the CC-BY-SA license. But only a paper version of the book is now available. I understand that digital copies (presumably for sale and gratis) will be available sometime next year. Some chapters are now available as HTML pages, including mine. The German version of the book (Transcript, 2012), published earlier this year with a slightly different selection of essays, is all CC-BY-SA and available in whole as a PDF, and some chapters as HTML pages, again including mine (though if one were to nitpick, the accompanying photo under CC-BY-NC-SA is incongruous).

The Social Media Reader (New York University Press, 2012) consists mostly of chapters under free licenses (CC-BY and CC-BY-SA) and a couple under CC-BY-NC-SA, with the collection under the last. Apparently it is selling well for such a book, but digital copies are only available with select university affiliation. Fortunately someone uploaded a PDF copy to the Internet Archive, as the licenses permit.

In the long run, these can be just annoyances and make-work, at least to the extent the books consist of material under free licenses. Free-as-in-freedom does not have to mean free-as-in-price. Even without any copyright mitigation, it’s common for digital books to be made available in various places, as FREE TEXTS highlights. Under free licenses, it becomes feasible for people to openly collaborate to make improved, modifiable, annotatable versions available in various formats. This is currently done for select books at Wikibooks (educational, neutral point of view, not original research) and Wikisource (historically significant). I don’t know of a community for this sort of work on other classes of books, but I’d be happy to hear of such, and may eventually have to start doing it if not. Obvious candidate platforms include Mediawiki, Booktype, and source-repository-per-book.
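For concreteness, here is a minimal sketch of what source-repository-per-book could look like: one file per chapter plus a tiny build script. The chapters/ layout, title, and output formats are hypothetical, and the script assumes the pandoc converter is installed; any similar toolchain would serve.

```python
#!/usr/bin/env python3
"""Minimal sketch of a source-repository-per-book build script.

Hypothetical layout: one Markdown file per chapter in chapters/,
ordered by filename (01-intro.md, 02-history.md, ...). Requires
the pandoc document converter to be installed separately.
"""
import subprocess
from pathlib import Path

chapters = sorted(Path("chapters").glob("*.md"))

for fmt in ("html", "epub"):
    # pandoc concatenates the input files, in order, into one output
    subprocess.run(
        ["pandoc", *map(str, chapters),
         "--metadata", "title=Example Book",
         "-o", f"book.{fmt}"],
        check=True,
    )
```

With the book’s source in a public repository, anyone can fork it, improve a chapter, and send the change back, the same workflow free software projects have used for decades.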

You can register for the event (gratis) in order to help determine seating and refreshments. I expect the conversation to be considerably more wide ranging than the above might imply!

CODATA

Saturday, November 10th, 2012

Last week I attended CODATA 2012 in Taipei, the biennial conference of the Committee on Data for Science and Technology. I struggled a bit with deciding to go — I am neither a “data scientist” nor a scientist, and while I know a fair amount about some of the technical and policy issues for data management, specific application to science has never been my expertise; it is all away from my current focus, and I’m skeptical of travel.

I finally went in order to see through a session on mass collaboration data projects and policies that I developed with Tyng-Ruey Chuang and Shun-Ling Chen. A mere rationalization as they didn’t really need my presence, but I enjoyed the conference and trip anyway.

My favorite moments from the panel:

  • Mikel Maron said approximately “not only don’t silo your data, don’t silo your code” (see a corresponding bullet in his slides), a point woefully and consistently underestimated and ignored by “open” advocates.
  • Chen’s eloquent polemic closing with approximately “mass collaboration challenges not only Ⓒ but distribution of power, authority, credibility”; I hope she publishes her talk content!

My slides from the panel (odp, pdf, slideshare) and from an open data workshop following the conference (odp, pdf, slideshare).

Tracey Lauriault summarized the mass collaboration panel (all of it, check out the parts I do not mention), including:

Mike Linksvayer, was provocative in stating that copyright makes us stupider and is stupid and that it should be abolished all together. I argued that for traditional knowledge where people are seriously marginalized and where TK is exploited, copyright might be the only way to protect themselves.

I’m pretty sure I only claimed that including copyright in one’s thinking about any topic, e.g., data policy, effectively makes one’s thinking about that topic more muddled and indeed stupid. I’ve posted about this before, but consider a post enumerating the ways copyright makes people stupid, individually and collectively, forthcoming.

I didn’t say anything about abolishing copyright, but I’m happy for that conclusion to be drawn — I’d be even happier for the conclusion to be drawn that abolition is a moderate reform and boring (in no-brainer and non-interesting senses) among the possibilities for information and innovation policies — indeed, copyright has made society stupid about these broader issues. I sort of make these points in my future of copyright piece that Lauriault linked to, but will eventually address them directly.

Also, Traditional Knowledge, about which I’ve never posted unless you count my claim that malgovernance of the information commons is ancient, for example cult secrets (mentioned in first paragraph of previous link), though I didn’t have contemporary indigenous peoples in mind, and TK covers a wide range of issues. Indeed, my instinct is to divide these between issues where traditional communities are being excluded from their heritage (e.g., plant patents, institutionally-held data and items, perhaps copyrestricted cultural works building on traditional culture) and where they would like to have a collective right to exclude information from the global public domain.

The theme of CODATA 2012 was “Open Data and Information for a Changing Planet” and the closing plenary appropriately aimed to place the entire conference in that context, and to question its impact and followup. That included the inevitable question of whether anyone would notice. At the beginning of the conference attendees were excitedly encouraged to tweet, and if I understood correctly, some conference staff were dedicated to helping people tweet. As usual, I find this sort of exhortation and dedication of resources to social media scary. But what about journalists? How can we make the media care?

Fortunately for (future) CODATA and other science and data related events, there’s a great answer (usually there isn’t one), but one I didn’t hear mentioned at all outside of my own presentation: invite data journalists. They could learn a lot from other attendees, have a meta story about exactly the topic they’re passionate about, and an inside track on general interest data-driven stories developing from data-driven science in a variety of fields — for example the conference featured a number of sessions on disaster data. Usual CODATA science and policy attendees would probably also learn a lot about how to make their work interesting for data journalists, and thus be able to celebrate rather than whinge when talking about media. A start on that learning, and maybe ideas for people to invite might come from The Data Journalism Handbook (disclaimer: I contributed what I hope is the least relevant chapter in the whole book).

Someone asked how to move forward and David Carlson gave some conceptually simple and very good advice, paraphrased:

  • Adopt an open access data publishing policy at the inception of a project.
  • Invest in data staff — human resources are the limiting factor.
  • Start publishing and doing small experiments with data very early in a project’s life.

Someone also asked about “citizen science”, to which Carlson also had a good answer (added to by Jane Hunter and perhaps others), in sum roughly:

  • Community monitoring (data collection) may be a more accurate name for part of what people call citizen science;
  • but the community should be involved in many more aspects of some projects, up to governance;
  • don’t assume “citizen scientists” are non-scientists: often they’ll have scientific training, sometimes full-time scientists contributing to projects outside of work.

To bring this full circle (and very much aligned with the conference’s theme and Carlson’s first recommendation above) would have been consideration of scientist-as-citizen. Fortunately I had serendipitously titled my “open data workshop” presentation for the next day “Open data policy for scientists as citizens and for citizen science”.

Finally, “data citation” was another major topic of the conference, but semantic web/linked open data were not explicitly mentioned much, as observed by someone in the plenary. I tend to agree, but may have missed the most relevant sessions, though they would have been my focus if I were actually working in the field. I did really enjoy happening to sit next to Curt Tilmes at a dinner, and catching up a bit on W3C Provenance (which I’ve mentioned briefly before), of which he is a working group member.

I got to spend a little time outside the conference. I’d been to Taipei once before, but failed to notice its beautiful setting — surrounded and interspersed with steep and very green hills.

I visited the National Palace Museum with Puneet Kishor. I know next to nothing about feng shui, but I was struck by what seemed to be an ultra-favorable setting taking advantage of some of the aforementioned hills (it made me think of feng shui, which I have never before done without someone else bringing it up). I think the more one knows about Chinese history, the more one would get out of the museum, but for someone who loves maps, the map room alone is worth the visit.

It was also fun hanging out a bit with Christopher Adams and Sophie Chiang, catching up with Bob Chao and seeing the booming Mozilla Taiwan offices, and meeting Florence Ko, Lucien Lin, and Rock of Open Source Software Foundry and Emily from Creative Commons Taiwan.

Finally, thanks to Tyng-Ruey Chuang, one of the main CODATA 2012 local organizers, and instigator of our session and workshop. He is one of the people I most enjoyed working with while at Creative Commons (e.g., a panel from last year) and given some overlapping technology and policy interests, one of the people I could most see working with again.

Open Data nuance

Sunday, October 7th, 2012

I’m very mildly annoyed with some discussion of “open data”, in part where it is an amorphous thing for which expectations must be managed, value found, and sustainable business models, perhaps marketplaces, invented, all with an abstract and tangential relationship to software, or “IT”.

All of this was evident at a recent Open Knowledge Foundation meetup at the Wikimedia Foundation offices — but perhaps only evident to me, and I do not really intend to criticize anyone there. Their projects are all great. Nonetheless, I think very general discussion about open data tends to be very suboptimal, even among experts. Perhaps this just means general discussion is suboptimal, except as an excuse for socializing. But I am more comfortable enumerating peeves than I am socializing:

  • “Open” and “data” should sometimes be considered separately. “Open” (as in anyone can use for any purpose, as opposed to facing possible legal threat from copyright, database, patent and other “owners”, even their own governments, and their enforcement apparatuses) is only an expensive policy choice if pursued at too low a level, where rational ignorance and a desire to maintain every form of control and conceivable revenue stream rule. Regardless of “open” policy, or lack thereof, any particular dataset might be worthwhile, or not. But this is the most minor of my annoyances. It is even counterproductive to consider, most of the time — due to the expense of overcoming rational ignorance about “open” policy, and of evaluating any particular dataset, it probably makes a lot of sense to bundle “open data” and agitate for as much data to be made available under as good of practices as possible, and manage expectations when necessary.
  • To minimize the need to make expensive evaluations and compromises, open data needs to be cheap, preferably a side-effect of business as usual. Cheapness requires automation requires software requires open source software, otherwise “open data” institutions are themselves not transparent, are hostage to “enterprise software” companies, and are severely constrained in their ability to help each other, and to be helped by their publics. I don’t think an agitation approach is optimal (I recently attended an OpenOakland meeting, and one of the leaders said something like “we don’t hate proprietary software, but we do love open source”, which seems reasonable) but I am annoyed nevertheless by the lack of priority and acknowledgement given to software by “open data” (and even more so open access/content/education/etc) folk in general strategic discussions (though in action the Open Knowledge Foundation is better, having hatched valuable open source projects needed for open data). Computation rules all!
  • A “data marketplace” should not be the first suggestion, or even metaphor, for how to realize value from open data — especially not in the offices of the Wikimedia Foundation. Instead, mass collaboration.
  • Open data is neither necessary nor sufficient for better governance. Human institutions (inclusive of “public”, “private”, or any other categorization you like) have been well governed and atrociously governed throughout recorded history. Open data is just another mechanism that in some cases might make a bit of a difference. Another tool. But speaking of managing expectations, one should expect and demand good governance, or at least less atrocity, from our institutions, completely independent of open data!

“Nuance” is a vague joke in lieu of a useful title.

altmetrics moneyball

Sunday, March 4th, 2012

I read Moneyball in 2004 and recall two things: a baseball team used statistics about individual contribution to team winning to build a winning team on the cheap (other teams at the time were selecting players based on gut feel and statistics not well correlated with team success; players who looked good in person to baseball insiders or looked good on paper to a naive analysis were overpaid) and some trivia about fan-gathered versus team- or league-controlled game data.

The key thing for this post is that it was a team that was able to exploit better metrics, and profit (or at least win).

Just as many baseball enthusiasts were dissatisfied with simple player metrics like home runs and steals, and searched for metrics better correlated with team success, many academia enthusiasts are dissatisfied with simple academic metrics like (actually, all based upon) the number of citations to journal articles, and are looking for metrics better correlated with an academic’s contribution to knowledge (altmetrics).

Among other things, altmetrics could lead researchers to spend time doing things that might be more useful than publishing journal articles, bring people into research who are good at doing these other things (creation of useful datasets is often given as an example) but not at writing journal articles, and help break the lock-in and extraordinary profits enjoyed by a few legacy publishers. Without altmetrics, these things are happening only very slowly, as career advancement in academia is currently highly dependent on journal article citation metrics.

As far as I can tell, altmetrics are in their infancy at best: nobody knows how to measure contribution to knowledge, let alone innovation, and baseball enthusiasts faced a much, much more constrained problem: contribution to winning baseball games. But, given that so little is known, and current metrics are so obviously inadequate yet adhered to, some academics who do well on journal article citation metrics are vastly over-recruited and overpaid, while many academics and would-be academics who don’t, aren’t. This ought mean there could be big wins from relatively crude improvements.
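To make “relatively crude improvements” concrete, here is a toy sketch of a composite metric in Python. Every input and weight is hypothetical; the point is only that once events like dataset and code reuse are tracked, a metric rewarding them is trivial to compute.

```python
from dataclasses import dataclass

@dataclass
class Researcher:
    article_citations: int  # the traditional measure
    dataset_reuses: int     # times published datasets were reused
    code_reuses: int        # times published code was reused
    public_reviews: int     # public peer reviews, annotations, etc.

def crude_altmetric(r: Researcher) -> float:
    # Hypothetical weights; a real system would have to validate them
    # against some agreed proxy for contribution to knowledge.
    return (1.0 * r.article_citations
            + 2.0 * r.dataset_reuses
            + 2.0 * r.code_reuses
            + 0.5 * r.public_reviews)

print(crude_altmetric(Researcher(120, 40, 15, 30)))  # 245.0
```

The hard and interesting work is of course in choosing and validating the inputs and weights, not in the arithmetic.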

Who should gamble on potential crude improvements over journal article citation metrics? Entities that hire academics, in particular universities, perhaps even more particularly ones that are in the “big leagues” (considered a research university?) but nowhere near the top, and without cash to recruit superstars per gut feel or journal article citation metrics. I vaguely understand that universities make huge, conscious, expensive efforts to create superstar departments. Nearly all universities aren’t Columbia hoping to spend their way to recruiting superstars from Harvard and Princeton. Instead, why not make a long-term gamble on some plausible altmetric? At best, such a university or department will greatly outperform over the next decade and get credit beyond that for pioneering new practices that everyone copies. At worst, such a university or department will remain mediocre and perhaps slip a bit over the next decade, and get a bit of skeptical press about the effort (if made public). The benefits to society from such experimentation could be large.

Are there universities or departments pursuing such a strategy? I am in no position to know. I did search a bit and came up with What do the social sciences have in common with baseball? The story of Moneyball, academic impact, and quantitative methods. I’m pretty sure the author is writing about hiring social scientists who specialize in quantitative methods, not hiring social scientists based on speculative quantitative methods. What about universities outside wealthy jurisdictions?

Speaking of baseball players and academics, just yesterday I realized that academics have the equivalent of player statistics pages when I discovered my brother’s (my only close relative in academia, as far as I know) via his new site. I’ll have to ask him how he feels about giving such a public performance. My first reaction is that it is good for society. Such would be good for more professions — for most we have not conventional metrics like home runs or citations that need improvement, but zero metrics, only gut feel or worse. Lots of fields, employment and otherwise, are ripe for disruption.

Addendum: Another view is that metrics lead to bad outcomes and that rather than using more sophisticated metrics, academia should become more like most employment and shun metrics altogether, hiring purely based on gut feel, and that other fields should continue as before, and fight any encroachment of metrics. Of course these theories may also be experimented with on a team-by-university-by-organization basis.

Black March→Freedom March

Wednesday, February 29th, 2012

ASCAP/BMI

In 1939 and 1940, the American Society of Composers, Authors and Publishers (ASCAP) greatly increased its licensing fees. Broadcasters for a time played only music in the public domain and music licensed from Broadcast Music, Inc. (BMI), a competitor to ASCAP that the broadcasters set up. ASCAP’s monopoly was broken, and some genres that had been ignored obtained airplay. I’ve also seen this described as a failed ASCAP boycott of the broadcasters. I have not read beyond sketches to know the best characterization, but there were a small enough number of entities on both sides that either or both could hold out, and effectively “boycott” for a higher or lower price.

Open Access

A new pledge to refrain from one or more of publishing in, reviewing for, or doing editorial work for journals published by Elsevier has gotten a fair amount of notice. 7,671 researchers have signed, which has probably already led to some Elsevier concessions and a drop in their share price.

Academics are not nearly as concentrated as U.S. radio broadcasting in 1940, but hopefully, and just possibly, this boycott will lead to lasting change (the share analyst quoted in the link above does not think so). But pledges not to contribute to non-Open Access journals are nothing new — 34,000 scientists (pdf; has anyone counted how many have stuck with the publication part of the pledge?) signed one in 2001

but the publishing landscape remained largely unchanged until PLoS became a publisher itself to effect change. PLoS therefore reinvented itself as a publisher in 2003 to show how open access publishing could work.

Black March

Copied from black-march.com, but of unknown provenance/Anonymous:

With the continuing campaigns for Internet-censoring litigation such as SOPA and PIPA, and the closure of sites such as Megaupload under allegations of ‘piracy’ and ‘conspiracy’, the time has come to take a stand against music, film and media companies’ lobbyists.

The only way to hit them where it truly hurts… Their profit margins.

Do not buy a single record. Do not download a single song, legally or illegally. Do not go to see a single film in cinemas, or download a copy. Do not buy a DVD in the stores. Do not buy a videogame. Do not buy a single book or magazine.

Wait the 4 weeks to buy them in April, see the film later, etc. Holding out for just 4 weeks will leave a gaping hole in the media and entertainment companies’ profits for the 1st quarter. An economic hit which will in turn be observed by governments worldwide as stocks and shares will blip from a large enough loss of incomes.

This action can give a statement of intent:
“We will not tolerate the Media Industries’ lobbying for legislation which will censor the internet.”

Nice sentiment. Not purely a tiresome rearguard action. But I don’t see how it can conceivably make a noticeable impact on copyright industry profit margins. Getting a fair number of people to contact their elected representatives is noticeable, as usually few do it; but a significant proportion of the world’s population pays something to the copyright industries, so to make the stated difference, a much larger number of people would have to participate than have in SOPA and ACTA protests, and the participation would require relatively sustained behavior change, not a few clicks.

Still, perhaps “Black March” will be useful as a consciousness-raising exercise; but of what?

Freedom March

I’ve seen some suggest (especially in Spanish, as the linked post is [Update 20120304: English translation]) that the “what” needs to include making, using, and sharing free works. I agree.

Encyclopedia of Original Research

Thursday, December 15th, 2011

As I’m prone to say that some free/libre/open projects ought strive to not merely recapitulate existing production methods and products (so as to sometimes create something much better), I have to support and critique such strivings.

A proposal for the Encyclopedia of Original Research, besides having a name that I find most excellent, seems like just such a project. The idea, if I understand correctly, is to leverage Open Access literature and, using both machine- and wiki-techniques, create always-up-to-date reviews of the state of research in any field, broad or narrow. If wildly successful, such a mechanism could nudge the end products of research away from usually instantly stale, inaccessible (in multiple ways), unread, untested, singular, and generally useless dead-tree-oriented outputs toward more accessible, exploitable, testable, queryable, consensus outputs. In other words, explode the category of “scientific publication”.

Another name for the project is “Beethoven’s open repository of research” — watch the video.

The project is running a crowdfunding campaign right now. They have only a few hours left and are far from their goal, but I’m pretty sure the platform they’re using does not require projects to meet a threshold in order to obtain pledges, and it looks like a small amount would help them continue to work and apply for other funding (eminently doable in my estimation; if I can help I will). I encourage kicking in some funds if you read this in the next couple hours, and I’ll update this post with other ways to help in the future if you’re reading later, as in all probability you are.

EoOR is considerably more radical than (and probably complementary to and/or ought consume) AcaWiki, a project I’ve written about previously with the more limited aim to create human-readable summaries of academic papers and reviews. It also looks like, if realized, a platform that projects with more specific aims, like OpenCures, could leverage.

Somehow EoOR escaped my attention (or more likely, my memory) until now. It seems the proposal was developed as part of a class on getting your Creative Commons project funded, which I think I can claim credit for getting funded (Jonas Öberg was very convincing; the idea for and execution of the class are his).

Collaborative Futures 4

Friday, January 22nd, 2010

Day 4 of the Collaborative Futures book sprint and I added yet another chapter intended for the “future” section, current draft copied below. I’m probably least happy with it, but perhaps I’m just tired. I hope it gets a good edit, but today (day 5) is the final day and we have lots to wrap up!

(Boring permissions note: I’m blogging whole chapter drafts before anyone else touches them, so they’re in the public domain like everything else original here. The book is licensed under CC BY-SA and many of the chapters, particularly in the first half of the book, have had multiple authors pretty much from the start.)

Another observation about the core sprint group of 5 writers, 1 facilitator, and 1 developer: although the sprint is hosted in Berlin, there are no Germans. However, there are three people living in Berlin (from Ireland, Spain, and New Zealand), two living in New York (one from there, another from Israel), one living in and from Croatia, and me, from Illinois and living in California.

I hope to squeeze in a bit of writing about postnationalism and collaboration today — hat tip to Mushon Zer-Aviv. Also see his day 4 post, and Postnational.org, one of his projects.

Beyond Education

Education has a complicated history, including swings between decentralization, e.g., the loose associations of students and teachers typifying some early European universities such as Oxford, and centralized control by the state or church. It’s easy to imagine that in some of these cases teachers had great freedom to collaborate with each other, or that learning might be a collaboration among students and teacher, while in others, teachers would be told what to teach, and students would learn that, with little opportunity for collaboration.

Our current and unprecedented wealth has brought near universal literacy and enrollment in primary education in many societies, and created impressive research universities and increasing enrollment in university and graduate programs. This apparent success masks that we are in an age of centralized control, driven by standards politically determined at the level of large jurisdictions and a model in which teachers teach how to take tests and both students and teachers are consumers of educational materials created by large publishers. Current educational structures and practices do not take advantage of the possibilities offered by collaboration tools and methods, and in some cases are in opposition to the use of such tools.

Much as the disconnect between the technological ability to access and build upon and the political and economic reality of closed access in scientific publishing created the Open Access (OA) movement, the disconnect between what is possible and what is practiced in education has created collaborative responses.

Open Educational Resources

The Open Educational Resources (OER) movement encourages the availability of educational materials for free use and remixing — including textbooks and also any materials that facilitate learning. As in the case of OA, there is a strong push for materials to be published under liberal Creative Commons licenses and in formats amenable to reuse in order to maximize opportunities for latent collaboration, and in some cases to form the legal and technical basis for collaboration among large institutions.

OpenCourseWare (OCW) is the best known example of a large institutional collaboration in this space. Begun at MIT, OCW has spread to over 200 universities and associated institutions, which publish course content and in many cases translate and reuse material from other OCW programs.

Connexions, hosted by Rice University, is an example of an OER platform facilitating large scale collaborative development and use of granular “course modules” which currently number over 15,000. The Connexions philosophy page is explicit about the role of collaboration in developing OER:

Connexions is an environment for collaboratively developing, freely sharing, and rapidly publishing scholarly content on the Web. Our Content Commons contains educational materials for everyone — from children to college students to professionals — organized in small modules that are easily connected into larger collections or courses. All content is free to use and reuse under the Creative Commons “attribution” license.

Content should be modular and non-linear
Most textbooks are a mass of information in linear format: one topic follows after another. However, our brains are not linear – we learn by making connections between new concepts and things we already know. Connexions mimics this by breaking down content into smaller chunks, called modules, that can be linked together and arranged in different ways. This lets students see the relationships both within and between topics and helps demonstrate that knowledge is naturally interconnected, not isolated into separate classes or books.
Sharing is good
Why re-invent the wheel? When people share their knowledge, they can select from the best ideas to create the most effective learning materials. The knowledge in Connexions can be shared and built upon by all because it is reusable:

  • technologically: we store content in XML, which ensures that it works on multiple computer platforms now and in the future.
  • legally: the Creative Commons open-content licenses make it easy for authors to share their work – allowing others to use and reuse it legally – while still getting recognition and attribution for their efforts.
  • educationally: we encourage authors to write each module to stand on its own so that others can easily use it in different courses and contexts. Connexions also allows instructors to customize content by overlaying their own set of links and annotations. Please take the Connexions Tour and see the many features in Connexions.
Collaboration is encouraged
Just as knowledge is interconnected, people don’t live in a vacuum. Connexions promotes communication between content creators and provides various means of collaboration. Collaboration helps knowledge grow more quickly, advancing the possibilities for new ideas from which we all benefit.

Connexions – Philosophy, CC BY, http://cnx.org/aboutus/

Beyond the institution

OER is not only used in an institutional context — it is especially a boon for self-learning. OCW materials are useful for self-learners, but OCW programs generally do not actively facilitate collaboration with self-learners. A platform like Connexions is more amenable to such collaboration, while wiki-based OER platforms such as Wikiversity and WikiEducator have an even lower barrier to contribution, enabling self-learners (and of course teachers and students in more traditional settings) to collaborate directly on the development and repurposing of educational materials.

Self-learning only goes so far. Why not apply the lessons of collaboration directly to the learning process, helping self-learners help each other? This is what a project called Peer 2 Peer University has set out to do:

The mission of P2PU is to leverage the power of the Internet and social software to enable communities of people to support learning for each other. P2PU combines open educational resources, structured courses, and recognition of knowledge/learning in order to offer high-quality low-cost education opportunities. It is run and governed by volunteers.

Scaling educational collaboration

As in the case of science, delivering the full impact of the possibilities of modern collaboration tools requires more than simply using the tools to create more resources. For the widest adoption, collaboratively created and curated materials must meet state-mandated standards and include accompanying assessment mechanisms.

While educational policy changes may be required, perhaps the best way for open education communities to convince policymakers to make these changes is to develop and adopt even more sophisticated collaboration tools, for example reputation systems for collaborators, quality metrics, collaborative filtering, and other discovery mechanisms for educational materials. One example is “lenses” at Connexions (see http://cnx.org/lenses), which allow one to browse resources specifically endorsed by an organization or individual that one trusts.

Again, similar to science, clearing the external barriers to adoption of collaboration may result in general breakthroughs in collaboration tools and methods.

Collaborative Futures 3

Thursday, January 21st, 2010

Day 3 of the Collaborative Futures book sprint and we’re close to 20,000 words. I added another chapter intended for the “future” section, current draft copied below. It is very much a scattershot survey based on my paying partial attention for several years. There’s nothing remotely new apart from recording a favorite quote from my colleague John Wilbanks that doesn’t seem to have been written down before.

Continuing a tradition, another observation about the sprint group and its discussions: an obsession with attribution. A current draft says attribution is “not only socially acceptable and morally correct, it is also intelligent.” People love talking about this and glomming on all kinds of other issues including participation and identity. I’m counter-obsessed (which Michael Mandiberg pointed out means I’m still obsessed).

Attribution is only interesting to me insofar as it is a side effect (and thus low cost) and adds non-moralistic value. In the ideal case, it is automated, as in the revision histories of wiki articles and version control systems. In the more common case, adding attribution information is a service to the reader — nevermind the author being attributed.

I’m also interested in attribution (and similar) metadata that can easily be copied with a work, making its use closer to automated — Creative Commons provides such metadata if a user choosing a license provides attribution information, and CC license deeds use that metadata to provide copy&pastable attribution HTML, hopefully starting a beneficent cycle.
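As a rough sketch of that cycle, here is the kind of copy&pastable snippet a license deed can generate from a work’s attribution metadata. The property names follow CC REL conventions; the work, author, and URLs are invented for illustration.

```python
# Sketch of generating copy&pastable attribution HTML from attribution
# metadata, roughly as CC license deeds do. The cc:attributionName and
# cc:attributionURL properties are CC REL conventions; the work,
# author, and URLs below are invented.
CC_NS = "http://creativecommons.org/ns#"

def attribution_html(work_url, author_name, license_url, license_name):
    return (
        f'<a xmlns:cc="{CC_NS}" href="{work_url}" '
        f'property="cc:attributionName" rel="cc:attributionURL">'
        f'{author_name}</a> / '
        f'<a rel="license" href="{license_url}">{license_name}</a>'
    )

print(attribution_html(
    "http://example.com/photo",
    "A. N. Author",
    "http://creativecommons.org/licenses/by/3.0/",
    "CC BY 3.0",
))
```

Because the markup is machine-readable, a reuser’s tools can in turn extract the same metadata from the copied snippet, which is the hoped-for cycle.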

Admittedly I’ve also said many times that I think attribution, or rather requiring (or merely providing in the case of public domain content) attribution by link specifically, is an undersold term of the Creative Commons licenses — links are the currency of the web, and this is an easy way to say “please use my work and link to me!”

Mushon Zer-Aviv continues his tradition for day 3 of a funny and observant post, but note that he conflates attribution and licensing, perhaps to make a point:

The people in the room have quite strong feelings about concepts of attribution. What is pretty obvious by now is that both those who elevate the importance of proper crediting to the success of collaboration and those who dismiss it all together are both quite equally obsessed about it. The attribution we chose for the book is CC-BY-SA oh and maybe GPL too… Not sure… Actually, I guess I am not the most attribution obsessed guy in the room.

Science 2.0

Science is a prototypical example of collaboration, from closely coupled collaboration within a lab to the very loosely coupled collaboration of the grand scientific enterprise over centuries. However, science has been slow to adopt modern tools and methods for collaboration. Efforts to adopt or translate new tools and methods have been broadly (and loosely) characterized as “Science 2.0” and “Open Science”, very roughly corresponding to “Web 2.0” and “Open Source”.

Open Access (OA) publishing is an effort to remove a major barrier to distributed collaboration in science — the high price of journal articles, effectively limiting access to researchers affiliated with wealthy institutions. Access to Knowledge (A2K) emphasizes the equality and social justice aspects of opening access to the scientific literature.

The OA movement has met with substantial and increasing success recently. The Directory of Open Access Journals (see http://www.doaj.org) lists 4583 journals as of 2010-01-20. The Public Library of Science’s top journals are in the first tier of publications in their fields. Traditional publishers are investing in OA, such as Springer’s acquisition of large OA publisher BioMed Central, or experimenting with OA, for example Nature Precedings.

In the longer term OA may lead to improving the methods of scientific collaboration, eg peer review, and allowing new forms of meta-collaboration. An early example of the former is PLoS ONE, a rethinking of the journal as an electronic publication without a limitation on the number of articles published and with the addition of user rating and commenting. An example of the latter would be machine analysis and indexing of journal articles, potentially allowing all scientific literature to be treated as a database, and therefore queryable — at least all OA literature. These more sophisticated applications of OA often require not just access, but permission to redistribute and manipulate, thus a rapid movement to publication under a Creative Commons license that permits any use with attribution — a practice followed by both PLoS and BioMed Central.

Scientists have also adopted web tools to enhance collaboration within a working group as well as to facilitate distributed collaboration. Wikis and blogs have been purposed as open lab notebooks under the rubric of “Open Notebook Science”. Connotea is a tagging platform (they call it “reference management”) for scientists. These tools help “scale up” and direct the scientific conversation, as explained by Michael Nielsen:

You can think of blogs as a way of scaling up scientific conversation, so that conversations can become widely distributed in both time and space. Instead of just a few people listening as Terry Tao muses aloud in the hall or the seminar room about the Navier-Stokes equations, why not have a few thousand talented people listen in? Why not enable the most insightful to contribute their insights back?

Stepping back, what tools like blogs, open notebooks and their descendants enable is filtered access to new sources of information, and to new conversation. The net result is a restructuring of expert attention. This is important because expert attention is the ultimate scarce resource in scientific research, and the more efficiently it can be allocated, the faster science can progress.

Michael Nielsen, “Doing science online”, http://michaelnielsen.org/blog/doing-science-online/

OA and adoption of web tools are only the first steps toward utilizing digital networks for scientific collaboration. Science is increasingly computational and data-intensive: access to a completed journal article may not contribute much to allowing other researchers to build upon one’s work — that requires publication of all code and data used to produce the paper. Publishing the entire “research compendium” under appropriate terms (eg usually public domain for data, a free software license for software, and a liberal Creative Commons license for articles and other content) and in open formats has recently been called “reproducible research” — in computational fields, the publication of such a compendium gives other researchers all of the tools they need to build upon one’s work.

Standards are also very important for enabling scientific collaboration, and not just coarse standards like RSS. The Semantic Web and in particular ontologies have sometimes been ridiculed by consumer web developers, but they are necessary for science. How can one treat the world’s scientific literature as a database if it isn’t possible to identify, for example, a specific chemical or gene, and agree on a name for the chemical or gene in question that different programs can use interoperably? The biological sciences have taken a lead in the implementation of semantic technologies, from ontology development and semantic databases to inline web page annotation using RDFa.
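To make the interoperability point concrete, here is a small sketch using the Python rdflib library. The article, the gene identifier, and the ex: vocabulary are invented, standing in for the shared ontology terms that independent programs would agree on.

```python
# Sketch of treating literature as a queryable database with rdflib.
# The article, gene URI, and ex: vocabulary are invented, standing in
# for shared ontology identifiers.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

EX = Namespace("http://example.org/vocab#")
g = Graph()

article = URIRef("http://example.org/articles/42")
gene = URIRef("http://example.org/genes/BRCA1")

g.add((article, DCTERMS.title, Literal("An example open access article")))
g.add((article, EX.mentionsGene, gene))

# Any program that knows the same identifiers can ask the same question:
for row in g.query("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX ex: <http://example.org/vocab#>
    SELECT ?title WHERE {
        ?article ex:mentionsGene <http://example.org/genes/BRCA1> ;
                 dcterms:title ?title .
    }
"""):
    print(row.title)
```

With shared identifiers and a common query language, “the world’s scientific literature as a database” stops being a metaphor.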

Of course all of science, even most of science, isn’t digital. Collaboration may require sharing of physical materials. But just as online stores make shopping easier, digital tools can make sharing of scientific materials easier. One example is the development of standardized Materials Transfer Agreements accompanied by web-based applications and metadata, potentially a vast improvement over the current choice between ad hoc sharing and highly bureaucratized distribution channels.

Somewhere between open science and business (both as in for-profit business and business as usual) is “Open Innovation” which refers to a collection of tools and methods for enabling more collaboration, for example crowdsourcing of research expertise (a company called InnoCentive is a leader here), patent pools, end-user innovation (documented especially by Erik von Hippel in Democratizing Innovation), and wisdom of the crowds methods such as prediction markets.

Reputation is an important question for many forms of collaboration, but particularly in science, where careers are determined primarily by one narrow metric of reputation — publication. If the above phenomena are to reach their full potential, they will have to be aligned with scientific career incentives. This means new reputation systems that take into account, for example, re-use of published data and code, and the impact of granular online contributions, must be developed and adopted.

From the grand scientific enterprise to business enterprise modern collaboration tools hold great promise for increasing the rate of discovery, which sounds prosaic, but may be our best tool for solving our most vexing problems. John Wilbanks, Vice President for Science at Creative Commons often makes the point like this: “We don’t have any idea how to solve cancer, so all we can do is increase the rate of discovery so as to increase the probability we’ll make a breakthrough.”

Science 2.0 also holds great promise for allowing the public to access current science, and even in some cases collaborate with professional researchers. The effort to apply modern collaboration tools to science may even increase the rate of discovery of innovations in collaboration!