Archive for October, 2014

Wikidata II

Thursday, October 30th, 2014


Wikidata went live two years ago, but the II in the title is also a reference to the first page called Wikidata on meta.wikimedia.org, which for years collected ideas for first-class data support in Wikipedia. I had linked to Wikidata I when writing about the most prominent of those ideas, Semantic MediaWiki (SMW), which I later (8 years ago) called the most important software project and said would “turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.”

SMW was and is very interesting and useful on some wikis, but turned out not to be revolutionary (the bigger story is that wikis turned out not to be revolutionary, or only revolutionary on a small scale, except for Wikipedia) and not quite a fit for Wikipedia and its sibling projects. While I’d temper “most” and “universal” now (and should have 8 years ago), the actual Wikidata project (created by many of the same people who created SMW) is rapidly fulfilling general wikidata hopes.

One “improving the encyclopedia” hope that Wikidata will substantially deliver on over the next couple of years, and whose importance I only recently realized, is increasing trans-linguistic collaboration and the availability of the sum of knowledge in many languages. When facts are embedded in free text, adding, correcting, and making facts available happens one language at a time. When facts about a topic are in Wikidata, they can be exposed in every language so long as labels are translated, even for topics about which nothing has ever been written in, nor translated into, many languages. Reasonator is a great demonstrator.
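To make that concrete, here is a minimal sketch (my illustration, not anything prescribed by Wikidata or this post) that asks the public wbgetentities API for a single item’s labels in several languages. The facts attached to the item are language-independent; only the labels need translating for them to surface in another language. Q42 (Douglas Adams) is used purely as a familiar example item, and the language list is arbitrary.

```python
# Sketch: fetch one Wikidata item's labels in several languages.
# The item's statements (facts) are language-neutral; exposing them in a
# given language only requires that labels exist in that language.
import requests

ITEM = "Q42"  # example item (Douglas Adams); swap in any Q-id

resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetentities",
        "ids": ITEM,
        "props": "labels",
        "languages": "en|de|sw|hi",  # English, German, Swahili, Hindi
        "format": "json",
    },
    timeout=10,
)
resp.raise_for_status()

# Languages without a translated label simply won't appear in the result.
for lang, label in resp.json()["entities"][ITEM]["labels"].items():
    print(lang, label["value"])
```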

Happy 2nd to all Wikidatians and Wikidata, by far the most important project for realizing Wikimedia’s vision. You can and should edit the data and edit and translate the schema. Browse Wikidata WikiProjects to find others working to describe topics of interest to you. I imagine some readers of this blog might be interested in WikiProjects Source MetaData (for citations) and Structured Data for Commons (the media repository).

For folks concerned about intellectual parasites, Wikidata has done the right thing — all data dedicated to the public domain with CC0.

Non-citizens should decide elections

Monday, October 27th, 2014

Do non-citizens vote in U.S. elections? (tax-funded research, but $19.95 to read; how can that be good for democratic discourse?) and a Washington Post post by two of the paper’s authors, Could non-citizens decide the November election? Yes and yes — assuming the pertinent elections are very close and we take citizen votes as a given. Most interesting:

Unlike other populations, including naturalized citizens, education is not associated with higher participation among non-citizens. In 2008, non-citizens with less than a college degree were significantly more likely to cast a validated vote, and no non-citizens with a college degree or higher cast a validated vote. This hints at a link between non-citizen voting and lack of awareness about legal barriers.

The authors suggest raising awareness of legal barriers might further reduce non-citizen voting. But non-citizen voting is not the problem that ought be addressed. Instead the problem is non-voting by educated non-citizens, whose input is lost. If we can begin to disentangle nationalism and democracy, clearly the former ought be discarded (it is after all the modern distillation of the worst tendencies of humanity) and franchise further expanded — a win whether treating democracy as a collective intelligence system (more diverse, more disinterested input) or as a collective representation/legitimacy system (non-citizens are also taxed, regulated, and killed).

Further expanding the franchise presents challenges (I went over some of them previously in a post on extra-jurisdictional voting), but so does enforcing the status quo. Anyone not in the grip of nationalism, or anyone with a commitment to democracy, ought want to meet the challenges faced by an expanded franchise, not help enforce the status quo, even by means of “soft” informational campaigns.

Ubuntu Ten

Thursday, October 23rd, 2014

Retrospective on 10 years of Ubuntu (LWN discussion). I ran Ubuntu on my main computer from 2005 to 2011. I was happy to see Ubuntu become a “juggernaut” and, like many, I hoped for it to become mainstream, with major vendor preinstallation as the main indicator. The high point for me, which I seem never to have blogged about, was the 2007 purchase of a very well priced Dell 1420n with Ubuntu preinstalled.

But the juggernaut stalled at the top of the desktop GNU/Linux distribution heap, which isn’t very high. Although people have had various complaints about Ubuntu and Canonical Ltd., as I’ve written before, my overriding disappointment is that they haven’t been much more successful. There are a couple of tiny vendors that focus exclusively or primarily on shipping Ubuntu-preinstalled consumer hardware, and Dell or another major vendor occasionally offers something — Dell has had a developer-edition Ubuntu preinstall for a couple of years, usually substantially out of date, as the current offering is now.

Canonical seems to have followed Red Hat and others in largely becoming an enterprise/cloud servicing company, though apparently they’re still working on an Ubuntu flavor for mobile devices (and I haven’t followed, but I imagine that Red Hat still does some valuable engineering for the desktop). I wish both companies ever more success in these ventures — more huge companies doing only or almost only open source are badly needed, even imperfect ones.

For Ubuntu fans, this seems like a fine time to ask why it hasn’t been even more successful. Why hasn’t it achieved consistent and competitive mainstream vendor distribution? How much blame, if any, can be laid at Canonical’s stumbles with respect to free/open source software? It seems to me that a number of Canonical products would have been much more likely to become dominant had they been open source from the beginning (Launchpad, Ubuntu One) or not required a Contributor License Agreement (bzr, Upstart, Mir), that Canonical would not have alienated a portion of the free/open source software community, and that the world would overall be a better place had most of those products won — the categories of the first two remain dominated by proprietary services, and the latter three might have gained widespread adoption sooner than the things that eventually did or probably will win (git, systemd, Wayland).

But taking a step back, it’s really hard to see how these stumbles (that’s again from an outsider free/open source perspective; maybe they are still seen as having been the right moves at the time inside Canonical; I just don’t know) might have contributed in a major way to the lack of mainstream success. Had the stumbles been avoided, perhaps some engineering resources would have been better allocated or increased, but unless reallocated with perfect hindsight as to what the technical obstacles to mainstream adoption were — an impossibility — I doubt they made much of a difference. What about alienation of a portion of the free/open source community? Conceivably, had they (we) been more enthusiastic, more consumer lobbying and demand for Ubuntu preinstalls would have occurred and tipped a balance — but that seems like wishful thinking, requiring a level of perfect organizing of GNU/Linux fan consumer demand that nobody has achieved. I’d love to believe that had Canonical stuck closer to a pure free/open source software path, it would have achieved greater mainstream success, but I just don’t see much of a causal link. What are the more likely causes? I’d love to read an informed analysis.

For Ubuntu detractors, this seems like a fine time to ask why Ubuntu has been a juggernaut relative to your preferred GNU/Linux distribution. If you’re angry at Canonical, I suggest your anger is misdirected — you should be angry instead that your preferred distribution hasn’t managed to do marketing and distribution as well as it needed to, on its own terms — and figure out why that is. Better yet, form and execute on a plan to achieve the mainstream success that Ubuntu hasn’t. Otherwise in all likelihood it’s an Android and ChromeOS (and huge Windows legacy, with some Apple stuff in between) world for a long time to come. I’d love to read a feasible plan!

Global Columbus/Columbia/Colombia/Colón rename

Monday, October 13th, 2014

Today is not World IP Day (nor that one), which is August 9. Some celebrate today as IP Day as a counter to Columbus Day, but that puts too much emphasis on Columbus as a singular actor/great man/abominable person with respect to all pre-1492 western hemisphere populations. Had Columbus never been born, or had his first voyage been swallowed by the Atlantic before reaching land, it is hard to imagine the result for those populations over the following centuries being any different — merciless conquest by eastern hemisphere humans and microbes. So celebrate World IP Day on August 9, and rub out Columbus Day because Columbus was a murderer and slaver. Criminal procedures, not clash-of-civilizations ones, are to be recommended for Columbus, as they are for any modern-day terrorist.

What should this city and county in Ohio be renamed to? Photo: Derek Jensen, Public Domain

But one day is only a beginning. Wikipedia:

Veneration of Columbus in America dates back to colonial times. The name Columbia for “America” first appeared in a 1738 weekly publication of the debates of the British Parliament. The use of Columbus as a founding figure of New World nations and the use of the word “Columbia”, or simply the name “Columbus”, spread rapidly after the American Revolution. Columbus’ name was given to the federal capital of the United States (District of Columbia), the capital cities of two U.S. states (Ohio and South Carolina), and the Columbia River. Outside the United States the name was used in 1819 for the Gran Colombia, a precursor of the modern Republic of Colombia. Numerous cities, towns, counties, streets, and plazas (called Plaza Colón or Plaza de Colón throughout Latin America and Spain) have been named after him. A candidate for sainthood in the Catholic Church in 1866, celebration of Columbus’ legacy perhaps reached a zenith in 1892 with the 400th anniversary of his first arrival in the Americas. Monuments to Columbus like the Columbian Exposition in Chicago and Columbus Circle in New York City were erected throughout the United States and Latin America extolling him.

There should be no veneration or extolling of slave owners. Everything named after Columbus (fun fact: Colombo, the city in Sri Lanka, is not) should be renamed, along with everything named after Washington, Jefferson, Madison, Monroe, Jackson, Van Buren, Harrison, Tyler, Polk, Taylor, Pierce, Buchanan, Johnson, Grant, Penn, Franklin, Houston, Austin, and many more.

wiki↔journal

Thursday, October 2nd, 2014

The first wiki[pedia]2journal article has been published: the Dengue fever Wikipedia article, peer-reviewed version (PDF). The accompanying editorial is Modern medicine comes online: How putting Wikipedia articles through a medical journal’s traditional process can put free, reliable information into as many hands as possible (emphasis added):

As a source of clinical information, how does Wikipedia differ from UpToDate or, for that matter, a textbook or scholarly journal? Wikipedia lacks three main things. First, a single responsible author, typically with a recognized academic affiliation, who acts as guarantor of the integrity of the work. Second, the careful eye of a trained editorial team, attuned to publication ethics, who ensure consistency and accuracy through the many iterations of an article from submission to publication. Third, formal peer review by at least one, and often many, experts who point out conflicts, errors, redundancies, or gaps. These form an accepted ground from which publication decisions can be made with confidence.

In this issue of Open Medicine, we are pleased to publish the first formally peer-reviewed and edited Wikipedia article. The clinical topic is dengue fever. It has been submitted by the author who has made the most changes, and who has designated 3 others who contributed most meaningfully. It has been peer reviewed by international experts in infectious disease, and by a series of editors at Open Medicine. It has been copy-edited and proofread; once published, it will be indexed in MEDLINE. Although by the time this editorial is read the Wikipedia article will have changed many times, there will be a link on the Wikipedia page that can take the viewer back to the peer-reviewed and published piece on the Open Medicine website. In a year’s time, the most responsible author will submit the changed piece to an indexed journal, so it can move through the same editorial process and continue to function as a valid, reliable, and evolving free and complete reference for everyone in the world. Although there may be a need for shorter, more focused clinical articles published elsewhere as this one expands, it is anticipated that the Wikipedia page on dengue will be a reference against which all others can be compared. While it might be decades before we see an end to dengue, perhaps the time and money saved on exhaustive, expensive, and redundant searches about what yet needs to be done will let us see that end sooner.

I love that this is taking Wikipedia and commons-based peer production into a challenging product area, which if wildly successful, could directly challenge and ultimately destroy the proprietary competition. The editorial notes:

Some institutions pay UpToDate hundreds of thousands of dollars per year for that sense of security. This has allowed Wolters Kluwer, the owners of UpToDate, to accrue annual revenues of hundreds of millions of dollars and to forecast continued double-digit growth as “market conditions for print journals and books … remain soft.”

See the WikiProject Medicine collaborative publication page for more background on the process and future developments. Note that at least 7 articles have been published in journal2wiki[pedia] fashion; see PLOS Computational Biology and the corresponding Wikipedia articles. Ideally these two methods would converge on wiki↔journal, as the emphasized portion of the quote above seems to indicate.

Peer review of Wikipedia articles and publication in another venue could, in theory, minimize dependencies and maximize mutual benefit between expert authoring (which has historically failed in the wiki context; see Nupedia and Citizendium) and mass collaboration (see the challenges noted by the editorial above). But one such article only demonstrates the concept; we’ll see whether it becomes an important method, let alone a market-dominating one.


One small but embarrassing obstacle to wiki↔journal is license incompatibility. PLOS journals use CC-BY-4.0 (donor-only relative to the following; the version isn’t important for this one), Wikipedia uses CC-BY-SA-3.0 (recipient-only relative to the previous…and the following), and Open Medicine uses CC-BY-SA-2.5-Canada (donor-only relative to the immediately previous) — meaning that unless all contributors to the Dengue fever Wikipedia article signed off, the journal version is technically not in compliance with the upstream license. Clearly nobody should care about this second issue, except for license stewards, who should mitigate the problem going forward: all previous versions of CC-BY-SA (2.0 or greater, due to the lack of a “later versions” provision in 1.0) should be added to CC-BY-SA-4.0’s compatibility list, allowing contributions to go both ways. The first issue (CC-BY-4.0 being donor-only) unfortunately cannot be addressed within the framework of the current licenses (bidirectional use could be avoided, or contributors could all sign off, either of which would be outside the license framework).

Daniel Mietchen (who is a contributor to the aforementioned journal2wiki effort, and to just about everything else relating to Wikipedia and Open Access) has posted another version of his proposal to open up research funding proposals at the Knight News Challenge: Libraries site. Applaud and comment there if you like, as I do (endorsement of previous version).

Near the beginning of the above editorial:

New evidence pours in to the tune of 12 systematic reviews per day, and accumulating the information and then deciding how to incorporate it into one’s practice is an almost impossible task. A study published in BMJ showed that if one hoped to take account of all that has been published in the relatively small discipline of echocardiography, it would take 5 years of constant reading—by which point the reader would be a year behind.

A similar avalanche of publishing can be found in any academic discipline. It is conceivable that copyright helps, by providing an incentive for services like UpToDate. My guess is that it gets in the way, both by propping up arrangements oriented toward pumping out individual articles, and by putting up barriers (the public license incompatibility mentioned above is inconsequential compared to the paywalled, unmitigated-copyright, and/or PDF-only case which dominates) to collaborative — human and machine — distillation of the states of the art. As I wrote about entertainment, do not pay copyright holders, for a good future.