Recently I’ve uncritically cheered for Wikidata as “rapidly fulfilling” hopes to “turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.” In April I uncritically cheered for Daniel Mietchen’s open proposal for research on opening research proposals.

Let’s combine the two: an open proposal for work toward establishing Wikidata (including its community, data, ontologies, practices, software, and external tools) as a “collaborative hub around research data” responding to a European Commission call on e-infrastructures. That would be Wikidata for Research (WD4R), instigated by Mietchen, who has already assembled an impressive set of partner institutions and an outline of work packages. The proposal is being drafted in public (you can help) and will be submitted January 14.


The proposal will be strong on its own merits, and very well aligned with the stated desired outcomes from the EC call, and the open proposal dogfood angle is also great. I added “for all” to this post’s title because I suspect WD4R will be great for pushing Wikidata toward realizing the aforementioned “universal database” hopes (which again means not just the data, but community, tools, etc.; “virtual research environment” is one catch-all term) and will make Wikidata much more useful for “research” most broadly construed (e.g., by students, journalists, knowledge workers, anyone), potentially much faster than would happen otherwise.

My suspicion has two bases (please correct me if I’m wrong about either):

  1. A database or virtual environment “for research” might give the impression of someplace to dump data from experiments, or to perform them. Maybe that would be appropriate for Wikidata in some instances, but the overwhelming research-supporting use would seem to be mass collaboration in consolidating, annotating, and correcting data and ontologies which many researchers (and researchers-broadly-construed, everyone) can benefit from, either querying or referencing directly, or extracting and using elsewhere. The pre-existing Gene Wiki project, which is beginning to use Wikidata, is an example of such useful-to-all collections (as referenced in the WD4R pages).
  2. One of the proposed work packages is to identify and work on features needed for research but not on, or not prioritized on, the Wikidata development plan. I suspect other Wikimedia projects can tremendously benefit from Wikidata integration without Wikidata itself or external tools supporting complex queries and reporting that would be called for by a virtual research environment — and also called for to realize “universal database” hopes. Wikidata’s existing plan looks good to me; here I’m just saying WD4R might help it be even better, faster.

The previously linked Gene Wiki post includes:

For more than a decade many different groups have proposed and many have implemented solutions to this challenge using standards and techniques from the Semantic Web. Yet, today, the vast majority of biological data is still accessed from individual databases such as Entrez Gene that make no attempt to use any component of the Semantic Web or to otherwise participate in the Linked Open Data movement. With a few notable exceptions, the data silos have only gotten larger and problems of fragmentation worse.
Now, we are working to see if Wikidata can be the bridge between the open community-driven power of Wikipedia and the structured world of semantic data integration. Can the presence of that edit button on a centralized knowledge base associated with Wikipedia help the semantic web break through into everyday use within our community?

I agree that massive centralized commons-oriented resources are needed for decentralization to progress (link analogous but not the same — linked open data : federation :: data silos : messaging silos).

Check out Mietchen’s latest WD4R blog post and the WD4R project page.
