Post Creative Commons

Web Data Common[s] Crawl Attribution Metadata

Monday, January 23rd, 2012

Via I see Web Data Commons which has “extracted structured data out of 1% of the currently available Common Crawl corpus dating October 2010”. WDC publishes the extracted data as N-Quads (the fourth item denotes the immediate provenance of each subject/predicate/object triple: the URL the triple was extracted from).
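To make the four-part structure concrete, here is a minimal Python sketch that splits one N-Quads line into subject, predicate, object, and provenance. The regular expression is a simplification rather than a full N-Quads grammar, and the example line with its example.com URLs is made up for illustration; it is not taken from the WDC data.

```python
import re

# One N-Quads term: an IRI, a blank node, or a literal with an optional
# language tag or datatype. Simplified; not the complete N-Quads grammar.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
QUAD = re.compile(
    r'^\s*' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'\s*\.\s*$'
)


def parse_quad(line):
    """Return (subject, predicate, object, provenance) or None if no match."""
    match = QUAD.match(line)
    return match.groups() if match else None


# Hypothetical line, shaped like the records in the RDFa extract.
example = (
    '<http://example.com/photo> '
    '<http://creativecommons.org/ns#attributionName> '
    '"Example Author" '
    '<http://example.com/page.html> .'
)

subject, predicate, obj, provenance = parse_quad(example)
print(provenance)  # the page the triple was extracted from
```

Lines that do not fit the pattern return None, which loosely mirrors how the triplestore import used below rejects records with syntax errors.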

I thought it would be easy and fun to run some queries on the WDC dataset to get an idea of how annotations associated with Creative Commons licensing are used. Notes below on exactly what I did. The biggest limitation is that the license statement itself is not part of the dataset — not as xhv:license in the RDFa portion, and for some reason rel=license microformat has zero records. But cc:attributionName, cc:attributionURL, and cc:morePermissions are present in the RDFa part, as are some Dublin Core properties that the Creative Commons license chooser asks for (I only looked at dc:source) but are probably widely used in other contexts as well.

Dataset | URLs | Distinct objects
Common Crawl 2010 corpus | 5,000,000,000 (a) |
1% sampled by WDC | ~50,000,000 |
with RDFa | 158,184 (b) |
with a cc: property | 26,245 (c) |
cc:attributionName | 24,942 (d) | 990 (e)
cc:attributionURL | 25,082 (f) | 3,392 (g)
dc:source | 7,235 (h) | 574 (i)
cc:morePermissions | 4,791 (j) | 253 (k)
cc:attributionURL = dc:source | 5,421 (l) |
cc:attributionURL = cc:morePermissions | 1,880 (m) |
cc:attributionURL = subject | 203 (n) |

Some quick takeaways:

  • Low ratio of distinct attributionURLs probably indicates HTML from license chooser deployed without any parameterization. Often the subject or current page will be the most useful attributionURL (but 203 above would probably be much higher with canonicalization). Note all of the CC licenses require that such a URL refer to the copyright notice or licensing information for the Work. Unless one has set up a site-wide license notice somewhere, a static URL is probably not the right thing to request in terms of requiring licensees to provide an attribution link; nor is a non-specific attribution link as useful to readers as a direct link to the work in question. As (and if) support for attribution metadata gets built into Creative Commons-aware CMSes, the ratio of distinct attributionURLs ought to increase.
  • 79% of subjects with both dc:source and cc:attributionURL (6,836o) have the same values for both properties. This probably means people are merely entering their URL into every form field requesting a URL without thinking, not self-remixing.
  • 47% of subjects with both cc:morePermissions and cc:attributionURL (3,977p) have the same values for both properties. Unclear why this ratio is so much lower than the previous one; it ought to be higher, as the same value for both often makes sense. Unsurprising that cc:morePermissions is the least provided property; in my experience few people understand it.
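The percentages in these takeaways, and the distinct-value ratio behind the first one, can be re-derived from the counts in the table. A small Python check follows, using only numbers reported in this post; the variable and function names are mine.

```python
# Counts copied from the table above (letters refer to the notes below).
attribution_url_total = 25082     # f
attribution_url_distinct = 3392   # g
both_source = 6836                # o: subjects with dc:source and attributionURL
same_source = 5421                # l: ...where both have the same value
both_more = 3977                  # p: subjects with morePermissions and attributionURL
same_more = 1880                  # m: ...where both have the same value


def percent(part, whole, digits=0):
    """Share of `part` in `whole`, as a rounded percentage."""
    return round(100 * part / whole, digits)


source_pct = percent(same_source, both_source)   # the 79% figure
more_pct = percent(same_more, both_more)         # the 47% figure
distinct_pct = percent(attribution_url_distinct, attribution_url_total, 1)

print(source_pct, more_pct, distinct_pct)
```

The last figure says roughly one in seven attributionURL values is distinct, which is the "low ratio" the first takeaway refers to.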

I did not look at the provenance item at all. It’d be interesting to see what kind of assertions are being made across authority boundaries (e.g., a page on example.com makes statements with an example.net URI as the subject) and when to discard such assertions. I barely looked directly at the raw data, just enough to sense that my aggregate numbers could be reasonably accurate. A closer inspection of smaller samples might yield additional insights, informing other aggregate queries.
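A check for assertions across authority boundaries could start from something like the following Python sketch. It is only an illustration, not something I ran on the dataset: it treats the lowercase hostname as the "authority", which is an assumption (a real analysis might want to group subdomains or compare registered domains), and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def authority(iri):
    """Lowercase hostname of an IRI, or '' if it has none (e.g. a blank node)."""
    return (urlsplit(iri).hostname or "").lower()


def crosses_authority(subject_iri, provenance_iri):
    """True when a page makes a statement about a subject on another host."""
    subject_host = authority(subject_iri)
    page_host = authority(provenance_iri)
    return bool(subject_host) and bool(page_host) and subject_host != page_host


# A page on example.com describing an example.net subject crosses a boundary.
print(crosses_authority("http://example.net/work", "http://example.com/page.html"))
# A page describing itself does not.
print(crosses_authority("http://example.com/page.html", "http://example.com/page.html"))
```

Counting how many quads satisfy crosses_authority, grouped by predicate, would be one way to see which kinds of cross-site statements are common before deciding which to discard.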

I look forward to future extracts. Thanks indirectly to Common Crawl for providing the crawl!

Please point out any egregious mistakes made below…

# a I don't really know if the October 2010 corpus is the
# entire 5 billion Common Crawl corpus

# download RDFa extract from Web Data Commons
wget -c https://s3.amazonaws.com/ccrdf1p/data/ccrdf.html-rdfa.nq

# Matches number stated at
# http://page.mi.fu-berlin.de/muehleis/ccrdf/stats1p.html#html-rdfa
wc -l ccrdf.html-rdfa.nq
1047250

# Includes easy to use no-server triplestore
apt-get install redland-utils

# sanity check
grep '<http://creativecommons.org/ns#attributionName>' ccrdf.html-rdfa.nq |wc -l
26404 

# Import rejects a number of triples for syntax errors
rdfproc xyz parse ccrdf.html-rdfa.nq nquads

# d Perhaps syntax errors explains fewer triples than above grep might
# indicate, but close enough
rdfproc xyz query sparql - 'select ?o where { ?s <http://creativecommons.org/ns#attributionName> ?o}' |wc -l
24942

# These replicated below with 4store because...
rdfproc xyz query sparql - 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionName> ?o}' |wc -l
990
rdfproc xyz query sparql - 'select ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o}' |wc -l
25082
rdfproc xyz query sparql - 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o}' |wc -l
3392
rdfproc xyz query sparql - 'select ?o where { ?o <http://creativecommons.org/ns#attributionURL> ?o }' |wc -l
203
rdfproc xyz query sparql - 'select ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o}' |wc -l
4791
rdfproc xyz query sparql - 'select distinct ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o}' |wc -l
253
rdfproc xyz query sparql - 'select ?o where { ?o <http://creativecommons.org/ns#morePermissions> ?o }' |wc -l
12

# ...this query takes forever, hours, and I have no idea why
rdfproc xyz query sparql - 'select ?s, ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o ; <http://creativecommons.org/ns#attributionURL> ?o }'

# 4store has a server, but is lightweight
apt-get install 4store

# 4store couldn't import with syntax errors, so export good triples from
# previous store first
rdfproc xyz serialize > ccrdf.export-rdfa.rdf

# import into 4store
curl -T ccrdf.export-rdfa.rdf 'http://localhost:8080/data/wdc'

# egrep is to get rid of headers and status output prefixed by ? or #
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionName> ?o}' |egrep -v '^[\?\#]' |wc -l
24942

#f
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o}' |egrep -v '^[\?\#]' |wc -l
25082

#j
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o}' |egrep -v '^[\?\#]' |wc -l
4791

#h
#Of course please use http://purl.org/dc/terms/source instead.
#Should be more widely deployed soon.
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://purl.org/dc/elements/1.1/source> ?o}' |egrep -v '^[\?\#]' |wc -l
7235

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://purl.org/dc/terms/source> ?o}' |egrep -v '^[\?\#]' |wc -l
4


#e
4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionName> ?o}' |egrep -v '^[\?\#]' |wc -l
990

#g
4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o}' |egrep -v '^[\?\#]' |wc -l
3392

#k
4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#morePermissions> ?o}' |egrep -v '^[\?\#]' |wc -l
253

#i
4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://purl.org/dc/elements/1.1/source> ?o}' |egrep -v '^[\?\#]' |wc -l
574

#n
4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://creativecommons.org/ns#attributionURL> ?o}' |egrep -v '^[\?\#]' |wc -l
203

4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://creativecommons.org/ns#morePermissions> ?o}' |egrep -v '^[\?\#]' |wc -l
12

4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://purl.org/dc/elements/1.1/source> ?o}' |egrep -v '^[\?\#]' |wc -l
120

#m
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://creativecommons.org/ns#morePermissions> ?o }' |egrep -v '^[\?\#]' |wc -l
1880

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://creativecommons.org/ns#morePermissions> ?o }' |egrep -v '^[\?\#]' |wc -l
122

4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://creativecommons.org/ns#attributionURL> ?o ; <http://creativecommons.org/ns#morePermissions> ?o }' |egrep -v '^[\?\#]' |wc -l
8

#l
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?o }' |egrep -v '^[\?\#]' |wc -l
5421

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?o }' |egrep -v '^[\?\#]' |wc -l
358

4s-query wdc -s '-1' -f text 'select ?o where { ?o <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?o }' |egrep -v '^[\?\#]' |wc -l
11

#p
4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://creativecommons.org/ns#morePermissions> ?n }' |egrep -v '^[\?\#]' |wc -l
3977

#o
4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?n }' |egrep -v '^[\?\#]' |wc -l
6836

4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n, ?m where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?n ; <http://creativecommons.org/ns#morePermissions> ?m }' |egrep -v '^[\?\#]' |wc -l
2946
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <http://creativecommons.org/ns#attributionURL> ?o ; <http://purl.org/dc/elements/1.1/source> ?o ; <http://creativecommons.org/ns#morePermissions> ?o }' |egrep -v '^[\?\#]' |wc -l
1604

#c
4s-query wdc -s '-1' -f text 'select distinct ?s where { { ?s <http://creativecommons.org/ns#attributionURL> ?o } UNION { ?s <http://creativecommons.org/ns#attributionName> ?n } UNION { ?s <http://creativecommons.org/ns#morePermissions> ?m }  }' |egrep -v '^[\?\#]' |wc -l
26245

4s-query wdc -s '-1' -f text 'select distinct ?s where { { ?s <http://creativecommons.org/ns#attributionURL> ?o } UNION { ?s <http://creativecommons.org/ns#attributionName> ?n }}' |egrep -v '^[\?\#]' |wc -l
25433


#b note subjects not the same as pages data extracted from (158,184)
4s-query wdc -s '-1' -f text 'select distinct ?s where { ?s ?p ?o }'  |egrep -v '^[\?\#]' |wc -l
264307

# Probably less than 1047250 claimed due to syntax errors
4s-query wdc -s '-1' -f text 'select ?s where { ?s ?p ?o }'  |egrep -v '^[\?\#]' |wc -l
968786

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?p ?s }'  |egrep -v '^[\?\#]' |wc -l
2415

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?s ?s }'  |egrep -v '^[\?\#]' |wc -l
0

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?s ?o }'  |egrep -v '^[\?\#]' |wc -l
0

SOPA/PIPA protests on-message or artless?

Wednesday, January 18th, 2012

Go Internet! Instantly message the U.S. Congress! (Tell them to kill the so-called Research Works Act too!)

Another, much bigger, tiresome rearguard action. I’m impressed by protesters’ nearly universal and exclusive focus on encouraging readers to contact U.S. Congresspeople. I hope it works. SOPA and PIPA really, really deserve to die.

But the protest also bums me out.

1) Self-censorship (in the case of sites completely blacked out, as opposed to those prominently displaying anti-SOPA messages) is not the Internet at its best. If that claim weren’t totally ridiculous, the net wouldn’t be worth defending. It isn’t even the net at its political best — that would be creating systems which disrupt and obviate power — long term offensives, not short-term defenses.

2) Near exclusive focus on supplication before 535 [Update: 536] ultra-powerful individuals is kinda disgusting. But it needs to be done, as effectively as possible.

3) I haven’t looked at a huge number of sites, but I haven’t seen much creativity in the protest. Next time it would be fun to see an appropriate site (Wikipedia? Internet Archive?) take what Flickr has done and add bidding for the “right” to darken particular articles or media as a fundraiser. Art would be nice too — I’d love to hear about anything really great (and preferably libre) from this round.

4) While some prominent bloggers have made the point that “piracy” is not a legitimate problem, overwhelmingly the protest has stuck to defense — SOPA and PIPA would do bad things to the net, and wouldn’t “work” anyway. Google goes much further, saying “End Piracy, Not Liberty” and “Fighting online piracy is important.” Not possible, wrong, and gives away the farm.

5) Nobody making the point that everyone can help with long-term offensives which will ultimately stop ratcheting protectionism, if it is to be stopped. Well, this nobody has attempted:

[I]magine a world in which most software and culture are free as in freedom. Software, culture, and innovation would be abundant, there would be plenty of money in it (just not based on threat of censorship), and there would be no constituency for attacking the Internet. (Well, apart from dictatorships and militarized law enforcement of supposed democracies; that’s a fight intertwined with SOPA, but those aren’t the primary constituencies for the bill.) Now, world domination/liberation by free software and culture isn’t feasible now. But every little bit helps reduce the constituency that wishes to attack the Internet to possibly protect their censorship-based revenue streams, and to increase the constituency whose desire to protect the Internet is perfectly aligned with their business interests and personal expression.

I’d hope that at least some messages tested convey not only the threat SOPA poses to Wikimedia, but the long-term threat the Wikimedia movement poses to censorship.

Also:

Bad legislation needs to be stopped now, but over the long term, we won’t stop getting new bad legislation until policymakers see broad support and amazing results from culture and other forms of knowledge that work with the Internet, rather than against it. Each work or project released under a CC license signals such support, and is an input for such results.

And:

Finally, remember that CC is crucial to keeping the Internet non-broken in the long term. The more free culture is, the less culture has an allergy to and deathwish for the Internet.

Of the five items I list above, the first three are admittedly peevish. Four and five represent not so much problems with the current protest as they do severe deficiencies in movements for intellectual freedom. Actually they are flipsides of the same deficiency: lack of compelling explanation that intellectual freedom, however constructed and protected, really matters, really works, and is really for the good. If such were well enough researched and explained so as to become conventional wisdom, rather than contentious and seemingly radical, net freedom activists could act much more proactively, provocatively, and powerfully, rather than as they do today: with supplication and genuflection.

I am not at all well read, but my weak understanding is that the withdrawal of economists from studying intellectual protectionism in the late 1800s was a great tragedy. To begin to encourage rectification of that century plus of relative neglect, today is a good day to start reading Against Intellectual Monopoly.

In the meantime, the actual and optimal counterfactual drift further apart, without any help from SOPA and PIPA.

MLK’s reliance on “remix” is well-documented; without a strong public domain, where will that leave the next MLK?

Monday, January 16th, 2012

I copied and slightly reworded the title of this post from Joshua Judson Rosen; the body draws heavily from a conversation started by Rosen. Today is Martin Luther King Jr. Day.

People have noted for years that the King estate does their best to lock up and profit from his works. I even had a post that touched on this indirectly in 2004 (it appears that since then Eyes on the Prize has been re-aired and DVDs sold, result of an $850,000 grant to acquire the necessary licenses). But the King estate is simply doing what most heirs would do with an uninsured creative legacy. If societal governance of the knowledge commons were anything close to reasonable, all King’s works would now be in the public domain.

Perhaps ironically (but only if one cannot distinguish between King and his estate, and between citation and copyright restrictions), in his academic writing King was a very poor provider of intellectual provenance — in that context, he plagiarized:

I might conclude that none of this was fatal for King’s career as a preacher and powerful public speaker. Had he pursued an academic career, his heavy reliance on the authorities, often without citing them, could have been fatal. But in preaching, perhaps even in most public speech, genuine originality is more often fatal. A congregation, even a public audience, expects to hear and responds to the word once delivered to the fathers [and mothers]. It is the familiar that resonates with us. The original sounds alien and tends to alienate. The familiar, especially the familiar that appeals to the best in us, is what we long to hear. So, “I Have A Dream” was no new vision; it was a recension, quite literally, of his own “An American Dream.” And that dream, as we know, already had a long history. King’s vision was, perhaps, more inclusive than earlier dreams, but it appealed to us because we already believed it.

Indeed, far more interesting is the ubiquity of borrowing in King’s profession. On preachers borrowing liberally from each other and any other available source, listen to this week’s installment of WNYC On the Media, Dr. Martin Luther King Jr. and the Public Imagination (about 15 minutes).

I did not know this about sermons, but upon hearing, it is completely unsurprising. But now I have questions:

  • Do preachers now continue to borrow as heavily and as liberally as they did in King’s day and before? What about public speakers generally?
  • Should preaching be added to magic, fashion, food, and comedy as examples of professions relying heavily on borrowing, and not so much on censorship?
  • The development of King’s speeches, and of preachers’ sermons* generally, highlights that in some contexts borrowing without citation is valuable, never mind that it would be called plagiarism in other contexts. Should schools teach how to be a great artist in some classes? Doing so might help their anti-plagiarism rhetoric sink in better, as it would then appear contextually appropriate, rather than fanatic.


* Daniel Dennett approvingly says that TED talks are secular sermons, pinpointing another reason I find them annoying (for being sermons, not for being secular). But I don’t want to censor any sermons.

Life in the kind of bleak future of HTML data

Thursday, January 12th, 2012

Evan Prodromou wrote in 2006:

I think that if microformats.org and the RDFa effort continue moving forward without coordinating their effort, the future looks kind of bleak.

I blogged about this at the time (and forgot and reblogged five months later). I recalled this upon reading a draft HTML Data Guide announced today, and trying to think of a tl;dr summary to at least microblog.

That’s difficult. The guide is intended to help publishers and consumers of HTML data choose among three syntaxes (all mostly focused on annotating data inline with HTML meant for display) and a variety of vocabularies, with heavy dependencies between the two. Since 2006, people working on microformats and RDFa have done much to address the faults of those specifications — microformats-2 allows for generic (rather than per-format) parsing, and RDFa 1.1 made some changes to make namespaces less needed, less ugly when needed, and usable in HTML5, and specifies a lite subset. In 2009 a third syntax/model, microdata, was launched, and then in 2011 chosen as the syntax for schema.org (which subsequently announced it would also support RDFa 1.1 Lite).

I find the added existence of microdata and schema.org suboptimal (optimal might be something like microformats process for some super widely useful vocabularies, with a relatively simple syntax but permitting generic parsing and distributed extensibility; very much like what Prodromou wanted in 2006), but when is anything optimal? I also wonder how much credit microdata ought get for microformats-2 and RDFa 1.1, due to providing competitive pressure? And schema.org for invigorating metadata-enhanced web-scale search and vocabulary design (for example, the last related thing I was involved in, at the beginning anyway)?

Hope springs eternal for getting these different but overlapping technologies and communities to play well together. I haven’t followed closely in a long time, but I gather that Jeni Tennison is one of the main people working on that, and you should really subscribe to her blog if you care. That leaves us back at the HTML Data Guide, of which Tennison is the editor.

My not-really-a-summary:

  1. Delay making any decisions about HTML data; you probably don’t want it anyway (metadata is usually a cost center), and things will probably be more clear when you’re forced to check back due to…
  2. If someone wants data from you as annotated HTML, or you need data from someone, and this makes business sense, do whatever the other party has already decided on, or better yet implemented (assuming their decision isn’t nonsensical; but if so why are you doing business with them?)
  3. Use a validator to test your data in whatever format. An earlier wiki version of some of the guide materials includes links to validators. In my book, Any23 is cute.

(Yes, CC REL needs updating to reflect some of these developments, RDFa 1.1 at the least. Some license vocabulary work done by SPDX should also be looked at.)

Penumbra of Provenance

Thursday, January 12th, 2012

W3C PROV

Yesterday the W3C’s Provenance Working Group posted a call for feedback on a family of documents members of that group have been working on. Provenance is an important issue for the info commons, as I’ve sketched elsewhere. I hope some people quickly flesh out examples of application of the draft ontology to practical use cases.

Intellectual Provenance

Apart from some degree of necessity for current functioning of some info commons (obviously where some certainty about freedoms from copyright restriction is needed, but conceivably even moreso to outgrow copyright industries), provenance can also play an important symbolic role. Unlike “intellectual property”, intellectual provenance is of keen interest to both readers and writers. Furthermore, copyright and other restrictions make provenance harder, in both practical (barriers to curation) and attitudinal ways: the primacy of “rights” (as in rents, and grab all that your power allows) deprecates the actual intellectual provenance of things.

Postmodern Provenance

The umbra of provenance seems infinite. As we preserve scratches of information (or not) incomparably vast amounts disappear. But why should we only care for what we can record that led to current configurations? Consider independent invention and convergent evolution. Who cares what configurations and events led to current configurations: what are the recorded configurations that could have led to the current configuration, what are all of the configurations that could have led to the current configuration; what configurations are most similar (including history, or not) to a configuration in question?

.prov

In order to highlight the exposure of provenance information on the internet and provide added impetus for organizations to expose in a way that can efficiently be found and accessed, I am exploring the possibility of a .prov TLD.

Years of open hardware licenses

Tuesday, January 10th, 2012

Last in a list of the top 10 free/open source software legal developments in 2011 (emphasis added):

Open Hardware License. The open hardware movement received a boost when CERN published an Open Hardware License (“CERN OHL”). The CERN OHL is drafted as a documentation license which is careful to distinguish between documentation and software (which is not licensed under the CERN OHL) http://www.ohwr.org/documents/88. The license is “copyleft” and, thus, similar to GPLv2 because it requires that all modifications be made available under the terms of the CERN OHL. However, the license to patents, particularly important for hardware products, is ambiguous. This license is likely to [be] the first of a number of open hardware licenses, but, hopefully, the open hardware movement will keep the number low and avoid “license proliferation” which has been such a problem for open source software.

But the CERN OHL isn’t the first “open hardware license”. Or perhaps it is the nth first. Several free software inspired licenses intended specifically for design and documentation have been created over the last decade or so. I recall encountering one dating back to the mid-1990s, but can’t find a reference now. Discussion of open hardware licenses was hot at the turn of the millennium, though most open hardware projects from that time didn’t get far, and I can’t find a license that made it to “1.0”.

People have been wanting to do for hardware what the GNU General Public License has done for software and trying to define open hardware since that timeframe. They keep on wanting (2006) and trying (2007, 2011 comments).

Probably the first arguably “high quality” license drafted specifically for open hardware is the TAPR OHL (2007). The CERN OHL might be the second such. There has never been consensus on the best license to use for open hardware. Perhaps this is why CERN saw fit to create yet another (an incompatible copyleft at that: incompatible with TAPR OHL, GPL, and BY-SA), but there still isn’t consensus in 2012.

Licenses primarily used for software (usually [L]GPL, occasionally BSD, MIT, or Apache) have also been used for open hardware since at least the late 1990s — and much more so than any license created specifically for open hardware. CC-BY-SA has been used by Arduino since at least 2008 and since 2009.

In 2009 the primary drafter of the TAPR OHL published a paper with a rationale for the license. By my reading of the paper, the case for a license specific to hardware seems pretty thin — hardware design and documentation files, and distribution of printed circuit boards seem a lot like program source and executables, and mostly subject to copyright. It also isn’t clear to me why the things TAPR OHL handles differently than most open source software licenses (disclaims strictly being a copyright license, instead wanting to serve as a clickwrap contract; attempts to describe requirements functionally, instead of legally, to avoid describing explicitly the legal regime underlying requirements; limited patent grant applies to “possessors” not just contributors) might not be interesting for software licenses, if they are interesting at all, nor why features generally rejected for open source software licenses shouldn’t also be rejected for open hardware (email notification to upstream licensors; a noncommercial-only option — thankfully deprecated late last year).

Richard Stallman’s 1999 note about free hardware seems more clear and compelling than the TAPR paper, but I wish I could read it again without knowing the author. Stallman wrote:

What this means is that anyone can legally draw the same circuit topology in a different-looking way, or write a different HDL definition which produces the same circuit. Thus, the strength of copyleft when applied to circuits is limited. However, copylefting HDL definitions and printed circuit layouts may do some good nonetheless.

In a thread from 2007 about yet another proposed open hardware license, three people who generally really know what they’re talking about each wondered why a hardware-specific license is needed: Brian Behlendorf, Chris DiBona, and Simon Phipps. The proposer withdrew and decided to use the MIT license (a popular non-copyleft license for software) for their project.

My bias, as with any project, would be to use a GPL-compatible license. But my bias may be inordinately strong, and I’m not starting a hardware project.

One could plausibly argue that there are still zero quality open hardware specific licenses, as the upstream notification requirement is arguably non-open, and the CERN OHL also contains an upstream notification requirement. Will history repeat?

Addendum: I just noticed the existence of an open hardware legal mailing list, probably a good venue to follow if you’re truly interested in these issues. The organizer is Bruce Perens, who is involved with TAPR and is convinced non-copyright mechanisms are absolutely necessary for open hardware. His attempt to bring rigor to the field and his decades of experience with free and open source software are to be much appreciated in any case.

CSS text overlay image, e.g. for attribution and license notice

Sunday, January 8th, 2012

A commenter called me on providing inadequate credit for a map image I used on this blog. I’ve often seen map credits overlaid on the bottom right of maps, so I decided to try that. I couldn’t find an example of using CSS to overlay text on an image that only showed the absolute minimum needed to achieve the effect, and explained why. Below is my attempt.

Example 1

The above may be a good example of when to not use a text overlay (there is already text at the bottom of the image), but the point is to demonstrate the effect, not to look good. I have an image and I want to overlay «Context+Colophon» at the bottom right of the image. Here’s the minimal code:

<div style="position:relative;z-index:0;width:510px">
  <img src="https://gondwanaland.com/i/young-obama-pirate-hope.png"/>
  <div style="position:absolute;z-index:1;right:0;bottom:0">
    <a href="http://identi.ca/conversation/69446510">Context</a>+<a href="http://registry.gimp.org/node/14291">Colophon</a>
  </div>
</div>

Explanation

The outer div creates a container which the text overlay will be aligned with. A position is necessary to enable z-index, which specifies how objects will stack. Here position:relative as I want the image and overlay to flow with the rest of the post, z-index:0 as the container is at the bottom of the stack. I specify width:510px as that’s how wide the image is, and without hardcoding the size of the div, the overlay as specified will float off to the right rather than align with the image. There’s nothing special about the img; it inherits from the outer div.

The inner div contains and styles the text I want to overlay. position:absolute as I will specify an absolute offset from the container, right:0;bottom:0, and z-index:1 to place above the image. Finally, I close both divs.

That’s it. I know precious little CSS; please tell me what I got wrong.

Example 2

Above is the image that prompted this post, with added attribution and license notice. Code:

<div style="z-index:0;position:relative;width:560px"
     xmlns:cc="http://creativecommons.org/ns#"
     about="https://gondwanaland.com/i/OpenStreetMap-Oakland-980.png">
  <a href="http://www.openstreetmap.org/?lat=37.8134&amp;lon=-122.2776&amp;zoom=14&amp;layers=Q">
    <img src="https://gondwanaland.com/i/OpenStreetMap-Oakland-980.png"/></a>
  <div style="position:absolute;z-index:1;right:0;bottom:0;">
    <small>
      © <a rel="cc:attributionURL"
           property="cc:attributionName"
           href="http://www.openstreetmap.org/?lat=37.8134&amp;lon=-122.2776&amp;zoom=14&amp;layers=Q">OpenStreetMap contributors</a>,
        <a rel="license"
           href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>
    </small>
  </div>
</div>

Explanation

With respect to achieving the text overlay, there’s nothing in this example not in the first. Below I explain annotations added that support (but are not required by) fulfillment of OpenStreetMap/CC-BY-SA attribution and license notice.

The xmlns:cc attribute declares the cc: prefix, and even that may be superfluous, given cc: as a default prefix.

about sets the subject of subsequent annotations.

small isn’t an annotation, but does now seem appropriate for legal notices, and is usually rendered nicely.

rel="cc:attributionURL" says that the value of the href property is the link to use for attributing the subject. property="cc:attributionName" says that the text (“OpenStreetMap contributors”) is the name to use for attributing the subject. rel="license" says the value of its href property is the subject’s license.
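To make the reading of those attributes concrete, here is a minimal sketch that walks the annotated markup with Python's standard-library html.parser and prints (subject, predicate, object) tuples. It is not a conforming RDFa processor (no prefix expansion, no nested subjects; RDFa Distiller or checkrdfa do the real thing), the names AnnotationSketch and extract_annotations are made up for illustration, and the markup is a trimmed copy of Example 2 with the positioning styles and the image left out.

```python
from html.parser import HTMLParser


class AnnotationSketch(HTMLParser):
    """Collect (subject, predicate, object) tuples from about/rel/property.

    Illustration only: predicates are kept as written (e.g. "cc:attributionName"),
    and only a single, flat subject is tracked.
    """

    def __init__(self):
        super().__init__()
        self.subject = None   # value of the most recent about="..."
        self.triples = []
        self._pending = None  # (subject, predicate, tag) whose text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        # rel + href: the link target is the object
        if "rel" in attrs and "href" in attrs:
            self.triples.append((self.subject, attrs["rel"], attrs["href"]))
        # property: the element's text is the object, so start collecting it
        if "property" in attrs:
            self._pending = (self.subject, attrs["property"], tag)
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None and tag == self._pending[2]:
            subject, predicate, _ = self._pending
            self.triples.append((subject, predicate, "".join(self._text).strip()))
            self._pending = None


def extract_annotations(html):
    parser = AnnotationSketch()
    parser.feed(html)
    parser.close()
    return parser.triples


# Trimmed copy of Example 2 (overlay styles and the img omitted).
EXAMPLE = """
<div xmlns:cc="http://creativecommons.org/ns#"
     about="https://gondwanaland.com/i/OpenStreetMap-Oakland-980.png">
  <small>
    &copy; <a rel="cc:attributionURL"
         property="cc:attributionName"
         href="http://www.openstreetmap.org/?lat=37.8134&amp;lon=-122.2776&amp;zoom=14&amp;layers=Q">OpenStreetMap contributors</a>,
      <a rel="license"
         href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>
  </small>
</div>
"""

if __name__ == "__main__":
    for triple in extract_annotations(EXAMPLE):
        print(triple)
```

Each tuple has the image URL from about as its subject: one for the attribution link, one for the attribution name, and one for the license.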

If you’re bad and not using HTTPS-Everywhere (referrer not sent due to protocol change; actually I’m bad for not serving this blog over https), clicking on BY-SA above might obtain a snippet of HTML with credits for others to use. Or you can copy and paste the above code into RDFa Distiller or checkrdfa to see that the annotations are as I’ve said.

Addendum: If you’re reading this in a feed reader or aggregator, there’s a good chance inline CSS is stripped — text intended to overlay images will appear under rather than overlaying images. Click through to the post in order to see the overlays work.

Which counterfactual public domain day?

Sunday, January 1st, 2012

1. Each January 1, many people note a number of interesting works that become free of copyright restrictions in many jurisdictions, but a 1998 act means none will in the U.S. until at least 2019.

2. The Center for the Study of the Public Domain provides another counterfactual, imagining policy not pre-1998, but pre-1976 (act; effective 1978), which at the top states (repeated at Boing Boing, which inspired this post’s title) that works from 1955 or before would be free of copyright restrictions.

3. But as the CSPD page points out further down (see “the public domain snatchers”), the pre-1976 policy also would’ve meant many works from 1983 or before would now be free of copyright restrictions, as the policy allowed for 28 years of restriction, with an optional renewal of 28 years. Historically copyright holders did not bother renewing 85% of works.

4. The aforementioned CSPD page doesn’t note, but their FAQ does, that prior to 1989 a copyright notice was required in order for a work to be restricted. The FAQ says “By some estimates, 90% of works did not include this copyright notice and immediately entered the public domain.” In a counterfactual taking this into account, not only would January 1 be robust, but every day would be public domain day.

(Of course as I noted last year, every day is public domain day to the extent you make it so, no counterfactual required. But defaults really matter.)

5. Any of the above counterfactuals would be tremendous improvements over society’s current malgovernance of the intellectual commons. But they’re all boring. They are much more difficult to conceive, but the counterfactuals I’d prefer to look at are not ones with recent rent seeking undone, but ones attempting to characterize worlds with optimal copyright restriction, which is itself under-explored: no extensions? 15 years? 1 year? Maybe 0? The point of this sort of counterfactual is not the precise duration, nature, or existence of restriction, but changing how we think about the public domain: not as some old works that it is cool that we can now cooperate around to preserve and breathe new life into without legal threat (or uncool if we can’t), but in terms of how the world would be changed in a dynamic way with much better policy. I bet we wouldn’t even miss the 9-figure Hollywood dreck that most writers in this field must genuflect to, and that is used as the excuse to destroy, if such disappeared (I really doubt it would, but here’s to hoping), because whatever would exist would be our culture, and everyone loves their culture (which of course may be subculture built on superficial or even real rejection of such, etc). It would just also be our culture in another way as well, one compatible with free speech and more equal distribution of wealth, in addition to practical things like a non-broken Internet.
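For the record, the arithmetic behind the 1955 and 1983 figures in points 2 and 3 can be sketched as below. This assumes, as a simplification, that a term runs through the end of the calendar year in which it expires, so a work becomes free on the following January 1; the function name is made up for illustration.

```python
INITIAL_TERM = 28   # years of restriction under the pre-1976 policy
RENEWAL_TERM = 28   # optional renewal under the same policy


def public_domain_day(publication_year, renewed):
    """Year whose January 1 would free a work under the pre-1976 terms.

    Simplifying assumption: the term runs to the end of the calendar
    year in which it expires.
    """
    term = INITIAL_TERM + (RENEWAL_TERM if renewed else 0)
    return publication_year + term + 1


if __name__ == "__main__":
    # Renewed works from 1955 and unrenewed works from 1983
    # both land on January 1 of the same year.
    print(public_domain_day(1955, renewed=True))
    print(public_domain_day(1983, renewed=False))
```

Both calls give 2012, which is why January 1, 2012 covers renewed works from 1955 or before and unrenewed works from 1983 or before.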

End of the 2011 world

Saturday, December 31st, 2011


I took the above photo near the beginning of 2011. It has spent most of the year near the top (currently #2) of my photos hosted at Flickr ranked by their interestingness metric. Every other photo in the 200 they rank (sadly I don’t think anyone not logged in as me can see this list) has some combination of being on other people’s lists of favorites, having comments, or having a large number of views. The above photo has none of that. Prior to this post it has only been viewed 33 times by other people, according to Flickr, and I don’t think that number has changed in some time. Their (not revealed) code must find something about the image itself interesting. Is their algorithm inaccurate? In any case the image is appropriate as the world of 2011 is ending, and in 2012 I absolutely will migrate my personal media hosting to something autonomous, since as of last year someone (happens to be a friend and colleague) has taken on the mantle of building media sharing for the federated social web.

My employer’s office moved from San Francisco to Mountain View in April, contributing to a number of people leaving or transitioning out, which has been a bummer. I’ve been working exclusively from home since May. Still, there have been a number of good developments, which I won’t attempt to catalog here. My favorites include agreement with the Free Software Foundation regarding use of CC0 for public domain software, small improvements in the CC legal user interface, the return and great work of a previous colleague, retirement of two substandard licenses, research, and a global summit/launch of a process toward version 4.0 of the CC licenses, which I hope over the next year prove at least a little bit visionary, long-standing, and have some consideration for how they can make the world a better place.

Speaking of which, I’ve spent more time thinking about social science-y stuff in 2011 than I have in at least several years. I’ll probably have plenty to say regarding this on a range of topics next year, but for now I’ll state one narrow “professionally-related” conclusion: free/libre/open software/culture/etc advocates (me included) have done a wholly inadequate job of characterizing why our preferences matter, both to the general public and to specialists in every social science.

Apart from silly peeves, two moderate ideas unrelated to free/libre/open stuff that I first wrote about in 2011 and I expect I’ll continue to push for years to come: increasing the minimum age and education requirement for soldiers and tearing down highway 980.

I haven’t done much programming in several years, and not full time in about a decade. This has been making me feel like my brain is rotting, and contributes to my lack of prototyping various services that I want to exist. Though I’d been fiddling (that may be generous) with Scala for a couple years, I was never really super excited about tying myself to the JVM. I know and deeply respect lots of people who are doing great things with Python, and I’ve occasionally used it for scripts over the past several years because of that, but it leaves me totally non-enthused. I’ve done enough programming in languages that are uglier but more or less the same; time for something new. For a couple months I’ve been learning and doing some prototyping using the Yesod web framework (apparently I had heard of Haskell in 2005 but I didn’t look at it closely until last year). I haven’t made as much progress as I’d like, mostly due to unrelated distractions. The biggest substantive hurdle has not been Haskell (and the concepts it stands for), but a lack of Yesod examples and documentation. This seems to be a common complaint. Yesod is rapidly moving to a 1.0 release, documentation is prioritized, and I expect to be really productive with it over the coming year. Thanks to the people who make Yesod and those who have been making Haskell for two decades.

This year I appreciated three music projects that I hadn’t paid much attention to before, much to my detriment: DNA, Moondog, and especially Harry Partch. I also listened a lot again to one of my favorite bands I discovered in college, Violence and the Sacred, which amazingly has released some of its catalog under the CC BY-SA license. Check them out!

Finally, in 2011 I had the pleasure of getting to know just a little bit some people working to make my neighborhood a better place, attending a conference with my sister, seeing one of my brothers start a new job and the other a new gallery, and, with my wife, continuing to grow up (in that respect, the “better half” cliche definitely applies). Now for this world to end!

Namecheap’s savvy anti-SOPA marketing

Thursday, December 29th, 2011

I’m impressed by how much gratis publicity and advertising Namecheap has gotten via its anti-SOPA marketing (including the Wikipedia article I linked to; it didn’t exist 3 days ago), and completely unimpressed by the failure of approximately every other company to take advantage of the opportunity, which strikes me as easy social media gold. Communications department heads ought roll.

Go Daddy’s* pro-SOPA marketing failures made Namecheap’s action straightforward relative to companies not directly competing with Go Daddy. However, there are lots of other domain name registrars, none of which has done anything with Namecheap’s marketing savvy. Another registrar (which I’ve used and recommended for some time, and which has supported Creative Commons and other good causes), like Namecheap, is donating a portion of domain transfers to the Electronic Frontier Foundation, but doesn’t seem to be making a big deal of it, and their anti-SOPA blog post is rather tepid. Compare to Namecheap’s anti-SOPA blog post: while it isn’t all that much stronger in terms of substance (it contains genuflection to “intellectual property”), it is much more strongly worded and simply more effectively written.

One other company has a support-EFF-against-SOPA tie-in. That company, Zopim, provides website chat services, and doesn’t seem to compete with Go Daddy at all. I’m not interested, but never would have heard of them otherwise. Any company could do that.

(I see that sometime today two other small domain registrars have added support-EFF-against-SOPA deals. Good for Suspicious Networks and Centuric.)

What inspired me to write this post is that Namecheap isn’t only taking gratis publicity. They’re also running presumably paid ads as part of their anti-SOPA marketing campaign:

While trying to get the above ad to load again (noticed out of the corner of my eye but didn’t register until sometime after; I’m oddly trying to recover from ad blindness), I noticed another Namecheap ad, which, if you’re already really tuned in, illustrates nicely the imperfect options available from a software freedom perspective for domain registration and other nearly commodity services.

Check out more anti-SOPA and pro-freedom actions.

*Isn’t the name “Go Daddy” ridiculous? That, coupled with a super cheesy website and company logo led me to disregard them long before they started shooting sexy elephants at gladiator events, or whatever got people upset before they supported SOPA.