Archive for January, 2012

Copyleft regulates

Tuesday, January 31st, 2012

Copyleft as a pro-software-freedom regulatory mechanism, of which more are needed.

Existing copyleft licenses include conditions that would not exist (unless otherwise implemented) if copyright were abolished. In other words, copyleft does not merely neutralize copyright. But I occasionally[1] see claims that copyleft merely neutralizes copyright.

A copyleft license which only neutralized copyright would remove all copyright restrictions on only one condition: that works building upon a copyleft licensed work (usually as “adaptations” or “derivative works”, though other scopes are possible) be released under terms granting the same freedoms. Existing copyleft licenses have additional conditions. Here is a summary of some of those added by the most important (and some not so important) copyleft licenses:

License Provide modifiable form[2] Limit DRM Attribution Notify upstream[3]
BY-SA y y
FDL y y y
EPL y y
EUPL y y
GPL (including LGPL and AGPL) y y
MPL (and derivatives) y y
ODbL y y y
OSL y y
OHL y y y

I’ve read each of the above licenses at some point, but could easily misremember or misunderstand; please correct me.

There’s a lot more variation among them than is captured above, including in how each condition is implemented. But my point is just that these coarse conditions would not be present in a purely copyright-neutralizing license. To answer two obvious objections: “attribution”[4] in each license above goes beyond the bare minimum license notice that would be required to satisfy the condition of releasing under sufficient terms, and “limit DRM” refers only to conditions prohibiting DRM or requiring parallel distribution (which all of the licenses requiring modifiable form do indirectly, in a way; I’ve only called out those that explicitly mention DRM), not to permissions[5] granted to circumvent DRM.

I’m not sure there’s a source for the idea that copyleft only neutralizes copyright. Probably it is just an intuitive reading of the term that has been arrived at independently many times. The English Wikipedia article on copyleft doesn’t mention it, and probably more to the point, none of the main FSF articles on copyleft do either. The last of these includes the following:

Proprietary software developers use copyright to take away the users’ freedom; we use copyright to guarantee their freedom. That’s why we reverse the name, changing “copyright” into “copyleft.”

Copyleft is a way of using of the copyright on the program. It doesn’t mean abandoning the copyright; in fact, doing so would make copyleft impossible. The “left” in “copyleft” is not a reference to the verb “to leave”—only to the direction which is the inverse of “right”.

Copyleft is a general concept, and you can’t use a general concept directly; you can only use a specific implementation of the concept.

This is very clear: the point of copyleft is to promote and protect (“guarantee” is an exaggeration) users’ freedom, and that includes their access to source. The major reason I like to frame copyleft as regulation[6] is that if access to source is important to software freedom (or otherwise socially valuable), it probably makes sense to look for additional regulatory mechanisms that might contribute to promoting and protecting access to source, as well as other aspects of software freedom (and to appreciate the ones that already do). Such mechanisms mostly aren’t (or wouldn’t be) “copyleft” (though at this point, some of them would simply mandate a copyleft license), but the point is not their relationship with copyright; it is promoting and protecting software freedom.

If software freedom is important, surely it makes sense to look for additional mechanisms to promote and protect it. As others have said, licenses are difficult to enforce and/or few people are interested in doing it, and copyleft can be made irrelevant through independent non-copyleft implementation, given enough desire and resources (which the largest corporations have), not to mention the vast universe of cases in which there is no free software alternative, copyleft or not. I leave description and speculation about such mechanisms for a future post.

[1] For example, yesterday Rob Myers wrote:

Copyleft is a general neutralization of copyright (rather than a local neutralization, like permissive licences). Nothing more.

Only slightly more ambiguously, late last year Jason Self wrote:

Copyright gives power to restrict what other people can do with their own copies of things. Copyleft is about restoring those rights: It takes this oppressive law, which normally restricts people and takes their rights away, and make those rights inalienable.

Well said…but not exactly. I point these out merely as examples, not to make fun of Myers, who is one of the sharpest libre thinkers there is, or Self, who as far as I can tell is an excellent free software advocate.

[2] Note that it is possible to have copyleft that doesn’t require source. As far as I know, such copyleft only exists in licenses not intended for software. But I think source for non-software is very interesting. The other obvious permutations (a copyleft license for software that does not include a source requirement, and a non-copyleft license that does include a source requirement) are curiosities that do not seem to exist at all, probably for the better, although one can imagine questionable use cases (e.g., self-modifying object code, and transparency as the only objective).

[3] As I’ve mentioned previously, requiring upstream notification likely makes the TAPR OHL non-free/open. But I list the license and condition here because it is an interesting regulation.

[4] One could further object that one ought to consider so-called “economic” and “moral” aspects of copyright separately, and only neutralize the former; attribution is perhaps the best known and least problematic of the latter.

[5] Although existing copyleft licenses don’t only neutralize restrictions (one that did would be another curiosity; perhaps the License Art Libre/Free Art License currently comes closest), it is important that copyright and other restrictions are adequately neutralized. In particular, modern public software licenses include patent grants, GPLv3 permits DRM circumvention (made illegal by some copyright-related legislation such as the DMCA), and version 4.0 of CC licenses will probably grant permissions around “sui generis” restrictions on databases. Such neutralization is only counter-regulatory (if one sees copyright as a regulation), not pro-regulatory as the source and other conditions discussed above are.

[6] Regulation in the broadest sense, including at a minimum typical “government” and “market” regulation, as I’ve said before. By the way, it could be said that those who advocate only permissive licenses are anti-regulatory, and I imagine that if lots of people thought about copyleft as regulation, this claim would be made. But it would be a problematic claim, as permissive licenses don’t do much (or only do so “locally”, as Myers obliquely put it in the quote above) against the background regulation of copyright restrictions.

Wincing at surveillance, the security state, medical devices, and free software

Friday, January 27th, 2012

Last week I saw a play version of . I winced throughout, perhaps due to over-familiarity with the topics and locale, and there are just so many ways a story with its characteristics (heavy-handed politics that I agree with, written for adolescents, set in the near future) can embarrass me. Had there been any room for the nuance of apathy, a few bars of Saturday Night Holocaust would’ve been great to work into the play. But the acting and other stuff making up the play seemed well done. I’m glad that people are trying to make art about issues that I care about, and I’d recommend seeing the play (extended to Feb 25 in San Francisco) to anyone less sensitive.

I also just watched Karen Sandler’s LCA talk, which I can’t recommend highly enough. It is more expansive than a short talk she gave last year at OSCON based on her paper Killed by Code: Software Transparency in Implantable Medical Devices.

I frequently complain that free/libre/open software and nearby aren’t taken seriously as being important to a free and otherwise good society, and that advocates have completely failed to demonstrate this importance. Well, much more is needed, but the above talks give me hope, and getting Sandler in front of as many people as possible would be great progress.

Someday knowing the ins and outs of copyright will be like knowing the intricate rules of internal passports in Communist East Germany

Thursday, January 26th, 2012

Said Evan Prodromou, whom I keep quoting.

I repeat Evan as a reminder and apology. I’ve blogged many times about copyright licenses in the past, and will have a few detailed posts on the subject soon in preparation for a short talk at FOSDEM.

Given current malgovernance of the intellectual commons, public copyright licenses are important for freedom. They’re probably also important trials for post-copyright regulation (meant in the broadest sense, including at least “market” and “government” regulatory mechanisms), e.g., of the ability to inspect and modify complete and corresponding source.

At the same time, the totemic and contentious role copyright licenses (and sometimes assignment or contributor agreements, and sometimes covering related wrongs and patents) play in free/libre/open works, projects, and communities often seems an unfortunate misdirection of energy at best, and probably looks utterly ridiculous to casual observers. I suspect copyright also takes at least some deserved limelight, and perhaps much more, from other aspects of governance, plain old getting things done, and activism around other issues (regarding the first, some good recent writings include those by Simon Phipps and Bradley Kuhn, but the prominence of copyright arrangements therein reinforces my point). But this all amounts to an additional reason it is important to get the details of public copyright licenses right, in particular compatibility between them where it can be achieved, so as to minimize the amount of time and energy projects put into considering and arguing about the options.

Obviously the energy put into public licenses is utterly insignificant against that spent on other copyright/patent/trademark complex activities. But I’m not going to write about that in the near future, so it isn’t part of my apology and rationalization.

Someday I hope that knowing the ins and outs of both Internal Passports of the mind and international passports will be like knowing the rules of internal passports in Communist East Germany (presumably intricate; I did not look for details, but hopefully they exist not many hops from a Wikipedia article on Eastern Bloc emigration and defection).

Counterfeiting against inequality and addiction

Tuesday, January 24th, 2012

When I read articles blaming advertisers for the bad behavior of (especially relatively poor) people who want advertised products (quoted material below mostly from linked story) I tend to think:

  1. To the extent “corporate pushers have made us addicts”:
    a. As a letter to the editor from Michael Slembrouck says, “You can ask your dealer to stop selling you dope because you have a problem, but if you keep giving him money he’s going to keep giving you the same dope.”
    b. It seems to me that being able to ignore/forgo potentially addictive messages/products is an important survival skill.
  2. More [free] speech (broadly speaking) is the answer:
    a. What is the hidden role of patent and trademark? In other words, what is the role of the lack of cheap copies? Cheap copies would reduce the incentive to advertise in the first place, and also reduce “the dreary feeling many get from walking by store windows knowing society offers no legal path for them to ever possess what is inside.” Is bad behavior supposedly related to lack of access to fashionable items reduced where counterfeit goods are plentiful? That’s a serious question, though of course answers will largely be swamped by cross-cultural confounders.
    b. Regarding addiction and other adverse things characterized as such, I still think one of the best messages trusted figures (friends, ministers, the famous, etc.) can convey is how totally unacceptable it is to follow spam. And I consider advertising to span a continuum from spam to useful information, with advertising critiqued as solely “manufacturing desire” tending toward the spam end.
    c. If advertising is so powerful, why not use it more for counter-addiction-and-other-adverse-messages? In the link above, I wished for the Ad Council to run a don’t-click-on-spam campaign. Maybe too close to its membership for comfort. Fortunately, access to media has improved greatly, including access to organizing for access to media. Hopefully things like LoudSauce (crowdfunded advertising) will help make that happen.

As indicated by the title, I mostly blogged this for 2(a). I think the contribution of intellectual protectionism to inequality is woefully underexplored and underexploited. I made a new category on this blog, Inequality Promotion, to remind me to attempt further exploration and exploitation.

Web Data Common[s] Crawl Attribution Metadata

Monday, January 23rd, 2012

Via I see Web Data Commons, which has “extracted structured data out of 1% of the currently available Common Crawl corpus dating October 2010”. WDC publishes the extracted data as N-Quads (the fourth item denotes the immediate provenance of each subject/predicate/object triple: the URL the triple was extracted from).
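To make the N-Quads layout concrete, here is a minimal shell sketch (not something WDC provides) that prints the predicate and the provenance URL of each statement. It assumes one statement per line ending in `<graph> .`, and that subjects and predicates contain no whitespace, which holds for IRIs and blank node labels; the example.org URLs and the function name are made up.

```shell
#!/bin/sh
# Print "predicate<TAB>graph" for each N-Quads statement read from stdin.
# Naive sketch: assumes one statement per line, that field 2 is the
# predicate, and that every line ends with "<graph> ." so the graph IRI
# is the second-to-last whitespace-separated field. Literal objects may
# contain spaces; that does not affect field 2 or field NF-1.
nq_predicate_graph() {
  awk 'NF >= 5 { print $2 "\t" $(NF - 1) }'
}

# Hypothetical quad: an attribution name found on a made-up page.
printf '%s\n' \
  '<http://example.org/photo> <http://creativecommons.org/ns#attributionName> "Jane Doe" <http://example.org/page> .' |
  nq_predicate_graph
```

Statements without a graph (plain triples) are not handled; a real N-Quads parser would be needed for those and for escaped characters.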

I thought it would be easy and fun to run some queries on the WDC dataset to get an idea of how annotations associated with Creative Commons licensing are used. Notes below on exactly what I did. The biggest limitation is that the license statement itself is not part of the dataset — not as xhv:license in the RDFa portion, and for some reason rel=license microformat has zero records. But cc:attributionName, cc:attributionURL, and cc:morePermissions are present in the RDFa part, as are some Dublin Core properties that the Creative Commons license chooser asks for (I only looked at dc:source) but are probably widely used in other contexts as well.
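For a rough cross-check without a triplestore, per-property counts can be approximated straight from the extract. This is a hedged sketch, not the method used for the numbers below; it relies on the same one-statement-per-line assumption and on the cc: prefix standing for http://creativecommons.org/ns#. The function name is hypothetical.

```shell
#!/bin/sh
# Count statements per Creative Commons (cc:) predicate in N-Quads on stdin,
# most frequent first. Assumes the predicate is the second
# whitespace-separated field, as in line-based N-Quads.
count_cc_predicates() {
  awk '$2 ~ /^<http:\/\/creativecommons\.org\/ns#/ { n[$2]++ }
       END { for (p in n) print n[p], p }' | sort -rn
}

# Usage, with the extract file name from the notes below:
#   count_cc_predicates < ccrdf.html-rdfa.nq
```

Unlike the rdfproc import, this does not reject lines with syntax errors, so its counts could come out slightly higher.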

Dataset | URLs | Distinct objects
Common Crawl 2010 corpus | 5,000,000,000 [a] |
1% sampled by WDC | ~50,000,000 |
with RDFa | 158,184 [b] |
with a cc: property | 26,245 [c] |
cc:attributionName | 24,942 [d] | 990 [e]
cc:attributionURL | 25,082 [f] | 3,392 [g]
dc:source | 7,235 [h] | 574 [i]
cc:morePermissions | 4,791 [j] | 253 [k]
cc:attributionURL = dc:source | 5,421 [l] |
cc:attributionURL = cc:morePermissions | 1,880 [m] |
cc:attributionURL = subject | 203 [n] |
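The “Distinct objects” column counts unique values per property. Under the same naive line-based assumptions (single spaces between terms, graph last), a shell approximation strips the subject, predicate, and trailing graph, then de-duplicates. Again a sketch, not the SPARQL actually used; the function name is made up.

```shell
#!/bin/sh
# Count distinct objects of one predicate in N-Quads read from stdin.
# $1 is the predicate IRI including angle brackets.
# Naive: assumes "<s> <p> OBJECT <graph> ." on one line, single-space separated.
distinct_objects() {
  awk -v p="$1" '$2 == p' |
    sed -E 's/^[^ ]+ [^ ]+ (.*) <[^>]*> \.[[:space:]]*$/\1/' |
    sort -u |
    wc -l |
    tr -d ' '
}

# Usage:
#   distinct_objects '<http://creativecommons.org/ns#attributionURL>' < ccrdf.html-rdfa.nq
```

No canonicalization is done, so URLs differing only by a trailing slash count as distinct.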

Some quick takeaways:

  • The low ratio of distinct attributionURLs probably indicates HTML from the license chooser deployed without any parameterization. Often the subject or current page will be the most useful attributionURL (but the 203 above would probably be much higher with canonicalization). Note that all of the CC licenses require that such a URL refer to the copyright notice or licensing information for the Work. Unless one has set up a site-wide license notice somewhere, a static URL is probably not the right thing to request in terms of requiring licensees to provide an attribution link; nor is a non-specific attribution link as useful to readers as a direct link to the work in question. As (and if) support for attribution metadata gets built into Creative Commons-aware CMSes, the ratio of distinct attributionURLs ought to increase.
  • 79% of subjects with both dc:source and cc:attributionURL (6,836 [o]) have the same value for both properties. This probably means people are merely entering their URL into every form field requesting a URL without thinking, not self-remixing.
  • 47% of subjects with both cc:morePermissions and cc:attributionURL (3,977 [p]) have the same value for both properties. It is unclear why this ratio is so much lower than the previous one; it ought to be higher, as the same value for both often makes sense. It is unsurprising that cc:morePermissions is the least provided property; in my experience few people understand it.
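The percentages above are plain ratios of counts from the table: 5,421 of 6,836 and 1,880 of 3,977. A small awk helper (the name `pct` is made up) reproduces them, rounding to the nearest whole percent, and also shows how low the distinct-value ratios are.

```shell
#!/bin/sh
# Print numerator/denominator as a whole percentage, rounded half up.
# awk's %d truncates, so 0.5 is added first.
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%d%%\n", (100 * n / d) + 0.5 }'
}

pct 5421 6836   # same attributionURL and dc:source, of subjects with both
pct 1880 3977   # same attributionURL and morePermissions, of subjects with both
pct 3392 25082  # distinct attributionURL values per attributionURL statement
pct 990 24942   # distinct attributionName values per attributionName statement
```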

I did not look at the provenance item at all. It’d be interesting to see what kinds of assertions are being made across authority boundaries (e.g. a page on makes statements with an URI as the subject) and when to discard such assertions. I barely looked directly at the raw data at all; just enough to feel that my aggregate numbers could possibly be accurate. More could probably be gained by inspecting smaller samples in detail, informing other aggregate queries.

I look forward to future extracts. Thanks indirectly to Common Crawl for providing the crawl!

Please point out any egregious mistakes made below…

# a I don't really know if the October 2010 corpus is the
# entire 5 billion Common Crawl corpus

# download RDFa extract from Web Data Commons
wget -c

# Matches number stated at
wc -l ccrdf.html-rdfa.nq

# Includes easy to use no-server triplestore
apt-get install redland-utils

# sanity check
grep '<>' ccrdf.html-rdfa.nq |wc -l

# Import rejects a number of triples for syntax errors
rdfproc xyz parse ccrdf.html-rdfa.nq nquads

# d Perhaps syntax errors explain why there are fewer triples than the above
# grep might indicate, but close enough
rdfproc xyz query sparql - 'select ?o where { ?s <> ?o}' |wc -l

# These replicated below with 4store because...
rdfproc xyz query sparql - 'select distinct ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select distinct ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select ?o where { ?o <> ?o }' |wc -l
rdfproc xyz query sparql - 'select ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select distinct ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select ?o where { ?o <> ?o }' |wc -l

# ...this query takes forever, hours, and I have no idea why
rdfproc xyz query sparql - 'select ?s, ?o where { ?s <> ?o ; <> ?o }'

# 4store has a server, but is lightweight
apt-get install 4store

# 4store couldn't import with syntax errors, so export good triples from
# previous store first
rdfproc xyz serialize > ccrdf.export-rdfa.rdf

# import into 4store
curl -T ccrdf.export-rdfa.rdf 'http://localhost:8080/data/wdc'

# egrep is to get rid of headers and status output prefixed by ? or #
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

#Of course please use instead.
#Should be more widely deployed soon.
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n where { ?s <> ?o ; <> ?n }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n where { ?s <> ?o ; <> ?n }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n, ?m where { ?s <> ?o ; <> ?n ; <> ?m }' |egrep -v '^[\?\#]' |wc -l
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o ; <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?s where { { ?s <> ?o } UNION { ?s <> ?n } UNION { ?s <> ?m }  }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?s where { { ?s <> ?o } UNION { ?s <> ?n }}' |egrep -v '^[\?\#]' |wc -l

# b Note: distinct subjects are not the same as the pages data was extracted from (158,184)
4s-query wdc -s '-1' -f text 'select distinct ?s where { ?s ?p ?o }'  |egrep -v '^[\?\#]' |wc -l

# Probably fewer than the 1,047,250 claimed due to syntax errors
4s-query wdc -s '-1' -f text 'select ?s where { ?s ?p ?o }'  |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?p ?s }'  |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?s ?s }'  |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?s ?o }'  |egrep -v '^[\?\#]' |wc -l
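Most of the 4s-query lines above end with the same filter-and-count tail. As a sketch, that tail could be factored into a small function; the name `count_rows` is hypothetical, and it uses `grep -E`, the POSIX spelling of `egrep`. Only the counting step changes; the 4s-query invocation stays as above.

```shell
#!/bin/sh
# Count result rows on stdin, dropping header/status lines that begin
# with "?" or "#" (the same lines the egrep -v filter above removes).
count_rows() {
  grep -Ev '^[?#]' | wc -l | tr -d ' '
}

# Usage, with a query string of your own:
#   4s-query wdc -s '-1' -f text "$QUERY" | count_rows
```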

Faded sidebar

Sunday, January 22nd, 2012

I’ve occasionally mucked with my blog’s theme with a general aim of removing superfluous crap that makes reading posts harder. A couple weeks ago something Parker Higgins wrote inspired me to try a little harder:

How awesome is @niemanlab’s “faded” sidebar? And zen mode? That’s a site that cares about its readers.

You can view an archived version of the Nieman Lab site should their design change. The top of the sidebar isn’t faded, but overall I think the fading there makes it a little easier to concentrate on a post without switching to “zen” mode (removing all navigation).

For my theme, I made the sidebar always faded except when hovered over, and float to the right as far away from the main content area as possible. All “header” content is in the sidebar so that there’s nothing preceding a post’s title.

I intended to remove anything hardcoded* for my blog, anything I don’t understand or don’t use, and anything that doesn’t validate, but I didn’t get very far on any of those. I doubt this will be useful to anyone, but patches welcome.

*Yes, it is also a little ironic that I’ve never bothered to publish the modified source used to run this blog until now.

SOPA/PIPA protests on-message or artless?

Wednesday, January 18th, 2012

Go Internet! Instantly message the U.S. Congress! (Tell them to kill the so-called Research Works Act too!)

Another, much bigger, tiresome rearguard action. I’m impressed by protesters’ nearly universal and exclusive focus on encouraging readers to contact U.S. Congresspeople. I hope it works. SOPA and PIPA really, really deserve to die.

But the protest also bums me out.

1) Self-censorship (in the case of sites completely blacked out, as opposed to those prominently displaying anti-SOPA messages) is not the Internet at its best. If that claim weren’t totally ridiculous, the net wouldn’t be worth defending. It isn’t even the net at its political best, which would be creating systems that disrupt and obviate power: long-term offensives, not short-term defenses.

2) Near-exclusive focus on supplication before 535 [Update: 536] ultra-powerful individuals is kinda disgusting. But it needs to be done, as effectively as possible.

3) I haven’t looked at a huge number of sites, but I haven’t seen much creativity in the protest. Next time it would be fun to see an appropriate site (Wikipedia? Internet Archive?) take what Flickr has done and add bidding for the “right” to darken particular articles or media as a fundraiser. Art would be nice too — I’d love to hear about anything really great (and preferably libre) from this round.

4) While some prominent bloggers have made the point that “piracy” is not a legitimate problem, overwhelmingly the protest has stuck to defense — SOPA and PIPA would do bad things to the net, and wouldn’t “work” anyway. Google goes much further, saying “End Piracy, Not Liberty” and “Fighting online piracy is important.” Not possible, wrong, and gives away the farm.

5) Nobody is making the point that everyone can help with long-term offensives that will ultimately stop ratcheting protectionism, if it is to be stopped. Well, this nobody has attempted:

[I]magine a world in which most software and culture are free as in freedom. Software, culture, and innovation would be abundant, there would be plenty of money in it (just not based on threat of censorship), and there would be no constituency for attacking the Internet. (Well, apart from dictatorships and militarized law enforcement of supposed democracies; that’s a fight intertwined with SOPA, but those aren’t the primary constituencies for the bill.) Now, world domination/liberation by free software and culture isn’t feasible now. But every little bit helps reduce the constituency that wishes to attack the Internet to possibly protect their censorship-based revenue streams, and to increase the constituency whose desire to protect the Internet is perfectly aligned with their business interests and personal expression.

I’d hope that at least some messages tested convey not only the threat SOPA poses to Wikimedia, but the long-term threat the Wikimedia movement poses to censorship.


Bad legislation needs to be stopped now, but over the long term, we won’t stop getting new bad legislation until policymakers see broad support and amazing results from culture and other forms of knowledge that work with the Internet, rather than against it. Each work or project released under a CC license signals such support, and is an input for such results.


Finally, remember that CC is crucial to keeping the Internet non-broken in the long term. The more free culture is, the less culture has an allergy to and deathwish for the Internet.

Of the five items I list above, the first three are admittedly peevish. Four and five represent not so much problems with the current protest as they do severe deficiencies in movements for intellectual freedom. Actually they are flipsides of the same deficiency: lack of compelling explanation that intellectual freedom, however constructed and protected, really matters, really works, and is really for the good. If such were well enough researched and explained so as to become conventional wisdom, rather than contentious and seemingly radical, net freedom activists could act much more proactively, provocatively, and powerfully, rather than as they do today: with supplication and genuflection.

I am not at all well read, but my weak understanding is that the withdrawal of economists from studying intellectual protectionism in the late 1800s was a great tragedy. To begin to encourage rectification of that century-plus of relative neglect, today is a good day to start reading Against Intellectual Monopoly.

In the meantime, the actual and optimal counterfactual drift further apart, without any help from SOPA and PIPA.

MLK’s reliance on “remix” is well-documented; without a strong public domain, where will that leave the next MLK?

Monday, January 16th, 2012

I copied and slightly reworded the title of this post from Joshua Judson Rosen; the body draws heavily from a conversation started by Rosen. Today is .

People have noted for years that the King estate does its best to lock up and profit from his works. I even had a post that touched on this indirectly in 2004 (it appears that since then Eyes on the Prize has been re-aired and DVDs sold, the result of an $850,000 grant to acquire the necessary licenses). But the King estate is simply doing what most heirs would do with an uninsured creative legacy. If societal governance of the knowledge commons were anything close to reasonable, all of King’s works would now be in the public domain.

Perhaps ironically (but only if one cannot distinguish between King and his estate, and between citation and copyright restrictions), in his academic writing King was a very poor provider of intellectual provenance — in that context, he plagiarized:

I might conclude that none of this was fatal for King’s career as a preacher and powerful public speaker. Had he pursued an academic career, his heavy reliance on the authorities, often without citing them, could have been fatal. But in preaching, perhaps even in most public speech, genuine originality is more often fatal. A congregation, even a public audience, expects to hear and responds to the word once delivered to the fathers [and mothers]. It is the familiar that resonates with us. The original sounds alien and tends to alienate. The familiar, especially the familiar that appeals to the best in us, is what we long to hear. So, “I Have A Dream” was no new vision; it was a recension, quite literally, of his own “An American Dream.” And that dream, as we know, already had a long history. King’s vision was, perhaps, more inclusive than earlier dreams, but it appealed to us because we already believed it.

Indeed, far more interesting is the ubiquity of borrowing in King’s profession. On preachers borrowing liberally from each other and any other available source, listen to this week’s installment of WNYC’s On the Media, Dr. Martin Luther King Jr. and the Public Imagination (about 15 minutes).

I did not know this about sermons, but upon hearing it, it is completely unsurprising. But now I have questions:

  • Do preachers now continue to borrow as heavily and as liberally as they did in King’s day and before? What about public speakers generally?
  • Should preaching be added to magic, fashion, food, and comedy as examples of professions relying heavily on borrowing, and not so much on censorship?
  • The development of King’s speeches, and of preachers’ sermons* generally, highlights that in some contexts borrowing without citation is valuable, never mind that it would be called plagiarism in other contexts. Should schools teach how to be a great artist in some classes? Doing so might help their anti-plagiarism rhetoric sink in better, as it would then appear contextually appropriate, rather than fanatic.

* Daniel Dennett approvingly says that TED talks are secular sermons, pinpointing another reason I find them annoying (for being sermons, not for being secular). But I don’t want to censor any sermons.

Life in the kind of bleak future of HTML data

Thursday, January 12th, 2012

Evan Prodromou wrote in 2006:

I think that if and the RDFa effort continue moving forward without coordinating their effort, the future looks kind of bleak.

I blogged about this at the time (and forgot and reblogged five months later). I recalled this upon reading a draft HTML Data Guide announced today, and trying to think of a tl;dr summary to at least microblog.

That’s difficult. The guide is intended to help publishers and consumers of HTML data choose among three syntaxes (all mostly focused on annotating data inline with HTML meant for display) and a variety of vocabularies, with heavy dependencies between the two. Since 2006, people working on microformats and RDFa have done much to address the faults of those specifications — microformats-2 allows for generic (rather than per-format) parsing, and RDFa 1.1 made some changes to make namespaces less needed, less ugly when needed, and usable in HTML5, and specifies a lite subset. In 2009 a third syntax/model, microdata, was launched, and then in 2011 chosen as the syntax for (which subsequently announced it would also support RDFa 1.1 Lite).

I find the added existence of microdata and suboptimal (optimal might be something like the microformats process for some super widely useful vocabularies, with a relatively simple syntax but permitting generic parsing and distributed extensibility; very much like what Prodromou wanted in 2006), but when is anything optimal? I also wonder how much credit microdata ought to get for microformats-2 and RDFa 1.1, for providing competitive pressure, and for invigorating metadata-enhanced web-scale search and vocabulary design (for example, the last related thing I was involved in, at the beginning anyway).

Hope springs eternal for getting these different but overlapping technologies and communities to play well together. I haven’t followed closely in a long time, but I gather that Jeni Tennison is one of the main people working on that, and you should really subscribe to her blog if you care. That leaves us back at the HTML Data Guide, of which Tennison is the editor.

My not-really-a-summary:

  1. Delay making any decisions about HTML data; you probably don’t want it anyway (metadata is usually a cost center), and things will probably be more clear when you’re forced to check back due to…
  2. If someone wants data from you as annotated HTML, or you need data from someone, and this makes business sense, do whatever the other party has already decided on, or better yet implemented (assuming their decision isn’t nonsensical; but if so why are you doing business with them?)
  3. Use a validator to test your data in whatever format. An earlier wiki version of some of the guide materials includes links to validators. In my book, Any23 is cute.

(Yes, CC REL needs updating to reflect some of these developments, RDFa 1.1 at the least. Some license vocabulary work done by SPDX should also be looked at.)

Penumbra of Provenance

Thursday, January 12th, 2012


Yesterday the W3C’s Provenance Working Group posted a call for feedback on a family of documents members of that group have been working on. Provenance is an important issue for the info commons, as I’ve sketched elsewhere. I hope some people quickly flesh out examples of application of the draft ontology to practical use cases.

Intellectual Provenance

Apart from some degree of necessity for the current functioning of some info commons (obviously where some certainty about freedoms from copyright restriction is needed, but conceivably even more so to outgrow copyright industries), provenance can also play an important symbolic role. Unlike “intellectual property”, intellectual provenance is of keen interest to both readers and writers. Furthermore, copyright and other restrictions make provenance harder, in both practical ways (barriers to curation) and attitudinal ones: the primacy of “rights” (as in rents, and grab all that your power allows) deprecates the actual intellectual provenance of things.

Postmodern Provenance

The umbra of provenance seems infinite. As we preserve scratches of information (or not), incomparably vast amounts disappear. But why should we only care for what we can record that led to current configurations? Consider independent invention and convergent evolution. Who cares what configurations and events led to current configurations: what are the recorded configurations that could have led to the current configuration, what are all of the configurations that could have led to the current configuration; what configurations are most similar (including history, or not) to a configuration in question?


In order to highlight the exposure of provenance information on the internet and provide added impetus for organizations to expose in a way that can efficiently be found and accessed, I am exploring the possibility of a .prov TLD.