Post Wikipedia

You against abominable people

Saturday, December 16th, 2006

On Time magazine’s person of the year, Chris F. Masse writes:

TIME is right on target, but their thematic articles are banal and not engaging. Complete crap.

Agreed on both points.

I am happy to see that, in praising dispersed contributors to the net, Time took the opportunity to bash “great men” (emphasis added):

The “Great Man” theory of history is usually attributed to the Scottish philosopher Thomas Carlyle, who wrote that “the history of the world is but the biography of great men.” He believed that it is the few, the powerful and the famous who shape our collective destiny as a species. That theory took a serious beating this year.

To be sure, there are individuals we could blame for the many painful and disturbing things that happened in 2006.

Yes, because it is only possible to be “great” through doing great harm. Time:

But look at 2006 through a different lens and you’ll see another story, one that isn’t about conflict or great men. It’s a story about community and collaboration on a scale never seen before. It’s about the cosmic compendium of knowledge Wikipedia and the million-channel people’s network YouTube and the online metropolis MySpace. It’s about the many wresting power from the few and helping one another for nothing and how that will not only change the world, but also change the way the world changes.

Yes, it is the anti-authoritarian age. Time:

But 2006 gave us some ideas. This is an opportunity to build a new kind of international understanding, not politician to politician, great man to great man, but citizen to citizen, person to person.

Even more of a stretch, but I’ll take the opportunity to link in another of my pet peeves.

The short person-of-the-year article also references, directly or indirectly, Wikipedia, blogs, open source, peer production, and free culture.

I occasionally wonder what it would feel like to read a mass media article and more or less think “right on!” Now that I have encountered such an article, should I enjoy it, reconsider (given the source) what makes me agree, or reconsider my assumption that Time and similar publications are emotionalized diarrhea magazines rather than news magazines, just like TV?

Most important software project

Sunday, December 10th, 2006

I don’t have a whole lot more to say about Semantic MediaWiki than I said over a year ago. The summary is to turn the universal encyclopedia into the universal database while simultaneously improving the quality of the encyclopedia.
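To make the idea concrete, here is a toy Python sketch (my own illustration, not Semantic MediaWiki’s actual annotation syntax or query API) of what “encyclopedia as database” means: facts asserted in articles become machine-readable statements that any page or tool can query directly.

    # Hypothetical triple store: facts an article might assert, stored as
    # (subject, property, value) so they can be queried like a database.
    facts = [
        ("Berlin", "capital of", "Germany"),
        ("Berlin", "population", 3400000),
        ("Paris", "capital of", "France"),
        ("Paris", "population", 2100000),
    ]

    def ask(prop, value):
        """Return every subject asserting the given property/value pair."""
        return [subj for subj, p, v in facts if p == prop and v == value]

    print(ask("capital of", "Germany"))  # ['Berlin']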

Flip through Denny Vrandecic’s recent presentations on Semantic MediaWiki (a smaller pdf version not directly linked in that post). There’s some technical content, but flip past that and you should still get the idea and be very excited.

I predict that Semantic MediaWiki also will be the killer application for the Semantic Web that so many have been skeptical of.

Yaron Koren also says that Semantic MediaWiki is “the technology that will revolutionize the web” and has built DiscourseDB using the software. DiscourseDB catalogs political opinion pieces. See Koren’s post on aggregating analysis using DiscourseDB. Unsurprisingly, this analysis shows the political experts making bad calls.

Koren has also created Betocracy, another play-money prediction market where users create claims. It looks like Betocracy is going for a blog-like interface, but I can’t say more, as attempting to register produces a database error.

One connection between prediction markets and Semantic MediaWiki is that making data more accessible makes prediction markets more feasible: obtaining the data necessary to create and judge prediction market contracts is expensive.

On that note, Swivel also looks interesting. Some have called it data porn. Speaking of porn, see Strange Maps.

Embrace the public domain

Sunday, November 26th, 2006

Peter Saint-André published his promised essay Who’s Afraid of the Public Domain?. It’s fairly short and covers a fair amount of ground. I highly recommend it. Two of my favorite paragraphs:

Yet the public domain is nothing to fear. The works of Homer, Sophocles, Confucius, Plato, Aristotle, Dante, Shakespeare, Galileo, Newton, Bach, Beethoven, and other creative giants are all in the public domain. Their works are revered, not reviled. Sure, the fact that the Fifth Symphony is in the public domain enabled Chuck Berry to write “Roll Over Beethoven”; but far from defiling Beethoven’s good name, Berry’s song indicates the level of respect that we still have for Beethoven’s works. I bet you’d love it for your works to be similarly known and respected two hundred years from now (what creative individual wouldn’t?).

Because of that corporate influence over the copyright laws (at least in America), you face a choice: accept that your works will never pass into the public domain, or willingly place them there. You can place your works into the public domain immediately (as I have done) or specify in your will that your works shall pass into the public domain upon your death. I find it simpler to place my works in the public domain as soon as I publish them, but only you can decide the best course of action for your own works.

I would add that if you don’t make an effort to free your works, they will disappear, and your creative legacy with them.

One item of fear, uncertainty and doubt spread about the public domain (addressing it would have been out of scope for Saint-André’s essay) is that it may not be legally possible to affirmatively place a work into the public domain (see Wikipedia:Granting work into the public domain for some discussion), especially outside the U.S. jurisdiction.

I believe wikipedians attempt to work around this with statements like the one currently in Template:Userpd (emphasis added):

I, the author, hereby agree to waive all claim of copyright (economic and moral) in all content contributed by me, the user, and immediately place any and all contributions by me into the public domain; I grant anyone the right to use my work for any purpose, without any conditions, to be changed or destroyed in any manner whatsoever without any attribution or notice to the creator.

Or one of many specialized “public domain or release all rights legally possible” templates like this one:

This image really is in the Public domain as its author has released it into the public domain. If this is not possible, the author grants anyone the right to use this work for any purpose, without any conditions, unless such conditions are required by law.

I have no idea what a court would make of these, but presumably someone has or will inform the Wikipedia community if they are bogus.

If you aren’t ready to fully embrace the public domain, Creative Commons offers several gradations of partial measures (as well as a form to help you dedicate work to the public domain).

Check out all of Saint-André’s posts about the public domain and digg his essay.

Defeatist dreaming

Sunday, October 22nd, 2006

Jimmy Wales of Wikipedia says to dream a little:

Imagine there existed a budget of $100 million to purchase copyrights to be made available under a free license. What would you like to see purchased and released under a free license?

I was recently asked this question by someone who is potentially in a position to make this happen, and he wanted to know what we need, what we dream of, that we can’t accomplish on our own, or that we would expect to take a long time to accomplish on our own.

One shouldn’t look a gift horse in the mouth, and this could do a great deal of good, particularly if the conditions “can’t accomplish on our own…” are stringently adhered to.

However, this is a blog and I’m going to complain.

Don’t fork over money to the copyright industry! This is defeatist and exhibits static world thinking.

$100 million could fund a huge amount of new free content, free software, free infrastructure and supporting institutions, begetting more of the same.

But if I were a donor with $100 million to give I’d try really hard to quantify my goals and predict the most impactful spending toward those goals. I’ll just repeat a paragraph from last December 30, Outsourcing charity … to Wikipedia:

Wikipedia chief considers taking ads (via Boing Boing) says that at current traffic levels, Wikipedia could generate hundreds of millions of dollars a year by running ads. There are strong objections to running ads from the community, but that is a staggering number for a tiny nonprofit, an annual amount that would be surpassed only by the wealthiest foundations. It could fund a staggering Wikimedia Foundation bureaucracy, or it could fund additional free knowledge projects. Wikipedia founder Jimmy Wales has asked what will be free. Would an annual hundred million dollar budget increase the odds of those predictions? One way to find out before actually trying.

Via Boing Boing via /.

Friends don’t let friends click spam

Thursday, September 7th, 2006

Doc Searls unfortunately decided the other day that offering his blog under a relatively restrictive Creative Commons NonCommercial license instead of placing its contents in the public domain is chemo for splogs (spam blogs). I doubt that, strongly. Spam bloggers don’t care about copyright. They’ll take “all rights reserved” material, that which only limits commercial use, and stuff in the public domain equally. Often they combine tiny snippets from many sources, probably triggering copyright for none of them.

A couple of examples found while looking at people who had mentioned Searls’ post: all-rights-reserved material splogged; a commenter says “My blog has been licensed with the CC BY-NC-SA 2.5 for a while now, and sploggers repost my content all the time.” A couple of anecdotes prove nothing, but I’d be surprised to find that sploggers are, for example, using CC-enabled search to find content they can legally re-splog. I hope someone tries to figure out what characteristics make blog content more likely to be used in splogs and whether licensing is one of them. I’d get some satisfaction from either answer.

Though Searls’ license change was motivated by a desire “to come up with new forms of treatment. Ones that don’t just come from Google and Yahoo. Ones that come from us”, I do think blog spam is primarily the search engines’ problem to solve. Search results that don’t contain splogs are more valuable to searchers than spam-ridden results. Sites that cannot be found through search effectively don’t exist. That’s almost all there is to it.

Google in particular may have mixed incentives (they want people to click on their syndicated ads wherever the ads appear), but others don’t (Technorati, Microsoft, Ask, etc. — Yahoo! wishes it had Google’s mixed incentives). In at least one case where spam content seriously impacted the quality of search results, Google seems to have solved the problem: at some point in the last year or so I stopped seeing Wikipedia content reposted with ads (an entirely legal practice) in Google search results.

What can people outside the search engines do to fight blog and other spam? Don’t click on it. It seems crazy, but clickfraud aside, real live idiots clicking on and even buying stuff via spam is what keeps spammers in business. Your uncle is probably buying pills from a spammer right now. Educate him.

On a broader scale, why isn’t the , or the blogger equivalent, running an educational campaign teaching people to avoid spam and malware? Some public figure should throw in “dag gammit, don’t click on spam” along with “don’t do drugs.” Ministers too.

Finally, if spam is so easy for (aware) humans to detect (I certainly have a second sense about it), why isn’t human-augmented computation being leveraged? Opportunities abound…

Sex on Wikipedia

Monday, August 28th, 2006

I was surprised for a moment to see that sex- and porn-related articles are among the most popular on Wikipedia. From a list of the 100 most viewed articles on Wikipedia this month:

Views per day Percent Rank and Title
18500 ± 134% 0.0724% 7.
18000 ± 136% 0.0705% 8.
17000 ± 140% 0.0666% 10.
15500 ± 147% 0.0607% 13.
14500 ± 152% 0.0568% 15.
14500 ± 152% 0.0568% 16.
13000 ± 160% 0.0509% 19.
12000 ± 167% 0.0470% 24.
11500 ± 171% 0.0450% 34.
10500 ± 178% 0.0411% 38.
9000 ± 193% 0.0352% 56.
9000 ± 193% 0.0352% 63.
8500 ± 198% 0.0333% 70.
8000 ± 204% 0.0313% 78.

Of course I shouldn’t have been surprised. Wikipedia content should more or less mirror that of the Internet, media in general, and human thoughts and conversation: sex is big, but not dominant.

I haven’t looked at a normal encyclopedia in ages, but I suspect sex would be seriously underrepresented.

Now I want to know whether Arabic Wikipedia has articles on sex (there is currently no interlanguage link to the Arabic Wikipedia from the English Sex article) and, if so, whether they are relatively even more popular than their English counterparts. If not, I smell opportunity for Arabic-literate Wikipedians.

Wordcamp and wiki mania

Monday, August 7th, 2006

In lieu of attending maybe the hottest conference ever, I did a bit of wiki twiddling this weekend. I submitted a tiny patch (well, that was almost two weeks ago; time flies), upgraded a private MediaWiki installation from 1.2.4 to 1.6.8 and a public installation from 1.5.6 to 1.6.8, and worked on a small private extension, adding to some documentation before running into a problem.

1.2.4->1.6.8 was tedious (basically four successive major version upgrades) but trouble-free, as that installation has almost no customization. The 1.5.6->1.6.8 upgrade, although only a single upgrade, took a little fiddling to make a custom skin and permissions account for small changes in MediaWiki code (example). I’m not complaining — clean upgrades are hard and the MediaWiki developers have done a great job of making them relatively painless.

Saturday I attended part of WordCamp, a one-day unconference for WordPress users. Up until the day before, the tentative schedule looked pretty interesting, but it seems lots of lusers signed up, so the final schedule didn’t have much meat for developers. Matt Mullenweg’s “State of the Word” and Q&A hit on clean upgrade of highly customized sites from several angles. Some ideas include better and better-documented plugin and skin APIs with more metadata and less coupling (e.g., widgets should help many common cases that previously required throwing junk in templates).

Beyond the purely practical, ease of customization and upgrade is important for openness.

Now listening to the Wikimania “Wikipedia and the Semantic Web” panel…

Freedom Lunches

Monday, June 19th, 2006

Another excellent post from Tim Lee (two of many, just subscribe to TLF):

The oft-repeated (especially by libertarians) view that there’s no such thing as a free lunch is actually nonsense. Civilization abounds in free lunches. Social cooperation produces immense surpluses that have allowed us to become as wealthy as we are. Craigslist is just an extreme example of this phenomenon, because it allows social cooperation on a much greater scale at radically reduced cost. Craigslist creates an enormous amount of surplus value (that is, the benefits to users vastly exceed the infrastructure costs of providing the service). For whatever reason, Craigslist itself has chosen to appropriate only a small portion of that value, leaving the vast majority to its users.

As a political slogan, I think of “no such thing as a free lunch” as applying only to transfers, though perhaps others apply it overbroadly. Regardless, the free lunches of which Lee writes are vastly underappreciated.

The strategy has another advantage too: charging people money for things is expensive. A significant fraction of the cost of a classified ad is the labor required to sell the ads. Even if you could automate that process, it’s still relatively expensive to process a credit card transaction. The same is true of ads. Which means that not only is Craigslist letting its users keep more of the surplus, but its surplus is actually bigger, too!

Charging money also enables taxation and encourages regulation. Replacement of financial transaction mediated production with peer production is a libertarian (of any stripe — substitute exploitation for taxation and regulation if desired) dream come true.

Put another way, that which does not require money is hard to control. I see advocacy of free software, free culture and similar as flowing directly from my desire for free speech and freedom and individual autonomy in general.

In the long run, then, I think sites that pursue a Craigslist-like strategy will come to dominate their categories, because they simply undercut their competition. That sucks if you’re the competitor, but it’s great for the rest of us!

Amen, though Craigslist, Wikipedia and similar do far more than merely undercut their competition.

Wikiforms

Thursday, May 11th, 2006

Brad Templeton writes about overly structured forms, one of my top UI peeves. The inability to copy and paste an IP address into a form with four separate fields has annoyed me, oh, probably hundreds of times. Date widgets annoy me slightly less. Listen to Brad when designing your next form, on the web or off.
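A minimal sketch of the alternative, assuming Python 3’s standard ipaddress module: one forgiving text field whose pasted contents are validated in code, rather than four boxes that defeat copy and paste.

    import ipaddress

    def parse_ip(raw):
        """Return a validated IP address parsed from pasted text, or None."""
        try:
            return ipaddress.ip_address(raw.strip())
        except ValueError:
            return None

    print(parse_ip(" 192.168.0.1 "))   # 192.168.0.1
    print(parse_ip("not an address"))  # None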

The opposite of overly structured forms would be a freeform editing widget populated with unconstrained fields, blank or filled with example data, or even a completely empty editing widget with suggested structure documented next to the widget — a wiki editing form. This isn’t as strange as it seems — many forms are distributed as word processor or plain text documents that recipients are expected to fill in by editing directly and return.
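As a sketch of what such a wikiform might look like (the field names here are hypothetical), a single freeform text area can be pre-filled with suggested structure and example data, then parsed leniently after submission.

    import re

    # Suggested structure shown to the user; they may edit it however they like.
    TEMPLATE = "Name: Jane Example\nEmail: jane@example.org\nComments: (write anything here)"

    def parse_wikiform(text):
        """Collect 'Key: value' lines; anything else the user typed is ignored."""
        fields = {}
        for line in text.splitlines():
            match = re.match(r"\s*([^:]+):\s*(.*)", line)
            if match:
                fields[match.group(1).strip()] = match.group(2).strip()
        return fields

    print(parse_wikiform(TEMPLATE))
    # {'Name': 'Jane Example', 'Email': 'jane@example.org', 'Comments': '(write anything here)'}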

I don’t think “wikiforms” are appropriate for many cases where structured forms are used, but it’s useful to think of opposites, and I imagine the niche for wikiforms (and hybrids; think of a “rich” wiki editor with autocompletion, something I haven’t really thought through, but which I imagine is deja vu for anyone who has used mainframe-style data entry applications) could increase.

Ironically, the current number one use of the term “wiki forms” denotes adding structured forms to wikis!

On a marginally related note, Semantic MediaWiki appears to be making good progress.

What’s your Freedom/China Ratio?

Thursday, January 26th, 2006

Fred Stutzman points out that for the query site:ibiblio.org, google.com estimates 7,640,000 hits while google.cn estimates 1,610,000, perhaps explained in part by support of freedom in Tibet.

That’s an impressive ratio of 4.75 pages findable in the relatively free world to 1 page findable in China; call it a domain FCR of 4.75.
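The arithmetic, as a tiny Python sketch using the estimates above (the fcr helper is my own shorthand, nothing Google provides):

    def fcr(hits_free, hits_cn):
        """Freedom/China Ratio: hits claimed by google.com over hits claimed by google.cn."""
        return hits_free / hits_cn

    # site:ibiblio.org estimates from Stutzman's comparison
    print(round(fcr(7640000, 1610000), 2))  # 4.75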

The domain FCR of a few sites I’m involved with:

bitzi.com: 635,000/210,000 = 3.02
creativecommons.org: 213,000/112,000 = 1.90
gondwanaland.com: 514/540 = 0.95

Five other sites of interest:

archive.org: 5,900,000/427,000 = 13.82
blogspot.com: 24,300,000/15,400,000 = 1.58
ibiblio.org: 5,260,000/1,270,000 = 4.14
typepad.com: 13,100,000/2,850,000 = 4.60
wikipedia.org: 156,000,000/17,000,000 = 9.18

If you are cool, your FCR will be very high. The third site above is my personal domain. I am obviously very uncool and so loved by the PRC that they have twisted Google’s arm to make more of my blog posts available in China than are available elsewhere.

The Internet Archive is obviously the coolest site by far amongst those surveyed above, followed by Wikipedia. Very curious that the PRC apparently blocks a far higher percentage of pages at the blog service TypePad than of those at Google property Blogspot.

It must be noted that the hit counts any web-scale search engine claims are only estimates, and these can vary considerably. Presumably Stutzman and I were hitting different Google servers, or perhaps his preferences are set slightly differently (I do have “safe search” off and accept results in any language — the obvious variables). However, the FCRs from our results for site:ibiblio.org roughly agree.

Here’s a feeble attempt to draw the ire of PRC censors and increase my FCR:

Bryan Caplan’s Museum of Communism
Human Rights in China
Tiananmen Square Massacre
Government of Tibet in Exile
Tibet Online
民主進步黨 (Democratic Progressive Party of Taiwan)

Note that I don’t really care about which jurisdiction or jurisdictions Tibet, Taiwan, the mainland or elsewhere fall under. Independence would be preferable to the current arrangement, if the former led to more freedom, which it plausibly could. I post some independence-oriented links simply because I know that questions of territorial control matter deeply to states and my goal here is to increase my FCR.

You should attempt to increase your FCR, too. No doubt you can find better links than I did. If enough people try, the Google.cn index will become less interesting, though by one global method of guesstimation it is already seriously lacking. Add claimed hits for queries for html and -html to get a total index size.

google.com: 4,290,000,000 + 6,010,000,000 = 10,300,000,000
google.cn: 2,370,000,000 + 3,540,000,000 = 5,910,000,000

So the global FCR is 10,300,000,000/5,910,000,000 = 1.74
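The same estimate as a sketch, using the figures above: hit counts claimed for the queries html and -html are summed to approximate each index’s size, and their ratio gives the global FCR.

    com_index = 4290000000 + 6010000000   # google.com: html + -html claimed hits
    cn_index = 2370000000 + 3540000000    # google.cn: html + -html claimed hits

    print(com_index, cn_index)             # 10300000000 5910000000
    print(round(com_index / cn_index, 2))  # 1.74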

Although my domain FCR is lame, my name FCR is not bad (query for linksvayer) — 98,200/21,500 = 4.57.

Give me ∞ or give me the death of censorship!

(I eagerly await evidence that my methodology and assumptions are completely wrong.)