Archive for the ‘Wikipedia’ Category

Sex on Wikipedia

Monday, August 28th, 2006

I was surprised for a moment to see that sex and porn related articles are among the most popular on Wikipedia. From a list of the 100 most viewed articles on Wikipedia this month:

Views per day Percent Rank and Title
18500 ± 134% 0.0724% 7.
18000 ± 136% 0.0705% 8.
17000 ± 140% 0.0666% 10.
15500 ± 147% 0.0607% 13.
14500 ± 152% 0.0568% 15.
14500 ± 152% 0.0568% 16.
13000 ± 160% 0.0509% 19.
12000 ± 167% 0.0470% 24.
11500 ± 171% 0.0450% 34.
10500 ± 178% 0.0411% 38.
9000 ± 193% 0.0352% 56.
9000 ± 193% 0.0352% 63.
8500 ± 198% 0.0333% 70.
8000 ± 204% 0.0313% 78.

Of course I shouldn’t have been surprised. Wikipedia content should more or less mirror that of the Internet, media in general, and human thoughts and conversation: sex is big, but not dominant.

I haven’t looked at a normal encyclopedia in ages, but I suspect sex would be seriously underrepresented.

Now I want to know whether Arabic Wikipedia has articles on sex (there is currently no interlanguage link to the Arabic Wikipedia from the English Sex article) and, if so, whether they are relatively even more popular than their English counterparts. If not, I smell opportunity for Arabic-literate Wikipedians.

Wordcamp and wiki mania

Monday, August 7th, 2006

In lieu of attending maybe the hottest conference ever I did a bit of wiki twiddling this weekend. I submitted a tiny patch (well that was almost two weeks ago — time flies), upgraded a private MediaWiki installation from 1.2.4 to 1.6.8 and a public installation from 1.5.6 to 1.6.8 and worked on a small private extension, adding to some documentation before running into a problem.

1.2.4->1.6.8 was tedious (basically four successive major version upgrades) but trouble-free, as that installation has almost no customization. The 1.5.6->1.6.8 upgrade, although only a single upgrade, took a little fiddling to make a custom skin and permissions account for small changes in MediaWiki code (example). I’m not complaining: clean upgrades are hard and the MediaWiki developers have done a great job of making them relatively painless.

Saturday I attended part of WordCamp, a one day unconference for WordPress users. Up until the day before the tentative schedule looked pretty interesting but it seems lots of lusers signed up so the final schedule didn’t have much meat for developers. Matt Mullenweg’s “State of the Word” and Q&A touched on clean upgrades of highly customized sites from several angles. Some ideas include better, and better documented, plugin and skin APIs with more metadata and less coupling (e.g., widgets should help many common cases that previously required throwing junk in templates).

Beyond the purely practical, ease of customization and upgrade is important for openness.

Now listening to the Wikimania Wikipedia and the Semantic Web panel…

Freedom Lunches

Monday, June 19th, 2006

Another excellent post from Tim Lee (two of many, just subscribe to TLF):

The oft-repeated (especially by libertarians) view that there’s no such thing as a free lunch is actually nonsense. Civilization abounds in free lunches. Social cooperation produces immense surpluses that have allowed us to become as wealthy as we are. Craigslist is just an extreme example of this phenomenon, because it allows social cooperation on a much greater scale at radically reduced cost. Craigslist creates an enormous amount of surplus value (that is, the benefits to users vastly exceed the infrastructure costs of providing the service). For whatever reason, Craigslist itself has chosen to appropriate only a small portion of that value, leaving the vast majority to its users.

As a political slogan I think of it as applying only to transfers, though perhaps others apply it overbroadly. Regardless, the free lunches of which Lee writes are vastly underappreciated.

The strategy has another advantage too: charging people money for things is expensive. A significant fraction of the cost of a classified ad is the labor required to sell the ads. Even if you could automate that process, it’s still relatively expensive to process a credit card transaction. The same is true of ads. Which means that not only is Craigslist letting its users keep more of the surplus, but its surplus is actually bigger, too!

Charging money also enables taxation and encourages regulation. Replacement of financial transaction mediated production with peer production is a libertarian (of any stripe — substitute exploitation for taxation and regulation if desired) dream come true.

Put another way, that which does not require money is hard to control. I see advocacy of free software, free culture and similar as flowing directly from my desire for free speech and freedom and individual autonomy in general.

In the long run, then, I think sites that pursue a Craigslist-like strategy will come to dominate their categories, because they simply undercut their competition. That sucks if you’re the competitor, but it’s great for the rest of us!

Amen, though Craigslist, Wikipedia and similar do far more than merely undercut their competition.

Wikiforms

Thursday, May 11th, 2006

Brad Templeton writes about overly structured forms, one of my top UI peeves. The inability to copy and paste an IP address into a form with four separate fields has annoyed me, oh, probably hundreds of times. Date widgets annoy me slightly less. Listen to Brad when designing your next form, on the web or off.

The opposite of overly structured forms would be a freeform editing widget populated with unconstrained fields blank or filled with example data, or even a completely empty editing widget with suggested structure documented next to the widget — a wiki editing form. This isn’t as strange as it seems — many forms are distributed as word processor or plain text documents that recipients are expected to fill in by editing directly and return.

I don’t think “wikiforms” are appropriate for many cases where structured forms are used, but it’s useful to think of opposites and I imagine their (and hybrids — think a “rich” wiki editor with autocompletion — I haven’t really, but I imagine this is deja vu for anyone who has used mainframe-style data entry applications) niche could increase.

Ironically, the current number one use of the term wiki forms denotes adding structured forms to wikis!

On a marginally related note the Semantic MediaWiki appears to be making good progress.

What’s your Freedom/China Ratio?

Thursday, January 26th, 2006

Fred Stutzman points out that for the query site:ibiblio.org google.com estimates 7,640,000 hits while google.cn estimates 1,610,000, perhaps explained in part by support of freedom in Tibet.

That’s an impressive ratio of 4.75 pages findable in the relatively free world to 1 page findable in China; call it a domain FCR of 4.75.

The domain FCR of a few sites I’m involved with:

bitzi.com: 635,000/210,000 = 3.02
creativecommons.org: 213,000/112,000 = 1.90
gondwanaland.com: 514/540 = 0.95

Five other sites of interest:

archive.org: 5,900,000/427,000 = 13.82
blogspot.com: 24,300,000/15,400,000 = 1.58
ibiblio.org: 5,260,000/1,270,000 = 4.14
typepad.com: 13,100,000/2,850,000 = 4.60
wikipedia.org: 156,000,000/17,000,000 = 9.18
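The domain FCRs listed above are plain divisions, so they are easy to recompute and rank. What follows is a minimal sketch, not anything used to produce the original figures: the hit counts are copied from the two lists, and the names `HITS`, `fcr` and `ranked` are made up for illustration.

```python
# Hit-count estimates copied from the lists above, as
# (google.com estimate, google.cn estimate) for a site: query.
# HITS, fcr and ranked are illustrative names, not from the post.
HITS = {
    "bitzi.com": (635_000, 210_000),
    "creativecommons.org": (213_000, 112_000),
    "gondwanaland.com": (514, 540),
    "archive.org": (5_900_000, 427_000),
    "blogspot.com": (24_300_000, 15_400_000),
    "ibiblio.org": (5_260_000, 1_270_000),
    "typepad.com": (13_100_000, 2_850_000),
    "wikipedia.org": (156_000_000, 17_000_000),
}


def fcr(free_hits: int, cn_hits: int) -> float:
    """Pages findable via google.com per page findable via google.cn."""
    return free_hits / cn_hits


# Rank domains from highest to lowest FCR.
ranked = sorted(HITS, key=lambda domain: fcr(*HITS[domain]), reverse=True)

for domain in ranked:
    print(f"{domain}: {fcr(*HITS[domain]):.2f}")
```

Since the inputs are search engine estimates, the second decimal place should not be taken seriously; the ranking is the more interesting output.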

If you are cool your FCR will be very high. The third site above is my personal domain. I am obviously very uncool and so loved by the that they have twisted Google’s arm to make more of my blog posts available in China than are available elsewhere.

The Internet Archive is obviously the coolest site by far amongst those surveyed above, followed by Wikipedia. Very curious that the PRC apparently blocks a far higher percentage of pages at the blog service TypePad than of those at the Google property Blogspot.

It must be noted that the hit counts any web-scale search engine claims are only estimates, and these can vary considerably. Presumably Stutzman and I were hitting different Google servers, or perhaps his preferences are set slightly differently (I do have “safe search” off and accept results in any language, the obvious variables). However, the FCRs from our results for site:ibiblio.org roughly agree.

Here’s a feeble attempt to draw the ire of PRC censors and increase my FCR:

Bryan Caplan’s Museum of Communism
Human Rights in China
Tiananmen Square Massacre
Government of Tibet in Exile
Tibet Online
民主進步黨 (Taiwan )

Note that I don’t really care about which jurisdiction or jurisdictions , , the or elsewhere fall under. would be preferable to the current arrangement, if the former led to more freedom, which it plausibly could. I post some independence-oriented links simply because I know that questions of territorial control matter deeply to states and my goal here is to increase my FCR.

You should attempt to increase your FCR, too. No doubt you can find better links than I did. If enough people try, the Google.cn index will become less interesting, though by one global method of guesstimation, it is already seriously lacking. Add claimed hits for queries for html and -html to get a total index size.

google.com: 4,290,000,000 + 6,010,000,000 = 10,300,000,000
google.cn: 2,370,000,000 + 3,540,000,000 = 5,910,000,000

So the global FCR is 10,300,000,000/5,910,000,000 = 1.74
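The same arithmetic can be written out for the global estimate. This is only a sketch of the guesstimation method described above: it assumes that every indexed page matches exactly one of the queries html or -html, so the two claimed hit counts sum to the index size. The names `INDEX_ESTIMATES`, `index_size` and `global_fcr` are hypothetical.

```python
# Claimed hits for the queries "html" and "-html", copied from the post.
# Assumption: each indexed page matches exactly one of the two queries,
# so their sum approximates the total index size.
INDEX_ESTIMATES = {
    "google.com": (4_290_000_000, 6_010_000_000),
    "google.cn": (2_370_000_000, 3_540_000_000),
}


def index_size(with_term: int, without_term: int) -> int:
    """Estimated index size: pages matching a term plus pages not matching it."""
    return with_term + without_term


sizes = {site: index_size(*counts) for site, counts in INDEX_ESTIMATES.items()}
global_fcr = sizes["google.com"] / sizes["google.cn"]

print(sizes)
print(f"global FCR: {global_fcr:.2f}")
```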

Although my domain FCR is lame, my name FCR is not bad (query for linksvayer) — 98,200/21,500 = 4.57.

Give me or give me the death of censorship!

(I eagerly await evidence that my methodology and assumptions are completely wrong.)

Going overboard with Wikipedia tags

Thursday, January 12th, 2006

A frequent correspondent recently complained that my linking to articles about organizations rather than the home pages of organizations is detrimental to the usability of this site, probably spurred by my linking to a stub article about Webjay.

I do so for roughly two reasons. First, I consider a Wikipedia link more usable than a link to an organization home page. An organization article will link directly to an organization home page, if the latter exists. The reverse is almost never true (though doing so is a great idea). An organization article at Wikipedia is more likely to be objective, succinct, and informational than an organizational home page (not to mention there is no chance of encountering , window resizing, or other annoying distractions — less charitably, attempts to control my browser — at Wikipedia). When I hear about something new these days, I nearly always check for a Wikipedia article before looking for an actual website. Finally, I have more confidence that the content of a Wikipedia article will be relevant to the content of my post many years from now.

Webjay (link to webjay.org) is actually a good example of these usability issues. Perhaps I have an unusually strong preference for words, but I think its still very brief Wikipedia article should allow one to understand exactly what Webjay is in under a minute.1 If I were visiting the Webjay site for the first time, I’d need to click around awhile to figure the service out, and Webjay’s interface is very to the point, unlike many other sites. Years from now I’d expect webjay.org to be a yet another site, or since the Yahoo! acquisition, to redirect to some Yahoo! property, or the property of whatever entities own Yahoo! in the future. (Smart browser integration with the Internet Archive’s Wayback Machine could mitigate this problem.)

Anyway, I predict that in the foreseeable future your browser will be able to convert a Wikipedia article link into a home page link if that is your preference, aided by Semantic MediaWiki annotations or similar.

The second reason I link to Wikipedia preferentially2 is that Wikipedia article URLs conveniently serve as “, as specified by the . If Technorati and its competitors happen to index this blog this month, it will show up in their tag-based searches, the names of the various Wikipedia articles I’ve linked to serving to name tags. I’ve never been enthusiastic about the overall utility of author applied tags, but I figure linking to Wikipedia is not as bad as linking to a tagreggator.

Also, Wikipedia serves as a tag disambiguator. Some tagging service is going to use Wikipedia data to disambiguate, cluster, merge, and otherwise enhance tags. I think this is pretty low hanging fruit — I’d work on it if I had concentration to spare.

Update: Chris Masse responds (see bottom of page). Approximate answer to his question: 14,000 links to www.tradesports.com, 17 links to en.wikipedia.org/wiki/Tradesports (guess where from). I’ll give Masse convention.

In the same post Masse claims that his own “following of Jakob Nielsen’s guidelines is responsible for the very high intergalactic popularity of my Internet presence.” How very humble of Masse to attribute the modest success of his site to mere guideline following rather than his own content and personality. Unfortunately I think there’s a missing counterfactual.

1 I would think that, having written most of the current Webjay article.

2 Actually my first link preference is for my past posts to this blog. I figure that if someone is bothering to read my ramblings, they may be interested in my past related ramblings — and I can use the memory aid.

Outsourcing charity … to Wikipedia

Friday, December 30th, 2005

Giving and asking for recommendations for worthy charitable donations seems to be popular this time of year, so I’ll do both, following my earlier unsolicited financial advice.

Excepting the very laws of nature (see arch anarchy), aging and its resulting suffering and death is the greatest oppressor of humanity. As far as I know Aubrey de Grey’s Methuselah Mouse Prize/Foundation is the only organization making a direct assault on aging, so I advise giving generously. Fight Aging! is the place to watch for new anti-aging philanthropy.

The most important human-on-human oppression to end, in the U.S. at least, is the drug war (which directly causes oppression in other jurisdictions as well). I’ve only mentioned this in passing here. There’s too much to say. The Drug Reform Coordination Network is saying some of it. The Marijuana Policy Project (MPP) seems to be spearheading state level liberalization initiatives. See MPP’s 2006 plan. I met MPP founder Rob Kampia a year or so ago and was left with a good impression of the organization.

is the current exemplar of the anti-authoritarian age and I love their .

Finally, you could help pay my salary at Creative Commons, more in these letters.

I’d really prefer to give entirely outside the U.S. and other wealthy jurisdictions. However, I’m not interested in any organization that gives direct aid (reactionary, low long term impact), supports education (feel good, low long term impact), exhibits economic neanderthalism, has religious or social conservative ties, or is a shill for U.S. foreign policy in the areas of drugs, terror, or intellectual property. I am looking for organizations that support autonomous liberalization or any of the goals exemplified by the organizations I already support above. Suggestions?

I suppose supporting prizes is one means of donating without respect to jurisdiction. In cases where low cost is important, researchers in cheap areas will tend to win.

I’d also prefer to give via some innovative mechanism. We’ll see what the new year brings.

Wikipedia chief considers taking ads (via Boing Boing) says that at current traffic levels, Wikipedia could generate hundreds of millions of dollars a year by running ads. There are strong objections to running ads from the community, but that is a staggering number for a tiny nonprofit, an annual amount that would be surpassed only by the wealthiest foundations. It could fund a staggering Wikimedia Foundation bureaucracy, or it could fund additional free knowledge projects. Wikipedia founder Jimmy Wales has asked what will be free. Would an annual hundred million dollar budget increase the odds of those predictions? One way to find out before actually trying.

Of course I expect all of my donations to have imperceptible impact, almost as imperceptible as voting. But it’s all about expression. I’ve increased my expressive value by including a donor comment — “in loving memory of Άναξιμένης” — with my Wikipedia donation. I got an expressive boost when my comment was chosen for highlighting.

(Anaximenes was a pupil or contemporary of Anaximander and has a cooler sounding name. As a kid I’d dedicate donations to Alexander the Great, but I now know better.)

The Anti-Authoritarian Age

Saturday, December 24th, 2005

In a compelling post Chris Anderson claims that people are uncomfortable with distributed systems “[b]ecause these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at the macroscale.”

I suspect one could make an even stronger claim closer to people’s actual thoughts, which aren’t about probability: people crave authority, and any system that doesn’t claim authority is suspect.

The most extreme example does not involve the web, blogs, wikipedia, markets, or democracy, all of which Anderson mentions. Science is the extreme example, and its dual, religion.

Science disclaims authority and certain knowledge. Even scientific “laws” are subject to continued investigation, criticism, and revision. Religions claim certain knowledge with no evidence, only assertions of authority, and count billions as believers.

Distributed systems sacrifice claims of perfection for optimization at the macroscale.

What wikipedia really needs is the pope to declare certain articles .

On the subject of response to the ongoing rounds of wikipedia criticism, this otherwise excellent post from Rob Kaye is pretty typical:

The Wikipedians will carry on their work and in another 5 years time it will be better than encyclopedia britannica — its only a matter of time.

For me this time is measured in negative years. I loved paper encyclopedias as a kid (but was always skeptical of their content–very incomplete at best). I haven’t looked at one in years. I use wikipedia every day.

Not having access to a paper encyclopedia means I have more shelf space to work with. Not having access to wikipedia would be a severe annoyance. In another 5 years time it would be a severe disability.

Addendum 20051225: I forgot to mention another example of ready acceptance of bogus authority versus rejection of uncertain discovery: the WMD excuse for invading Iraq versus the horror at an .

Annotating Wikipedia

Saturday, September 3rd, 2005

The Semantic MediaWiki proposal looks really promising.

Anyone who knows how to edit articles should find the syntax simple and usable:

Berlin is the capital of [[is capital of::Federal Republic of Germany|Germany]].

Berlin has about [[Population:=3.390.444|3.4 Mio]] inhabitants.
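To show how little machinery it takes to get data out of annotations like the two above, here is a rough sketch that pulls them out of wikitext with a regular expression. It is not the Semantic MediaWiki parser and makes no claim to match its behavior. It assumes only the two forms quoted in this post, `[[name::value|label]]` and `[[name:=value|label]]`, and the labels "relation" and "attribute" as well as the names `ANNOTATION`, `extract_annotations` and `render` are illustrative choices.

```python
import re

# Matches [[name::value|label]] and [[name:=value|label]]; the label is optional.
# Assumption: names contain no colon and values contain no pipe or brackets.
ANNOTATION = re.compile(
    r"\[\[(?P<name>[^\[\]|:]+?)\s*(?P<sep>::|:=)\s*"
    r"(?P<value>[^\[\]|]+?)(?:\|(?P<label>[^\[\]]*))?\]\]"
)


def extract_annotations(wikitext):
    """Return (kind, name, value, label) tuples for each annotation found."""
    found = []
    for match in ANNOTATION.finditer(wikitext):
        # "relation" and "attribute" are labels chosen for this sketch.
        kind = "relation" if match.group("sep") == "::" else "attribute"
        found.append(
            (kind, match.group("name").strip(),
             match.group("value").strip(), match.group("label"))
        )
    return found


def render(wikitext):
    """Replace each annotation with its display label, or its value if unlabeled."""
    return ANNOTATION.sub(lambda m: m.group("label") or m.group("value"), wikitext)


text = (
    "Berlin is the capital of "
    "[[is capital of::Federal Republic of Germany|Germany]]. "
    "Berlin has about [[Population:=3.390.444|3.4 Mio]] inhabitants."
)

for annotation in extract_annotations(text):
    print(annotation)
print(render(text))
```

Ordinary links such as `[[Germany]]` contain neither `::` nor `:=`, so the pattern leaves them alone.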

All that fantastic data, unlocked. (I’ve been meaning to write a post on why explicit metadata is democratic.) Wikipedia database dump downloads will skyrocket.

There are also interesting proposals under Wikidata (though big forms make me uneasy), but those mostly seem more applicable to new data-centric projects, while the Semantic MediaWiki proposal looks just right for the encyclopedia. Gordon Mohr’s Flexible Fields for MediaWiki proposal could probably serve both roles.

Once people get hooked on access to a semantic encyclopedia, perhaps they’ll want similar access to the entire web.

Via Danny Ayers.

Predict what will be free

Thursday, August 4th, 2005

Jimmy Wales, guest blogging at Lessig’s, has started what promises to be an interesting series of posts on ten things that will be free (as in free software):

[T]his is not a dream list of things which I hope through some magic to become free, but a list of things which I believe are solvable in reality, things that will be free. Anyone whose business model for the next 100 years depends on these things remaining proprietary better watch out: free culture is coming to get you.

For each of the ten, I will try to give some basic (and hopefully not too ambiguous) definitions for what it will mean for each of them to be “solved”, and we can all check back for the next 25 or 50 years to see how we are doing.

In a subsequent post Wales is even more explicit:

[T]he point of naming the list “will be free” rather than “should be free” or “must be free” is that I am making concrete predictions rather than listing a pie in the sky list of things I wish to see.

I’d love to see similar (but shorter term and more thoroughly specified) predictions as claims on a prediction market. With the right set of claims we can more easily talk about, and plan for, which things are more likely to be free, and when.

Thus far Wales has predicted encyclopedias and curricula will be free. I can’t think of any segments that I am fairly certain will be free, are associated with large businesses, and have not already been alluded to in the comments on his first post.

However, regarding widely deployed software (e.g., operating systems, productivity applications) I have a theory explaining why it will be free: Microsoft Windows and Office have a half life–eventually a release of each will be a failure, at which point the only viable alternatives will be free, and any non-free alternatives will face slow death–think commercial Unixes in the face of Linux. I’m not going to stand by this theory–it probably assumes too little change, of any sort.