Post Blogs

What’s your Freedom/China Ratio?

Thursday, January 26th, 2006

Fred Stutzman points out that for the query site:ibiblio.org google.com estimates 7,640,000 hits while google.cn estimates 1,610,000, perhaps explained in part by support of freedom in Tibet.

That’s an impressive ratio of 4.75 pages findable in the relatively free world to 1 page findable in China; call it a domain FCR of 4.75.

The domain FCR of a few sites I’m involved with:

bitzi.com: 635,000/210,000 = 3.02
creativecommons.org: 213,000/112,000 = 1.90
gondwanaland.com: 514/540 = 0.95

Five other sites of interest:

archive.org: 5,900,000/427,000 = 13.82
blogspot.com: 24,300,000/15,400,000 = 1.58
ibiblio.org: 5,260,000/1,270,000 = 4.14
typepad.com: 13,100,000/2,850,000 = 4.60
wikipedia.org: 156,000,000/17,000,000 = 9.18
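
For anyone who wants to recompute these, here is a minimal Python sketch of the arithmetic. The function name fcr and the infinite result for a site with zero google.cn hits are assumptions made for illustration; the hit counts are the estimates listed above, not fresh queries.

```python
def fcr(free_hits, china_hits):
    """Freedom/China Ratio: google.com estimate divided by google.cn estimate."""
    if china_hits == 0:
        # Assumption: a site with nothing findable on google.cn has an infinite FCR.
        return float("inf")
    return free_hits / china_hits


# (google.com estimate, google.cn estimate), copied from the lists above.
estimates = {
    "bitzi.com": (635_000, 210_000),
    "creativecommons.org": (213_000, 112_000),
    "gondwanaland.com": (514, 540),
    "archive.org": (5_900_000, 427_000),
    "blogspot.com": (24_300_000, 15_400_000),
    "ibiblio.org": (5_260_000, 1_270_000),
    "typepad.com": (13_100_000, 2_850_000),
    "wikipedia.org": (156_000_000, 17_000_000),
}

# Domains ordered from highest FCR to lowest.
ranked = sorted(estimates, key=lambda d: fcr(*estimates[d]), reverse=True)

for domain in ranked:
    print(f"{domain}: {fcr(*estimates[domain]):.2f}")
```

Sorting by the ratio rather than by raw hit counts keeps small sites comparable with large ones.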

If you are cool your FCR will be very high. The third site above is my personal domain. I am obviously very uncool and so loved by the that they have twisted Google’s arm to make more of my blog posts available in China than are available elsewhere.

The Internet Archive is obviously the coolest site by far amongst those surveyed above, followed by Wikipedia. Very curious that China apparently blocks a far higher percentage of pages at the blog service TypePad than of those at Google property Blogspot.

It must be noted that the hit counts any web scale search engine claims are only estimates, and these can vary considerably. Presumably Stutzman and I were hitting different Google servers, or perhaps his preferences are set slightly differently (I do have “safe search” off and accept results in any language — the obvious variables). However, the FCRs from our results for site:ibiblio.org roughly agree.

Here’s a feeble attempt to draw the ire of PRC censors and increase my FCR:

Bryan Caplan’s Museum of Communism
Human Rights in China
Tiananmen Square Massacre
Government of Tibet in Exile
Tibet Online
民主進步黨 (Democratic Progressive Party, Taiwan)

Note that I don’t really care about which jurisdiction or jurisdictions , , the or elsewhere fall under. would be preferable to the current arrangement, if the former led to more freedom, which it plausibly could. I post some independence-oriented links simply because I know that questions of territorial control matter deeply to states and my goal here is to increase my FCR.

You should attempt to increase your FCR, too. No doubt you can find better links than I did. If enough people try, the Google.cn index will become less interesting, though by one global method of guesstimation, it is already seriously lacking. Add claimed hits for queries for html and -html to get a total index size.

google.com: 4,290,000,000 + 6,010,000,000 = 10,300,000,000
google.cn: 2,370,000,000 + 3,540,000,000 = 5,910,000,000

So the global FCR is 10,300,000,000/5,910,000,000 = 1.74
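
The same guesstimate can be written as a small Python sketch. It assumes, as the method above does, that every indexed page matches exactly one of the queries html and -html, so the two claimed counts sum to the index size. The helper name index_size is hypothetical.

```python
def index_size(html_hits, not_html_hits):
    """Estimate total index size from two complementary queries (html and -html)."""
    return html_hits + not_html_hits


# Claimed hit counts quoted above for each index.
google_com = index_size(4_290_000_000, 6_010_000_000)
google_cn = index_size(2_370_000_000, 3_540_000_000)

# Global FCR: whole google.com index over whole google.cn index.
global_fcr = google_com / google_cn

print(f"google.com: {google_com:,}")
print(f"google.cn: {google_cn:,}")
print(f"global FCR: {global_fcr:.2f}")
```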

Although my domain FCR is lame, my name FCR is not bad (query for linksvayer) — 98,200/21,500 = 4.57.

Give me ∞ or give me the death of censorship!

(I eagerly await evidence that my methodology and assumptions are completely wrong.)

Alexa Grapher

Wednesday, January 18th, 2006

Many people look at Alexa to see how the Alexa traffic rank of a site of interest is faring (usually not well — there are always more websites, so even maintaining ordinal rank is an uphill battle). People who don’t twiddle URLs out of habit probably don’t realize that Alexa can be asked to graph data back to late 2001 or that graphs may be arbitrarily sized.

I’d been meaning to put together an Alexa graph generating utility for months (well, one more accessible than URL editing, which I’ve always used, e.g., the graphs at the bottom of a post on blog search) and I finally got started last night.

Funny thing then that I read via Brad Neuberg that Joe Walker just published an “Ajax” Alexa grapher, so I guess I’ll just publish my own ultra-crufty Alexa grapher rather than cleaning it up first. I’m not sure what is Ajaxian about Walker’s (haven’t looked), but mine doesn’t qualify, I think — it is just plain old javascript, with the graph updated by setting innerHTML. No Async XML communication with a server.

I was going to write up a bunch of caveats about how Alexa graphs should be interpreted, but in the interest of completing this post, I’ll just point out one oddity I discovered — the url parameter of Alexa’s traffic detail page (click on the graph on my Alexa grapher to get to such a page) must be the last querystring parameter; otherwise every parameter after it gets interpreted as being part of the url parameter. Some kind of odd URL parsing going on there at Alexa. (Never mind that they really want a domain name, not a URL, for that parameter.)
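
A minimal Python sketch of one way to work around that ordering quirk when generating links: build the querystring so that url always comes last. Only the url parameter name comes from the observation above; the other parameter names (range, size) and the helper name build_query are hypothetical placeholders, not Alexa's actual parameters.

```python
from urllib.parse import urlencode


def build_query(params, last_key="url"):
    """Encode params as a querystring, forcing last_key to the final position."""
    # Keep every other parameter in its original order.
    ordered = [(k, v) for k, v in params.items() if k != last_key]
    # Append the parameter that must come last, if it is present at all.
    if last_key in params:
        ordered.append((last_key, params[last_key]))
    return urlencode(ordered)


# Hypothetical parameters; "url" is deliberately given first to show the reordering.
query = build_query({"url": "gondwanaland.com", "range": "5y", "size": "large"})
print(query)
```

Passing a list of pairs to urlencode preserves the order given, which is why the helper builds a list rather than reordering a dict.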

Search 2006

Saturday, January 14th, 2006

I’m not going to make new predictions for search this year — it’s already underway, and my predictions for 2005 mostly did not come true. I predict that most of them will, in the fullness of time:

Metadata-enhanced search. Yahoo! and Google opened Creative Commons windows on their web indices. Interest in semantic markup (e.g., microformats) increased greatly, but search that really takes advantage of this is a future item. (NB I consider the services enabled by more akin to browse than search and as far as I know they don’t allow combining tag and keyword queries.)

Proliferation of niche web scale search engines. Other than a few blog search services, which are very important, I don’t know of anything that could be called “web scale” — and I don’t know if blog search could really be called niche. One place to watch is public search engines using Nutch. Mozdex is attempting to scale up, but I don’t know that they really have a niche, unless “using open source software” is one. Another place is Wikipedia’s list of internet search engines.

On the other hand, weblications (as Web 2.0) did take off.

I said lots of desktop search innovation was a near certainty, but if so, it wasn’t very visible. I predicted slow progress on making multimedia work with the web, and I guess there was very slow progress. If there was forward progress on usable security it was slow indeed. Open source did slog toward world domination (e.g., Firefox is the exciting platform for web development, but barely made a dent in Internet Explorer’s market share) with Apple’s success perhaps being a speed bump. Most things did get cheaper and more efficient, with the visible focus of the semiconductor industry swinging strongly in that direction (they knew about it before 2005).

Last year I riffed on John Battelle’s predictions. He has a new round for 2006, one of which was worth noting at Creative Commons.

Speaking of predictions, of course Google began using prediction markets internally. Yahoo!’s Tech Buzz Game has some markets relevant to search but I don’t know how to interpret the game’s prices.

Going overboard with Wikipedia tags

Thursday, January 12th, 2006

A frequent correspondent recently complained that my linking to articles about organizations rather than the home pages of organizations is detrimental to the usability of this site, probably spurred by my linking to a stub article about Webjay.

I do so for roughly two reasons. First, I consider a Wikipedia link more usable than a link to an organization home page. An organization article will link directly to an organization home page, if the latter exists. The reverse is almost never true (though doing so is a great idea). An organization article at Wikipedia is more likely to be objective, succinct, and informational than an organizational home page (not to mention there is no chance of encountering , window resizing, or other annoying distractions — less charitably, attempts to control my browser — at Wikipedia). When I hear about something new these days, I nearly always check for a Wikipedia article before looking for an actual website. Finally, I have more confidence that the content of a Wikipedia article will be relevant to the content of my post many years from now.

Webjay (link to webjay.org) is actually a good example of these usability issues. Perhaps I have an unusually strong preference for words, but I think its still very brief Wikipedia article should allow one to understand exactly what Webjay is in under a minute.1 If I were visiting the Webjay site for the first time, I’d need to click around awhile to figure the service out — and Webjay’s interface is very to the point, unlike many other sites. Years from now I’d expect webjay.org to be a yet another site — or since the Yahoo! acquisition, to redirect to some Yahoo! property — or the property of whatever entities own Yahoo! in the future. (Smart browser integration with the Internet Archive’s Wayback Machine could mitigate this problem.)

Anyway, I predict that in the foreseeable future your browser will be able to convert a Wikipedia article link into a home page link if that is your preference, aided by Semantic Mediawiki annotations or similar.

The second reason I link to Wikipedia preferentially2 is that Wikipedia article URLs conveniently serve as “, as specified by the . If Technorati and its competitors happen to index this blog this month, it will show up in their tag-based searches, the names of the various Wikipedia articles I’ve linked to serving to name tags. I’ve never been enthusiastic about the overall utility of author applied tags, but I figure linking to Wikipedia is not as bad as linking to a tagreggator.

Also, Wikipedia serves as a tag disambiguator. Some tagging service is going to use Wikipedia data to disambiguate, cluster, merge, and otherwise enhance tags. I think this is pretty low hanging fruit — I’d work on it if I had concentration to spare.

Update: Chris Masse responds (see bottom of page). Approximate answer to his question: 14,000 links to www.tradesports.com, 17 links to en.wikipedia.org/wiki/Tradesports (guess where from). I’ll give Masse convention.

In the same post Masse claims that his own “following of Jakob Nielsen’s guidelines is responsible for the very high intergalactic popularity of my Internet presence.” How very humble of Masse to attribute the modest success of his site to mere guideline following rather than his own content and personality. Unfortunately I think there’s a missing counterfactual.

1 I would think that, having written most of the current Webjay article.

2 Actually my first link preference is for my past posts to this blog. I figure that if someone is bothering to read my ramblings, they may be interested in my past related ramblings — and I can use the memory aid.

Lightnet!

Monday, January 9th, 2006

Congratulations to Lucas Gonze on the Webjay/Yahoo! merger. (Via Kevin Burton.)

Yahoo! made a very wise decision to be acquired by the light side rather than the dark side.

My favorite Gonze post: Totally fucking bored with Napster (more at CC).

Also have a listen to the best track on ccMixter (if you share my taste, probably not), also a Gonze creation.

I could gonze on, but enough of this!

XTech 2006 CFP deadline

Tuesday, January 3rd, 2006

I mentioned elsewhere that I’m on the program committee for XTech 2006, the leading web technology conference in Europe, to be held in Amsterdam May 16-19.

Presentation, tutorial and panel proposals are due in less than a week–January 9. If you’re building an extraordinary Web 2.0 application or doing research that Web 2.0 (very broadly construed) developers and entrepreneurs need to hear about, please consider submitting a proposal.

See the CFP and track descriptions.

Best tech, policy, and idea blogs of 2005

Saturday, December 31st, 2005

Only one of each, according to me, highly subjective:

Technology: Danny Ayers’ Raw provides one stop for very well done semantic web (and nearby) news and analysis, written at a level perfect for me. He also has a knack for posting about obscure (to me) topics I’ve wondered about recently, or will soon, most recently about accounting for whether something is known.

Policy: Ronnie Horesh doesn’t post all that often and his Social Policy Bonds Blog is mostly about one topic. Regardless of what you think of his proposed implementation, Horesh’s mantra, that policies should be subordinated to outcomes, is so simple, obvious, and rarely followed, that it needs to be heard around the world. Here’s to a great new year.

Ideas: Brad Templeton posts (mostly good) Brad Ideas. Many are moderately ambitious, few are crazy. Executives with more ambition than imagination (especially airline executives), please read Templeton’s blog. The most recent Brad Idea, that crash avoidance technology could be financially justified by lower insurance rates, is less concrete than most.

Sorry, no recommendations for celebrity gossip, sex, photo, conspiracy, spam/seo/marketing, or war blogs.

The Anti-Authoritarian Age

Saturday, December 24th, 2005

In a compelling post Chris Anderson claims that people are uncomfortable with distributed systems “[b]ecause these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at the macroscale.”

I suspect one could make an even stronger claim closer to people’s actual thoughts, which aren’t about probability: people crave authority, and any system that doesn’t claim authority is suspect.

The most extreme example does not involve the web, blogs, wikipedia, markets, or democracy, all of which Anderson mentions. Science is the extreme example, and its dual, religion.

Science disclaims authority and certain knowledge. Even scientific “laws” are subject to continued investigation, criticism, and revision. Religions claim certain knowledge with no evidence, only assertions of authority, and count billions as believers.

Distributed systems sacrifice claims of perfection for optimization at the macroscale.

What wikipedia really needs is the pope to declare certain articles .

On the subject of response to the ongoing rounds of wikipedia criticism, this otherwise excellent post from Rob Kaye is pretty typical:

The Wikipedians will carry on their work and in another 5 years time it will be better than encyclopedia britannica — its only a matter of time.

For me this time is measured in negative years. I loved paper encyclopedias as a kid (but was always skeptical of their content–very incomplete at best). I haven’t looked at one in years. I use wikipedia every day.

Not having access to a paper encyclopedia means I have more shelf space to work with. Not having access to wikipedia would be a severe annoyance. In another 5 years time it would be a severe disability.

Addendum 20051225: I forgot to mention another example of ready acceptance of bogus authority versus rejection of uncertain discovery: the WMD excuse for invading Iraq versus the horror at an .

Five Reasons Why Bathroom Tissue Matters

Friday, December 9th, 2005

I’d like to be annoyed by the “Web 2.0” label, but overall I think it loosely denotes a collection of good trends, and that’s slightly useful. The has a good summary. But then there are completely vacuous articles such as Five Reasons Why Web 2.0 Matters (via /.) that simply cry out for parody.

Five Reasons Why Bathroom Tissue Matters

  1. The Focus of Technology Moves To People With Bathroom Tissue.
  2. Bathroom Tissue Represents Best Practices.
  3. Bathroom Tissue Has Excellent Feng Shui.
  4. Quality Is Maximized, Waste Is Minimized.
  5. Bathroom Tissue Has A Ballistic Trajectory.

Certainly there are other reasons why Bathroom Tissue is important and you’re welcome to list them here, but I think this captures the central vision in a way that most anyone who craps can grasp and access.

BTW, I will also use this moment to state that Bathroom Tissue is a terrible name for this new vision of paper-based people-centric product. Except that is for every other name we have at the moment (for example, like “next generation of the arsewipe”). So I will continue to use Bathroom Tissue until something better comes along.

OK, don’t agree? Please straighten me out. Why does bathroom tissue matter (or not) to you?

Toilet paper anyone?

Most Rights Denied

Saturday, November 5th, 2005

Ryan King has created a funny spoof of Creative Commons licenses–the Uncreative Uncommons Humor Link Back Don’t Repeat 0.1beta3 license–compare to the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 license. Can you use hu-lb-dr? Nope:

The UU license is itself availble under the UU license, which means, no. See stipulation #3: “You may not paraphrase, repurpose or in any way retell the content. It is like “telling someone else’s joke” and that’s not cool.”

Ha ha.

Someone ought to create a CC license deed spoof for EULAs and :

See the EFF’s A User’s Guide to EULAs for more ideas.