Archive for January, 2006

Tiananmen photo mashup

Saturday, January 28th, 2006

This cries out for a photo mashup, so here it is:

[image: tiananmen photo mash]

That’s the first photo mashup I’ve ever done, so it’s very simple. I opened one photo in the GIMP, opened the other photo in a second layer, then searched for filters that would allow me to combine them — Layer|Transparency|Color to Alpha accomplished exactly what I wanted.
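
For the curious, here is a rough sketch of what Color to Alpha does, approximated in Python with the Pillow and NumPy libraries (the post used the GIMP itself; the filenames here are hypothetical): pixels close to a chosen color become transparent, so the layer beneath shows through.

    from PIL import Image
    import numpy as np

    def color_to_alpha(img, color=(255, 255, 255)):
        # Rough approximation of GIMP's Layer|Transparency|Color to Alpha:
        # the closer a pixel is to `color`, the more transparent it becomes.
        rgba = np.array(img.convert("RGBA"), dtype=np.float32)
        dist = np.abs(rgba[..., :3] - np.array(color, np.float32)).max(axis=-1)
        rgba[..., 3] = np.clip(dist, 0, 255)
        return Image.fromarray(rgba.astype(np.uint8), "RGBA")

    base = Image.open("photo1.jpg").convert("RGBA")  # hypothetical filenames
    layer = color_to_alpha(Image.open("photo2.jpg"))
    base.paste(layer, (0, 0), layer)                 # composite using alpha
    base.save("mashup.png")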

[image: tiananmen photo mash, JPEG export at zero quality]

I thought this JPEG export at zero quality looked kind of neat.

NB I don’t think Google has done anything wrong with google.cn. The appropriate response is not anger with Google, but action to spread the information the Communist Party of China wants to suppress.

What’s your Freedom/China Ratio?

Thursday, January 26th, 2006

Fred Stutzman points out that for the query site:ibiblio.org, google.com estimates 7,640,000 hits while google.cn estimates 1,610,000, perhaps explained in part by ibiblio’s support of freedom in Tibet.

That’s an impressive ratio of 4.75 pages findable in the relatively free world to 1 page findable in China; call it a domain FCR (Freedom/China Ratio) of 4.75.

The domain FCR of a few sites I’m involved with:

bitzi.com: 635,000/210,000 = 3.02
creativecommons.org: 213,000/112,000 = 1.90
gondwanaland.com: 514/540 = 0.95

Five other sites of interest:

archive.org: 5,900,000/427,000 = 13.82
blogspot.com: 24,300,000/15,400,000 = 1.58
ibiblio.org: 5,260,000/1,270,000 = 4.14
typepad.com: 13,100,000/2,850,000 = 4.60
wikipedia.org: 156,000,000/17,000,000 = 9.18
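
As a worked illustration, a few lines of Python reproduce the FCR arithmetic from the claimed hit counts above (the figures are this post’s estimates, not live query results):

    def fcr(com_hits, cn_hits):
        # Domain Freedom/China Ratio: google.com estimate over google.cn estimate.
        return com_hits / cn_hits

    estimates = {  # site: (google.com hits, google.cn hits)
        "gondwanaland.com": (514, 540),
        "archive.org": (5_900_000, 427_000),
        "wikipedia.org": (156_000_000, 17_000_000),
    }
    for site, (com, cn) in estimates.items():
        print(f"{site}: {fcr(com, cn):.2f}")  # 0.95, 13.82, 9.18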

If you are cool your FCR will be very high. The third site above is my personal domain. I am obviously very uncool, and so loved by the Communist Party of China that they have twisted Google’s arm to make more of my blog posts available in China than are available elsewhere.

archive.org is obviously the coolest site by far amongst those surveyed above, followed by wikipedia.org. Very curious that the PRC apparently blocks a far higher percentage of pages at blog service typepad.com than of those at Google property blogspot.com.

It must be noted that the hit counts any web scale search engine claims are only estimates, and these can vary considerably. Presumably Stutzman and I were hitting different Google servers, or perhaps his preferences are set slightly differently (I do have “safe search” off and accept results in any language — the obvious variables). However, the FCRs from our results for site:ibiblio.org roughly agree.

Here’s a feeble attempt to draw the ire of PRC censors and increase my FCR:

Bryan Caplan’s Museum of Communism
Human Rights in China
Tiananmen Square Massacre
Government of Tibet in Exile
Tibet Online
民主進步黨 (Democratic Progressive Party, Taiwan)

Note that I don’t really care which jurisdiction or jurisdictions Tibet, Taiwan, the mainland, or anywhere else fall under. Independence would be preferable to the current arrangement, if the former led to more freedom, which it plausibly could. I post some independence-oriented links simply because I know that questions of territorial control matter deeply to states, and my goal here is to increase my FCR.

You should attempt to increase your FCR, too. No doubt you can find better links than I did. If enough people try, the Google.cn index will become less interesting, though by one global method of guesstimation it is already seriously lacking: add claimed hits for the queries html and -html to get a total index size.

google.com: 4,290,000,000 + 6,010,000,000 = 10,300,000,000
google.cn: 2,370,000,000 + 3,540,000,000 = 5,910,000,000

So the global FCR is 10,300,000,000/5,910,000,000 = 1.74
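
The same guesstimate in Python, using the figures just quoted:

    def index_size(html_hits, not_html_hits):
        # Claimed hits for "html" plus claimed hits for "-html" cover roughly
        # every page, so their sum estimates total index size.
        return html_hits + not_html_hits

    com = index_size(4_290_000_000, 6_010_000_000)  # 10,300,000,000
    cn = index_size(2_370_000_000, 3_540_000_000)   # 5,910,000,000
    print(f"global FCR: {com / cn:.2f}")            # 1.74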

Although my domain FCR is lame, my name FCR is not bad (query for linksvayer) — 98,200/21,500 = 4.57.

Give me ∞ or give me the death of censorship!

(I eagerly await evidence that my methodology and assumptions are completely wrong.)

Alexa Grapher

Wednesday, January 18th, 2006

Many people look at Alexa to see how the traffic rank of a site of interest is faring (usually not well — there are always more websites, so even maintaining ordinal rank is an uphill battle). People who don’t twiddle URLs out of habit probably don’t realize that Alexa can be asked to graph data back to late 2001, or that graphs may be arbitrarily sized.

I’d been meaning to put together an Alexa graph generating utility for months (well, one more accessible than URL editing, which I’ve always used, e.g., the graphs at the bottom of a post on blog search) and I finally got started last night.

Funny thing, then, that I read via Brad Neuberg that Joe Walker just published an “Ajax” Alexa grapher, so I guess I’ll just publish my own ultra-crufty Alexa grapher rather than cleaning it up first. I’m not sure what is Ajaxian about Walker’s (I haven’t looked), but mine doesn’t qualify, I think — it is just plain old JavaScript, with the graph updated by setting innerHTML. No asynchronous XML communication with a server.

I was going to write up a bunch of caveats about how Alexa graphs should be interpreted, but in the interest of completing this post, I’ll just point out one oddity I discovered — the url parameter of Alexa’s traffic detail page (click on the graph on my Alexa grapher to get to such a page) must be the last query string parameter, otherwise every parameter after it gets interpreted as being part of the url parameter. Some kind of odd URL parsing going on there at Alexa. (Never mind that they really want a domain name, not a URL, for that parameter.)
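
To illustrate the quirk in Python (the url parameter name is from Alexa’s page; the size parameters and host are hypothetical stand-ins, since I haven’t documented Alexa’s actual query string):

    from urllib.parse import urlencode

    # If the server greedily treats everything after "url=" as the url value,
    # any parameter placed after it is swallowed. Putting url last avoids this.
    base = "http://alexa.example.invalid/traffic-detail?"  # placeholder host

    good = base + urlencode([("w", 800), ("h", 300), ("url", "gondwanaland.com")])
    # ...?w=800&h=300&url=gondwanaland.com -> sizes respected

    bad = base + urlencode([("url", "gondwanaland.com"), ("w", 800), ("h", 300)])
    # ...?url=gondwanaland.com&w=800&h=300 -> url parsed as
    # "gondwanaland.com&w=800&h=300", sizes lost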

[Hot]link policy

Sunday, January 15th, 2006

I’m out of the loop. Until very recently (upon reading former Creative Commons intern Will Frank’s writeup of a brief hotlink war) I thought ‘hotlink’ was an anachronistic way to say ‘link’, used back when the mere fact that links led to a new document, perhaps on another server, was exciting. It turns out ‘hotlink’ is now vernacular for inline linking — displaying or playing an image, audio file, video, or other media from another website.

Lucas Gonze, who has lots of experience dealing with hotlink complaints due to running Webjay, has a new post on problems with complaint forms as a solution to hotlinks. One thing missing from the post is a distinction between two completely different sets of complainers who will have different sets of solutions beyond complaining.

One sort of complainer wants a link to a third party site to go away. I suspect the complainer usually really wants the content on the third party site to go away (typically claiming the third party site has no right to distribute the content in question). Removing a link to that content from a link site works as a partial solution by making the third party hosted content more obscure. A solution in this case is to tell the complainer that the link will go away when it no longer works — in effect, the linking site ignores complaints, and it is the responsibility of the complainer to directly pursue the third party site via legal and other threats. This allows the linking site to completely automate the removal of links — those removed as a result of threatened or actual legal action look exactly the same as any other link gone bad and can be tested for and culled using the same methods. Presumably such a hands-off policy pisses off complainers, though rarely to the point where they become more than a minor nuisance, at least on a Webjay-like site — though it must be an option for some.

Creative Commons has guidelines very similar to this policy concerning how to consider license information in files distributed off the web — don’t believe it unless a web page (which can be taken down) has matching license information concerning the file in question.

Another sort of complainer wants a link to content on their own site to go away, generally for one of two reasons. The first reason is that hotlinking uses bandwidth and other resources of the hotlinked site, which the site owner may not be able to afford. The second reason, often coupled with the first, is that the site owner does not want their content to be available outside the context of their own site (i.e., they want viewers to have to come to the source site to view the content).

With a bit of technical savvy, the complainer who wants a link to their own site removed has several options for self-help. Those merely concerned with cost could redirect requests lacking the relevant referrer (from their own site) or cookie (e.g., for a logged-in user) to a free cache or similar, which should drastically reduce originating site bandwidth — if hotlinks are actually generating many requests (if they are not, there is no problem).

A complainer who does not want their content appearing in third party sites can return a small “visit my site if you want to view this content” image, audio file, or video as appropriate in the absence of the desired referrer or cookie. Hotlinking sites become not an annoyance, but free advertising. Many sites take this strategy already.
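
A minimal sketch of that referrer check as a Python WSGI app (the domain and filenames are made-up examples):

    def serve_image(environ, start_response):
        # Requests from our own pages carry our referrer; anything else is
        # treated as a hotlink and gets a "visit my site" placeholder instead.
        referer = environ.get("HTTP_REFERER", "")
        if "example.com" in referer:
            path, ctype = "real-image.jpg", "image/jpeg"
        else:
            path, ctype = "visit-my-site.png", "image/png"
        with open(path, "rb") as f:
            data = f.read()
        start_response("200 OK", [("Content-Type", ctype),
                                  ("Content-Length", str(len(data)))])
        return [data]
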
Presumably many publishers do not have any technical savvy, so some Webjay-like sites find it easier to honor their complaints than to ignore them.

There is a potential for technical means of saying “don’t link to me” that could be easily implemented by publishers and link sites with any technical savvy. One is to interpret robots.txt exclusions to mean “don’t link to me” as well as “don’t crawl and index me.” This has the nice effect that those stupid enough to not want to be linked to also become invisible to search engines.
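
A sketch of that broader interpretation, using Python’s standard robots.txt parser (the user agent string is a made-up example):

    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    def may_link(url, agent="linksite-bot"):
        # Broad interpretation: a robots.txt crawl exclusion is read as
        # "don't link to me" in addition to "don't crawl and index me".
        parts = urlparse(url)
        rp = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
        rp.read()
        return rp.can_fetch(agent, url)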

Another solution is to imitate rel=nofollow — perhaps rel=nolink, though the attribute would need to be available on img, object, and other elements in addition to a, or simply apply rel=nofollow to those additional elements a la the broader interpretation of robots.txt above.

I don’t care for rel=nolink as it might seem to give some legitimacy to brutally bogus link policies (without the benefit of search invisibility), but it is an obvious option.

The upshot of all this is that if a link site operator is not as polite as Lucas Gonze, there are plenty of ways to ignore complainers. I suppose it largely comes down to customer service, where purely technical solutions may not work as well as social solutions. Community sites with forums have similar problems. Apparently Craig Newmark spends much of his time tending to customer service, which I suspect has contributed greatly to making Craigslist such a success. However, a key difference, I suspect, is that hotlink complainers are not “customers” of the linking site, while most people who complain about behavior on Craigslist are “customers” — participants in the Craigslist community.

Credit card numbers from π

Sunday, January 15th, 2006

[image: credit card numbers from pi]

I had to run an errand and was disappointed to find Andi had left the channel. I really wanted to help him in his quest for credit card numbers. They are all to be found in π. If Andi is any good, he could’ve fleeced others searching for credit card numbers with that one.
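
For fun, a sketch of the search itself, using the mpmath library for arbitrary-precision π. It assumes, as the joke does, that π is normal (every finite digit string appears), which is widely believed but unproven:

    from mpmath import mp, nstr

    def find_in_pi(number, digits=100_000):
        # Compute `digits` digits of pi and look for the number's digit string.
        mp.dps = digits + 10
        pi_digits = nstr(mp.pi, digits).replace(".", "")
        return pi_digits.find(str(number))  # -1 if not in the first `digits`

    print(find_in_pi(1234))  # short strings turn up quickly; a 16-digit
                             # card number likely needs far more digits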

Addendum: It’s an old joke. I probably heard it before and forgot.

Search 2006

Saturday, January 14th, 2006

I’m not going to make new predictions for search this year — it’s already underway, and my predictions for 2005 mostly did not come true. I predict that most of them will, in the fullness of time:

Metadata-enhanced search. Yahoo! and Google opened Creative Commons windows on their web indices. Interest in semantic markup (e.g., microformats) increased greatly, but search that really takes advantage of this is a future item. (NB I consider the services enabled by tags more akin to browse than search, and as far as I know they don’t allow combining tag and keyword queries.)

Proliferation of niche web scale search engines. Other than a few blog search services, which are very important, I don’t know of anything that could be called “web scale” — and I don’t know if blog search could really be called niche. One place to watch is public search engines using Nutch. Mozdex is attempting to scale up, but I don’t know that they really have a niche, unless “using open source software” is one. Another place is Wikipedia’s list of internet search engines.

On the other hand, weblications (a.k.a. Web 2.0) did take off.

I said lots of desktop search innovation was a near certainty, but if so, it wasn’t very visible. I predicted slow progress on making multimedia work with the web, and I guess there was very slow progress. If there was forward progress on usable security it was slow indeed. Open source did slog toward world domination (e.g., Firefox is the exciting platform for web development, but barely made a dent in Internet Explorer’s market share) with Apple’s success perhaps being a speed bump. Most things did get cheaper and more efficient, with the visible focus of the semiconductor industry swinging strongly in that direction (they knew about it before 2005).

Last year I riffed on John Battelle’s predictions. He has a new round for 2006, one of which was worth noting at Creative Commons.

Speaking of predictions, of course Google began using prediction markets internally. Yahoo!’s Tech Buzz Game has some markets relevant to search, but I don’t know how to interpret the game’s prices.

Fraud of War in Iraq

Friday, January 13th, 2006

Cost of War in Iraq, a new paper from Linda Bilmes and Joseph Stiglitz, has already been discussed, at least superficially, on a large number of blogs. Comments at Marginal Revolution helpfully cite a number of related papers.

Bilmes and Stiglitz conservatively project the total economic costs for the U.S. jurisdiction at $1 to $2 trillion. Direct budgetary costs are projected to be $750 billion to $1.2 trillion. I have only skimmed the paper, which looks interesting enough, but nothing really new.

I’ve mentioned increasing cost projections several times, last year and before, directly in Trillion dollar fraud (August), $700 billion fraud (July) and A lie halfway fulfilled (January 2005).

I won’t bother to explain the fraud this time, read the past posts. Hint: it involves repeatability.

One thing I’m struck by, skimming comments contesting Bilmes and Stiglitz (the political ones, not the technical ones concerning whether borrowing costs should be included, though they overlap), is that after the fact, I think many people would claim that the invasion was justified, economically and otherwise, regardless of the final cost. $5 trillion? (NB, that is a hypothetical, not a prediction!) It was worth getting rid of Hussein and deterring would-be Husseins. $10 trillion? Just goes to show how nasty “our” opponents are. $100 trillion? Civilization must be destroyed to save civilization!

All the more reason to be cognizant of probable costs before going to war. There’s not really a need for prediction markets here. Just multiply proponents’ estimates by ten. However, people stupidly believe words that come out of politicians’ mouths. Prediction market estimates could, ironically, provide a countervailing authority.

A better way? See Wright, Scheer, Zakaria, Hadar, Tierney, and Pape.

CodeCon 2006 Program

Thursday, January 12th, 2006

The CodeCon 2006 program has been announced and it looks fantastic. I highly recommend attending if you’re near San Francisco February 10-12 and are any sort of computer geek. There’s an unofficial CodeCon wiki.

My impressions of last year’s CodeCon: Friday, Saturday, and Sunday.

Via Wes Felter

Going overboard with Wikipedia tags

Thursday, January 12th, 2006

A frequent correspondent recently complained that my linking to Wikipedia articles about organizations rather than the home pages of those organizations is detrimental to the usability of this site, probably spurred by my linking to a stub article about Webjay.

I do so for roughly two reasons. First, I consider a Wikipedia link more usable than a link to an organization home page. An organization article will link directly to the organization home page, if the latter exists. The reverse is almost never true (though doing so is a great idea). An organization article at Wikipedia is more likely to be objective, succinct, and informational than an organizational home page (not to mention there is no chance of encountering pop-ups, window resizing, or other annoying distractions — less charitably, attempts to control my browser — at Wikipedia). When I hear about something new these days, I nearly always check for a Wikipedia article before looking for an actual website. Finally, I have more confidence that the content of a Wikipedia article will be relevant to the content of my post many years from now.

Webjay (webjay.org) is actually a good example of these usability issues. Perhaps I have an unusually strong preference for words, but I think its still very brief Wikipedia article should allow one to understand exactly what Webjay is in under a minute.1 If I were visiting the Webjay site for the first time, I’d need to click around awhile to figure the service out — and Webjay’s interface is very to the point, unlike many other sites. Years from now I’d expect webjay.org to be yet another dead or repurposed site — or, since the Yahoo! acquisition, to redirect to some Yahoo! property — or the property of whatever entities own Yahoo! in the future. (Smart browser integration with the Internet Archive’s Wayback Machine could mitigate this problem.)

Anyway, I predict that in the foreseeable future your browser will be able to convert a Wikipedia article link into a home page link if that is your preference, aided by Semantic MediaWiki annotations or similar.

The second reason I link to Wikipedia preferentially2 is that Wikipedia article URLs conveniently serve as “tags”, as specified by the rel-tag microformat. If Technorati and its competitors happen to index this blog this month, it will show up in their tag-based searches, the names of the various Wikipedia articles I’ve linked to serving as tag names. I’ve never been enthusiastic about the overall utility of author-applied tags, but I figure linking to Wikipedia is not as bad as linking to a tag aggregator.
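
Per the rel-tag microformat, the tag name is the last path segment of the linked URL, which is why a Wikipedia article link can double as a tag. A minimal Python sketch of how an indexer might extract it:

    from urllib.parse import urlparse, unquote

    def tag_from_link(href):
        # rel-tag: the tag is the final path segment of the linked URL.
        last = urlparse(href).path.rstrip("/").split("/")[-1]
        return unquote(last).replace("_", " ")

    print(tag_from_link("http://en.wikipedia.org/wiki/Webjay"))  # "Webjay"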

Also, Wikipedia serves as a tag disambiguator. Some tagging service is going to use Wikipedia data to disambiguate, cluster, merge, and otherwise enhance tags. I think this is pretty low hanging fruit — I’d work on it if I had concentration to spare.

Update: Chris Masse responds (see bottom of page). Approximate answer to his question: 14,000 links to www.tradesports.com, 17 links to en.wikipedia.org/wiki/Tradesports (guess where from). I’ll grant Masse that convention is on his side.

In the same post Masse claims that his own “following of Jakob Nielsen’s guidelines is responsible for the very high intergalactic popularity of my Internet presence.” How very humble of Masse to attribute the modest success of his site to mere guideline following rather than his own content and personality. Unfortunately I think there’s a missing counterfactual.

1 I would think that, having written most of the current Webjay article.

2 Actually my first link preference is for my past posts to this blog. I figure that if someone is bothering to read my ramblings, they may be interested in my past related ramblings — and I can use the memory aid.

Pro abortion

Tuesday, January 10th, 2006

Why would anyone, especially a self-styled economist say something as silly as the following?

In spite of the slander of pro-lifers, nobody is in favor of abortion. Abortion is horrible. Ask anybody who had one.

Clearly anybody who has had an abortion favored abortion over giving birth, just as anybody who has had a root canal favored enduring the operation over an eventual jaw infection and chronic pain. People don’t bother saying “nobody is in favor of root canals.” Of course few people look forward to a medical procedure, be it abortion, root canal, hernia repair, or something far more unpleasant. An economist of all people should recognize the nullity of claiming nobody favors a choice that many people actually make, given real world constraints.

I favor abortion. Strongly. Kill the parasite! I favor contraception even more strongly, but abortion is a good backup plan.