Archive for September, 2006

Friends don’t let friends click spam

Thursday, September 7th, 2006

Doc Searls unfortunately decided the other day that offering his blog under a relatively restrictive Creative Commons NonCommercial license instead of placing its contents in the public domain is chemo for splogs (spam blogs). I doubt that, strongly. Spam bloggers don’t care about copyright. They’ll take “all rights reserved” material, that which only limits commercial use, and stuff in the public domain equally. Often they combine tiny snippets from many sources, probably triggering copyright for none of them.

A couple examples found while looking at people who had mentioned Searls’ post: all rights reserved material splogged, commenter here says “My blog has been licensed with the CC BY-NC-SA 2.5 for a while now, and sploggers repost my content all the time.” A couple anecdotes prove nothing, but I’d be surprised to find that sploggers are, for example, using CC-enabled search to find content they can legally re-splog. I hope someone tries to figure out what characteristics make blog content more likely to be used in splogs and whether licensing is one of them. I’d get some satisfaction from either answer.

Though Searls’ license change was motivated by a desire “to come up with new forms of treatment. Ones that don’t just come from Google and Yahoo. Ones that come from us” I do think blog spam is primarily the search engines’ problem to solve. Search results that don’t contain splogs are more valuable to searchers than spam-ridden results. Sites that cannot be found through search effectively don’t exist. That’s almost all there is to it.

Google in particular may have mixed incentives (they want people to click on their syndicated ads wherever the ads appear), but others don’t (Technorati, Microsoft, Ask, etc.; Yahoo! wishes it had Google’s mixed incentives). In at least one case where spam content seriously degraded the quality of search results, Google seems to have solved the problem: at some point in the last year or so I stopped seeing Wikipedia content reposted with ads (an entirely legal practice) in Google search results.

What can people outside the search engines do to fight blog and other spam? Don’t click on it. It seems crazy, but clickfraud aside, real live idiots clicking on and even buying stuff via spam is what keeps spammers in business. Your uncle is probably buying pills from a spammer right now. Educate him.

On a broader scale, why isn’t the , or the blogger equivalent, running an educational campaign teaching people to avoid spam and malware? Some public figure should throw in “dag gammit, don’t click on spam” along with “don’t do drugs.” Ministers too.

Finally, if spam is so easy for (aware) humans to detect (I certainly have a second sense about it), why isn’t human-augmented computation being leveraged? Opportunities abound…

Google whenever

Sunday, September 3rd, 2006

For years I’ve heard speculation that Google is building a web archive. Now there are domain name purchases to fuel the speculation. The Internet Archive has been providing an invaluable service with the and has set up mirrors in multiple jurisdictions, but recording the web is too important to rely on any single organization, no matter how good or robust. So I hope Google and others are maintaining web archives and will make them available to the public.

Via Tim Finin, who also notes an interesting paper about using article and user history to assign trust levels to Wikipedia article fragments and a Semantic Web archive.

Archives are important for establishing provenance in many situations, though one I’m particularly interested in is citing that a particular work was offered under a Creative Commons license at a particular time. This and other uses (e.g., citation in general, which is often of the form “http://example.com accessed 2005-03-10”, though who knows if a copy of the content as it existed on that date exists) would be enhanced if on-demand archiving were available. The Internet Archive does offer Archive-It.org, but this service is for institutional use and uses periodic crawls rather than immediate archiving of individual pages.

Update, 2 minutes later: I should read a bit more before posting: does exactly what I want. However, I hate that it uses opaque identifiers, and as such is nearly as evil as TinyURL.

Experts agree to bark like dogs

Saturday, September 2nd, 2006

One of the more annoying things political pundits do is to consistently make the case that their candidate or cause is a likely winner, or if too obvious a loser, at least will beat expectations. Surely there is demand for pundits as critical about their favored outcome’s chances as they are about their unfavored outcomes? Perhaps if I watched lots of television I would know of such a chimera.

Fortunately there are again (see Historical Presidential Betting Markets) markets to give anyone who wants one a reality check. However, it is rare (in the U.S.) for a “third party” candidate to be significant enough for an election market to cast any light on their chances. Often “field” will be available (for example, Intrade currently lists the following spreads for 2008 Presidential Election Winner (Political Party): Democrat 49.1/49.2, Republican 47.6/48.4, Field 2.9/3.2), but the chance accorded by traders to “the field” has to be based on the expectation that a viable independent will come out of the woodwork (e.g., Ross Perot in 1992) rather than the expectation that a Green, Libertarian, or other minor party candidate has a non-negligible chance of victory. This is too bad in a way, as my casual observation says that minor party backers are more delusional than most when it comes to their candidate’s chances.

It appears that in the there is a possibility that “the field” may map strongly to a minor party candidate’s chances — Libertarian Party nominee . Democrat is the only major party candidate on the ballot. Republican is running a write-in campaign.

A Smither press release proclaims that “The Experts Agree” that Smither has the best chance of defeating Lampson, and quotes four sources that say something along those lines. These “experts” aren’t putting anything on the line though: the Intrade CD22 market has the following current bid/ask/last values: Democrat 70.0/90.0/76.0, Republican 12.0/19.9/12.0, Field 2.0/9.9/0.1.
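For anyone who wants to turn those quotes into rough odds, here is a minimal sketch. It assumes each contract pays 100 points if the outcome occurs, so a price in points can be read as a percentage chance, and it treats the bid and ask as lower and upper bounds on what traders think. The function names are mine, purely for illustration, and the midpoint is only a crude summary when a spread is as wide as these.

```python
# Bid/ask quotes copied from the post (Intrade CD22 market).
# Assumption: a contract pays 100 points if the outcome occurs,
# so price / 100 is read as an implied probability.
quotes = {
    "Democrat": (70.0, 90.0),
    "Republican": (12.0, 19.9),
    "Field": (2.0, 9.9),
}


def implied_range(bid, ask):
    """Return (low, high) implied probability from a bid/ask pair."""
    return bid / 100.0, ask / 100.0


def midpoint(bid, ask):
    """Return the midpoint of the spread as a single rough probability."""
    return (bid + ask) / 2.0 / 100.0


for outcome, (bid, ask) in quotes.items():
    low, high = implied_range(bid, ask)
    mid = midpoint(bid, ask)
    print(f"{outcome}: {low:.1%} to {high:.1%} (midpoint {mid:.1%})")
```

The midpoints need not sum to 100% across outcomes; with thin markets and wide spreads like these, the bid/ask range is the more honest number.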

Traders seem to think a Smither victory is about as likely as Lampson and Sekula-Gibbs photographed together in bed, with a dog. Maybe that isn’t too unlikely. Put your money where your delusions are!

Regarding expert political judgement, I’m planning to read that book soon.