Google whenever

For years I’ve heard speculation that Google is building a web archive. Now there are domain name purchases to fuel the speculation. The Internet Archive has been providing an invaluable service with the Wayback Machine and has set up mirrors in multiple jurisdictions, but recording the web is too important to rely on any single organization, no matter how good or robust. So I hope Google and others are maintaining web archives and will make them available to the public.

Via Tim Finin, who also points to a Semantic Web archive and an interesting paper about using article and user history to assign trust levels to Wikipedia article fragments.

Archives are important for establishing provenance in many situations. One I’m particularly interested in is documenting that a particular work was offered under a Creative Commons license at a particular time. This and other uses would be enhanced if on-demand archiving were available. Take citation in general: a web citation often takes the form “http://example.com accessed 2005-03-10”, but who knows whether a copy of the content as it existed on that date survives anywhere. The Internet Archive does offer Archive-It.org, but that service is for institutional use and relies on periodic crawls rather than immediate archiving of individual pages.

Update, 2 minutes later: I should read a bit more before posting: does exactly what I want. However, I hate that it uses opaque identifiers, and as such is nearly as evil as TinyURL.
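To make the complaint about opaque identifiers concrete, here is a minimal sketch contrasting two ways of naming an archived copy. The transparent form follows the Internet Archive's Wayback Machine URL pattern (`https://web.archive.org/web/<timestamp>/<original URL>`), so a reader can see which page and which date are being cited. The opaque form is a hypothetical hash-based scheme, not any particular service's actual method; the function names are also made up for illustration.

```python
import hashlib
from datetime import date


def transparent_archive_url(url: str, accessed: date) -> str:
    """Wayback-style identifier: the original URL and the date stay readable."""
    # 14-digit timestamp (YYYYMMDDhhmmss); midnight is assumed since a typical
    # "accessed 2005-03-10" citation carries no time of day.
    timestamp = accessed.strftime("%Y%m%d") + "000000"
    return f"https://web.archive.org/web/{timestamp}/{url}"


def opaque_archive_id(url: str, accessed: date, length: int = 8) -> str:
    """Hypothetical opaque identifier: a truncated hash reveals nothing about
    the page or the date, much like a TinyURL-style short code."""
    digest = hashlib.sha256(f"{accessed.isoformat()} {url}".encode("utf-8"))
    return digest.hexdigest()[:length]


citation_date = date(2005, 3, 10)
print(transparent_archive_url("http://example.com", citation_date))
print(opaque_archive_id("http://example.com", citation_date))
```

The first call prints `https://web.archive.org/web/20050310000000/http://example.com`, which still says what was cited and when even if the archive disappears. The second prints eight hex characters that are meaningless without the issuing service's lookup table.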

2 Responses

  1. […] In the case of non-Wikipedia links (and those too), combatting linkrot and providing alternate and related (e.g., reference, reply, archival) links is an obvious feature add for social bookmarking services and can be made available to a CMS or browser via the usual web API/feed/scraping mechanisms. […]
