Speaking of public benefit spaces on the internet, tonight the Internet Archive is having its annual celebration and announcements event. It’s a top contender for the long-term most important site on the internet. The argument for it might begin with it having many copies at many points in time of many sites, mostly accessible to the public (Google, the NSA and others must have vast dark archives), but would not end there.
I think the Internet Archive is awesome. Brewster Kahle, its founder, is too. It is clear to me that he’s the most daring and innovative founder or leader in the Bay Area/non-profit/open/internet field and adjacent fields. And he calls himself Digital Librarian. Hear, hear!
But, the Internet Archive could be even more awesome. Here’s what I humbly wish they would announce tonight:
- A project to release all of the code that runs their websites and all other processes under free/open source software licenses, and to do their work in public repositories, issue trackers, etc. Such crucial infrastructure ought to be open to public audit and welcoming to public contribution. Obviously much of the code is ancient, crufty, and likely has security issues. That is no reason for embarrassment or obscurity. The code supporting the recording of this era of human communication is itself a dark archive. Danger! Fix it.
- WikiNurture media collections. I believe media item metadata is currently unversioned. It should be versioned, and the public should be able to enhance and correct metadata. Media in the Internet Archive is currently much less useful than it could be because of poor metadata and very limited relations among media items. For example, I expect music I download from the archive not to have good artist/album/title tags, which makes it a huge pain to integrate into my listening habits, including telling the world about it and helping make it popular.
- Aggressively support new free media formats, specifically Opus and WebM right now. This is an important issue for free and open media, and it requires collective action. The Internet Archive is in a key position and should exploit its strong position.
- On top of the existing infrastructure and the much richer data described above, build Netflix-level experiences around the highest quality media in the archive, and perhaps around all media with high quality metadata. This could be left to third parties, but centralization is powerful.
- Finally, and perhaps the deadly combination of most contentious and least exciting: stop paying DRM vendors and publishers. Old posts on this: 1, 2, 3. The Internet Archive is not in the position Mozilla apparently thinks it is in, of tolerating DRM out of fear of losing relevance. Physical libraries may think they are in such a position, but only to the extent that they think of themselves as book vendors and lack vision. Please, Digital Librarian, show leadership toward the digital libraries we want in the future, not grotesque compromises!
These enhancements would elevate the Internet Archive to its proper status, and mean nobody could ever again justifiably say that ‘Aside from Wikipedia, there is no large, popular space being carved out for the public good.’
Addendum 20131110: “What happened to the Library of Alexandria?”, asked as a lead-in to explaining why the Internet Archive has multiple data centers, takes on new meaning after a fire a few days ago at its scanning center (no digital records were lost). Donate.