SXSWi wrap

Saturday, March 18th, 2006

There were a surprising number of panels more or less concerning entrepreneurship. I only attended one of these, Sink or Swim: The Five Most Important Startup Decisions. It was very mildly amusing but as far as I could tell the only important decision discussed was whether to look for outside funding or not, a well-trod topic if there ever was one. There was even one panel on Selling (Big Ideas to Big Clients).

I understand that Creative Commons was mentioned in passing on many panels. Attendees coming to our booth were much better informed than in years past, part of a greater trend.

The Digital Preservation and Blogs panel I was on was interesting for the self-selection of the audience — I imagine every librarian and historian attending the conference was present. A writeup, photo, and my narrow take.

Both accepted panels I helped conceive went very well, especially Open Science. Though an outlier for SXSW, the audience Q&A was high quality. Moderator John Wilbanks did a great job of keeping a diverse panel (open access journal editor, synthetic biologist, IT standards person, and VC) on point.

Commons-Based Business Models included Ian Clarke of Revver, which encourages sharing of short videos with an unobtrusive advertisement at the end under a CC license that does not permit derivative works. This licensing choice was made so that stripping out the advertisement is not permitted. Jimmy Wales challenged Clarke to think about opening up some content on an experimental basis. Sounds like a good idea to me. I suggested from the audience that attribution can require a link back to Revver, so even modified videos are valuable. Clarke responded that advertising at a link away is far less valuable. True, but the question is whether derivative works that could not otherwise exist become popular enough to outweigh those that merely remove advertising. I suspect many derivatives would be uploaded directly to Revver, allowing the company and original creators to take full advantage of additional revenue and to become the leading site for explicit remixing of video, a la ccMixter for audio. Seems worth an experiment — Revver is in no danger of becoming the leading video site at the current rate.

I also asked Clarke about interest in his patronage system. He said Revver is aimed at the same problem (funding creators) but was easier to implement. In the same vein I met John Pratt of Fundable, which is based in Austin. I got the impression he didn’t think the service could be viral (I disagree). I’ve written about FairShare, Fundable and related ideas several times in the past, mostly linked to in my Public Goods Group Shopping post and its comments. The field is ripe for a really good service.

The EFF/CC party was very well attended, even discounting its obscure location (an Elks club). In the middle of the facility was a room of Elks members, playing cards and other games, oblivious to the SXSW crowd that outnumbered Elks even in that room. I gave a very brief thank-you speech for CC, which I closed with a prayer (because we were in Texas) to J.R. “Bob” Dobbs (because we were in Austin).

At the end of the trade show Rob Kaye alerted me to the giveaway of every book at a well-respected computer publisher’s booth to “cool geeks” or similar. 5-10 years ago this would’ve really excited me, but this time I was mostly concerned about bulk and weight. I took a few. I suspect they’ll be among the last computer books I obtain, free or otherwise.

James Surowiecki gave a presentation which I did not attend but I hear focused on prediction markets. I should’ve made the time to attend simply to see the crowd reaction. Several of the latest sites cropping up in that field certainly look like they were designed by potential SXSW attendees — circa 2004/5 generically attractive web applications. I should have some posts on that topic soon, starting with Chris F. Masse’s 2005 Awards.

MusicBrainzDNS

Sunday, March 12th, 2006

Congratulations to MusicBrainz for taking care of a longstanding substandard feature — a proprietary and not very scalable acoustic fingerprinting technology (Relatable TRM). Today MusicBrainz announced integration with MusicIP’s MusicDNS fingerprinting service; full details in the announcement.

Funny thing, I just cleared all the (old, mostly gathered in 2001) TRM tags from my files a couple of weeks ago.

Creative Commons license tracking is also now enabled at both MusicBrainz and MusicIP, no doubt more on that at the CC weblog in the near future.

Belated congratulations to MusicBrainz for signing their first commercial deal in January.

I wrote some about MusicBrainz about 15 months ago. I predict the next 15 months will be very good for what I’ll call “open music infrastructure.”

Semantic Technology Conference wrap

Saturday, March 11th, 2006

The 2006 Semantic Technology Conference was more interesting than I expected. The crowd was older and much more formally dressed, and there was far less emphasis on open source solutions, than at any conference I’ve attended in a long time, but it wasn’t merely a vendor schmoozefest.

James Hendler and Ora Lassila’s Semantic Web @5 keynote claimed that Semantic Web technologies have made great strides over the past five years. They pointed out that middle levels of the Semantic Web layer cake are mature and higher levels are subjects of funded research (in 2001 lower and middle levels were mature and research respectively). Near the end they made a strong call to “share; give it away!” — open source tools, datasets, and harvesters are needed to grow the Semantic Web.

My presentation on Semantic Search on the Public Web with Creative Commons went fairly well apart from some audio problems. I began with a hastily added segue (not in the slides) from the keynote, highlighting Science Commons’ database licensing FAQ and Uniprot. Questions were all over the map, befitting the topic.

I think Uche Ogbuji’s Microformats: Partial Bridge from XML to the Semantic Web is the first talk on microformats I’ve heard from a non-cheerleader, and it was a pretty good introduction to the upsides and downsides of microformats and how one can leverage microformats for officious Semantic Web purposes. My opinion is that the value in the microformats hype is in encouraging people to take advantage of XHTML semantics, in however conventional and non-rigorous a fashion they may. It is a pipe dream to think that most pages containing microformats will include the correct profile references to allow a spec-following crawler to extract much useful data via GRDDL. Add some convention-following heuristics and a crawler may get lots of interesting data from microformatted pages. The big search engines are great at tolerating ambiguity and non-conformance, as they must.
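
To illustrate the heuristic approach, here is a minimal sketch (mine, not from the talk) that scrapes hCard-ish fields by class-name convention alone, ignoring the profile reference a strict GRDDL processor would demand:

    from html.parser import HTMLParser

    class HCardish(HTMLParser):
        """Heuristically collect hCard-style fields from any page."""
        FIELDS = {"fn", "url", "org", "tel"}

        def __init__(self):
            super().__init__()
            self.pending = None  # field name awaiting its text content
            self.cards = []

        def handle_starttag(self, tag, attrs):
            classes = set((dict(attrs).get("class") or "").split())
            if "vcard" in classes:
                self.cards.append({})  # a new hCard begins here
            hit = classes & self.FIELDS
            if hit and self.cards:
                self.pending = hit.pop()

        def handle_data(self, data):
            if self.pending and data.strip():
                self.cards[-1][self.pending] = data.strip()
                self.pending = None

    parser = HCardish()
    parser.feed('<div class="vcard"><span class="fn">Uche Ogbuji</span></div>')
    print(parser.cards)  # [{'fn': 'Uche Ogbuji'}]

No profile checking, no GRDDL transform, just the sort of tolerance for non-conformance the big engines already practice.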

Ogbuji’s talk was the ideal lead-in to Ben Adida’s Interoperable Metadata for a Bottom-Up Semantic Web, which hammered home five principles of metadata interoperability: publisher independence, data reuse, self-containment, schema modularity, and schema evolvability. Microformats, GRDDL, and RDF/A, among other candidates, were evaluated against the principles. It is no surprise that RDF/A came out looking best — Adida has been chairing the relevant W3C taskforce. I think RDF/A has great promise — it feels like microformats minus annoyances, or microformats with a model — but microformats partisans may say otherwise. The oddest response to the talk came from someone of the opinion that [X]HTML is irrelevant — everything should be custom XML rendered with custom XSLT when necessary.

I was somewhat surprised by the strong focus of most talks and vendors on RDF and friends rather than any other “semantic technologies.” Doug Lenat’s talk on Cyc was one exception. He apparently claimed last year that by this year Cyc would be growing primarily through machine learning rather than input by knowledge engineers. A questioner called Lenat on this prediction. Lenat claimed the prediction came true but did not offer any quantitative measure. From the slides (unavailable) it looked like Cyc can have databases and similar described to it and may access same (e.g., via JDBC), giving it access to an arbitrary number of “facts.”

If there was a theme that flowed through the conference it was first integrating heterogeneous data sources (I don’t recall who, but someone characterized semantic technologies as liberating enterprises from vendors) and second multiplying the value of that data through linking and inference.

Mills Davis’ closing keynote blew up these themes, claiming outrageous productivity improvements are coming very shortly due to semantic technologies, complete with a slide to that effect. The conference hotel fire alarm went off during the keynote, serving as a hype alert to any willing to hear.

SemTech06 reinforces my confidence in what I said in the SemWeb, AI, Java: The Ontological Parallels mini-rant given at SXSW last year. Too bad they rejected my proposal for this year:

Semantic Web vs. Tags Deathmatch: Tags are hot, but are they a dead end? The Semantic Web is still a research project, but will it awaken in August, 2009? People in the trenches fight over the benefits and limits of tags, the viability of officious Semantic Web technologies, what the eventual semantic web will look like, and how to bridge the gap between the two.

I’m off to SXSW tomorrow anyway. My schedule.

Creative Commons Salon San Francisco

Thursday, March 2nd, 2006

Next Wednesday evening in San Francisco, come to the Creative Commons Salon. It sadly will not feature avant-drone-noise electric violin music played on a stage behind white sheets (apropos of nothing apart from listening to that now and not wanting anything else), but it should be pretty excellent anyway.

I’ve wanted to do a one-time event like this for a long time, but Eric Steuer and Jon Phillips, who are curating the event and the series to be, are doing a far better job than I ever could have.

Also next week I’ll be presenting at the 2006 Semantic Technology Conference, then on to SXSW for a panel on digital preservation and blogs and silly parties, but leaving too soon to see the great Savage Republic perform on Friday (in two weeks). Perhaps I should change my flight and find somewhere to crash for two days?

My partner and I are also looking for a new apartment. Know of a great place in San Francisco around $2,000/month and not in the far west or south?

Update 20060303:

CC salon invite

Note Shine’s 1337 address.

Update 20060311: Success.

CodeCon Friday

Saturday, February 11th, 2006

This year Gordon Mohr had the devious idea to do preemptive reviews of CodeCon presentations. I’ll probably link to his entries and have less to say here than last year.

Daylight Fraud Prevention. I missed most of this presentation but it seems they have a set of non-open source Apache modules each of which could make phishers and malware creators work slightly harder.

SiteAdvisor. Tests a website’s evilness by downloading and running software offered by the site and filling out forms requesting an email address on the site. If a virtual Windows machine running the downloaded software becomes infected or the email address set up for the test is inundated with spam, the site is considered evil. This testing is mostly automated and expensive (many Windows licenses). Great idea, surprising it is new (to me). I wonder how accurate an evil reading one could obtain at much lower cost by calculating a “SpamRank” for sites based on links found in email classified as spam and links found on pages linked to in spams? (A paper has already taken the name SpamRank, though at a five second glance it looks to propose tweaks to make PageRank more spam-resistant rather than trying to measure evil.) Fortunately SiteAdvisor says that both bitzi.com and creativecommons.org are safe to use. SiteAdvisor’s data is available for use under the most restrictive Creative Commons license — Attribution-NonCommercial-NoDerivs 2.5.
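
Here is a minimal sketch of that cheap SpamRank, assuming a corpus of mail already classified as spam and non-spam; the second-order step (scoring pages linked to by spamvertised pages) is left out:

    import re
    from collections import Counter
    from urllib.parse import urlparse

    URL_RE = re.compile(r'https?://[^\s"\'<>]+')

    def domain_counts(bodies):
        """Count how often each domain is linked across message bodies."""
        counts = Counter()
        for body in bodies:
            for url in URL_RE.findall(body):
                counts[urlparse(url).netloc.lower()] += 1
        return counts

    def spam_rank(spam_bodies, ham_bodies):
        """Score each linked domain by the fraction of its link
        mentions that occur in mail already classified as spam."""
        spam, ham = domain_counts(spam_bodies), domain_counts(ham_bodies)
        return {d: (n + 1) / (n + ham.get(d, 0) + 2)  # Laplace smoothing
                for d, n in spam.items()}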

VidTorrent/Peers. Streaming joke. Peers, described as a “toolkit for P2P programming with continuation passing style”, which I gather works syntactically as a Python code preprocessor, could be interesting. I wish they had compared Peers to other P2P toolkits.

Localhost. A global directory shared with a modified version of the BitTorrent client. I tried it about a month ago. Performance was somewhere between abysmal and nonexistent. BitTorrent is fantastic for large popular files. I’ll be surprised if localhost’s performance, which depends on transferring small XML files, ever reaches mediocrity. They’re definitely going away from BitTorrent’s strengths by uploading websites into the global directory as lots of small files (I gather). The idea of a global directory is interesting, though tags seem a more fruitful navigation method than localhost’s hierarchy.

Truman. A “sandnet” for investigating suspected malware in isolation. Faux services (e.g., DNS, websites) can be scripted to elicit the suspected malware’s behavior, and more.

[Hot]link policy

Sunday, January 15th, 2006

I’m out of the loop. Until very recently (upon reading former Creative Commons intern Will Frank’s writeup of a brief hotlink war) I thought ‘hotlink’ was an anachronistic way to say ‘link’, used back when the mere fact that links led to a new document, perhaps on another server, was exciting. It turns out ‘hotlink’ is now vernacular for inline linking — displaying or playing an image, audio file, video, or other media from another website.

Lucas Gonze, who has lots of experience dealing with hotlink complaints due to running Webjay, has a new post on problems with complaint forms as a solution to hotlinks. One thing missing from the post is a distinction between two completely different sets of complainers who will have different sets of solutions beyond complaining.

One sort of complainer wants a link to a third party site to go away. I suspect the complainer usually really wants the content on the third party site to go away (typically claiming the third party site has no right to distribute the content in question). Removing a link to that content from a link site works as a partial solution by making the third party hosted content more obscure. A solution in this case is to tell the complainer that the link will go away when it no longer works — in effect, the linking site ignores complaints and it is the responsibility of the complainer to directly pursue the third party site via takedown demands and other threats. This allows the linking site to completely automate the removal of links — those removed as a result of threatened or actual legal action look exactly the same as any other link gone bad and can be tested for and culled using the same methods. Presumably such a hands-off policy only pisses off complainers to the extent that they become more than a minor nuisance, at least on a Webjay-like site, though it must be an option for some.
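
The culling itself is trivial to automate. A sketch, assuming the Python requests library and a periodic job over the site’s outbound links:

    import requests

    def cull_dead_links(urls, timeout=10):
        """Keep only links that still resolve. A link dead because its
        host was threatened into removing content looks exactly like a
        link dead for any other reason."""
        alive = []
        for url in urls:
            try:
                r = requests.head(url, timeout=timeout, allow_redirects=True)
                if r.status_code < 400:
                    alive.append(url)
            except requests.RequestException:
                continue  # unreachable counts as gone
        return alive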

Creative Commons has guidelines very similar to this policy concerning how to consider license information in files distributed off the web — don’t believe it unless a web page (which can be taken down) has matching license information concerning the file in question.

Another sort of complainer wants a link to content on their own site to go away, generally for one or both of two reasons. The first is that hotlinking uses bandwidth and other resources on the hotlinked site which the site owner may not be able to afford. The second, often coupled with the first, is that the site owner does not want their content to be available outside the context of their own site (i.e., they want viewers to have to come to the source site to view the content).

With a bit of technical savvy the complainer who wants a link to their own site removed has several options for self help. Those merely concerned with cost could redirect requests lacking the relevant referrer (from their own site) or maybe cookie (e.g., for a logged in user) to a free cache such as Coral or similar, which should drastically reduce originating site bandwidth, if hotlinks are actually generating many requests (if they are not, there is no problem).
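
A sketch of that referrer/cookie test, independent of any web framework; the host and the Coral-style cache suffix are illustrative assumptions:

    from urllib.parse import urlparse

    MY_HOST = "example.com"          # the publisher's own host (assumed)
    CACHE_SUFFIX = ".nyud.net:8080"  # Coral-style cache address, illustrative

    def redirect_target(path, referrer, has_session_cookie):
        """Return a cache URL for hotlinked media requests, or None
        to serve the file directly."""
        if has_session_cookie:
            return None  # logged-in user: serve directly
        if referrer and urlparse(referrer).hostname == MY_HOST:
            return None  # request came from the publisher's own pages
        # Hotlinked request: let the free cache absorb the bandwidth.
        return "http://" + MY_HOST + CACHE_SUFFIX + path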

A complainer who does not want their content appearing in third party sites can return a small “visit my site if you want to view this content” image, audio file, or video as appropriate in the absence of the desired referrer or cookie. Hotlinking sites become not an annoyance, but free advertising. Many sites take this strategy already.
Presumably many publishers do not have any technical savvy, so some Webjay-like sites find it easier to honor their complaints than to ignore them.

There is a potential for technical means of saying “don’t link to me” that could be easily implemented by publishers and link sites with any technical savvy. One is to interpret robots.txt exclusions to mean “don’t link to me” as well as “don’t crawl and index me.” This has the nice effect that those stupid enough to not want to be linked to also become invisible to search engines.
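
Python’s standard library already parses robots.txt, so a link site could honor this broader interpretation in a few lines (a sketch; the user agent name is invented):

    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    def may_link(url, user_agent="linksitebot"):
        """Treat a robots.txt disallow as 'don't link' in addition to
        its usual meaning of 'don't crawl and index'."""
        parts = urlparse(url)
        rp = RobotFileParser(parts.scheme + "://" + parts.netloc + "/robots.txt")
        rp.read()  # fetch and parse the file
        return rp.can_fetch(user_agent, url)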

Another solution is to imitate rel=nofollow — perhaps rel=nolink, though the attribute would need to be available on img, object, and other elements in addition to a, or simply apply rel=nofollow to those additional elements a la the broader interpretation of robots.txt above.

I don’t care for rel=nolink as it might seem to give some legitimacy to brutally bogus link policies (without the benefit of search invisibility), but it is an obvious option.

The upshot of all this is that if a link site operator is not as polite as Lucas Gonze there are plenty of ways to ignore complainers. I suppose it largely comes down to customer service, where purely technical solutions may not work as well as social solutions. Community sites with forums have similar problems. Apparently Craig Newmark spends much of his time tending to customer service, which I suspect has contributed greatly to making Craigslist such a success. However, a key difference, I suspect, is that hotlink complainers are not “customers” of the linking site, while most people who complain about behavior on Craigslist are “customers” — participants in the Craigslist community.

Credit card numbers from π

Sunday, January 15th, 2006

credit card numbers from pi

I had to run an errand and was disappointed to find Andi had left the channel. I really wanted to help him in his quest for credit card numbers. They are all to be found in the digits of π. If Andi is any good he could’ve fleeced others searching for credit card numbers with that one.
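
For the literal-minded, a sketch that actually searches the digits, using Gibbons’ unbounded spigot; note that “every number appears in π” is a consequence of π being disjunctive, which is conjectured but unproven:

    def pi_digits():
        """Yield the decimal digits of pi forever (after Gibbons'
        unbounded spigot algorithm)."""
        q, r, t, j = 1, 180, 60, 2
        while True:
            u = 3 * (3 * j + 1) * (3 * j + 2)
            y = (q * (27 * j - 12) + 5 * r) // (5 * t)
            yield y
            q, r, t, j = (10 * q * j * (2 * j - 1),
                          10 * u * (q * (5 * j - 2) + r - y * t),
                          t * u, j + 1)

    def find_in_pi(s, limit=1_000_000):
        """0-based offset of digit string s in pi, or -1 if not seen
        within the first `limit` digits."""
        window = ""
        for i, d in enumerate(pi_digits()):
            if i >= limit:
                return -1
            window = (window + str(d))[-len(s):]
            if window == s:
                return i - len(s) + 1
        return -1

    print(find_in_pi("1234"))  # most short strings turn up quickly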

Addendum: It’s an old joke. I probably heard it before and forgot.

Search 2006

Saturday, January 14th, 2006

I’m not going to make new predictions for search this year — it’s already underway, and my predictions for 2005 mostly did not come true. I predict that most of them will, in the fullness of time:

Metadata-enhanced search. Yahoo! and Google opened Creative Commons windows on their web indices. Interest in semantic markup (e.g., microformats) increased greatly, but search that really takes advantage of this is a future item. (NB I consider the services enabled by tags more akin to browse than search, and as far as I know they don’t allow combining tag and keyword queries.)

Proliferation of niche web scale search engines. Other than a few blog search services, which are very important, I don’t know of anything that could be called “web scale” — and I don’t know if blog search could really be called niche. One place to watch is public search engines using Nutch. Mozdex is attempting to scale up, but I don’t know that they really have a niche, unless “using open source software” is one. Another place is Wikipedia’s list of internet search engines.

On the other hand, weblications (as Web 2.0) did take off.

I said lots of desktop search innovation was a near certainty, but if so, it wasn’t very visible. I predicted slow progress on making multimedia work with the web, and I guess there was very slow progress. If there was forward progress on usable security it was slow indeed. Open source did slog toward world domination (e.g., Firefox is the exciting platform for web development, but barely made a dent in Internet Explorer’s market share) with Apple’s success perhaps being a speed bump. Most things did get cheaper and more efficient, with the visible focus of the semiconductor industry swinging strongly in that direction (they knew about it before 2005).

Last year I riffed on John Battelle’s predictions. He has a new round for 2006, one of which was worth noting at Creative Commons.

Speaking of predictions, of course Google began using prediction markets internally. Yahoo!’s Tech Buzz Game has some markets relevant to search but I don’t know how to interpret the game’s prices.

Lightnet!

Monday, January 9th, 2006

Congratulations to Lucas Gonze on the Webjay/Yahoo! merger. (Via Kevin Burton.)

Yahoo! made a very wise decision to be acquired by the light side rather than the dark side.

My favorite Gonze post: Totally fucking bored with Napster (more at CC).

Also have a listen to the best track on ccMixter (if you share my taste, probably not), also a Gonze creation.

I could gonze on, but enough of this!

XTech 2006 CFP deadline

Tuesday, January 3rd, 2006

I mentioned elsewhere that I’m on the program committee for XTech 2006, the leading web technology conference in Europe, to be held in Amsterdam May 16-19.

Presentation, tutorial and panel proposals are due in less than a week, on January 9. If you’re building an extraordinary Web 2.0 application or doing research that Web 2.0 (very broadly construed) developers and entrepreneurs need to hear about, please consider submitting a proposal.

See the CFP and track descriptions.