Archive for the ‘Blogs’ Category

Technorati DeepCosmos

Saturday, March 5th, 2005

Late last year I requested that some blog aggregator give some indication of the existence of indirect blog post citations, i.e., a blog thread. Adam Hertz suggested that this could be done using Technorati’s API.

I whipped up a crummy implementation the following weekend and contributed a small technorati.py patch along the way. I decided I’m not getting around to producing a non-crummy version, so here it is:

If you attempt to use the DeepCosmos demo the first thing to note is that you need to obtain and use your own Technorati API Key. Check out the examples above if you just want to see what the output looks like.

I haven’t used this much since I wrote it. My request still stands. I’d use the information all the time if integrated into the output of Technorati, Bloglines, Rojo or similar.

CodeCon Saturday

Sunday, February 13th, 2005

CodeCon is 5/5 today.

The Ultra Gleeper. A personal web page recommendation system. Promise of collaborative filtering unfulfilled, in dark ages since Firefly was acquired and shut down in the mid-90s. Presenter believes we’re about to experience a renaissance in recommendation systems, citing Audioscrobbler recommendations (I would link to mine, but personal recommendations seem to have disappeared since last time I looked; my audioscrobbler page) as a useful example (I have found no automated music recommendation system useful) and blogs as a use case for recommendations (I have far too much very high quality manually discovered reading material, including blogs, to desire automated recommendations for more and I don’t see collaborative filtering as a useful means of prioritizing my lists). The Ultra Gleeper crawls pages you link to, treating links as positive ratings, and pages that link to you (via Technorati CosmosQuery and Google API), then presents suggested pages to rate in a web interface. Uses a number of tricks to avoid showing obvious recommendations (does not recommend pages that are too popular) and pages you’ve already seen (including those linked to in feeds you subscribe to). Some problems faced by typical recommendation systems (new users get crummy recommendations until they enter lots of data, early adopters get doubly crummy recommendations due to lack of existing data to correlate with) obviated by bootstrapping from data in your posts and subscriptions. I suppose if lots of people run something like Gleeper, robot traffic increases and more people complain about syndication bandwidth-like problems (I’m skeptical about this being a major problem). I don’t see lots of people running Gleepers, as automated recommendation systems are still fairly useless and will remain so for a long time. Interesting software and presentation nonetheless.
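A rough Python sketch of the filtering idea as I understood it from the talk, not the Gleeper's actual code. The function name, the `outlinks` and `inbound_counts` mappings, and the popularity threshold are all hypothetical stand-ins for whatever the crawler and the Technorati/Google lookups would really provide:

```python
from collections import Counter


def suggest(my_links, outlinks, inbound_counts, seen, max_inbound=1000, limit=10):
    """Rank pages linked from pages I link to.

    my_links: URLs I have linked to (treated as positive ratings).
    outlinks: hypothetical mapping of URL -> URLs that page links to.
    inbound_counts: hypothetical mapping of URL -> known inbound link count.
    seen: URLs already read (e.g. linked from subscribed feeds).
    """
    votes = Counter()
    for liked in my_links:
        for target in outlinks.get(liked, ()):
            votes[target] += 1

    exclude = set(my_links) | set(seen)
    candidates = [
        (count, url)
        for url, count in votes.items()
        # Skip pages already known and pages too popular to be interesting.
        if url not in exclude and inbound_counts.get(url, 0) <= max_inbound
    ]
    # Most-voted first; ties broken alphabetically for a stable order.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in candidates[:limit]]
```

The point of the sketch is only that the input is data you already publish (links and subscriptions), so there is no cold start from an empty ratings table.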

H2O. Primarily a discussion system tuned to facilitate professor-assigned discussions. Posts may be embargoed and professor may assign course participants specific messages or other participants to respond to. Discussions may include participants from multiple courses, e.g., to facilitate an MIT engineering-Harvard law exchange. Anyone may register at H2O and create their own group, acting as professor for the created group. Some of the constraints that may be imposed by H2O are often raised in mailing list meta discussions following flame wars, in particular posting delays. I dislike web forums but may have to try H2O out. Another aspect of H2O is syllabus management and sharing, which is interesting largely because syllabi are typically well hidden. Professors in the same school of the same university may not be aware of what each other are teaching.

Jakarta Feedparser. Kevin Burton gave a good overview of syndication and related standards and the many challenges of dealing with feeds in the wild, which are broken in every conceivable way. Claims SAX (event) based Jakarta FeedParser is an order of magnitude faster than DOM (tree) based parsers. Nothing new to me, but very useful code.

MAPPR. Uses Flickr tags, GNS to divine geographic location of photos. REST web services modeled on Flickr’s own. Flash front end, which you could spend many hours playing with.

Photospace. Personal image annotation and search service, focus on geolocation. Functionality available as library, web front end provided. Photospace publishes RDF which may be consumed by RDFMapper.

Note above two personal web applications that crawl or use services of other sites (The Ultra Gleeper is the stronger example of this). I bet we’ll see many more of increasing sophistication enabled by ready and easily deployable software infrastructure like Jakarta FeedParser, Lucene, SQLite and many others. A personal social networking application is an obvious candidate. Add in user hosted or controlled authentication (e.g., LID, perhaps idcommons) …

Yesterday.

Not following tags

Thursday, January 20th, 2005

“Do not credit this link” is a useful assertion that cannot be gleaned from surrounding content.

Thus, rel="nofollow" is a good if old idea. At least one of my two search predictions for 2005 is already coming true.
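For anything that counts links, honoring the assertion could look roughly like this. A minimal sketch using Python’s standard `html.parser`; the `CreditedLinks` class and `split_links` helper are names made up for illustration, and real search engines surely do a great deal more:

```python
from html.parser import HTMLParser


class CreditedLinks(HTMLParser):
    """Collect hrefs, separating links marked rel="nofollow" from the rest."""

    def __init__(self):
        super().__init__()
        self.credited = []
        self.uncredited = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel is a space-separated token list; its value may be missing.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.uncredited.append(href)
        else:
            self.credited.append(href)


def split_links(html):
    """Return (credited, uncredited) href lists for a chunk of HTML."""
    parser = CreditedLinks()
    parser.feed(html)
    parser.close()
    return parser.credited, parser.uncredited
```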

Creator-assigned keywords or “tags”, on the other hand, strike me as a contemporary implementation of HTML meta description tags, which failed because they placed a burden on good webmasters (classification is hard) and presented an open field for spammers, who tag[ged] their pages with completely unrelated keywords while making a hard sell for whatever.

Global classification strikes me as a case in which Google is right — metadata inferred from content beats explicit, manual metadata when it comes to categorization. From the Peter Norvig (Google Director of Search Quality) interview I cited:

This is a Google News page from last night, and what we’ve done here is apply clustering technology to put the news stories together in categories, so you see the top story there about Blair, and there’re 658 related stories that we’ve clustered together.

Now imagine what it would be like if instead of using our algorithms we relied on the news suppliers to put in all the right metadata and label their stories the way they wanted to. “Is my story a story that’s going to be buried on page 20, or is it a top story? I’ll put my metadata in. Are the people I’m talking about terrorists or freedom fighters? What’s the definition of patriot? What’s the definition of marriage?”

Folksonomies are great in limited domains, thus far most famously for organizing and sharing bookmarks (decentralize using same technology as Technorati’s self-tagging) and organizing photos.

Keyword tagging is also a lightweight way to provide navigation for a website. I might categorize more posts on this weblog if I could do so in a similarly lightweight manner (now I have to create categories via an interface separate from posting). Haven’t I come right back to the creator-assigned keywords that I criticized above? No, there’s a subtle but very important difference: metadata as a side effect of useful work versus metadata as spammy make work.

N-level blog entry references

Monday, December 27th, 2004

Dear LazyWeb,

Bloglines, Technorati and probably others do a passable job of presenting direct references to a blog entry. (Minor complaints: With Bloglines you have to subscribe to a feed or preview with a “siteid” internal to Bloglines; if your blog has multiple duplicative feeds (e.g., rdf/rss2/atom) direct entry references only appear for one of the feeds; Bloglines makes no attempt to consolidate or allow feed owners to consolidate; Technorati appears to use screen scraping and picks up some garbage along the way.)

So here’s my LazyWeb request:

I want to know, without lots of extra clicking, not just resources that directly cite blog entry A, but resources that cite resources that cite blog entry A, and so on. In the context of Bloglines, instead of “2 references” I want to see “2 direct references, 3 level1 indirect references, 1 level2 indirect reference”, “6 references, 2 direct” or similar, and I want to be able to see all references, direct and indirect, on a single page. In the context of Technorati, indirect references could optionally be part of a “watchlist” feed.
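In code terms the request is a breadth-first walk over “who cites whom” data. A minimal Python sketch, assuming a hypothetical `citations` dict that maps each URL to the URLs citing it (in practice that data would have to come from an aggregator; no real API calls are shown):

```python
from collections import deque


def reference_levels(citations, entry, max_depth=3):
    """Group resources citing `entry` by distance.

    Level 0 holds direct references, level 1 holds resources citing those, etc.
    `citations` is a hypothetical mapping of URL -> iterable of citing URLs.
    """
    levels = {}
    seen = {entry}
    queue = deque([(entry, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for citer in citations.get(url, ()):
            if citer in seen:
                continue  # count each resource once, at its shortest distance
            seen.add(citer)
            levels.setdefault(depth, []).append(citer)
            queue.append((citer, depth + 1))
    return levels


def summarize(levels):
    """Render counts like "2 direct references, 3 level1 indirect references"."""
    parts = []
    for depth in sorted(levels):
        n = len(levels[depth])
        noun = "reference" if n == 1 else "references"
        if depth == 0:
            parts.append(f"{n} direct {noun}")
        else:
            parts.append(f"{n} level{depth} indirect {noun}")
    return ", ".join(parts)
```

The `seen` set keeps a resource that cites the entry by several routes from being counted twice, and it also stops the walk from looping when blogs cite each other.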

Will Bloglines, Technorati, or some up and coming aggregation service please do this?

I’m not terribly interested in visualization of social networks implied by blogs (blogosphere visualization, blogversation maps?) or even blogthread visualization and the like here. Neat, but too heavyweight to use daily. I just want a small feature increment.

Speculate on Creators

Wednesday, November 17th, 2004

Alex Tabarrok writes about An Auction Market for Journal Articles (PDF). Publishers bid for the right to publish a paper. The amount of the winning bid is divided among the authors and publishers of papers cited by the paper just auctioned. Unless I’m missing something, all participating journals taken together lose money unless the share of cited authors is zero and transaction costs are nil. Still, the system could increase incentives to publish quality papers, where “subsequent authors will want to cite this” is a proxy for quality.
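The aggregate arithmetic can be spelled out in a few lines of Python. This is my own back-of-the-envelope model, not anything from the paper: it assumes every cited paper sits in a participating journal, a fixed `author_share` of each bid, and a flat per-auction `transaction_cost`, all of which are names and simplifications I made up:

```python
def journals_net(bids, author_share, transaction_cost=0.0):
    """Aggregate net cash flow to all participating journals.

    The publishers' part of each winning bid stays inside the system
    (one journal pays, others receive); the authors' part leaves it.
    """
    paid = sum(bids)
    received_by_publishers = sum(b * (1 - author_share) for b in bids)
    return received_by_publishers - paid - transaction_cost * len(bids)
```

With winning bids of 100 and 200 and half of each going to cited authors, journals as a group are down 150; the total only reaches zero when `author_share` and `transaction_cost` are both zero.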

I’m reminded a tiny bit of BlogShares (“Blogs are valued by their incoming links and add value to other blogs by linking to them”), but especially of Ian Clarke’s FairShare, which is a proposal for speculative donations:

Anybody can “invest” in an artist, and if that artist goes on to be a success, then the person is reward in proportion to their investment and how early they made it. But where does this return on investment come from? The answer is that it comes from subsequent investors. For example, lets say that you invest $10. $4.50 might go straight to the band, $1 might go to the operator of the system, and the remaining $4.50 would be distributed among previous investors in the band, those who invested more early would get a bigger proportion than those who invested less, later-on. Of course, most people will not make a profit, but they are rewarded by knowing that they contributed towards an artist that they liked, and helped reward others who believed in that artist, and who may have brought the artist to their attention.

Under FairShare participating creators taken together and individually would make money, as payments are from without the system, driven by the generosity and greed of fans and speculators.
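To make the $10 example concrete, here is a minimal Python sketch. The 45/10/45 split comes from the quote; everything else is my assumption, since the proposal as quoted does not give a formula: previous investors are paid pro rata by amount invested (earlier investors still do better because they are around for more subsequent investments), and when there are no previous investors their share goes to the artist.

```python
def invest(ledger, investor, amount, artist_share=0.45, operator_share=0.10):
    """Record one investment and return how it was split.

    `ledger` maps investor -> total invested so far and is updated in place.
    The pro rata payout rule is an assumption, not part of the quoted proposal.
    """
    to_artist = amount * artist_share
    to_operator = amount * operator_share
    pool = amount - to_artist - to_operator  # share owed to previous investors
    total_prior = sum(ledger.values())
    payouts = {}
    if total_prior > 0:
        for name, stake in ledger.items():
            payouts[name] = pool * stake / total_prior
    else:
        # Assumption: with no earlier investors, their share goes to the artist.
        to_artist += pool
    ledger[investor] = ledger.get(investor, 0) + amount
    return {"artist": to_artist, "operator": to_operator, "payouts": payouts}
```

Run three $10 investments in a row and the first investor gets back $4.50 and then $2.25, still short of the original $10, which fits the quote’s “most people will not make a profit”.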

A system in the spirit of one or both of these proposals could perhaps help fund a voluntary collective licensing scheme of the sort contemplated for digital music, but conceivably applicable to other types of work.

If the journal market idea really could foster a self-sustaining business model it could be a boon to the open access movement. Restricting access is rather pointless when your main business concern is to get your articles cited.

I’ve rambled about open access models elsewhere.

Invitation Marketing: Six Gmail Shills Available

Thursday, September 2nd, 2004

Consulting firm Accenture has a paper called Invitation Marketing: Using Customer Preferences to Overcome Ad Avoidance. While the paper paints in broad strokes, it is clear that Google has implemented a variation with great (Orkut) and even greater (Gmail) success.

How many otherwise respectable folk have you seen dedicating email broadcasts and blog entries to announcing that they have a few Gmail invites to give away, especially in the last couple weeks? I lost count long ago. Ad avoidance overcome, indeed.

Kudos to Google’s marketing department.

Updates: Wendy Seltzer cited this post: Gmail’s Viral Marketing. My trackback broke. Oops.

I like Joey Hess’s take on the Gmail invite virus: stop wasting my time with gmail. Joey notes that the going price on eBay for both Gmail invites and ancient 1 gigabyte hard drives is less than one dollar.

Sloths and Their Slothfulness

Tuesday, June 8th, 2004

Via Elizabeth Rader I discovered Kairosnews criticizing the Creative Commons weblog and others for using non-free weblog software. The CC weblog currently uses the “lars-blogger” package for OpenACS, both GPL.

I would’ve posted a comment to Kairosnews, but that would’ve required registering and logging in. Trackback is great for sloths.

Sort of apropos: I didn’t switch to WordPress, but I did delay starting a public blog for ages while waiting for simple libre blog software that supports pretty URLs, comments, trackbacks, pings, syndication, etc. Other reason for delay: slothfulness.

Will weblog software disappear as a category? I want to manage an entire site with one application (up til now: “vi”, more or less). It isn’t hard for a CMS to include a nice weblog feature. It is kind of a pain for users to force weblog applications to serve as a whole-site CMS, though many people do that.

Client-side remixing isn’t so loopy

Saturday, March 13th, 2004

Lucas Gonze’s analysis of client-side remixing is spot on. Summary: client-side remixing is to precise synchronization as HTML is to precise layout. If you don’t need precision, enjoy.

I see three limits to client-side remixing. All can be raised:

  • Bad client software. It either doesn’t work or barely works and you need a very keen eye to find a gratis download amongst enticements to buy a super-premium subscription version (cf RealPlayer).
  • Lack of expressivity. Remixers don’t just overlay source segments, they also apply various effects to the same.
  • Streaming-like experience. In order to obtain a smooth client-side remix playback you (actually your client, this is a subset of “bad client software”) will have to download most of the needed source content first. I often have a bad experience with playing-while-downloading of individual songs and videos over the net, never mind many coordinated sources.

I suspect that with excellent client software the client-side remix experience could be very good. Lack of expressivity seems like the toughest hurdle to me. However, if said excellent client software can download and run code safely … effectlets?

Video games seem like a highly constrained example of what client-side remixing could do. They pull off co-ordinating lots of different source media (sometimes all local, but that’s beside the point) with code quite well, versus hardcoding different sources into a single stream at the point of production.

However, using client-side remixing anytime in the near future to evade those who would prevent distribution of The Grey Album and the like is pointless. Client-side remixing isn’t up to the task, and you can still download the album from the web after weeks of brouhaha, never mind P2P networks.

Memory augmentation: cc-metadata client-side remixing [1] [2]

REGISTER NOW. IT’S FREE AND IT’S REQUIRED.

Thursday, February 26th, 2004

Experimenting with vote links:

Goodbye, WP, join the LAT in the infinite unread bin.