Post Programming

Darkfox

Tuesday, December 27th, 2005

I hate to write about software that could be vaporware, but AllPeers (via Asa Dotzler) looks like a seriously interesting darknet/media sharing/BitTorrent/and more Firefox extension.

It’s sad, but simply sending a file between computers with no shared authority nor intermediary (e.g., web or ftp server) is still a hassle. IM transfers often fail in my experience, traditional filesharing programs are too heavyweight and are configured to connect to and share with any available host, and previous attempts at clients (e.g., ) were not production quality. Merely solving this problem would make AllPeers very cool.

Assuming AllPeers proves a useful mechanism for sharing media, perhaps it could also become a lightnet bridge, as a Firefox extension.

Do check out AllPeers CTO Matthew Gertner’s musings on the AllPeers blog. I don’t agree with everything he writes, but his is a very well informed and well written take on open source, open content, browser development and business models.

Songbird Media Player looks to be another compelling application built on the Mozilla platform (though run as a separate program rather than as a Firefox extension), to be released real soon now. 2006 should be another banner year for Firefox and Mozilla technology generally.

Lucas Gonze’s original lightnet post is now near the top of results for ‘lightnet’ on Google, Yahoo!, and MSN, and related followups fill up much of the next few dozen results, having displaced most of the new age and lighting sites that use the same term.

Machine learning patterns

Sunday, November 27th, 2005

I first heard of the Silicon Valley Patterns meetings from Alex Chafee a few years ago while participating in his “bootstrap” practice group. SVP sounded like fun, but I only got around to attending a meeting this spring, a one-off on digital identity led by Johannes Ernst (notes). I was going to write something about that meeting, but just can’t get worked up about digital identity.

SVP’s next extended track was on machine learning, a topic I have some interest in and very cursory knowledge of from reading popular books on AI. The track lasted from May through October. Mostly our study was guided by Andrew Moore’s statistical data mining tutorials, with occasional reference to Russell & Norvig.

I don’t think any of the regular attendees were machine learning experts, but with occasional contributions from everyone, I think everyone was able to increase their knowledge of the material. Overall a gratifying method of learning, though not a perfect substitute for lecture.

My secondary takeaway from the track was that I need a serious brush-up on calculus and statistics, neither of which I’ve studied, and barely used, in fifteen years. I’m working on that.

The current SVP track should be very different: hands-on Ruby on Rails practice. I’m attempting to justify putting in the time…

Lucene red handed

Thursday, August 25th, 2005

A review of Lucene in Action posted on Slashdot yesterday reminded me to make this post. I read the book in March shortly before giving a related talk at Etech in order to avoid sounding too stupid.

Lucene in Action is very well written. I liked the presentation of code samples as unit tests and found almost no fluff. If you don’t have a background in information retrieval (I don’t), I think you’ll enjoy this book for the background information on IR that is thoroughly integrated with the text, even if you have no plans to use Lucene (though you’ll obtain an itch to use Lucene, it’s so simple and powerful).

One non-technical comment I made about Lucene in the Etech talk is that it may be another open source category killer. As Apache eliminated much of the opportunity to sell HTTP servers, I suspect Lucene will eliminate much of the opportunity to sell embedded search libraries (a market which seems somewhat significant, judging by the quantity of ads for such libraries in programming magazines).

Agriculture

Wednesday, August 3rd, 2005

It’s possible to explore too little, only producing, getting stuck in a productive rut, a local maximum, eventually obsolete. I’m too often stuck in the opposite rut–only exploring, not getting anything done. So I take vicarious pleasure in reading that Bryn Keller has settled down and chosen a language. Typically, I subscribed to Keller’s blog about six months ago while looking at , a very pragmatic and nice JVM hosted language about which Keller has written several times.

Keller chose Haskell, which is doubtless a good choice, though it sounds like he doesn’t have any outside constraints:

The thing is, I’ve noticed that the code I write in Haskell is usually more elegant than the code I write in other languages, and since this is my time, I can choose what’s important. Crisp, elegant code is important to me.

These days what little programming I get to is mostly Python (largely because Nathan brings Python expertise to my employer, and Python is acceptable), Tcl (two legacy codebases), and PHP and Java (because they’re impossible to avoid).

I don’t think I can resolve to settle down, but I do resolve to retire those Tcl codebases, real soon now (nothing against Tcl; I’ve grown a bit fond of it over the years).

Where is server side JavaScript?

Thursday, July 7th, 2005

Nearly a decade ago Netscape released Enterprise Server 2.0 with LiveWire, their name for JavaScript used as a server side web scripting language, much as PHP is most commonly used today. LiveWire was extremely buggy, but Netscape was golden in large organizations, so I had the opportunity to develop or support development of several large web applications written in LiveWire. The world’s buggiest webmail client ever was a component of one of them.

Thankfully Netscape’s server products faded over the next few years. As far as I know LiveWire is almost completely forgotten.

The only uses of server side JavaScript that I’m aware of today are in Helma Object Publisher and as an alternative scripting language for Active Server Pages (though I understand the vast majority of ASP development uses VBScript). Some Java-based web applications may embed the Rhino JavaScript engine (that’s what the Helma framework does, prominently).

I’m mildly surprised that server side JavaScript isn’t more popular, given the opportunity for sharing skills and code with client side JavaScript. Data validation is one obvious opportunity for the same code executed on both the web browser and server; however, the one that prompted me to write this post is web services. Suppose you want to offer a “web service” that can be consumed directly by browsers, i.e., a JavaScript application without a UI. You also want to offer approximately the same service as a traditional web service for consumption by non-JavaScript clients, and you don’t want to write much of the same code twice.
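
To make the duplication concrete, here is a minimal sketch in Python (what I mostly reach for on the server these days) of the same validation rule maintained twice: once as JavaScript shipped to the browser, once re-implemented for the server, which cannot trust the client. A server side JavaScript engine would let a single copy play both roles. All names below are made up for illustration.

```python
# Minimal sketch of the duplication problem, not any particular framework:
# the same "is this a valid username?" rule exists twice, once as JavaScript
# sent to the browser and once in Python for the server. Server side
# JavaScript would let one copy serve both. Names are hypothetical.
import re

# Copy #1: JavaScript shipped to the client for instant feedback.
CLIENT_VALIDATION_JS = """
function isValidUsername(name) {
    return /^[a-z][a-z0-9_]{2,15}$/.test(name);
}
"""

# Copy #2: the same rule, re-implemented for the server, which must not
# trust the client. Keeping the two in sync is the maintenance burden.
USERNAME_RE = re.compile(r"^[a-z][a-z0-9_]{2,15}$")

def is_valid_username(name):
    return bool(USERNAME_RE.match(name))

def handle_signup(form):
    """Hypothetical request handler: server side re-validation."""
    name = form.get("username", "")
    if not is_valid_username(name):
        return 400, "invalid username"
    return 200, "ok"

if __name__ == "__main__":
    print(handle_signup({"username": "goodname"}))    # (200, 'ok')
    print(handle_signup({"username": "Not Valid!"}))  # (400, 'invalid username')
```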

The only page I could find about sharing JavaScript code between client and server applications is a terse article on Shared Sides JavaScript.

So why hasn’t JavaScript seen more use on the server side? Possibilities:

  • JavaScript is unfairly looked down upon by server developers. (The success of PHP is a counterexample.)
  • Client side JavaScript is typically spaghetti code written by designers. The cost of sharing code between client and server applications in this context is too high.
  • No obvious way to deploy JavaScript code in Apache. There was work on mod_javascript/mod_js in the late nineties, but I see no evidence it went anywhere.
  • It’s easier for developers to handle different languages on the client and server. In my experience with LiveWire last decade I did encounter a few developers unclear on the concept that some JavaScript executed on the server, some on the client.

Perhaps the recent hype around AJAX will attract a critical mass of good programmers to JavaScript, some of whom will want to reuse well structured code on the server, leading to server side JavaScript’s first renaissance.

Pre-posting update: As I was about to post this Brian Heung told me about TrimJunction, which has more or less the motivation I was thinking of:

With an eye towards Don’t Repeat Yourself, spiced up with a little bit of Ajax, the grand vision of the TrimPath Junction project is to be able to write web application logic just once, in familiar languages like JavaScript.

A Junction-based web application should run, validate input, process data, generate output and do its thing on both the server AND the client. We intend to use the Rhino JavaScript runtime for the server and use your favorite, modern web browser for the client.

Check out the Tales on TrimPath blog for some interesting JavaScript ideas. The Junction announcement is here, two months old.

Update 20050716: OpenMocha is ready for a spin:

The goal of OpenMocha is to maximize the fun and productivity of Javascript development by blending the gap between browser and server based scripting.

SemWeb, AI, Java: The Ontological Parallels

Friday, March 18th, 2005

“The Semantic Web: Promising Future or Utter Failure”, the panel I took part in at SXSW, shed little light on the topic. Each panelist (including me) brought their own idiosyncratic views to bear and largely talked past each other. The overall SXSW Interactive crowd seemed to tend toward web designers and web marketers; I’m not sure about the audience for this panel. Some people, e.g., Chet Campbell, and others in person, apparently left with the impression that all of the panelists agreed that the semantic web is an utter failure (not my view at all).

Sam Felder and Josh Knowles have posted loose transcripts and Christian Bradford a photo of the panel.

The approximate (with links and a few small corrections) text of my introductory statement follows. I got a few laughs.

I want to draw some parallels between semantic web technologies and artificial intelligence and between semantic web technologies and Java.

AI was going to produce intelligent machines. It didn’t and since the late 80s we’ve been in an “AI winter.” That’s nearly twenty years, so web people who suffered and whined in 2001-3, your cup is more than half full. Anyway since then AI techniques have been used in all sorts of products, but once deployed the technology isn’t seen as AI. I mean, where are the conscious robots?

Semantic web technologies have a shorter history, but may play out similarly: widely used but not recognized as such. Machine “agents” aren’t inferring a perfect date for me from my FOAF profile. Or something. This problem is magnified because there’s a loose connection between semantic web grand visions and AI. People work on both at MIT after all.

Now Java. Applets were going to revolutionize the web. In 1996! Applets didn’t work very well, but lots of people learned Java and it turns out Java is a pretty good solution on the server side. Java is hugely successful as the 21st century’s COBOL. Need some “business logic?” You won’t get fired for implementing it in Java, preferably using JDBC, JSP, JMX, JMS, EJB, JAXB, JDO and other buzzword-compliant APIs.

Semantic web technologies may be following a similar path. Utter failure to live up to initial hype in a sexy domain, but succeeding in the enterprise where the money is anyway. I haven’t heard anyone utter the word enterprise at this conference, so I won’t repeat it.

It turns out that semantic web technologies are really useful for data integration when you have heterogeneous data, as many people do these days. Just one example: Oracle will support a “Network Data Model” in the next release of their database. That may sound like a throwback if you know database history, but it basically means explicit support for storing and querying graphs, which are the data model of RDF and the semantic web.

If you talk to a few of the people trying to build intelligent machines today, who may use the term Artificial General Intelligence to distinguish themselves from AI, you may get a feeling that AI research hasn’t really moved us toward the goal of building an AGI.

Despite Java’s success on the server it is no closer to being important on the web client than it was in 1996. It is probably further. If what you care about is sexy web browser deployment, all Java’s server success has accomplished is to keep the language alive.

Semantic web technologies may be different. Usefulness in behind-the-scenes data integration may help these technologies gain traction on the web. Why? Because for someone trying to make use of data on the web, the web is one huge heterogeneous data integration problem.

An example of a project that uses RDF for data integration that you can see is mSpace. You can read a paper about how they use RDF inside the application, but it won’t be obvious to an end user that mSpace is a semantic web technologies application, and that’s as it should be.

One interesting thing about mSpace is that they’re using a classical music ontology developed by someone else and found on SchemaWeb. SchemaWeb is a good place to look for semantic web schemas that can be reused in your project. Similarly, rdfdata.org is a good place to look for RDF datasets to reuse. There are dozens of schemas and datasets listed on these sites contributed by people and organizations around the world, covering beer, wine, vegetarian food, and lots of stuff you don’t put in your mouth.

I intended to close my statement with a preemption of the claim that use of semantic web technologies mandates hashing everything out in committees before deployment (wrong), but I trailed off with something I don’t recall. The committee myth came up again during the discussion anyway.

Perhaps I should’ve stolen Eric Miller’s The Semantic Web is Here slides.

Bitcollider-PHP

Saturday, March 5th, 2005

Here you’ll find a little PHP API that wraps the single file metadata extraction feature of Bitzi’s bitcollider tool. Bitcollider also can submit file metadata to Bitzi. This PHP API doesn’t provide access to the submission feature.

Other possibly useful code provided with Bitcollider-PHP:

  • function hex_to_base_32($hex) converts hexadecimal input to Base32.
  • function magnetlink($sha1, $filename, $fileurl, $treetiger, $kzhash) returns a MAGNET link for the provided input (see the sketch after this list).
  • magnetlink.php [file ...] is a command line utility that outputs MAGNET links for the files specified, using the bitcollider if available (if it is not available, kzhash and tree:tiger hashes are not included in the MAGNET links).
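
For a sense of the output, here is a rough Python analogue of the magnetlink() helper: it illustrates one common MAGNET link layout rather than reproducing Bitcollider-PHP’s exact output, and the example filename at the bottom is made up.

```python
# Rough Python analogue of the PHP magnetlink() helper described above;
# an illustration of the MAGNET link format, not the library's exact output.
import base64
import hashlib
import urllib.parse

def sha1_base32(path):
    """Base32-encoded SHA-1 of a file, as used in urn:sha1: identifiers."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return base64.b32encode(h.digest()).decode("ascii")

def magnet_link(sha1_b32, filename="", fileurl="", tree_tiger="", kzhash=""):
    """Assemble a MAGNET link; hashes that were not computed are simply omitted."""
    parts = ["xt=urn:sha1:" + sha1_b32]
    if tree_tiger:
        parts.append("xt=urn:tree:tiger:" + tree_tiger)
    if kzhash:
        parts.append("xt=urn:kzhash:" + kzhash)
    if filename:
        parts.append("dn=" + urllib.parse.quote(filename))
    if fileurl:
        parts.append("xs=" + urllib.parse.quote(fileurl, safe=":/"))
    return "magnet:?" + "&".join(parts)

if __name__ == "__main__":
    # "example.ogg" is a placeholder; point this at any local file.
    print(magnet_link(sha1_base32("example.ogg"), filename="example.ogg"))
```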

Versions of this code are deployed on a few sites in service of producing MAGNET links or urn:sha1: identifiers for RDF along these lines, both in the case of CC Mixter.

Criticism welcome.

Technorati DeepCosmos

Saturday, March 5th, 2005

Late last year I requested that some blog aggregator give some indication of the existence of indirect blog post citations, i.e., a blog thread. Adam Hertz suggested that this could be done using Technorati’s API.

I whipped up a crummy implementation the following weekend and contributed a small technorati.py patch along the way. I decided I’m not getting around to producing a non-crummy version, so here it is:

If you attempt to use the DeepCosmos demo the first thing to note is that you need to obtain and use your own Technorati API Key. Check out the examples above if you just want to see what the output looks like.
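
The idea, roughly: query the cosmos of a post, then the cosmos of each citing post, to surface indirect citations. A sketch of that traversal in Python follows; fetch_cosmos() is a hypothetical stand-in for a technorati.py Cosmos query (the real API requires a key), and only the walk itself is the point.

```python
# Sketch of the DeepCosmos idea: follow Cosmos results recursively to surface
# indirect citations of a post. fetch_cosmos() is a hypothetical placeholder
# for a technorati.py Cosmos query; only the traversal logic is shown.
from collections import deque

def fetch_cosmos(url):
    """Return URLs of posts that link to `url` (placeholder)."""
    raise NotImplementedError("wire this to a Cosmos-style link search")

def deep_cosmos(seed_url, max_depth=2):
    """Map each citing URL to its depth (1 = direct link, 2 = link to a link)."""
    seen = {seed_url: 0}
    queue = deque([(seed_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for citing in fetch_cosmos(url):
            if citing not in seen:          # avoid cycles and duplicates
                seen[citing] = depth + 1
                queue.append((citing, depth + 1))
    del seen[seed_url]
    return seen
```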

I haven’t used this much since I wrote it. My request still stands. I’d use the information all the time if integrated into the output of Technorati, Bloglines, Rojo or similar.

CodeCon Sunday

Tuesday, February 15th, 2005

I say CodeCon was 3/4 (one abstention) on Sunday.

Wheat. An environment (including a language) for developing web applications. Objects are arranged in a tree with some filesystem-like semantics. Every object has a URL (not necessarily in a public portion of the tree). Wheat‘s web object publishing model and templating seem clearly reminiscent of Zope. In response to the first of several mostly redundant questions regarding Wheat and Zope, Mark Lentczner said that he used Zope a few years ago and was discouraged by the need to use external scripts and the lack of model-view separation in templates (I suspect Mark used DTML — Wheat’s TinyTemplates reminded me of DTML’s replacement, Zope Page Templates, currently my favorite and implemented in several languages). I’m not sure Wheat is an environment I’d like to develop in, but I suspect the world might learn something from pure implementations of URL-object identity (not just mapping) and a web domain specific language/environment (I understand that Wheat has no non-web interface). Much of the talk used these slides.

Incoherence. I find it hard to believe that nobody has done exactly this audio visualization method before (x = left/right, y = frequency, point intensity and size = volume), but as an audio ignoramus I’ll take the Incoherence team’s word. I second Wes Felter’s take: “I learned more about stereo during that talk than in the rest of my life.”
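
The mapping as I understood it, sketched with numpy (this is not the Incoherence code, just an approximation of the idea): each FFT bin becomes a point whose x position comes from the left/right balance, y from the bin’s frequency, and intensity from the combined magnitude.

```python
# Approximation of the visualization mapping described above, not the
# Incoherence implementation: x = left/right balance, y = frequency,
# intensity = combined magnitude, for one window of stereo samples.
import numpy as np

def stereo_frame_to_points(left, right, sample_rate=44100):
    """Return (x, y, intensity) arrays for one window of stereo samples."""
    window = np.hanning(len(left))
    l_mag = np.abs(np.fft.rfft(left * window))
    r_mag = np.abs(np.fft.rfft(right * window))
    total = l_mag + r_mag + 1e-12             # avoid division by zero
    x = (r_mag - l_mag) / total               # -1 = hard left, +1 = hard right
    y = np.fft.rfftfreq(len(left), d=1.0 / sample_rate)  # Hz
    intensity = total                         # louder bins drawn brighter/bigger
    return x, y, intensity

if __name__ == "__main__":
    t = np.arange(2048) / 44100.0
    tone = np.sin(2 * np.pi * 440 * t)
    # A 440 Hz tone panned mostly right should give positive x at y near 440 Hz.
    x, y, intensity = stereo_frame_to_points(0.2 * tone, 0.8 * tone)
    peak = np.argmax(intensity)
    print(x[peak], y[peak])
```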

i-Brokers. This is where XNS landed and where it might go. However, the presentation barely mentioned technology and left far more questions than answers. There was talk of Zooko’s Triangle (“Names: Decentralized, Secure, Human-Memorizable: Choose Two”). 2idi and idcommons seem to have chosen the last two, temporarily. It isn’t clear to me why they brought it up, as i-names will be semi-decentralized (like DNS). In theory i-names provide privacy (you provide only your i-name to an i-name enabled site, always logging in via your i-broker, and access to your data is provided through your i-broker — never enter your password or credit card anywhere else — you set the policies for who can access your data) and persistence (keep an i-name for life, and i-names may be transparently aliased or gatewayed should you obtain others). These benefits, if they exist in the future, are subtler than the claims. Having sites access your data via a broker rather than via you typing it in does little to protect your privacy by itself. You make a decision in both cases whether you want a site to have your credit card number. Once the site has your credit card… Possibly over the long term if lots of people and sites adopt i-names sites will collect or keep less personal information. Users, via their i-brokers, may be on more equal terms with sites, as i-broker access will presumably be governed by some you-have-no-rights-at-all terms of service. Some sites may decide (for new applications) they don’t want to have to worry about the security of customer information and access the same via customers’ i-names. However, once a user has provided their i-broker with lots of personal information, it becomes easy for sites to ask for it all. Persistence is also behavioral. Domain names and URLs can last a long time; good ones don’t change. Similarly an i-name will go away if the owner stops paying for it. Can the i-name ecology be structured so that i-names tend to be longer lived than domain names or URLs? Probably, but that’s a different story. In the short term 2idi is attempting to get adoption in the convention registration market. Good luck, but I wish Fen and Victor had spent their time talking about XRI resolution or other code behind the 2idi broker.

SciTools. A collection of free-to-use web applications for genetic design and analysis. Integrated DNA Technologies, the company that offers SciTools, makes its money selling (physical) synthesized nucleic acids. I was a cold, tired bio ignoramus, so have little clue whether this is novel. (Ted Leung seems to think so and also has interesting things to say about the other presentations.)

OzymanDNS. DNS can route and move data, and is deployed and not filtered everywhere, so with a little cleverness we can tunnel arbitrary streams over DNS. Dan Kaminsky is clearly the crowd pleaser, not only for his showmanship but also for the audacity of his hacks (streaming anime over DNS this time). More than a few in the crowd wanted to put DNS hacks to work, e.g., on aspects of supposed syndication problems. PPT slides of an older version of the talk.
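
Not Kaminsky’s code, but the core trick is easy to sketch: arbitrary bytes are Base32-encoded and packed into DNS labels under a domain whose authoritative server you control, so each chunk travels inside an ordinary-looking query. The domain and chunk sizes below are placeholders.

```python
# Core idea behind DNS tunneling, sketched; not OzymanDNS itself. Data is
# Base32-encoded and split into DNS labels under a domain you control
# (tunnel.example.com is a placeholder), so each chunk rides in a query.
import base64

MAX_LABEL = 63          # DNS limit on a single label
LABELS_PER_NAME = 3     # keep names comfortably under the 255-byte limit

def encode_queries(data, domain="tunnel.example.com"):
    """Yield DNS names that together carry `data` upstream."""
    b32 = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    labels = [b32[i:i + MAX_LABEL] for i in range(0, len(b32), MAX_LABEL)]
    for seq, start in enumerate(range(0, len(labels), LABELS_PER_NAME)):
        chunk = labels[start:start + LABELS_PER_NAME]
        yield ".".join([str(seq)] + chunk + [domain])

if __name__ == "__main__":
    for name in encode_queries(b"hello, DNS tunnel" * 10):
        print(name)  # hand each name to any resolver; the far end reassembles
```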

Yesterday.

CodeCon Saturday

Sunday, February 13th, 2005

CodeCon is 5/5 today.

The Ultra Gleeper. A personal web page recommendation system. Promise of collaborative filtering unfulfilled, in the dark ages since Firefly was acquired and shut down in the mid-90s. The presenter believes we’re about to experience a renaissance in recommendation systems, citing Audioscrobbler recommendations (I would link to mine, but personal recommendations seem to have disappeared since the last time I looked; my audioscrobbler page) as a useful example (I have found no automated music recommendation system useful) and blogs as a use case for recommendations (I have far too much very high quality manually discovered reading material, including blogs, to desire automated recommendations for more, and I don’t see collaborative filtering as a useful means of prioritizing my lists). The Ultra Gleeper crawls pages you link to (treating links as positive ratings) and pages that link to you (via Technorati CosmosQuery and the Google API), and presents suggested pages to rate in a web interface. It uses a number of tricks to avoid showing obvious recommendations (it does not recommend pages that are too popular) and pages you’ve already seen (including those linked to in feeds you subscribe to). Some problems faced by typical recommendation systems (new users get crummy recommendations until they enter lots of data, early adopters get doubly crummy recommendations due to lack of existing data to correlate with) are obviated by bootstrapping from data in your posts and subscriptions. I suppose if lots of people run something like the Gleeper, robot traffic increases and more people complain about syndication bandwidth-like problems (I’m skeptical about this being a major problem). I don’t see lots of people running Gleepers, as automated recommendation systems are still fairly useless and will remain so for a long time. Interesting software and presentation nonetheless.
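
A toy rendering of the filtering heuristics as I noted them, not The Ultra Gleeper’s actual code: links from pages you like count as votes, and already-seen or too-popular pages are dropped before ranking. The threshold and data shapes are invented for illustration.

```python
# Toy version of the recommendation filtering described above; not the
# Gleeper's code. Candidates come from crawling; links from pages I like
# count as votes; seen and too-popular pages are dropped before ranking.
POPULARITY_CEILING = 1000   # arbitrary stand-in for "too obvious to recommend"

def recommend(candidates, my_links, seen, inbound_count, limit=10):
    """candidates: {url: set of liked pages linking to it};
    my_links/seen: sets of URLs; inbound_count: {url: rough popularity}."""
    scored = []
    for url, voters in candidates.items():
        if url in seen or url in my_links:
            continue                                  # already encountered
        if inbound_count.get(url, 0) > POPULARITY_CEILING:
            continue                                  # everyone knows it already
        scored.append((len(voters), url))             # more votes, higher rank
    scored.sort(reverse=True)
    return [url for _, url in scored[:limit]]
```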

H2O. Primarily a discussion system tuned to facilitate professor-assigned discussions. Posts may be embargoed, and the professor may assign course participants specific messages or other participants to respond to. Discussions may include participants from multiple courses, e.g., to facilitate an MIT engineering-Harvard law exchange. Anyone may register at H2O and create their own group, acting as professor for the created group. Some of the constraints that may be imposed by H2O are often raised in mailing list meta discussions following flame wars, in particular posting delays. I dislike web forums but may have to try H2O out. Another aspect of H2O is syllabus management and sharing, which is interesting largely because syllabi are typically well hidden. Professors in the same school of the same university may not be aware of what each other are teaching.

Jakarta Feedparser. Kevin Burton gave a good overview of syndication and related standards and the many challenges of dealing with feeds in the wild, which are broken in every conceivable way. Claims SAX (event) based Jakarta FeedParser is an order of magnitude faster than DOM (tree) based parsers. Nothing new to me, but very useful code.
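
To illustrate why event-based parsing is so much lighter than tree building, here is a toy SAX handler using Python’s standard library (not the Jakarta FeedParser itself) that pulls item titles out of a minimal RSS document without ever holding a full tree in memory; it ignores all the real-world brokenness Burton described.

```python
# Event (SAX) style feed handling sketched with the Python stdlib; not the
# Java Jakarta FeedParser. Elements stream past as events, so no document
# tree is ever built. Handles only a toy RSS subset.
import xml.sax

class ItemTitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_item = False
        self.in_title = False
        self.buffer = []
        self.titles = []

    def startElement(self, name, attrs):
        if name == "item":
            self.in_item = True
        elif name == "title" and self.in_item:
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title" and self.in_title:
            self.titles.append("".join(self.buffer))
            self.in_title = False
        elif name == "item":
            self.in_item = False

if __name__ == "__main__":
    rss = b"""<rss><channel>
      <item><title>First post</title></item>
      <item><title>Second post</title></item>
    </channel></rss>"""
    handler = ItemTitleHandler()
    xml.sax.parseString(rss, handler)
    print(handler.titles)  # ['First post', 'Second post']
```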

MAPPR. Uses Flickr tags and GNS to divine the geographic location of photos. REST web services modeled on Flickr’s own. Flash front end, which you could spend many hours playing with.

Photospace. Personal image annotation and search service, focus on geolocation. Functionality available as a library, web front end provided. Photospace publishes RDF which may be consumed by RDFMapper.

Note above two personal web applications that crawl or use services of other sites (The Ultra Gleeper is the stronger example of this). I bet we’ll see many more of increasing sophistication enabled by ready and easily deployable software infrastructure like Jakarta FeedParser, Lucene, SQLite and many others. A personal social networking application is an obvious candidate. Add in user hosted or controlled authentication (e.g., LID, perhaps idcommons) …

Yesterday.