Archive for the ‘Programming’ Category

CodeCon Sunday

Monday, February 13th, 2006

Dido. I think this provides AGI, or a way to script voice response systems using and a voice template system analogous to scripting and HTML templates for web servers, though questioners focused on a controversial feature to reorder menus based on popularity. The demo didn’t really work, except as a demonstration of everyone’s frustration with IVRs, as an audience member pointed out.

Deme. Kitchen sink collaboration web app. They aren’t done putting dishes in the sink. They’re thinking about taking all of the dishes out of the sink, replacing the sink, and putting the dishes back in (PHP to something cooler). Let’s vote on what kind of vote to put this to.

Monotone. Elegant distributed revision control system; uses SHA1 hashes to identify files and repository states. Hash of previous repository state included in current repository state, making lineage cryptographically provable. used to quickly determine file level differences between repositories (for sync). Storage and (especially) merge and diff are loosely coupled. Presentation didn’t cover day to day use, probably a good decision in terms of interestingness. The revision control presentations have been some of the best every year at CodeCon. They should consider having two or three next year. Monotone may be the only project presented this year that had a Wikipedia article before the conference.
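A minimal sketch of the hash-chaining idea described above, not Monotone's actual data format: each state id is the SHA1 of the parent state id plus a manifest of file hashes, so a state id commits to its whole lineage. The names `sha1_hex` and `state_id` and the manifest layout are assumptions made for illustration.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 digest of data as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()


def state_id(parent_id: str, files: dict) -> str:
    """Hash a repository state (hypothetical layout, for illustration only).

    The parent state id is part of the hashed text, so changing any
    ancestor changes every descendant id.
    """
    manifest = "".join(
        f"{path} {sha1_hex(content)}\n" for path, content in sorted(files.items())
    )
    return sha1_hex((f"parent {parent_id}\n" + manifest).encode("utf-8"))


# Two states in a row: the second id depends on the first.
root = state_id("", {"README": b"hello\n"})
child = state_id(root, {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"})
```

Two repositories that agree on a state id therefore agree on the full history leading to it, assuming SHA1 collisions are not in play.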

Rhizome. Unlike Gordon (and perhaps most people), hearing the triplet doesn’t make my eyes glaze over, but I’m afraid this presentation did. Some of the underlying code ( etc) might be interesting, but was the second to last presentation, and the top level project, Rhizome, amounts to yet another idiosyncratic , with the idiosyncratic dial turned way up.

Elkhound/Elsa/Oink/Cqual++. Parser generator that handles ambiguous grammars in a straightforward manner, C++ parser and tools built on top of same. Can find with a reasonable false positive rate. Expressed confidence that future work would lead to the compiler catching far more bugs than usually thought possible (as opposed to only at runtime). Cool and important stuff, too bad I only grok it at a high level. Co-presenter Dan Wilkerson (and sole presenter on Saturday of Delta) is with the Open Source Quality Project at UC Berkeley.

Saturday
Sunday 2005

CodeCon Saturday

Sunday, February 12th, 2006

Delta. Arbitrarily large codebase triggers specific bug. Run delta, which attempts to provide you with only the code that triggers the bug (usually a page or so, no matter the size of the codebase) via a like algorithm (the evaluation function requires triggering the bug and considers code size). Sounds like a big productivity and quality booster where it can be used.
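A rough sketch of the general idea, assuming a simple greedy chunk-removal strategy rather than whatever search Delta actually uses: keep deleting chunks of lines as long as the bug still triggers, and shrink the chunk size when nothing more can be removed. `minimize` and `triggers_bug` are hypothetical names; in practice the predicate would compile or run the candidate and check for the failure.

```python
def minimize(lines, triggers_bug):
    """Greedily remove chunks of lines while triggers_bug(candidate) stays true."""
    lines = list(lines)
    chunk = max(len(lines) // 2, 1)
    while True:
        i = 0
        shrunk = False
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if triggers_bug(candidate):
                lines = candidate  # chunk was irrelevant; keep it removed
                shrunk = True
            else:
                i += chunk  # chunk is needed; move past it
        if chunk > 1:
            chunk //= 2  # retry with finer granularity
        elif not shrunk:
            return lines  # no single line can be removed any more


# Toy example: the "bug" needs both the definition of a and the division.
program = ["a = 1", "b = 2", "c = a / 0", "d = 4", "e = 5", "f = 6"]
reduced = minimize(program, lambda ls: "a = 1" in ls and "c = a / 0" in ls)
```

The result is only locally minimal (no single remaining line can be dropped), which is usually good enough for a bug report.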

Djinni. Framework for approximation of problems, supposedly faster and easier to use than more academic oriented approximation frameworks. An improved simulated annealing algorithm is or will be in the mix, including an analog of “pressure” in . Super annoying presentation style. Thank you for letting us know that CodeCon is where the rubber meets the road.

iGlance. Instant Messaging with audio and video, consistent with the IM metaphor (recipient immediately hears and sees initiator) rather than telephone metaphor (recipient must pick up call). Very low bitrate video buddy lists. Screen and window sharing with single control and dual pointers so that remote user can effectively point over your shoulder. Impressive for what seems to be a one person spare time project. Uses OVPL and OVLPL licenses, very similar to GPL and LGPL, but apparently easier to handle contributor agreements, so project owner can move code between application and library layers. Why not just make the entire application ?

Overlay Anycast Service InfraStructure. Locality-aware server selection (to be) used by , easy to implement for your service. Network locality correlates highly with geographic locality due to the speed of light bound. Obvious, but the graph was neat. OpenDHT was also mentioned, another hosted service. OpenDHT clients can use OASIS to find a gateway. Super easy to play with a with around 200 nodes. Someone has built fileshare using OpenDHT, see Octopod. As Wes Felter says, this stuff really needs to be moved to a non-research network.

Query By Example. Find and rank rows [dis]similar to others in SQL using extension for , which uses a for classification (last is not visible to user). Sounds great for data mining engagements.

Friday
Saturday 2005

CodeCon Friday

Saturday, February 11th, 2006

This year Gordon Mohr had the devious idea to do preemptive reviews of CodeCon presentations. I’ll probably link to his entries and have less to say here than last year.

Daylight Fraud Prevention. I missed most of this presentation but it seems they have a set of non-open source Apache modules each of which could make phishers and malware creators work slightly harder.

SiteAdvisor. Tests a website’s evilness by downloading and running software offered by the site and filling out forms requesting an email address on the site. If virtual Windows machine running downloaded software becomes infected or email address set up for test is inundated with spam the site is considered evil. This testing is mostly automated and expensive (many Windows licenses). Great idea, surprising it is new (to me). I wonder how accurate evil readings one could obtain at much lower cost by calculating a “SpamRank” for sites based on links found in email classified as spam and links found on pages linked to in spams? (A paper has already taken the name SpamRank, though at a five second glance it looks to propose tweaks to make PageRank more spam-resistant rather than trying to measure evil.) Fortunately SiteAdvisor says that both bitzi.com and creativecommons.org are safe to use. SiteAdvisor’s data is available for use under the most restrictive Creative Commons license — Attribution-NonCommercial-NoDerivs 2.5.

VidTorrent/Peers. Streaming joke. Peers, described as a “toolkit for P2P programming with continuation passing style” I gather works syntactically as a Python code preprocessor, could be interesting. I wish they had compared Peers to other P2P toolkits, e.g., .

Localhost. A global directory shared with a modified version of the BitTorrent client. I tried about a month ago. Performance was somewhere between abysmal and nonexistent. BitTorrent is fantastic for large popular files. I’ll be surprised if localhost’s performance, which depends on transferring small XML files, ever reaches mediocrity. They’re definitely going away from BitTorrent’s strengths by uploading websites into the global directory as lots of small files (I gather). The idea of a global directory is interesting, though tags seem a more fruitful navigation method than localhost’s hierarchy.

Truman. A “sandnet” for investigating suspected malware in. Faux services (e.g., DNS, websites) can be scripted to elicit the suspected malware’s behavior, and more.

Alexa Grapher

Wednesday, January 18th, 2006

Many people look at to see how the Alexa traffic rank of a site of interest is faring (usually not well — there are always more websites, so even maintaining ordinal rank is an uphill battle). People who don’t twiddle URLs out of habit probably don’t realize that Alexa can be asked to graph data back to late 2001 or that graphs may be arbitrarily sized.

I’d been meaning to put together an Alexa graph generating utility for months (well, one more accessible than URL editing, which I’ve always used, e.g., the graphs at the bottom of a post on blog search) and I finally got started last night.

Funny thing then that I read via Brad Neuberg that Joe Walker just published an “Ajax” Alexa grapher, so I guess I’ll just publish my own ultra-crufty Alexa grapher rather than cleaning it up first. I’m not sure what is Ajaxian about Walker’s (haven’t looked), but mine doesn’t qualify, I think — it is just plain old javascript, with the graph updated by setting innerHTML. No Async XML communication with a server.

I was going to write up a bunch of caveats about how Alexa graphs should be interpreted, but in the interest of completing this post, I’ll just point out one oddity I discovered — the url parameter of Alexa’s traffic detail page (click on the graph on my Alexa grapher to get to such a page) must be the last querystring parameter, otherwise every parameter after it gets interpreted as being part of the url parameter. Some kind of odd URL parsing going on there at Alexa. (Never mind that they really want a domain name, not a URL, for that parameter.)

CodeCon 2006 Program

Thursday, January 12th, 2006

The 2006 program has been announced and it looks fantastic. I highly recommend attending if you’re near San Francisco Feb 10-12 and any sort of computer geek. There’s an unofficial CodeCon wiki.

My impressions of last year’s CodeCon: Friday, Saturday, and Sunday.

Via Wes Felter

XTech 2006 CFP deadline

Tuesday, January 3rd, 2006

I mentioned elsewhere that I’m on the program committee for XTech 2006, the leading web technology conference in Europe, to be held in Amsterdam May 16-19.

Presentation, tutorial and panel proposals are due in less than a week–January 9. If you’re building an extraordinary Web 2.0 application or doing research that Web 2.0 (very broadly construed) developers and entrepreneurs need to hear about, please consider submitting a proposal.

See the CFP and track descriptions.

Darkfox

Tuesday, December 27th, 2005

I hate to write about software that could be vaporware, but AllPeers (via Asa Dotzler) looks like a seriously interesting darknet/media sharing/BitTorrent/and more Firefox extension.

It’s sad, but simply sending a file between computers with no shared authority nor intermediary (e.g., web or ftp server) is still a hassle. IM transfers often fail in my experience, traditional filesharing programs are too heavyweight and are configured to connect to and share with any available host, and previous attempts at clients (e.g., ) were not production quality. Merely solving this problem would make AllPeers very cool.

Assuming AllPeers proves a useful mechanism for sharing media, perhaps it could also become a lightnet bridge– as a Firefox extension.

Do check out AllPeers CTO Matthew Gertner’s musings on the AllPeers blog. I don’t agree with everything he writes, but his is a very well informed and well written take on open source, open content, browser development and business models.

Songbird Media Player looks to be another compelling application built on the (though run as a separate program rather than as a Firefox extension), to be released real soon now. 2006 should be another banner year for Firefox and Mozilla technology generally.

Lucas Gonze’s original lightnet post is now near the top of results for ‘lightnet’ on Google, Yahoo!, and MSN, and related followups fill up much of the next few dozen results, having displaced most of the new age and lighting sites that use the same term.

Machine learning patterns

Sunday, November 27th, 2005

I first heard of the Silicon Valley Patterns meetings from Alex Chafee a few years ago while participating in his “bootstrap” practice group. SVP sounded like fun, but I only got around to attending a meeting this spring, a one-off on led by Johannes Ernst (notes). I was going to write something about that meeting, but just can’t get worked up about digital identity.

SVP’s next extended track was on machine learning, a topic I have some interest in and very cursory knowledge of from reading popular books on AI. The track lasted from May through October. Mostly our study was guided by Andrew Moore’s statistical data mining tutorials, with occasional reference to Russell & Norvig.

I don’t think any of the regular attendees were machine learning experts, but with occasional contributions from everyone, I think everyone was able to increase their knowledge of the material. Overall a gratifying method of learning, though not a perfect substitute for lecture.

My secondary takeaway from the track was that I need a serious brush up on calculus and statistics, neither of which I’ve studied, and barely used, in fifteen years. I’m working on that.

The current SVP track should be very different–hands on Ruby on Rails practice. I’m attempting to justify putting in the time…

Lucene red handed

Thursday, August 25th, 2005

A review of Lucene in Action posted on Slashdot yesterday reminded me to make this post. I read the book in March shortly before giving a related talk at Etech in order to avoid sounding too stupid.

Lucene in Action is very well written. I liked the presentation of code samples as and found almost no fluff. If you don’t have a background in information retrieval (I don’t) I think you’ll enjoy this book for the background information on IR that is thoroughly integrated with the text even if you have no plans to use Lucene (though you’ll obtain an itch to use Lucene, it’s so simple and powerful).

One non-technical comment I made about Lucene in the Etech talk is that it may be another open source . As eliminated much of the opportunity to sell HTTP servers, I suspect Lucene will eliminate much of the opportunity to sell embedded search libraries (which seems somewhat significant judging by the quantity of ads for same in programming magazines).

Agriculture

Wednesday, August 3rd, 2005

It’s possible to explore too little, only producing, getting stuck in a productive rut, a local maximum, eventually obsolete. I’m too often stuck in the opposite rut–only exploring, not getting anything done. So I take vicarious pleasure in reading that Bryn Keller has settled down and chosen a language. Typically, I subscribed to Keller’s blog about six months ago while looking at , a very pragmatic and nice JVM hosted language about which Keller has written several times.

Keller chose Haskell, which is doubtless a good choice, though it sounds like he doesn’t have any outside constraints:

The thing is, I’ve noticed that the code I write in Haskell is usually more elegant than the code I write in other languages, and since this is my time, I can choose what’s important. Crisp, elegant code is important to me.

These days what little programming I get to is mostly Python (largely because Nathan brings Python expertise to my employer, and Python is acceptable), Tcl (two legacy codebases), and PHP and Java (because they’re impossible to avoid).

I don’t think I can resolve to settle down, but I do resolve to retire those Tcl codebases, real soon now (nothing against Tcl; I’ve grown a bit fond of it over the years).

Where is server side JavaScript?

Thursday, July 7th, 2005

Nearly a decade ago Netscape released Enterprise Server 2.0 with LiveWire, their name for JavaScript used as a server side web scripting language as PHP is most commonly today. LiveWire was extremely buggy, but Netscape was golden in large organizations, so I had the opportunity to develop or support development of several large web applications written in LiveWire. The world’s buggiest webmail client ever was a component of one of them.

Thankfully Netscape’s server products faded over the next few years. As far as I know LiveWire is almost completely forgotten.

The only uses of server side JavaScript that I’m aware of today are in Helma Object Publisher and as an alternative scripting language for Active Server Pages (though I understand the vast majority of ASP development uses VBScript). Some Java-based web applications may embed the Rhino JavaScript engine (that’s what the Helma framework does, prominently).

I’m mildly surprised that server side JavaScript isn’t more popular, given the opportunity for sharing skills and code with client side JavaScript. Data validation is one obvious opportunity for the same code executed on both the web browser and server, however the one that prompted me to write this post is web services. Suppose you want to offer a “web service” that can be consumed directly by browsers, i.e., a JavaScript application without a UI. You also want to offer approximately the same service as a traditional web service for consumption by non-JavaScript clients, and you don’t want to write much of the same code twice.

The only page I could find about sharing JavaScript code between client and server applications is a terse article on Shared Sides JavaScript.

So why hasn’t JavaScript seen more use on the server side? Possibilities:

  • JavaScript is unfairly looked down upon by server developers. (The success of PHP is a counterexample.)
  • Client side JavaScript is typically spaghetti code written by designers. The cost of sharing code between client and server applications in this context is too high.
  • No obvious way to deploy JavaScript code in Apache. There was work on mod_javascript/mod_js in the late nineties, but I see no evidence it went anywhere.
  • It’s easier for developers to handle different languages on the client and server. In my experience with LiveWire last decade I did encounter a few developers unclear on the concept that some JavaScript executed on the server, some on the client.

Perhaps the recent hype around AJAX will attract a critical mass of good programmers to JavaScript, some of whom will want to reuse well structured code on the server, leading to server side JavaScript’s first renaissance.

Pre-posting update: As I was about to post this Brian Heung told me about TrimJunction, which has more or less the motivation I was thinking of:

With an eye towards Don’t Repeat Yourself, spiced up with a little bit of Ajax, the grand vision of the TrimPath Junction project is to be able to write web application logic just once, in familiar languages like JavaScript.

A Junction-based web application should run, validate input, process data, generate output and do its thing on both the server AND the client. We intend to use the Rhino JavaScript runtime for the server and use your favorite, modern web browser for the client.

Check out the Tales on TrimPath blog for some interesting JavaScript ideas. The Junction announcement is here, two months old.

Update 20050716: OpenMocha is ready for a spin:

The goal of OpenMocha is to maximize the fun and productivity of Javascript development by blending the gap between browser and server based scripting.

SemWeb, AI, Java: The Ontological Parallels

Friday, March 18th, 2005

“The Semantic Web: Promising Future or Utter Failure”, the panel I took part in at SXSW shed little light on the topic. Each panelist (including me) brought their own idiosyncratic views to bear and largely talked past each other. The overall SXSW interactive crowd seemed to tend toward web designers and web marketers, not sure about the audience for this panel. Some people, e.g., Chet Campbell, and others in person, apparently left with the impression that all of the panelists agreed that the semantic web is an utter failure (not my view at all).

Sam Felder and Josh Knowles have posted loose transcripts and Christian Bradford a photo of the panel.

The approximate (with links and a few small corrections) text of my introductory statement follows. I got a few laughs.

I want to draw some parallels between semantic web technologies and artificial intelligence and between semantic web technologies and Java.

AI was going to produce intelligent machines. It didn’t and since the late 80s we’ve been in an “AI winter.” That’s nearly twenty years, so web people who suffered and whined in 2001-3, your cup is more than half full. Anyway since then AI techniques have been used in all sorts of products, but once deployed the technology isn’t seen as AI. I mean, where are the conscious robots?

Semantic web technologies have a shorter history, but may play out similarly: widely used but not recognized as such. Machine “agents” aren’t inferring a perfect date for me from my FOAF profile. Or something. This problem is magnified because there’s a loose connection between semantic web grand visions and AI. People work on both at MIT after all.

Now Java. Applets were going to revolutionize the web. In 1996! Applets didn’t work very well, but lots of people learned Java and it turns out Java is a pretty good solution on the server side. Java is hugely successful as the 21st century’s COBOL. Need some “business logic?” You won’t get fired for implementing it in Java, preferably using JDBC, JSP, JMX, JMS, EJB, JAXB, JDO and other buzzword-compliant APIs.

Semantic web technologies may be following a similar path. Utter failure to live up to initial hype in a sexy domain, but succeeding in the enterprise where the money is anyway. I haven’t heard anyone utter the word enterprise at this conference, so I won’t repeat it.

It turns out that semantic web technologies are really useful for data integration when you have heterogenous data, as many people do these days. Just one example: Oracle will support a “Network Data Model” in the next release of their database. That may sound like a throwback if you know database history, but it basically means explicit support for storing and querying graphs, which are the data model of RDF and the semantic web.

If you talk to a few of the people trying to build intelligent machines today, who may use the term Artificial General Intelligence to distinguish themselves from AI, you may get a feeling that AI research hasn’t really moved us toward the goal of building an AGI.

Despite Java’s success on the server it is no closer to being important on the web client than it was in 1996. It is probably further. If what you care about is sexy web browser deployment, all Java’s server success has accomplished is to keep the language alive.

Semantic web technologies may be different. Usefulness in behind the scenes data integration may help these technologies gain traction on the web. Why? Because for someone trying to make use of data on the web, the web is one huge heterogenous data integration problem.

An example of a project that uses RDF for data integration that you can see is mSpace. You can read a paper about how they use RDF inside the application, but it won’t be obvious to an end user that they’re a semantic web technologies application, and that’s as it should be.

One interesting thing about mSpace is that they’re using a classical music ontology developed by someone else and found on SchemaWeb. SchemaWeb is a good place to look for semantic web schemas that can be reused in your project. Similarly, rdfdata.org is a good place to look for RDF datasets to reuse. There are dozens of schemas and datasets listed on these sites contributed by people and organizations around the world, covering beer, wine, vegetarian food, and lots of stuff you don’t put in your mouth.

I intended to close my statement with a preemption of the claim that use of semantic web technologies mandates hashing everything out in committees before deployment (wrong), but I trailed off with something I don’t recall. The committee myth came up again during the discussion anyway.

Perhaps I should’ve stolen Eric Miller’s The Semantic Web is Here slides.

Bitcollider-PHP

Saturday, March 5th, 2005

Here you’ll find a little PHP API that wraps the single file metadata extraction feature of Bitzi’s bitcollider tool. Bitcollider also can submit file metadata to Bitzi. This PHP API doesn’t provide access to the submission feature.

Other possibly useful code provided with Bitcollider-PHP:

  • function hex_to_base_32($hex) converts hexadecimal input to Base32.
  • function magnetlink($sha1, $filename, $fileurl, $treetiger, $kzhash) returns a MAGNET link for the provided input.
  • magnetlink.php [file ...] is a command line utility that outputs MAGNET links for the files specified, using the bitcollider if available (if not, kzhash and tree:tiger are not included in MAGNET links).
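For readers who don't use PHP, here is a rough Python analogue of the first two helpers. It is a sketch, not a port of the Bitcollider-PHP code: the function names, the choice to handle only SHA1, and the use of the `dn` (display name) and `xs` (exact source) MAGNET parameters for the filename and URL are assumptions made for illustration.

```python
import base64
from urllib.parse import quote


def hex_to_base32(hex_digest: str) -> str:
    """Convert a hexadecimal digest to Base32 (RFC 4648 alphabet).

    A SHA1 digest is 20 bytes, which encodes to exactly 32 Base32
    characters with no padding.
    """
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")


def magnet_link(sha1_hex: str, filename: str = None, file_url: str = None) -> str:
    """Build a MAGNET link carrying a urn:sha1: identifier (illustrative only)."""
    parts = ["xt=urn:sha1:" + hex_to_base32(sha1_hex)]
    if filename:
        parts.append("dn=" + quote(filename, safe=""))
    if file_url:
        parts.append("xs=" + quote(file_url, safe=""))
    return "magnet:?" + "&".join(parts)


# SHA1 of the empty file, as a well-known test digest.
link = magnet_link("da39a3ee5e6b4b0d3255bfef95601890afd80709", filename="empty.txt")
```

The same Base32 string is what follows `urn:sha1:` when the identifier is used on its own, e.g. in RDF.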

Versions of this code are deployed on a few sites in service of producing MAGNET links or urn:sha1: identifiers for RDF along these lines, both in the case of CC Mixter.

Criticism welcome.

Technorati DeepCosmos

Saturday, March 5th, 2005

Late last year I requested that some blog aggregator give some indication of the existence of indirect blog post citations, i.e., a blog thread. Adam Hertz suggested that this could be done using Technorati’s API.

I whipped up a crummy implementation the following weekend and contributed a small technorati.py patch along the way. I decided I’m not getting around to producing a non-crummy version, so here it is:

If you attempt to use the DeepCosmos demo the first thing to note is that you need to obtain and use your own Technorati API Key. Check out the examples above if you just want to see what the output looks like.

I haven’t used this much since I wrote it. My request still stands. I’d use the information all the time if integrated into the output of Technorati, Bloglines, Rojo or similar.
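The core idea, following citations of citations, can be sketched as a breadth-first walk. This is not the DeepCosmos code and makes no real Technorati API calls: `get_citing_posts` is a hypothetical callable standing in for whatever lookup returns the URLs that link to a given URL.

```python
from collections import deque


def deep_cosmos(url, get_citing_posts, max_depth=3):
    """Return (citing_url, depth) pairs for direct and indirect citations.

    get_citing_posts(url) is a caller-supplied function (hypothetical here)
    that returns an iterable of URLs linking to url.
    """
    seen = {url}
    found = []
    queue = deque([(url, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look further out than max_depth hops
        for citing in get_citing_posts(current):
            if citing not in seen:
                seen.add(citing)
                found.append((citing, depth + 1))
                queue.append((citing, depth + 1))
    return found


# Toy citation graph: b and c cite a, d cites both b and c.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
thread = deep_cosmos("a", lambda u: graph.get(u, []), max_depth=2)
```

The `seen` set keeps cycles (posts that cite each other) from looping forever, and `max_depth` bounds the number of lookups, which matters when each lookup is a rate-limited API call.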

CodeCon Sunday

Tuesday, February 15th, 2005

I say CodeCon was 3/4 (one abstention) on Sunday.

Wheat. An environment (including a language) for developing web applications. Objects are arranged in a tree with some filesystem-like semantics. Every object has a URL (not necessarily in a public portion of the tree). Wheat’s web object publishing model and templating seem clearly reminiscent of Zope. In response to the first of several mostly redundant questions regarding Wheat and Zope, Mark Lentczner said that he used Zope a few years ago and was discouraged by the need to use external scripts and the lack of model-view separation in templates (I suspect Mark used DTML — Wheat’s TinyTemplates reminded me of DTML’s replacement, Zope Page Templates, currently my favorite and implemented in several languages). I’m not sure Wheat is an environment I’d like to develop in, but I suspect the world might learn something from pure implementations of URL-object identity (not just mapping) and a web domain specific language/environment (I understand that Wheat has no non-web interface). Much of the talk used these slides.

Incoherence. I find it hard to believe that nobody has done exactly this audio visualization method before (x = left/right, y = frequency, point intensity and size = volume), but as an audio-ignoramus I’ll take the Incoherence team’s word. I second Wes Felter’s take: “I learned more about stereo during that talk than in the rest of my life.”

i-Brokers. This is where XNS landed and where it might go. However, the presentation barely mentioned technology and left far more questions than answers. There was talk of Zooko’s Triangle (”Names: Decentralized, Secure, Human-Memorizable: Choose Two”). 2idi and idcommons seem to have chosen the last two, temporarily. It isn’t clear to me why they brought it up, as i-names will be semi-decentralized (like DNS). In theory i-names provide privacy (you provide only your i-name to an i-name enabled site, always logging in via your i-broker, and access to your data is provided through your i-broker — never enter your password or credit card anywhere else — you set the policies for who can access your data) and persistence (keep an i-name for life, and i-names may be transparently aliased or gatewayed should you obtain others). These benefits, if they exist in the future, are subtler than the claims. Having sites access your data via a broker rather than via you typing it in does little to protect your privacy by itself. You make a decision in both cases whether you want a site to have your credit card number. Once the site has your credit card… Possibly over the long term if lots of people and sites adopt i-names sites will collect or keep less personal information. Users, via their i-brokers, may be on more equal terms with sites, as i-broker access will presumably be governed by some you-have-no-rights-at-all terms of service. Some sites may decide (for new applications) they don’t want to have to worry about the security of customer information and access the same via customers’ i-names. However, once a user has provided their i-broker with lots of personal information, it becomes easy for sites to ask for it all. Persistence is also behavioral. Domain names and URLs can last a long time; good ones don’t change. Similarly an i-name will go away if the owner stops paying for it. 
Can the i-name ecology be structured so that i-names tend to be longer lived than domain names or URLs? Probably, but that’s a different story. In the short term 2idi is attempting to get adoption in the convention registration market. Good luck, but I wish Fen and Victor had spent their time talking about XRI resolution or other code behind the 2idi broker.

SciTools. A collection of free to use web applications for genetic design and analysis. Integrated DNA Technologies, the company that offers SciTools, makes its money selling (physical) synthesized nucleic acids. I was a cold, tired, bio-ignoramus, so have little clue whether this is novel. (Ted Leung seems to think so and also has interesting things to say about the other presentations.)

OzymanDNS. DNS can route and move data, is deployed and not filtered everywhere, so with a little cleverness we can tunnel arbitrary streams over DNS. Dan Kaminsky is clearly the crowd pleaser, not only for his showmanship and the audacity of his hacks (streaming anime over DNS this time). More than a few in the crowd wanted to put DNS hacks to work, e.g., on aspects of supposed syndication problems. PPT slides of an older version of the talk.
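To make the tunneling idea concrete, here is a small sketch of how a chunk of a byte stream could be packed into a DNS query name and unpacked again. It is not OzymanDNS's wire format: the Base32 encoding, the sequence-number label, and the names `encode_chunk` and `decode_chunk` are all assumptions. The only DNS facts relied on are that names are case-insensitive and that each label is limited to 63 bytes.

```python
import base64

MAX_LABEL = 63  # DNS limit on the length of a single label


def encode_chunk(data: bytes, seq: int, domain: str) -> str:
    """Pack data into a query name: <base32 labels>.<seq>.<domain>."""
    payload = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    labels = [payload[i:i + MAX_LABEL] for i in range(0, len(payload), MAX_LABEL)]
    return ".".join(labels + [str(seq), domain])


def decode_chunk(name: str, domain: str):
    """Recover (seq, data) from a name produced by encode_chunk."""
    prefix = name[: -len(domain) - 1]  # strip ".<domain>"
    *labels, seq = prefix.split(".")
    payload = "".join(labels).upper()
    payload += "=" * (-len(payload) % 8)  # restore Base32 padding
    return int(seq), base64.b32decode(payload)


name = encode_chunk(b"streaming anime, one lookup at a time", 7, "tunnel.example.com")
seq, data = decode_chunk(name, "tunnel.example.com")
```

A real tunnel would also have to keep the whole name under the overall DNS name length limit (so chunks stay small), carry replies back in record data, and cope with caching and reordering; none of that is shown here.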

Yesterday.

CodeCon Saturday

Sunday, February 13th, 2005

CodeCon is 5/5 today.

The Ultra Gleeper. A personal web page recommendation system. Promise of collaborative filtering unfulfilled, in dark ages since Firefly was acquired and shut down in the mid-90s. Presenter believes we’re about to experience a renaissance in recommendation systems, citing Audioscrobbler recommendations (I would link to mine, but personal recommendations seem to have disappeared since last time I looked; my audioscrobbler page) as a useful example (I have found no automated music recommendation system useful) and blogs as a use case for recommendations (I have far too much very high quality manually discovered reading material, including blogs, to desire automated recommendations for more and I don’t see collaborative filtering as a useful means of prioritizing my lists). The Ultra Gleeper crawls pages you link to, treating links as positive ratings, pages that link to you (via Technorati CosmosQuery and Google API), presents suggested pages to rate in a web interface. Uses a number of tricks to avoid showing obvious recommendations (does not recommend pages that are too popular) and pages you’ve already seen (including those linked to in feeds you subscribe to). Some problems faced by typical recommendation systems (new users get crummy recommendations until they enter lots of data, early adopters get doubly crummy recommendations due to lack of existing data to correlate with) obviated by bootstrapping from data in your posts and subscriptions. I suppose if lots of people run something like Gleeper robot traffic increases, more people complain about syndication bandwidth-like problems (I’m skeptical about this being a major problem). I don’t see lots of people running Gleepers as automated recommendation systems are still fairly useless and will remain so for a long time. Interesting software and presentation nonetheless.

H2O. Primarily a discussion system tuned to facilitate professor-assigned discussions. Posts may be embargoed and professor may assign course participants specific messages or other participants to respond to. Discussions may include participants from multiple courses, e.g., to facilitate an MIT engineering-Harvard law exchange. Anyone may register at H2O and create own group, acting as professor for created group. Some of the constraints that may be imposed by H2O are often raised in mailing list meta discussions following flame wars, in particular posting delays. I dislike web forums but may have to try H2O out. Another aspect of H2O is syllabus management and sharing, which is interesting largely because syllabi are typically well hidden. Professors in the same school of the same university may not be aware of what each other are teaching.

Jakarta FeedParser. Kevin Burton gave a good overview of syndication and related standards and the many challenges of dealing with feeds in the wild, which are broken in every conceivable way. Claims SAX (event) based Jakarta FeedParser is an order of magnitude faster than DOM (tree) based parsers. Nothing new to me, but very useful code.
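
To make the SAX versus DOM distinction concrete, here is a minimal sketch of event-based feed parsing using Python’s standard `xml.sax` module. This is not Jakarta FeedParser’s API (that is a Java library); `TitleHandler` and `item_titles` are names invented for this example, and it assumes an RSS-style feed with `item` and `title` elements. The handler reacts to start, text and end events as they stream past and never builds a tree of the whole document, which is where the speed and memory advantage comes from.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collect the text of each <title> found inside an <item>."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_item = False
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
        elif name == "title" and self._in_item:
            self._in_title = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer until the end tag.
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title" and self._in_title:
            self.titles.append("".join(self._buf).strip())
            self._in_title = False
        elif name == "item":
            self._in_item = False


def item_titles(feed_bytes):
    """Return the item titles from a well-formed RSS-style feed."""
    handler = TitleHandler()
    xml.sax.parseString(feed_bytes, handler)
    return handler.titles
```

Note that this only handles well-formed XML. Coping with feeds that are broken in the wild, the hard part the talk covered, is out of scope for the sketch.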

MAPPR. Uses Flickr tags, GNS to divine geographic location of photos. REST web services modeled on Flickr’s own. Flash front end, which you could spend many hours playing with.

Photospace. Personal image annotation and search service, focus on geolocation. Functionality available as library, web front end provided. Photospace publishes RDF which may be consumed by RDFMapper.

Note above two personal web applications that crawl or use services of other sites (The Ultra Gleeper is the stronger example of this). I bet we’ll see many more of increasing sophistication enabled by ready and easily deployable software infrastructure like Jakarta FeedParser, Lucene, SQLite and many others. A personal social networking application is an obvious candidate. Add in user hosted or controlled authentication (e.g., LID, perhaps idcommons) …

Yesterday.

CodeCon Friday

Saturday, February 12th, 2005

CodeCon requires presenters to be active developers of the projects presented and projects must have demonstrably running code. There’s an emphasis on open source and decentralization. This generally makes for interesting presentations. Today was 4/5.

Aura. Case study in how not to give a CodeCon presentation. Talk for a long time about motivations for and very high level problems of reputation systems, which all attendees are surely familiar with. Give almost no specifics about Aura, apparently a peer-to-peer reputation system, including nothing on what differentiates it from other work nor on how or why I’d use it in my own code. Demo stumbles due to display problems, fails due to ill prepared data. One mostly irrelevant bit about Aura’s implementation: it uses SQLite, an embedded, zero configuration, endian neutral SQL database that many projects have started to use recently and tons more will in the near future. I’m certain that SQLite is in my future.

ArX. Very useful presentation on Walter Landry’s ArX, which began as a fork of the GNU Arch distributed revision control system (both are pronounced ‘arc’). Lists good, bad and ugly of active open source, distributed revision control systems (I agree that any system that does not have those attributes is strictly non-interesting), including GNU Arch/tla, ArX, monotone (also uses SQLite), Darcs, svk, and Codeville. I’ve tried tla a few times but have gotten hung up on what seems to me like unnecessary complexity and strange conventions. I’d pretty much settled on using Darcs going forward, but now I’m a little concerned by its reordering of patches in order to solve merge conflicts, which apparently can be very slow and may make the repository’s view of its state at a point in time inaccurate. Not sure whether this is pragmatic, evil, or both, nor am I sure I understand it. See also Zooko’s notes (Darcs row, decentralization column).

Apache CA. A certification authority motivated by the needs of the Apache Software Foundation, which has around 900 developers with commit access working on around 100 projects. Program managers can add committers, but small admin team needs to create shell accounts, add to various text files, creating bottleneck. Solution: all services (most importantly source control — migrate to subversion) eventually use SSL, check for permission based on group membership noted in personal certificates and managed via email by program managers. Sounds like a long term project. “Open CA” feature is an interesting extension — allows anyone who can sign an email with GPG to create groups in the form of user@example.com/groupname. Not sure what the ASF motivation is for Open CA, but I’m sure interesting applications can be built on it.

Off-the-Record Messaging. Messaging using the PGP model (sign with sender’s private key, encrypt with recipient’s public key) can be attacked: the “bad guys” can intercept and store your messages. In the future they can break into your computer, obtain your private key, decrypt your messages and prove that you are the author. Very briefly, OTR obtains “perfect forward secrecy” through the use of short lived encryption keys and “refutable authentication” using shared MAC keys: compromise of your long term keys doesn’t allow your messages to be decrypted, and it can’t be proved that you wrote your messages. A toolkit for forging transcripts is even provided to enhance deniability. Details here. This presentation seems to match the one given at CodeCon. They have a GAIM plugin, which I’m now running, and a standalone proxy for other AIM clients. Cool stuff.
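
Why a shared MAC key makes authentication deniable can be seen in a few lines. This is only an illustration of the property using Python’s standard `hmac` module, not OTR’s actual protocol or key derivation; `make_tag`, `verify_tag` and the randomly generated `shared_key` are stand-ins I made up. Because both parties hold the same key, a valid tag convinces the recipient during the conversation, but either side could have produced it, so a transcript proves nothing to a third party.

```python
import hashlib
import hmac
import os


def make_tag(mac_key, message):
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify_tag(mac_key, message, tag):
    """Check a tag in constant time."""
    return hmac.compare_digest(make_tag(mac_key, message), tag)


# Stand-in for a MAC key that both parties derived during key exchange.
shared_key = os.urandom(32)

# Alice authenticates a message to Bob.
alice_message = b"meet at noon"
alice_tag = make_tag(shared_key, alice_message)
bob_accepts = verify_tag(shared_key, alice_message, alice_tag)

# Bob holds the same key, so he can mint an equally valid tag for words
# Alice never wrote. A signature made with Alice's private key would not
# allow this, which is why signatures are not deniable.
forged_message = b"meet at midnight"
forged_tag = make_tag(shared_key, forged_message)
forgery_verifies = verify_tag(shared_key, forged_message, forged_tag)
```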

RPOW. Reusable Proofs of Work is a system for sequential reuse of hashcash mediated by a server written by the great signal-to-noise enhancer Hal Finney. RPOW has many potential uses. It was apparently initially motivated by a desire to implement “P2Poker” with interesting “chips” and is currently being experimented with in a modified BitTorrent client in which downloaders can pay for priority with RPOW tokens, possibly encouraging people to leave clients running after completing a download (serving as seeds in BT lingo) in order to earn tokens which may be spent on future downloads. As the BTRP page notes, people could acquire RPOWs out of band, and not contribute more upload bandwidth, or even contribute less. The net effect is hard to predict. If buying download priority with RPOWs proves useful, I expect non-BT filesharing clients, which have far less reason to cooperate, would benefit more than BT clients. Perhaps the most interesting thing about the RPOW system is its great effort to ensure that there can be no cheating, in particular by the server operator. The RPOW server will zero all data if it is physically tampered with, it is possible for anyone to verify the code it is running, and that code can verify that its database in its untrusted host has not been tampered with, using a Merkle hash tree to verify (the secure board only has two megabytes of memory). The RPOW server may be the world’s first transparent server, which could facilitate a world of distributed, cooperating RPOW servers. Presentation slides.
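
For readers who haven’t met hashcash, the underlying proof of work can be sketched briefly. This is a simplified illustration, not the real hashcash stamp format (which carries a version, date and other fields) and not RPOW’s server exchange. The `resource:nonce` token layout and the function names are assumptions of mine. The point is the asymmetry: minting takes many hash attempts on average, while verifying takes one.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource, bits=12):
    """Search for a nonce whose SHA-1 token hash starts with `bits` zero bits."""
    for nonce in count():
        token = f"{resource}:{nonce}"
        digest = hashlib.sha1(token.encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return token


def verify_token(token, resource, bits=12):
    """Check with a single hash that the token is bound to the resource."""
    digest = hashlib.sha1(token.encode()).digest()
    return token.startswith(resource + ":") and leading_zero_bits(digest) >= bits
```

A plain hashcash token like this can be spent only once per recipient. What RPOW adds, per the presentation, is a server that lets a spent token be exchanged for a fresh one so the original work can be reused sequentially.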

Saturday.

ccPublisher 1.0

Monday, December 27th, 2004

Nathan Yergler just cut ccPublisher 1.0, a Windows/Mac/Linux desktop app that helps you license, tag, and distribute your audio and video works. I’m very biased, but I think it’d be a pretty neat little application even if it weren’t Creative Commons centric.

  • It’s written in Python with a wxPython UI, but is distributed as a native Windows installer or Mac disk image with no dependencies. Install and run like any other program on your platform, no implementation leakage. Drag’n’drop works.
  • Also invisible to the end user, it uses the Internet Archive’s XML contribution interface, FTP and CC’s nascent web services.
  • RDF metadata is generated, hidden from the user if published at IA, or available for self-publishing, ties into CC’s search and P2P strategies.

Python and friends did most of the work, but the 90/10-10/90 rule applied (making a cross platform app work well still isn’t trivial, integration is always messy, and anything involving ID3v2 sucks). Props to Nathan.

Version 2 will be much slicker, support more media types, and be extensible for use by other data repositories.

Addendum 2005-01-12: Check out Nathan’s 1.0 post mortem and 2.0 preview.

alias grep='glark'

Saturday, April 3rd, 2004

Glark has improved my life.

A replacement for (or supplement to) the grep family, glark offers: Perl compatible regular expressions, highlighting of matches, context around matches, complex expressions (“and” and “or”), and automatic exclusion of non-text files.

CC Etech BoF points

Tuesday, February 10th, 2004

Points mentioned at the Etech Creative Commons participant session (it’s a BoF!):

One Year Launch Anniversary

Watch Reticulum Rex AKA Remix Culture for an update.

License Versioning

International Commons

iCommons is porting licenses to multiple jurisdictions.

Content

New (and newly packaged) Licenses

Technology

Standards

Technology Challenges

The list

Hero Nathan Yergler, who created:

PROTOTYPE RDF-enhanced Creative Commons search