Post Programming

Open policy for a secure Internet-N-Life

Saturday, June 28th, 2014

In (In)Security in Home Embedded Devices, Jim Gettys says software needs to be maintained for decades considering where it is being deployed (e.g., embedded in products with multi-decade lifetimes, such as buildings) and the criticality of some of that software, an unpredictable attribute: a product might become unplanned “infrastructure”, for example, if it is widely deployed and other things come to depend on it. Without maintenance, including deployment of updates in the field, software (and thus the systems it is embedded in) becomes increasingly insecure as vulnerabilities are discovered (he cites a honeymoon period enjoyed by new systems).

This need for long-term maintenance and field deployment implies open source software and devices that users can upgrade — maintenance needs to continue beyond the expected life of any product or organization. “Upgrade” can also mean “replace” — perhaps some kinds of products should be more modular and with open designs so that parts that are themselves embedded systems can be swapped out. (Gettys didn’t mention, but replacement can be total. Perhaps “planned obsolescence” and “throwaway culture” have some security benefits. I suspect the response would be that many things continue to be used for a long time after they were planned to be obsolete and most of their production run siblings are discarded.)

But these practices are currently rare. Product developers do not demand source from chip and other hardware vendors, and thus ship products with “binary blob” hardware drivers for the Linux kernel which cannot be maintained, often based on a kernel that is years out of date when the product ships. The Linux kernel is a near-monoculture for many embedded systems, which increases the security threat. There are many problems which do not depend on hardware vendor cooperation, ranging from unintentionally or lazily not providing source needed for the rest of the system, to intentionally shipping proprietary software, to intentionally locking down the device to prevent user updates. Product customers do not demand long-term secure devices from product developers. There is little effort to fund commons-oriented embedded development (in contrast with Linux kernel and other systems development for servers, which many big companies fund).

Gettys is focused on embedded software in network devices (e.g., routers) as network access is critical infrastructure much else depends on, including the problem at hand: without network access, many other systems cannot be feasibly updated. He’s working on CeroWrt, a cutting edge version of OpenWrt firmware, either of which is several years ahead of what typically ships on routers. A meme Gettys wishes to spread, the earliest instance of which I could find is on cerowrt-devel, a harsh example coming the next week:

Friends don’t let friends run factory firmware.

Cute. This reminds me of something a friend said in a group discussion that touched on security and embedded in body (or perhaps it was mind embedded in) systems, along the lines of “I wouldn’t run (on) an insecure system.” Or malware would give you a bad trip.

But I’m ambivalent. Most people, thus most friends, don’t know what factory firmware is. Systems need to be much more secure (for the long term, including all that implies) as shipped. Elite friend advice could help drive demand for better systems, but I doubt “just say no” will help much; its track record for altering mass outcomes, e.g., with respect to proprietary software or formats, seems very poor.

In Q&A someone asked about centralized cloud silos. Gettys doesn’t like them, but said without long-term secure alternatives that can be deployed and maintained by everyone there isn’t much hope. I agree.

You may recognize open source software and devices that users can upgrade above as roughly the conditions of GPL-3.0. Gettys mentioned this and noted:

  • It isn’t clear that copyright-based conditions are effective mechanism for enforcing these conditions. (One reason I say copyleft is a prototype for more appropriate regulation.)
  • Of “life, liberty, and pursuit of happiness”, free software has emphasized the latter two, but nobody realized how important free software would be for living one’s life given the extent to which one interacts with and depends on (often embedded) software. In my experience people have realized this for many years, but it should indeed move to the fore.

Near the end Gettys asked what role industry and government should have in moving toward safer systems (and skip the “home” qualifier in the talk title; these considerations are at least as important for institutions and large-scale infrastructure). One answer might be in open policy. Public, publicly-interested, and otherwise coordinated funders and purchasers need to be convinced there is a problem and that it makes sense for them to demand their resources help shift the market. The Free Software Foundation’s Respects Your Freedom criteria (ignoring the “public relations” item) is a good start on what should be demanded for embedded systems.

Obviously there’s a role for developers too. Gettys asked how to get beyond the near Linux kernel monoculture, mentioning BSD. My ignorant wish is that developers wanting to break the monoculture instead try to build systems using better tools, at least better languages (not that any system will reduce the need for security in depth).

Here’s to a universal, secure, and resilient web and technium. Yes, these features cost. But I’m increasingly convinced that humans underinvest in security (not only computer, and at every level), especially in making sure investments aren’t theater or worse.

Counter-donate in support of marriage equality and other Mozilla-related notes

Saturday, March 29th, 2014

I’m a huge fan of Mozilla and think their work translates directly into more human rights and equality. So like many other people, I find it pretty disturbing that their new CEO, Brendan Eich, donated US$1000 in support of banning same sex marriage. True, this is scrutiny beyond what most organizations’ leaders would receive, and Mozilla indeed seems to have excellent support for LGBT employees, endorsed by Eich, and works to make all welcome in the Mozilla community. But I think Evan Prodromou put it well:

If you lead an organization dedicated to human rights, you need to be a defender of human rights.

Maybe Eich will change his mind. Perhaps he believes an ancient text attributed to an ultra powerful being commands him to oppose same sex marriage. Believers have come around to support all kinds of liberal values and practices in spite of such texts. Perhaps he considers marriage an illegitimate institution and would prefer equality arrive through resetting marriage to civil unions for all, or something more radical. I can comprehend this position, but it isn’t happening this generation, and is no excuse for delaying what equality can be gained now.

In the meantime one thing that Mozilla supporters might do to counter Eich’s support for banning same sex marriage, short of demanding he step down (my suspicion is that apart from this he’s the best person for the job; given what the mobile industry is, someone from there would likely be a threat to the Mozilla mission) is to “match” it in kind, with counter-donations to organizations supporting equal rights for LGBT people.

Freedom to Marry seems to be the most directly counter to Eich’s donation, so that’s what I donated to. The Human Rights Campaign is probably the largest organization. There are many more in the U.S. and around the world. Perhaps Eich could counter his own donation with one to an organization working on more basic rights where homosexuality is criminalized (of course once that is taken care of, they’ll demand the right to marry too).

Other Mozilla-related notes that I may otherwise never get around to blogging:

  • Ads in new tabs (“directory tiles”) have the potential to be very good. More resources for Mozilla would be good, “diversification” or not. Mozilla’s pro-user stance ought make their design and sales push advertisers in the direction of signaling trustworthiness, and away from the premature optimization of door-to-door sales. They should hire Don Marti, or at least read his blog. But the announcement of ads in new tabs was needlessly unclear.
  • Persona/BrowserID is brilliant, and with wide adoption would make the web a better place and further the open web. I’m disappointed Mozilla never built it into Firefox, and has stopped paying for development, handing it over to the community. But I still hold out some hope. Mozilla will continue to provide infrastructure indefinitely. Thunderbird seems to have done OK as a community development/Mozilla infrastructure project. And the problem still needs to be solved!
  • Contrary to just about everyone’s opinions it seems, I don’t think Mozilla’s revenue being overwhelmingly from Google is a threat, a paradox, or ironic. The default search setting would be valuable without Google. Just not nearly as valuable, because Google is much better at search and search ads than its nearest competitors. Mozilla has demonstrated with FirefoxOS that they’re willing to compete directly with Google in a hugely valuable market (mobile operating systems, against Android). I have zero inside knowledge, but I’d bet that Mozilla would jump at the chance to compete with Google on search or ads, if they came upon an approach which could reasonably be expected to be superior to Google’s offerings in some significant ways (to repeat, unlike Google’s nearest search and ads competitors today). Of course Mozilla is working on an ads product (first item), leveraging Firefox real estate rather than starting two more enormous projects (search and search ads; FirefoxOS must be enough for now).
  • The world needs a safe systems programming language. There have been and are many efforts, but Mozilla-developed Rust seems to have by far the most promise. Go Rust!
  • Li Gong of Mozilla Taiwan and Mozilla China was announced as Mozilla’s new COO at the same time Eich was made CEO. I don’t think this has been widely noted. My friend Jon Phillips has been telling me for years that Li Gong is the up and coming power. I guess that’s right.

I’m going to continue to use Firefox as my main browser, I’ll probably get a FirefoxOS phone soon, and I hope Mozilla makes billions with ads in new tabs. As I wrote this post Mozilla announced it supports marriage equality as an organization (even if the CEO doesn’t). Still, make your counter-donation.

Audio/video player controls should facilitate changing playback rate

Saturday, March 9th, 2013

Listening or viewing non-fiction/non-art (eg lectures, presentations) at realtime speed is tiresome. I’ve long used rbpitch (but more control than I need or want) or VLC’s built-in playback speed menu (but mildly annoyed by “Faster” and “Faster (fine)”; would prefer to see exact rate) and am grateful that most videos on YouTube now feature a playback UI that allows playback at 1.5x or 2x speed. The UI I like the best so far is Coursera’s, which very prominently facilitates switching to 1.5x or 2x speed as well as up and down by 0.25x increments, and saving a per-course playback rate preference.

HTML5 audio and video unadorned with a customized UI (latter is what I’m seeing at YouTube and Coursera) is not everywhere, but it’s becoming more common, and probably will continue to as adding video or audio content to a page is now as easy as adding a non-moving image, at least if default playback UI in browsers is featureful. I hope for this outcome, as hosting site customizations often obscure functionality, eg by taking over the context menu (could browsers provide a way for users to always obtain the default context menu on demand?).

Last month I submitted a feature request for Firefox to support changing playback speed in the default UI, and I’m really happy with the response. The feature is now available in nightly builds (which are non-scary; I’ve run nothing else for a long time; they just auto-update approximately daily, include all the latest improvements, and in my experience are as stable as releases, which these days means very stable) and should be available in a general release in approximately 18-24 weeks. You can test the feature on the page the screenshot above is from; note it will work on some of the videos, but for others the host has hijacked the context menu. Or something that really benefits from 2x speed (which is not at all ludicrous; it’s my normal speed for lectures and presentations that I’m paying close attention to).

Even better, the request was almost immediately triaged as a “[good first bug]” and assigned a mentor (Jared Wein) who provided some strong hints as to what would need to be done, so strong that I was motivated to set up a Firefox development environment (mostly well documented and easy; the only problem I had was figuring out which of the various test harnesses available to test Firefox in various ways was the right one to run my tests) and get an unpolished version of the feature working for myself. I stopped when darkowlzz indicated interest, and it was fun to watch darkowlzz, Jared, and a couple others interact over the next few weeks to develop a production-ready version of the feature. Thank you Jared and darkowlzz! (While looking for links for each, I noticed Jared posted about the new feature, check that out!)

Kudos also to Mozilla for having a solid easy bug and mentoring process in place. I doubt I’ll ever contribute anything non-trivial, but the next time I get around to making a simple feature request, I’ll be much more likely to think about attempting a solution myself. It’s fairly common now for projects to at least tag easy bugs; OpenHatch aggregates many of those. I’m not sure how common mentored bugs are.

I also lucked out in that support for setting playback rate from javascript had recently been implemented in Firefox. Also see documentation for the javascript API for HTML5 media elements and what browser versions implement each.

Back to playback rate, I’d really like to see anything that provides an interface to playing timed media to facilitate changing playback rate. Anything else is a huge waste of users’ time and attention. A user preference for playback rate (which might be as simple as always using the last rate, or as complicated as a user-specified calculation based on source and other metadata) would be a nice bonus.


Thursday, May 17th, 2012

Former colleague Nathan Yergler has a series of posts on technical debt (1, 2, 3). I’m responsible for some of the debt described in the third posting:

We had the “questions” used for selecting a license modeled as an XSLT transformation (why? don’t remember; wish I knew what we were thinking when we did that)

In 2004, I was thinking:

The idea is to encapsulate the “choose license” process (see in a file or a few files that can be reused in different environments (e.g., standalone apps) without having those apps reproduce the core language surrounding the process or the rules for translating user answers into a license choice and associated metadata.

Making the “questions” available as XML (questions.xml) and “rules” as XSL (chooselicense.xsl) attempts to maximize accessibility and minimize reimplementation of logic across multiple implementations.

I also thought about XSLT as an interesting mechanism for distributing untrusted code. Probably too complex, or just too unconventional and ill-supported, and driven by bad requirements. I’ll probably say more about the last in a future refutation post.

Anyway, I’m sorry for that bit. I recommend Nathan’s well written series.


Wednesday, April 25th, 2012

I attended BayHac over the weekend. There were a bunch of interesting impromptu talks. Notes on all those I recall follow, with other observations at the end.

  • The first talk encouraged people to get up, and demonstrated some hand stretches. Although almost everyone knows sitting hunched up all day is harmful, almost everyone needs an occasional reminder. A mention at any conference is well worthwhile for the individuals and community in question.
  • Plush is a POSIX shell server (in Haskell) with a web UI (Javascript; communication between them with JSON, session initiated with an unguessable URL), which already provides some nice context and control over display not available in a usual terminal, e.g., the output of each command is collapsible, pieces of the current path are clickable, and there are tooltips for each command argument.
  • You currently have to register (no verification) to see anything, but GitStar is a GitHub clone built on Hails, a framework for hosting mutually untrusted web applications (eg project wiki and source browser in case of GitStar), at least with respect to access to each others’ data, which is controlled via “Labeled IO”, with labels specifying policy around data based on Information Flow Control, a subject I had not heard of. GitStar and Hails source is mirrored on GitHub. An initial research paper and promise of more at the bottom of a README.
  • Visi is a language implemented in Haskell that seems somewhere between a spreadsheet and a traditional programming language read-eval-print-loop (ad hoc, immediate recalculation, but no grid). Spreadsheet programming is something I know almost nothing about, and ought to.
  • Composable Pipes. For readers who care about such things, note author dissuaded from using GPL in linked thread.
  • Something about typesafe reuse of types extending Agda’s typesystem. I understood very little (my fault).
  • cabal branch will checkout source for any Haskell package with source repository annotations — source of the specific version you’re using, if annotation specifies source-repository this.
  • A talk about Lift, a Scala web framework, mostly concerning the benefits of passing around a DOM representation rather than treating templates as blobs of text. I’m impressed by Lift, and played a bit with it a couple years ago, but was in no place to spend time to develop any real application.
  • Implementations of Paxos and parallel builds.
  • Interacting with DBUS (eg GNOME and KDE applications) from Haskell.
  • Shelly, a library for shell scripting in Haskell. Side point made that scripting languages, including Ruby, find initial popularity through scripting by sysadmins, not developer frameworks — true to my experience.
  • Visualizing n-gram relationships with SVG output.
  • Translating simple art pieces in Forth to C.
  • Pingwell is creating apps to bring pricing and other information to consumers when they can act on it, eg in a grocery store. I’m pretty sure this scenario has been imagined thousands of times over the past few decades, good that it will come to exist soon. The talk was mostly about using a Haskell computer vision library.

Other observations:

  • Macbooks in majority, but lower proportion than usual — and many, perhaps a majority, of people with Macbooks seemed to be developing on Linux in a virtual machine.
  • 100% male attendees, which is a bit disturbing, but I detected zero brogrammer vibe.
  • The first day was hosted at Hacker Dojo, which I had heard of but never visited. I was surprised at how large and quiet it was. At least during the day, it seems dozens of people use it as a coworking space.
  • Web application development, Yesod in particular, is attracting more people to Haskell (I can’t find a reference, but recall that #haskell and/or /r/haskell watchers increased substantially on the day Yesod 1.0 was released). Newbie attendees (me included) learning Haskell and Yesod are further evidence.
  • Lots of anguish and anguished cries about dependency hell.

Thanks to BayHac organizer Mark Lentczner (also Plush developer and haskell-platform release manager; watch his intro to Haskell video) for putting together such a well run and friendly event. I felt some trepidation about attending, knowing that almost everyone would be both smarter and more experienced than me, but everyone was helpful and patient. I’m glad I went.

unset GREP_OPTIONS (alias grep instead) to get an “acceptable egrep”

Saturday, April 21st, 2012

Building some software from source, I recently encountered

checking for egrep... configure: error: no acceptable egrep could be found in /usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/xpg4/bin

and couldn’t find a solution anywhere. In the shower I remembered setting GREP_OPTIONS in my environment. That seems to have been the problem. After unsetting GREP_OPTIONS and obtaining the same default behavior for myself with

alias grep='grep --color=auto --perl-regexp'

the error went away. I guess configure is finding and running /bin/grep, which is affected by the environment, but bypasses any aliases.
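The guess above can be checked with a small experiment. This is only a sketch under assumptions, not anything taken from configure itself: DEMO_OPTION and demo_alias are hypothetical names standing in for GREP_OPTIONS and the grep alias, and `sh -c` stands in for configure running as a separate non-interactive shell.

```shell
#!/bin/sh
# DEMO_OPTION is a hypothetical stand-in for GREP_OPTIONS.
DEMO_OPTION='--color=auto'
export DEMO_OPTION

# A child shell (standing in for a configure script) inherits exported variables...
child_env=$(sh -c 'printf "%s" "${DEMO_OPTION:-unset}"')

# ...but an alias defined in this shell is not passed on to the child.
# demo_alias is a hypothetical stand-in for the grep alias.
alias demo_alias='echo aliased'
child_alias=$(sh -c 'command -v demo_alias >/dev/null 2>&1 && printf yes || printf no')

echo "exported variable visible in child: $child_env"
echo "alias visible in child: $child_alias"
```

If the child reports the variable but not the alias, that matches the behavior described: options set through the environment reach every grep a script runs, while an alias only affects commands typed into the interactive shell that defined it.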

FOSDEM 2012 and computational diversity

Saturday, February 11th, 2012

I spent day 1 of FOSDEM in the legal devroom and day 2 mostly talking to a small fraction of the attendees I would’ve liked to meet or catch up with. I didn’t experience the thing I find in concept most interesting about FOSDEM: devrooms (basically 1-day tracks in medium sized classrooms) dedicated to things that haven’t been hyped in ~20 years but are probably still doing very interesting things technically and otherwise, eg microkernels and Ada.

Ada has an interesting history that I’d like to hear more about, with the requirement of highly reliable software (I suspect an undervalued characteristic; I have no idea whether Ada has proven successful in this regard, would appreciate pointers) and fast execution (on microbenchmarks anyway), and even an interesting free software story in that history, some of which is mentioned in a FOSDEM pre-interview.

I suppose FOSDEM’s low cost (volunteer run, no registration) and largeness (5000 attendees) allows for such seemingly surprising, retro, and maybe important tracks — awareness of computational diversity is good at least for fun, and for showing that whatever systems people are attached to or are hyping at the moment are not preordained.

I also wanted to mention one lightning talk I managed to see — Mike Sheldon on [update 20120213: video], which I think is one of the most important software projects for free culture — because it facilitates not production or sharing of “content”, but of popularity (I’ve mentioned as “peer production of [free] cultural relevance”). Sheldon (whose voice you can hear on the occasional podcast) stated that GNU FM (the software runs) will support sharing of listener tastes across installations, so that a user of or a personal instance might tell another instance (say one set up for a local festival) to recommend music that instance knows about based on a significant history. Sounds neat. You can see what libre music I listen to at and more usefully get recommendations for yourself.

Addendum: In preemptive defense of this post’s title, of course I realize neither microkernels nor Ada are remotely weird, retro, alternative, etc. and that there are many other not quite mainstream but still relevant and modern systems and paradigms (hmm, free software desktops)…


It started snowing as soon as I arrived in Brussels, and was rather cold.


I got on the wrong train to the airport and got to see the Leuven train station. I made it to the airport half an hour before my flight, and arrived at the gate during pre-boarding. Try that in a US airport.

Web Data Common[s] Crawl Attribution Metadata

Monday, January 23rd, 2012

Via I see Web Data Commons which has “extracted structured data out of 1% of the currently available Common Crawl corpus dating October 2010”. WDC publishes the extracted data as N-Quads (the fourth item denotes the immediate provenance of each subject/predicate/object triple: the URL the triple was extracted from).
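To make the N-Quads layout concrete, here is a rough sketch that counts statements per predicate and lists provenance URLs with awk. The three sample lines and their example.org URLs are made up for illustration and are not from the WDC extract. The sketch assumes one statement per line, ending in a space-separated " ."; a real N-Quads parser handles more cases than this.

```shell
#!/bin/sh
# Hypothetical sample data; the example.org URLs are placeholders.
sample_nquads() {
cat <<'EOF'
<http://example.org/a> <http://creativecommons.org/ns#attributionName> "Alice Example" <http://example.org/a> .
<http://example.org/a> <http://creativecommons.org/ns#attributionURL> <http://example.org/> <http://example.org/a> .
<http://example.org/b> <http://creativecommons.org/ns#attributionName> "Bob" <http://example.org/b> .
EOF
}

# Count statements per predicate. The predicate is the second
# whitespace-separated field, since a subject cannot contain spaces.
count_predicates() {
  awk '{ n[$2]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Print each distinct provenance URL: the last field before the final ".".
# Counting from the end keeps this working when a literal object contains spaces.
provenance() {
  awk '$NF == "." { print $(NF-1) }' | sort -u
}

sample_nquads | count_predicates
sample_nquads | provenance
```

Running the same two functions over ccrdf.html-rdfa.nq would give a quick look at which properties dominate before loading anything into a triplestore, subject to the line-format assumption above.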

I thought it would be easy and fun to run some queries on the WDC dataset to get an idea of how annotations associated with Creative Commons licensing are used. Notes below on exactly what I did. The biggest limitation is that the license statement itself is not part of the dataset — not as xhv:license in the RDFa portion, and for some reason rel=license microformat has zero records. But cc:attributionName, cc:attributionURL, and cc:morePermissions are present in the RDFa part, as are some Dublin Core properties that the Creative Commons license chooser asks for (I only looked at dc:source) but are probably widely used in other contexts as well.

Dataset                                   URLs                 Distinct objects
Common Crawl 2010 corpus                  5,000,000,000 [a]
1% sampled by WDC                         ~50,000,000
with RDFa                                 158,184 [b]
with a cc: property                       26,245 [c]
cc:attributionName                        24,942 [d]           990 [e]
cc:attributionURL                         25,082 [f]           3,392 [g]
dc:source                                 7,235 [h]            574 [i]
cc:morePermissions                        4,791 [j]            253 [k]
cc:attributionURL = dc:source             5,421 [l]
cc:attributionURL = cc:morePermissions    1,880 [m]
cc:attributionURL = subject               203 [n]

Some quick takeaways:

  • Low ratio of distinct attributionURLs probably indicates HTML from license chooser deployed without any parameterization. Often the subject or current page will be the most useful attributionURL (but 203 above would probably be much higher with canonicalization). Note all of the CC licenses require that such a URL refer to the copyright notice or licensing information for the Work. Unless one has set up a site-wide license notice somewhere, a static URL is probably not the right thing to request in terms of requiring licensees to provide an attribution link; nor is a non-specific attribution link as useful to readers as a direct link to the work in question. As (and if) support for attribution metadata gets built into Creative Commons-aware CMSes, the ratio of distinct attributionURLs ought increase.
  • 79% of subjects with both dc:source and cc:attributionURL (6,836 [o]) have the same values for both properties. This probably means people are merely entering their URL into every form field requesting a URL without thinking, not self-remixing.
  • 47% of subjects with both cc:morePermissions and cc:attributionURL (3,977 [p]) have the same values for both properties. It is unclear why this ratio is so much lower than the previous one; it ought be higher, as often the same value for both makes sense. Unsurprising that cc:morePermissions is the least provided property; in my experience few people understand it.
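The percentages in these takeaways are simple ratios of counts from the table. A small sketch of the arithmetic follows; the helper name percent is my own, not part of any tool used here. The first two calls reproduce the 79% and 47% figures, and the last two divide the Distinct objects column by the URLs column for the two attribution properties.

```shell
#!/bin/sh
# percent MATCHING TOTAL: print MATCHING as a whole-number percentage of TOTAL.
# (percent is a hypothetical helper name chosen for this sketch.)
percent() {
  awk -v m="$1" -v t="$2" 'BEGIN { printf "%.0f\n", 100 * m / t }'
}

percent 5421 6836    # cc:attributionURL = dc:source, of subjects with both
percent 1880 3977    # cc:attributionURL = cc:morePermissions, of subjects with both
percent 3392 25082   # distinct cc:attributionURL objects relative to URLs
percent 990 24942    # distinct cc:attributionName objects relative to URLs
```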

I did not look at the provenance item at all. It’d be interesting to see what kind of assertions are being made across authority boundaries (e.g. a page on makes a statement with a URI as the subject) and when to discard such. I barely looked directly at the raw data at all; just enough to feel that my aggregate numbers could possibly be accurate. More could probably be gained by inspecting smaller samples in detail, informing other aggregate queries.

I look forward to future extracts. Thanks indirectly to Common Crawl for providing the crawl!

Please point out any egregious mistakes made below…

# a I don't really know if the October 2010 corpus is the
# entire 5 billion Common Crawl corpus

# download RDFa extract from Web Data Commons
wget -c

# Matches number stated at
wc -l ccrdf.html-rdfa.nq

# Includes easy to use no-server triplestore
apt-get install redland-utils

# sanity check
grep '<>' ccrdf.html-rdfa.nq |wc -l

# Import rejects a number of triples for syntax errors
rdfproc xyz parse ccrdf.html-rdfa.nq nquads

# d Perhaps syntax errors explains fewer triples than above grep might
# indicate, but close enough
rdfproc xyz query sparql - 'select ?o where { ?s <> ?o}' |wc -l

# These replicated below with 4store because...
rdfproc xyz query sparql - 'select distinct ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select distinct ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select ?o where { ?o <> ?o }' |wc -l
rdfproc xyz query sparql - 'select ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select distinct ?o where { ?s <> ?o}' |wc -l
rdfproc xyz query sparql - 'select ?o where { ?o <> ?o }' |wc -l

# ...this query takes forever, hours, and I have no idea why
rdfproc xyz query sparql - 'select ?s, ?o where { ?s <> ?o ; <> ?o }'

# 4store has a server, but is lightweight
apt-get install 4store

# 4store couldn't import with syntax errors, so export good triples from
# previous store first
rdfproc xyz serialize > ccrdf.export-rdfa.rdf

# import into 4store
curl -T ccrdf.export-rdfa.rdf 'http://localhost:8080/data/wdc'

# egrep is to get rid of headers and status output prefixed by ? or #
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

#Of course please use instead.
#Should be more widely deployed soon.
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o}' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?o where { ?s <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?o where { ?o <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n where { ?s <> ?o ; <> ?n }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n where { ?s <> ?o ; <> ?n }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s, ?o, ?n, ?m where { ?s <> ?o ; <> ?n ; <> ?m }' |egrep -v '^[\?\#]' |wc -l
4s-query wdc -s '-1' -f text 'select ?s, ?o where { ?s <> ?o ; <> ?o ; <> ?o }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?s where { { ?s <> ?o } UNION { ?s <> ?n } UNION { ?s <> ?m }  }' |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select distinct ?s where { { ?s <> ?o } UNION { ?s <> ?n }}' |egrep -v '^[\?\#]' |wc -l

#b note subjects not the same as pages data extracted from (158,184)
4s-query wdc -s '-1' -f text 'select distinct ?s where { ?s ?p ?o }'  |egrep -v '^[\?\#]' |wc -l

# Probably less than 1047250 claimed due to syntax errors
4s-query wdc -s '-1' -f text 'select ?s where { ?s ?p ?o }'  |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?p ?s }'  |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?s ?s }'  |egrep -v '^[\?\#]' |wc -l

4s-query wdc -s '-1' -f text 'select ?s where { ?s ?s ?o }'  |egrep -v '^[\?\#]' |wc -l

Faded sidebar

Sunday, January 22nd, 2012

I’ve occasionally mucked with my blog’s theme with a general aim of removing superfluous crap that makes reading posts harder. A couple weeks ago something Parker Higgins wrote inspired me to try a little harder:

How awesome is @niemanlab’s “faded” sidebar? And zen mode? That’s a site that cares about its readers.

You can view an archived version of the Nieman Lab site should their design change. The top of the sidebar isn’t faded, but overall I think the fading that is there makes it a little easier to concentrate on a post without switching to “zen” mode (removing all navigation).

For my theme, I made the sidebar always faded except when hovered over, and float to the right as far away from the main content area as possible. All “header” content is in the sidebar so that there’s nothing preceding a post’s title.

I intended to remove anything hardcoded* for my blog, anything I don’t understand or that isn’t used, and anything that doesn’t validate, but I didn’t get very far on any of those. I doubt this will be useful to anyone, but patches welcome.

*Yes, it is also a little ironic that I’ve never bothered to publish the modified source used to run this blog until now.

CSS text overlay image, e.g. for attribution and license notice

Sunday, January 8th, 2012

A commenter called me on providing inadequate credit for a map image I used on this blog. I’ve often seen map credits overlaid on the bottom right of maps, so I decided to try that. I couldn’t find an example of using CSS to overlay text on an image that only showed the absolute minimum needed to achieve the effect, and explained why. Below is my attempt.

Example 1

The above may be a good example of when to not use a text overlay (there is already text at the bottom of the image), but the point is to demonstrate the effect, not to look good. I have an image and I want to overlay «Context+Colophon» at the bottom right of the image. Here’s the minimal code:

<div style="position:relative;z-index:0;width:510px">
  <img src=""/>
  <div style="position:absolute;z-index:1;right:0;bottom:0">
    <a href="">Context</a>+<a href="">Colophon</a>
  </div>
</div>


The outer div creates a container which the text overlay will be aligned with. A position is necessary to enable z-index, which specifies how objects will stack. Here position:relative as I want the image and overlay to flow with the rest of the post, z-index:0 as the container is at the bottom of the stack. I specify width:510px as that’s how wide the image is, and without hardcoding the size of the div, the overlay as specified will float off to the right rather than align with the image. There’s nothing special about the img; it inherits from the outer div.

The inner div contains and styles the text I want to overlay. position:absolute as I will specify an absolute offset from the container, right:0;bottom:0, and z-index:1 to place above the image. Finally, I close both divs.

That’s it. I know precious little CSS; please tell me what I got wrong.
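
One possible variation, offered as a sketch rather than a tested fix: moving the styles into a stylesheet and making the container display:inline-block should let it shrink to the image’s width, so the width need not be hardcoded. The class names and map.png below are hypothetical placeholders, not anything from this blog’s theme. z-index is left out on the assumption that an absolutely positioned overlay paints above a non-positioned image by default.

```html
<!-- Sketch only: class names and map.png are hypothetical placeholders. -->
<style>
  /* Container: positioned so the overlay aligns to it; inline-block
     shrinks it to the image's width, so no hardcoded width is needed. */
  .overlay-box { position: relative; display: inline-block; }
  /* Remove the inline image's baseline gap so bottom:0 meets the image edge. */
  .overlay-box img { display: block; }
  /* Overlay: pinned to the container's bottom-right corner. */
  .overlay-box .credit { position: absolute; right: 0; bottom: 0; }
</style>

<div class="overlay-box">
  <img src="map.png" alt="Map"/>
  <div class="credit">
    <a href="">Context</a>+<a href="">Colophon</a>
  </div>
</div>
```

If the container has to stay a full-width block for layout reasons, the hardcoded width from the inline version remains the simpler choice.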

Example 2

Above is the image that prompted this post, with added attribution and license notice. Code:

<div style="z-index:0;position:relative;width:560px">
  <a href=";lon=-122.2776&amp;zoom=14&amp;layers=Q">
    <img src=""/></a>
  <div style="position:absolute;z-index:1;right:0;bottom:0;">
      © <a rel="cc:attributionURL"
           href=";lon=-122.2776&amp;zoom=14&amp;layers=Q">OpenStreetMap contributors</a>,
        <a rel="license"


With respect to achieving the text overlay, there’s nothing in this example not in the first. Below I explain annotations added that support (but are not required by) fulfillment of the OpenStreetMap/CC-BY-SA attribution and license notice.

xmlns:cc declares the cc: prefix, and even that may be superfluous, given cc: as a default prefix.

about sets the subject of subsequent annotations.

small isn’t an annotation, but does now seem appropriate for legal notices, and is usually rendered nicely.

rel="cc:attributionURL" says that the value of the href property is the link to use for attributing the subject. property="cc:attributionName" says that the text (“OpenStreetMap contributors”) is the name to use for attributing the subject. rel="license" says the value of its href property is the subject’s license.

If you’re bad and not using HTTPS-Everywhere (referrer not sent due to protocol change; actually I’m bad for not serving this blog over https), clicking on BY-SA above might obtain a snippet of HTML with credits for others to use. Or you can copy and paste the above code into RDFa Distiller or checkrdfa to see that the annotations are as I’ve said.

Addendum: If you’re reading this in a feed reader or aggregator, there’s a good chance inline CSS is stripped — text intended to overlay images will appear under rather than overlaying images. Click through to the post in order to see the overlays work.