Unsatisfying, or perhaps this blog is just that uninteresting. Code used to produce yearly wordlists. Some possible improvements:
- Rewrite as a WordPress plugin, or abstract away from WordPress
- Case insensitivity
- Suppress common words, perhaps using a word frequency dataset (I used the Wordle menu option for this, but it isn’t very aggressive)
- Use free software alternative to Wordle to generate wordclouds (suggestions?)
- Automate generation of wordclouds (very difficult with Wordle, as it would require browser automation; hence the previous bullet)
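The case-insensitivity and common-word suppression items could look roughly like the Python sketch below. It is not the code used to produce the wordlists above. The function name `yearly_wordlist`, the `(year, text)` input shape, and the tiny `STOPWORDS` set are all hypothetical placeholders. A real stopword list would more likely be derived from a word frequency dataset.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real one might be built
# from a word frequency dataset rather than typed in by hand.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "this", "i"}

# Words are runs of letters/digits, optionally joined by apostrophes
# (e.g. "isn't"), so a stray apostrophe is never counted as a word.
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def yearly_wordlist(posts, top_n=50, stopwords=STOPWORDS):
    """Count words per year from an iterable of (year, text) pairs.

    Text is lowercased before tokenizing (case insensitivity), and words in
    `stopwords` are dropped (common-word suppression). Returns a dict mapping
    each year to its `top_n` most frequent (word, count) pairs.
    """
    counts = {}
    for year, text in posts:
        words = WORD_RE.findall(text.lower())
        counter = counts.setdefault(year, Counter())
        counter.update(w for w in words if w not in stopwords)
    return {year: c.most_common(top_n) for year, c in counts.items()}


if __name__ == "__main__":
    posts = [(2008, "The blog and the Blog"),
             (2009, "Creative Commons blog posts")]
    print(yearly_wordlist(posts))
```

Because "The blog" and "the Blog" are lowercased first, they collapse into a single `blog` entry for 2008, and "the" and "and" never reach the counts. The per-year lists could then be fed to whatever wordcloud generator ends up replacing Wordle.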
I started doing this in part to see five years of topic changes on this blog, but mostly because if it worked well, I’d use it on the Creative Commons blog, which is a 6+ year mass of around 2,500 almost completely uncategorized/untagged posts. In that vein, I intend to look into automated term extraction and user tagging code.