In this post I’ll show you how I added tag clouds to my Octopress blog.
Before I go any further, I want to stress something: while this is a robust
solution, it’s not the optimal solution, and not a permanent solution. I
don’t know whether it’s particularly elegant, either. It’s not ugly, but it isn’t beautiful. There
is at least one project in the
works to add tags to Octopress. When that’s pulled into the main Octopress
repository, it will probably be the way tag clouds ought to be done. Despite that,
I came up with my own way to do it for the following reasons:
I don’t know Ruby yet. I’m learning, but I don’t feel comfortable
incorporating so much code I don’t understand.
I don’t want to re-invent the wheel in Ruby. Ted Kulp, the author of that
project, seems to have done all the heavy lifting. I look forward
to incorporating his changes once they’re in Octopress proper. My solution is
written in Perl, and as far as I’m concerned, it is a short-term
one.
Having said that, read on to see how I got the tag clouds that you see in this blog working.
There are three parts to this problem:
Displaying a list of tags applied to the current post at the bottom of each
post.
Clicking through from a tag name in that list to a tag page that lists all
posts with that tag.
Generating and displaying the tag cloud.
Displaying a list of tags
This was pretty easy to get right. I modified source/_layouts/post.html to include a new file called tags.html:
<div id="tag_list">
  Tags:
  <ul id="tags_ul">
  {% for t in page.tags %}
    <li><a href="/tags/{{t}}/">{{t}}</a></li>
  {% endfor %}
  </ul>
</div>
I chose to display tags as a list so that screen readers and other renderers
interpret this correctly as a list of entries, rather than a sentence where
each word is an anchor. Finally, I added the appropriate CSS to display the tag
list properly in the browser.
Generating the tag pages and the tag cloud markup was the most involved step of all. I wrote a script called tagify.pl
that generates the tag markdown files as well as a file containing the HTML
markup for the tag cloud. I started off parsing the markdown files with a YAML
parser and gathering the tags from there. But then I realized that for each
tag’s page I would need to know the URL of every post marked with
that tag. If I only had access to the markdown, I would have to duplicate
Octopress’s code that determines where a post lands in the public directory:
the /yyyy/mm/dd/modified-file-name.html logic.
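Here is a minimal sketch of the piece of that logic tagify.pl does end up relying on once it reads the generated HTML: recognizing a post by its path under public/ and turning that path into a site-relative URL. It assumes posts are written to public/yyyy/mm/dd/, which is what the script's getTags() expects; post_info and newest_first are hypothetical names used only for illustration. Because every path starts with the date, a descending string sort also gives reverse chronological order, which is the trick makeTagFile() uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decide whether a file under $root/public is a post,
# ASSUMING posts are generated at public/yyyy/mm/dd/ (the layout tagify.pl expects).
# Returns a hash ref with the site-relative URL and the date parts, or undef.
sub post_info {
    my ($root, $path) = @_;
    my $public = quotemeta("$root/public");   # so a root like "." is matched literally
    return unless $path =~ m{^$public(/(\d{4})/(\d{2})/(\d{2})/.*\.html)$};
    return { url => $1, year => $2, month => $3, day => $4 };
}

# Paths begin with yyyy/mm/dd, so sorting them as strings in
# descending order also sorts the posts newest first.
sub newest_first {
    my @paths = @_;
    return sort { $b cmp $a } @paths;
}

# Usage:
#   my $info = post_info('.', './public/2012/03/04/my-post/index.html');
#   # $info->{url} is '/2012/03/04/my-post/index.html'
```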
Rather than do that and tightly couple my hack with Octopress’s code, I decided
to parse the generated HTML instead. Therefore, my script would have to be run
after rake generate is called. But! My script has to generate tag
files and tag clouds that get pulled into asides in the sidebar! That means
that rake generate has to be called after my script as well - a second
time. Now do you believe me when I say that this is a temporary solution? It
works, but it’s not very efficient.
But you know what? I’m happy with it. I could have waited until I was done
learning Ruby, and done learning the Liquid Template Manager, and done learning
Octopress’s idioms and then started with this project. I could have waited
until Ted Kulp’s changes were pulled into the master Octopress branch. But I
know what I want, and I know I can write decent code. It took a few hours, but
I was able to get tag clouds done and move on. I’m not emotionally attached to
this code and will gladly abandon it when something better comes along. The
way I’ve designed it, it will be easy to pull out. Instead of running
rake generate
./tagify.pl --dir .
rake generate
I’ll just run
rake generate
Since tagify.pl is rather long (307 lines with comments), I’ve put it
in the Appendix at the bottom of this post rather than
inline it here.
What’s important to remember is that tagify.pl does 3 things:
It creates a single file called source/_includes/custom/tag_cloud.html containing a div with one anchor per tag, each linking to that tag’s page.
The title is included for accessibility, and the classes tag_1
through tag_10 are used to display the tags in the appropriate size.
For each tag, it creates a file that lists all the posts tagged with that tag in reverse chronological order. For the tag Editors, it would create source/tags/Editors/index.markdown.
It creates one file called source/tags/index.markdown that includes the tag_cloud.html file in the main article area.
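The mapping from a tag's post count to the classes tag_1 through tag_10 is a simple proportional bucketing. The sketch below isolates the rule tagify.pl applies (round count / max * 10 to the nearest whole number, then bump 0 up to 1) into a standalone sub; bucket_tags is a hypothetical name, and it uses List::Util's max instead of the explicit loop in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical standalone version of tagify.pl's bucketing rule.
# Takes (tag => post count) pairs and returns (tag => bucket from 1 to 10).
sub bucket_tags {
    my (%count) = @_;
    # Start from 1 so an empty input can't cause a divide-by-zero.
    my $max = max(1, values %count);
    my %bucket;
    for my $tag (keys %count) {
        # Round count/max * 10 to the nearest whole number...
        my $n = int($count{$tag} / $max * 10 + 0.5);
        # ...and bump 0 up to 1: the buckets are 1-10, not 0-10.
        $bucket{$tag} = $n < 1 ? 1 : $n;
    }
    return %bucket;
}

# Usage: my %bucket = bucket_tags(perl => 10, ruby => 5);
# The class for each tag in the cloud is then "tag_$bucket{$tag}".
```

The most popular tag always lands in bucket 10, and every other tag is sized relative to it, so the cloud rescales itself as the blog grows.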
Displaying the tag cloud
To display the tag cloud in the right sidebar, I added asides/tag_cloud.html to the default_asides list in _config.yml.
I then created the file source/_includes/asides/tag_cloud.html. This
file includes the source/_includes/custom/tag_cloud.html file that was
generated by tagify.pl above.
source/_includes/asides/tag_cloud.html
<section>
  <h1>Tags</h1>
  <div class="tag_cloud">
    {% include custom/tag_cloud.html %}
  </div>
</section>
I then modified sass/custom/_styles.scss to include the CSS for each of the 10 tag ‘buckets’.
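Since the ten buckets differ only in size, their rules can also be generated instead of written by hand. This sketch prints one rule per bucket, with font sizes interpolated linearly between a minimum and a maximum; the 80% to 200% range, the a.tag_N selector, and the name bucket_css are assumptions for illustration, not the values used on this blog.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical generator for the .tag_1 .. .tag_10 rules.
# ASSUMPTION: sizes scale linearly from $min_pct (bucket 1) to $max_pct (bucket 10).
sub bucket_css {
    my ($min_pct, $max_pct) = @_;
    my @rules;
    for my $n (1 .. 10) {
        # Linear interpolation between the smallest and largest size.
        my $pct = $min_pct + ($max_pct - $min_pct) * ($n - 1) / 9;
        # Add 0.5 before %d truncates, to round to the nearest percent.
        push @rules, sprintf("a.tag_%d { font-size: %d%%; }", $n, $pct + 0.5);
    }
    return join("\n", @rules) . "\n";
}

print bucket_css(80, 200);
```

Sass accepts plain CSS syntax in .scss files, so the output can be pasted into the custom styles file as-is.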
It bears mentioning that you don’t always need to run tagify.pl. You
only need to run it when the tags on your posts have changed, and in that
case you must run rake generate both before and after running
tagify.pl. If you’re just working on edits to a post before publishing
it, you don’t need to run tagify.pl every time you want to view your post
on your local machine; rake generate is enough.
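That rule can be wrapped in a small helper so the second rake generate pass is never forgotten. This is a sketch, not part of tagify.pl: build_commands and run_build are hypothetical names, the $tags_changed flag is something you set yourself, and it assumes it is run from the Octopress root, hence --dir . for tagify.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the command sequence for a build.
# tagify.pl reads the generated HTML, and its own output must then be
# generated again, so changed tags mean two rake generate passes.
sub build_commands {
    my ($tags_changed) = @_;
    return (['rake', 'generate']) unless $tags_changed;
    return (['rake', 'generate'],
            ['./tagify.pl', '--dir', '.'],   # ASSUMES we are in the Octopress root
            ['rake', 'generate']);
}

# Run each command in order, stopping at the first failure.
sub run_build {
    my ($tags_changed) = @_;
    for my $cmd (build_commands($tags_changed)) {
        system(@$cmd) == 0
            or die "@$cmd failed: exit status " . ($? >> 8) . "\n";
    }
}
```

Calling run_build(1) after editing tags reproduces the three-command sequence; run_build(0) runs a single rake generate.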
I’m glad to say that I was able to get tags to work with Octopress exactly the
way I wanted. It was pretty quick, too. It took me longer to write this blog
post than to actually do the work. If you like this post, please let me know
on Twitter, where I’m @_aijaz_. Thanks.
Appendix - tagify.pl
This is what tagify.pl looks like. I describe the code in the comments within the file.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use HTML::TreeBuilder;
use Getopt::Long;

my $octopress_root;
my $options_read = GetOptions("dir=s", \$octopress_root);

############################################################
unless ($options_read && $octopress_root) {
    print "\n";
    print "\n";
    print "usage: tagify.pl --dir d\n";
    print "\n";
    print "where d is the root octopress directory\n";
    print " - the parent of source, public, etc.\n";
    print "\n";
    exit 1;
}

# The tag cloud HTML gets saved into this file.
# This file is included by two others:
#   a) The file used for the sidebar aside
#   b) The page used to display all tags
#      (accessible as /tags/index.html)
#
my $custom_file = "$octopress_root/source/_includes"
                . "/custom/tag_cloud.html";

# This is the data structure that contains all the tag
# data parsed from the HTML files.
# Its key is the tag name (not case-normalized).
# The value is another hash. That hash has 3 keys:
#   count     - number of pages with that tag
#   range_num - a number from 1 - 10 indicating
#               popularity (see below)
#   pages     - an array of hashes
#
# Each hash in the pages array has 3 keys:
#   title      - the HTML title of the post
#   file       - the full file name of the HTML file
#   categories - an array of hashes
#
# Each hash in the categories array has two keys:
#   href - the url to the category page (as determined
#          by Octopress)
#   text - the name of the category (as displayed by
#          Octopress)
#
my $tag_data = {};

# This function populates the tag_data data structure
#
find(\&getTags, "$octopress_root/public");

# Find the number of times the most popular tag is used
#
my $max = 1;   # start with 1, not 0 to prevent a
               # divide-by-zero error later
               # if none of the posts have tags
foreach my $tag (keys %$tag_data) {
    $tag_data->{$tag}->{count} = scalar(@{$tag_data->{$tag}->{pages}});
    if ($tag_data->{$tag}->{count} > $max) {
        $max = $tag_data->{$tag}->{count};
    }
}

# Assign each tag a range number from 1 - 10
# based on popularity. This range number will
# be used along with CSS to print tags with
# the appropriate size.
#
foreach my $tag (keys %$tag_data) {
    $tag_data->{$tag}->{range_num} =
        int(($tag_data->{$tag}->{count} / $max) * 10 + 0.5);  # nearest whole number
    if ($tag_data->{$tag}->{range_num} == 0) {
        $tag_data->{$tag}->{range_num} = 1;  # we want 1-10, not 0-10
    }
}

# Write the tag cloud file
#
open(O, ">$custom_file") || die "Can't open $custom_file";
print O "<div id='tag_cloud'>\n";

# sort by tag name, case insensitive
#
foreach my $tag (sort { lc($a) cmp lc($b) } keys %$tag_data) {
    # give each tag anchor a title,
    # for screen readers and the like
    #
    my $plural = "y";
    if ($tag_data->{$tag}->{count} > 1) {
        $plural = 'ies';
    }
    print O qq[<a href="/tags/$tag/" ];
    print O qq[title="$tag_data->{$tag}->{count} entr$plural" ];
    print O qq[class="tag_$tag_data->{$tag}->{range_num}">];
    print O qq[$tag];
    print O qq[</a>\n];
}
print O "</div>\n";
close O;

# Now save the individual tag files.
# First, clear out the directory because we're gonna
# regenerate all the files.
#
my $tag_dir = "$octopress_root/source/tags";

# If source/tags exists but is a file
#
die "source/tags is a file" if (-f $tag_dir);

# Create the directory if it doesn't exist
#
if (!-d $tag_dir) {
    mkdir $tag_dir or die "Couldn't make directory $tag_dir";
    createTagsIndexMarkdown($tag_dir);
}
else {
    # Delete all directories under source/tags.
    # We don't want to delete everything because we
    # need to preserve tags/index.markdown in case
    # something was modified there.
    #
    my $dirs = `find $tag_dir/* -type d`;
    my @dirs = split(/[\r\n]+/, $dirs);
    foreach my $dir (@dirs) {
        `/bin/rm -rf $dir`;
    }
}

# Make a file for each tag.
#
foreach my $tag (keys %$tag_data) {
    makeTagFile($tag);
}

#
# ####################################################
# Functions
#
# ####################################################

sub makeTagFile {
    my $tag = shift;

    # Paths are built from $tag_dir so the script can be
    # run from any directory.
    mkdir "$tag_dir/$tag"
        or die "Couldn't make directory $tag_dir/$tag";
    open(O, "> $tag_dir/$tag/index.markdown")
        || die "Can't open $tag_dir/$tag/index.markdown";
    print O qq^---
layout: page
title: "Tag: $tag"
footer: false
---
<div id="blog-archives" class="category">
^;

    my $year = 0;

    # Sort by file name descending.
    # This is the same as sorting by date descending.
    #
    foreach my $file (sort { $b->{file} cmp $a->{file} }
                      @{$tag_data->{$tag}->{pages}}) {
        # Get the year, month and date
        #
        my ($yyyy, $mm, $dd) = $file->{file} =~ m!(\d\d\d\d)/(\d\d)/(\d\d)/!;

        # The HTML and associated logic here mimics
        # the HTML of the category pages - print
        # an H2 for every new year
        #
        if ($yyyy != $year) {
            $year = $yyyy;
            print O "<h2>$year</h2>\n";
        }

        # Construct the URL & date string
        #
        my $url = $file->{file};
        $url =~ s/^\Q$octopress_root\E\/public//;
        my $title = $file->{title};
        my @months = qw(x Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
        my $mon = $months[$mm * 1];
        print O qq[<article><h1><a href="$url">$title</a></h1>
<time datetime="$yyyy-$mm-${dd}T00:00:00-06:00" pubdate><span class='month'>$mon</span> <span class='day'>$dd</span> <span class='year'>$yyyy</span></time>
<footer><span class="categories">posted in ];

        # Print each category, separated by commas
        #
        print O join(", ", map { "<a href='$_->{href}'>$_->{text}</a>" }
                           @{$file->{categories}});
        print O qq[</span></footer></article>\n];
    }
    print O "</div>\n";
    close O;
}

sub getTags {
    my $file = $File::Find::name;

    # Only parse files that look like posts
    #
    return unless $file =~ /\.html$/;
    return unless $file =~ /^\Q$octopress_root\E\/public\/\d{4}\/\d{2}\/\d{2}\//;

    # Read the contents of the HTML file
    # (File::Find has chdir'ed into the file's directory, so $_ works)
    #
    open(HTML, $_) || die "Can't open $file";
    my $contents = join("", <HTML>);
    close HTML;

    my $tree = HTML::TreeBuilder->new();
    $tree->parse($contents);
    $tree->eof();

    # Get the title
    #
    my $title = $tree->look_down(_tag => "h1", class => "entry-title");
    $title = $title->as_trimmed_text();

    # Get the categories (a post may not have any)
    #
    my @categories = ();
    my $category_ent = $tree->look_down(_tag => "span", class => "categories");
    if ($category_ent) {
        my @as = $category_ent->look_down(_tag => "a", class => "category");
        foreach my $anchor (@as) {
            push(@categories, { href => $anchor->attr('href'),
                                text => $anchor->as_trimmed_text });
        }
    }

    # Get the tags
    #
    my $ul = $tree->look_down("_tag", "ul", "id", "tags_ul");
    if ($ul) {
        my @items = $ul->look_down("_tag" => "li");
        foreach my $item (@items) {
            my $tag = $item->as_trimmed_text();

            # Finally, populate the data structure
            #
            push(@{$tag_data->{$tag}->{pages}},
                 { title => $title, file => $file, categories => \@categories });
        }
    }
    else {
        # no tags in this document
    }
    $tree->delete();
}

# This function creates a default
# source/tags/index.markdown
#
sub createTagsIndexMarkdown {
    my $tag_dir = shift;
    open(O, ">$tag_dir/index.markdown")
        || die "Can't open $tag_dir/index.markdown";
    print O qq[---
layout: page
title: Tags
footer: false
---
<div class="tag_page">
{% include custom/tag_cloud.html %}
</div>
];
    close O;
}