TheJoyOfHack

For people who like to make things

In this post I’ll show you how I added tag clouds to my Octopress blog.

Before I go any further, I want to stress something: while this is a robust solution, it’s not the optimal solution, and it’s not a permanent one. I don’t know whether it’s particularly elegant, either. It’s not ugly, but it isn’t beautiful. There is at least one project in the works to add tags to Octopress. When that’s pulled into the main Octopress repository, it will probably be the way tag clouds ought to be done. Despite that, I came up with my own way to do it for the following reasons:

  • I don’t know Ruby yet. I’m learning, but I don’t feel comfortable incorporating so much code I don’t understand.
  • I don’t want to re-invent the wheel in Ruby. Ted Kulp, the author of that project, seems to have done all the heavy lifting. I look forward to incorporating his changes once they’re in Octopress proper. My solution’s written in Perl, and as far as I’m concerned, it’s a short-term one.

Having said that, read on to see how I got the tag clouds that you see in this blog working.

There are three parts to this problem:

  1. Displaying a list of tags applied to the current post at the bottom of each post.
  2. Clicking through from a tag name in that list to a tag page that lists all posts with that tag.
  3. Generating and displaying the tag cloud.

Displaying a list of tags

This was pretty easy to get right. I modified source/_layouts/post.html to include a new file called tags.html:

...
       {% include post/author.html %}
       {% include post/date.html %}{% if updated %}{{ updated }}{% else %}{{ time }}{% endif %}
       {% include post/categories.html %}
+      {% include post/tags.html %}
     </p>
     {% unless page.sharing == false %}
       {% include post/sharing.html %}
...

I then created source/_includes/post/tags.html, which is shown in its entirety here:

<div id="tag_list">
    Tags: 
    <ul id="tags_ul">
{% for t in page.tags  %}
        <li><a href="/tags/{{t}}/">{{t}}</a></li>
{% endfor %}
    </ul>
</div>

I chose to display tags as a list so that screen readers and other renderers interpret them correctly as a list of entries, rather than as a sentence where each word is an anchor. Finally, I added the appropriate CSS to display the tag list properly in the browser.

div#tag_list {
    font-size: 12pt;
}

#tags_ul { 
    display: inline;
}

#tags_ul li:last-child:after {
  content: "";
}

#tags_ul li:after {
  content: ", ";
}

#tags_ul li {
    display: inline; 
}
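Because tagify.pl reads the tags back out of the generated HTML (see getTags in the Appendix), the markup that tags.html emits is effectively an interface between the two. Here is a minimal sketch of that round trip using only core Perl. render_tag_list and extract_tags are hypothetical helpers, and the regex in extract_tags is a simplified stand-in for the HTML::TreeBuilder calls the real script uses, so it only copes with markup shaped exactly like the template.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the <ul> that the Liquid loop in
# source/_includes/post/tags.html produces for a post's tags.
sub render_tag_list {
    my @tags = @_;
    my $html = qq[<ul id="tags_ul">\n];
    $html .= qq[    <li><a href="/tags/$_/">$_</a></li>\n] for @tags;
    return $html . "</ul>\n";
}

# Hypothetical helper: recover the tag names from that markup.
# A regex stand-in for HTML::TreeBuilder's look_down; it assumes the
# exact shape produced by render_tag_list and is not a general parser.
sub extract_tags {
    my ($html) = @_;
    my ($ul) = $html =~ m{<ul id="tags_ul">(.*?)</ul>}s or return ();
    return $ul =~ m{<li><a [^>]*>([^<]*)</a></li>}g;
}

my @tags = extract_tags(render_tag_list('Perl', 'Octopress'));
print join(', ', @tags), "\n";    # prints "Perl, Octopress"
```

If the template ever changes (a different id on the ul, or extra markup inside each li), this is the spot where tagify.pl would silently stop finding tags.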

Generating the tag files and tag cloud

This was the most involved step of all. I wrote a script called tagify.pl that generates the tag markdown files as well as a file containing the HTML markup for the tag cloud. I started off parsing the markdown files with a YAML parser and gathering the tags from there. But then I realized that for each tag’s page I would need the URL of every post marked with that tag. With access to only the markdown, I would have had to duplicate Octopress’s code that determines where a post lands in the public directory - the /yyyy/mm/dd/modified-file-name.html logic.

Rather than do that and tightly couple my hack with Octopress’s code I decided to parse the generated HTML instead. Therefore, my script would have to be run after rake generate is called. But! My script has to generate tag files and tag clouds that get pulled into asides in the sidebar! That means that rake generate has to be called after my script as well - a second time. Now do you believe me when I say that this is a temporary solution? It works, but it’s not very efficient.

But you know what? I’m happy with it. I could have waited until I was done learning Ruby, done learning the Liquid templating language, and done learning Octopress’s idioms before starting this project. I could have waited until Ted Kulp’s changes were pulled into the master Octopress branch. But I know what I want, and I know I can write decent code. It took a few hours, but I was able to get tag clouds done and move on. I’m not emotionally attached to this code and will gladly abandon it when something better comes along. The way I’ve designed it, it will be easy to pull out: instead of running

rake generate
./tagify.pl --dir .
rake generate

I’ll just run

rake generate
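Since tagify.pl has to run between two rake generate calls, and exits with a usage message unless it gets --dir, a small wrapper can keep the three steps in order and stop at the first failure. This is only a sketch: run_steps and rebuild_site are hypothetical names, and rebuild_site assumes it is called from the Octopress root with tagify.pl sitting there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run each command in order, stopping at the first one that fails.
# Each step is an array ref passed to system() as a list, so no shell
# is involved and arguments need no quoting.
sub run_steps {
    my @steps = @_;
    for my $step (@steps) {
        next if system(@$step) == 0;
        my $why = $? == -1  ? "could not start: $!"
                : $? & 127  ? "killed by signal " . ($? & 127)
                :             "exit status " . ($? >> 8);
        die "Step '@$step' failed ($why)\n";
    }
    return scalar @steps;
}

# Hypothetical wrapper for the generate / tagify / generate sequence.
# Assumes the current directory is the Octopress root.
sub rebuild_site {
    return run_steps(
        [ 'rake', 'generate' ],
        [ './tagify.pl', '--dir', '.' ],
        [ 'rake', 'generate' ],
    );
}
```

When tags eventually land in Octopress proper, rebuild_site shrinks to a single rake generate step.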

Since tagify.pl is rather long (307 lines with comments), I’ve put it in the Appendix at the bottom of this post instead of including it here.

What’s important to remember is that tagify.pl does 3 things:

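  • It creates a single file called source/_includes/custom/tag_cloud.html. It contains a div with the id tag_cloud holding one anchor per tag, sorted alphabetically. Each anchor links to /tags/TagName/ and carries a title attribute (such as “3 entries”) and a popularity class.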

The title is included for accessibility, and the classes tag_1 through tag_10 are used to display the tags in the appropriate size.

  • For each tag, it creates a file that lists all the posts tagged with that tag in reverse chronological order. For the tag Editors, it would create source/tags/Editors/index.markdown.

  • It creates one file called source/tags/index.markdown that includes the tag_cloud.html file in the main article area.
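The sizing behind tag_1 through tag_10 is a simple scaling: each tag’s post count is divided by the count of the most-used tag, multiplied by 10, and rounded to the nearest whole number, with 1 as the floor. The sketch below pulls that calculation out of tagify.pl into two small functions so it can be tried in isolation. range_num and cloud_anchor are hypothetical helper names; the anchor format follows what the Appendix script prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Popularity bucket from 1 to 10, computed the way tagify.pl does it:
# scale against the most-used tag, round to the nearest whole number,
# and never return 0.
sub range_num {
    my ($count, $max) = @_;
    my $n = int($count / $max * 10 + 0.5);
    return $n < 1 ? 1 : $n;
}

# One line of tag_cloud.html, in the same format tagify.pl prints.
sub cloud_anchor {
    my ($tag, $count, $max) = @_;
    my $entries = $count > 1 ? 'entries' : 'entry';
    my $class   = 'tag_' . range_num($count, $max);
    return qq[<a href="/tags/$tag/" title="$count $entries" class="$class">$tag</a>];
}

# A tag used twice, when the most popular tag has four posts.
print cloud_anchor('Editors', 2, 4), "\n";
# prints <a href="/tags/Editors/" title="2 entries" class="tag_5">Editors</a>
```

Because the scale is relative to the most popular tag, adding posts to one tag can shift every other tag into a smaller bucket, which is another reason tagify.pl rewrites the whole cloud each time.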

Displaying the tag cloud

To display the tag cloud in the right sidebar, I added a default aside in _config.yml:

default_asides: [asides/recent_posts.html, asides/twitter.html, asides/tag_cloud.html]

I then created the file source/_includes/asides/tag_cloud.html. This file includes the source/_includes/custom/tag_cloud.html file that was generated by tagify.pl above.

<section>
    <h1>Tags</h1>
    <div class="tag_cloud">
     {% include custom/tag_cloud.html %}
    </div>
</section>

I then modified sass/custom/_styles.scss to include the CSS for each of the 10 tag ‘buckets’:

.tag_1 { 
    font-weight: 200; 
    font-size: 10pt;
}
.tag_2 { 
    font-weight: 200; 
    font-size: 12pt;
}
...
.tag_10 { 
    font-weight: 900; 
    font-size: 24pt;
}
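Since the ten buckets differ only in weight and size, they could also be generated instead of written by hand. The sketch below is a hypothetical alternative, not what I used: it interpolates linearly between the tag_1 and tag_10 values shown above and rounds weights to the nearest 100 so they stay valid font-weight values. The middle buckets it produces won’t necessarily match hand-picked ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Round a positive number to the nearest integer.
sub round_half_up { return int($_[0] + 0.5) }

# CSS for bucket $n (1..10), interpolating linearly from
# tag_1 (weight 200, 10pt) to tag_10 (weight 900, 24pt).
sub bucket_css {
    my ($n) = @_;
    my $t      = ($n - 1) / 9;    # 0 for tag_1, 1 for tag_10
    my $size   = round_half_up(10 + 14 * $t);
    my $weight = 100 * round_half_up((200 + 700 * $t) / 100);
    return ".tag_${n} {\n    font-weight: ${weight};\n    font-size: ${size}pt;\n}\n";
}

# Print all ten rules, ready to paste into the custom stylesheet.
print bucket_css($_) for 1 .. 10;
```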

Finally, I added a line to source/_includes/custom/navigation.html to link to the main Tags page:

...
   <li><a href="{{ root_url }}/">Home</a></li>
   <li><a href="{{ root_url }}/about/">About Me</a></li>
   <li><a href="{{ root_url }}/categories/">Categories</a></li>
+  <li><a href="{{ root_url }}/tags/">Tags</a></li>
   <li><a href="{{ root_url }}/blog/archives">Archives</a></li>
...

Summary

It bears mentioning that you don’t always need to run tagify.pl. You only need to run it if you’ve updated the tags on a post. If you have changed a tag, you must run rake generate both before and after running tagify.pl. If you’re just editing a post before publishing it, you don’t need to run tagify.pl every time you want to preview the post on your local machine; rake generate is enough.

I’m glad to say that I was able to get tags to work with Octopress exactly the way I wanted. It was pretty quick, too. It took me longer to write this blog post than to actually do the work. If you like this post, please let me know on Twitter, where I’m @_aijaz_. Thanks.

Appendix - tagify.pl

This is what tagify.pl looks like. I describe the code in the comments within the file.

#!/usr/bin/perl

use strict;
use warnings;
use File::Find;
use HTML::TreeBuilder;
use Getopt::Long;

my $octopress_root;

my $options_read = GetOptions("dir=s", \$octopress_root);

############################################################
unless ($options_read && $octopress_root) { 
    print "\n";
    print "\n";
    print "usage: tagify.pl --dir d\n";
    print "\n";
    print "where d is the root octopress directory\n";
    print "   - the parent of source, public, etc.\n";
    print "\n";
    exit 1;
}

# The tag cloud HTML gets saved into this file.
# This file is included by two others: 
#  a) The file used for the sidebar aside
#  b) The page used to display all tags 
#     (accessible as /tags/index.html)
#
my $custom_file = "$octopress_root/source/_includes".
                  "/custom/tag_cloud.html";

# This is the data structure that contains all the tag 
# data parsed from the HTML files.
# Its key is the tag name (not case-normalized).
# The value is another hash.  That hash has 3 keys:
#  count - number of pages with that tag
#  range_num - a number from 1 - 10 indicating 
#              popularity (see below)
#  pages - an array of hashes
#
#  Each hash in the pages array has 3 keys: 
#  title - the HTML title of the post
#  file  - the full file name of the HTML file
#  categories - a reference to an array of hashes
#
#  Each hash in the categories array has two keys: 
#  href - the url to the category page (as determined
#         by Octopress)
#  text - the name of the category (as displayed by 
#         Octopress)
#
my $tag_data      = { };

# This function populates the tag_data data structure
#
find(\&getTags, "$octopress_root/public");

# Find the number of times the most popular tag is used
#
my $max = 1;  # start with 1, not 0 to prevent a 
              # divide-by-zero error later
              # if none of the posts have tags
foreach my $tag (keys %$tag_data) {
    $tag_data->{$tag}->{count} = scalar(@{$tag_data->{$tag}->{pages}});
    if ($tag_data->{$tag}->{count} > $max) {
        $max = $tag_data->{$tag}->{count};
    }
}

# Assign each tag a range number from 1 - 10
# based on popularity.  This range number will
# be used along with CSS to print tags with
# the appropriate size.
#
foreach my $tag (keys %$tag_data) {
    $tag_data->{$tag}->{range_num} = 
      int(($tag_data->{$tag}->{count} / $max) 
          * 10 
          + 0.5); # nearest whole number

    if ($tag_data->{$tag}->{range_num} == 0) {
        $tag_data->{$tag}->{range_num} = 1;
        # we want 1-10, not 0-10
    }
}

# Write the tag cloud file
#
open (O, ">$custom_file") || die "Can't write $custom_file: $!";
print O "<div id='tag_cloud'>\n";

# sort by tag name, case insensitive
#
foreach my $tag (sort { lc($a) cmp lc($b)} 
                      keys %$tag_data) {

    # give each tag anchor a title, 
    # for screen readers and the like
    #
    my $plural = "y";
    if ($tag_data->{$tag}->{count} > 1) { 
        $plural = 'ies'; 
    }

    print O qq[<a href="/tags/$tag/" ];
    print O qq[title="$tag_data->{$tag}->{count} entr$plural" ];
    print O qq[class="tag_$tag_data->{$tag}->{range_num}">];
    print O qq[$tag];
    print O qq[</a>\n];
}
print O "</div>\n";
close O;


# Now save the individual tag files
# First, clear out the directory because we're gonna 
# regenerate all the files.
#
my $tag_dir = "$octopress_root/source/tags";

# If source/tags exists but is a file
#
die "source/tags is a file" if (-f $tag_dir);

# Create the directory if it doesn't exist
#
if (!-d $tag_dir) { 
    mkdir $tag_dir;
    createTagsIndexMarkdown($tag_dir);
}
else {
    # Delete all directories under source/tags.
    # We don't want to delete everything because we 
    # need to preserve tags/index.markdown in case 
    # something was modified there.
    #
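    # Use File::Path instead of shelling out to find and rm,
    # so tag names with spaces or shell characters are safe.
    require File::Path;
    opendir(my $dh, $tag_dir) || die "Can't read $tag_dir: $!";
    foreach my $entry (readdir $dh) {
        next if $entry eq '.' || $entry eq '..';
        File::Path::remove_tree("$tag_dir/$entry")
            if -d "$tag_dir/$entry";
    }
    closedir $dh;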

}


# Make a file for each tag.
#
foreach my $tag (keys %$tag_data) { 
    makeTagFile($tag);
}


##  ####################################################
##  Functions
##  ####################################################

sub makeTagFile { 
    my $tag = shift;

    # Write under $tag_dir (built from --dir above) rather than
    # a path relative to the current directory.
    my $dir = "$tag_dir/$tag";
    (-d $dir) || mkdir($dir)
      || die "Couldn't make directory $dir: $!";

    open (O, "> $dir/index.markdown") 
      || 
      die "Can't open $dir/index.markdown: $!";

    print O qq^---
layout: page
title: Tag&#58; $tag
footer: false
---

<div id="blog-archives" class="category">
^;

    my $year = 0;

    # Sort by file name descending
    # This is the same as sorting by date descending
    #
    foreach my $file (
               sort { $b->{file} cmp $a->{file} } 
                    @{$tag_data->{$tag}->{pages}}) {

        # Get the year month and date
        #
        my ($yyyy, $mm, $dd) = $file->{file} =~ 
            m!(\d\d\d\d)/(\d\d)/(\d\d)/!;

        # The HTML and associated logic here mimics
        # the HTML of the category pages - print 
        # a H2 for every new year
        #
        if ($yyyy != $year) { 
            $year = $yyyy;
            print O "<h2>$year</h2>\n";
        }
        
        # Construct the URL & date string
        #
        my $url = $file->{file};
        $url =~ s/^\Q$octopress_root\E\/public//;
        my $title = $file->{title};
        my @months = qw ( x Jan Feb Mar Apr May Jun 
                          Jul Aug Sep Oct Nov Dec );
        my $mon = $months[$mm * 1];

        print O qq[
<article>
<h1><a href="$url">$title</a></h1>
<time datetime="$yyyy-$mm-${dd}T00:00:00-06:00" pubdate><span class='month'>$mon</span> <span class='day'>$dd</span> <span class='year'>$yyyy</span></time>
<footer>
<span class="categories">posted in 
];
        # Print each category, separated by commas
        #
        print O join(", ", 
            map { "<a href='$_->{href}'>$_->{text}</a>" } 
                @{$file->{categories}}
            );
            
        print O qq[</span>
</footer>
</article>
];
    }

    print O "</div>\n";
    close O;
}

        

sub getTags {
    my $file = $File::Find::name;

    # Only parse files that look like posts
    #
    return unless $file =~ /\.html$/;
    return unless $file =~ 
          /^\Q$octopress_root\E\/public\/\d{4}\/\d{2}\/\d{2}\//;

    # Read the contents of the HTML file
    #
    open (HTML, $_) || die "Can't open $file";
    my $contents = join("", <HTML>);
    close HTML;

    my $tree = HTML::TreeBuilder->new();
    $tree->parse($contents);

    # Get the title
    #
    my $title = $tree->look_down(_tag  => "h1", 
                                 class => "entry-title");
    $title = $title->as_trimmed_text();

    # Get the categories
    #
    my $category_ent = $tree->look_down(_tag  => "span", 
                                        class => "categories");
    # A post with no categories has no such span
    my @as = $category_ent
           ? $category_ent->look_down(_tag  => "a", 
                                      class => "category")
           : ();
    my @categories = ();
    foreach my $a (@as) { 
        push(@categories, 
            { href => $a->attr('href'), 
              text => $a->as_trimmed_text
            });
    }

    # Get the tags
    #
    my $ul = $tree->look_down("_tag", "ul", 
                              "id"  , "tags_ul");
    if ($ul) { 
        my @items = $ul->look_down("_tag" => "li");
        foreach my $item (@items) {
            my $tag = $item->as_trimmed_text();

            # Finally, populate the data structure
            #
            push (@{$tag_data->{$tag}->{pages}}, 
                { title      => $title, 
                  file       => $file, 
                  categories => \@categories 
          } );
        }
    }
    else { 
        # no tags in this document
    }

    $tree->delete();
    
}

# This function creates a default 
# source/tags/index.markdown
#
sub createTagsIndexMarkdown {
    my $tag_dir = shift;
    open (O, ">$tag_dir/index.markdown") || die "Can't write $tag_dir/index.markdown: $!";
    print O qq[---
layout: page
title: Tags
footer: false
---

<div class="tag_page">
 {% include custom/tag_cloud.html %}
</div>
];
    close O;
}