Tuesday, July 28, 2009

Tokyo Cabinet and Me

There's been lots of talk about Tokyo Cabinet (TC) lately, so I had to get my hands on this shiny new toy. The appeal is that key-value stores like TC offer better performance than traditional RDBMSs and are also dramatically easier to set up. They're not right for every project, but I had a project come up that seemed like a good fit: a background task that interacted with the Twitter API and had to keep track of a large number (millions) of key-value pairs. Fast existence tests for keys and basic get/set/iterate methods were all that was needed. Essentially, what I wanted was a persistent Ruby Hash.

I looked at Rufus Tokyo and the Ruby bindings written by Mikio Hirabayashi. I ended up using Hirabayashi's Ruby bindings because they were a little closer to the metal, and the less code that can break, the better. While there is no gem, installing from source was easy on both CentOS and Leopard. If you're on Leopard, don't use the MacPorts versions; they're broken. I'm not using Tokyo Tyrant: only one process needs to access this data, so I'm using Tokyo Cabinet directly.

The only things the Ruby bindings didn't give me were a simple open method and a way to serialize values so I could put any Ruby data structure into the cabinet.
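To see why serialization is needed: Tokyo Cabinet stores values as strings, so an arbitrary Ruby structure has to be turned into a string on the way in and parsed back on the way out. Here is a minimal sketch of that round trip using Ruby's standard YAML library; the record below is a made-up example, not data from the project.

```ruby
require 'yaml'

# Hypothetical value of the kind the project might store: a nested
# structure that a string-only store can't hold directly.
record = { 'screen_name' => 'example_user', 'follower_ids' => [1, 2, 3], 'active' => true }

# YAML.dump turns the structure into a plain String,
# which is what a string-valued key-value store can hold.
serialized = YAML.dump(record)
puts serialized.class   # prints "String"

# YAML.load parses the String back into an equal Ruby object.
restored = YAML.load(serialized)
puts restored == record # prints "true"
```

Sticking to strings, numbers, booleans, arrays and hashes keeps the round trip safe on newer Rubies, where YAML.load refuses to rebuild arbitrary classes by default.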

Hirabayashi's Ruby bindings already provide Hash-like access, including hsh['foo'] = "bar" and hsh.each, so it didn't make sense to create my own interface class when all I really needed was to extend a tiny subset of TokyoCabinet::HDB's existing methods. What I came up with is this:

require 'tokyocabinet'
require 'yaml'
include TokyoCabinet

# Duck punch of the TokyoCabinet hash database that uses YAML to serialize Ruby
# objects. Serialization is necessary because TC stores values as strings (or
# integers), so it won't handle other data types. Also provides a consistent,
# simpler open method.
class TokyoCabinet::HDB

  # Open the db file for writing and return the handle (self), or false on
  # failure. There is one db file per data structure, e.g. a new hash means a
  # new database and database file, so open a new HDB for it. Creates the db
  # file if it doesn't already exist.
  alias original_open open
  def open(path_to_db)
    unless original_open(path_to_db, HDB::OWRITER | HDB::OCREAT)
      STDERR.printf("open error: %s (file: %s)\n", errmsg(ecode), path_to_db)
      return false
    end
    self
  end

  # Read a value and deserialize it from YAML; missing keys return nil.
  alias original_get_brackets []
  def [](key)
    result = original_get_brackets(key)
    result ? YAML.load(result) : nil
  end

  # Serialize any Ruby object to a YAML string before storing it.
  alias original_set_brackets []=
  def []=(key, value)
    original_set_brackets(key, YAML.dump(value))
  end

  # Iterate over all records, yielding each key with its deserialized value.
  alias original_each each
  def each
    original_each { |k, v| yield(k, YAML.load(v)) }
  end
end
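The alias-and-override pattern in the duck punch can be tried without Tokyo Cabinet installed. The sketch below uses a hypothetical StringStore class as a stand-in for TokyoCabinet::HDB: like TC, it only accepts String values. Reopening it and wrapping [], []= and each with YAML works the same way as the HDB patch. StringStore is purely illustrative and not part of any library.

```ruby
require 'yaml'

# Hypothetical stand-in for TokyoCabinet::HDB: an in-memory store that,
# like TC, only accepts String values.
class StringStore
  def initialize
    @data = {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    raise TypeError, 'values must be Strings' unless value.is_a?(String)
    @data[key] = value
  end

  def each
    @data.each { |k, v| yield(k, v) }
  end
end

# Reopen the class and wrap the original methods; alias keeps the
# originals reachable under new names so the wrappers can call them.
class StringStore
  alias original_get_brackets []
  def [](key)
    result = original_get_brackets(key)
    result ? YAML.load(result) : nil
  end

  alias original_set_brackets []=
  def []=(key, value)
    original_set_brackets(key, YAML.dump(value))
  end

  alias original_each each
  def each
    original_each { |k, v| yield(k, YAML.load(v)) }
  end
end

store = StringStore.new
store['user:1'] = { 'followers' => [10, 20], 'checked' => false }
p store['user:1']    # prints the restored Hash
p store['missing']   # prints nil
store.each { |k, v| puts "#{k}: #{v.class}" } # prints "user:1: Hash"
```

With the real patched class, usage would follow the same shape: create a handle with TokyoCabinet::HDB.new, call open with a file path of your choosing, use [], []= and each as above, and call close when finished.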


Dear MySQL, I think we should see other people.