Mongo Ruby Driver, Mongoid and MongoMapper

=================================================================

Update Aug. 2010

On Whyday, I created a live demo of the examples, which is running on Heroku.

=================================================================

I am constantly looking around for different storage mechanisms on Heroku that can be used for caching 3rd party data. A recent update of their platform offered a MongoDB add-on for the MongoHQ service that drew my attention, so I started to evaluate this NoSQL document database…

MongoDB on OS X

It’s always a good starting point to have a local installation of a technology; here is how you get it running on your Mac with Homebrew:

brew install mongodb
# create a place for MongoDB to store the data
mkdir -p /data/db
# run server with default config (adapt to the right version)
mongod run --config /usr/local/Cellar/mongodb/1.4.4-x86_64/mongod.conf

Using MongoHQ requires user authentication, so it’s nice to have the same credentials on your local MongoDB instance:

# start the client
mongo
> use test
> db.addUser("test", "test")
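
With those credentials in place, a quick sanity check from Ruby could look like this (a minimal sketch, assuming the default localhost port and the test database):

require 'mongo'

conn = Mongo::Connection.from_uri("mongodb://test:test@localhost:27017/test")
# list the collections to verify that authentication worked
puts conn.db("test").collection_names.inspect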

Evaluating different APIs

A very basic approach that simply wraps the MongoDB API in Ruby code is the Mongo Ruby Driver, but there are also two higher-level APIs close to ActiveRecord: Mongoid and MongoMapper.

Mongo Ruby Driver

It’s pretty easy to connect to your MongoDB with the right connection string:

require 'mongo'

conn = Mongo::Connection.from_uri("mongodb://user:pass@host:port/db")
db = conn.db("db")

The Mongo Ruby Driver is very simple and close to the MongoDB API:

coll = db.collection('test')
# insert a document and print every document in the collection
coll.insert('a' => 1)
coll.find().each { |row| p row }
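
Queries with conditions, updates and removes follow the same pattern (a short sketch; the field names are only for illustration):

# find documents matching a condition
coll.find('a' => 1).each { |row| p row }

# update the first matching document, then remove it again
coll.update({'a' => 1}, {'$set' => {'b' => 2}})
coll.remove('a' => 1)
puts coll.count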

MongoMapper

MongoMapper can also be accessed with a connection string:

Mongo::Connection.from_uri(MONGO_URL)
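
To avoid repeating the connection in every model, MongoMapper can also be configured globally (a small sketch, assuming the same MONGO_URL and database name used in the model below):

MongoMapper.connection = Mongo::Connection.from_uri(MONGO_URL)
MongoMapper.database   = 'basement'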

Instead of inheriting from ActiveRecord::Base, you include the MongoMapper::Document module to handle the object-document mapping. Since the structure of a document in MongoDB is open and not fixed like a SQL schema, you define the structure in code so MongoMapper knows how to map the document to your Ruby objects:

class Person
  include MongoMapper::Document

  key :name, String
  key :age, Integer
  key :born_at, Time
  key :active, Boolean
  key :fav_colors, Array

  connection Mongo::Connection.from_uri(MONGO_URL)
  set_database_name 'basement'
end

person = Person.create({
  :name => 'Nunemaker',
  :age => 27,
  :born_at => Time.mktime(1981, 11, 25, 2, 30),
  :active => true,
  :fav_colors => %w(red green blue)
})

person.save

Person.all.each do |p|
  ...
end
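
Conditions can be passed to the finders, much like in ActiveRecord (a quick sketch using the keys defined above):

Person.all(:age => 27, :order => 'name asc')
Person.first(:name => 'Nunemaker')
Person.count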

Mongoid

Configuring Mongoid is somewhat different but easy:

Mongoid.database = Mongo::Connection.new(host, port).db(db)
Mongoid.database.authenticate(user, pass)

The DSL for defining Mongoid documents is similar to MongoMapper’s and works mostly the same way. Querying the database is also similar to the API provided by ActiveRecord:

class Tweeter 
  include Mongoid::Document 
  field :user 
  embeds_many :tweets 
end 

class Tweet 
  include Mongoid::Document 
  field :status, :type => String 

  embedded_in :tweeter, :inverse_of => :tweets 
end

tweet = Tweet.new(:status => "This is a tweet!") 
tweet.tweeter = Tweeter.new(:user => 'ted') 
tweet.save

Tweeter.all.each do |tweeter| 
  ...
end
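
Criteria-style queries work on these documents as well (a short sketch based on the models above):

# find a tweeter by field value and walk its embedded tweets
ted = Tweeter.where(:user => 'ted').first
ted.tweets.each { |tweet| puts tweet.status }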

You can get the complete code and some more links from the GitHub project created for testing.

MongoDB is a great way to store document-focused data, and it’s simple to use with these great libraries!

Simple DB caching for Heroku

Heroku is a great platform. I like the style of the site, I appreciate the documentation, and you can start for free! One thing I miss a lot is decent caching. The read-only filesystem eats up a lot of flexibility.

I played around with HTTP caching and Heroku’s Varnish works really well. The problem is that my app loads a lot of stuff from different 3rd party services like Twitter, so every new visitor pays the full load time on their first visit. No surprise that New Relic rated the request times ‘Unacceptable’…

I would like to check out Heroku’s ‘Memcached Basic’ add-on, but I did not manage to get into the private beta. So there was no other option than to implement a DB cache.

There is just one requirement: load stuff from a 3rd party service only if the cached data is expired. For simplicity, expired means that the data is older than a predefined interval. In my test environment I like to use a shorter period than in production, so I define the interval in the environment files:

# config/environments/development.rb
CACHE_TIME = 30.seconds

# config/environments/production.rb
CACHE_TIME = 10.hours

A simple key-data pair is enough for my needs, because I always have a unique key for the values I want to cache. I am using Marshal.dump/Marshal.load for serialization, as they play well with anonymous inner classes that YAML can’t deal with. Encoding the data in Base64 helps work around some SQLite issues with serialized data strings:

# app/models/storage.rb
class Storage < ActiveRecord::Base
  
  validates_presence_of :key, :data
  
  # serialize with Marshal and Base64-encode the value before writing it
  def data=(data)
    write_attribute :data, ActiveSupport::Base64.encode64(Marshal.dump(data))
  end
  
  # decode and deserialize the value when reading it back
  def data
    Marshal.load(ActiveSupport::Base64.decode64(read_attribute :data))
  end
  
end
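
The backing table only needs a string key, a text column for the encoded data and timestamps, since updated_at decides whether an entry is expired. A migration for it could look like this (a sketch, with column names matching the model above):

# db/migrate/XXX_create_storages.rb
class CreateStorages < ActiveRecord::Migration
  def self.up
    create_table :storages do |t|
      t.string :key
      t.text   :data
      t.timestamps  # updated_at is used for the expiry check
    end
  end

  def self.down
    drop_table :storages
  end
end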

The actual caching logic is embedded in my application controller. I provide a simple cache method that can be called with a block. The block contains the remote call that I want to cache and is only executed if there is no data stored for the given key or the stored data is expired:

  # app/controllers/application_controller.rb
  def cache(key, &to_cache)
    from_db = Storage.first(:conditions => {:key => key})
    # refresh only if nothing is stored yet or the stored entry is expired
    if from_db.nil? || from_db.updated_at < Time.now - CACHE_TIME
      data = (yield to_cache).collect{|t|t}
      return [] if data.nil? || data.empty?
      from_db = (from_db || Storage.new)
      from_db.key = key
      from_db.data = data
      from_db.save!
    end
    # expose the cached data as an instance variable, e.g. @tweets
    instance_variable_set :"@#{key}", from_db.data
  end

Finally, the data is pushed into an instance variable, so that I have access to it within my views.

Caching is now as simple as this:

  # cache all twitter posts and make them accessible via @tweets
  cache(:tweets){Helper::twitter_posts}

This little tweak noticeably improved the response time of my app:

  This week:
  Apdex Score: 0.70 [0.5] (Fair)

  Last week:
  Apdex Score: 0.06 [0.5] (Unacceptable)

Sugar on rails!