Mongo Ruby Driver, Mongoid and MongoMapper

================================================================= =================================================================

Update Aug. 2010

On Whyday, I created a live demo of the examples that is running on Heroku.

=============================================================== ===============================================================

I am constantly looking around for different storage mechanisms on Heroku that can be used for caching 3rd party data. A recent update of their platform offered a MongoDB add-on to access the MongoHQ service that drew my attention, so I started to evaluate this NoSQL document database…

MongoDB on OS X

It’s always a good starting point to have a local installation of a technology. Here is how you get it running on your Mac with Homebrew:

brew install mongodb
# create a place for MongoDB to store the data
mkdir -p /data/db
# run server with default config (adapt to the right version)
mongod run --config /usr/local/Cellar/mongodb/1.4.4-x86_64/mongod.conf

Using MongoHQ requires user authentication, so it’s nice to have the same credentials on your local MongoDB instance:

# start the client
mongo
> use test
> db.addUser("test", "test")

evaluating different APIs

A very basic approach that essentially wraps the MongoDB API in Ruby code is the Mongo Ruby Driver, but there are also two higher-level APIs close to ActiveRecord, called Mongoid and MongoMapper.

Mongo Ruby Driver

It’s pretty easy to connect to your MongoDB with the right connection string:

conn = Mongo::Connection.from_uri("mongodb://user:pass@host:port/db")
db = conn.db("db")

The Mongo Ruby Driver is very simple and close to the MongoDB API:

coll = db.collection('test')
coll.insert('a' => 1)
coll.find().each { |row| p row }

MongoMapper

MongoMapper can also be accessed with a connection string:

Mongo::Connection.from_uri(MONGO_URL)

Instead of using ActiveRecord::Base, MongoMapper provides the MongoMapper::Document module to handle the object-document mapping. Since the structure of a document in MongoDB is open and not static like in a SQL database, you have to define the structure in code, so that MongoMapper knows how to map the document to your Ruby objects:

class Person
  include MongoMapper::Document

  key :name, String
  key :age, Integer
  key :born_at, Time
  key :active, Boolean
  key :fav_colors, Array

  connection Mongo::Connection.from_uri(MONGO_URL)
  set_database_name 'basement'
end

person = Person.create({
  :name => 'Nunemaker',
  :age => 27,
  :born_at => Time.mktime(1981, 11, 25, 2, 30),
  :active => true,
  :fav_colors => %w(red green blue)
})

# create already saves the document, so this explicit save is optional
person.save

Person.all.each do |p|
  ...
end

Mongoid

Configuring Mongoid is somewhat different but easy:

Mongoid.database = Mongo::Connection.new(host, port).db(db)
Mongoid.database.authenticate(user, pass)
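
Since a hosted service usually hands you one connection URL, the separate host, port, db, user and pass values above have to come from somewhere. The following is only a sketch of how they could be extracted with Ruby's standard URI library. The helper name mongo_settings and the example URL are made up for illustration and are not part of Mongoid:

```ruby
require 'uri'

# Split a MongoDB connection string into the pieces used above.
# mongo_settings is a hypothetical helper, not part of any driver.
def mongo_settings(url)
  uri = URI.parse(url)
  {
    :host => uri.host,
    :port => uri.port || 27017,          # fall back to MongoDB's default port
    :db   => uri.path.sub(%r{\A/}, ''),  # strip the leading slash
    :user => uri.user,
    :pass => uri.password
  }
end

# Example URL with the local test credentials created earlier.
settings = mongo_settings('mongodb://test:test@localhost:27017/test')
settings[:host] # => "localhost"
settings[:db]   # => "test"
```

The resulting hash can then feed the two configuration lines above, so the same code works locally and on the hosted database.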

The DSL for defining Mongoid Documents is similar to MongoMapper and works mostly the same way. Querying the database is also similar to the API provided by ActiveRecord:

class Tweeter 
  include Mongoid::Document 
  field :user 
  embeds_many :tweets 
end 

class Tweet 
  include Mongoid::Document 
  field :status, :type => String 

  embedded_in :tweeter, :inverse_of => :tweets 
end

tweet = Tweet.new(:status => "This is a tweet!") 
tweet.tweeter = Tweeter.new(:user => 'ted') 
tweet.save

Tweeter.all.each do |tweeter| 
  ...
end

You can get the complete code and some more links from the GitHub project created for testing.

MongoDB is a great way to store document focused data and it’s simple to use with these great libraries!

ASIN vs ruby-aaws

I recently wrote about using ruby-aaws on Heroku. I used it for creating a virtual bookshelf on my website, so anybody interested in what I read can have a look at the ISBN, price, description and some reviews (in German). Since this is a trivial scenario, it covers only a fragment of the features that ruby-aaws offers.

I always felt that using ruby-aaws was way too complicated! This is how you call Amazon for the title of a book:

require "amazon"
require "amazon/aws"
require "amazon/aws/search"
il = Amazon::AWS::ItemLookup.new('ASIN', { 'ItemId'=>asin })
rg = Amazon::AWS::ResponseGroup.new('Medium')
req = Amazon::AWS::Search::Request.new
resp = req.search(il, rg)
puts resp.item_lookup_response.items[0].item.item_attributes.title.to_s

I also had to monkeypatch some stuff to get it working with Heroku the first time:

  • allow .amazonrc to be in a different location that can be used on Heroku
  • remove restriction to Ruby 1.8.7 and patch related stuff

If you look into the source and documentation of ruby-aaws you will see that it is no fun to patch anything in there… I think I would not have done it without the help of Ian Macdonald.

Another thing was that I could not use the built-in caching facility of ruby-aaws, because it simply does not work on Heroku’s read-only file system.

simplicity with ASIN

Given these restrictions, I decided to build a gem with a minimal feature set, tailored to my requirements:

  • provide access to the Amazon-E-Commerce-API via REST
  • simple configuration points
  • minimum amount of code to write for a request
  • maximum flexibility

If you have a look into the Amazon documentation, you see that it is quite easy to call the API via REST. Just append some query parameters to your desired endpoint (e.g. webservices.amazon.com) and as a result you get the desired information from Amazon. The tricky thing is that, as of recently, you have to sign your request with your AWS credentials. I did not find any specs on how to do that in the documentation, but Cloud Carpenters had a nice example using Python that I adapted for Ruby.
There is also the nice Amazon API signing service that frees you from signing your requests yourself. The reason I did not use it is that it supports the amazon.com endpoint only (I need amazon.de).
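
To give an idea of what signing involves, here is a minimal sketch of the usual scheme for this API: sort the parameters, percent-encode them, build a string to sign from HTTP verb, host, path and query, and compute an HMAC-SHA256 over it with your secret. This is not the code of the ASIN gem. The helper names rfc3986_escape and signed_query are made up, and the /onca/xml path is an assumption you should check against the Amazon documentation:

```ruby
require 'openssl'

# Percent-encode everything except RFC 3986 unreserved characters.
def rfc3986_escape(value)
  value.to_s.gsub(/[^A-Za-z0-9\-_.~]/) do |char|
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

# Build a signed query string (hypothetical helper, path is an assumption).
def signed_query(params, secret, host, path = '/onca/xml')
  # Parameters have to be sorted by name before signing.
  pairs = params.map { |key, value| [key.to_s, value.to_s] }.sort
  canonical = pairs.map { |key, value| "#{rfc3986_escape(key)}=#{rfc3986_escape(value)}" }.join('&')

  string_to_sign = ['GET', host, path, canonical].join("\n")
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha256'), secret, string_to_sign)
  signature = [digest].pack('m0') # Base64 without line breaks

  "#{canonical}&Signature=#{rfc3986_escape(signature)}"
end

query = signed_query(
  { 'Operation' => 'ItemLookup', 'ItemId' => '1430218150',
    'AWSAccessKeyId' => 'your-key', 'Timestamp' => '2010-08-19T12:00:00Z' },
  'your-secret', 'webservices.amazon.de'
)
```

The host is part of the signed string, which is why a signature created for one endpoint is not valid for another.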

requests with ASIN

Using ASIN is simple. You just have to provide your credentials to the configuration method; the rest is covered by sensible defaults that you can override if you wish:

require 'asin'
include ASIN

# use the configure method to setup your api credentials
configure :secret => 'your-secret', :key => 'your-key'

# you can override the api endpoint if you wish
configure :secret => 'your-secret', :key => 'your-key', :host => 'webservices.amazon.de'

After this setup you can call the REST api via the lookup method:

# lookup an item with the amazon standard identification number (asin)
item = lookup '1430218150'

# have a look at the title of the item
item.title
=> Learn Objective-C on the Mac (Learn Series)

# provide additional configuration options like the response group
lookup(asin, :ResponseGroup => :Medium)

Title is currently the only attribute that is directly supported by the Item class, but this is no restriction. ASIN uses Hashie::Mash for the internal data representation of the Amazon REST XML response. The Item class stores the response in a raw attribute that you can read:

# access the internal data representation (Hashie::Mash)
item.raw.ItemAttributes.ListPrice.FormattedPrice
=> $39.99

You can tailor the Item class to your needs by opening up the class and providing the methods you like, or by doing something entirely different with the raw attribute.
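
Opening up the class could look roughly like the sketch below. So that it runs without the gem, it defines a stand-in Item class and fake response data built from plain Structs. Both are assumptions for illustration only; with the real gem you would reopen its Item class and work on the Hashie::Mash it provides:

```ruby
# Stand-in for the gem's Item class so the sketch runs on its own; only the
# raw reader described above is assumed.
class Item
  attr_reader :raw

  def initialize(raw)
    @raw = raw
  end
end

# Reopen the class and add a convenience method on top of raw.
class Item
  def price
    raw.ItemAttributes.ListPrice.FormattedPrice
  end
end

# Fake response data mimicking the method-style access of a Hashie::Mash.
FakeListPrice  = Struct.new(:FormattedPrice)
FakeAttributes = Struct.new(:Title, :ListPrice)
FakeRaw        = Struct.new(:ItemAttributes)

item = Item.new(
  FakeRaw.new(
    FakeAttributes.new('Learn Objective-C on the Mac (Learn Series)',
                       FakeListPrice.new('$39.99'))
  )
)
item.price # => "$39.99"
```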

OR, just fork me on GitHub!

Maximum flexibility with some syntactic sugar!

Distinguish Ruby Runtimes with WhichRuby

Nowadays there are several decent Ruby runtimes available besides MRI, ranging from alpha versions to production-ready status. With RVM, these different interpreters become more and more interchangeable.

current problems

Since switching between runtimes became as easy as rvm use x, more care has to be taken to support a wide range of interpreters and versions. This is especially true for shared code like gems.

Some engines like JRuby have limitations that prevent the usage of some Ruby features. In most cases it’s possible to work around these limitations and provide a different solution that works, but might be somewhat less performant.

checking runtimes

Ruby is great at introspection, but 1.8 in particular lacks some key information, such as the RUBY_ENGINE constant, for determining the current interpreter at runtime. There, one has to extract it from the RUBY_DESCRIPTION constant.
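
Done by hand, such a check might look like the following sketch. It assumes that the description string starts with the interpreter name, as in "jruby 1.4.0 (ruby 1.8.7 patchlevel 174) ...". The helper names are invented for this example and this is not how WhichRuby is implemented:

```ruby
# Extract the engine name from a RUBY_DESCRIPTION-style string, assuming
# the string starts with the interpreter name.
def engine_from_description(description)
  description[/\A[A-Za-z]+/].to_s.downcase
end

# Prefer RUBY_ENGINE where it exists and fall back to parsing otherwise.
def current_engine
  return RUBY_ENGINE if defined?(RUBY_ENGINE)
  engine_from_description(RUBY_DESCRIPTION)
end

engine_from_description('jruby 1.4.0 (ruby 1.8.7 patchlevel 174)') # => "jruby"
```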

WhichRuby aims at simplifying this tedious task and providing a simple API:

# irb@jruby
jruby-1.4.0 > require 'which_ruby'
 => true 
jruby-1.4.0 > include WhichRuby
 => Object 
jruby-1.4.0 > jruby?
 => true

Executing different code fragments becomes as easy as defining a scope:

ruby_scope(:jruby) do
  # custom jruby code here
end

This comes in very handy for stuff like accessing Java code natively via JRuby instead of using RJB.

I don’t like Rubbae - I love it!

Writing your own DSL with Ruby

I am a big fan of Ruby. There are so many beautiful libraries out there and most of them are based on some kind of domain specific language. Take builder as an example:

  Builder::XmlMarkup.new.person { |b| b.name("Jim"); b.phone("555-1234") }
  #=> <person><name>Jim</name><phone>555-1234</phone></person>

Generating XML in this manner is pretty cool! There is no crazy XML editor in the world that gives you as much flexibility as this straightforward DSL. I think that a good DSL makes coding feel natural.

HOWTO DSL

If you have read Ruby Best Practices or stuff like that, this article won’t bring anything new to you. In case you have not, you should start right away!

Anyhow, there are some simple rules behind writing a good DSL or API:

  • let the user choose how to use it
  • make options optional
  • make use of an option hash for defaults
  • make use of scoped blocks
  • make use of instance_eval
  • consider implementing method_missing

I am going to explain these rules for you, so that you can start writing your own cool DSL and help make programming Ruby even more fun.

the Regexp DSL

I think that regular expressions are a great tool and they are ubiquitous in Ruby. Regexps are first-class citizens in Ruby, so you get a lot of built-in power for free. There are also some decent Regexp guides on the web waiting for you.

Regular expressions are a framework of their own, embedded into lots of programming languages. The biggest problem that I have with the Regexp DSL is its lack of readability. Take a look at the monster that lives in the dark pit of URI::REGEXP::PATTERN:

/
        ([a-zA-Z][-+.a-zA-Z\d]*):                     (?# 1: scheme)
        (?:
           ((?:[-_.!~*'()a-zA-Z\d;?:@&=+$,]|%[a-fA-F\d]{2})(?:[-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*)              (?# 2: opaque)
        |
           (?:(?:
             \/\/(?:
                 (?:(?:((?:[-_.!~*'()a-zA-Z\d;:&=+$,]|%[a-fA-F\d]{2})*)@)?  (?# 3: userinfo)
                   (?:((?:(?:(?:[a-zA-Z\d](?:[-a-zA-Z\d]*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:[-a-zA-Z\d]*[a-zA-Z\d])?)\.?|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\[(?:(?:[a-fA-F\d]{1,4}:)*(?:[a-fA-F\d]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(?:(?:[a-fA-F\d]{1,4}:)*[a-fA-F\d]{1,4})?::(?:(?:[a-fA-F\d]{1,4}:)*(?:[a-fA-F\d]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))?)\]))(?::(\d*))?))?(?# 4: host, 5: port)
               |
                 ((?:[-_.!~*'()a-zA-Z\d$,;:@&=+]|%[a-fA-F\d]{2})+)           (?# 6: registry)
               )
             |
             (?!\/\/))                              (?# XXX: '\/\/' is the mark for hostport)
             (\/(?:[-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*(?:;(?:[-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*)*(?:\/(?:[-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*(?:;(?:[-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*)*)*)?              (?# 7: path)
           )(?:\?((?:[-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*))?           (?# 8: query)
        )
        (?:\#((?:[-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*))?            (?# 9: fragment)
      /xn

You don’t need to write such complex expressions to get to the point where you don’t understand the Regexp you wrote just 2 minutes ago. That’s why Ruby provides an extended mode that allows you to insert spaces, newlines and comments in the pattern:

  def test_multiline_regex_with_comments
    assert_match(%r[a # the a
                    n # the n
                    ]x, 'anna')
  end

This is nothing that feels natural to me… So I started working on my own regular expression DSL.

lessons learned from Rebuil

It is always a good idea to write a DSL. There is no better way of becoming a domain expert than implementing a custom language for that specific domain. The other great thing is that you get in-depth knowledge of the key Ruby features that help you create crisp APIs and neat frameworks.

The vision that I have about using Regexps is a very descriptive one. There are probably some people that would call it verbose, but there is always a tradeoff:

  exp = rebuil.many.group('rebuil', :cool_name)
  puts "hello #{exp.match('hello world with rebuil')[:cool_name]} world!"
  #=> hello rebuil world!

This code basically creates the Regexp ’.*(rebuil)’. The cool thing is that you get named groups, as in Ruby 1.9, so you can access matches with descriptive symbols instead of an index.

let the user choose how to use it

The first piece of advice for creating a DSL is important if you want to make your library sexy for other developers. Since most hackers have different coding styles, they prefer different types of expressions. A good example of this is the different ways one can create Rebuil objects:

  # standard approach
  Rebuil::Expression.new.group('a')

  # helper within object
  rebuil.group('a')

  # helper with a block
  rebuil do |exp|
    exp.group('a')
  end

  # helper with some starting pattern
  rebuil("").group('a')

As all implemented Rebuil methods return the object instance itself, one can chain method calls for convenience.
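
Chaining only requires that every building method ends by returning self. The tiny class below illustrates the idea. It is a made-up stand-in, not the Rebuil implementation, and it leaves out named groups:

```ruby
# Minimal chainable expression builder (hypothetical, for illustration only).
class TinyExpression
  def initialize(source = '')
    @source = source.dup
  end

  # Match any number of arbitrary characters.
  def many
    @source << '.*'
    self # returning self is what makes chaining possible
  end

  # Add a capturing group around the literal text.
  def group(text)
    @source << "(#{Regexp.escape(text)})"
    self
  end

  def to_regexp
    Regexp.new(@source)
  end
end

exp = TinyExpression.new.many.group('rebuil')
exp.to_regexp.match('hello world with rebuil')[1] # => "rebuil"
```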

make options optional

The rebuil method can take a string as an argument for the start of the expression. As you can see in the example above, this argument is optional.

A well designed API does not force the user to provide arguments that are not mandatory.

make use of an option hash for defaults

If you end up with a lot of optional parameters in your method signature, you should consider using an optional parameter hash as the last method argument. Using carefully picked keys for the parameters gives the impression of named parameters, which improves readability greatly:

  def some_method(man, da, tory, options={:some=>'defaults'})
    ...
  end
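
One detail is worth knowing: a default hash written directly in the signature is replaced as a whole as soon as the caller passes any hash. A common way around this is to merge the caller's options into the defaults inside the method. The method and option names in this sketch are invented for illustration:

```ruby
# Defaults that apply unless the caller overrides individual keys.
GREETING_DEFAULTS = { :greeting => 'Hello', :punctuation => '!' }.freeze

def greet(name, options = {})
  # Caller-supplied keys win, all other keys keep their default.
  options = GREETING_DEFAULTS.merge(options)
  "#{options[:greeting]}, #{name}#{options[:punctuation]}"
end

greet('Ruby')                      # => "Hello, Ruby!"
greet('Ruby', :punctuation => '?') # => "Hello, Ruby?"
```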

make use of scoped blocks

Providing scopes is another way to improve the readability of your code. Ruby makes it easy to do scope based programming with blocks:

  # no scope
  exp = rebuil
  exp.group('a')
  exp.characters('a')
  exp.many

  # better with a block
  rebuil do |exp|
    exp.group('a')
    exp.characters('a')
    exp.many
  end

This can be achieved by just yielding a Rebuil::Expression if a block is given.

make use of instance_eval

Blocks can also be used to reduce the code you will have to write. The scoped block example above can be rewritten with less code:

  # smooth with instance_eval
  rebuil do
    group('a')
    characters('a')
    many
  end

Evaluating the block in the context of the current object is as simple as passing it on to Ruby’s instance_eval. The only drawback is that inside the block you can’t access stuff from your surrounding object, like its instance variables. But there is a way to make both solutions work: just ask the block for the number of arguments:

  def rebuil(expression="", &block)
    ...
    # block.arity returns the number of expected arguments of the block
    block.arity < 1 ? re.instance_eval(&block) : block.call(re) if block_given?
    ...
  end
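
Here is a complete, runnable version of that idea. The Sketch class and the sketch helper are simplified stand-ins invented for this example and do not reflect the real Rebuil internals:

```ruby
# Simplified stand-in for an expression object.
class Sketch
  attr_reader :parts

  def initialize
    @parts = []
  end

  def group(text)
    @parts << "(#{text})"
    self
  end

  def many
    @parts << '.*'
    self
  end
end

# Supports both block styles by checking how many arguments the block expects.
def sketch(&block)
  expression = Sketch.new
  if block
    # No block argument: evaluate in the context of the expression.
    # One block argument: hand the expression over explicitly.
    block.arity < 1 ? expression.instance_eval(&block) : block.call(expression)
  end
  expression
end

sketch { many; group('a') }.parts  # => [".*", "(a)"]
sketch { |e| e.group('a') }.parts  # => ["(a)"]
```

Callers who need access to their own instance variables inside the block can simply use the variant with a block argument.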

consider implementing method_missing

Highly dynamic DSLs like Builder follow a different approach. They implement method_missing. The great advantage is that the user can call just about anything on your object. The domain logic lies within your implementation of method_missing, which is passed the name of the originally called method, the parameters and an optional block. Since the vocabulary of regular expressions is fixed, this dynamic approach does not suit Rebuil well.
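
To show the mechanics, here is a very small XML builder based on method_missing. It is only a sketch in the spirit of Builder, not its implementation, and it ignores attributes and escaping. The class name TinyXml is made up:

```ruby
# Turns every unknown method call into an XML element (illustration only).
class TinyXml
  def initialize
    @out = ''.dup
  end

  def method_missing(name, content = nil, &block)
    # Leave implicit conversions such as to_ary or to_str alone.
    return super if name.to_s.start_with?('to_')

    @out << "<#{name}>"
    if block
      block.call(self) # nested elements are added by the block
    else
      @out << content.to_s
    end
    @out << "</#{name}>"
    self
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?('to_') || super
  end

  def to_s
    @out
  end
end

xml = TinyXml.new.person { |b| b.name('Jim'); b.phone('555-1234') }
xml.to_s # => "<person><name>Jim</name><phone>555-1234</phone></person>"
```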

With great power comes great responsibility. Always try to find the most appropriate implementation for your specific domain.

And don’t forget the sugar!