Writing your own DSL with Ruby

I am a big fan of Ruby. There are so many beautiful libraries out there and most of them are based on some kind of domain specific language. Take builder as an example:

  Builder::XmlMarkup.new.person { |b| b.name("Jim"); b.phone("555-1234") }
  #=> <person><name>Jim</name><phone>555-1234</phone></person>

Generating XML in this manner is pretty cool! There is no crazy XML editor in the world that gives you as much flexibility as this straight forward DSL. I think that a good DSL makes coding feel natural.

HOWTO DSL

If you did read Ruby best practicies or stuff like that, this article won’t bring anything new to you. In case you did not, you should start right away!

Anyhow, there are some simple rules behind writing a good DSL or API:

  • let the user choose how to use it
  • make options optional
  • make use of an option hash for defaults
  • make use of scoped blocks
  • make use of instance_eval
  • consider implementing method_missing

I am going to explain these rules for you, so that you can start writing your own cool DSL and help make programming Ruby even more fun.

the Regexp DSL

I think that regular expressions are a great tool and they are ubiquitous in Ruby. Regexps are a first class citizen in Ruby and so you get a lot of built-in power for free. There are also some decent Regexp guides on the web waiting for you:

Regular expressions are a framework of their own, embedded into lots of programming languages. The biggest problem that I have with the Regexp DSL is missing readability. Take a look at the monster that lives in the dark pit of URI::REGEXP::PATTERN:

/
        ([a-zA-Z][-+.a-zA-Z\d]*):                     (?# 1: scheme)
        (?:
           ((?:[-_.!~*'()a-zA-Z\d;?:@&=+$,]|%[a-fA-F\d]{2})(?:[-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*)              (?# 2: opaque)
        |
           (?:(?:
             \/\/(?:
                 (?:(?:((?:[-_.!~*'()a-zA-Z\d;:&=+$,]|%[a-fA-F\d]{2})*)@)?  (?# 3: userinfo)
                   (?:((?:(?:(?:[a-zA-Z\d](?:[-a-zA-Z\d]*[a-zA-Z\d])?)\.)*(?:[a-zA-Z](?:[-a-zA-Z\d]*[a-zA-Z\d])?)\.?|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\[(?:(?:[a-fA-F\d]{1,4}:)*(?:[a-fA-F\d]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(?:(?:[a-fA-F\d]{1,4}:)*[a-fA-F\d]{1,4})?::(?:(?:[a-fA-F\d]{1,4}:)*(?:[a-fA-F\d]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))?)\]))(?::(\d*))?))?(?# 4: host, 5: port)
               |
                 ((?:[-_.!~*'()a-zA-Z\d$,;:@&=+]|%[a-fA-F\d]{2})+)           (?# 6: registry)
               )
             |
             (?!\/\/))                              (?# XXX: '\/\/' is the mark for hostport)
             (\/(?:[-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*(?:;(?:[-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*)*(?:\/(?:[-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*(?:;(?:[-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*)*)*)?              (?# 7: path)
           )(?:\?((?:[-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*))?           (?# 8: query)
        )
        (?:\#((?:[-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*))?            (?# 9: fragment)
      /xn

You don’t need to write such complex expressions to get to the point where you don’t understand the Regexp you wrote just 2 minutes ago. That’s why Ruby provides an extended mode that allows you to insert spaces, newlines and comments in the pattern:

  def test_multiline_regex_with_comments
    assert_match(%r[a # the a
                    n # the n
                    ]x, 'anna')
  end

This is nothing that feels natural to me… So I started working on my own regular expression DSL.

lessons learned from Rebuil

It is always a good idea to write a DSL. There is no better way of becoming a domain expert than implementing a custom language for that specific domain. The other great thing is that you get in depth knowledge of the key Ruby features that help you creating crisp APIs and neat frameworks.

The vision that I have about using Regexps is a very descriptive one. There are probably some people that would call it verbose, but there is always a tradeoff:

  exp = rebuil.many.group('rebuil', :cool_name)
  puts "hello #{exp.match('hello world with rebuil')[:cool_name]} world!" 
  #=> hello rebuil world

This code basically creates the Regexp ’.*(rebuil)’. The cool thing is that you get named groups like in Ruby 1.9 for accessing them with descriptive symbols instead of an index.

let the user choose how to use it

The first advice for creating a DSL is important if you want to make your library sexy for other developers. Since most hackers have different coding styles, they prefer different types of expressions. A good example for this are the different ways one can create Rebuil objects:

  # standard approach
  Rebuil::Expression.new.group('a')

  # helper within object
  rebuil.group('a')

  # helper with a block
  rebuil do |exp|
    exp.group('a')
  end

  # helper with some starting pattern
  rebuil("").group('a')

As all implemented Rebuil methods return the object instance itself, one can chain method calls for convenience.

make options optional

The rebuil method can take a string as an argument for the start of the expression. As you can see in the example above, this argument is optional.

A well designed API does not force the user to provide arguments that are not mandatory.

make use of an option hash for defaults

If you end up with a lot of optional parameters in your method signature, you should consider using an optional parameter hash as the last method argument. Using carefully picked keys for the parameters gives the impression of named parameters wich improves readablity greatly:

  def some_method(man, da, tory, options={:some=>'defaults'})
    ...
  end

make use of scoped blocks

Providing scopes is another way to improve the readability of your code. Ruby makes it easy to do scope based programming with blocks:

  # no scope
  exp = rebuil
  exp.group('a')
  exp.characters('a')
  exp.many

  # better with a block
  rebuil do |exp|
    exp.group('a')
    exp.characters('a')
    exp.many
  end

This can be achieved by just yielding a Rebuil::Expression if a block is given.

make use of instance_eval

Blocks can also be used to reduce the code you will have to write. The scoped block example above can be rewritten with less code:

  # smooth with instance_eval
  rebuil do
    group('a')
    characters('a')
    many
  end

Evaluating the block in the context of the current object is as simple as passing it on to Rubies instance_eval. The only drawback is that you can’t access stuff from your current scope like member variables. But there is a way to make both solutions work, just ask the block for the number of arguments:

  def rebuil(expression="", &block)
    ...
    # block.arity returns the number of expected arguments of the block
    block.arity < 1 ? re.instance_eval(&block) : block.call(re) if block_given?
    ...
  end

consider implementing method_missing

Highly dynamic DSLs like Builder follow a different approach. They implement method_missing. The great advantage is that the user can posibly call anything on your object. The domain logic lies within your implementation of method_missing, which passes the name of the originally called method, the parameters and an optional block. Since regular expressions are deterministic, this dynamic approach does not suite Rebuil well.

With great power comes great responsibility. Always try to find the most appropriate implementation for your specific domain.

And don’t forget the sugar!

Buildr – The build system that doesn’t suck

If you are a Java guy like me, you are probably doing a lot of ugly tasks in your everday work. One of these tasks is build management.

The Java community tries to tackle builds with standard tools like Ant and “all-in-one” solutions like Maven. Both are based on huge XML configuration files that are hard to read, painful to maintain and impossible to test. Even though these tools are mainstream, they are quite buggy. Just have a look at all the rants about maven you find on the net.

Java builds are known to become very complex over time. You have to compile things found here, include stuff from there, resolve dependencies in a remote repository, add something to the version control, […even more crazy stuff…]. Addressing such complex and dynamic tasks with static structures is just a stupid idea. What you really want to do is scripting!

scripting builds

Using scripting languages is a natural way to address build tasks. If you are close to the operating system you can do most things easily. Trust your admin, the shell is your friend!

The simplest way to get scripting power is to use bash scripts, which has the drawback that it’s not interoperable and you will always have to show consideration for WINDOW$ users…

There are portable frameworks that try to integrate the flexibilty of scripting languages like Ruby with the power of Ant or Maven. Have a look at Gradle, GMaven, Sbt, Raven, Gant or Antwrap as examples.
About one year ago I stumbled over another build tool that claims the slogan The build system that doesn’t suck which I found quite promising!

dynamic builds with Buildr

Buildr is an expressive Ruby DSL for managing builds based on Rake and Ant. Defining a Java project that follows the default Apache directory structure is dead simple:

# buildfile.rb
desc 'define a project called simple'
define 'simple' do
  info 'create a simple.jar from this project'
  package :jar
end

After installing the Buildr Gem you can call built-in or custom Buildr tasks on your project. This is done in the same way Rake would do it:

# install the Buildr gem
gem install buildr

# check buildr version
buildr -V
=> Buildr 1.3.5 (JRuby 1.4.0)

# build your Java project: compile, test, package
buildr package

Since Buildr is based on sensible defaults, it knows where to look for your class files, resources and tests. Before creating the jar, it compiles everything, executes the tests and then zips up the bytecode into a distributable.

Buildr does not support every build feature by default, but it’s easy to extend. Buildr at its core is just a bunch of tasks, chained in a predefined order. You can easily write your own tasks and plug them into the build to fit your needs. Buildr provides method hooks that allow you to integrate your tasks into the build cycle.

If this small example makes you say “Awesome!”, you should have a look at the best documentation of an open source framework I have ever seen. There are some sophisticated examples of how to use Buildr hosted on github as well.

Build, don’t goof up XML!

(X)Ruby on the Mac

=================================================================== ===================================================================

Update Aug. 2010

The RVM installer should be prefered over installation via gem:

bash < <( curl http://rvm.beginrescueend.com/releases/rvm-install-head )
=================================================================== ===================================================================

If you are a Mac user and a Ruby developer you probably ran into issues with custom Ruby installations and had troubles with Gems like these:

Using MacPorts

There are a lot of people that recommend using MacPorts for installing a custom Ruby version, but there are a lot of issues with this approach too. If you did manage installing the Ruby version of choice, you might run into issues using TextMate or other tools, that depend on the default OS X Ruby (I think that I had ALL the problems one could have, but this might be an exaggeration).

Custom (X)Ruby Versions

The next problem arises if you want to use different versions of Ruby or even different implementations/runtimes (EVIL666) like JRuby, MacRuby or Rubinius.

I am currently using:

  • Ruby 1.8.6 for integration with Heroku
  • Ruby 1.8.7 for daily usage
  • Ruby 1.9.1 for a fast Ruby experience
  • MacRuby for playing around and avoid looking at verbose Objective-C code
  • JRuby for Buildr

There are some simple approaches to fix this mess. MacPorts and JRuby provide different binaries to run like ruby18, ruby19 or jgem, but this is error prone and confusing. I was always installing Gems to the wrong Ruby environment and there were always conflicts with scripts and tools looking for the actual Ruby binary.

Since this did not work out quite well I started to wire the path tightly in the .profile file to include just the right Ruby and Rubygems binary. This had the drawback that I had to close terminals for every switch. So I started writing bash scripts manipulating the path directly, which worked well, but was unconvenient.

Most of the Ruby versions are not stable. Ruby 1.9 is under development, JRuby has frequent compatibility and performance releases and MacRuby is kind of an an “Alpha”.
As far as I am concerned, I always want to use the latest stable release of these distributions! Using MacPorts you often just get some release, so you have to compile stuff yourself. Doing so is time consuming and painful…

RVM to the rescue

Some weeks ago I stumbled over RVM, which can be installed as a Gem to the standard preinstalled OS X Ruby. RVM provides some neat features and does exactly what I want. It provides a simple interface to manage Ruby distributions and does some clever stuff organizing local Gems.

Right now I am using the preinstalled OS X Snow Leopard Ruby distribution with RVM and no ports:

# check that you are running default Ruby
ruby -v
=> ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]

# update your OS X Rubygems
sudo gem update --system

# check Gem version
gem -v
=> 1.3.5

# install RVM as a Gem to the defaut OS X Ruby
sudo gem install rvm

# run RVM installation
rvm-install
# follow the installation instructions (add RVM to the $PATH etc)

# open new terminal and check RVM version
rvm --version
=> rvm 0.1.0 by Wayne E. Seguin ([email protected]) [http://rvm.beginrescueend.com/]

# have a look at the manual
rvm --help

# install a (X)Ruby version
rvm install jruby

# see what is installed
rvm list

# see some more information
rvm info

# use a (X)Ruby version
rvm use jruby

# install a gem for the current (X)Ruby
gem install buildr

# install a gem for all Rubies under RVM control
rvm gem install -v=0.6.9 savon

# flip back to default Ruby
rvm system

Cut Rubies with ease…

Savon Handsoap Shootout

This documentation is deprecated, please have a look at savonrb.com!

Looking into The Ruby Toolbox there are currently two popular SOAP client libraries available. In this short article I am going to crunch the candidates Savon, which is currently the most “popular” library, and Handsoap which follows short after. Both are open source projects hosted on github.

Scope

This article won’t cover all facets of both libraries. I concentrate on the features that are relevant for integrating a Ruby SOAP client into our particular SOA platform. That platform is commonly based on Java SOAP services based on frameworks such as CXF and Axis providing interfaces to internal business logic.

Since this is not a complete feature list, it should show you at least how to work with the APIs and which client might be the best choice for yourself.

Requirements

Having lots of Java Guys around here, there is no great focus on things like beautiful API design or Ruby magic, the client should just work! Living in a Java environment, the SOAP client has to integrate smoothly with JRuby. Since a lot of Ruby libraries lack support for JRuby, we always have to monkey patch a lot of code to make it run on the JRE.

Examples

We refer to free, public SOAP services, so everyone can run the examples by themselves.

All the examples seen here can be cloned/downloaded from github.

Chapters

Have fun!

Savon vs. Handsoap: Calling a service

This documentation is deprecated, please have a look at savonrb.com!

The two libraries have different approaches on how to get things done. While Handsoap is using an oldschool inheritance style definition:

class HandsoapBankCode < Handsoap::Service
  
  endpoint :uri => "some_wsdl", :version => 2

  def on_create_document(doc)
    doc.alias "tns", "some_namespace"
  end

  def on_response_document(doc)
    doc.add_namespace "ns1", "some_namespace"
  end
  [...]
end

Savon clients are just a kind of wrapper or proxy around a WSDL:

client = Savon::Client.new "some_wsdl"

While inheritance is a base concept of object oriented programming, it’s usually better to use delegation instead. For not being stuck on the API of the Handsoap::Service class, one would wrap things up into some other class or module, creating more code than necessary.

The proxy style client of Savon is less code and provides a flexible API, especially looking at SOAP calls.

Using rspec to demonstrate the expected behavior of the clients results in two identical spec for getting a zip code of a concrete client implementation:

describe "Savon" do
  it "should return the corrent zip code for a given bank" do
    zip_code = Shootout::SavonBankCode.zip_code @bank_code
    zip_code.should eql @zip_code
  end
end

describe "Handsoap" do
  it "should return the corrent zip code for a given bank" do
    zip_code = Shootout::HandsoapBankCode.zip_code @bank_code
    zip_code.should eql @zip_code
  end
end

Compared to the spec, the code of the two implementations differs a great deal. The task at hand is to call the getBank method of the SOAP endpoint providing a blz (bank code) parameter and extracting the plz (zip code) value of the response.

Using the Handsoap client class defined above, sending the “invoke()” message to the Handsoap::Service will do the job:

def zip_code(bank_code)
  response = invoke("tns:getBank") do |message|
    message.add "tns:blz", bank_code
  end
  (response/"//ns1:details/ns1:plz").first.to_s
end

The bank code parameter is assigned in the block, which yields a SOAP message object. The resulting XML document is wrapped and can be accessed using some predefined XML library. Handsoap enables you to choose between different types of XML parsers like REXML, ruby-libxml or nokogiri.

Savon’s proxy client on the other hand is dynamic and can be accessed directly with the name of the SOAP method and a block:

class SavonBankCode
  def self.zip_code(bank_code)
    client = Savon::Client.new Shootout.endpoints[:bank_code][:uri]
    response = client.get_bank { |soap| soap.body = { "wsdl:blz" => bank_code } }
    response.to_hash[:get_bank_response][:details][:plz]
  end
end

The block yields a SOAP request object for setting the payload or tweaking defaults like the SOAP header. Converting the response to a hash is a convenient way to access the desired result. The conversion is done using crack.