Generating PDF from HTML using DocRaptor on Heroku

There comes a time when one has to create PDFs for a Rails application. Searching the web will most likely bring you to libraries like PDFKit and Wicked PDF, which use wkhtmltopdf as a driver.

If your app is hosted on Heroku, you will wonder whether wkhtmltopdf is available so that you can use one of these awesome libraries. Searching the Heroku docs, you will probably come to the same conclusion: there is nothing on it!

As Heroku support states:
You’re correct that there’s no official documentation. Most of our customers seem to be using wkhtmltopdf, generally with pdfkit. I do hope to document this usage soon.

http://github.com/jdpace/PDFKit
http://code.google.com/p/wkhtmltopdf/

If you want us to offer a Prince add-on, I encourage you to write them inviting them to check out our Add-on Provider Programme: http://addons.heroku.com/provider

Of course, you can always purchase your own license and run it on EC2 or another server of your choosing.

PrinceXML

One of the libraries used at my last job was PrinceXML, which did a great job generating PDFs from HTML pages. It supports most of HTML and CSS and passes the Acid2 test. PrinceXML also has some additional CSS properties that let you configure PDF-specific layout settings.

DocRaptor

Since PrinceXML is a commercial product, Heroku won’t support it, and I did not find anything on the web that would offer PDF generation as a service. Asking on the PrinceXML forum, I found out about DocRaptor. These guys provide a service to convert HTML to XLS or PDF over a web service interface, exactly what I was looking for. As an additional bonus, they just implemented a gem that supports the Heroku Add-on interface. Mail Expected Behavior support if you want to participate in the private beta.

Improvements

DocRaptor offers a great service, but it is still in early development. Some issues were resolved only recently. If you are a user of PrinceXML, you probably know the --baseurl option that allows the use of relative paths for images and stylesheets. DocRaptor has just added support for such command line options. The feature for generating a PDF from a given URL is even better!

Using DocRaptor from Rails

The latest DocRaptor documentation for the Heroku Add-on is decent and it provides some nice examples.

PDF from raw HTML

Here is what I did to get it running on my Rails 3 project:

# Gemfile
gem "doc_raptor", "0.1.1"

# mime_types.rb
Mime::Type.register_alias "application/pdf", :pdf

# your_controller.rb
def your_pdf_action
  respond_to do |format|
    format.pdf do
      data = DocRaptor.create(:name => 'DocRaptor.pdf', :document_content => render_to_string, :document_type => "pdf", :prince_options => {:baseurl => 'http://nofail.de'})
      send_data data, :type => 'application/pdf', :filename => 'DocRaptor.pdf'
    end
  end
end

If you registered the pdf mime-type, you will have to provide an additional layout for it. I added some PrinceXML-specific parameters to the styles to make it a fullscreen PDF. One thing that is essential for making the stylesheets work is the :media => 'screen, print' setting:

// application.pdf.haml
= stylesheet_link_tag 'style', :media => 'screen, print'
// you can provide a base tag for images and stylesheets
// %base{:href=>'http://blog.nofail.de'}

%style
  @page { size: A4 }
  @page { margin: 0px }
  @page { border: none }
  @page { padding: 0px }
  @page { prince-shrink-to-fit: auto }

PDFs from a URL

The simplest solution for generating a PDF is to send a URL to the service, so you can re-use all your view logic:

data = DocRaptor.create(:name => "DocRaptor.pdf", :document_url => "http://blog.nofail.de", :document_type => "pdf")
send_data data, :type => 'application/pdf', :filename => "DocRaptor.pdf"
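
The two variants pass nearly the same options and differ only in whether the HTML is sent along or fetched by the service. The following is just a sketch to make that explicit: docraptor_options is a hypothetical helper of my own, not part of the doc_raptor gem, and it only builds the hash that the calls above hand to DocRaptor.create.

```ruby
# Hypothetical helper (NOT part of the doc_raptor gem): builds the options
# hash that the two DocRaptor.create calls above pass to the service.
def docraptor_options(name, source = {})
  if source.key?(:html) == source.key?(:url)
    raise ArgumentError, 'pass exactly one of :html or :url'
  end

  options = { :name => name, :document_type => 'pdf' }
  if source[:url]
    # the service fetches the page itself
    options[:document_url] = source[:url]
  else
    # the rendered markup is sent along with the request
    options[:document_content] = source[:html]
    # optional base URL for relative images and stylesheets
    options[:prince_options] = { :baseurl => source[:base_url] } if source[:base_url]
  end
  options
end

p docraptor_options('DocRaptor.pdf', :url => 'http://blog.nofail.de')
p docraptor_options('DocRaptor.pdf', :html => '<h1>Hello</h1>', :base_url => 'http://nofail.de')
```

In a controller, the result would be passed straight to DocRaptor.create, with render_to_string as the :html value.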

One caveat though: you need at least two dynos to serve the additional request from DocRaptor!

See a working example on my homepage.

Generating PDF from HTML without the hassle, thanks to DocRaptor!

Rails, getting started without the hassle

I just changed jobs and am now a Rails developer at tolingo.com, an online translation broker. When I started working at my new desk, I had to set up my iMac development environment. There are tons of articles on how to compile, install and run stuff like MySQL to get you started on OS X, but I think all one really needs is Homebrew and RVM.

Homebrew

Homebrew is a Ruby-based packaging tool for the Mac, and once you start using it, you immediately hate yourself for having wasted time on MacPorts.

“Homebrew is the easiest and most flexible way to install the UNIX tools Apple didn’t include with OS X.”

This quote is from the official website and I guess they are absolutely right!

Formula

Homebrew is built around formulas. A formula describes how a package should be loaded from the web and installed on your system. Homebrew also takes care of package dependencies, paths and all the other ugly stuff:

require 'formula'

class Wget < Formula
  homepage 'http://www.gnu.org/wget/'
  url 'http://ftp.gnu.org/wget-1.12.tar.gz'
  md5 '308a5476fc096a8a525d07279a6f6aa3'

  def install
    system "./configure --prefix=#{prefix}"
    system 'make install'
  end
end

You can easily install packages from the shell with brew:

brew install wget

Homebrew puts all the packages into ‘/usr/local’, so that it won’t interfere with other components of your system. To get your packages working, you need to include it in your $PATH. If you have any problems running something, Homebrew comes with the doctor command, which scans for problems in your setup!
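
If you are not sure whether your $PATH is set up, a few lines of Ruby can check it. This is only a sketch: on_path? is a made-up helper name, and I am assuming that the binaries end up in ‘/usr/local/bin’.

```ruby
# Returns true when dir is one of the entries of a PATH-style string.
def on_path?(dir, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).include?(dir)
end

homebrew_bin = '/usr/local/bin' # assumed location of Homebrew's binaries
if on_path?(homebrew_bin)
  puts "#{homebrew_bin} is on your PATH"
else
  puts "add this to your shell profile: export PATH=#{homebrew_bin}:$PATH"
end
```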

Installation

Just download Homebrew to your system and update once in a while:

# install homebrew via curl
sudo mkdir -p /usr/local && sudo chown -R $USER /usr/local && curl -Lsf http://bit.ly/9H4NXH | tar xvz -C/usr/local --strip 1

# update homebrew
brew update

Git, MySQL, Sphinx and more

What else do you need? Just search for it or get more information with info!

These are the packages that I needed for development:

# install mysql and set it up
brew install mysql
mysql_install_db
# add mysqld as launch agent
cp /usr/local/Cellar/mysql/#{MYSQL_VERSION}/com.mysql.mysqld.plist ~/Library/LaunchAgents
launchctl load -w ~/Library/LaunchAgents/com.mysql.mysqld.plist

# install git
brew install git git-flow

# add git bash completion (find path to your git with 'brew info git')
ln -s /usr/local/Cellar/git/#{GIT_VERSION}/etc/bash_completion.d/git-completion.bash ~/.git-completion.bash
source ~/.git-completion.bash

# install sphinx search daemon
brew install sphinx

# aspell with all spellings
brew install aspell --all

# libxml and imagemagick for sprites
brew install libxml2 imagemagick
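
The #{MYSQL_VERSION} and #{GIT_VERSION} placeholders above have to be filled in by hand. As a rough sketch, the version can also be looked up from the directory names below the Cellar. The installed_version helper is made up for this example, and it assumes the layout Cellar/<package>/<version> seen in the paths above.

```ruby
# Finds the newest installed version of a Homebrew package by looking at the
# directory names below the Cellar (assumed layout: Cellar/<package>/<version>).
def installed_version(package, cellar = '/usr/local/Cellar')
  versions = Dir.glob(File.join(cellar, package, '*'))
                .select { |path| File.directory?(path) }
                .map { |path| File.basename(path) }
  # compare the numeric parts so that 5.1.10 sorts after 5.1.9
  versions.max_by { |version| version.scan(/\d+/).map(&:to_i) }
end

version = installed_version('mysql')
puts version ? "/usr/local/Cellar/mysql/#{version}" : 'mysql is not installed via Homebrew'
```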

RVM the Ruby Version Manager

RVM is a command line tool for managing your local Ruby environments. You can find more information on the RVM homepage and in earlier articles.

Quick start with installing RVM to your machine:

# install rvm via curl !!! FOLLOW RVM INSTRUCTIONS !!!
bash < <( curl http://rvm.beginrescueend.com/releases/rvm-install-head )

# download and compile latest 1.8.7
rvm install 1.8.7

# create a .rvmrc file in your app's base directory
echo "rvm use 1.8.7@#{YOUR_APP} --create" > #{YOUR_APP}/.rvmrc
# execute it by cd-ing to your app's directory
cd #{YOUR_APP}

Now you can work on your app with a custom gem environment. Unless you are using Bundler, this is probably what you want for installing and removing gems painlessly.

Cucumber with Celerity

Behavior driven development with Cucumber works nicely with Celerity, a JRuby implementation of a headless browser using HtmlUnit, and its companion Culerity, a Ruby wrapper. Culerity has recently been updated with some configuration points for registering your local JRuby environment:

# jruby config for culerity (from http://rvm.beginrescueend.com/integration/culerity/)
rvm install jruby
rvm use jruby@celerity --create
gem install celerity
rvm wrapper jruby@celerity celerity jruby
# add to .profile
export JRUBY_INVOCATION="$(readlink "$(which celerity_jruby)")"

If you are experiencing any weird Broken Pipe errors (like me), have a look at this issue.

This is just an example of how you can set up your Rails development environment. Comments on this topic are appreciated!

DZone API and iPhone app

As I already mentioned, I am currently getting my hands dirty with Objective-C and iPhone application development.

The biggest problem with getting started was that I had no idea what application I could write for that device that could become somewhat usable. As I am a passionate tech reader, I consume a lot of articles posted on DZone. Usually I use a feed reader like NetNewsWire for that, which works very well for my MacBook but is nearly useless on the iPhone, because the DZone site is not very mobile friendly…

Problems

Since there was no DZone iPhone application on the market, I started working on one. Parsing DZone feeds was easy, even though the built-in XML support on iOS sucks. There were some nice libraries that made my life easier.

No deeplink

The DZone RSS feed does not provide a deeplink to the actual linked article, so one would still land on the DZone page… Since DZone does not provide an API currently, I started working on my own Rails application hosted on Heroku. Spidering the RSS, calling the page and extracting the link to the article is fragile, but it works (currently).
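
To give an idea of the first step, here is a minimal sketch of reading titles and links from an RSS feed with REXML. The feed below is made up for illustration and is not the real DZone feed, and feed_items is a hypothetical helper. The fragile part, fetching each page and pulling out the article link, depends on the markup of the site and is not shown.

```ruby
require 'rexml/document'

# Made-up feed for illustration; the real DZone feed is not reproduced here.
SAMPLE_FEED = <<-XML
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First article</title>
      <link>http://example.com/links/first.html</link>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.com/links/second.html</link>
    </item>
  </channel>
</rss>
XML

# Returns one hash per <item> element with its title and link.
def feed_items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').map do |item|
    { :title => item.elements['title'].text, :link => item.elements['link'].text }
  end
end

feed_items(SAMPLE_FEED).each { |item| puts "#{item[:title]}: #{item[:link]}" }
```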

No voting

One of my goals was to let the iPhone user vote for the article while reading it. The lack of an API forced me to do some more fragile login and posting stuff to the DZone page, but it works too (currently)!

You can read more about the API I created on the actual page.

iPhone app

The first version of the “dzone mobile” app has passed the iTunes store review process and is available through the app store. A version with some minor bugfixes is currently being reviewed. Have a look at updates and documentation here or here.

voting

You have to provide your DZone login credentials if you want to use the voting feature. Go to the iPhone Settings > DZone and add your username and password. I want you to know that there is NO SSL, so your credentials will be submitted UNSECURED!

more Features

If you are interested in pushing this further, you can add bug reports or feature requests on GitHub.

screenshots

DZone iPhone sugar!

Using the Redis addon on Heroku

I am always playing around with new addons offered by Heroku. My latest discovery was the Redis addon that is provided by Redistogo. The addon is probably in private beta (docs are still on beta), but since they put up a link to it on their site, I managed to install it to my personal website that runs in the cloud.

Redis is “an advanced key-value store” and has some features that make it a perfect match for a cache! I use caching extensively on my site and keep on trying out new ways to do it to circumvent Heroku’s read-only filesystem.

Like Memcache, Redis provides the ability to set a time to live (ttl) on a key. This comes in handy if you have data that expires after a short period of time, like third-party data from Twitter.

Caching with Redis

Accessing Redis is very simple, since it uses a text-based protocol. The command reference is straightforward and there is a simple Ruby wrapper available:

require "redis"
redis = Redis.new
redis.set "foo", "bar"
# => "OK"
redis.get "foo"
# => "bar"

The redis-store gem already provides a Rails 3 compatible Cache Store implementation, but I needed some more configuration points, especially the ttl.

That’s why I wrote my own Rails 3 Redis Cache, which was also a great way to get used to working with Redis and the Redistogo addon.

Using Rails Redis Cache

There is some configuration needed for Rails to pick up the new cache store. If you want to use different or no caching for test, development and production, you should put the config in your environment files:

# config/environments/production.rb
config.action_controller.perform_caching = true
config.cache_store = ActiveSupport::Cache::RailsRedisCache.new(:url => ENV['REDISTOGO_URL'])

If there is a Redis server available in all environments, you can put it in the shared environment file instead:

# config/environment.rb
ActionController::Base.cache_store = ActiveSupport::Cache::RailsRedisCache.new(:url => ENV['REDISTOGO_URL'])
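
Both variants hand the complete connection URL to the cache store. If you are curious what such a URL carries, Ruby’s URI library can take it apart. This is only a sketch: the URL below is a made-up value in the usual redis://user:password@host:port/ form, not a real Redistogo instance, and redis_settings is a hypothetical helper.

```ruby
require 'uri'

# Splits a redis:// URL into the parts a client needs to connect.
def redis_settings(url)
  uri = URI.parse(url)
  { :host => uri.host, :port => uri.port, :password => uri.password }
end

# Made-up example value; on Heroku the real one comes from ENV['REDISTOGO_URL'].
example_url = 'redis://redistogo:secret@example.redistogo.com:9431/'
p redis_settings(example_url)
```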

The caching parts are mostly in my controllers:

@tweets = cache("tweets", :expires_in => 30.seconds){ Twitter::Search.new(...) }
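
What happens behind such a call is roughly "return the stored value if it has not expired, otherwise run the block and store the result". The following in-memory sketch shows the pattern without a Redis server. TinyCache is a made-up class for illustration only; it is not how Redis, Rails or my cache store implement it.

```ruby
# In-memory stand-in for a cache store with expiring keys. It only illustrates
# the read-through pattern with a time to live.
class TinyCache
  # The clock is injectable so that expiry can be tried out without waiting.
  def initialize(clock = lambda { Time.now })
    @clock = clock
    @entries = {}
  end

  # Returns the cached value for key, or runs the block and stores its result
  # for :expires_in seconds.
  def fetch(key, options = {})
    entry = @entries[key]
    return entry[:value] if entry && entry[:expires_at] > @clock.call

    value = yield
    @entries[key] = { :value => value, :expires_at => @clock.call + options.fetch(:expires_in) }
    value
  end
end

cache = TinyCache.new
tweets = cache.fetch('tweets', :expires_in => 30) { ['result of an expensive remote call'] }
p tweets
```

A second fetch within 30 seconds returns the stored value without running the block again.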

The store is using the basic Rails cache store implementation which is broken in the Rails 3.0.0.beta1 version that runs on Heroku, so I added a monkey-patch for that using edge Rails.

Redis on localhost

Installing and running Redis on Mac OS X is really simple:

brew install redis
redis-server

There is also a command-line client available for direct access:

redis-cli
redis> set "foo" "bar"
OK
redis> get "foo"
"bar"

It’s key value stores, stupid!

Migrating to Rails 3 for Heroku Bamboo

Recently there were some interesting updates to the Heroku infrastructure, giving the opportunity to migrate my personal Rails 2 website to Rails 3.

Since the app has only a single model for caching data, there is no need to worry about database migration. A nice opportunity to start fresh:

rvm use 1.9.1
gem install rails --pre
rails basement-rails3
cd basement-rails3
heroku create basement-rails3 --stack bamboo-mri-1.9.1

business as usual?

Not really… Having Yehuda Katz as a core developer of Rails 3, it’s no surprise they adopted the Merb approach of just using one executable for everything. So the ‘script’ folder now contains just a ‘rails’ script. Creating controllers, running the server, jumping into the console - all through the ‘rails’ command:

rails -h
=> [...]
=>  generate    Generate new code (short-cut alias: "g")
=>  console     Start the Rails console (short-cut alias: "c")
=>  server      Start the Rails server (short-cut alias: "s")
=> [...]

I appreciate the shortcuts! No more discussions about what shortcut to use for ‘script/server’ (ss is not an option in Germany…)!

dependency management

Rails 3 has changed the way of working with gems. It uses Bundler to deal with dependencies. Being a big fan of Java’s dependency management tools like Ivy or Maven, I think that separating out the dependency issue is a good idea.

All dependencies are now defined in a separate ‘Gemfile’ using an easy DSL to manage the gems:

gem "rails", "3.0.0.beta"
[...]
gem "sqlite3-ruby", :require => "sqlite3"
[...]
group :test do
  gem "test-unit", "1.2.3"
end

I had some trouble getting bundler working on my machine, but after reinstalling Rails 3 AFTER the bundler gem, everything worked fine.

The only Rails plugin in my app is Haml and I was confident that it would play well with the latest Rails version. Nevertheless, I was pleased to find RailsPlugins.org, where one can check the compatibility of plugins with Rails 3.

escaping vs. html_safe

There were only very few changes to the existing codebase of my application, except for one thing that forced changes to nearly all of the wrapper objects used to encapsulate data coming from external services like Twitter. The problem is that Rails 3 is strict about escaping: every string rendered into the view will be escaped unless it is marked as ‘html_safe’. Since my application uses a lot of pregenerated content with inline HTML, adding ‘html_safe’ markers was inevitable:

  def content
    @json["content"]["$t"].html_safe
  end
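
To illustrate the idea outside of Rails, here is a simplified sketch of "escape everything unless it is marked as safe". SafeString and render_value are made-up names for this example. Rails itself uses ActiveSupport::SafeBuffer, which does more than this.

```ruby
require 'erb'

# Simplified stand-in for the idea behind html_safe: a String subclass that
# acts as a marker for content that must not be escaped again.
class SafeString < String; end

# Escapes everything except strings explicitly marked as safe.
def render_value(value)
  value.is_a?(SafeString) ? value : ERB::Util.html_escape(value)
end

puts render_value('<b>bold</b>')                 # gets escaped
puts render_value(SafeString.new('<b>bold</b>')) # is passed through unchanged
```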

Ruby 1.9 is different

The biggest pile of migration problems resulted from using Ruby 1.9.1. The latest Ruby version is a lot faster, but it has changed some of the core functionality. The ‘enum_with_index’ method, for example, is gone, so I had to use ‘each_with_index’ on a hash instead.
Using old YAML files resulted in some strange behavior, as the format of these files has changed slightly (because of the new symbol style that Ruby 1.9 is using, I guess):

# old
  id: home
# new
  :id: home
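
The difference between the two notations is easy to see when loading them. This sketch shows how a current Ruby loads both lines; I am assuming that this mirrors what I ran into, and the YAML library of 1.9.1 may have behaved differently in detail.

```ruby
require 'yaml'

old_style = YAML.load("id: home")  # the key is loaded as a String
new_style = YAML.load(":id: home") # the key is loaded as a Symbol

p old_style.keys.first.class
p new_style.keys.first.class

# code that looks up the symbol :id only finds it in the new notation
p old_style[:id]
p new_style[:id]
```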

Ruby 1.9 also changed the way of handling Unicode characters. Using these in code forces the developer to put a magic comment in the first line of the Ruby file:

# coding: utf-8
[...]

beta quirks

Most of the new Rails 3 stuff just works, but there are some reasons why it is still beta:

# rails console won't quit with Control-C but exits without error when typing ö.ö
rails c
=> Loading development environment (Rails 3.0.0.beta)
ruby-1.9.1-p378 > ö.ö
^C

# rails help doesn't work for commands
rails -h
=> [...]
=> All commands can be run with -h for more information.
rails generate -h
=> Could not find generator -h.

Beta but running!