The challenge of asset packaging on Heroku

Page load time is an important consideration in web application development. Users expect navigating a website to be fast, and many will simply leave if a page takes too long to load. Two ways to improve load time are to reduce the number of HTTP requests and to reduce the amount of data transferred. Both can be addressed by concatenating, minifying, and caching CSS and JavaScript files.

Rails has a handy feature that helps with part of this: the stylesheet_link_tag and javascript_include_tag helper methods accept a cache option, which takes all of the files passed to them and concatenates them into a single file (and single HTTP request) in the production environment. This is a big improvement, but it could be better. In addition to combining the files, we want to reduce the data transferred by running them through a so-called minifier, which strips whitespace and comments and makes various optimizations such as variable name substitution and function inlining. Lastly, the big challenge: we want to be able to do this on platforms like Heroku, where our ability to write to disk is highly restricted.
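For reference, the built-in behavior looks something like this in a layout (the file names here are just illustrative). With perform_caching enabled in production, each helper call becomes a single request:

    <%# app/views/layouts/application.html.erb -- illustrative file names %>
    <%# With :cache => true, production concatenates these into all.css and all.js, %>
    <%# written to public/stylesheets and public/javascripts respectively. %>
    <%= stylesheet_link_tag "reset", "layout", "application", :cache => true %>
    <%= javascript_include_tag "jquery", "application", :cache => true %>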

Read-only file systems

The biggest issue for asset packaging when deploying a Rails app to Heroku is that, with the exception of the tmp folder, we only have read access to the disk. This means that the cache option for the asset helper methods will not work, because the concatenated files are written to disk the first time they're needed. The Rails helpers also don't offer the ability to minify the output file, so we'll need to look into a plugin-based solution for asset packaging.

There are quite a few asset packaging plugins out there, including asset_packager, heroku_asset_packager, heroku_asset_cacher, and Jammit. If you Google around on the subject, you'll also find a multitude of blog posts and discussions where people have written Rake and Capistrano tasks to jury-rig a solution to this problem. Clearly there is no ideal approach yet. I think Jammit has come pretty close, but it still runs into a brick wall on Heroku's read-only file system.

Precaching

The most common suggestion I've seen is to precache the asset files, i.e., to generate them all on the local machine and commit them to the repository before deploying. With this approach, nothing needs to be written to disk in the production environment. The downside is that we now have artifacts from our build process in our repository's history, which is far less than ideal. Still, some find this to be an acceptable compromise, and all the Rake- and Capistrano-based solutions you'll see automate committing the assets before deployment to make it a little less painful. If having your history dirtied doesn't bother you, you can probably stop there. Personally, I'm not satisfied yet.
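For illustration, the Rake-based variants generally boil down to something like this sketch, assuming Jammit (and its jammit command-line tool) is installed and you're willing to commit public/assets:

    # lib/tasks/assets.rake -- a rough sketch of the precaching approach,
    # assuming Jammit reads its package definitions from config/assets.yml.
    namespace :assets do
      desc "Package assets with Jammit and commit them before deploying"
      task :precache do
        sh "jammit"                      # writes packaged files to public/assets
        sh "git add public/assets"
        sh "git commit -m 'Precache packaged assets'"
      end
    end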

Caching or precaching to tmp

Unlike the built-in Rails helper methods, Jammit writes the cached asset files to a special directory at public/assets. Using Jammit's helper include_javascripts :some_package, for example, will create a script tag linking to example.com/assets/some_package.js. The first request to this address will be routed to a special Jammit controller that figures out which raw files need to be packaged. It runs them through either the YUI Compressor or the Google Closure Compiler, with options we specify in configuration, serves the response to the client directly, and caches the output by writing it to public/assets/some_package.js. The next time the address is requested, Rack will see that the cached file exists and serve that instead of routing to the Jammit controller.
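To make the moving parts concrete, a minimal Jammit configuration looks roughly like this (the package name and file list are illustrative):

    # config/assets.yml
    javascript_compressor: closure   # or yui

    javascripts:
      some_package:
        - public/javascripts/jquery.js
        - public/javascripts/application.js

    stylesheets:
      some_package:
        - public/stylesheets/**/*.css

In a view or layout, <%= include_javascripts :some_package %> then emits the script tag pointing at /assets/some_package.js described above.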

We are faced with two problems with this process on Heroku. The first is that we can only write to tmp. The second is that Heroku lacks a JVM, which both the YUI Compressor and the Google Closure Compiler require. Currently, Jammit doesn't offer a workaround for either issue. Supporting Heroku would require a configuration option to change the full file path for the cached assets, and an alternative minifier that works without a JVM. One possible solution is UglifyJS, which runs on JavaScript runtimes like Node.js and is already being used by projects like jQuery. A Ruby interface to UglifyJS might be provided by therubyracer-heroku (an embedded V8 engine built for Heroku's stack) and Uglifier.
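A sketch of what that combination might look like in a Gemfile, along with the one-liner Uglifier exposes for minifying a file; whether Jammit could actually be configured to use it is exactly the open question:

    # Gemfile -- one possible JVM-free minification stack for Heroku
    gem 'therubyracer-heroku'   # embedded V8 compiled for Heroku's stack
    gem 'uglifier'              # Ruby wrapper around UglifyJS

    # Minification itself is then just:
    #   require 'uglifier'
    #   minified = Uglifier.compile(File.read('public/javascripts/application.js'))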

Even if Jammit could write the cached assets to tmp, it still wouldn't be the best approach. tmp isn't really intended for this purpose, as Heroku states in their documentation:

If you wish to drop a file temporarily for the duration of the request, you can write to a filename like #{RAILS_ROOT}/tmp/myfile_#{Process.pid}. There is no guarantee that this file will be there on subsequent requests (although it might be), so this should not be used for any kind of permanent storage.

The good news is that Heroku provides Varnish as an HTTP cache, so we should be able to use that instead of writing to disk at all. The first request for an asset package will hit the Jammit controller, which will add HTTP caching headers to the response. The next user who requests the packaged asset file will be served directly from Varnish, completely bypassing the application stack. And when the same user loads another page that includes the same asset package, the browser won't even request the file from the server, because of the HTTP caching headers that have been set. Now that's efficient.
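Here's a rough sketch of the idea, with a hypothetical controller standing in for whatever Jammit's controller would need to do; expires_in is the standard Rails method for setting Cache-Control headers, and build_package is a stand-in for however the concatenated, minified source gets produced:

    # A hypothetical controller action serving a packaged file with far-future
    # cache headers instead of writing it to disk.
    class PackagesController < ApplicationController
      def show
        js = build_package(params[:package])
        expires_in 1.year, :public => true   # Cache-Control: public, max-age of ~1 year
        render :text => js, :content_type => 'text/javascript'
      end
    end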

Busting the cache

Okay, we've got a good plan for caching asset files, but what happens when we update the content in those files? Without some intervention, the user will be served outdated content from the cache. The Rails and Jammit helpers solve this by adding a timestamp to the query string, created from the mtime of the file. After deployment, the old cached files are removed, and new ones are generated with a new cache busting string. The user's browser and Varnish will both see this as a new file, and request the new content. This is a pretty good solution, but still not totally ideal.

Because the cached assets are recreated on every deployment, the mtime (and therefore the cache busting string) changes even if the contents of the files themselves don't. Users are forced to re-download every asset on the site after each deploy, even if only one of them has changed. A better approach would be to use an MD5 hash of each file's contents as the cache busting string, so the query string only changes when the contents of the file change and the asset files can stay cached across deployments. We'd probably also want some mechanism for remembering the MD5 for a particular asset file, so we don't have to recompute it every time a script tag is generated with one of the helper methods.
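A minimal sketch of what content-based fingerprinting could look like as a view helper. The helper names are hypothetical, and the constant acting as a per-process cache is the "remembering" mechanism mentioned above:

    require 'digest/md5'

    # Hypothetical helper: fingerprint packaged assets by content rather than mtime.
    module AssetDigestHelper
      DIGESTS = {}   # per-process cache so digests aren't recomputed on every request

      def asset_digest(relative_path)
        DIGESTS[relative_path] ||=
          Digest::MD5.file(File.join(Rails.root, 'public', relative_path)).hexdigest
      end

      # <%= fingerprinted_package_tag :some_package %>
      # => <script src="/assets/some_package.js?<md5 of contents>" type="text/javascript"></script>
      def fingerprinted_package_tag(package)
        src = "/assets/#{package}.js?#{asset_digest("assets/#{package}.js")}"
        content_tag(:script, '', :src => src, :type => 'text/javascript')
      end
    end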

It's a tough problem

As evidenced by the multitude of plugins and scripts that attempt to solve this problem, it's a tough nut to crack. I think the current tools are good, but still not quite up to par. I will continue to investigate this myself, and will hopefully be able to whip up some code to contribute, but I hope the Rails and Heroku communities can work together to find a solution for asset packaging and caching on Heroku that makes things as efficient and painless as possible.

Comments

Jason Garber commented
February 16, 2011

It's one of the biggest pain points on Heroku and it bites me again and again. Heroku needs to provide a solution.

You didn't mention Amazon S3 as an option for Heroku assets. The generated files aren't in the repository; they're generated and uploaded to S3 straightaway. The downside is that they all have to be generated and uploaded every time, which makes deployment very slow. I've asked some asset packager maintainers to MD5 hash the contents and only upload what's changed, but to my knowledge that hasn't happened yet.

I'm sure S3 is what Heroku would recommend. They really need to be a one-stop, no-hassle deployment shop, IMHO, and set up something themselves. Maybe a writable sibling directory to tmp that gets cached and expired intelligently.

John commented
February 16, 2011

I feel the same way as Jason Garber: The solution is decoupling asset hosting from Heroku altogether. The popularity of S3 suggests that this is where the work should be.

Jeremy Ashkenas commented
February 16, 2011

If you're investigating the Jammit/S3 approach, perhaps this gem can help:

https://github.com/railsjedi/jammit-s3

Richard Taylor commented
February 18, 2011

You also might want to check out my asset_id gem, which should help: it MD5-stamps your assets and uploads them to S3.

Daniel Huckstep commented
February 21, 2011

I run my blog on Heroku, using Rails 3, and my own gem sinatra-bundles to handle packaging of assets. They don't get cached on the filesystem, but cache headers get set and it works out pretty well.

Jimmy Cuadra commented
February 21, 2011

Very cool, Daniel. I've been thinking about doing something similar to sinatra-bundles for Rails. I'm probably going to hold off until the dust settles with Rails 3.1 and the Heroku/therubyracer/UglifyJS issue. Once those are sorted out, it will be more apparent whether I should try patching an existing library or reinventing the wheel for my own tastes.

John McCaffrey commented
February 22, 2011

The two common problems I've seen mentioned with respect to the standard Rails cache-busting URL (e.g. images/my_image?

Problem #1: The timestamp is based on mtime, which, depending on how you deploy, may force users to download a new file even though the contents haven't actually changed.

Problem #2: The query string ?3424234234 is ignored by certain proxies (and Amazon CloudFront).

Both of these issues are addressed by the asset_fingerprint plugin (though I haven't tried it yet).

Improving the way static assets are handled may seem like an afterthought to most people, but it's often one of the most straightforward and repeatable ways to drastically improve the performance of your application without meddling with queries or rearchitecting the backend.

Thanks for doing this review and pointing people in the right direction!

Daniel Huckstep commented
February 22, 2011

Jimmy:

sinatra-bundles does actually work pretty well with Rails 2.3.x using the plain metal, or with Rails 3.x using https://github.com/darkhelmet/darkblog2/blob/master/lib/bundles.rb and https://github.com/darkhelmet/darkblog2/blob/master/lib/rack/sinatra.rb

Yeah, it's a Sinatra plugin, but whatever :)

Jonathan Baudanza commented
February 28, 2011

https://github.com/jbaudanza/rack-asset-compiler

This isn't really a bundler, but if you have a simple case of "I want to compile X to Y on Heroku," it works pretty well. I use it to compile CoffeeScript and Sass, both on Heroku and in my Jasmine suite.