The highs and lows of Rust

I really like programming. I find programming languages very interesting. I've learned a lot of them over the years. Some I like and some I don't. Once in a while, I learn a new language that makes a lightbulb go on for me. One that changes the way I think about programming, helps me mature as a programmer, and becomes my default for most if not all new projects. The first time that happened to me was when I learned Ruby around 2009. Over the last year and a half or so, it has happened again with Rust. I love Rust so much that using anything else now drives me nuts.

The highs

What's so great about Rust? In short, it greatly increases my confidence that my programs are going to do what I want them to do without limiting expressiveness. My programming background is almost entirely in dynamic languages, as I've mostly worked on things for the web and other high level applications. People who only know dynamic languages, or people who escaped to them from the likes of Java, are afraid of static typing. They find it unnecessarily restrictive and ultimately not helpful. But in my experience, programs in dynamic languages are riddled with subtle bugs, because without static types, type errors surface constantly at run time. In Ruby, you see "NoMethodError: undefined method `foo' for nil:NilClass" so often that you become numb to it. You accept all this in the name of "speed" and "productivity." In some cases, you do end up writing more terse code. But you end up with code that you can't trust, and you write libraries that consumers will break in ways you can't prevent. You can't write a library that adequately protects users from themselves.

With Rust, I have learned that a rich type system is a beautiful thing. There's a strong culture of automated testing in the Ruby community, and I was once religious about it just as many people in that community are. But in Rust, I have realized that so much of what you agonize over verifying in a dynamic language is handled for you by static type checking and a good compiler. Testing is still a requirement, but you really only have to cover real logic, not the kinds of things so many tests in dynamic languages exist to check. Rust's type system is not restrictive. It's very expressive, and typing out a few extra words provides guarantees that data is what you think it is, and greatly improved clarity when reading and reviewing code. Every time I get a compiler error in Rust, I feel satisfied, not frustrated, because very often it's caught something that would have been a run time error that I might never have noticed in a Ruby program.
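
As a small illustration of the kind of guarantee I mean, consider looking a value up in a map. In Ruby, a failed Hash lookup silently hands you nil; in Rust, the failure case is part of the return type, and the compiler won't let you pretend it can't happen. (This is a generic sketch, not code from any particular project.)

use std::collections::HashMap;

fn main() {
    let mut ages: HashMap<String, u32> = HashMap::new();
    ages.insert("alice".to_string(), 30);

    // `get` returns an Option<&u32>; the compiler forces us to decide
    // what happens when the key is missing, at compile time.
    match ages.get("bob") {
        Some(age) => println!("bob is {}", age),
        None => println!("no record for bob"),
    }
}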

I'm not alone in feeling the pain of dynamic languages like Ruby for building large and complicated programs. In the last few years, many people have embraced Go as their new language of choice. It's been a very popular language for people coming from Ruby, Python, and JavaScript, because for many people it provides the same feeling of "speed" and "productivity" that made dynamic languages attractive in the first place. But it also provides the additional benefits that static typing and ahead-of-time compilation bring, like more reliable code and static binaries that you can just drop on a server and run without needing a language runtime installed.

Personally, I am not a fan of Go. For whatever reason, I actually find it much harder to read than Rust, whereas people often say that Rust is hard to read. General opinion seems to be that Go is easier to learn and that you can largely hit the ground running with it. Rust has a reputation of having a much higher learning curve, which of course affects its popularity. I haven't really been able to understand why, but somehow Rust just clicks with my mental model of programming better than Go does, and though it was indeed difficult to learn, now that I have, I feel that it was worth all the time it took and then some. Go also doesn't appeal to me because I don't think it does enough to advance the state of the art. That is, it doesn't provide that much more than what dynamic languages already do. It has a mediocre type system which can be completely subverted with things like empty interfaces. And it has a nil value, which is a completely insane thing to have in a modern language. Escaping the pain of nil is one of the best parts of leaving Ruby. (See my previous post, Option types and Ruby, for more on that.) Go is also not a safe language. Although many are fond of its use of channels for managing concurrent programs, Go knows only shared mutable state. It even comes with a data race detector to help debug your concurrent programs. To me, that is a sign of a fundamental error in the design of the language. In Rust, data races are not possible (assuming you use only safe code) because of its brilliant concept of memory ownership and borrowing.
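
To make the ownership point concrete, here is a minimal sketch (not from any real project) of shared state done the Rust way. The counter can only be reached through the Arc and the Mutex; if you removed the Mutex and tried to mutate the value from both threads directly, the compiler would reject the program rather than let a data race exist.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // The counter is shared via Arc and guarded by a Mutex; safe Rust
    // offers no way to reach it unsynchronized from multiple threads.
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();

    for _ in 0..10 {
        let counter = counter.clone();
        handles.push(thread::spawn(move || {
            *counter.lock().unwrap() += 1;
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("count: {}", *counter.lock().unwrap());
}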

I won't go over all the bullet points of the features and strengths of Rust, since they are well documented on the Rust website and various other articles on the topic. While the language is described as a "systems language" and is generally intended for lower level applications, my domain is in higher level applications, and I've found Rust to be just as suitable and strong for my purposes. It really is a fantastic general purpose language.

The lows

Being a Rust programmer is not all enjoyable, however. There are some major pain points in Rust so far. They largely have to do with the language and its ecosystem being very young, having reached 1.0 status less than a year ago, in May 2015. The most obvious issue with the ecosystem is simply the lack of libraries and the immaturity of existing ones. In older and more established languages, there is a library for just about anything you want, and for the most common needs, there is a library that is battle-tested and trusted. In Rust, very few libraries are trustworthy. Most Rust software lives on the bleeding edge. While this is frustrating when you just want to get something done, it is also exciting because building some of these needed libraries yourself means you're making a huge contribution to the ecosystem. I'm trying to help pave the way myself by never saying, "I'm not going to build X in Rust, because it requires libraries that do Y and Z, which don't exist yet." Instead, I start on X anyway, and build Y and Z myself too, giving everything back to the community.

The excitement of the new and the excitement of contributing aside, it really is painful to just get things done in Rust a lot of the time. The worst pain points for me so far are 1) lack of a feature-complete, trustworthy cryptography library and 2) lack of a feature-complete, stable serialization library. I'll address these two points in a bit more detail.

With all of the highly publicized security vulnerabilities of the last few years (roughly beginning with OpenSSL's Heartbleed, and up to and including the recent glibc DNS resolver vulnerability), the topic of the danger of programs written in C has come up many times. The general consensus is that it's just not possible to write a large, complicated program in C safely. Even setting aside the history of bad code quality and neglect in some of these high profile C libraries, C is a language driven by the fear of making a critical mistake that is all too easy to make. Naturally, people discuss Rust as the probable candidate for the next generation of these types of libraries. While it's not really reasonable for now to rewrite things like OpenSSL in Rust, simply because of how much work that is and how many programs specifically target this older C software, not having crypto libraries in Rust makes it very difficult to write Rust programs without binding to the same C libraries we're trying to get rid of. This is one area where I envy Go, which has pure-Go crypto libraries that are suitable for production use. There are a few crypto libraries in Rust, most notably ring and Sodium Oxide, but there is currently nothing production-ready or even usable in Rust for TLS and X.509 certificate generation and management. The best you can do for the latter right now is shell out to openssl. For more discussion on this topic, see my post on the Rust Internals forum.
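
As an example of what that looks like in practice, here is a hedged sketch of generating a self-signed certificate by invoking the openssl command line tool as a child process (the file names and subject are made up for illustration):

use std::process::Command;

fn main() {
    // Shell out to the openssl CLI, since there is no production-ready
    // pure-Rust option for this yet.
    let status = Command::new("openssl")
        .args(&["req", "-x509", "-newkey", "rsa:2048",
                "-keyout", "key.pem", "-out", "cert.pem",
                "-days", "365", "-nodes",
                "-subj", "/CN=example.com"])
        .status()
        .expect("failed to run openssl");

    if !status.success() {
        panic!("openssl exited with {}", status);
    }
}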

The second major pain point, and the most massive one in my experience, is serialization and deserialization: that is, converting between Rust types and formats like JSON and YAML. Originally, serialization was built into the Rust compiler, but as Rust prepared for 1.0, many pieces that were once part of the compiler or the language itself were removed or extracted into external libraries in order to keep the language and the standard library as small as possible. What was once the built-in "serialize" crate (a crate is Rust's term for an individual unit of compilation; essentially a package or library) now exists as the external crate rustc-serialize. It was decided that the approach taken by that crate was not the best way to do things long term, and the problem was punted until a better crate could be created that would eventually become the de facto serialization library for the Rust community. As a compromise, the Rust compiler has special knowledge of rustc-serialize, and allows you to automatically make your Rust types serializable by putting a special attribute above them in the source code. However, with rustc-serialize being effectively deprecated, it lacks some important features, like the ability to customize certain details of how a type is serialized. For example, there is no way with rustc-serialize's automatic serialization to have a Rust type with snake_cased fields serialize to JSON with camelCased fields.
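
For instance, the automatic derivation looks like the following, and the JSON field names are always exactly the struct's field names; there is no hook to rename them (the struct here is my own toy example):

extern crate rustc_serialize;

use rustc_serialize::json;

#[derive(RustcEncodable, RustcDecodable)]
struct User {
    full_name: String,
}

fn main() {
    let user = User { full_name: "Jimmy".to_string() };

    // Prints {"full_name":"Jimmy"}; there is no way to ask for
    // {"fullName":"Jimmy"} instead.
    println!("{}", json::encode(&user).unwrap());
}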

Today we have Serde, a great serialization library by Erick Tryzelaar, which is likely to be the de facto serialization library everyone is hoping for. Unfortunately, Serde does not enjoy the special treatment the compiler gives rustc-serialize, and so the story for using Serde is much more complicated. In order to do the same type of automatic serialization that rustc-serialize does, Serde has a sub-crate that works as a compiler plugin to do code generation, turning a special attribute into a full implementation just like rustc-serialize. The rub is that compiler plugins are an unstable feature of Rust, meaning they are only available in Rust's nightly releases, and not on stable Rust 1.x or even the beta version.

It is possible to use Serde's automatic serialization on stable Rust, but it requires some clever yet very hacky indirection. One of the crates under the serde organization on GitHub is Syntex, which is literally a fork of the Rust compiler's private "libsyntax" crate that compiles on stable Rust. When using Serde on stable Rust, the Serde code generation crate uses Syntex instead of the real Rust compiler to generate the code with serialization implemented, in a separate build step that runs before your program is actually compiled. There are some ugly edges to this, including needing to have two copies of every source file in your program (one with the code to be processed by Serde + Syntex, and one to include the file it generates). You also need a build script which manually specifies which source files need to go through this process.

Because compiler plugins are in flux within the Rust compiler, there are frequent changes to libsyntax, each of which requires a new version of Syntex to be released, and inevitably causes a waterfall of broken programs down the dependency chain as the new versions of Rust and Syntex are rolled out. This happens every few weeks, and it's a nightmare to keep up with. As a result, any program of non-trivial size that needs serialization is really better off just sticking to nightly Rust. But because not every library has nightly Rust in mind, sometimes you end up with dependencies that don't work well together, and you're stuck in a spiral of making trivial fixes to other people's libraries so you can get your own to compile. This is not something I would expect from a language that has reached 1.0, for something as necessary and ubiquitous as data serialization. Nick Cameron has been working on a new macro system for Rust which will eventually do what compiler plugins are being used for right now, but this new system doesn't exist yet, so it will be quite some time before it makes it to stable Rust. The near future feels very bleak on this issue.
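
To give a concrete flavor of the indirection described above, here is roughly what the stable-Rust setup looks like, based on Serde's documented pattern at the time of writing (the file names src/types.in.rs and types.rs are just illustrative). The serializable types live in an .in.rs file, and a build script expands them through Syntex:

// build.rs
extern crate syntex;
extern crate serde_codegen;

use std::env;
use std::path::Path;

fn main() {
    let out_dir = env::var_os("OUT_DIR").unwrap();

    // Expand the Serde attributes in src/types.in.rs into real
    // serialization impls, written to $OUT_DIR/types.rs.
    let src = Path::new("src/types.in.rs");
    let dst = Path::new(&out_dir).join("types.rs");

    let mut registry = syntex::Registry::new();
    serde_codegen::register(&mut registry);
    registry.expand("", &src, &dst).unwrap();
}

The real src/types.rs module then contains nothing but an include of the generated file:

// src/types.rs
include!(concat!(env!("OUT_DIR"), "/types.rs"));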

Should you use Rust?

Rust is a fantastic language that gets almost everything right, but its immaturity can make it hard to get your work done. So should you use it? I think it depends on what you might use it for, and what you value the most. If you write programs just to get your work done, and are most concerned with productivity, Rust is not ready for prime time yet. However, if you check back in a year or so, it will likely be a very good candidate for almost anyone. That said, there is a lot you can learn about good programming from learning Rust. The fundamentals of the language are not going to change, and you can benefit from them right now. If you have time to look into a new language simply because you're interested in getting better as a programmer, and you're forward thinking about how you might write programs in the future, Rust is a great choice. If you're like me, and the idea of a programming language you can really trust outweighs your desire to get things done quickly, or you are excited to help build the libraries that will be the foundation of Rust for years to come, you should stop what you're doing and learn Rust right now.

Option types and Ruby

I've been learning the Rust programming language over the last several months. One of the great things about learning a new programming language is that it expands your understanding of programming in general by exposing you to new ideas. Sometimes new ideas can result in lightbulb moments for programming in languages you already know. One of the things learning Rust has made me realize is how much I wish Ruby had sum types.

A sum type is a type that has a number of "variants." These variants are alternate constructors for the type that can be differentiated from each other to confer different meaning, while still being the enclosing type. In Rust, sum types are provided through enum. A value of an enum type can be destructured using pattern matching via Rust's match expression.

enum Fruit {
  Apple,
  Banana,
  Cherry,
}

fn print_fruit_name(fruit: Fruit) {
  match fruit {
    Fruit::Apple => println!("Found an apple!"),
    Fruit::Banana => println!("Found a banana!"),
    Fruit::Cherry => println!("Found a cherry!"),
  }
}

We define an enum, Fruit, with three variants. The print_fruit_name function takes a Fruit value and then matches on it, printing a different message depending on which variant this particular Fruit is. For our purposes here, the reason we use match instead of a chain of if/else conditions is that match guarantees that all variants must be accounted for. If one of the three arms of the match were omitted, the program would not compile, citing a non-exhaustive pattern match.

Enum variants can also take arguments which allow them to wrap other types of values. The most common, and probably most useful example of this is the Option type. This type allows you to represent the idea of a method that sometimes returns a meaningful value, and sometimes returns nothing. The same concept goes by different names sometimes. In Haskell, it's called the Maybe monad.

pub enum Option<T> {
  Some(T),
  None,
}

An option can have two possible values: "Some" arbitrary value of any type T, or None, representing nothing. An optional value could then be returned from a method like so:

fn find(id: u8) -> Option<User> {
  if user_record_for_id_exists(id) {
    Some(load_user(id))
  } else {
    None
  }
}

Code calling this method would then have to explicitly account for both possible outcomes:

match find(1) {
  Some(user) => user.some_action(),
  None => return,
}

What you do in the two cases is, of course, up to you and dependent on the situation. The point is that the caller must handle each case explicitly.

How does this relate to Ruby? Well, how often have you seen this exception when working on a Ruby program?

NoMethodError: undefined method `foo' for nil:NilClass

Chances are, you've seen this a million times, and it's one of the most annoying errors. Part of why it's so bad is that the associated stack trace may not make it clear where the nil was originally emitted. Ruby code tends to use nil quite liberally. Rails frequently follows the convention of methods returning nil to indicate either the lack of a value or the failure of some operation. Because there are loose nils everywhere, they end up in places you don't expect and trip you up.

This problem is not unique to Ruby. It's been seen in countless other languages. Java programmers rue the NullPointerException, and Tony Hoare refers to the issue as his billion-dollar mistake.

What, then, might we learn from the concept of an option type in regards to Ruby? We could certainly simulate an Option type by creating our own class that wraps another value, but that doesn't really solve anything since it can't force callers to explicitly unwrap it. You'd simply end up with:

NoMethodError: undefined method `foo' for #<Option:0x007fddcc4c1ab0>
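
For concreteness, the wrapper in question might look something like this hypothetical sketch. Note that nothing stops a caller from invoking methods on the Option itself instead of unwrapping it first, which is exactly the error above:

class Option
  def initialize(value = nil)
    @value = value
  end

  def unwrap
    raise "unwrapped an empty Option" if @value.nil?
    @value
  end
end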

But we do have a mechanism in Ruby that will stop a caller cold in its tracks if it doesn't handle a particular case: exceptions. While it's a common adage not to "use exceptions for control flow," let's take a look at how exceptions might be used to bring some of the benefits of avoiding nil through sum types. Imagine this example using an Active-Record-like User object:

def message_user(email, message_content)
  user = User.find_by_email(email)
  message = Message.new(message_content)
  message.send_to(user)
end

The find_by_email method will try looking up a user from the database by their email address, and return either a user object or nil. It's easy to forget this, and move along assuming our user variable is bound to a user object. In the case where no user is found by the provided email address, we end up passing nil to Message#send_to, which will crash our program, because it always expects a user.

One way to get around this is to just use a condition to check if user is nil or not before proceeding. But again, this is easy to forget. If we control the implementation of the User class, we can force callers to explicitly handle this case by raising an exception when no user is found instead of simply returning nil.
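
A minimal sketch of that raising version, assuming an ActiveRecord-style model and a custom UserNotFound error class, might look like:

class UserNotFound < StandardError; end

class User < ActiveRecord::Base
  def self.find_by_email(email)
    # Raise immediately instead of letting a nil escape to the caller.
    where(email: email).first or raise UserNotFound, "no user with email #{email}"
  end
end

The caller can then rescue that error explicitly: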

def message_user(email, message_content)
  user = User.find_by_email(email)
  message = Message.new(message_content)
  message.send_to(user)
rescue UserNotFound
  logger.warn("Failed to send message to unknown user with email #{email}.")
end

Now message_user explicitly handles the "none" case, and if a caller doesn't, an exception will be raised right where the nil would otherwise have been introduced. Of course, the program will still run if this exception isn't handled, but it will crash as soon as the exception is raised, and the crash will carry a far more useful exception than the dreaded NoMethodError on nil. Forcing the caller to truly account for all cases is something that pattern matching provides in Rust and is not possible in Ruby, but using exceptions to provide earlier failures and better error messages gets us a bit closer to the practical benefit.

There are other approaches to dealing with the propagation of nil values in Ruby. Another well known approach is to use the null object pattern, returning a "dummy" object (in our example, a User), that responds to all the same messages as a real user but simply has no effect. Some people would argue that is a more object-oriented or Rubyish approach, but I find that it introduces more complexity than its benefit is worth.
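
For contrast, a null object version might look something like this sketch (the method names are hypothetical stand-ins for whatever a real User responds to):

class NullUser
  def email
    "nobody@example.com"
  end

  def receive_message(message)
    # Accept the message and silently do nothing.
  end
end

class User < ActiveRecord::Base
  def self.find_by_email(email)
    where(email: email).first || NullUser.new
  end
end

The program no longer crashes, but a failure to find a user now silently does nothing, which can hide real bugs.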

Using exceptions as part of the interfaces of your objects forces callers to handle those behaviors, and causes early errors when they don't, allowing them to get quick, accurate feedback when something goes wrong.

etcd 2.0 static bootstrapping on CoreOS and Vagrant

The problem

CoreOS provides a pretty good setup for running a cluster of machines with Vagrant. You can find this setup at coreos/coreos-vagrant. Something I've found annoying, however, is that whenever you start a new cluster, you need to get a new discovery token from CoreOS's hosted discovery service. This is necessary for the etcd instances running on each machine to find each other and form a quorum. The discovery token is written to the machines on initial boot via the cloud-config file named user-data. If you destroy the machines and recreate them, you need to use a fresh discovery token. This didn't sit right with me: I want to check everything into version control, and I didn't want a lot of useless commits changing the discovery token every time I recreated the cluster.

The solution

Fortunately, etcd doesn't rely on the hosted discovery service. You can also bootstrap etcd statically if you know the IPs and ports everything will be running on in advance. It turns out that CoreOS's Vagrantfile is already configured to provide a static IP to each machine, so these IPs can simply be hardcoded into the cloud-config. There's one more snag, which is that etcd 0.4.6 (the one that currently ships in CoreOS) gets confused if the list of IPs you include when bootstrapping includes the current machine. That would mean that the cloud-config for each machine would have to be slightly different because it'd have to include the whole list, minus itself. Without introducing an additional layer of abstraction of your own, there isn't an easy way to provide a dynamic cloud-config file that would do this. Fortunately, the newly released etcd 2.0.0 improves on the static bootstrapping story by allowing you to provide the full list of IPs on every machine. Because etcd 2.0 doesn't ship with CoreOS yet, we'll run it in a container.

For this example, we'll use a cluster of three machines, just to keep the cloud-config a bit shorter. Five machines is the recommended size for most uses. Assuming you already have Vagrant and VirtualBox installed, clone the coreos/coreos-vagrant repository and copy config.rb.sample to config.rb. Open config.rb and uncomment $num_instances, setting its value to 3.

# Size of the CoreOS cluster created by Vagrant
$num_instances = 3

Next, create a new file called user-data with the following contents:

#cloud-config

coreos:
  fleet:
    etcd-servers: http://$private_ipv4:2379
    public-ip: $private_ipv4
  units:
    - name: etcd.service
      command: start
      content: |
        [Unit]
        Description=etcd 2.0
        After=docker.service

        [Service]
        EnvironmentFile=/etc/environment
        TimeoutStartSec=0
        SyslogIdentifier=writer_process
        ExecStartPre=-/usr/bin/docker kill etcd
        ExecStartPre=-/usr/bin/docker rm etcd
        ExecStartPre=/usr/bin/docker pull quay.io/coreos/etcd:v2.0.0
        ExecStart=/bin/bash -c "/usr/bin/docker run \
          -p 2379:2379 \
          -p 2380:2380 \
          --name etcd \
          -v /opt/etcd:/opt/etcd \
          -v /usr/share/ca-certificates/:/etc/ssl/certs \
          quay.io/coreos/etcd:v2.0.0 \
          -data-dir /opt/etcd \
          -name %H \
          -listen-client-urls http://0.0.0.0:2379 \
          -advertise-client-urls http://$COREOS_PRIVATE_IPV4:2379 \
          -listen-peer-urls http://0.0.0.0:2380 \
          -initial-advertise-peer-urls http://$COREOS_PRIVATE_IPV4:2380 \
          -initial-cluster core-01=http://172.17.8.101:2380,core-02=http://172.17.8.102:2380,core-03=http://172.17.8.103:2380 \
          -initial-cluster-state new"
        ExecStop=/usr/bin/docker kill etcd

        [X-Fleet]
        Conflicts=etcd*
    - name: fleet.service
      command: start

Now just run vagrant up and in a few minutes you'll have a cluster of three CoreOS machines running etcd 2.0 with no discovery token needed!
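
If you want to check that the cluster actually formed, the v2 members API that ships with etcd 2.0 makes that easy (these commands are illustrative; the machine name depends on your config.rb):

vagrant ssh core-01
curl http://127.0.0.1:2379/v2/members

You should see all three members listed in the response.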

If you want to run fleetctl inside one of the CoreOS VMs, you'll need to set the default etcd endpoint, because the current fleet still expects etcd to be on port 4001:

export FLEETCTL_ENDPOINT=http://127.0.0.1:2379

If you don't care about how that all works and just want a working cluster, you can stop here. If you want to understand the guts of that cloud-config more, keep reading.

The details

One of the changes in etcd 2.0 is that it now uses ports 2379 and 2380 (as opposed to etcd 0.4.6, which used 4001 and 7001). The fleet section of the cloud-config tells fleet how to connect to etcd. This is necessary because the version of fleet that currently ships with CoreOS still defaults to port 4001. Once etcd 2.0 is shipping in CoreOS, I'm sure fleet will be updated to match.

The units section of the cloud-config creates systemd units that will be placed in /etc/systemd/system/ on each machine. CoreOS ships with a default unit file for etcd, but we overwrite it here (simply by using the same service name, etcd.service) to use etcd 2.0 with our own configuration.

The bulk of the cloud-config is the etcd.service unit file. Most of it is the same as a standard CoreOS unit file for a Docker container. The interesting bits are the arguments to the etcd process that runs in the container:

  • -listen-client-urls: This is the interface and port that the current machine's etcd should bind to for the client API, e.g. etcdctl. It's set to 0.0.0.0 to bind to all interfaces, and it uses port 2379, which is the standard client port beginning with etcd 2.0.
  • -advertise-client-urls: This is the list of URLs etcd will announce as available for clients to contact.
  • -listen-peer-urls: Similar to the client URL version, this defines how the peer service should bind to the network. Again, we bind it to all interfaces and use the standard peer port of 2380.
  • -initial-advertise-peer-urls: Similar to the client version, this defines how etcd will announce its peer service to other etcd processes on other machines.
  • -initial-cluster: This is the secret sauce that keeps us from having to use a discovery token. We provide a list of each etcd service running in our cluster, mapping each machine's hostname to its etcd peer URL. Because we know which IP addresses Vagrant is going to use, we can simply enumerate them here. If you were running a cluster of a different size, this is where you would add or remove machines from the list.
  • -initial-cluster-state: A value of new here tells etcd that it's joining a brand new cluster. If you were to later add another machine to the existing cluster, you'd change the value here in that machine's cloud-config.

It's worth noting that the arguments that begin with "initial" are just that: data that etcd needs in order to boot the first time. Once the cluster has formed a quorum on its first boot, these arguments will be ignored on subsequent boots, because everything etcd needs to know will have been stored in its data directory (/opt/etcd, in this case).

This is all pretty complicated, but it will get easier once etcd 2.0 and a compatible fleet are shipped with CoreOS itself. Then the built-in etcd.service unit can be used again, and all the configuration options can be written in YAML format just like the fleet section in this particular cloud-config.
