Option types and Ruby

I've been learning the Rust programming language over the last several months. One of the great things about learning a new programming language is that it expands your understanding of programming in general by exposing you to new ideas. Sometimes new ideas can result in lightbulb moments for programming in languages you already know. One of the things learning Rust has made me realize is how much I wish Ruby had sum types.

A sum type is a type that has a number of "variants." These variants are alternate constructors for the type that can be differentiated from each other to confer different meaning, while still being the enclosing type. In Rust, sum types are provided by enums. An enum value can be destructured using pattern matching with Rust's match expression.

enum Fruit {
  Apple,
  Banana,
  Cherry,
}

fn print_fruit_name(fruit: Fruit) {
  match fruit {
    Fruit::Apple => println!("Found an apple!"),
    Fruit::Banana => println!("Found a banana!"),
    Fruit::Cherry => println!("Found a cherry!"),
  }
}

We define an enum, Fruit, with three variants. The print_fruit_name function takes a Fruit value and then matches on it, printing a different message depending on which variant this particular Fruit is. For our purposes here, the reason we use match instead of a chain of if/else conditions is that match guarantees that all variants must be accounted for. If one of the three arms of the match were omitted, the program would not compile, citing a non-exhaustive pattern match.

Enum variants can also take arguments, which allows them to wrap other types of values. The most common, and probably most useful, example of this is the Option type. This type lets you represent the idea of a function that sometimes returns a meaningful value and sometimes returns nothing. The same concept goes by different names in other languages; in Haskell, it's called the Maybe monad.

pub enum Option<T> {
  Some(T),
  None,
}

An Option can have one of two values: Some, wrapping an arbitrary value of type T, or None, representing nothing. An optional value could then be returned from a function like so:

fn find(id: u8) -> Option<User> {
  if user_record_for_id_exists(id) {
    Some(load_user(id))
  } else {
    None
  }
}

Code calling this method would then have to explicitly account for both possible outcomes:

match find(1) {
  Some(user) => user.some_action(),
  None => return,
}

What you do in the two cases is, of course, up to you and dependent on the situation. The point is that the caller must handle each case explicitly.

How does this relate to Ruby? Well, how often have you seen this exception when working on a Ruby program?

NoMethodError: undefined method `foo' for nil:NilClass

Chances are you've seen this a million times, and it's one of the most annoying errors. Part of what makes it so bad is that the associated stack trace may not make it clear where the nil was originally introduced. Ruby code tends to use nil quite liberally. Rails frequently follows the convention of methods returning nil to indicate either the lack of a value or the failure of some operation. Because there are loose nils everywhere, they end up in your code in places you don't expect, tripping you up.

This problem is not unique to Ruby; it shows up in countless other languages. Java programmers rue the NullPointerException, and Tony Hoare famously calls the null reference his billion-dollar mistake.

What, then, might we learn from the concept of an option type in regards to Ruby? We could certainly simulate an Option type by creating our own class that wraps another value, but that doesn't really solve anything since it can't force callers to explicitly unwrap it. You'd simply end up with:

NoMethodError: undefined method `foo' for #<Option:0x007fddcc4c1ab0>
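
To make that concrete, here's a minimal sketch of what such a wrapper might look like (a hypothetical class, not from any real library):

class Option
  def self.some(value)
    new(value)
  end

  def self.none
    new(nil)
  end

  def initialize(value)
    @value = value
  end

  # Callers are supposed to unwrap with value_or, but nothing makes them.
  def value_or(default)
    @value.nil? ? default : @value
  end
end

Nothing stops a caller from forgetting to unwrap and sending messages straight to the Option instance, so we've only traded a NoMethodError on nil for a NoMethodError on the wrapper.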

But we do have a mechanism in Ruby that will stop a caller cold in its tracks if it doesn't handle a particular case: exceptions. While it's a common adage not to "use exceptions for control flow," let's take a look at how exceptions can bring us some of the nil-avoiding benefits that sum types provide. Imagine this example using an ActiveRecord-like User object:

def message_user(email, message_content)
  user = User.find_by_email(email)
  message = Message.new(message_content)
  message.send_to(user)
end

The find_by_email method will try looking up a user from the database by their email address, and return either a user object or nil. It's easy to forget this, and move along assuming our user variable is bound to a user object. In the case where no user is found by the provided email address, we end up passing nil to Message#send_to, which will crash our program, because it always expects a user.

One way to get around this is to check whether user is nil before proceeding. But again, this is easy to forget. If we control the implementation of the User class, we can force callers to handle the missing-user case explicitly by raising an exception when no user is found, instead of simply returning nil.
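
Here's a rough sketch of what that could look like (UserNotFound and the lookup_record_by_email helper are assumptions for illustration, not real ActiveRecord API):

class UserNotFound < StandardError; end

class User
  def self.find_by_email(email)
    record = lookup_record_by_email(email) # assumed data-access helper
    raise UserNotFound, "No user found with email #{email}" if record.nil?
    new(record) # assumes an initializer that accepts a database record
  end
end

With a finder like that in place, the calling code becomes: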

def message_user(email, message_content)
  user = User.find_by_email(email)
  message = Message.new(message_content)
  message.send_to(user)
rescue UserNotFound
  logger.warn("Failed to send message to unknown user with email #{email}.")
end

Now message_user explicitly handles the "none" case, and if it didn't, an exception would be raised right where the nil would otherwise have been introduced. Of course, the program will still run if this exception isn't handled; it will simply crash as soon as the exception is raised, and that crash will carry a far more useful exception than the dreaded NoMethodError on nil. Truly forcing the caller to account for all cases is something pattern matching provides in Rust that isn't possible in Ruby, but using exceptions to produce earlier failures and better error messages gets us a good deal closer to the practical benefit.

There are other approaches to dealing with the propagation of nil values in Ruby. Another well-known one is the null object pattern: returning a "dummy" object (in our example, a user) that responds to all the same messages as a real user but simply does nothing. Some people would argue that's a more object-oriented, or more Rubyish, approach, but I find that it introduces more complexity than it's worth.
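
For comparison, a null object version might look something like this (a rough sketch; exactly which methods NullUser has to stub out depends on what Message#send_to actually calls on the user):

class NullUser
  # Respond to the same messages as a real user, but do nothing useful.
  def email
    nil
  end

  def name
    "unknown user"
  end
end

class User
  def self.find_by_email(email)
    record = lookup_record_by_email(email) # assumed data-access helper
    record ? new(record) : NullUser.new
  end
end

Now message_user never sees a nil, but every caller silently carries on as though a user had been found, which is exactly the kind of hidden complexity I'd rather avoid.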

Using exceptions as part of your objects' interfaces forces callers to handle those failure cases, and produces early errors when they don't, giving you quick, accurate feedback when something goes wrong.

etcd 2.0 static bootstrapping on CoreOS and Vagrant

The problem

CoreOS provides a pretty good setup for running a cluster of machines with Vagrant. You can find this setup at coreos/coreos-vagrant. Something I've found annoying, however, is that whenever you start a new cluster, you need to get a new discovery token from CoreOS's hosted discovery service. This is necessary for the etcd instances running on each machine to find each other and form a quorum. The discovery token is written to the machines on initial boot via the cloud-config file named user-data. If you destroy the machines and recreate them, you need to use a fresh discovery token. This didn't sit right with me: I want to check everything into version control, and I didn't want a pile of useless commits changing the discovery token every time I recreated the cluster.

The solution

Fortunately, etcd doesn't have to rely on the hosted discovery service. You can also bootstrap etcd statically if you know in advance the IPs and ports everything will be running on. It turns out that CoreOS's Vagrantfile is already configured to give each machine a static IP, so these IPs can simply be hardcoded into the cloud-config. There's one more snag: etcd 0.4.6 (the version that currently ships with CoreOS) gets confused if the list of peers you provide when bootstrapping includes the current machine. That would mean the cloud-config for each machine would have to be slightly different, because it would have to include the whole list minus itself, and without introducing an additional layer of abstraction of your own, there's no easy way to generate a dynamic cloud-config that does this. Fortunately, the newly released etcd 2.0.0 improves the static bootstrapping story by allowing you to provide the full list of peers on every machine. Because etcd 2.0 doesn't ship with CoreOS yet, we'll run it in a container.

For this example, we'll use a cluster of three machines, just to keep the cloud-config a bit shorter. Five machines is the recommended size for most uses. Assuming you already have Vagrant and VirtualBox installed, clone the coreos/coreos-vagrant repository and copy config.rb.sample to config.rb. Open config.rb and uncomment $num_instances, setting its value to 3.
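
Assuming you have git installed, the setup amounts to something like this (using the repository's standard GitHub URL):

git clone https://github.com/coreos/coreos-vagrant.git
cd coreos-vagrant
cp config.rb.sample config.rb

After editing, the relevant setting in config.rb looks like this: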

# Size of the CoreOS cluster created by Vagrant
$num_instances = 3

Next, create a new file called user-data with the following contents:

#cloud-config

coreos:
  fleet:
    etcd-servers: http://$private_ipv4:2379
    public-ip: $private_ipv4
  units:
    - name: etcd.service
      command: start
      content: |
        [Unit]
        Description=etcd 2.0
        After=docker.service

        [Service]
        EnvironmentFile=/etc/environment
        TimeoutStartSec=0
        SyslogIdentifier=etcd
        ExecStartPre=-/usr/bin/docker kill etcd
        ExecStartPre=-/usr/bin/docker rm etcd
        ExecStartPre=/usr/bin/docker pull quay.io/coreos/etcd:v2.0.0
        ExecStart=/bin/bash -c "/usr/bin/docker run \
          -p 2379:2379 \
          -p 2380:2380 \
          --name etcd \
          -v /opt/etcd:/opt/etcd \
          -v /usr/share/ca-certificates/:/etc/ssl/certs \
          quay.io/coreos/etcd:v2.0.0 \
          -data-dir /opt/etcd \
          -name %H \
          -listen-client-urls http://0.0.0.0:2379 \
          -advertise-client-urls http://$COREOS_PRIVATE_IPV4:2379 \
          -listen-peer-urls http://0.0.0.0:2380 \
          -initial-advertise-peer-urls http://$COREOS_PRIVATE_IPV4:2380 \
          -initial-cluster core-01=http://172.17.8.101:2380,core-02=http://172.17.8.102:2380,core-03=http://172.17.8.103:2380 \
          -initial-cluster-state new"
        ExecStop=/usr/bin/docker kill etcd

        [X-Fleet]
        Conflicts=etcd*
    - name: fleet.service
      command: start

Now just run vagrant up and in a few minutes you'll have a cluster of three CoreOS machines running etcd 2.0 with no discovery token needed!

If you want to run fleetctl inside one of the CoreOS VMs, you'll need to set the default etcd endpoint, because the current fleet still expects etcd to be on port 4001:

export FLEETCTL_ENDPOINT=http://127.0.0.1:2379

If you don't care about how that all works and just want a working cluster, you can stop here. If you want to understand the guts of that cloud-config more, keep reading.

The details

One of the changes in etcd 2.0 is that it now uses ports 2379 and 2380 (as opposed to etcd 0.4.6, which used 4001 and 7001). The fleet section of the cloud-config tells fleet how to connect to etcd. This is necessary because the version of fleet that currently ships with CoreOS still defaults to port 4001. Once etcd 2.0 is shipping in CoreOS, I'm sure fleet will be updated to match.

The units section of the cloud-config creates systemd units that will be placed in /etc/systemd/system/ on each machine. CoreOS ships with a default unit file for etcd, but we override it here (simply by using the same service name, etcd.service) to run etcd 2.0 with our own configuration.

The bulk of the cloud-config is the etcd.service unit file. Most of it is the same as a standard CoreOS unit file for a Docker container. The interesting bits are the arguments to the etcd process that runs in the container:

  • -listen-client-urls: This is the interface and port that the current machine's etcd should bind to for the client API, e.g. etcdctl. It's set to 0.0.0.0 to bind to all interfaces, and it uses port 2379, which is the standard client port beginning in etcd 2.0.
  • -advertise-client-urls: This is the list of URLs etcd will announce as available for clients to contact.
  • -listen-peer-urls: Similar to the client URL version, this defines how the peer service should bind to the network. Again, we bind it to all interfaces and use the standard peer port of 2380.
  • -initial-advertise-peer-urls: Similar to the client version, this defines how etcd will announce its peer service to other etcd processes on other machines.
  • -initial-cluster: This is the secret sauce that keeps us from having to use a discovery token. We provide a list of each etcd service running in our cluster, mapping each machine's hostname to its etcd peer URL. Because we know which IP addresses Vagrant is going to use, we can simply enumerate them here. If you were running a cluster of a different size, this is where you would add or remove machines from the list.
  • -initial-cluster-state: A value of new here tells etcd that it's joining a brand new cluster. If you were to later add another machine to the existing cluster, you'd change the value here in that machine's cloud-config.

It's worth noting that the arguments that begin with "initial" are just that: data that etcd needs in order to boot for the first time. Once the cluster has booted and formed a quorum, these arguments are ignored on subsequent boots, because everything etcd needs to know is stored in its data directory (/opt/etcd in this case).

This is all pretty complicated, but it will get easier once etcd 2.0 and a compatible fleet are shipped with CoreOS itself. Then the built-in etcd.service unit can be used again, and all the configuration options can be written in YAML format just like the fleet section in this particular cloud-config.

Securing CoreOS with iptables

I've been keeping a close eye on CoreOS since it was originally announced, and in the last few months I've actually started using it for a few things. As a young project, CoreOS has lots of rough edges in terms of documentation and usability. One of the issues I ran into was how to secure a CoreOS machine's public network. By default, a fresh CoreOS installation has no firewall rules, allowing all inbound network traffic.

In order to secure a CoreOS machine, I had to learn how to configure the firewall. I use the common iptables utility for this purpose. While I was vaguely familiar with iptables, I'd never really had to learn it, so I delved in to get a more thorough understanding of it. There are plenty of resources to learn iptables on the web, so I won't go into that too much here. The issue specific to CoreOS is how to configure iptables when launching a new machine.

CoreOS is unusual in that it is extremely minimal. It's designed for all programs to be run inside Linux containers, so the OS itself contains only the subsystems and tools necessary to achieve that. iptables, however, is one of the programs that does run on the OS itself.

With a more traditional Linux distribution, it's common to launch a new instance and then provision it with a tool like Chef or Puppet. Your configuration lives in a Git repository somewhere, and you run a program on the target machine after it boots to converge it into the desired state. CoreOS is missing a lot of the infrastructure assumed by tools like Chef and Puppet, so they are not supported. It is possible to run Ansible, a push-based configuration management tool, against a CoreOS host, but I'm not a fan of Ansible for reasons that are beyond the scope of this post. Besides, using a complex configuration management tool is somewhat against the spirit of CoreOS, where almost everything should happen in containers.

For very minimal on-boot configuration, CoreOS supports a subset of cloud-config, the YAML-based configuration format from the cloud-init tool. CoreOS instances can be provided a cloud-config file and will perform certain actions on boot. cloud-config can be used to load iptables with a list of rules for a more secure network.

Below is the relevant portion of the cloud-config I use on DigitalOcean, followed by an explanation of each piece:

#cloud-config

coreos:
  units:
    - name: iptables-restore.service
      enable: true
write_files:
  - path: /var/lib/iptables/rules-save
    permissions: 0644
    owner: root:root
    content: |
      *filter
      :INPUT DROP [0:0]
      :FORWARD DROP [0:0]
      :OUTPUT ACCEPT [0:0]
      -A INPUT -i lo -j ACCEPT
      -A INPUT -i eth1 -j ACCEPT
      -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
      -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
      -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
      -A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
      -A INPUT -p icmp -m icmp --icmp-type 0 -j ACCEPT
      -A INPUT -p icmp -m icmp --icmp-type 3 -j ACCEPT
      -A INPUT -p icmp -m icmp --icmp-type 11 -j ACCEPT
      COMMIT

Every cloud-config file must start with #cloud-config exactly. I learned the hard way that this is not just a comment – it actually tells CoreOS to treat the file as a cloud-config. Otherwise it will assume it's a shell script and just run it as such.

The following lines are YAML syntax. The coreos section is a CoreOS-specific extension to cloud-init's cloud-config format. The units section within it will automatically perform the specified action(s) on the specified systemd units. systemd is the init system used by CoreOS, and many of the OS's core operations are tied closely to it. "Units" are essentially processes that are managed by systemd and represented on disk by unit files that define how the unit should behave.

The systemd unit iptables-restore.service ships with CoreOS but is not enabled by default. enable: true turns it on, causing it to run on every boot. Here are the important contents of that unit file:

[Service]
Type=oneshot
ExecStart=/sbin/iptables-restore /var/lib/iptables/rules-save

The unit file defines a "oneshot" job, meaning it simply executes and exits and is not intended to stay running permanently. The command run is the iptables-restore utility, which accepts an iptables script file defining rules to be loaded into iptables. Whenever the system reboots, all iptables rules are flushed and must be reloaded from this script. That's exactly what iptables-restore does. The script it loads is expected to live at /var/lib/iptables/rules-save, which brings us to the second section of the cloud-config file.

cloud-config's write_files section will, unsurprisingly, write files with the given content to the file system. The content field is the most important part here. This defines the iptables rules to load. The details of this configuration can be fully explained by reading the iptables documentation, but to summarize, these rules:

  • Allow all input to localhost
  • Allow all input on the private network interface
  • Allow all connections that are currently established, which prevents existing SSH sessions from being suddenly terminated
  • Allow incoming TCP traffic on ports 22 (SSH), 80 (HTTP), and 443 (HTTPS)
  • Allow incoming ICMP traffic for echo replies, unreachable destination messages, and time exceeded messages
  • Drop all other incoming traffic
  • Drop all traffic attempting to forward through the network
  • Allow all outbound traffic

The three TCP ports allowed are pretty standard, but those are the rules you'd be most likely to change or augment depending on what services you'll be running on your CoreOS machine.
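
For example, if you were also exposing a service on port 8080 (a made-up port, purely for illustration), you'd add another ACCEPT rule alongside the existing ones, above the COMMIT line:

-A INPUT -p tcp -m tcp --dport 8080 -j ACCEPT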

After CoreOS boots, SSH into it, and verify that iptables was configured properly by running sudo iptables -S (to see it listed in the same format as above) or with sudo iptables -nvL (for the more standard list format).

That's pretty much it! As you can see, there are a lot of related technologies to learn when venturing into CoreOS. Several of them were new to me, so getting this working involved a fair amount of learning. For reference, the entire cloud-config I use for CoreOS on DigitalOcean can be found in this Gist.
