OpenErlang Video Series: Robert Virding & Joe Armstrong

Our first insight in the #OpenErlang Q&A series is a biggie! Two-thirds of the Erlang Creator Dream Team, Robert Virding and Joe Armstrong, talk about their favourite topic… Erlang!

From how Erlang developed into a programming heavyweight to the benefits of the language becoming open-sourced, Robert and Joe share their highlights of the past 20 years, including the community the language has created and how important it has become to a number of huge global companies.

Interview Transcript

Robert Virding: Hello, Joe.

Joe Armstrong: Hello, Robert.

Robert: And there is no Mike.

Joe: There’s no Mike.

Robert: There’s no Mike. Joe and I, we were two of the original three developers of Erlang. The third was Mike, who is not here now.

Joe: 20 years since it’s open sourced.

Robert: 20 years since it became open sourced. Erlang was the first open source software that Ericsson released. Definitely, some of the highlights over the last 20 years is Bluetail. This was a company that was formed after Erlang became open sourced with people from Ericsson, amongst others, Joe and myself. That was definitely a big win. Then it spread.

Joe: Yes, it spread.

Robert: It spread quite a lot actually. The community’s important because, yes, you get feedback about what you do and you get new ideas, you get things like errors. They spread knowledge about using the system and how it’s being used to think like this. What’s good about it and what’s bad about it as well too. You need an active community to get something up, get useful information about it.

Joe: It’s kind of important for Erlang because Erlang is rather different to a lot of languages. It’s a concurrent language, but most languages aren’t concurrent, they’re sequential. People don’t really understand what it is until they join in a forum and they start participating and they get some understanding of what it is. You have to build a culture as well. It’s building that culture that’s the important thing.

Robert: Yes. That can take time as well to do. It’s not always obvious that it will happen. The Erlang community–It has happened. There’s quite a lot of activity on it and quite a lot of the major users are an active part of the community.

Joe: They ask questions like, “How can I persuade my boss to use Erlang?” That’s not a technical question and that’s the sort of answer you can get from the community by socialising. As much as the technical forums where we talk about technical stuff, what we do outside those sessions is equally important, because we’re building social relationships there.

Robert: One of the problems if you’re trying to sell the language into your company– We get questions. Someone comes and asks us how to do it. You need at least one enthusiast inside the company to do it, otherwise it won’t work. The issue is, how can they describe Erlang, the benefits of Erlang in such a way comparing it to other languages? What it’s good at and also what it’s not good at, so they don’t make a bad mistake. We can help them do that. We can give references to other companies that use it, for example. That’s one thing that we definitely can do.

Joe: What was really impressive was the WhatsApp people, because they didn’t go on any courses or anything, they just sat away in a little room somewhere doing WhatsApp. A dozen of them created WhatsApp on the server. Fantastic!

Robert: Have a system with over a billion users!

Joe: Yes. That’s cool.

Robert: Which is quite fantastic and it works!

Joe: It works!

Robert: It’s a very impressive system.

[00:03:08] [END OF AUDIO]


Silly: Hextexting via the command line…

A silly thread on Twitter came to my attention today that stirred some late 1980s/1990s phreak/hax0r nostalgia in me. So, of course, I did what any geek would do: wrote a one-off utility script for it. Have fun confusing your parents, kids.

#! /usr/bin/env escript

-mode(compile).

main([Command | Input]) ->
    ok = io:setopts([{encoding, unicode}]),
    Output = convert(Command, Input),
    io:format("~ts~n", [Output]).

%% "t": text to hex. Join the arguments into one string, then render
%% each character as its hexadecimal code point.
convert("t", Input) ->
    String = string:join(Input, " "),
    string:join(lists:map(fun(C) -> integer_to_list(C, 16) end, String), " ");
%% "h": hex to text. Parse each argument back into a code point.
convert("h", Input) ->
    lists:map(fun(C) -> list_to_integer(C, 16) end, Input);
convert(_, _) ->
    "hextext usage: `hextext t|h [text]`".
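Saved as hextext and made executable (the file name is my choice), a session looks like this; note that integer_to_list/2 produces uppercase hex digits:

```
$ hextext t Hello
48 65 6C 6C 6F
$ hextext h 48 65 6C 6C 6F
Hello
```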

(Also, look up rot13 — ’twas all the rage 30 years ago, and still makes an appearance as a facilitator of hidden easter eggs in some games. A lot of “garbled alien/monster/otherling speech” text is rot13.)


A Proposal: Elixir-Style Modules in JavaScript

Moving your code towards a more functional style can have a lot of benefits: it can be easier to reason about, easier to test, more declarative, and more. One thing that sometimes comes out worse in the move to FP, though, is organization. By comparison, classes in object-oriented programming are a pretty useful unit of organization: methods have to be in the same class as the data they work on, so your code is pushed towards being organized in pretty logical ways.

In a modern JavaScript project, however, things are often a little less clear-cut. You’re generally building your application around framework constructs like components, services, and controllers, and this framework code is often a stateful class with a lot of dependencies. Being a good functional programmer, you pull your business logic out into small pure functions, composing them together in your component to transform some state. Now you can test them in isolation, and all is well with the world.

But where do you put them?

Common patterns

The first answer is often “at the bottom of the file.” For example, say you’ve got your main component class called UserComponent.js. You can imagine having a couple pure helper functions like fullName(user) at the bottom of the file, and you export them to test them in UserComponent.spec.js.

Then as time goes on, you add a few more functions. Now the component is a few months old, the file is 300 lines long and it’s more pure functions than it is component. It’s clearly time to split things up. So hey, if you’ve got a UserComponent, why not toss those functions into a UserComponentHelpers.js? Now your component file looks a lot cleaner, just importing the functions it needs from the helper.

So far so good – though that UserComponentHelpers.js file is kind of a grab-bag of functions, where you’ve got fullName(user) sitting next to formatDate(date).

And then you get a new story to show users’ full names in the navbar. Okay, so now you’re going to need that fullName function in two places. Maybe toss it in a generic utils file? That’s not great.

And then, a few months later, you’re looking at the FriendsComponent, and find out someone else had already implemented fullName in there. Oops. So now the next time you need a user-related function, you check to see if there’s one already implemented. But to do that, you have to check at least UserComponent, UserComponentHelpers, and FriendsComponent, and also UserApiService, which is doing some User conversion.

So at this point, you may find yourself yearning for the days of classes, where a User would handle figuring out its own fullName. Happily, we can get the best of both worlds by borrowing from functional languages like Elixir.

Modules in Elixir

Elixir has a concept called structs, which are dictionary-like data structures with pre-defined attributes. They’re not unique to the language, but Elixir sets them up in a particularly useful way. Files generally have a single module, which holds some functions, and can define a single struct. So a User module might look like this:
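A minimal sketch (the exact body of full_name is illustrative):

```elixir
defmodule User do
  defstruct [:first_name, :last_name, :email]

  # Takes a User struct and operates on it.
  def full_name(%User{first_name: first, last_name: last}) do
    "#{first} #{last}"
  end
end
```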

Even if you’ve never seen any Elixir before, that should be pretty easy to follow. A User struct is defined as having a first name, last name, and email. There’s also a related full_name function that takes a User and operates on it. The module is organized like a class: we can define the data that makes up a User, and the logic that operates on Users, all in one place. But we get all that without the trouble of mutable state.

Modules in JavaScript

There’s no reason we can’t use the same pattern in JavaScript-land. Instead of organizing your pure functions around the components they’re used in, you can organize them around the data types (or domain objects in Domain Driven Design parlance) that they work on.

So, you can gather up all the user-related pure functions, from any component, and put them together in a User.js file. That’s helpful, but both a class and an Elixir module define their data structure, as well as their logic.

In JavaScript, there’s no built-in way to do that, but the simplest solution is to just add a comment. JSDoc, a popular specification for writing machine-readable documentation comments, lets you define types with the @typedef tag:
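For example (a sketch; the firstName/lastName/email fields are illustrative):

```javascript
/**
 * @typedef {Object} User
 * @property {string} firstName
 * @property {string} lastName
 * @property {string} email
 */

/**
 * @param {User} user
 * @returns {string}
 */
function fullName(user) {
  return `${user.firstName} ${user.lastName}`;
}
```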

With that we’ve replicated all the information in an Elixir module in JavaScript, which will make it easier for future developers to keep track of what a User looks like in your system. But the problem with comments is they get out of date. That’s where something like TypeScript comes in. With TypeScript, you can define an interface, and the compiler will make sure it stays up-to-date:
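For example (a sketch with illustrative field names):

```typescript
interface User {
  firstName: string;
  lastName: string;
  email: string;
}

// The compiler now checks every use of User against this definition.
function fullName(user: User): string {
  return `${user.firstName} ${user.lastName}`;
}
```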

This also works great with PropTypes in React. PropTypes are just objects that can be exported, so you can define your User propType as a PropTypes.shape in your User module.

Then you can use the User’s type and functions in your components, reducers, and selectors.

You could do something very similar with Facebook’s Flow, or any other library that lets you define the shape of your data.

However you define your data, the key is to put the definition of the data and the logic that operates on it in the same place. That way it’s clear where your functions should go, and what they’re operating on. Also, since all your user-specific logic is in one place, you’ll probably be able to find some shared logic to pull out that might not have been obvious when it was scattered all over your codebase.

Placing Parameters

It’s good practice to always put the module’s data type in a consistent position in your functions: either always the first parameter, or always the last if you’re doing a lot of currying. It’s helpful to have one less decision to make, and it helps you figure out where things go: if it feels weird to put user in the primary position, then the function probably shouldn’t go into the User module.

Functions that deal with converting between two types (pretty common in functional programming) would generally go into the module of the type being passed in: userToFriend(user, friendData) would go into the User module. In Elixir it would be idiomatic to call that User.to_friend, and if you’re okay with using wildcard imports, that’ll work great:
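A sketch (the toFriend function and file layout are illustrative):

```javascript
// UserComponent.js
import * as User from "./User";

// Reads much like Elixir's User.to_friend/2.
const friend = User.toFriend(someUser, friendData);
```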

On the other hand, if you’re following the currently popular JavaScript practice of doing individual imports, then calling the function userToFriend would be more clear:
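A sketch (userToFriend and the file layout are illustrative):

```javascript
// UserComponent.js
import { userToFriend } from "./User";

// The function name carries the type, since there's no module prefix.
const friend = userToFriend(someUser, friendData);
```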

Consider wildcard imports

However, I think that with this functional module pattern, wildcard imports make a lot of sense. They let you prefix your functions with the type they’re working on, and push you to think of the collection of User-related types and functions as one thing, like a class.

But if you do that and declare types, one issue is that then in other classes you’d be referring to the type User.User or User.userType. Yuck. There’s another idiom we can borrow from Elixir here – when declaring types in that language, it’s idiomatic to name the module struct’s type t.

We can replicate that with React PropTypes by just naming the propType t, like so:
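A sketch (assumes the prop-types package; the fields are illustrative):

```javascript
// User.js
import PropTypes from "prop-types";

// The module's own type is just named `t`.
export const t = PropTypes.shape({
  firstName: PropTypes.string.isRequired,
  lastName: PropTypes.string.isRequired,
  email: PropTypes.string.isRequired,
});
```

Elsewhere, a wildcard import of the module then gives you User.t as the prop type.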

It also works just fine in TypeScript, and it’s nice and readable. You use t to describe the type of the current module, and Module.t to describe the type from Module.

Using t in TypeScript does break a popular rule from the TypeScript Coding Guidelines to “use PascalCase for type names.” You could name the type T instead, but then that would conflict with the common TypeScript practice of naming generic types T. Overall, User.t seems like a nice compromise, and the lowercase t feels like it keeps the focus on the module name, which is the real name of the type anyway. This is one for your team to decide on, though.

Wrapping up

Decoupling your business logic from your framework keeps it nicely organized and testable, makes it easier to onboard developers who don’t know your specific framework, and means you don’t have to be thinking about controllers or reducers when you just want to be thinking about users and passwords.

This process doesn’t have to happen all at once. Try pulling all the logic for just one module together, and see how it goes. You may be surprised at how much duplication you find!

So in summary:

Try organizing your functional code by putting functions in the same modules as the types they work on.
Put the module’s data parameter in a consistent position in your function signatures.
Consider using import * as Module wildcard imports, and naming the main module type t.


Chess and Recursion: Part 1

I’ve been using my investment time at thoughtbot to build a multiplayer chess game using Elixir and Phoenix in order to hone my skills in that area. One of the trickiest and most fun parts of the project so far has been generating all the possible moves for a player to make.

The board

We will store the board as a map of pieces indexed by their position as a tuple. This makes it easy to move pieces around the board by popping elements out of the map and then adding them back in with a new index.

%{
  {0, 7} => %{type: "rook",   colour: "black"},
  {1, 7} => %{type: "knight", colour: "black"},
  {2, 7} => %{type: "bishop", colour: "black"},
  {3, 7} => %{type: "queen",  colour: "black"},
  {4, 7} => %{type: "king",   colour: "black"},
  {5, 7} => %{type: "bishop", colour: "black"},
  {6, 7} => %{type: "knight", colour: "black"},
  {7, 7} => %{type: "rook",   colour: "black"},

  {0, 6} => %{type: "pawn", colour: "black"},
  {1, 6} => %{type: "pawn", colour: "black"},
  # rest of the pieces ...
}
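Moving a piece is then just a pop and a put (a sketch; Chess.Board.move/3 is a hypothetical helper, not part of the game code below):

```elixir
defmodule Chess.Board do
  # Remove the piece from its old square, then place it on the new one.
  def move(board, from, to) do
    {piece, board} = Map.pop(board, from)
    Map.put(board, to, piece)
  end
end
```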

Linear moves

We’ll start this journey with rooks, as they have a relatively straightforward movement profile. For now we will ignore blocking pieces, which means that for each direction we just need to traverse the board until we hit the edge.

Looping in Elixir is achieved through recursion. This may sound complex, but it has some advantages, as we will see.

Here’s a function that will return all the available moves in one direction:

# lib/chess/moves/rook.ex

defmodule Chess.Moves.Rook do
  def moves_north(_board, {_file, 7}), do: []

  def moves_north(board, {file, rank}) do
    [{file, rank + 1} | moves_north(board, {file, rank + 1})]
  end
end

Let’s break this down piece by piece.

We’re using Elixir’s pattern matching to define multiple functions here with the same name. If you’d like to read more about Elixir’s pattern matching, I would recommend Pattern Matching in Elixir: Five Things to Remember by Anna Neyzberg.

The first function matches if the rank is 7. This means we’ve hit the top edge of the board so we return an empty list to stop the recursion.

def moves_north(_board, {_file, 7}), do: []

(The underscores in front of the variable names indicate that we can discard the values as we don’t need them in the function definition.)

Lists in Elixir are linked lists, represented internally by pairs consisting of the head and the tail of the list. The | operator allows us to match the head and tail of a list, or to construct a new list from a head and a tail.

The next function returns a new list with the head containing the next square north and the tail being the result of another call to this function:

def moves_north(board, {file, rank}) do
  [{file, rank + 1} | moves_north(board, {file, rank + 1})]
end

Essentially, we are adding 1 to the rank until it reaches 7, then unwinding the stack and returning the resulting list. If rank starts at, say, 4 (we’ll set file to 0), then we get this back from the recursive function:

[{0, 4} | [{0, 5} | [{0, 6} | [{0, 7}]]]]

Which is equivalent to:

[{0, 4}, {0, 5}, {0, 6}, {0, 7}]

And there we have our list of moves in one direction!

We can create other recursive functions to handle moving south, east and west:

def moves_south(_board, {_file, 0}), do: []
def moves_south(board, {file, rank}) do
  [{file, rank - 1} | moves_south(board, {file, rank - 1})]
end

def moves_east(_board, {7, _rank}), do: []
def moves_east(board, {file, rank}) do
  [{file + 1, rank} | moves_east(board, {file + 1, rank})]
end

def moves_west(_board, {0, _rank}), do: []
def moves_west(board, {file, rank}) do
  [{file - 1, rank} | moves_west(board, {file - 1, rank})]
end

We can even start writing functions to handle moving in diagonal directions, for bishops and queens:

def moves_northeast(_board, {7, _rank}), do: []
def moves_northeast(_board, {_file, 7}), do: []
def moves_northeast(board, {file, rank}) do
  [{file + 1, rank + 1} | moves_northeast(board, {file + 1, rank + 1})]
end

We’re writing a lot of functions now that look very similar, so let’s figure out a better way.

Each of these functions is doing the same thing—moving in a straight line—just in a different direction. We could pass an offset vector as a tuple instead of hard coding it into each function, so if we want all the moves north we could call it like this:

moves(board, {3, 4}, {0, 1}) # (board, position, vector)

The main body of the function looks like this:

def moves(board, {file, rank}, {fv, rv}) do
  next_square = {file + fv, rank + rv}
  [next_square | moves(board, next_square, {fv, rv})]
end

We need to match cases where we hit the edge of the board and stop the recursion:

def moves(_board, {0, _rank}, {-1, _}), do: []
def moves(_board, {_file, 0}, {_, -1}), do: []
def moves(_board, {7, _rank}, {1, _}), do: []
def moves(_board, {_file, 7}, {_, 1}), do: []

def moves(board, {file, rank}, {fv, rv}) do
  next_square = {file + fv, rank + rv}
  [next_square | moves(board, next_square, {fv, rv})]
end

Obstructions

We need to handle cases where another piece is in the way. First, let’s define a function that will tell us if a square is empty:

# we'll use defp to define a private function as
# we won't be calling this outside of this module.
defp empty?(board, position) do
  is_nil(board[position])
end

Now we can use this function in our moves function to only recurse if the next square is empty. If the square is not empty then we return an empty list to stop recursion.

def moves(board, {file, rank}, {fv, rv}) do
  next_square = {file + fv, rank + rv}
  if empty?(board, next_square) do
    [next_square | moves(board, next_square, {fv, rv})]
  else
    []
  end
end

All together now

Lastly, let’s combine these to generate moves for all the pieces that move in straight lines:

defmodule Chess.Moves do
  def queen(board, position) do
    # The queen moves like both a rook and a bishop
    rook(board, position) ++
      bishop(board, position)
  end

  def rook(board, {file, rank}) do
    moves(board, {file, rank}, {0, 1}) ++
      moves(board, {file, rank}, {0, -1}) ++
      moves(board, {file, rank}, {-1, 0}) ++
      moves(board, {file, rank}, {1, 0})
  end

  def bishop(board, {file, rank}) do
    moves(board, {file, rank}, {1, 1}) ++
      moves(board, {file, rank}, {1, -1}) ++
      moves(board, {file, rank}, {-1, 1}) ++
      moves(board, {file, rank}, {-1, -1})
  end

  defp moves(_board, {0, _rank}, {-1, _}), do: []
  defp moves(_board, {_file, 0}, {_, -1}), do: []
  defp moves(_board, {7, _rank}, {1, _}), do: []
  defp moves(_board, {_file, 7}, {_, 1}), do: []
  defp moves(board, {file, rank}, {fv, rv}) do
    next_square = {file + fv, rank + rv}
    if empty?(board, next_square) do
      [next_square | moves(board, next_square, {fv, rv})]
    else
      []
    end
  end

  defp empty?(board, position) do
    is_nil(board[position])
  end
end

That’s all for now; next time we’ll tackle the gnarly moves of the knight!

If you’re impatient for more, you can always check out the source code on GitHub at https://github.com/danbee/chess.


All For Reliability: Reflections on the Erlang Thesis

If you ask Elixir developers what got us interested in the language, many will say “concurrency.” We wanted to make our programs faster by making them use more cores.

That desire is a big part of why Elixir exists. Before creating Elixir, José Valim was trying to make Rails thread-safe, and found it frustrating. He remembered thinking:

“If the future is going to have [many] cores, we needed to have better abstractions, because the ones I had working with Ruby and Rails were not going to cut it. So I decided to study other languages and see what they were doing.”

That study led him to Erlang, and he built Elixir to run on the Erlang virtual machine.

Similarly, Chris McCord wanted to use WebSockets in his Rails apps, but struggled to create a scalable solution in Ruby. Then he heard about WhatsApp using Erlang to get 1 million connections on a single machine.

“That kind of blew my mind, because I was looking at getting maybe a hundred connections on my Rails app.”

That got McCord interested in Erlang, then Elixir. He went on to create Elixir’s main web framework, Phoenix, known for sub-millisecond response times and massively scalable WebSocket support.

Personally, I worked on a travel search engine that needed to do a lot of on-demand concurrent work, but was written in a language that made that difficult. I learned about Elixir while at that job, and though we couldn’t adopt it there, I knew it was a tool I’d want to use in the future.

But although he calls it a “concurrency-oriented programming language”, Erlang co-creator Joe Armstrong does not cite speed as the main reason for its creation in his 2003 PhD dissertation, “Making reliable distributed systems in the presence of software errors”. Its purpose is there in the title: reliability.

What’s fascinating is how many of Erlang’s (and therefore Elixir’s) attributes are direct consequences of designing it to be reliable.

Mistakes Are Inevitable

In the paper, Armstrong talks about the challenges his team faced in writing telephone switching systems in the 80s.

First, they needed to write complex systems, with “several millions of lines of program code”, and might have teams with “many hundreds of programmers” of varying experience levels.

The requirements were demanding:

Market pressure encourages the development and deployment of systems with large numbers of complex features. Often systems are deployed before the interaction between such features is well understood. During the lifetime of a system, the feature set will probably be changed and extended in many ways.

Under such constraints, there was no way they could produce perfect software.

Yet something approaching perfection was required. Telephone calls are important; some of them are emergencies. You can’t turn off the phone system each night for maintenance. A telephone system needs “to be in continuous operation for many years”, “typically having less than two hours of down-time in 40 years.”

It’s quite a problem! How can an imperfectly-written system get near-perfect results? Or as Armstrong puts it:

The central problem addressed by this thesis is the problem of constructing reliable systems from programs which themselves may contain errors.

Handling the Worst Case

Before solving the problem, Armstrong makes it harder by considering the worst case: hardware failure. Imagine a perfectly-written program running on a computer that bursts into flames.

There are a lot of failures from which a process can recover, but ceasing to exist is not one of them.

Armstrong addresses this situation simply.

To guard against the failure of an entire computer, we need two computers.

Specifically:

If the first computer crashes, the failure is noticed by the second computer, which will try to correct the error.

This might mean, for example, creating a new process elsewhere to replace the one that died.

Here we see the first interesting property of Erlang: it’s made to run systems that span multiple machines, because that’s what’s needed for reliability.

Even better: since the failure of one process must be corrected by another process when they’re on separate machines, Erlang uses that mechanism for all failures. All failures are handled via “monitors” and “links”, which are ways for one process to react to the failure of another, supported directly by the VM. (These mechanisms are the foundation for the supervision tools and patterns of OTP.)

In fact, the Erlang VM goes so far as to make hardware failures look like software failures. If Process A is monitoring Process B and Process B dies, Process A is notified. This happens whether Process B divides by zero, loses connectivity, or dies in a fire; in the last two cases, the “distress call” is faked by the VM when it loses contact with Process B.
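In Elixir terms, that mechanism looks something like this minimal sketch: whatever the reason a process dies, its monitor receives the same kind of :DOWN message.

```elixir
# Spawn a process that crashes immediately, and monitor it atomically.
{pid, ref} = spawn_monitor(fn -> exit(:boom) end)

# The "distress call": a :DOWN message naming the dead process
# and the reason it died.
receive do
  {:DOWN, ^ref, :process, ^pid, reason} ->
    IO.puts("process #{inspect(pid)} died: #{inspect(reason)}")
end
```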

As Armstrong says:

The reason for coercing the hardware error to make it look like a software error is that we don’t want to have two different methods for dealing with errors… we want one uniform mechanism. This, combined with the extreme case of hardware error, and the failure of entire processors, leads to the idea of handling errors, not where they occurred, but at some other place in the system.

Process Properties

Now consider the two processes mentioned above, running on separate machines. There are certain things we can be sure of.

  • They can’t share memory because they’re physically separated. This is nice, because a failing process can’t corrupt the memory of the process that’s watching it.
  • They can communicate only by passing messages.
  • They will succeed or fail independently.

This separation is analogous to a firewall. In construction, a “firewall” is a wall between two parts of a building that keeps fire from spreading. A maximally-safe house would have firewalls around every room, chair, and bed. In a house, this is impractical. But in an Erlang system, process boundaries are like firewalls for failures, and they cost almost nothing.

And again, for uniformity, these same “firewall-like” characteristics are true whether two processes run on separate machines or not. Processes on the same machine share no memory and communicate only by passing messages, just as if they were on separate machines.

(As Armstrong noted, Erlang’s process isolation isn’t perfect; if one process allocates an inordinate amount of memory or atoms, for example, it could crash the Erlang VM on that machine, including all its processes.)

To further improve reliability, several more properties are needed.

First, messages must be asynchronous. As Armstrong says:

Isolation implies that message passing is asynchronous. If process communication is synchronous then a software error in the receiver of a message could indefinitely block the sender of the message, destroying the property of isolation.
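Elixir’s send/2 behaves exactly this way: it returns immediately, regardless of whether the receiving process ever reads the message. A minimal sketch:

```elixir
parent = self()

# send/2 never blocks: the spawned process sends its message and exits,
# whether or not the parent is ready to receive it.
spawn(fn -> send(parent, :ping) end)

receive do
  :ping -> IO.puts("got ping")
end
```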

Second, processes must be lightweight. If safety is increased by dividing a system into more processes, we’ll want to run a lot of them, creating them quickly and on-demand.

And third, processes must take turns. As Armstrong says:

Concurrent processes must also time-share the CPU in some reasonable manner, so that CPU-bound processes do not monopolise the CPU, and prevent progress of other processes which are “ready to run.”

Like most operating systems, the Erlang VM uses “preemptive multitasking” (more or less). This means that each process gets a fixed amount of time to use the CPU. If a process isn’t finished when its turn is up, it is paused and sent to the back of the line, then another process gets a turn. It’s also paused if it’s waiting to read a file or get a network response.

In this way, the Erlang VM supports “non-blocking IO” as well as “non-blocking computation”, both of which get applied automatically to sequential-looking code.

Where Reliability and Performance Meet

You might have noticed that those last two points of reliability are also performance concerns. That’s because the two are related.

Imagine a system that’s handling a large number of small tasks - calls, web requests, or whatever. In comes a large task. What happens?

If the system’s overall performance takes a dive, it’s not reliable. Calls get dropped, web requests time out, and so on. A reliable system must perform consistently.

This is the logic behind the Erlang VM’s multitasking. In absolute terms, frequent task-switching wastes a little time, making performance sub-optimal. But the benefit is that performance is consistent: small tasks continue completing quickly, while large tasks get processed a little at a time.

Garbage collection works this way, too. Although Erlang’s immutable data means that a lot of garbage is created, it’s divided into tiny heaps across many processes. When a process dies, its memory is freed, and GC isn’t needed. For long-running processes, GC is performed concurrently. There are no “stop the world” pauses to collect garbage from the entire system, so GC is no barrier to consistent performance.
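The per-process heaps described above are visible through introspection. In this hypothetical sketch (module name mine), one process builds a large list on its own heap, and `erlang:process_info/2` reports that heap's size without touching any other process:

```erlang
%% Sketch: each process has its own private heap, inspectable from outside.
-module(heaps).
-export([demo/0]).

demo() ->
    Pid = spawn(fun() ->
                    _Big = lists:seq(1, 100000),  %% allocated on Pid's heap
                    receive stop -> ok end
                end),
    %% Report the size (in words) of that process's heap alone.
    {total_heap_size, Words} = erlang:process_info(Pid, total_heap_size),
    Pid ! stop,   %% when Pid exits, its whole heap is simply freed
    Words > 0.
```

When `Pid` dies, its heap is reclaimed in one step; no collector ever has to scan a single shared heap for the whole system.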

Evidence for Reliability

After all this effort toward building reliable systems, Armstrong tried to find out how well several Erlang-based systems had worked; he interviewed the maintainers, analyzed the source code, and examined bug reports.

Armstrong has elsewhere cited Ericsson’s AXD301 switch as an example of “nine nines” reliability - an uptime of 99.9999999%.

Here’s how he describes the switch:

The AXD301 is designed for “carrier-class” non-stop operation. The system has duplicated hardware which provides hardware redundancy and hardware can be added or removed from the system without interrupting services. The software has to be able to cope with hardware and software failures. Since the system is designed for non-stop operation, it must be possible to change the software without disturbing traffic in the system.

In his thesis, he treats the “nine nines” figure as uncertain:

Evidence for the long-term operational stability of the system had also not been collected in any systematic way. For the Ericsson AXD301 the only information on the long-term stability of the system came from a PowerPoint presentation showing some figures claiming that a major customer had run an 11 node system with a 99.9999999% reliability, though how this figure had been obtained was not documented.

And that particular figure has been the subject of some debate.

However, Armstrong’s investigations seemed to indicate that the Erlang systems he examined were indeed extremely reliable:

The software systems in the case study are so reliable that the people operating these systems are inclined to think that they are error-free. This is not the case, indeed software errors did occur at run-time but these errors were quickly corrected so that nobody ever noticed that the errors had occurred.

Performance After All

So here’s a recap of what the Erlang VM gives us in pursuit of reliability:

  • Lightweight, shared-nothing processes that communicate via asynchronous messages
  • A built-in mechanism for processes to react to failures in other processes
  • The ability to quickly spawn a large number of processes and run them on multiple machines
  • Efficient context-switching and concurrent garbage collection
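The second item in that list, reacting to failures in other processes, is built on links and exit signals. A minimal sketch (module name mine): a process traps exits, links to a child, and is notified when the child crashes.

```erlang
%% Sketch: hearing about another process's failure via a link.
-module(watch).
-export([demo/0]).

demo() ->
    process_flag(trap_exit, true),         %% turn exit signals into messages
    Pid = spawn_link(fun() -> exit(boom) end),
    receive
        {'EXIT', Pid, Reason} -> Reason    %% we learn why the child died
    after 1000 -> no_signal
    end.
```

This is the primitive that OTP supervisors are built on: instead of returning `Reason`, a supervisor would restart the failed child.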

No wonder Armstrong calls Erlang a “concurrency-oriented programming language”. These features, created mainly for reliability, make it easy to write concurrent programs that scale horizontally across cores and computers. You write your program using processes and messages, and the VM takes care of all the tricky parts - running a scheduler per core, giving each process its turn, moving processes between schedulers to balance throughput and power consumption, and so on.

Having tiny processes makes this easier. Armstrong once described the ease of scheduling small Erlang processes vs larger operating system processes:

Packing huge big rocks into containers is very very difficult, but pouring sand into containers is really easy. If you think of processes like little grains of sand and you think of schedulers like big barrels that you have to fill up, filling your barrels up with sand, you can pack them very nicely, you just pour them in and it will work.

And of course, the Erlang VM is pretty fast. As Armstrong once joked in a conference talk:

You take a program and you want it to go a thousand times faster, just wait ten years, and it goes a thousand times faster. So that’s solved. If you want it a million times faster, you wait twenty years. So that problem’s solved. And that’s why Erlang’s really fast today, because we waited a long time.

Erlang was created in the 1980s for telephone switches that handled “many tens of thousands of people” interacting simultaneously. As Armstrong implied, it rode the wave of ever-increasing chip speeds even as it continued to be optimized, so that its once-acceptable speed became truly formidable.

That wave has passed, and the future of performance is concurrency. To take advantage of it, we have to write concurrent programs. If Armstrong’s thesis is correct, “concurrency-oriented programming” is also the key to reliability.

And it’s awfully nice to get both at the same time.


Copyright © 2016, Planet Erlang. No rights reserved.
Planet Erlang is maintained by Proctor.