Carrots, sticks, and making things worse

This blog post originally appeared on the LFI blog but I decided to post it on my own as well.

Every organization has to contend with limits: scarcity of resources, people, attention, or funding; friction from scaling; inertia from previous code bases; or a quickly shifting ecosystem. And of course there are more, like time, quality, effort, or how much can fit in anyone's mind. There are so many ways for things to go wrong; your ongoing success comes in no small part from the people within your system constantly navigating that space, making sacrifice decisions and trading off some things to buy runway elsewhere. From time to time, these come to a head in what we call a goal conflict, where two important attributes clash with each other.

These conflicts are not avoidable; in many cases we simply take them for granted, as in "cheap, fast, and good; pick two". But somehow, when it comes to the more specific details of our work, that clarity hides itself or gets obscured by the veil of normative judgments. It is easy after an incident to think of what people could have done differently, of signals they should have listened to, or of consequences they would have foreseen had they just been a little more careful.

From this point of view, the idea of reinforcing desired behaviors through incentives, both positive (bonuses, public praise, promotions) and negative (demerits, re-certification, disciplinary reviews) can feel attractive. (Do note here that I am specifically talking of incentives around specific decision-making or performance, rather than broader ones such as wages, perks, overtime or hazard pay, or employment benefits, even though effects may sometimes overlap.)

But this perspective itself is a trap. Hindsight bias—where we overestimate how predictable outcomes were after the fact—and its close relative outcome bias—where knowing the results after the fact tints how we judge the decision made—both serve as good reminders that we should ideally look at decisions as they were being made, with the information known and pressures present then.

This is generally made easier by assuming people were trying to do a good job and get good results; when a decision seems to make no sense, that assumption asks us to figure out how it could have seemed reasonable at the time.

Events were likely challenging, resources were limited (including cognitive bandwidth), and context was probably uncertain. If you were looking for goal conflicts and difficult trade-offs, this is certainly a promising area in which they can be found.

Taking people's desire for good outcomes for granted forces you to shift your perspective. It demands you move away from thinking that somehow more pressure toward succeeding would help. It makes you ask what aid could be given to navigate the situation better, how the context could be changed for the trade-offs to be negotiated differently next time around. It lets us move away from wondering how we can prevent mistakes and move toward how we could better support our participants.

Hell, the idea of rewarding desired behavior feels enticing even in cases where your review process does not fall into the traps mentioned here, where you take a more just approach.

But the core idea here is that you can't really expect different outcomes if the pressures and goals that gave rise to them don't change either.

During incidents, the priorities already in play are things like "I've got to fix this to keep this business alive", stabilizing the system to prevent large cascades, or trying to prevent harm to users or customers. They come with stress, adrenaline, and sometimes a sense of panic or shock. These are likely to rank higher in the minds of people than “what’s my bonus gonna be?” or “am I losing a gift card or some plaque if I fail?”

Adding incentives, whether positive or negative, does not clarify the situation. It does not address goal conflicts. It adds more variables to the equation, complexifies the situation, and likely makes it more challenging.

Chances are that people will keep making the same decisions they would have made (and have been making continuously) all along, the ones that were obtaining the desired outcomes. What will change instead is what they report later, in subtle ways: either tweaking or hiding information to protect themselves, or gradually losing trust in the process you've put in place. These effects can be amplified when teams are given hard-to-meet abstract targets such as lowering incident counts, which can actively interfere with incident response by creating new decision points in people's mental flows. If responders have to discuss and classify the nature of an incident to fit an accounting system unrelated to solving it right now, their response is likely to be slower and more challenging.

This is not to say all attempts at structure and classification would hinder proper response, though. Clarifying the critical elements to salvage first, creating cues and language for patterns that will be encountered, and agreeing on strategies that support effective coordination across participants can all be really productive. It needs to be done with a deeper understanding of how your incident response actually works, and that sometimes means unpleasant feedback about how people perceive your priorities.

I've been in reviews where people stated things like "we know that we get yelled at more for delivering features late than for broken code, so we just shipped broken code since we were out of time", or where people admitted ignoring execs who made a habit of coming down from above to scold employees into fixing things they were pressured into doing anyway. These can be hurtful for an organization to consider, but they are nevertheless a real part of how people deal with exceptional situations.

By trying to properly understand the challenges, by clarifying the goal conflicts that arise in systems and result in sometimes frustrating trade-offs, and by making learning from these experiences an objective of its own, we can hopefully make things a bit better. Grounding our interventions within a richer, more naturalistic understanding of incident response and all its challenges is a small—albeit critical—part of it all.

Permalink

Erlang/OTP 27.1 Release

OTP 27.1

Erlang/OTP 27.1 is the first maintenance patch package for OTP 27, with mostly bug fixes as well as improvements.

Highlights

  • The zip module has been updated (see the usage sketch after this list) with support for:

    • zip64 archives - Archives larger than 4GB or with more than 2^32 entries.
    • extended timestamps - Higher resolution and in UTC.
    • UID/GID - Save and extract the original UID/GID.
    • Fixes so that permission mode attributes are correctly read and set for files in archives.
    • zip:list_dir/2 now also returns directories, not only files. (You can disable this behaviour by using the option skip_directories.)

    Various bugs in the original zip implementation have also been fixed, such as:

    • Correctly encoding and decoding the DOS timestamps for entries within an archive (that is, the non-extended timestamps).
    • Fixed DOS timestamps to be set to localtime instead of UTC (use extended timestamps for UTC timestamps).
    • Used the unix file attributes read from disk when creating archives instead of setting everything to 644.

  • All releases now have .zip versions of the Windows installer that can be used to install Erlang/OTP on Windows when you do not have administrator privileges. You can download them from erlang.org/downloads or from https://github.com/erlang/otp/releases.
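As a quick illustration of the new listing option (the archive name is assumed; shown in Elixir, since the zip module is reachable from any BEAM language, and the skip_directories option is the one described above):

# zip:list_dir now includes directories in its listing; skip_directories
# restores the old files-only behaviour.
{:ok, with_dirs} = :zip.list_dir(~c"release.zip")
{:ok, files_only} = :zip.list_dir(~c"release.zip", [:skip_directories])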

Potential incompatibilities:

  • Progress reports for a dynamically started supervisor will now be logged at debug level.

For details about bug fixes and potential incompatibilities, see the Erlang/OTP 27.1 README

The Erlang/OTP source can also be found at GitHub on the official Erlang repository, https://github.com/erlang/otp

Download links for this and previous versions are found here

Permalink

My Blog Engine is the Erlang Build Tool

From time to time, people ask me what I use to power my blog, maybe because they like the minimalist form it has. I tell them it’s a bad idea and that I use the Erlang compiler infrastructure for it, and they agree to look elsewhere.

After launching my notes section, I had to fully clean up my engine. I thought I could write about how it works because it’s fairly unique and interesting, even if you should probably not use it.

The Requirements

I first started my blog 14 years ago. It had roughly the same structure as it does at the time of writing this: a list of links and text with nothing else. It did poorly with mobile (which was still sort of new back then, and which I should really work to improve these days), but okay with screen readers. It's gotta be minimal enough to load fast on old devices.

There’s absolutely nothing dynamic on here. No JavaScript, no comments, no tracking, and I’m pretty sure I’ve disabled most logging and retention.

I write into a void, either transcribing talks or putting down rants I’ve repeated 2-3 times to other people so it becomes faster to just link things in the future. I mostly don’t know what gets read or not, but over time I found this kept the experience better for me than chasing readers or views.

Basically, a static site is the best technology for me, but from time to time it’s nice to be able to update the layout, add some features (like syntax highlighting or an RSS feed) so it needs to be better than flat HTML files.

Internally it runs with erlydtl, an Erlang implementation of Django Templates, which I really liked a decade and a half ago. It supports template inheritance, which is really neat to minimize files I have to edit. All I have is a bunch of files containing my posts, a few of these templates, and a little bit of Rebar3 config tying them together.

There are some features that erlydtl doesn’t support but that I wanted anyway, notably syntax highlighting (without JavaScript), markdown support, and including subsections of HTML files (a weird corner case to support RSS feeds without powering them with a database).

The feature I want to discuss here is “only rebuild what you strictly need to,” which I covered by using the Rebar3 compiler.

Rebar3’s Compiler

Rebar3 is the Erlang community’s build tool, which Tristan and I launched over 10 years ago, as a successor to the classic rebar 2.x script.

A funny requirement for Rebar3 is that Erlang has multiple compilers: one for Erlang, but also one for MIB files (for SNMP), the Leex lexical analyzer generator, and the Yecc parser generator. It is also plugin-friendly in order to compile Elixir modules and other BEAM languages, like LFE, or very early versions of Gleam.

We needed to support at least four compilers out of the box, and to properly track everything such that we only rebuild what we must. This is done using a Directed Acyclic Graph (DAG) to track all files, including build artifacts.

The Rebar3 compiler infrastructure works by breaking up the flow of compilation into a generic and a specific subset.

The specific subset will:

  1. Define which file types and paths must be considered by the compiler.
  2. Define which files are dependencies of other files.
  3. Be given a graph of all files and their artifacts with their last modified times (and metadata), and specify which of them need rebuilding.
  4. Compile individual files and provide metadata to track the artifacts.

The generic subset will:

  1. Scan files and update their timestamps in a graph for the last modifications.
  2. Use the dependency information to complete the dependency graph.
  3. Propagate the timestamps of source files modifications transitively through the graph (assume you update header A, included by header B, applied by macro C, on file D; then B, C, and D are all marked as modified as recently as A in the DAG).
  4. Pass this updated graph to the specific part to get a list of files to build (usually by comparing which source files are newer than their artifacts, but also if build options changed).
  5. Schedule sequential or parallel compilation based on what the specific part specified.
  6. Update the DAG with the artifacts and build metadata, and persist the data to disk.

In short, you build a compiler plugin that can name directories, file extensions, dependencies, and can compare timestamps and metadata. Then make sure this plugin can compile individual files, and the rest is handled for you.
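To make that flow concrete, here is a toy sketch of steps 3 and 4 of the generic subset (written in Elixir for brevity; it is illustrative only, not rebar3's actual code or API):

# deps maps a file to the files that directly depend on it.
defmodule ToyDAG do
  # Step 3: push each file's mtime to everything that transitively depends on it.
  def propagate(mtimes, deps) do
    Enum.reduce(mtimes, mtimes, fn {file, stamp}, acc ->
      bump(file, stamp, deps, acc)
    end)
  end

  defp bump(file, stamp, deps, acc) do
    Enum.reduce(Map.get(deps, file, []), acc, fn dependent, acc ->
      if Map.get(acc, dependent, 0) >= stamp do
        acc
      else
        bump(dependent, stamp, deps, Map.put(acc, dependent, stamp))
      end
    end)
  end

  # Step 4: a source needs rebuilding when it is newer than its artifact.
  def needed(mtimes, artifacts) do
    for {src, mtime} <- mtimes, mtime > Map.get(artifacts, src, 0), do: src
  end
end

With mtimes of %{"a.hrl" => 10, "b.erl" => 5} and deps of %{"a.hrl" => ["b.erl"]}, propagate/2 marks "b.erl" as modified at 10, so needed/2 against an artifact built at time 7 would schedule "b.erl" for a rebuild.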

The blog engine

Since I’m currently the most active Rebar3 maintainer, I’ve definitely got to maintain the compiler infrastructure described earlier.

Since my blog needed to rebuild the fewest static files possible and I already used a template compiler, plugging it into Rebar3 became the solution demanding the least effort.

It requires a few hundred lines of code to write the plugin and a bit of config looking like this:

{blog3r, [
    {vars, [
        {url, [
            {base, "https://ferd.ca/"},
            {notes, "https://ferd.ca/notes/"},
            {img, "https://ferd.ca/static/img/"},
            ...
        ]}
    ]},
    %% Main site
    {index, #{template => "index.tpl", out => "index.html", section => main}},
    {index, #{template => "rss.tpl", out => "feed.rss", section => main}},
    %% Notes section
    {index, #{template => "index-notes.tpl", out => "notes/index.html", section => notes}},
    {index, #{template => "rss-notes.tpl", out => "notes/feed.rss", section => notes}},
    %% All sections' pages.
    {sections, #{
        main => {"posts/", "./", [
            {"Mon, 02 Sep 2024 11:00:00 EDT", "My Blog Engine is the Erlang Build Tool", "blog-engine-erlang-build-tool.md.tpl"},
            {"Thu, 30 May 2024 15:00:00 EDT", "The Review Is the Action Item", "the-review-is-the-action-item.md.tpl"},
            {"Tue, 19 Mar 2024 11:00:00 EDT", "A Commentary on Defining Observability", "a-commentary-on-defining-observability.md.tpl"},
            {"Wed, 07 Feb 2024 19:00:00 EST", "A Distributed Systems Reading List", "distsys-reading-list.md.tpl"},
            ...
        ]},
        notes => {"notes/", "notes/", [
            {"Fri, 16 Aug 2024 10:30:00 EDT", "Paper: Psychological Safety: The History, Renaissance, and Future of an Interpersonal Construct", "papers/psychological-safety-interpersonal-construct.md.tpl"},
            {"Fri, 02 Aug 2024 09:30:00 EDT", "Atomic Accidents and Uneven Blame", "atomic-accidents-and-uneven-blame.md.tpl"},
            {"Sat, 27 Jul 2024 12:00:00 EDT", "Paper: Moral Crumple Zones", "papers/moral-crumple-zones.md.tpl"},
            {"Tue, 16 Jul 2024 19:00:00 EDT", "Hutchins' Distributed Cognition in The Wild", "hutchins-distributed-cognition-in-the-wild.md.tpl"},
            ...
        ]}
    }}
]}.

And blog post entry files like this:

{% extends "base.tpl" %}
{% block content %}

<p>I like cats. I like food. <br />
   I don't especially like catfood though.</p>

{% markdown %}
### Have a subtitle

And then _all sorts_ of content!

- lists
- other lists
- [links]({{ url.base }}section/page)
- and whatever fits a demo

> Have a quote to close this out

{% endmarkdown %}
{% endblock %}

These call a parent template (see base.tpl for the structure) to inject their content.

The whole site gets generated that way. Even compiler error messages are lifted from the Rebar3 libraries (although I haven't wrapped everything perfectly yet), with the following coming up when I forgot to close an if tag before closing a for loop:

$ rebar3 compile
===> Verifying dependencies...
===> Analyzing applications...
===> Compiling ferd_ca
===> template error:
    ┌─ /home/ferd/code/ferd-ca/templates/rss.tpl:
    │
 24 │      {% endfor %}
    │         ╰── syntax error before: "endfor"

===> Compiling templates/rss.tpl failed

As you can see, I build my blog by calling rebar3 compile, the same command I use for any Erlang project.

I find it interesting that on one hand, this is pretty much the best design possible for me given that it represents almost no new code, no new tools, and no new costs. It’s quite optimal. On the other hand, it’s possibly the worst toolchain imaginable for a blog engine for almost anybody else.

Permalink

Typing lists and tuples in Elixir

We have been working on a type system for the Elixir programming language. The type system provides sound gradual typing: it can safely interface static and dynamic code, and if the program type checks, it will not produce type errors at runtime.

It is important to emphasize type errors. The type systems used at scale today do not guarantee the absence of any runtime errors, only of typing ones. Many programming languages error when accessing the “head” of an empty list; most languages raise on division by zero or when computing the logarithm of negative numbers on a real domain; and others may fail when memory cannot be allocated or when a number overflows/underflows.

Language designers and maintainers must outline the boundaries of what can be represented as typing errors and how that impacts the design of libraries. The goal of this article is to highlight some of these decisions in the context of lists and tuples in Elixir’s ongoing type system work.

In this article, the words “raise” and “exceptions” describe something unexpected that happened, not a mechanism for control flow. Other programming languages may call them “panics” or “faults”.

The head of a list

Imagine you are designing a programming language and you want to provide a head function, which returns the head - the first element - of a list. You may consider three options.

The first option, the one found in many programming languages, is to raise if an empty list is given. Its implementation in Elixir would be something akin to:

$ list(a) -> a
def head([head | _]), do: head
def head([]), do: raise "empty list"

Because the type system cannot differentiate between an empty list and a non-empty list, you won’t find any typing violations at compile-time, but an error is raised at runtime for empty lists.

An alternative would be to return an option type, properly encoding that the function may fail (or not):

$ list(a) -> option(a)
def head([head | _]), do: {:ok, head}
def head([]), do: :none

This approach may be a bit redundant. Returning an option type basically forces the caller to pattern match on the returned option. While many programming languages provide functions to compose option values, one may also get rid of the additional wrapping and directly pattern match on the list instead. So instead of:

case head(list) do
  {:ok, head} -> # there is a head
  :none -> # do what you need to do
end

You could just write:

case list do
  [head | _] -> # there is a head
  [] -> # do what you need to do
end

Both examples above are limited by the fact the type system cannot distinguish between empty and non-empty lists, and therefore their handling must happen at runtime. If we get rid of this limitation, we could define head as follows:

$ non_empty_list(a) -> a
def head([head | _]), do: head

And now we get a typing violation at compile-time if an empty list is given as argument. There is no option tagging and no runtime exceptions. Win-win?

The trouble with the above is that now it is the responsibility of the language’s users to prove the list is not empty. For example, imagine this code:

list = convert_json_array_to_elixir_list(json_array_as_string)
head(list)

In the example above, since convert_json_array_to_elixir_list may return an empty list, there is a typing violation at compile-time. To resolve it, we need to prove the result of convert_json_array_to_elixir_list is not an empty list before calling head:

list = convert_json_array_to_elixir_list(json_array_as_string)

if list == [] do
  raise "empty list"
end

head(list)

But, at this point, we might as well just use pattern matching and once again get rid of head:

case convert_json_array_to_elixir_list(json_array_as_string) do
  [head | _] -> # there is a head
  [] -> # do what you need to do
end

Most people would expect that encoding more information into the type system would bring only benefits, but there is a tension here: the more you encode into types, the more you might have to prove in your programs.

While different developers will prefer certain idioms over others, I am not convinced there is one clearly superior approach here. Having head raise a runtime error may be the most pragmatic approach if the developer expects the list to be non-empty in the first place. Returning option gets rid of the exception by forcing users to explicitly handle the result, but leads to more boilerplate compared to pattern matching, especially if the user does not expect empty lists. And, finally, adding precise types means there could be more for developers to prove.

What about Elixir?

Thanks to set-theoretic types, we will most likely distinguish between empty lists and non-empty lists in Elixir’s type system, since pattern matching on them is a common language idiom. Furthermore, several functions in Elixir, such as String.split/2, are guaranteed to return non-empty lists, which can then be nicely encoded into a function’s return type.

Elixir also has the functions hd (for head) and tl (for tail) inherited from Erlang, which are valid guards. They only accept non-empty lists as arguments, which will now be enforced by the type system too.
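As a small illustration of that guard behaviour (a sketch; a guard that fails does so silently rather than raising):

def positive_head?(list) when hd(list) > 0, do: true
def positive_head?(list) when is_list(list), do: false

# positive_head?([]) returns false: hd([]) fails its guard,
# so the call falls through to the second clause.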

This covers almost all use cases but one: what happens if you want to access the first element of a list which has not been proven to be non-empty? You could use pattern matching and conditionals for those cases, but as seen above, this can lead to common boilerplate such as:

if list == [] do
  raise "unexpected empty list"
end

Luckily, it is common in Elixir to use the ! suffix to encode the possibility of runtime errors for valid inputs. For these circumstances, we may introduce List.first! (and potentially List.drop_first! for the tail variant).
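That convention already exists elsewhere in the standard library; since List.first! is only a possibility at this point, here is the same idea with Map's existing pair of functions:

Map.fetch(%{a: 1}, :b)   #=> :error, the caller handles the miss explicitly
Map.fetch!(%{a: 1}, :b)  #=> raises KeyError: a valid input, an exceptional result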

Accessing tuples

Now that we have discussed lists, we can talk about tuples. In a way, tuples are more challenging than lists for two reasons:

  1. A list is a collection where all elements have the same type (be it a list(integer()) or list(integer() or float())), while tuples carry the types of each element

  2. We natively access tuples by index, instead of by head and tail, such as elem(tuple, 0)

In the upcoming v1.18 release, Elixir’s new type system will support tuple types, and they are written between curly brackets. For example, the File.read/1 function would have the return type {:ok, binary()} or {:error, posix()}, quite similar to today’s typespecs.

The tuple type can also specify a minimum size: you can write {atom(), integer(), ...}. This means the tuple has at least two elements, the first being an atom() and the second being an integer(). This definition is required for type inference in patterns and guards. After all, a guard is_integer(elem(tuple, 1)) tells you the tuple has at least two elements, with the second one being an integer, but nothing about the other elements or the tuple’s overall size.
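For instance, here is a sketch using today’s elem/2, which is allowed in guards:

def second_to_string(tuple) when is_integer(elem(tuple, 1)) do
  # This guard proves the shape {term(), integer(), ...}: at least two
  # elements with the second an integer, and nothing about the rest.
  Integer.to_string(elem(tuple, 1))
end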

With tuple support merged into main, we need to answer questions such as which compile-time warnings and runtime exceptions tuple operations such as elem(tuple, index) may emit. Today, we know that it raises if:

  1. the index is out of bounds, as in elem({:ok, "hello"}, 3)

  2. the index is negative, as in elem({:ok, 123}, -1)

When typing elem(tuple, index), one option is to use “avoid all runtime errors” as our guiding light and make elem return option types, such as: {:ok, value} or :none. This makes sense for an out of bounds error, but should it also return :none if the index is negative? One could argue that they are both out of bounds. On the other hand, a positive index may be correct depending on the tuple size, but a negative index is always invalid. From this perspective, encoding an always-invalid value as :none can be detrimental to the developer experience, hiding logical bugs instead of (loudly) blowing up.

Another option is to make these programs invalid. If we completely remove elem/2 from the language and you can only access tuples via pattern matching (or by adding a literal notation such as tuple.0), then all possible bugs can be caught by the type checker. However, some data structures, such as Erlang’s array, rely on dynamic tuple access, and implementing those would no longer be possible.

Yet another option is to encode integers themselves as values in the type system. In the same way that Elixir’s type system supports the values :ok and :error as types, we could support each integer, such as 13 and -42 as types as well (or specific subsets, such as neg_integer(), zero() and pos_integer()). This way, the type system would know the possible values of index during type checking, allowing us to pass complex expressions to elem(tuple, index), and emit typing errors if the indexes are invalid. However, remember that encoding more information into types may force developers to also prove that those indexes are within bounds in many other cases.

Once again, there are different trade-offs, and we must select the one that best fits Elixir’s use and semantics today.

What about Elixir?

The approach we are taking in Elixir is two-fold:

  • If the index is a literal integer, it will perform an exact access on the tuple element. This means elem(tuple, 1) will work if we can prove the tuple has at least size 2, otherwise you will have a type error

  • If the index is not a literal integer, the function will fall back to a dynamic type signature

Let’s expand on the second point.

At a fundamental level, we could describe elem with the type signature of tuple(a), integer() -> a. However, the trouble with this signature is that it does not tell the type system (nor users) the possibility of a runtime error. Luckily, because Elixir will offer a gradual type system, we could encode the type signature as dynamic({...a}), integer() -> dynamic(a). By encoding the argument and return type as dynamic, developers who want a fully static program will be notified of a typing error, while existing developers who rely on dynamic features of the language can continue to do so, and those choices are now encoded into the types.
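In the $ signature notation used in this post, the two candidates line up as follows (a side-by-side restatement of the paragraph above):

$ tuple(a), integer() -> a
# precise, but says nothing about the possibility of a runtime error

$ dynamic({...a}), integer() -> dynamic(a)
# marks argument and result as dynamic: fully static callers get a typing
# error, while gradually-typed callers keep today's behaviour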

Overall,

  • For static programs (the ones that do not use the dynamic() type), elem/2 will validate that the first argument is a tuple of known shape, and the second argument is a literal integer which is greater than or equal to zero and less than the tuple size. This guarantees no runtime exceptions.

  • Gradual programs will have the same semantics (and runtime exceptions) as today.

Summary

I hope this article outlines some of the design decisions as we bring a gradual type system to Elixir. Although supporting tuples and lists is a “table stakes” feature in most type systems, bringing them to Elixir was an opportunity to understand how the type system will interact with several language idioms, as well as provide a foundation for future decisions. The most important takeaways are:

  1. Type safety is a commitment from both sides. If you want your type system to find even more bugs through more precise types, you will need to prove more frequently that your programs are free of certain typing violations.

  2. Given not everything will be encoded as types, exceptions are important. Even in the presence of option types, it would not be beneficial for developers if elem(tuple, index) returned :none for negative indexes.

  3. Elixir’s convention of using the suffix ! to encode the possibility of runtime exceptions for a valid domain (the input types) nicely complements the type system, as it can help static programs avoid the boilerplate of converting :none/:error into exceptions for unexpected scenarios.

  4. Using dynamic() in function signatures is a mechanism available in Elixir’s type system to signal that a function has dynamic behaviour and may raise runtime errors, allowing violations to be reported on programs that wish to remain fully static. Similar to how other static languages provide dynamic behaviour via Any or Dynamic types.

The type system was made possible thanks to a partnership between CNRS and Remote. The development work is currently sponsored by Fresha (they are hiring!), Starfish*, and Dashbit.

Happy typing!

Permalink

Announcing the official Elixir Language Server team

I am glad to welcome Elixir’s official Language Server team, formed by (in alphabetical order):

  • Jonatan Kłosko
  • Łukasz Samson
  • Mitchell Hanberg
  • Steve Cohen

The team will work on the code intelligence infrastructure to be used across tools and editors. These efforts are partially funded by Fly.io and Livebook.

A brief history

The Language Server Protocol (LSP) was created by Microsoft as a protocol between IDEs and programming languages to provide language intelligence tools.

The first implementation of said protocol for Elixir was started by Jake Becker, back in 2017, alongside an implementation for Visual Studio Code, and it relied on the ElixirSense project from Marlus Saraiva to extract and provide some of the language intelligence.

As adoption of the Language Server Protocol grew as a whole, so did the usage of Elixir’s implementation, which eventually became the main mechanism through which Elixir users interacted with the language from their editors.

Eventually, Elixir’s language server implementation got its own organization on GitHub, and maintenance reins were given to Łukasz Samson and Jason Axelson.

Over time, the Elixir Language Server has accrued technical debt. Some of it exists due to intrinsic complexities (for example, the Language Server Protocol uses UTF-16 for text encoding, instead of the more widely used UTF-8), while others are a consequence of working on the codebase while both the programming language and the protocol themselves were still evolving.

This led Mitch Hanberg and Steve Cohen to create alternative language server implementations, exploring different trade-offs.

For example, both Next LS and Lexical use Erlang Distribution to isolate the Language Server runtime from the user code.

Next LS also focused on extracting the LSP protocol parts into GenLSP (which can be used by anyone to easily create a language server), single binary distribution with Burrito, and experimenting with SQLite for the symbol index.

Lexical concerned itself with speed and abstractions to deal with documents, ranges, and more.

This means the Elixir community had, for some time, three distinct language server implementations, each with their own strengths.

Looking forward

The current language server maintainers have agreed to move forward with a single Language Server Protocol project, relying on the strengths of each implementation:

  • Lexical provides a stable foundation
  • ElixirLS, through ElixirSense, provides the most complete implementation and wider functionality
  • Next LS, through GenLSP, provides a general interface for LSP implementations and straightforward packaging via Burrito

The above is a rough outline, as the specific details of how the projects will move forward are still being discussed. While some of the team members also maintain direct integration with some editors, we will continue relying on the community’s help and efforts to get full coverage across all available editors.

And there is still a lot more to do!

Many underestimate the complexity behind implementing the Language Server Protocol. That’s not surprising: we mostly interact with it from an editor, allowing us to freely ignore what makes it tick.

In practice, the Language Server needs, in many ways, to reimplement several parts of the language and its compiler.

If the Elixir compiler sees the code some_value +, it can immediately warn and say: “this expression is incomplete”. However, the Language Server still needs to make sense of invalid code to provide features like completion. And that applies to everything: missing do-end blocks, invalid operators, invoking macros that do not exist, etc. Mitch has made Spitfire, an error tolerant parser to tackle this particular problem.

Some ecosystems have undertaken multi-year efforts to redesign their compilers and toolchains to provide better tools for lexical and semantic code analysis (which most likely took a significant investment of time and resources to conclude). That’s to say some of the problems faced by Language Server implementations will be best tackled if they are also solved as part of Elixir itself.

For example, every Language Server implementation compiles their own version of a project, making it so every application and its dependencies have to be compiled twice in development: once for Mix and once for the Language Server. Wouldn’t it be nice if Elixir and the Language Servers could all rely on the same compilation artifacts?

This is not news to the Elixir team either: almost every Elixir release within the last 3 years has shipped new code analysis APIs, such as Code.Fragment, with the goal of removing duplication across Language Servers, IEx, and Livebook, as well as reducing their reliance on internal Elixir modules. Most recently, Elixir v1.17 shipped with new APIs to help developers emulate the compiler behaviour. Our goal is to make these building blocks available for all Elixir developers, so their benefits are reaped beyond the language server tooling.

Furthermore, as set-theoretic types make their way into Elixir, we also want to provide official APIs to integrate them into our tools.

Sponsorships

Currently, Fly.io is sponsoring Łukasz Samson to work part-time on the Language Server and editor integration. The Livebook project is donating development time from Jonatan Kłosko, creator of Livebook, to improve the Elixir compiler and its code intelligence APIs.

We are grateful to both companies for investing into the community and you should check them out.

As mentioned above, Language Server implementations are complex projects, and unifying efforts is an important step in the right direction. However, we also need community help, and one of the ways to do so is by sponsoring the developers making this possible:

Companies who can afford to sponsor part-time development are welcome to reach out and help us achieve this important milestone.

Progress updates

A new project website and social media accounts will be created soon, and you can follow them to stay up to date with our progress and any interesting developments.

The name of the new project is still in the works as well as many of the decisions we’ll need to make, so please have patience!

In the meantime, you can continue to use the language server of your choice, and we’ll be sure to make the transition to the fourth and final project as smooth as possible.

Thank you!

Permalink

Elixir v1.17 released: set-theoretic data types, calendar durations, and Erlang/OTP 27 support

Elixir v1.17 has just been released. 🎉

This release introduces set-theoretic types into a handful of language constructs. While there are still many steps ahead of us, this important milestone already brings benefits to developers in the form of new warnings for common mistakes. This new version also adds support for Erlang/OTP 27, the latest and greatest Erlang release. You’ll also find a new calendar-related data type (Duration) and a Date.shift/2 function.

Let’s dive in.

Warnings from gradual set-theoretic types

This release introduces gradual set-theoretic types to infer types from patterns and use them to type check programs, enabling the Elixir compiler to find faults and bugs in codebases without requiring changes to existing software. The underlying principles, theory, and roadmap of our work have been outlined in “The Design Principles of the Elixir Type System” by Giuseppe Castagna, Guillaume Duboc, José Valim.

At the moment, Elixir developers will interact with set-theoretic types only through warnings found by the type system. The current implementation models all data types in the language:

  • binary(), integer(), float(), pid(), port(), reference() - these types are indivisible. This means both 1 and 13 get the same integer() type.

  • atom() - it represents all atoms and it is divisible. For instance, the atoms :foo and :hello_world are also valid (distinct) types.

  • map() and structs - maps can be “closed” or “open”. Closed maps only allow the specified keys, such as %{key: atom(), value: integer()}. Open maps support any other keys in addition to the ones listed and their definition starts with ..., such as %{..., key: atom(), value: integer()}. Structs are closed maps with the __struct__ key.

  • tuple(), list(), and function() - currently they are modelled as indivisible types. The next Elixir versions will also introduce fine-grained support to them.

We focused on atoms and maps in this initial release as they are respectively the simplest and the most complex type representations, so we can stress the performance of the type system and the quality of error messages. Modelling these types will also provide the most immediate benefits to Elixir developers. Assuming there is a variable named user, holding a %User{} struct with an address field, Elixir v1.17 will emit the following warnings at compile-time:

  • Pattern matching against a map or a struct that does not have the given key, such as %{adress: ...} = user (notice address vs adress).

  • Accessing a key on a map or a struct that does not have the given key, such as user.adress.

  • Invoking a function on non-modules, such as user.address().

  • Capturing a function on non-modules, such as &user.address/0.

  • Attempting to call an anonymous function without an actual function, such as user.().

  • Performing structural comparisons between structs, such as my_date < ~D[2010-04-17].

  • Performing structural comparisons between non-overlapping types, such as integer >= string.

  • Building and pattern matching on binaries without the relevant specifiers, such as <<name>> (this warns because by default it expects an integer, it should have been <<name::binary>> instead).

  • Attempting to rescue an undefined exception or a struct that is not an exception.

  • Accessing a field that is not defined in a rescued exception.

Here’s what the warning for accessing a misspelled field of a struct looks like:

Example of a warning when accessing a misspelled struct field

Another example, this time it’s a warning for structural comparison across two Date structs:

Example of a warning when comparing two structs with ">"

These warnings also work natively in text editors, as they are standard Elixir compiler warnings:

Example of a type warning inline in an editor

These new warnings will help Elixir developers find bugs earlier and give more confidence when refactoring code, especially around maps and structs. While Elixir already emitted some of these warnings in the past, those were discovered using syntax analysis. The new warnings are more reliable and precise, and come with better error messages. Keep in mind, however, that the Elixir typechecker only infers types from patterns within the same function at the moment. Analysis from guards and across function boundaries will be added in future releases. For more details, see our new reference document on gradual set-theoretic types.

The type system was made possible thanks to a partnership between CNRS and Remote. The development work is currently sponsored by Fresha (they are hiring!), Starfish*, and Dashbit.

Erlang/OTP support

This release adds support for Erlang/OTP 27 and drops support for Erlang/OTP 24. We recommend that Elixir developers migrate to Erlang/OTP 26 or later, especially on Windows. Support for WERL (a graphical user interface for the Erlang terminal on Windows) will be removed in Elixir v1.18.

You can read more about Erlang/OTP 27 in their release announcement. The bits that are particularly interesting for Elixir developers are the addition of a json module and process labels (proc_lib:set_label/1). The latter will also be available in this Elixir release as Process.set_label/1.
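For example, a process can give itself a label so that debugging and observability tools can identify it (the label term below is an arbitrary example; any term is allowed):

# Label an otherwise anonymous process for observability tools.
Process.set_label({:import_worker, 42})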

New Duration data type and shifting functions

This Elixir version introduces the Duration data type and APIs to shift dates, times, and date times by a given duration, considering different calendars and time zones.

iex> Date.shift(~D[2016-01-31], month: 2)
~D[2016-03-31]

We chose the name “shift” for this operation (instead of “add”) since working with durations does not obey properties such as associativity. For instance, adding one month and then one month does not give the same result as adding two months:

iex> ~D[2016-01-31] |> Date.shift(month: 1) |> Date.shift(month: 1)
~D[2016-03-29]

Still, durations are essential for building intervals, recurring events, and modelling scheduling complexities found in the world around us. For DateTimes, Elixir will correctly deal with time zone changes (such as Daylight Saving Time). However, provisions are also available in case you want to surface conflicts, such as shifting to a wall clock that does not exist, because the clock has been moved forward by one hour. See DateTime.shift/2 for examples.
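Durations can also be built explicitly and passed to the shift functions. A small sketch, assuming the v1.17 API described above:

iex> duration = Duration.new!(month: 1, hour: 2)
iex> DateTime.shift(~U[2016-01-31 00:00:00Z], duration)
~U[2016-02-29 02:00:00Z]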

Finally, we added a new Kernel.to_timeout/1 function, which helps developers normalize durations and integers to a timeout used by many APIs—like Process, GenServer, and more. For example, to send a message after one hour, you can now write:

Process.send_after(pid, :wake_up, to_timeout(hour: 1))

Learn more

Here are other notable changes in this release:

  • There are new Keyword.intersect/2,3 functions to mirror the equivalent in the Map module (see the sketch after this list).

  • A new Mix profiler was added, mix profile.tprof, which lets you use the new tprof profiler released with Erlang/OTP 27. This profiler leads to the soft-deprecation of mix profile.cprof and mix profile.eprof.

  • We added Kernel.is_non_struct_map/1, a new guard to help with the common pitfall of matching on %{}, which also successfully matches structs (as they are maps underneath).

  • Elixir’s Logger now formats gen_statem reports and includes Erlang/OTP 27 process labels in logger events.
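Two of these additions can be sketched quickly (a hedged example; Keyword.intersect/2 is assumed to mirror Map.intersect/2 and take values from the second list):

iex> Keyword.intersect([a: 1, b: 2], [b: 3, c: 4])
[b: 3]
iex> is_non_struct_map(%{key: :value})
true
iex> is_non_struct_map(~D[2024-06-12])
false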

For a complete list of all changes, see the full release notes.

Check the Install section to get Elixir installed and read our Getting Started guide to learn more.

Happy learning!

Permalink

The Review Is the Action Item

I like to consider running an incident review to be its own action item. Other follow-ups emerging from it are a plus, but the point is to learn from incidents, and the review gives room for that to happen.

This is not surprising advice if you’ve read material from the LFI community and related disciplines. However, there are specific perspectives required to make this work, and some assumptions necessary for it, without which things can break down.

How can it work?

In a more traditional view, the system is believed to be stable, then disrupted into an incident. The system gets stabilized, and we must look for weaknesses that can be removed or barriers that could be added in order to prevent such disruption in the future.

Other perspectives for systems include views where they are never truly stable. Things change constantly; uncertainty is normal. Under that lens, systems can’t be forced into stability by control or authority. They can be influenced and adapt on an ongoing basis, and possibly kept in balance through constant effort.

Once you adopt a socio-technical perspective, the hard-to-model nature of humans becomes a desirable trait to cope with chaos. Rather than a messy variable to stamp out, you’ll want to give them more tools and ways to keep all the moving parts of the subsystems going.

There, an incident review becomes an arena where misalignment in objectives can be repaired, where strategies and tactics can be discussed, where mental models can be corrected and enriched, where voices can be heard when they otherwise wouldn’t be, and where we are free to reflect on the messy reality that drove us here.

This is valuable work, and establishing an environment where it takes place is a key action item on its own.

People who want to keep things working will jump on this opportunity if they see any value in it. Rather than giving them tickets to work on, we’re giving them a safe context to surface and discuss useful information. They’ll carry that information with them in the future, and it may influence the decisions they make, here and elsewhere.

If the stories that come out of reviews are good enough, they will be retold to others, and the organization will have learned something.

That belief that people will do better over time as they learn is, to me, worth more than focusing on making room for a few tickets in the backlog.

How can it break down?

One of the unnamed assumptions with this whole approach is that teams should have the ability to influence their own roadmap and choose some of their own work.

A staunchly top-down organization may leverage incident reviews as a way to let people change the established course with a high priority. That use of incident reviews can’t be denied in these contexts.

We want to give people the information and the perspectives they need to come up with fixes that are effective. Good reviews with action items ought to make sense, particularly in these orgs where most of the work is normally driven by folks outside of the engineering teams.

But if the maintainers do not have the opportunity to schedule work they think needs doing outside of the aftermath of an incident—work that is by definition reactive—then they have no real power to schedule preventive work on their own.

And so that’s a place where learning being its own purpose breaks down: when the learnings can’t be applied.

Maybe it feels like “good” reviews focused on learning apply to a surprisingly narrow set of teams then, because most teams don’t have that much control. The question here really boils down to “who is it that can apply things they learned, and when?”

If the answer is “almost no one, and only when things explode,” that’s maybe a good lesson already. That’s maybe where you’d want to start remediating.

Note that even this perspective is a bit reductionist, which is also another way in which learning reviews may break down. By narrowing knowledge’s utility only to when it gets applied in measurable scheduled work, we stop finding value outside of this context, and eventually stop giving space for it.

It’s easy to forget that we don’t control what people learn. We don’t choose what the takeaways are. Everyone does it for themselves based on their own lived experience. More importantly, we can’t stop people from using the information they learned, whether at work or in their personal life.

Lessons learned can be applied anywhere and any time, and they can become critically useful at unexpected times. Narrowing the scope of your reviews such that they only aim to prevent bad accidents indirectly hinders creating fertile grounds for good surprises as well.

Going for better

While the need for action items is almost always there, a key element of improving incident reviews is to not make corrections the focal point.

Consider the incident review as a preliminary step, the data-gathering stage before writing down the ideas. You’re using recent events as a study of what’s surprising within the system, but also of how it is that things usually work well. Only once that perspective is established does it make sense to start thinking of ways of modifying things.

Try it with only one or two reviews at first. Minor incidents are usually good, because following the methods outlined in docs like the Etsy Debriefing Facilitation Guide and the Howie guide tends to reveal many useful insights in incidents people would have otherwise overlooked as not very interesting.

As you and your teams see value, expand to more and more incidents.

It also helps to set the tone before and during the meetings. I’ve written a set of “ground rules” we use at Honeycomb, which my colleague Lex Neva has transcribed, commented on, and published. See if something like that could adequately frame the session.

If abandoning the idea of action items seems irresponsible or impractical to you, keep them. But keep them at some distance; the common tip given by the LFI community is to schedule another meeting after the review to discuss them in isolation. At some point, that follow-up meeting may become disjoint from the reviews. There’s not necessarily a reason why every incident needs a dedicated set of fixes (longer-term changes impacting them could already be in progress, for example), nor is there a reason to wait for an incident to fix things and improve them.

That’s when you decouple understanding from fixing, and the incident review becomes its own sufficient action item.

Permalink

Copyright © 2016, Planet Erlang. No rights reserved.
Planet Erlang is maintained by Proctor.