Web scraping with Elixir


Businesses are investing in data. The big data analytics market is expected to grow to $103 billion (USD) within the next five years. It’s easy to see why, with every one of us generating, on average, 1.7 megabytes of data per second.

As the amount of data we create grows, so too does our ability to interpret and understand it. Taking enormous datasets and generating very specific findings is leading to fantastic progress in all areas of human knowledge, including science, marketing and machine learning.

To do this correctly, we need to ask an important question: how do we prepare datasets that meet the needs of our specific research? When dealing with publicly available data on the web, the answer is web scraping.

What is web scraping?

Wikipedia defines web scraping as:

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.[1] While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis. (See: https://en.wikipedia.org/wiki/Web_scraping).

Web scraping with Elixir

Web scraping can be done differently in different languages; for the purposes of this article, we’re going to show you how to do it in Elixir, for obvious reasons ;). Specifically, we’re going to show you how to do web scraping with an Elixir framework called Crawly.

Crawly is an application framework for crawling web sites and extracting structured data which can be used for a wide range of useful applications, like data mining, information processing or historical archival. You can find out more about it on the documentation page.

Getting started

For the purposes of this demonstration, we will build a web scraper to extract the data from Erlang Solutions Blog. We will extract all the blog posts, so they can be analyzed later (let’s assume we’re planning to build some machine learning to identify the most useful articles and authors).

First of all, we will create a new Elixir project:

mix new esl_blog --sup

Now that the project is created, modify the deps function of the mix.exs file so it looks like this:

    # Run "mix help deps" to learn about dependencies.
    defp deps do
      [
        {:crawly, "~> 0.1"}
      ]
    end

Fetch the dependencies with: mix deps.get, and we’re ready to go! So let’s define our crawling rules with the help of… spiders.

The spider


Spiders are behaviours for which you provide an implementation, and which Crawly uses to extract information from a given website. The spider must implement the Crawly.Spider behaviour (it’s required to implement the parse_item/1, init/0 and base_url/0 callbacks). This is the code for our first spider. Save it into a file called esl.ex under the lib/esl_blog/spiders directory of your project.

defmodule Esl do
  @behaviour Crawly.Spider

  @impl Crawly.Spider
  def base_url() do
    "https://www.erlang-solutions.com"
  end

  @impl Crawly.Spider
  def init() do
    [start_urls: ["https://www.erlang-solutions.com/blog.html"]]
  end

  @impl Crawly.Spider
  def parse_item(_response) do
    %Crawly.ParsedItem{:items => [], :requests => []}
  end
end

Here’s a more detailed run down of the above code:

base_url(): a function which returns the base URL for the given spider; it is used to filter out all irrelevant requests. In our case, we don’t want our crawler to follow links going to social media sites and other partner sites (which are not related to the crawled target).

init(): must return a keyword list which contains the start_urls list from which the crawler will begin. Subsequent requests will be generated from these initial URLs.

parse_item(): a function which will be called to handle each response downloaded by Crawly. It must return the Crawly.ParsedItem structure.

Crawling the website

Now it’s time to run our first spider. Start the interactive Elixir console on the current project (iex -S mix), and run the following command:

Crawly.Engine.start_spider(Esl)
You will get the following (or similar) result:

22:37:15.064 [info] Starting the manager for Elixir.Esl

22:37:15.073 [debug] Starting requests storage worker for Elixir.Esl…

22:37:15.454 [debug] Started 4 workers for Elixir.Esl
:ok
iex(2)>

22:38:15.455 [info] Current crawl speed is: 0 items/min

22:38:15.455 [info] Stopping Esl, itemcount timeout achieved

So what just happened under the hood?

Crawly scheduled the requests returned by the init/0 function of the spider. Upon receiving a response, Crawly used the callback function (parse_item/1) to process it. In our case, we had not defined any data to be returned by the parse_item/1 callback, so the Crawly worker processes (responsible for downloading requests) had nothing to do, and the spider was closed once the timeout was reached. Now it’s time to harvest the real data!

Extracting the data

Now it’s time to implement the real data extraction part. Our parse_item/1 callback is expected to return Requests and Items. Let’s start with the Requests extraction:

Open the Elixir console and execute the following:

{:ok, response} = Crawly.fetch("https://www.erlang-solutions.com/blog.html")

You will see something like:

%HTTPoison.Response{
  body: "<!DOCTYPE html>\n\n<!--[if IE 9 ]>" <> ...,
  headers: [
    {"Date", "Sat, 08 Jun 2019 20:42:05 GMT"},
    {"Content-Type", "text/html"},
    {"Content-Length", "288809"},
    {"Last-Modified", "Fri, 07 Jun 2019 09:24:07 GMT"},
    {"Accept-Ranges", "bytes"},
    ...
  ],
  request: %HTTPoison.Request{
    body: "",
    headers: [],
    method: :get,
    options: [],
    params: %{},
    url: "https://www.erlang-solutions.com/blog.html"
  },
  request_url: "https://www.erlang-solutions.com/blog.html",
  status_code: 200
}

This is our response, as fetched from the web. Now let’s use Floki to extract the data from it.

First of all, we’re interested in the “Read more” links, as they allow us to navigate to the actual blog pages. Let’s use Floki to extract those URLs from the given page.


Using Firebug or any other web developer extension, find the relevant CSS selectors. In my case, I ended up with the following entries:

iex(6)> response.body |> Floki.find("a.more") |> Floki.attribute("href")
["/blog/a-guide-to-tracing-in-elixir.html",
 "/blog/blockchain-no-brainer-ownership-in-the-digital-era.html",
 "/blog/introducing-telemetry.html",
 "/blog/remembering-joe-a-quarter-of-a-century-of-inspiration-and-friendship.html",
 ...]

Next, we need to convert the links into requests. Crawly requests are a way to provide some flexibility to a spider; for example, sometimes you have to modify the HTTP parameters of a request before sending it to the target website.

Crawly is fully asynchronous. Once the requests are scheduled, they are picked up by separate workers and executed in parallel. This also means that other requests can keep going even if some request fails, or an error happens while handling it.

In our case, we don’t want to tweak HTTP headers, so we can use the shortcut function Crawly.Utils.request_from_url/1. Note that Crawly expects absolute URLs in requests; it’s the responsibility of a spider to prepare correct URLs.

Now that we know how to extract links to the actual blog posts, let’s also extract the data from the blog pages. Let’s fetch one of the pages, using the same command as before (but with the URL of the blog post):

{:ok, response} = Crawly.fetch("https://www.erlang-solutions.com/blog/a-guide-to-tracing-in-elixir.html")
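To make the conversion concrete, here is a small sketch (the link list and base URL are illustrative) showing how relative links can be made absolute with Elixir’s built-in URI.merge/2 before being wrapped into requests with Crawly.Utils.request_from_url/1:

```elixir
# Illustrative sketch: make relative links absolute before
# turning them into Crawly requests.
urls = ["/blog/a-guide-to-tracing-in-elixir.html"]
base = "https://www.erlang-solutions.com/blog.html"

absolute =
  Enum.map(urls, fn url ->
    URI.merge(base, url) |> to_string()
  end)

# Each absolute URL could then be wrapped with
# Crawly.Utils.request_from_url/1 (not called here, since
# it requires the Crawly dependency).
```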

Now, let’s use Floki to extract the title, text, author, and URL from the blog post.

Extract the title with:

Floki.find(response.body, "h1:first-child") |> Floki.text

Extract the author with:
  author =
    response.body
    |> Floki.find("article.blog_post p.subheading")
    |> Floki.text(deep: false, sep: "")
    |> String.trim_leading()
    |> String.trim_trailing()

Extract the text with:

Floki.find(response.body, "article.blog_post") |> Floki.text

Finally, the URL does not need to be extracted, as it’s already part of the response! Now it’s time to wire everything together. Let’s update our spider code with all the snippets from above:

defmodule Esl do
  @behaviour Crawly.Spider

  @impl Crawly.Spider
  def base_url() do
    "https://www.erlang-solutions.com"
  end

  @impl Crawly.Spider
  def init() do
    [start_urls: ["https://www.erlang-solutions.com/blog.html"]]
  end

  @impl Crawly.Spider
  def parse_item(response) do
    # Getting new urls to follow
    urls =
      response.body
      |> Floki.find("a.more")
      |> Floki.attribute("href")

    # Convert URLs into requests
    requests =
      Enum.map(urls, fn url ->
        url
        |> build_absolute_url(response.request_url)
        |> Crawly.Utils.request_from_url()
      end)

    # Extract item from a page, e.g.
    # https://www.erlang-solutions.com/blog/introducing-telemetry.html
    title =
      response.body
      |> Floki.find("article.blog_post h1:first-child")
      |> Floki.text()

    author =
      response.body
      |> Floki.find("article.blog_post p.subheading")
      |> Floki.text(deep: false, sep: "")
      |> String.trim_leading()
      |> String.trim_trailing()

    text = Floki.find(response.body, "article.blog_post") |> Floki.text()

    %Crawly.ParsedItem{
      :requests => requests,
      :items => [
        %{title: title, author: author, text: text, url: response.request_url}
      ]
    }
  end

  def build_absolute_url(url, request_url) do
    URI.merge(request_url, url) |> to_string()
  end
end

Running the spider again

If you run the spider, it will output the extracted data with the log:

22:47:06.744 [debug] Scraped "%{author: \"by Lukas Larsson\", text: \"Erlang 19.0 Garbage Collector2016-04-07" <> …

Which indicates that the data has successfully been extracted from all blog pages.

What else?

You’ve seen how to extract and store items from a website using Crawly, but this is just the basic example. Crawly provides a lot of powerful features for making scraping easy and efficient, such as:

  • Flexible request spoofing (for example user-agent rotation)
  • Item validation, using a pipeline approach
  • Filtering already seen requests and items
  • Filtering out all requests which are coming from other domains
  • Robots.txt enforcement
  • Concurrency control
  • HTTP API for controlling crawlers
  • Interactive console, which allows you to create and debug spiders

Learn more

Want to know how the tools used in this blog can be used to create a machine learning program in Elixir? Don’t miss out on part two of this blog series, sign up to our newsletter and we’ll send the next blog straight to your inbox.


Which companies are using Elixir, and why? #MyTopdogStatus

How do you choose the right programming language for a project? As a developer or a key-decision maker you could decide based on your personal preference, or what is fashionable at the time, but if you’re not considering real experiences and use cases, you may end up making a decision that becomes difficult to execute, expensive to complete and a nightmare to maintain.

As Francesco mentioned in the Erlang blog of our #MyTopdogStatus series, you could start by asking why a language was created. This will help ensure that you’re choosing a language designed to fix a problem suited to your needs. On a number of occasions, José Valim has stated he wanted to give the power and productivity of Erlang to more developers.

Another great tip for choosing a language is to look under the hood and see what powers it. For that, the previously mentioned Erlang blog gives you a great insight into OTP and the Virtual Machine (VM) that help make Elixir the powerful language that it is.

Lastly, you can look at case studies to see how other companies have used the language, why they’ve chosen it, and what they were able to achieve. In practice, it’s not easy to find this information, as we still don’t know exactly how many companies use Elixir or their exact use case. Luckily, we can refer to www.elixir-companies.com or openly ask the community. And we hope this blog post will be helpful too!

Simply put, here is a checklist of common benefits that a company gets from using Elixir:

Need to grow or scale - If you have a service that you expect to be used by millions, then you’ll benefit from the reliability and scalability of the BEAM VM.

Handling a lot of incoming requests - Even if you don’t have millions of users, if all of your users make intense use of the system and generate a high volume of requests, you will need a system built for concurrency.

Easy to develop and maintainable - One of the priorities of Erlang and Elixir is keeping the code simple. As a result, one of the advantages most companies will see is ease of usage and fast development for everything from fixing bugs to adding new features.

Now, without further ado, let’s look at who’s using Elixir and how.

Event handling by Pinterest

Have you heard that Pinterest are using Elixir? Well, now you have. They use it to handle the routing of the events generated in their system. Think about all of the actions taking place on Pinterest in a day. How many do you think there are? The answer is around 30 thousand events per second that need to be processed. In addition, they have over 200 million active users.

Keep in mind, 20 years ago it would have been impressive to achieve 10 thousand concurrent connections. Luckily, the way the BEAM works lets us handle huge spikes in concurrent connections in a way that would be very difficult to replicate in other technologies.

Elixir is a good example of a language which takes advantage of the BEAM VM, which is why it was a great choice for Pinterest. Not only were they able to handle the demands of their users, but they were also able to significantly reduce the lines of code needed by simplifying and clustering it. As a result, they were able to halve the number of servers needed, down to just 15, allowing their solution to scale up with simple, maintainable code and lower operating costs. And it’s better for the environment too.

Faster distributed data by Moz Pro

Companies often look for alternatives to SQL for data storage. In the case of Moz Pro, one of the leading SEO analytics companies in the world, they decided to replace their traditional MySQL databases with a distributed indexing data system built in Elixir. In doing this, they managed to get responses of 50ms, significantly quicker than the previous +800ms.

Elixir’s ability to be distributed in different nodes while keeping transparent communication between the connected nodes makes it easy to create this kind of solution, improving service speed, no matter how much data you need to store.

The BEAM can handle millions of processes per node. That helps to create cache systems, as well as get rid of other external solutions like Redis or Memcached. In addition, the BEAM provides elements like DETS, ETS and Mnesia which help to create distributed data systems with ease.
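As a tiny sketch of the ETS part (the table and key names here are made up for illustration), an in-memory cache needs only a few calls:

```elixir
# Minimal ETS cache sketch; the table and keys are illustrative.
table = :ets.new(:cache, [:set, :public, read_concurrency: true])

# Store and retrieve a value without any external cache service.
:ets.insert(table, {:greeting, "hello"})

[{:greeting, value}] = :ets.lookup(table, :greeting)
```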

Proxying requests by Lonely Planet

This example shows the ability of Elixir and the Phoenix Framework to create a project that can handle massive spikes and loads from millions of users while managing the flow of requests to external sites like booking.com and HostelWorld.

Elixir is a great solution for creating systems to manage a high volume of requests to external resources. It allows you to create a pool of workers to access external APIs and keep back-pressure systems to avoid overwhelming your partner websites, which is an essential part of a successful system of this nature.
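As a rough sketch of that idea (the worker function and URLs below are made up; a production system would more likely use a dedicated pool or GenStage pipeline), Task.async_stream/3 with a max_concurrency limit gives a simple bounded worker pool in Elixir:

```elixir
# Illustrative back-pressure sketch: at most 4 "requests" run at once,
# no matter how many URLs are queued. simulate_call/1 stands in for a
# real HTTP call to a partner API.
simulate_call = fn url -> {url, :ok} end

urls = for n <- 1..10, do: "https://partner.example/api/#{n}"

results =
  urls
  |> Task.async_stream(simulate_call, max_concurrency: 4, timeout: 5_000)
  |> Enum.map(fn {:ok, res} -> res end)
```

The max_concurrency option is what prevents a burst of incoming traffic from being forwarded, unthrottled, to the partner site.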

In addition, the Phoenix Framework is a very complete system for building extremely complex websites. It makes extensibility and documentation easy, and helps you avoid boilerplate by minimising the lines of code needed. This allows you to focus on the implementation of integrations with third-party platforms.

GraphQL by Financial Times

The Financial Times made fantastic use of the Absinthe framework in Elixir, and of Elixir’s meta-programming ability to create DSLs (Domain Specific Languages). Ellis Pritchard, the Senior Software Developer at the time, said:

“I found developers picked up Elixir basics very rapidly if there is an existing framework of code to work within at first since there are patterns they can spot and replicate. Good developers love learning new things, and are adept at switching between languages and frameworks; plus the learning curve for Elixir, certainly within an existing code base, seems low”.

They use Elixir for microservices in their system. You can check their GitHub account for more details and take advantage of the elements they share which are battle-tested in production.

Mobility Services by Toyota Connected

Toyota Connected launched its Mobility Service Platform (MSPF) which connects their cars, allowing them to send real-time events. Functional uses for this include sharing information on traffic patterns or driver behaviours.

It is a platform that is expected to have millions of connected vehicles sending events to the cloud regularly. This kind of concurrent traffic makes Elixir an obvious choice for the project. In addition, Toyota Connected features an API that third-parties can use to create apps and website integrations. This allows third parties to take advantage of the data and create a better user experience for their drivers.

A great example of this is the Hui service in Hawaii, which uses the Toyota Connected system to run a keyless car-share service.

Custom Sports News by Bleacher Report

Bleacher Report is a sports news service that serves 1.5 billion pages per month to its users. They offer a customised service to ensure push notifications are relevant to each individual. That in itself is difficult because it means you cannot cache the content between users.

Much like Lonely Planet, Bleacher Report also has to consider how they send requests to third-parties (mainly Google and Apple) to reach their users without limiting the throughput or overwhelming their providers.

When transitioning from Ruby on Rails to Elixir, they were able to shift from using 150 servers down to just 5, all while improving response times for their 200m+ daily push notifications.

Microservices for Games by SquareEnix

The increase in massively multiplayer online gaming has seen a huge shift in the way games are developed. Now, digital environments must consider how they can handle millions of concurrent players at once. To handle this, SquareEnix is using Elixir for authenticating players, in-game communication and the CMS (Content Management System).

These are services which are common to several titles because they rely on the same company and are usually a critical point of convergence for the traffic of the system. Elixir helps provide a healthy throughput, dealing with all of the input requests and releasing the session, authentication and profile data from other services without collapsing the system.

Chat for gamers by Discord

Another huge use for Elixir in gaming is chat systems. Discord, a leading provider of chat for gamers, has built its systems in Elixir. Their app can allow huge groups (up to 300 thousand in theory) to join the same voice call. You can read about how they upgraded to be able to handle 11 million concurrent users here.

ECommerce by PepsiCo

The ECommerce team at PepsiCo told ElixirConf US 2019 “We view Elixir – which is a core part of our software stack – as a nimble and reliable building block for powering critical business solutions.” It is fantastic to see companies of PepsiCo’s size proudly speaking about the benefits of Elixir. We hope to add many more companies of their size to this list over the coming years.

Is it time for you to join the pack?

Leading companies in a variety of industries are already taking advantage of Elixir. They’re doing it to reduce the cost of their physical hardware, to improve the reliability of their sites when dealing with high volumes of concurrent users, to scale up data while reducing the time needed to access it, to allow for fault-tolerant third-party integration even when dealing with high volumes of traffic, and to create reliable microservices, even when dealing with extremely high numbers of requests.

Even if the above requirements aren’t needed for your current project, you can still benefit from the future-proofing, ease of use, available documentation and tooling that Elixir offers.

Add your company to the blog

If you’re part of the pack already using Elixir and want to see your company featured in this blog, fill out this form.


Want to see more great Elixir and Erlang success stories? Head to our hub where we’ll be adding videos, case studies, blogs and more.


Erlang Innovation Race: The Winners!

On Saturday 10 million people turned on their tellies to watch the Grand National race, and we wanted to celebrate this beloved British tradition in our own way. Alas, we know little about horses and racing, but we take pride in knowing a lot about technology. We picked some pretty cool Erlang companies and over the weekend we allowed everybody to push them ahead in the race with one click. Rumor has it that our office manager played the bookie and ran the bets in the office, but we can neither confirm nor deny these allegations :P Before we review the winner, take a look at the companies in the race and their use of Erlang:

Klarna: One of Europe’s leading payment solutions providers. Klarna was valued at $2.25 billion before their U.S. launch in 2015. They use Erlang for concurrent, realtime domains, along with Riak for storing data where availability trumps consistency.

Grindr: The world’s largest gay social network, recently announced they geared up their scalable platform to expand from app into lifestyle service. They use Erlang in their core platform and MongooseIM in their chat.

Fermilab: There are very few things cooler or more innovative than a particle accelerator. The USA’s particle accelerator lab uses Erlang extensively in its control systems.

Ericsson: The birthplace of Erlang. Ericsson now supports and maintains it through their OTP unit. The company turned 140 years old last week, and their innovation accomplishments speak for themselves.

Machine Zone: The mobile gaming phenomenon announced yesterday the launch of RTplatform, a cloud platform that “connects and engages people-devices-data simultaneously in real time…at unparalleled speed, capacity and affordability.” They use Erlang extensively, including in their Game of War chat app.

Sqor Sports: This startup uses Erlang for their backend application and MongooseIM in their chat app. They set out to revolutionise how sports fans interact with athletes, and after taking over the US they are now expanding to Europe, starting with a Bayern Munich FC partnership.

bet365: The UK’s leading online betting and gambling company switched to Erlang 3 years ago and never looked back. They have become such fierce Erlang advocates that they recently open-sourced some of their Erlang tools. (Thank you bet365!)

Aol: Their leading real-time ad exchange platform, Marketplace, revolutionised the RTB market in 2013. The platform is a merger of Erlang, Java and C++ and serves up to 40 billion auction requests per day.

Whatsapp: The poster child of Erlang companies serves over 1 billion users with a team of 50 engineers. They recently switched to end-to-end encryption, possibly the world’s largest-ever implementation of this standard of encryption in a messaging service.

EE: As of 12 February 2016, EE’s 4G network reaches more than 95% of the UK population, with double speed 4G reaching 80%. Erlang/OTP has been in use at EE for the past 15 years.

Why an “Innovation Race”? The importance a company gives to innovation is a sure marker of its growth and success, and the world’s most successful companies are the most innovative ones. What drives innovation? According to the Boston Consulting Group, there are 4 interrelated factors: quickly adopting new technologies; leveraging existing technology platforms to expand into new sectors; lean R&D processes; and adopting technology across the entire business instead of keeping it only in the IT department. Erlang is not exactly a new technology, but it seems that more and more companies are just discovering it. Initially developed for telecoms, Erlang is finding more and more usage in new sectors such as those above, and there are excellent use cases for Big Data and the Internet of Things.

For many companies, getting past a certain number of users means that the ability to innovate quickly is lost. Precious time is spent constantly rewriting software so it runs at the scale needed to run the business, and getting a new innovation to scale often means that they can’t get it out of the door fast enough. Here’s where Erlang comes in handy: its support for millions of lightweight processes and its functional paradigm allow companies to build massively concurrent and fault-tolerant applications quickly, with a profound reduction in the complexity of the code base.

At the end of the day, for any company, innovation leadership translates into profit. Two of the world’s hottest startups on the ‘Unicorn Valuation’ list know a thing or two about Erlang: 10% of the 230 billion dollars generated by the top 20 unicorns came from Whatsapp and MachineZone. In addition, many of the 20 companies on the list that are not using Erlang themselves use a component in their stack written in Erlang, like RabbitMQ, Ejabberd or Riak.

And now, here are the winners of our Erlang Innovation Race:

1st place Whatsapp: 36.51% of votes

2nd place Ericsson: 22.22%

3rd place bet365: 12.70%


We can’t wait to see who will run the Erlang race next year!



Clever use of persistent_term

This blog post will go through three different uses of persistent_term that I have found since its release, and explain a bit about why they work so well with persistent_term.

Global counters

Let’s say you want to have some global counters in your system; for example, the number of times an HTTP request has been made. If the system is very busy, that counter will be incremented many times per second by many different processes. Before OTP-22, the best way that I know of to get the best performance is by using striped ETS tables, i.e. something like the code below:

incr(Counter) ->
    %% Each scheduler updates its own slot to avoid lock contention.
    %% Assumes the table and the per-scheduler slots were created beforehand.
    ets:update_counter(?MODULE, {Counter, erlang:system_info(scheduler_id)}, 1).

read(Counter) ->
    %% Sum all the per-scheduler slots to get the total.
    lists:sum(ets:select(?MODULE, [{{{Counter, '_'}, '$1'}, [], ['$1']}])).

The code above makes sure that there is very little contention on the ETS table, as each scheduler gets a separate slot in the table to update. This comes at the cost of more memory usage, and of the fact that when reading the value you may not get an exact result.

In OTP-22 the same can be achieved by using counters. Counters have built-in support for striping by using the write_concurrency option, so we don’t have to write our own implementation for that. They are also faster and use less memory than ets tables, so lots of wins.
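In Elixir syntax, a minimal sketch of the counters API looks like this (a single counter at index 1; note that with the :write_concurrency option, reads from other processes may briefly lag behind concurrent writes):

```elixir
# counters: a fixed-size array of 64-bit integers with built-in striping.
ref = :counters.new(1, [:write_concurrency])

:counters.add(ref, 1, 1)
:counters.add(ref, 1, 41)

# Reading from the same process sees our own writes.
value = :counters.get(ref, 1)
```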

The remaining problem then is finding the reference to the counter. We could put it into ets and then do an ets:lookup_element/3 when updating a counter.

cnt_incr(Counter) ->
    %% The counter reference is stored in ets as {Counter, Ref}.
    counters:add(ets:lookup_element(?MODULE, Counter, 2), 1, 1).

cnt_read(Counter) ->
    counters:get(ets:lookup_element(?MODULE, Counter, 2), 1).

This gives a performance degradation of about 20%, so not really what we want. However, if we place the counter in persistent_term, as in the code below, we get a performance increase of about 140%, which is much more in line with what we wanted.

cnt_pt_incr(Counter) ->
    counters:add(persistent_term:get({?MODULE, Counter}), 1, 1).

cnt_pt_read(Counter) ->
    counters:get(persistent_term:get({?MODULE, Counter}), 1).

The reason for this huge difference is that when the counters are placed into persistent_term, they are stored as literals, which means that at each increment we no longer have to make a copy of the counter’s reference. This is good for two reasons:

1) The amount of garbage will decrease. In my benchmarks the amount of garbage generated by cnt_incr is 6 words while both ets_incr and cnt_pt_incr create 3 words.

2) No reference counts have to be modified. What I mean by this is that the counter’s reference is what is called a magic reference or NIF resource. These references work much in the same way as reference-counted binaries, in that they are not copied when sent to different processes. Instead, only a reference count is incremented at copy time, and then decremented later by the GC. This means that for cnt_incr we actually have 3 counters that are modified for each call: first we increment the reference count on the counter when copying from ETS, then we update the actual counter, and then eventually we decrement the reference count. If we use persistent_term, the term is never copied, so we don’t have to update any reference counts; instead we just have to update the actual counter.

However, placing the counter in persistent_term is not trouble-free. In order to delete or replace the counter reference in persistent_term, we have to do a global GC, which, depending on the system, could be very expensive.

So this method is best used only for global persistent counters that will never be deleted.

You can find the code for all the above examples and the benchmark I ran here.

Logger level check

In logger there is a primary logging level that is the first test to be done for each potential log message. This check can be done many times per second and needs to be very quick. At the time of writing (OTP-22), logger uses an ets table to keep all of its configuration, which includes the primary logging level.

This is not really ideal as doing a lookup from the ets table means that we have to take a read-lock to protect against parallel writes to the value. Taking such a read lock is not terribly expensive, but when done thousands of times per second it adds up.

So in this PR I’ve used persistent_term as a cache for the primary logging level. Now when reading the value from the hot path logger will instead use persistent_term. This removes all locks from the hot path and we only need to do a lookup in the persistent_term hash table.

But what if we need to update the primary logger level? Don’t we force a global GC then? No, because the small integer representing the primary logger level is an immediate. This means that the value fits in one machine word and is always copied in its entirety to the calling process. Which in turn means that we don’t have to do a global GC when replacing the value.
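A small sketch of that idea in Elixir syntax (the key name here is made up): because a small integer is an immediate, replacing it in persistent_term is cheap and triggers no global GC:

```elixir
# Cache a logging level (a small integer, i.e. an immediate).
:persistent_term.put({:my_app, :log_level}, 3)

level = :persistent_term.get({:my_app, :log_level})

# Replacing an immediate with another immediate is cheap:
# no process can hold a heap reference into the old value,
# so no global GC is required.
:persistent_term.put({:my_app, :log_level}, 4)
```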

When doing this we have to be very careful so that the value does not become a heap value as the cost of doing an update would explode. However, it works great for logger and has reduced the overhead of a ?LOG_INFO call by about 65% when no logging should be done.

Large constant data

We use an internal tool here at the OTP team called the “ticket tool”. It basically manages all of the OTP-XYZ tickets that you see in the release notes that come with each release of Erlang/OTP. It is an ancient tool from the late ’90s or early ’00s that no one really wants to touch.

One part of it is a server that contains a cache of all the 17000 or so tickets that have been created through the years. In that server there is a single process that has each ticket and its state in order to speed up searching in the tickets. The state of this process is quite large and when it is doing a GC it takes somewhere around 10 seconds for it to finish. This means that about every 10 minutes the server freezes for 10 seconds and we get to experience the joy of being Java programmers for a while.

Being a VM developer I’ve always thought the solution to this problem is to implement either an incremental GC or at least a mark and sweep GC for large heaps. However, the ticket tool server has never been of high enough priority to make me spend a year or two rewriting the GC.

So, two weeks ago I decided to take a look, and instead I used persistent_term to move the data from the heap into the literal area. This was possible because I know that the majority of tickets are only searched and never changed, so they will remain in the literal area forever, while the tickets that do get edited move onto the heap of the ticket server. Basically, my code change was this:

handle_info(timeout, State) ->
  erlang:start_timer(60 * 60 * 1000, self(), timeout),
  %% Assumption: the elided lines store the state as a persistent term
  %% and keep using the stored copy, which now lives in the literal area.
  persistent_term:put(?MODULE, State),
  {noreply, persistent_term:get(?MODULE)}.

This small change puts the entire gen_server state into the literal area and then any changes done to it will pull the data into the heap. This dropped the GC pauses down to being non-noticeable and took considerably less time to implement than a new GC algorithm.


Personal notes from Elixir Conf 2019

Last week I attended Elixir Conf 2019 in Aurora, Colorado. It was my fourth Elixir Conf and by far the one where I engaged the most with people I had never talked to before; the hallway track was great and I could meet again some conference buddies from other editions. Another highlight was the conference hotel, a brand-new resort close to Denver. Whoever knows me well understands how I like anything Colorado, so I am glad the next edition will be in the same location.

The conference format was similar to previous years: three tracks over two days, with two extra days for optional training, which I didn’t attend this time. The conference had four keynotes, by José Valim (Elixir creator), Chris McCord (Phoenix creator), Justin Schneck (Nerves co-creator) and the Dockyard team introducing Lumen, plus great talks.

All the talks and keynotes are available in the Elixir Conf YouTube channel. Each keynote focused on a few areas: the Elixir language and its future, LiveView, Nerves, and the brand-new Lumen, a project from Dockyard that runs Elixir in the client (browser).

I always like to take notes when attending conferences, and these are my highlights for Elixir Conf 2019. Please be advised that these notes are written like reminders of things I considered relevant during the talks; they are not a summary of them by any means. As the conference had many tracks, of course I couldn’t attend all the talks, so feel free to comment with your favorite talk and notes:

Keynote: José Valim

Before anything, thank you very much José for the incredible language; it is a pleasure working full-time with Elixir and sharing the same community with great people, and I extend that to all the Elixir core team as well.

Goal for 2019: streamline Elixir in production

The main goal is to make it easier to put applications in production, and to ensure that what is running in production can be inspected.

The two main pieces are releases and telemetry.


  • Elixir 1.9 introduced release commands as part of the language, with lots of lessons and concepts from Distillery;
  • shellcheck to test shell scripts;


The telemetry library is composed of many pieces and helps snapshot what is going on inside running applications through metrics from the BEAM and custom ones.

  • telemetry, telemetry_poller, telemetry_metrics, and telemetry reporters;
  • Phoenix 1.5 will come with a Telemetry module to help define application specific metrics;
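
As a small, hedged illustration of how these pieces fit together (the event name, handler id and numbers below are invented), attaching a handler and emitting an event with the core telemetry library looks roughly like this:

```erlang
%% Sketch: attach a handler, then emit an event from instrumented code.
%% Requires the `telemetry` library as a dependency.
ok = telemetry:attach(
       <<"log-query-time">>,                 %% unique handler id
       [myapp, repo, query],                 %% event name
       fun(_EventName, #{duration := D}, _Metadata, _Config) ->
           io:format("query took ~p~n", [D])
       end,
       undefined),

%% Somewhere in the instrumented code:
ok = telemetry:execute([myapp, repo, query],
                       #{duration => 1500},
                       #{source => <<"users">>}).
```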

What is next for Elixir?

The next release, 1.10, will come with a different setup for its CI, now using Cirrus CI. It will also contain compiler tracing and ExUnit pattern diffing.

What is next for José Valim?

Now that the language enhancements are reaching a very stable point, and community projects are being built and showcasing the power of the language, efforts will be directed more to lower-level work (HiPE, Lumen and Hastega) and code analysis.

Keynote: Chris McCord

At Lonestar Elixir Conf 2019, Chris McCord presented the then-private project LiveView, which enables rich, real-time user experiences with server-rendered HTML. Right after the conference, the project was made available to the public and since then a collection of projects has been showcasing how interesting and powerful LiveView is. That includes Phoenix Phrenzy, a contest for LiveView projects.

In this keynote, he presented a few interesting points of LiveView and also what is coming next and the reasons behind it.

LiveView library

  • template engine;
  • tests;
  • navigation: live_redirect, live_link and handle_params for pagination URL changes;
  • prepend, append, ignore and replace updates (phx-update);
  • JS hooks for when you need just a piece of extra JavaScript;
  • IE11 support;

Coming soon

  • phx-debounce;
  • file uploads;
  • exception translation;
  • stash for client state;
  • automatic form recovery;

Keynote: Justin Schneck

In his keynote, Justin Schneck, again, gave a really nice demo to show the improvements Nerves is getting.


  • operating system: Erlang VM and Linux kernel;
  • application domain: Nerves runtime allows other than Elixir, and resilience from Erlang VM;
  • application data for specific data storage;
  • read-only filesystem, immutable;
  • whitelist approach (build up) with minimal output;

Nerves Hub

  • CryptoAuthentication chip;
  • NervesKey: write once;
  • delegated auth;
  • certificate;
  • remote console;
  • Terraform scripts to host your own Nerves Hub;

Keynote: Brian Cardarella, Paul Schoenfelder and Luke Imhoff

Introducing Lumen

Lumen is a new project from Dockyard that uses Elixir in the client. It uses WebAssembly and Rust as well in the compiler.


The compiler has some unique constraints such as code size, load time and concurrency model.

Why not BEAM?

  • Runtime: unavailable APIs, incompatible scheduler, JS managed types;
  • Code size: BEAM bytecode is expensive, weak dead-code elimination;
  • Performance: VM on a VM, JS engine unable to reason about bytecode;

New Compiler

  • Restrictions: no hot-code, allow dead-code elimination;
  • Ahead of time vs VM: only pay for what you use, no interpretation overhead;
  • Build on existing tools: LLVM, Rust, wasm-bindgen;
  • Key challenges: recursion and tail-call optimization, non-local returns, green threads, WebAssembly-specific limitations;
  • Continuations: represent jumps as calls to a continuation, never return, always moving forward;

Accepts source in Erlang and Core Erlang, including a mix task.

Middle tier
AST lowered to EIR.

Lowers from EIR to LLVM IR and generates object files.

Future goals
Push more data to LLVM, bootstrapping, MLIR, and embedded targets.


Memory management
BEAM and Rust, with property-based testing.

Memory model
Process heap and per-process garbage collection.

Very similar to what we have in Elixir: call stack, mailbox, pid, memory, links, and so on.

One scheduler per thread: the main thread and web workers.

WebAssembly main thread
Calls are blocking, scheduler needs to run timers.

Interacting with web
JS calling Lumen, and Lumen calling JS.

Why Lumen?

  • Joe's paper about GUI in functional programming;
  • Optimize front-end development;
  • GUI is concurrent by nature, we could have a supervisor taking care
    of a DOM tree;

Phoenix LiveView Demystified: Alex Garibay

In this talk Alex showed some LiveView internals, how it works and why it works so well.

LiveView EEx

From template with sigils to AST

  • static/vars and dynamic code;
  • %Phoenix.LiveView.Rendered{} with static, dynamic and fingerprint;
  • %Phoenix.LiveView.Comprehension{} to optimize data sent to the client;

Mounting the DOM

  • router, controller or template with live and live_render macros;
  • the rendered template has a few ids for channels and sessions;
  • the container <div> that receives the template can be configured to be any HTML tag;

Phoenix Channels

  • uses Phoenix.LiveView.Socket;
  • uses a specific channel "lv:*";
  • socket receives session data and potentially user data;
  • client triggers an event that is sent to the socket using %Phoenix.Socket.Message{};
  • channel handles the event with callbacks;


  • import LiveView library;
  • instantiate and connect LiveSocket;
  • a JS view is created with LiveView container, socket and so on;
  • each JS object in the view has static and dynamic data that is constructed for every change;
  • uses Morphdom.js to re-render the changes in the DOM;

WebRTC from start to finish: Scott Hamilton

WebRTC (Web Real-Time Communication) is a free, open-source project that provides web browsers and mobile applications with real-time communication (RTC) via simple application programming interfaces (APIs). It allows audio and video communication to work inside web pages by allowing direct peer-to-peer communication, eliminating the need to install plugins or download native apps.

Janus is a general purpose WebRTC server that has an Elixir client available.


  • spec and project;
  • basic implementation in P2P;
  • terminology: STUN, TURN and SDP;


  • general purpose WebRTC gateway;
  • JS, websocket;
  • 101: session, plugin, peer connection, handle;
  • resources to learn it are easy to find;

Elixir Phoenix Janus

Phoenix as middleman between client and Janus.

What could go wrong?

  • Janus configuration: ICE trickle;
  • Janus on Docker;
  • deployment;
  • translation from Janus response to Elixir terms;
  • mixing HTTP and WS calls;

Elixir + CQRS - Architecting for availability, operability, and maintainability at PagerDuty: Jon Grieman

PagerDuty has a feature that records all incident user logs, and they use the CQRS pattern to design that piece of functionality.

They use composite keys for tracing that can be ordered. Other benefits of the approach include separation for monitoring and scaling.

Overall incident log system

  • upstream systems;
  • Kafka;
  • creator;
  • DB cluster;
  • querier;
  • client systems;

DB incident recovery

  • whole stack that could be duplicated in region;
  • replace the DB engine entirely after back to normal;
  • operational benefits coming from ES and CQRS;

Date, Time, and Time Zones in Elixir 1.9: Lau Taarnskov

Handling date and time is a challenge in any language; in this talk we see the details behind Calendar in Elixir, which had an upgrade in version 1.9.

Differences between UTC, TAI and the concept of layers of wall time.

Elixir Calendar, Date and Time specifics

  • sigils: ~D, ~T, ~N, ~U;
  • choose the type by the meaning, not convenience;
  • NaiveDateTime vs DateTime date/time comparison;
    - no data is better than fake/assumed data (JS and Ruby, for example);
    - correct data is better than assumptions;
  • TimeZoneDatabase behaviour, for example tzdata;
  • check date time stored to verify they are appending Z;

Mint, disrupting HTTP clients: Andrea Leopardi

Mint is a reasonably new low-level HTTP client that aims to provide a small and functional core that others can build on top of.

Story and features

The initial need was caused by a potential removal of httpc from the Erlang standard library. Mint is a processless client that defines a wrapper data structure for the connection, on top of gen_tcp. Mint knows how to handle raw bits and also the HTTP protocol.

Streaming by default
Responses arrive asynchronously (status, headers, body and so on).


  • httpc is not safe by default;
  • hackney is safe by default but can be overridden if you work with `ssl_options`;
  • mint is safe by default;
  • SSL certificate store with castore;

Mint has proxying support for request and tunnel proxy types.


  • multiplexed streams;
  • server push;
  • backpressure;

Use cases

GenServer, GenStage, GenStatem with connection data. Also, it can be used as one process with many connections.

Challenges and future plans

  • immutability;
  • low level but usable;
  • increase adoption;
  • pooling;
  • websockets;
  • gRPC;

BEAM extreme: Miriam Pena

In her talk, Miriam showed us things to consider to improve performance, and also warned us that any tuning of the VM should be done only when needed, for very specific cases. Besides that, performance measuring should be done over a long time, and not using IEx.

One of the things she mentioned is that memory copying, something that happens a lot in the BEAM, brings CPU overhead.

Pool bottleneck

  • use processes directly instead of GenServer, 80% faster;
  • leverage process priority levels;
  • as a con, it hurts code readability;

Key-value storage

  • process dictionary: hard to debug, tuple performance as an example;
  • code generation;
  • persistent term: access in constant time, write once read many, no copying to the process heap, tuple performance as an example;

NIFs


  • for when all the rest fails;
  • extreme memory usage;
  • no VM guarantees;

Another suggestion is to keep the OTP version as up to date as possible, as new releases are always improving performance.

Contracts for building robust systems: Chris Keathley

In this talk, Chris presents some insights into why Design by Contract should be considered and how his new library Norm can help projects with data specification and generation.

Breaking changes require coordination

  • refactor;
  • requirement (requires no breaking changes);
  • technology;


  • enforcing callers are correct (pre-conditions);
  • ensure the function is correct (post-conditions);
  • ExContract;

Data specification

  • Norm;
  • using in combination with ExContract;
  • supports schema, optionality;

Testing strategy

  • Norm uses stream data to generate data for testing;
  • can be used with property-based test;

Writing an Ecto adapter, introducing MyXQL: Wojtek Mach

Even though I'm not personally interested in MySQL, as I prefer and use only Postgres, I wanted to know more about the internals of Ecto and how it uses its adapters to connect to and interact with databases.


  • :gen_tcp for database connection;
  • library binpp;
  • Wireshark app;

MySQL packet

  • payload length;
  • sequence id;
  • packet payload;
  • leverages Elixir binary pattern matching;
  • initial handshake package;
  • handshake response;
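
The bullets above lend themselves to a short sketch: every MySQL packet starts with a 3-byte little-endian payload length and a 1-byte sequence id, which maps directly onto a binary pattern (module and function names below are mine, not MyXQL’s):

```erlang
%% Sketch of decoding one MySQL packet with binary pattern matching.
-module(mysql_packet_sketch).
-export([decode/1]).

decode(<<Len:24/little, SeqId:8, Payload:Len/binary, Rest/binary>>) ->
    {ok, #{length => Len, seq_id => SeqId, payload => Payload}, Rest};
decode(Bin) when is_binary(Bin) ->
    {more, Bin}. %% not enough bytes for a full packet yet
```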

Building the adapter

  • encoding and decoding: function for every data type, OK_Packet with defrecord;
  • DBConnection behaviour: maintain a connection pool, does not overload the database, reconnect, support to common database features, and it needs to be fast;


  • start N connections using DBConnection based on the pool configuration;
  • fetching results preparing and executing the query;
  • other functions as disconnect, checkout, ping, and so on;

Ecto integration

  • all the features Postgres adapter has;
  • implements Ecto.Adapter, Ecto.Adapter.Queryable, Ecto.Adapter.Schema, Ecto.Adapter.Storage, and Ecto.Adapter.Transaction;
  • constraints;
  • data normalization: boolean as an example, as in MySQL it is set as 1 and 0;
  • integration tests;

Kubernetes at small scale: Phil Toland

Kubernetes has great benefits, even if it is not so easy to implement. Some of the benefits are improved resource efficiency, reduced cost, and operational scalability. In this talk Phil described his process of implementing Kubernetes at Hippware.

Main components

  • control plane (leave it alone);
  • workers;
  • workload:
    - pod;
    - deployment (how your pod runs in the cluster);
    - service (expose a pod to outside, expose a port);
    - config map;

Lessons learned

  • outsource as much as possible;
  • automate all the things;
  • pods are ephemeral;
  • automatic clustering: via libraries such as libcluster and peerage;
  • deploying: via libraries such as kubernetes-deploy and swarm;
  • one instance per node: anti-affinity specification;

ETS Versus ElasticSearch for Queryable Caching: David Schainks

In this talk David compares the characteristics of ElasticSearch, which is well-known as a great solution for search and cached data, with Elixir/Erlang out-of-the-box solutions such as ETS and Persistent Term, listing the pros and cons of each option.

ElasticSearch


  • filtering, auto completion and full text search;
  • performant;
  • queryable cache;
  • operational cost: another DSL, integration time, expensive, and configuration gotchas;

ETS


  • no garbage collection;
  • use cases: preset configuration, cached data, message buffer;
  • filtering with match/2, match_object/1 and fun2ms/1;
  • auto completion with fun2ms/1;
  • full text search: using Joe Armstrong's elib1 library;
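
To make the filtering bullets concrete, here is a small hedged sketch (the table and its contents are made up) of querying ETS with a match spec built by ets:fun2ms/1:

```erlang
%% Sketch: filtering an ETS table with a compiled match spec.
%% The ms_transform include is required for ets:fun2ms/1 to work.
-module(ets_filter_sketch).
-include_lib("stdlib/include/ms_transform.hrl").
-export([demo/0]).

demo() ->
    Tab = ets:new(people, [set]),
    true = ets:insert(Tab, [{1, "alice", 34}, {2, "bob", 29}]),
    %% Names of everyone younger than 30:
    MS = ets:fun2ms(fun({_Id, Name, Age}) when Age < 30 -> Name end),
    ["bob"] = ets:select(Tab, MS),
    ok.
```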

Real world concerns

  • performance with large data sets;
  • data ranking;

Operational concerns

  • high availability;
  • synchronization;
  • index changes;

Persistent term

  • no filtering;
  • much faster than ETS;
  • garbage collection on update;

UI Testing is Ruff; Hound Can Help: Vanessa Lee

Whether you call it UI testing, end-to-end testing, end-to-user testing, or acceptance testing, it is often an intensely manual and time-consuming process. An Elixir library, Hound, can carry some of the load through browser automation.

Hound takes screenshots of tests and stores them in the test folder.

Library test helpers

  • check page elements;
  • fill, execute actions and submit form;
  • manage sessions;

Property-based testing

  • combining Hound with property-based libraries is very straightforward;
  • using a custom StreamData generator with the bind/2 helper;

While handy, there are some gotchas when elements are not ready or if the application is heavily dependent on JavaScript events/side effects.


How to Comprehend Comprehensions

Particularly for Erlang

Good Will Hunting (1997)

So, I Gusti Ngurah Oka Prinarjaya was reading Joe’s Book and he found one of the most amazing examples of List Comprehensions I’ve ever seen…

perms([]) -> [[]];
perms(List) -> [ [H|T] || H <- List, T <- perms(List--[H]) ].


1> lib_misc:perms("123").
["123","132","213","231","312","321"]

And, of course… he couldn’t understand it. And, as a seasoned Erlang trainer, I’ve got to tell you: he’s not alone… by any means. After a session of the BeamBA Meetup where I casually tried to explain something even simpler than this, we had to schedule another meetup talk just to explain how recursion works and how to think recursively.

So, let me try to explain how recursion and list comprehensions work together in perms/1, how one can end up with a function like that, and how to read it. Let’s see how far I can get in a blog post… and Robert Virding (I know you’re a fan of perms/1) please point out the mistakes I’ll surely make ;)

How to write a function like that?

So, let’s first think of how we would come up with such a beautiful function.

The requirement here is pretty simple:

The function perms/1 should receive a single argument (a list) and return the list of all possible permutations of it.

To understand what it does, let’s write a test for it, shall we? (I’ll use strings, since that’s what Joe used in his book, but any list will do)

test() ->
  [[]] = perms(""),
  ["a"] = perms("a"),
  ["ab", "ba"] = perms("ab"),
  ["abc", "acb", "bac", "bca", "cab", "cba"] = perms("abc").

Cool, if we try to run that test it will tell us that we need to implement the function…

1> c(joe), joe:test().
** exception error: undefined function joe:perms/1
in function joe:test/0 (joe.erl, line 6)

We can start with the base case then… The only permutation of an empty list is itself.

perms([]) -> [[]].

That moved us one step ahead. Cool!

2> c(joe), joe:test().
** exception error: no function clause matching joe:perms("a") (joe.erl, line 13)
in function joe:test/0 (joe.erl, line 7)

Now, for the recursive step…

When writing recursive functions, we must assume that we know how to apply the function to a smaller input. In our case, we’re working with lists, so a smaller input can be the tail of the list (since it’s a smaller list). That’s why my first step when writing this code would be something like this…

perms([H|T]) -> … perms(T) ….

In other words, I take the head of the list (H) and apply the function recursively to its tail (T). Now we have to figure out how to move from the list of permutations for T to the list of permutations for [H|T] .

So, let’s say H=a, T=[b,c], then perms(T) == [[b,c], [c,b]] and we want to get to [[a,b,c], [a,c,b], [b,a,c], [c,a,b], [b,c,a], [c,b,a]]. 🤔

The first two are easy to build: [ [H|P] || P <- perms(T) ]. So far, so good. Let’s try it and see what the tests have to say about it.

3> c(joe), joe:test().
** exception error: no match of right hand side value ["ab"]
in function joe:test/0 (joe.erl, line 8)

Right… perms("ab") is not equal to ["ab", "ba"] , it’s just ["ab"]. We’re just adding the permutations that have the elements in order, since we’re traversing the list from left to right. We need a different way to reduce our list.

So far we had perms(T) be the list of permutations of the elements in the tail of the list. What if we had access to the list of all permutations of any sublist of the original list ([H|T]) with one element less (i.e. what if for [a,b,c] we could build the list of all permutations for [a,b], the list of all permutations for [b,c], and the one for [a,c]). In that case, to build the final result we will only need to add the missing element to their heads.

Too complex? Well… let’s go step by step… First let’s build all the sublists…

perms(List) -> [List -- [H] || H <- List].

Let’s test that on the shell for a second…

4> c(joe), joe:perms("ab").
["b","a"]
5> c(joe), joe:perms("abc").
["bc","ac","ab"]
6> c(joe), joe:perms("abcd").
["bcd","acd","abd","abc"]

Nice! So… the following code won’t work, but if we could compute the permutations for each of those sublists like this…

perms(List) -> [perms(List -- [H]) || H <- List].

…we would end up with something like…

4> c(joe), joe:perms("ab").
[["b"],["a"]]
5> c(joe), joe:perms("abc").
[["bc","cb"], ["ac","ca"], ["ab","ba"]]

Let me first apply a neat trick here to flatten the list. Something we can easily do in List Comprehensions by just using multiple generators.

perms(List) -> [T || H <- List, T <- perms(List -- [H])].

Again, using a built-up example since the code won’t actually work like this…

4> c(joe), joe:perms("ab").
["b","a"]
5> c(joe), joe:perms("abc").
["bc","cb", "ac","ca", "ab","ba"]

We’re really close there. To get from ["b","a"] to the actual result we want (["ab", "ba"]) we just need to prepend each list with the element we originally removed (which is H in our code). Let’s try that!

perms([]) -> [[]];
perms(List) -> [ [H|T] || H <- List, T <- perms(List--[H]) ].

7> c(joe), joe:test().

Well… that’s Joe’s code, isn’t it? What did you expect? 😉

How to read a function like that?

Well, that was fun, but what if you are faced with a function like that and your job is to understand it, not to write it? Well, a neat technique you can use is to compute the reductions manually or with the debugger. I’ll do it manually here.

What we’ll be doing is basically emulating the VM’s work by hand, one step at a time. So, let’s start with a sample expression and let’s see what the VM would do. To be clear: these are not the actual reductions in the VM. I took some liberties, mostly because LCs are just syntactic sugar. But I hope you understand what I’m showing here…

% Second clause of perms/1
[ [H|T] || H <- "123", T <- joe:perms("123" -- [H])].
% Expansion of the first generator
[ [$1|T] || T <- joe:perms("123" -- "1") ] ++
[ [$2|T] || T <- joe:perms("123" -- "2") ] ++
[ [$3|T] || T <- joe:perms("123" -- "3") ].
% Reduction of --
[ [$1|T] || T <- joe:perms("23") ] ++
[ [$2|T] || T <- joe:perms("13") ] ++
[ [$3|T] || T <- joe:perms("12") ].
% Multiple second clauses of perms/1
[ [$1|T]
|| T <- [ [H|T1] || H <- "23", T1 <- joe:perms("23" -- [H])]] ++
[ [$2|T]
|| T <- [ [H|T1] || H <- "13", T1 <- joe:perms("13" -- [H])] ] ++
[ [$3|T]
|| T <- [ [H|T1] || H <- "12", T1 <- joe:perms("12" -- [H])] ].
% Multiple expansion of first generators and --
[ [$1|T]
|| T <- [ [$2|T1] || T1 <- joe:perms("3")] ++
[ [$3|T1] || T1 <- joe:perms("2")] ] ++
[ [$2|T]
|| T <- [ [$1|T1] || T1 <- joe:perms("3")] ++
[ [$3|T1] || T1 <- joe:perms("1")] ] ++
[ [$3|T]
|| T <- [ [$1|T1] || T1 <- joe:perms("2")] ++
[ [$2|T1] || T1 <- joe:perms("1")] ].

Let’s work on perms(“3”) alone for a second…

% Second clause of perms/1
[ [H|T] || H <- "3", T <- joe:perms("3" -- [H])].
% Expansion of the first generator
[ [$3|T] || T <- joe:perms("3" -- "3")].
% Reduction of --
[ [$3|T] || T <- joe:perms("")].
% First clause of perms/1
[ [$3|T] || T <- [[]]].
% Expansion of the generator
[ [$3|[]] ].
% List reduction (and some syntax sugar)
["3"].

And you can work with the other small lists in an analogous manner. So, going back to the original example…

% Multiple expansion of all simple perm/1's
[ [$1|T]
|| T <- [ [$2|T1] || T1 <- ["3"]] ++
[ [$3|T1] || T1 <- ["2"]] ] ++
[ [$2|T]
|| T <- [ [$1|T1] || T1 <- ["3"]] ++
[ [$3|T1] || T1 <- ["1"]] ] ++
[ [$3|T]
|| T <- [ [$1|T1] || T1 <- ["2"]] ++
[ [$2|T1] || T1 <- ["1"]] ].
% Expansion of all simple generators
[ [$1|T] || T <- [[$2|"3"]] ++ [[$3|"2"]] ] ++
[ [$2|T] || T <- [[$1|"3"]] ++ [[$3|"1"]] ] ++
[ [$3|T] || T <- [[$1|"2"]] ++ [[$2|"1"]] ].
% Reduction of cons operator
[ [$1|T] || T <- ["23"] ++ ["32"] ] ++
[ [$2|T] || T <- ["13"] ++ ["31"] ] ++
[ [$3|T] || T <- ["12"] ++ ["21"] ].
% Reduction of ++
[ [$1|T] || T <- ["23", "32"] ] ++
[ [$2|T] || T <- ["13", "31"] ] ++
[ [$3|T] || T <- ["12", "21"] ].
% Expansion of generators
[[$1|"23"]] ++ [[$1|"32"]] ++
[[$2|"13"]] ++ [[$2|"31"]] ++
[[$3|"12"]] ++ [[$3|"21"]].
% Reduction of ++ and cons
["123", "132", "213", "231", "312", "321"].

Et voilà! 🎉

What you see here is kind of the same algorithm I described above:

For each element in the original list, find all the permutations for the sublist that results from removing it, then attach the element to the head of each permutation.

It’s quite a simple algorithm to describe (and thus the resulting function is also quite simple and descriptive) but it might be hard to understand how such an algorithm actually solves the problem. I hope this article brings some light on the subject. 😃

In Other News…


We’re less than a month away from SpawnFest!
This year it will happen on September 21st & 22nd.
Register your team and join us for FREE to win several amazing prizes provided by our great sponsors!


ElixirConf is coming to Latin America for the first time!
Thanks to our friends from PlayUS Media, we’ll meet in Medellín, Colombia for ElixirConfLA on October 24th & 25th.
It will be an awesome event, you don’t want to miss it :)

Erlang Battleground

We’re still looking for writers. If you want to write on Erlang Battleground, just let us know! ✍️

How to Comprehend Comprehensions was originally published in Erlang Battleground on Medium, where people are continuing the conversation by highlighting and responding to this story.


Copyright © 2016, Planet Erlang. No rights reserved.
Planet Erlang is maintained by Proctor.