How to build a machine learning project in Elixir

Introduction to machine learning

Machine learning is an ever-growing area of interest for developers, businesses, tech enthusiasts and the general public alike. From agile start-ups to trendsetting industry leaders, businesses know that successful implementation of the right machine learning product could give them a substantial competitive advantage. We have already seen businesses reap significant benefits of machine learning in production through automated chat bots and customised shopping experiences.
Given we recently demonstrated how to complete web scraping in Elixir, we thought we’d take it one step further and show you how to apply this in a machine learning project.

The classic algorithmic approach vs machine learning

The traditional approach has always been algorithm centric. To do this, you need to design an efficient algorithm to fix edge cases and meet your data manipulation needs. The more complicated your dataset, the harder it becomes to cover all the angles, and at some point, an algorithm is no longer the best way to go. Luckily, machine learning offers an alternative. When you’re building a machine learning-based system, the goal is to find dependencies in your data. You need the right information to train the program to solve the questions it is likely to be asked. To provide the right information, incoming data is vital for the machine learning system: you need to provide adequate training datasets to achieve success. So without further ado, we’re going to walk through an example machine learning project and show how we achieved success. Feel free to follow along.

The project description

For this project, we’re going to look at an ecommerce platform that offers real-time price comparisons and suggestions. The core functionality of any ecommerce machine learning project is to:

  1. Extract data from websites
  2. Process this data
  3. Provide intelligence and suggestions to the customer
  4. Variable step depending on actions and learnings
  5. Profit

One of the most common problems is the need to group data in a consistent manner. For example, let’s say we want to unify the categories of products from all men’s fashion brands (so we can render all products within a given category, across multiple data sources). Each site (and therefore data source) will likely have inconsistent structures and names, these need to be unified and matched before we can run an accurate comparison.
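
As a toy illustration of that unification step (the category names and mapping here are invented for illustration; the real mapping would come from your scraped data), normalising inconsistent category names might look like this:

category_aliases = %{
  "Earphones" => "headphones",
  "Head Phones" => "headphones",
  "Televisions" => "tvs"
}

normalize = fn name ->
  # fall back to a lowercased, underscored form when no explicit alias exists
  Map.get(category_aliases, name, name |> String.downcase() |> String.replace(" ", "_"))
end

normalize.("Televisions")
#=> "tvs"
normalize.("TV & Audio Accessories")
#=> "tv_&_audio_accessories"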

For the purpose of this guide, we will build a project which:

  1. Extracts the data from a group of websites (in this case we will demonstrate how to extract data from the harveynorman.ie shop)
  2. Trains a neural network to recognise a product category from the product image
  3. Integrates the neural network into the Elixir code so it completes the image recognition and suggests products
  4. Builds a web app which glues everything together

Extracting the data

As we mentioned at the beginning, data is the cornerstone of any successful machine learning system. The key to success at this step is to extract real-world data that is publicly available and then prepare it into training sets. For our example, we need to gather the basic information about the products (title, description, SKU, image URL, etc.). We will use the extracted images and their categories to perform the machine learning training.

The quality of the trained neural network model is directly related to the quality of datasets you’re providing. So it’s important to make sure that the extracted data actually makes sense.

We’re going to use a library called Crawly to perform the data extraction.

Crawly is an application framework for crawling web sites and extracting structured data which can be used for a wide range of useful applications, like data mining, information processing or historical archival. You can find out more about it on the documentation page, or you can visit our guide on how to complete web scraping in Elixir.

With that explained, let’s get started! First of all, we will create a new Elixir project:

mix new products_advisor --sup

Now that the project is created, modify the deps function of the mix.exs file so it looks like this:

  # Run "mix help deps" to learn about dependencies.
  defp deps do
    [
      {:crawly, "~> 0.1"}
    ]
  end

Now, fetch all the dependencies with mix deps.get, and we’re ready to go. For the next step, implement the module responsible for crawling the harveynorman.ie website. Save the following code under lib/products_advisor/spiders/harveynorman.ex:

defmodule HarveynormanIe do
  @behaviour Crawly.Spider

  require Logger

  @impl Crawly.Spider
  def base_url(), do: "https://www.harveynorman.ie"

  @impl Crawly.Spider
  def init() do
    [
      start_urls: [
        "https://www.harveynorman.ie/tvs-headphones/"
      ]
    ]
  end

  @impl Crawly.Spider
  def parse_item(response) do
    # Extracting pagination urls
    pagination_urls =
      response.body |> Floki.find("ol.pager li a") |> Floki.attribute("href")

    # Extracting product urls
    product_urls =
      response.body |> Floki.find("a.product-img") |> Floki.attribute("href")

    all_urls = pagination_urls ++ product_urls

    # Converting URLs into Crawly requests
    requests =
      all_urls
      |> Enum.map(&build_absolute_url/1)
      |> Enum.map(&Crawly.Utils.request_from_url/1)

    # Extracting item fields
    title = response.body |> Floki.find("h1.product-title") |> Floki.text()
    id = response.body |> Floki.find(".product-id") |> Floki.text()

    category =
      response.body
      |> Floki.find(".nav-breadcrumbs :nth-child(3)")
      |> Floki.text()

    description =
      response.body |> Floki.find(".product-tab-wrapper") |> Floki.text()

    images =
      response.body
      |> Floki.find(" .pict")
      |> Floki.attribute("src")
      |> Enum.map(&build_image_url/1)

    %Crawly.ParsedItem{
      :items => [
        %{
          id: id,
          title: title,
          category: category,
          images: images,
          description: description
        }
      ],
      :requests => requests
    }
  end

  defp build_absolute_url(url), do: URI.merge(base_url(), url) |> to_string()

  defp build_image_url(url) do
    URI.merge("https://hniesfp.imgix.net", url) |> to_string()
  end
end

Here we’re implementing a module called HarveynormanIe which adopts the Crawly.Spider behaviour by defining its callbacks: init/0 (creates the initial requests the spider uses to fetch the first pages), base_url/0 (filters out unrelated URLs, e.g. URLs leading to the outside world) and parse_item/1 (converts a downloaded response into items and new requests to follow).

Now for the basic configuration. We will use the following settings to configure Crawly for our platform:

config :crawly,
  # Close spider if it extracts less than 10 items per minute
  closespider_timeout: 10,
  # Start 16 concurrent workers per domain
  concurrent_requests_per_domain: 16,
  follow_redirects: true,
  # Define item structure (required fields)
  item: [:title, :id, :category, :description],
  # Define item identifier (used to filter out duplicated items)
  item_id: :id,
  # Define item pipelines
  pipelines: [
    Crawly.Pipelines.Validate,
    Crawly.Pipelines.DuplicatesFilter,
    Crawly.Pipelines.JSONEncoder
  ]

That’s it. Our basic crawler is ready. We can now get the extracted data in JSON lines (jl) format, written to a file under the name /tmp/HarveynormanIe.jl.

Crawly supports a wide range of configuration options, like base_store_path, which allows you to store items under different locations; see the related part of the documentation here. A full review of Crawly’s capabilities is outside the scope of this blog post.
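
For example, a minimal sketch of overriding the output location might look like this (option support varies between Crawly versions, so treat this as an assumption and check the documentation for your release):

config :crawly,
  # hypothetical override: write scraped items under this directory
  # instead of the default /tmp location
  base_store_path: "/tmp/products_advisor/"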

Use the following command to start the spider:

iex -S mix
Crawly.Engine.start_spider(HarveynormanIe)

You will see the following entries amongst your logs:

6:34:48.639 [debug] Scraped "{\"title\":\"Sony MDR-E9LP In-Ear Headphones | Blue\",\"images\":[\"https://hniesfp.imgix.net/8/images/detailed/161/MDRE9LPL.AE.jpg?fit=fill&bg=0FFF&w=833&h=555&auto=format,compress\",\"https://hniesfp.imgix.net/8/images/feature_variant/48/sony_logo_v3.jpg?fit=fill&bg=0FFF&w=264&h=68&auto=format,compress\"],\"id\":\"MDRE9LPL.AE\",\"description\":\"Neodymium Magnet13.5mm driver unit reproduces powerful bass sound.Pair with a Music PlayerUse your headphones with a Walkman, "<> ...

The above entries indicate that a crawling process is successfully running, and we’re getting items stored in our file system.

TensorFlow model training

To simplify and speed up the model training process, we are going to use a pre-trained image classifier. We will take an image classifier trained on ImageNet and create a new classification layer on top of it using a transfer learning technique. The new model will be based on MobileNet V2 with a depth multiplier of 0.5 and an input size of 224x224 pixels.

This part is based on the TensorFlow tutorial on how to retrain an image classifier for new categories. If you followed the previous steps, then the training data set has already been downloaded (scraped) into a configured directory (/tmp/products_advisor by default). All the images are located according to their category:

/tmp/products_advisor  
├── building_&_hardware  
├── computer_accessories  
├── connected_home  
├── headphones  
├── hi-fi,_audio_&_speakers  
├── home_cinema  
├── lighting_&_electrical  
├── storage_&_home  
├── tools  
├── toughbuilt_24in_wall_organizer  
├── tv_&_audio_accessories  
└── tvs  
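
The original post assumes the scraper has already written the images into this per-category layout. If you only have the /tmp/HarveynormanIe.jl file from the previous step, a small helper along these lines can produce it (a sketch, not from the original post; it assumes the Jason and HTTPoison packages are available and that the item field names match the scraped data):

# Hypothetical helper, not part of the original project.
defmodule ImageDownloader do
  @jl_file "/tmp/HarveynormanIe.jl"
  @out_dir "/tmp/products_advisor"

  def run do
    @jl_file
    |> File.stream!()
    |> Stream.map(&Jason.decode!/1)
    |> Enum.each(&download_item/1)
  end

  defp download_item(%{"category" => category, "images" => images}) do
    dir = Path.join(@out_dir, normalize(category))
    File.mkdir_p!(dir)

    Enum.each(images, fn url ->
      case HTTPoison.get(url) do
        {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
          # name the file after the last path segment of the image URL
          File.write!(Path.join(dir, Path.basename(URI.parse(url).path)), body)

        _error ->
          :ok
      end
    end)
  end

  defp download_item(_item), do: :ok

  defp normalize(category) do
    category |> String.trim() |> String.downcase() |> String.replace(~r/\s+/, "_")
  end
end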

Before the model can be trained, let’s review the downloaded data set. You can see that some categories contain a very small number of scraped images. In cases with fewer than 200 images, there is not enough data to accurately train the model, so we can delete these categories.

find /tmp/products_advisor -depth 1 -type d \
        -exec bash -c "echo -ne '{}'; ls '{}' | wc -l" \; \
    | awk '$2<200 {print $1}' \
    | xargs -L1 rm -rf

This will leave us with just 5 categories that can be used for the new model:

/tmp/products_advisor  
├── headphones  
├── hi-fi,_audio_&_speakers  
├── tools  
├── tv_&_audio_accessories  
└── tvs  

Creating a model is as easy as running a Python script that was created by the TensorFlow authors and can be found in the official TensorFlow GitHub repository:

TFMODULE=https://tfhub.dev/google/imagenet/mobilenet_v2_050_224/classification/2

python bin/retrain.py \  
    --tfhub_module=$TFMODULE \  
    --bottleneck_dir=tf/bottlenecks \  
    --how_many_training_steps=1000 \  
    --model_dir=tf/models \  
    --summaries_dir=tf/training_summaries \  
    --output_graph=tf/retrained_graph.pb \  
    --output_labels=tf/retrained_labels.txt \  
    --image_dir=/tmp/products_advisor  

On a 2018 MacBook Pro (2.2 GHz Intel Core i7), this process takes approximately 5 minutes. As a result, the retrained graph, along with the new label categories, can be found in the configured locations (tf/retrained_graph.pb and tf/retrained_labels.txt in this example); these can be used for further image classification:

IMAGE_PATH="/tmp/products_advisor/hi-fi,_audio_&_speakers/0017c7f1-129f-4fa7-a62b-9766d2cb4486.jpeg"

python bin/label_image.py \
    --graph=tf/retrained_graph.pb \
    --labels tf/retrained_labels.txt \
    --image=$IMAGE_PATH \
    --input_layer=Placeholder \
    --output_layer=final_result \
    --input_height=224 \
    --input_width=224

hi-fi audio speakers 0.9721675
tools 0.01919974
tv audio accessories 0.008398962
headphones 0.00015944676
tvs 7.433378e-05

As you can see, the newly trained model classified the image from the training set with a 0.9721675 probability of belonging to the “hi-fi audio speakers” category.

Image classification using Elixir

Using Python, a tensor can be created with the following code:

import tensorflow as tf

def read_tensor_from_image_file(file_name, input_std=255):
    file_reader = tf.read_file(file_name, name="file_reader")
    image_reader = tf.image.decode_jpeg(
        file_reader, channels=3, name="jpeg_reader")
    float_caster = tf.cast(image_reader, tf.float32)
    dims_expander = tf.expand_dims(float_caster, 0)
    resized = tf.image.resize_bilinear(dims_expander, [224, 224])
    # normalise pixel values into the range expected by the model
    normalized = tf.divide(resized, [input_std])
    sess = tf.Session()
    return sess.run(normalized)

Now let’s classify the images from an Elixir application. Tensorflow provides APIs for the following languages: Python, C++, Java, Go and JavaScript. Obviously, there’s no native support for BEAM languages. We could’ve used the C++ bindings, though the C++ library is only designed to work with the bazel build tool. Let’s leave the mix integration with bazel as an exercise for a curious reader and instead take a look at the C API, which can be used via native implemented functions (NIFs) in Elixir. Fortunately, there’s no need to write bindings for Elixir, as there’s a library that has almost everything that we need: https://github.com/anshuman23/tensorflex

As we saw earlier, to supply an image as an input for a tensorflow session, it has to be converted to an acceptable format: a 4-dimensional tensor that contains a decoded, normalised image that is 224x224 in size (as defined by the chosen MobileNet V2 model). The output is a 2-dimensional tensor that can hold a vector of values. For the newly trained model, the output is received as a 5x1 float32 tensor; 5 comes from the number of classes in the model.

Image decoding

Let’s assume that the images are going to be provided encoded as JPEG. We could write a library to decode JPEG in Elixir; however, there are several open source C libraries that can be used from NIFs. The other option would be to search for an Elixir library that already provides this functionality. Hex.pm shows that there’s a library called imago that can decode images from different formats and perform some post-processing. It uses Rust and depends on other Rust libraries to perform its decoding. Almost all of its functionality is redundant in our case. To reduce the number of dependencies, and for educational purposes, let’s split this into two simple Elixir libraries that will be responsible for JPEG decoding and image resizing.

JPEG decoding

This library will use a JPEG API to decode and provide the image. This makes the Elixir part of the library responsible for loading a NIF and documenting the APIs:

defmodule Jaypeg do
  @moduledoc """
  Simple library for JPEG processing.

  ## Decoding

      {:ok, <<104, 146, ...>>, [width: 2000, height: 1333, channels: 3]} =
        Jaypeg.decode(File.read!("file/image.jpg"))
  """

  @on_load :load_nifs

  @doc """
  Decode a JPEG image and return information about the decoded image such
  as width, height and number of channels.

  ## Examples

      iex> Jaypeg.decode(File.read!("file/image.jpg"))
      {:ok, <<104, 146, ...>>, [width: 2000, height: 1333, channels: 3]}
  """
  def decode(_encoded_image) do
    :erlang.nif_error(:nif_not_loaded)
  end

  def load_nifs do
    :ok = :erlang.load_nif(Application.app_dir(:jaypeg, "priv/jaypeg"), 0)
  end
end

The NIF implementation is not much more complicated. It initialises all the variables necessary for decoding the JPEG, passes the provided content of the image as a stream to a JPEG decoder and eventually cleans up after itself:

static ERL_NIF_TERM decode(ErlNifEnv *env, int argc,
                           const ERL_NIF_TERM argv[]) {
  ERL_NIF_TERM jpeg_binary_term;
  jpeg_binary_term = argv[0];
  if (!enif_is_binary(env, jpeg_binary_term)) {
    return enif_make_badarg(env);
  }

  ErlNifBinary jpeg_binary;
  enif_inspect_binary(env, jpeg_binary_term, &jpeg_binary);

  struct jpeg_decompress_struct cinfo;
  struct jpeg_error_mgr jerr;
  cinfo.err = jpeg_std_error(&jerr);
  jpeg_create_decompress(&cinfo);

  FILE * img_src = fmemopen(jpeg_binary.data, jpeg_binary.size, "rb");
  if (img_src == NULL)
    return enif_make_tuple2(env, enif_make_atom(env, "error"),
                            enif_make_atom(env, "fmemopen"));

  jpeg_stdio_src(&cinfo, img_src);

  int error_check;
  error_check = jpeg_read_header(&cinfo, TRUE);
  if (error_check != 1)
    return enif_make_tuple2(env, enif_make_atom(env, "error"),
                            enif_make_atom(env, "bad_jpeg"));

  jpeg_start_decompress(&cinfo);

  int width, height, num_pixels, row_stride;
  width = cinfo.output_width;
  height = cinfo.output_height;
  num_pixels = cinfo.output_components;
  unsigned long output_size;
  output_size = width * height * num_pixels;
  row_stride = width * num_pixels;

  ErlNifBinary bmp_binary;
  enif_alloc_binary(output_size, &bmp_binary);

  while (cinfo.output_scanline < cinfo.output_height) {
    unsigned char *buf[1];
    buf[0] = bmp_binary.data + cinfo.output_scanline * row_stride;
    jpeg_read_scanlines(&cinfo, buf, 1);
  }

  jpeg_finish_decompress(&cinfo);
  jpeg_destroy_decompress(&cinfo);

  fclose(img_src);

  ERL_NIF_TERM bmp_term;
  bmp_term = enif_make_binary(env, &bmp_binary);
  ERL_NIF_TERM properties_term;
  properties_term = decode_properties(env, width, height, num_pixels);

  return enif_make_tuple3(
    env, enif_make_atom(env, "ok"), bmp_term, properties_term);
}

Now, all that’s left to do to make the tooling work is to declare the NIF functions and definitions. The full code is available on GitHub.

Image resizing

Even though it would be possible to reimplement the image-resizing algorithm in Elixir, this is out of the scope of this exercise, so we decided to use the C/C++ stb library, which is distributed under a public domain license and can easily be integrated as an Elixir NIF. The library is literally just a proxy for a C function that resizes an image; the Elixir part is dedicated to NIF loading and documentation:

static ERL_NIF_TERM resize(ErlNifEnv *env, int argc,
                           const ERL_NIF_TERM argv[]) {
  ErlNifBinary in_img_binary;
  enif_inspect_binary(env, argv[0], &in_img_binary);

  unsigned in_width, in_height, num_channels;
  enif_get_uint(env, argv[1], &in_width);
  enif_get_uint(env, argv[2], &in_height);
  enif_get_uint(env, argv[3], &num_channels);

  unsigned out_width, out_height;
  enif_get_uint(env, argv[4], &out_width);
  enif_get_uint(env, argv[5], &out_height);

  unsigned long output_size;
  output_size = out_width * out_height * num_channels;
  ErlNifBinary out_img_binary;
  enif_alloc_binary(output_size, &out_img_binary);

  if (stbir_resize_uint8(
        in_img_binary.data, in_width, in_height, 0,
        out_img_binary.data, out_width, out_height, 0, num_channels) != 1)
    return enif_make_tuple2(
      env,
      enif_make_atom(env, "error"),
      enif_make_atom(env, "resize"));

  ERL_NIF_TERM out_img_term;
  out_img_term = enif_make_binary(env, &out_img_binary);

  return enif_make_tuple2(env, enif_make_atom(env, "ok"), out_img_term);
}

The image resizing library is available on github as well.

Creating a tensor from an image

Now it’s time to create a tensor from the processed image (after it has been decoded and resized). To be able to load a processed image as a tensor, the Tensorflex library should be extended with 2 functions:

  1. Create a matrix from a provided binary
  2. Create a float32 tensor from a given matrix.

The implementation of these functions is very Tensorflex-specific and wouldn’t make much sense to a reader without an understanding of the context. The NIF implementation can be found on GitHub, under the functions binary_to_matrix and matrix_to_float32_tensor respectively.

Putting everything together

Once all necessary components are available, it’s time to put everything together. This part is similar to what can be seen at the beginning of the blog post, where the image was labelled using Python, but this time we are going to use Elixir to leverage all the libraries that we have modified:

def classify_image(image, graph, labels) do
    {:ok, decoded, properties} = Jaypeg.decode(image)
    in_width = properties[:width]
    in_height = properties[:height]
    channels = properties[:channels]
    height = width = 224

    {:ok, resized} =
      ImgUtils.resize(decoded, in_width, in_height, channels, width, height)

    {:ok, input_tensor} =
      Tensorflex.binary_to_matrix(resized, width, height * channels)
      |> Tensorflex.divide_matrix_by_scalar(255)
      |> Tensorflex.matrix_to_float32_tensor({1, width, height, channels})

    {:ok, output_tensor} =
      Tensorflex.create_matrix(1, 2, [[length(labels), 1]])
      |> Tensorflex.float32_tensor_alloc()

    Tensorflex.run_session(
      graph,
      input_tensor,
      output_tensor,
      "Placeholder",
      "final_result"
    )
  end

The classify_image function returns a list of probabilities for each given label:

iex(1)> image = File.read!("/tmp/tv.jpeg")
<<255, 216, 255, 224, 0, 16, 74, 70, 73, 70, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 255,
  219, 0, 132, 0, 9, 6, 7, 19, 19, 18, 21, 18, 19, 17, 18, 22, 19, 21, 21, 21,
  22, 18, 23, 19, 23, 18, 19, 21, 23, ...>>
iex(2)> {:ok, graph} = Tensorflex.read_graph("/tmp/retrained_graph.pb")
{:ok,
 %Tensorflex.Graph{
   def: #Reference<0.2581978403.3326476294.49326>,
   name: "/Users/grigory/work/image_classifier/priv/retrained_graph.pb"
 }}
iex(3)> labels = ImageClassifier.read_labels("/tmp/retrained_labels.txt")
["headphones", "hi fi audio speakers", "tools", "tv audio accessories", "tvs"]
iex(4)> probes = ImageClassifier.classify_image(image, graph, labels)
[
  [1.605743818799965e-6, 2.0029481220262824e-6, 3.241990925744176e-4,
   3.040388401132077e-4, 0.9993681311607361]
]

retrained_graph.pb and retrained_labels.txt can be found in the tf directory of the products-advisor-model-trainer repository that was mentioned earlier in the model training step. If the model was trained successfully, the tf directory should look similar to this tree:

/products-advisor-model-trainer/tf/  
├── bottlenecks  
├── retrained_graph.pb  
├── retrained_labels.txt  
└── training_summaries  

The most probable label can easily be found by the following line:

iex(6)> List.flatten(probes) |> Enum.zip(labels) |> Enum.max()
{0.9993681311607361, "tvs"}

Learn more

So there you have it. This is a basic demonstration of how Elixir can be used to complete machine learning projects. The full code is available on GitHub. If you’d like to stay up-to-date with more projects like this, why not sign up to our newsletter? Or check out our detailed blog on how to complete web scraping in Elixir. Or, if you’re planning a machine learning project, why not talk to us? We’d be happy to help.


Advanced RabbitMQ Support Part 1: Leveraging WombatOAM Alarms


Introduction

From simple single-node deployments to complex and mission-critical clustered/federated node architectures, RabbitMQ often finds itself put to use in various MOM (Message-Oriented Middleware) solutions. This is mainly due to its seamless ease of use and adaptability to different use cases, each with distinct requirements on aspects such as high availability and latency stringencies, to name a few. User testimonials for RabbitMQ back this up; across all its spheres of use, teams have found it to be very impressive.

For such a highly esteemed system, playing such a crucial role in the most mission-critical solutions in the industry, you’d assume that it is fully equipped with in-built, top-of-the-range operations and maintenance sub-components, utilities and functions, which enable the most effective and efficient support possible. Unfortunately, that’s often not the case, particularly when we focus on the all-important aspect of alarming.

What RabbitMQ considers as alarms are notifications and error messages which it writes to its logs, coupled with some internal defence mechanisms or recovery strategies, which are put into effect when a subset of its common operational problems is encountered. An alarm, in its true OAM sense, can be defined as:

a system generated indicator to an external entity, consisting of as much useful information as possible, in order to trigger an aiding action to prevent, resolve and/or alleviate the reported problem’s root cause.

With this definition in mind, we come to realise that RabbitMQ has, in fact (at the time of publishing this discussion), a limitation in notifying and triggering external entities to carry out any corrective or pre-emptive action(s) for resolving the problem cause. When an alarm is raised within RabbitMQ, an action plan is decided upon, and automatic resolution or recovery attempts are internally executed by the system. See Fig 1 below (NOTE: this is for illustration purposes, as the ACTION PLAN step is carried out within RabbitMQ).

Fig 1: RabbitMQ Alarming routine illustration

Some of RabbitMQ’s native alarms provide accompanying graphical illustrations on the native Management UI, for example, VM High Memory Watermark alarms will graphically indicate that a node’s permissible memory usage has been exceeded. However, the user is not necessarily notified (unless continually monitoring from the UI) beforehand.

An ideal alarming scenario and resolution plan would not only decide on a particular action plan (internally), but go further to warn and/or notify its user(s) of the reported problem, similar to the illustration in Fig 2. This would imply that for any (and many) reported warnings and problems notified to the user, recovery would be much faster, depending on how quickly (and how skilled) the user participating in the recovery procedure is.

Fig 2: Expected RabbitMQ Alarming routine illustration

To handle and cater for common or rare internal RabbitMQ system alarms in a near-ideal manner, custom RabbitMQ plugins or external tools which help meet these alarming requirements need to be developed, or made use of if they already exist. This is where WombatOAM comes into play, showcasing its significant importance and usefulness in the operations and maintenance functions of all RabbitMQ installations.

i. Metric based alarms

WombatOAM provides RabbitMQ users with a rich set of native and AMQP-related metrics to monitor. Coupled with its alarming infrastructure, specific alarms may be defined for each supported RabbitMQ metric provided by WombatOAM. Currently, approximately 50 RabbitMQ-specific metrics are provided by WombatOAM, which equates to the same staggering number of possible distinct alarming scenarios which may be configured and used by support engineers to assist in ensuring optimum end-to-end service provision.

Let’s take the Connections created metric as an example. For an installed RabbitMQ node, getting a clear picture of this attribute is crucial for many reasons, such as its direct influence on the node’s service availability and memory utilization. RabbitMQ nodes will accept connections only to certain extents, depending on how they’ve been configured and the resource limitations of the hosts on which they’ve been installed. As an example, typical IoT use cases tend to have a common need for dynamic connection establishment from their endpoint electronic devices, interacting with some SaaS backend via RabbitMQ.

With RabbitMQ as the middleman, using WombatOAM to define and raise alarms when the number of created connections exceeds a predefined threshold becomes a crucial capability to expose to support engineers. Most importantly, this functionality assists them in making decisions such as knowing “when to scale”, i.e. adding one or more nodes and directing newly inbound connection establishments there when the current nodes reach connection saturation. Such an alarm could be defined as follows (in your wombat.config file):

{set, wo_metrics, threshold_sets,
 [
  [{nodes, ["rabbit@Ayandas-MacBook-Pro", "rabbit_1@Ayandas-MacBook-Pro", "rabbit_2@Ayandas-MacBook-Pro"]},
   {rules,
    [[{name, "RABBITMQ_CONNECTIONS_CREATED_ALARM"},
      {metric, {"RabbitMQ", "Connections created"}},
      {raise_level, 75},
      {cease_level, 50},
      {unit, percentage},
      {direction, warn_above},
      {percentage_base, 100000}]]}]
 ]}.

This configuration will raise an alarm when 75,000 connections (75% of 100,000) have been created on each node configured in the nodes list. Fig 3 illustrates an example of this alarm when raised:

Fig 3: Connections created alarm example

As already mentioned, the number of possible alarming cases which WombatOAM can be configured to expose is remarkably rich. Other useful metric alarms which can be further configured include the following, or more, depending on your business criticalities:

- Total active connections
- Total active channels
- Total active consumers
- Total active exchanges
- Total active queues
- Channels created
- Channels closed
- Consumers created
- Permission created
- Queue created
- Queue deleted
- User created
- User deleted
- User password changed
- User authentication failure
- User authentication success
- Exchanges created


And with the RabbitMQ statistics database running, WombatOAM will fetch even more metrics from the database, further extending the number of possible metric alarming scenarios.




Advanced RabbitMQ Support Part II: Deeper Insight into Queues

Before you go any further, you should know that you can test WombatOAM out today with a 45-day free trial of WombatOAM 3.0.0.


Introduction

The most important and critical elements of any RabbitMQ installation are the Queues. Queues retain messages specific to different use cases across various industrial sectors such as telecommunications, financial systems, automotive, and so forth. Queues, and their adherence to AMQP, are essentially “why” RabbitMQ exists. Not only do they retain messages until consumption, but internally they are also an implementation of some of the most complex mechanisms for guaranteeing efficient message propagation through the fabric, while catering for additional requirements such as high availability, message persistence, regulated memory utilisation, and so forth.

So queues are, in general, the main focal point of any RabbitMQ installation, which is why RabbitMQ users and support engineers often find themselves having to do regular checks around queues, as well as ensuring their host Rabbit nodes have been precisely configured to guarantee efficient message queueing operations. Typical questions that tend to arise from RabbitMQ users and support engineers are:

… how many messages are in Queue “A”?

… how many messages are pending acknowledgement in Queue “K”?

… how many consuming clients are subscribed to Queue “R”?

… how much memory is Queue “D” using?

… how many messages in Queue “F” are persisted on disk?

… is Queue “E” alive?

Within RabbitMQ, the implementation of a queue is a combination of multiple aspects, such as the behaviour specification governing its operation (e.g. internally, what is known as the backing queue behaviour), the transient/persistent message store components and, most importantly, the queue process dictating all the logic and mechanics involved in queueing. From these, a number of attributes exist which give an indication of the current state of the queue. Some of these queue attributes are illustrated below:

Fig 1: RabbitMQ Queue Attributes

WombatOAM

As of WombatOAM 2.7.0, the WombatOAM-RabbitMQ plugin ships with an additional agent, the RabbitMQ Queues agent. This agent has been designed and developed to allow monitoring and acquisition of metrics specific to Queues, as well as presenting them in a user-friendly manner to RabbitMQ users. Two modes of operation are supported:

  1. Dynamic operation: queues existing on the monitored node, with names matching a user-defined regex, are dynamically loaded by WombatOAM for monitoring.
  2. Static operation: specific queues are configured and monitored as defined in the WombatOAM RabbitMQ configuration.

Configuration

The manner in which this agent operates and presents metrics is solely dependent on the way in which it has been configured.

1. Dynamic operation

The dynamic mode of monitoring queues may be configured by defining a match specification against which queue names are matched, together with the particular attribute/metric desired from each matched queue. For example, to monitor the memory usage of all queues, the following configuration may be defined in the wombat.config file:

{set, wo_plugins, plugins, rabbitmq_queues, dynamic_queues,
 [{match_spec, ".*"},
  {metric, memory}]}.

This will capture all queues on the node being monitored and present memory metrics from queues.

Fig 2: RabbitMQ Dynamic Queue Metrics

2. Static operation

In the static mode of operation, users explicitly specify the queues and the corresponding attributes/metrics they would like to monitor in the wombat.config file. A complete static configuration entry consists of the queue name, virtual host and the attribute being measured. For example, to monitor the number of messages, consumers and the amount of memory utilised by the SERVICES.QUEUE, and the number of messages only from the EVENTS.QUEUE, a user may specify the following configuration in the wombat.config file:

{set, wo_plugins, plugins, rabbitmq_queues, static_queues,
 [{<<"SERVICES.QUEUE">>, <<"/">>, messages},
  {<<"SERVICES.QUEUE">>, <<"/">>, memory},
  {<<"SERVICES.QUEUE">>, <<"/">>, consumers},
  {<<"EVENTS.QUEUE">>, <<"/">>, messages}]}.

Configuring static queues is extremely important for mission-critical queues for which you need continuous visibility of metrics such as message counts and memory usage.

The following illustrates an example of static mode:

Fig 3: RabbitMQ Static Queue Metrics

Taking “things” further!

Coupling our discussion of monitoring queues with the discussion in Part 1 of this series on carrying out advanced alarming for RabbitMQ operations, imagine how many alarming cases we could cover by defining alarms specific to queue metrics.

Not only does WombatOAM provide us with a huge spectrum of alarming cases we could handle, but also useful metrics. Imagine how useful the following alarms would be:

“an alarm which, when triggered, would send your team email notifications indicating that the number of messages in your most critical SERVICE.QUEUE has just reached the 500 000 message limit without messages being consumed”

Or:

“an alarm configured to issue email notifications when the number of consuming clients falls below a certain minimum permissible number, indicating there’s a critical service-affecting problem on the client end”

or even more interesting:

“an alarm and email notification issued when a queue’s individual memory usage exceeds a certain cap value, beyond which would be an indication of one or more problems manifesting in the cluster.”

Defining such alarms could be as simple as configuring the following in wombat.config as illustrated here.

Fig 4: RabbitMQ Queue Alarms

Conclusion

So with these capabilities in mind, imagine the total number of queue-specific metrics attainable for monitoring on WombatOAM. The number can be immense, and is limited only by the total number of queues you have running, along with the number of attributes you have configured for monitoring. All this is dependent on your configuration. To be precise, a total of 16 attributes are configurable per queue on WombatOAM, meaning a total of 16 x N queue-specific metrics are attainable. So imagine a queue count of ~50 or more queues on a RabbitMQ installation: the number of attainable metrics becomes huge, ~50 x 16 = a staggering 800 metrics!

WombatOAM also provides the ability to order queues as desired, since the number of available queue metrics has the potential to be extremely large. The rate at which metrics are acquired is also configurable. If you want to reduce the frequency at which metrics are gathered (which is recommended when you have an extremely large number of queues and queue metrics configured), this can be done by simply updating the configuration.





Mocking and faking external dependencies in Elixir tests

During some recent work on thoughtbot’s company announcement app Constable, I ran into a situation where I was introducing a new service object that made external requests. When unit testing, it is easy enough to use some straightforward mocks to avoid making external requests. However, for tests unrelated to the service object, how can we mock out external requests without littering those tests with explicit mocks?

Unit testing using mocks

To set the stage, let’s take a look at what a hypothetical service object that makes an external request might look like:

defmodule App.Services.WebService do
  def make_request do
    HTTPoison.get!("http://thoughtbot.com/")
  end
end

And if we were to write a unit test for this service object, we would want to mock out the external request (in this case, the call to HTTPoison.get!/1). To do that, we might use a library like Mock:

defmodule App.Services.WebServiceTest do
  import Mock
  alias App.Services.WebService

  describe "#make_request" do
    test ". . ." do
      # setup . . .

      get_mock = fn _url, _params, _headers ->
        %HTTPoison.Response{
          body: ". . .",
          status_code: 200
        }
      end

      response =
        Mock.with_mock HTTPoison, get!: get_mock do
          WebService.make_request()
        end

      # assertions . . .
    end
  end
end

Where it gets tricky

Mocking is exactly what we want when unit testing the service object, but if we have an unrelated unit tests that run code which happens to use our service object, we want to ensure that no external requests are made when running our test suite.

As an example, we might have a module that utilizes our service object:

defmodule SomeModule do
  alias App.Services.WebService

  def do_something do
    # . . .
    response = WebService.make_request()
    # . . .
  end
end

If we were testing this module and in our test we called SomeModule.do_something/0, we would inadvertently be making an external request. It would be incorrect to mock HTTPoison.get!/1 in this test because that's an implementation detail of our service object. And while we could mock WebService.make_request/0, that would lead to a lot of noise in our tests.

Let’s create a fake

One way we can get around this issue is to create a fake version of our service object which has the same public interface, but returns fake data. That object might look like:

defmodule App.Services.FakeWebService do
  def make_request do
    %HTTPoison.Response{body: ". . .", status_code: 200}
  end
end

We want to utilize this fake by making our application code use WebService unless we are testing, in which case we want to use FakeWebService.

A common way to accomplish this is to have three modules: WebService, WebServiceImplementation, and WebServiceFake. Everyone calls functions on WebService, which then delegates to WebServiceImplementation when not testing, or to WebServiceFake when testing. I don't particularly like this pattern, because it requires an extra module and introduces complexity.

A much simpler and more flexible solution is to use a form of dependency injection where we dynamically refer to our service object, which is either the real service or the fake. There is a great Elixir library called pact which accomplishes this by creating a dependency registry that allows us to define named dependencies, but conditionally switch out the actual value they resolve to.

Using pact, we define a dependency registry for our application:

defmodule App.Pact do
  use Pact
  alias App.Services.WebService

  register :web_service, WebService
end

App.Pact.start_link

And then we want to redefine that dependency to be our fake when we run our tests. The following code, either in a test helper or in some setup that occurs before all tests, will accomplish that:

App.Pact.register(:web_service, FakeWebService)

Finally, all calls to WebService.make_request() in our application and tests become App.Pact.get(:web_service).make_request(). The one exception to this is in our unit test for WebService itself - we want to test the actual service object! So we should still explicitly call WebService.make_request().
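
For instance, the hypothetical SomeModule from earlier would now call the dependency through the registry (a sketch):

defmodule SomeModule do
  def do_something do
    # resolves to WebService normally and to FakeWebService in tests
    response = App.Pact.get(:web_service).make_request()
    # . . .
    response
  end
end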

Keeping the fake up to date

This approach is good, but there is one problem: if the public interface of our real service object changes, we also have to update the fake. This may be acceptable; after all, it would likely cause a runtime test failure if the public interface of the fake differed from the real service object. But there is an easy way to make the compiler do more work for us.

Using behaviours, we can specify a public interface that both the real service object and the fake must conform to. This can give us more confidence that we're always keeping the two in sync with each other.

Let’s define a module describing the behavior of our service object:

defmodule App.Services.WebServiceProvider do
  @callback make_request() :: HTTPoison.Response.t()
end

And then we can adopt that behavior in our service object and fake:

defmodule App.Services.WebService do
  alias App.Services.WebServiceProvider
  @behaviour WebServiceProvider

  @impl WebServiceProvider
  def make_request do
    HTTPoison.get!("http://thoughtbot.com/")
  end
end

. . .

defmodule App.Services.FakeWebService do
  alias App.Services.WebServiceProvider
  @behaviour WebServiceProvider

  @impl WebServiceProvider
  def make_request do
    %HTTPoison.Response{body: ". . .", status_code: 200}
  end
end

Final thoughts

This approach is great when we want pure unit testing, and our goal is to avoid any external requests. The pact library even allows us to replace dependencies in a block, rather than permanently:

App.Pact.replace :web_service, FakeWebService do
  . . .
end

This can be an invaluable alternative to mocking a dependency for all tests, and may be preferable if we want to be very explicit about what is mocked in each test while still allowing us to easily make use of our fake.

For integration tests or more complex services where we want to test the full service-to-service interaction, we may want to consider building our own mock server instead of replacing the external service with a fake.
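
One library commonly used for that in Elixir is Bypass (my suggestion, not something the post prescribes), which starts a real local HTTP server owned by the test. A minimal sketch:

defmodule App.Services.WebServiceIntegrationTest do
  use ExUnit.Case, async: true

  setup do
    # Bypass starts an HTTP server on a random local port
    bypass = Bypass.open()
    {:ok, bypass: bypass}
  end

  test "makes a real HTTP round trip", %{bypass: bypass} do
    Bypass.expect_once(bypass, "GET", "/", fn conn ->
      Plug.Conn.resp(conn, 200, "hello")
    end)

    # assumes the service URL is configurable; here we call HTTPoison directly
    assert %HTTPoison.Response{status_code: 200, body: "hello"} =
             HTTPoison.get!("http://localhost:#{bypass.port}/")
  end
end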


Clever use of persistent_term

This blog post will go through three different uses of persistent_term that I have used since its release and explain a bit why they work so well with persistent_term.

Global counters

Let’s say you want to have some global counters in your system, for example the number of times an HTTP request has been made. If the system is very busy, that counter will be incremented many, many times per second by many different processes. Before OTP-22, the best way that I know of to get the best performance is by using striped ets tables, i.e. something like the code below:

incr(Counter) ->
  ets:update_counter(?MODULE,{Counter,erlang:system_info(scheduler_id)},1).

read(Counter) ->
  lists:sum(ets:select(?MODULE,[{{{Counter,'_'},'$1'},[],['$1']}])).

The code above would make sure that there is very little contention on the ets table, as each scheduler will get a separate slot in the table to update. This comes at the cost of more memory usage, and the fact that when reading the value you may not get an exact result.

In OTP-22 the same can be achieved by using counters. Counters have built-in support for striping by using the write_concurrency option, so we don’t have to write our own implementation for that. They are also faster and use less memory than ets tables, so lots of wins.
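
For illustration, creating and using such a counter looks like this (shown here in Elixir syntax, while the post's own examples are in Erlang; the calls are the same):

# one counter slot, striped internally for concurrent writers
ref = :counters.new(1, [:write_concurrency])

:counters.add(ref, 1, 1)   # increment slot 1 by 1
:counters.get(ref, 1)      # read slot 1 (may lag the latest writes slightly)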

The remaining problem then is finding the reference to the counter. We could put it into ets and then do an ets:lookup_element/3 when updating a counter.

cnt_incr(Counter) ->
    counters:add(ets:lookup_element(?MODULE,Counter,2),1,1).

cnt_read(Counter) ->
    counters:get(ets:lookup_element(?MODULE,Counter,2),1).

This gives a performance degradation of about 20%, so not really what we want. However, if we place the counter in persistent_term, as in the code below, we get a performance increase of about 140%, which is much more in line with what we wanted.

cnt_pt_incr(Counter) ->
    counters:add(persistent_term:get({?MODULE,Counter}),1,1).

cnt_pt_read(Counter) ->
    counters:get(persistent_term:get({?MODULE,Counter}),1).

The reason for this huge difference is that when the counters are placed into persistent_term they are stored there as literals, which means that at each increment we no longer have to make a copy of the counters reference. This is good for two reasons:

1) The amount of garbage will decrease. In my benchmarks the amount of garbage generated by cnt_incr is 6 words while both ets_incr and cnt_pt_incr create 3 words.

2) No reference counts have to be modified. What I mean by this is that the counters reference is what is called a magic reference or nif resource. These references work much in the same way as reference counted binaries in that they are not copied when sent to different processes. Instead only a reference count is incremented at copy and then decremented later by the GC. This means that for cnt_incr we actually have 3 counters that are modified for each call. First we increment the reference count on the counter when copying from ets, then we update the actual counter and then eventually we decrement the reference counter. If we use persistent_term, the term is never copied so we don’t have to update any reference counters, instead we just have to update the actual counter.

However, placing the counter in persistent_term is not trouble free. In order to delete or replace the counter reference in persistent_term we have to do a global GC which depending on the system could be very very expensive.

So this method is best used only for global persistent counters that will never be deleted.

You can find the code for all the above examples and the benchmark I ran here.

Logger level check

In logger there is a primary logging level that is the first test to be done for each potential log message to be generated. This check can be done many times per second and needs to be very quick. At the moment of writing (OTP-22) logger uses an ets table to keep all its configuration which includes the primary logging level.

This is not really ideal as doing a lookup from the ets table means that we have to take a read-lock to protect against parallel writes to the value. Taking such a read lock is not terribly expensive, but when done thousands of times per second it adds up.

So in this PR I’ve used persistent_term as a cache for the primary logging level. Now when reading the value from the hot path logger will instead use persistent_term. This removes all locks from the hot path and we only need to do a lookup in the persistent_term hash table.

But what if we need to update the primary logger level? Don’t we force a global GC then? No, because the small integer representing the primary logger level is an immediate. This means that the value fits in one machine word and is always copied in its entirety to the calling process. Which in turn means that we don’t have to do a global GC when replacing the value.

When doing this we have to be very careful so that the value does not become a heap value as the cost of doing an update would explode. However, it works great for logger and has reduced the overhead of a ?LOG_INFO call by about 65% when no logging should be done.
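
As a rough illustration of the caching pattern described above (an Elixir sketch for this post's Erlang-based discussion, not the actual logger code; the level ordering is simplified):

defmodule LevelCache do
  @key {__MODULE__, :primary_level}

  # Cold path: the level atom is an immediate, so replacing it in
  # persistent_term does not force a global GC.
  def set(level) when level in [:debug, :info, :warning, :error],
    do: :persistent_term.put(@key, level)

  # Hot path: a single lock-free lookup in the persistent_term hash table.
  def allow?(level),
    do: severity(level) >= severity(:persistent_term.get(@key, :info))

  defp severity(:debug), do: 0
  defp severity(:info), do: 1
  defp severity(:warning), do: 2
  defp severity(:error), do: 3
end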

Large constant data

We use an internal tool here at the OTP-team called the “ticket tool”. It basically manages all of the OTP-XYZ tickets that you see in the release notes that comes with each release of Erlang/OTP. It is an ancient tool from late 90’s or early 00’s that no one really wants to touch.

One part of it is a server that contains a cache of all the 17000 or so tickets that have been created through the years. In that server there is a single process that has each ticket and its state in order to speed up searching in the tickets. The state of this process is quite large and when it is doing a GC it takes somewhere around 10 seconds for it to finish. This means that about every 10 minutes the server freezes for 10 seconds and we get to experience the joy of being Java programmers for a while.

Being a VM developer I’ve always thought the solution to this problem is to implement either an incremental GC or at least a mark and sweep GC for large heaps. However, the ticket tool server has never been of high enough priority to make me spend a year or two rewriting the GC.

So, two weeks ago I decided to take a look and instead I used persistent_term to move the data from the heap into the literal area instead. This was possible to do because I know that the majority of tickets are only searched and never changed, so they will remain in the literal area forever, while the tickets that do get edited move onto the heap of the ticket server. Basically my code change was this:

handle_info(timeout, State) ->
  persistent_term:put(?MODULE,State),
  erlang:start_timer(60 * 60 * 1000, self(), timeout),
  {noreply,persistent_term:get(?MODULE)}.

This small change puts the entire gen_server state into the literal area, and then any changes done to it will pull the data into the heap. This dropped the GC pauses down to non-noticeable levels and took considerably less time to implement than a new GC algorithm.


Personal notes from Elixir Conf 2019

Last week I attended Elixir Conf 2019 in Aurora, Colorado. It was my fourth Elixir Conf and by far the one where I engaged the most with people I had never talked to before; the hallway track was great and I got to meet again some conference buddies from other editions. Another highlight was the conference hotel, a brand-new resort close to Denver. Anyone who knows me well understands how much I like anything Colorado, so I am glad the next edition will be in the same location.

The conference format was similar to previous years: three tracks over two days, with two extra days for optional training, which I didn’t attend this time. The conference had 4 keynotes, from José Valim (Elixir creator), Chris McCord (Phoenix creator), Justin Schneck (Nerves co-creator) and the Dockyard team introducing Lumen, plus great talks.

All the talks and keynotes are available in the Elixir Conf YouTube channel. Each keynote focused on certain areas: the Elixir language and its future, LiveView, Nerves, and the brand-new Lumen, a project from Dockyard that uses Elixir in the client (browser).

I always like to take notes when attending conferences, and these are my highlights from Elixir Conf 2019. Please be advised that these notes are written as reminders of things I considered relevant during the talks, and they are not a summary of them by any means. As the conference had many tracks, of course I couldn’t attend all the talks, so feel free to comment with your favorite talk and notes:

Keynote: José Valim

Before anything, thank you very much José for the incredible language. It is a pleasure working full-time with Elixir and sharing the same community with great people, and I extend that to all the Elixir core team as well.

The goal for 2019 is to streamline Elixir in production

The main goal is to make it easier to put applications in production, and to ensure that what is running in production can be inspected.

The two main pieces are releases and telemetry.

Releases

  • Elixir 1.9 introduced release commands as part of the language, with lots of lessons and concepts from Distillery;
  • shellcheck to test shell scripts;

Telemetry

The library is composed of many pieces and helps snapshot what is going on inside running applications through metrics from the BEAM and custom ones.

  • telemetry, telemetry_poller, telemetry_metrics, and telemetry_reports;
  • Phoenix 1.5 will come with a Telemetry module to help define application specific metrics;

What is next for Elixir?

The next release, 1.10, will come with a different setup for its CI, now using Cirrus CI. It will also contain compiler tracing and ExUnit pattern diffing.

What is next for José Valim?

Now that the language enhancements are at a very stable point, and the community projects being built are showcasing the power of the language, efforts will be directed more toward lower-level work (HiPE, Lumen and Hastega) and code analysis.

Keynote: Chris McCord

At Lonestar Elixir Conf 2019, Chris McCord presented the then still-private project LiveView, which enables rich, real-time user experiences with server-rendered HTML. Right after the conference, the project was made available to the public, and since then a collection of projects has been showcasing how interesting and powerful LiveView is. That also includes Phoenix Phrenzy, a contest for LiveView projects.

In this keynote, he presents a few interesting points about LiveView, as well as what is coming next and the reasons behind it.

LiveView library

  • template engine;
  • tests;
  • navigation: live_redirect, live_link and handle_params for pagination URL changes;
  • prepend, append, ignore and replace updates (phx-update);
  • JS hooks when you need just a piece of extra Javascript;
  • IE11 support;

Coming soon

  • phx-debounce;
  • file uploads;
  • exception translation;
  • stash for client state;
  • automatic form recovery;

Keynote: Justin Schneck

In his keynote, Justin Schneck once again gave a really nice demo showing the improvements Nerves is getting.

Resilient

  • operating system: Erlang VM and Linux kernel;
  • application domain: Nerves runtime allows other than Elixir, and resilience from Erlang VM;
  • application data for specific data storage;

Reproducible

Read-only filesystem, immutable;

Reasonable

Whitelist approach (build up) with minimal output;

Nerves Hub

  • CryptoAuthentication chip;
  • NervesKey: write once;
  • delegated auth;
  • certificate;
  • remote console;
  • Terraform scripts to host your own Nerves Hub;

Keynote: Brian Cardarella, Paul Schoenfelder and Luke Imhoff

Introducing Lumen

Lumen is a new project from Dockyard that uses Elixir in the client. It uses WebAssembly and Rust as well in the compiler.

Compiler

The compiler has some unique constraints such as code size, load time and concurrency model.

Why not BEAM?

  • Runtime: unavailable APIs, incompatible scheduler, JS managed types;
  • Code size: BEAM bytecode is expensive, weak dead-code elimination;
  • Performance: VM on a VM, JS engine unable to reason about bytecode;

New Compiler

  • Restrictions: no hot-code, allow dead-code elimination;
  • Ahead of time vs VM: only pay for what you use, no interpretation overhead;
  • Build on existing tools: LLVM, Rust, wasm-bindgen;
  • Key challenges: recursion and tail-call optimization, non-local returns, green threads, webassembly specific limitations;
  • Continuations: represent jumps as calls to a continuation, never return, always moving forward;

Frontend
Accepts source in Erlang and Core Erlang, including mix task.

Middle tier
AST lowered to EIR.

Backend
Lowers from EIR to LLVM IR, generate object files.

Future goals
Push more data to LLVM, bootstrapping, MLIR, and embedded targets.

Runtime

Memory management
BEAM and Rust, with property-based testing.

Memory model
Process heap and per-process garbage collection.

Processes
Very similar to what we have in Elixir: code stack, mailbox, pid, memory, links, and so on.

Schedulers
One per thread, main thread and webworkers.

WebAssembly main thread
Calls are blocking, scheduler needs to run timers.

Interacting with web
JS calling Lumen, and Lumen calling JS.

Why Lumen?

  • Joe's paper about GUI in functional programming;
  • Optimize front-end development;
  • GUI is concurrent by nature, we could have a supervisor taking care
    of a DOM tree;

Phoenix LiveView Demystified: Alex Garibay

In this talk, Alex showed some LiveView internals: how it works and why it works so well.

LiveView EEx

From template with sigils to AST

  • static/vars and dynamic code;
  • %Phoenix.LiveView.Rendered{} with static, dynamic and fingerprint;
  • %Phoenix.LiveView.Comprehension{} to optimize data sent to the client;

Mounting the DOM

  • router, controller or template with live and live_render macros;
  • rendered template has few ids for channels and sessions;
  • container <div> that receives the template can be configured to be any HTML tag;

Phoenix Channels

  • uses Phoenix.LiveView.Socket;
  • uses a specific channel "lv:*";
  • socket receives session data and potentially user data;
  • client triggers an event that is sent to the socket using %Phoenix.Socket.Message{};
  • channel handles the event with callbacks;

Javascript

  • import LiveView library;
  • instantiate and connect LiveSocket;
  • a JS view is created with LiveView container, socket and so on;
  • each JS object in the view has static and dynamic data that is constructed for every change;
  • uses Morphdom.js to re-render the changes in the DOM;

WebRTC from start to finish: Scott Hamilton

WebRTC (Web Real-Time Communication) is a free, open-source project that provides web browsers and mobile applications with real-time communication (RTC) via simple application programming interfaces (APIs). It allows audio and video communication to work inside web pages by allowing direct peer-to-peer communication, eliminating the need to install plugins or download native apps.

Janus is a general purpose WebRTC server that has an Elixir client available.

WebRTC

  • spec and project;
  • basic implementation in P2P;
  • terminology: STUN, TURN and SDP;

Janus

  • general purpose WebRTC gateway;
  • JS, websocket;
  • 101: session, plugin, peer connection, handle;
  • resources to learn it are easy to find;

Elixir Phoenix Janus

Phoenix as middleman between client and Janus.

What could go wrong?

  • Janus configuration: ICE trickle;
  • Janus on Docker;
  • deployment;
  • translation from Janus response to Elixir terms;
  • mixing HTTP and WS calls;

Elixir + CQRS - Architecting for availability, operability, and maintainability at PagerDuty: Jon Grieman

PagerDuty has a feature that records all incident user logs, and they used the CQRS pattern to design that piece of functionality.

They use composite keys for tracing that are orderable. Other benefits of the approach are separate monitoring and independent scaling.

Overall incident log system

  • upstream systems;
  • Kafka;
  • creator;
  • DB cluster;
  • querier;
  • client systems;

DB incident recovery

  • the whole stack can be duplicated per region;
  • the DB engine can be replaced entirely once things are back to normal;
  • operational benefits coming from event sourcing and CQRS;

Date, Time, and Time Zones in Elixir 1.9: Lau Taarnskov

Handling dates and times is a challenge in any language. In this talk we see the details behind the Calendar module in Elixir, which was upgraded in version 1.9.

The talk covered the differences between UTC and TAI, and the concept of layers of wall time.

Elixir Calendar, Date and Time specifics

  • sigils: ~D, ~T, ~N, ~U (see the examples after this list);
  • choose the type by meaning, not convenience;
  • NaiveDateTime vs DateTime date/time comparison;
    - no data is better than fake/assumed data (JS and Ruby, for example);
    - correct data is better than assumptions;
  • the TimeZoneDatabase behaviour, with tzdata as an example implementation;
  • check stored date/times to verify that they end with a Z (UTC);
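
A few concrete examples of the sigils and comparisons mentioned above, all from the standard library (the shift_zone call assumes the tzdata dependency is installed and configured):

    ~D[2019-09-05]              # Date
    ~T[14:30:00]                # Time
    ~N[2019-09-05 14:30:00]     # NaiveDateTime, no time zone information
    ~U[2019-09-05 14:30:00Z]    # DateTime in UTC, new sigil in Elixir 1.9

    # Compare with the module functions, not with < or >, which compare
    # the structs structurally rather than chronologically.
    DateTime.compare(~U[2019-09-05 14:30:00Z], ~U[2019-09-05 15:00:00Z])
    #=> :lt

    # Converting between zones needs a TimeZoneDatabase implementation,
    # for example tzdata.
    DateTime.shift_zone(~U[2019-09-05 14:30:00Z], "Europe/Copenhagen",
                        Tzdata.TimeZoneDatabase)
    #=> {:ok, #DateTime<2019-09-05 16:30:00+02:00 CEST Europe/Copenhagen>}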

Mint, disrupting HTTP clients: Andrea Leopardi

Mint is a reasonably new low-level HTTP client that aims to provide a small, functional core that others can build on top of.

Story and features

The initial need came from a potential removal of httpc from the Erlang standard library. Mint is a processless client that wraps the connection in a data structure, built on top of gen_tcp. Mint knows how to handle the raw bytes as well as the HTTP protocol.

Streaming by default
Responses will arrive async (status, headers, body and so on).
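
A minimal sketch of that flow, based on Mint's documented usage (error handling omitted, and a real client would keep receiving until the :done message arrives):

    {:ok, conn} = Mint.HTTP.connect(:https, "example.com", 443)
    {:ok, conn, request_ref} = Mint.HTTP.request(conn, "GET", "/", [], nil)

    # Responses arrive as plain messages to the calling process;
    # Mint.HTTP.stream/2 turns them into tagged tuples.
    receive do
      message ->
        # A real client keeps the updated connection struct returned here.
        {:ok, _conn, responses} = Mint.HTTP.stream(conn, message)

        for response <- responses do
          case response do
            {:status, ^request_ref, status} -> IO.puts("status: #{status}")
            {:headers, ^request_ref, headers} -> IO.inspect(headers)
            {:data, ^request_ref, data} -> IO.puts(data)
            {:done, ^request_ref} -> IO.puts("done")
          end
        end
    end

    Mint.HTTP.close(conn)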

Security

  • httpc is not safe by default;
  • hackney is safe by default but can be overridden if you work with `ssl_options`;
  • mint is safe by default;
  • SSL certificate store with castore;

Proxying
Mint has proxying support for request and tunnel proxy types.

HTTP2

  • multiplexed streams;
  • server push;
  • backpressure;

Use cases

Wrap the connection data in a GenServer, GenStage or gen_statem. It can also be used as one process holding many connections.

Challenges and future plans

  • immutability;
  • low level but usable;
  • increase adoption;
  • pooling;
  • websockets;
  • gRPC;

BEAM extreme: Miriam Pena

In her talk, Miriam showed us things to consider when improving performance, and cautioned that any VM tuning should be done only when needed, for very specific cases. Besides that, performance should be measured over long periods, not from IEx.

One of the things she mentioned is that memory copying, something that happens a lot in the BEAM, brings CPU overhead.

Pool bottleneck

  • using processes directly instead of GenServer is 80% faster (see the sketch after this list);
  • leverage process priority levels;
  • as a con, it hurts code readability;
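
As a generic illustration of the first point (not code from the talk), a hand-rolled process skips the GenServer machinery at the cost of readability and the usual OTP conveniences:

    defmodule RawCounter do
      # No behaviour callbacks: just spawn_link and a receive loop.
      def start_link(initial), do: {:ok, spawn_link(fn -> loop(initial) end)}

      def increment(pid), do: send(pid, :increment)

      def get(pid) do
        send(pid, {:get, self()})

        receive do
          {:count, count} -> count
        end
      end

      defp loop(count) do
        receive do
          :increment ->
            loop(count + 1)

          {:get, from} ->
            send(from, {:count, count})
            loop(count)
        end
      end
    end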

Key-value storage

  • process dictionary: hard to debug, tuple performance as an example;
  • code generation;
  • persistent term: access in constant time, write once read many, no copying to the process heap, tuple performance as an example (see the example after this list);
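
A quick example of :persistent_term (OTP 21.2+). Reads are constant time and the term is not copied onto the reader's heap; writes are expensive (updating or deleting a term can force a global GC pass), so it suits write-once-read-many data:

    # Write once, e.g. at application start (expensive).
    :persistent_term.put({MyApp, :settings}, %{rate_limit: 100, tier: :pro})

    # Read many times (cheap, no copying to the process heap).
    settings = :persistent_term.get({MyApp, :settings})
    settings.rate_limit
    #=> 100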

NIFs

  • for when all the rest fails;
  • extreme memory usage;
  • no VM guarantees;

Another suggestion is to keep the OTP version as up to date as possible, as new releases are always improving performance.

Contracts for building robust systems: Chris Keathley

In this talk, Chris presents some insights into why Design by Contract should be considered and how his new library, Norm, can help projects with data specification and generation.

Breaking changes require coordination

  • refactor;
  • requirement (requires no breaking changes);
  • technology;

Contracts

  • enforcing callers are correct (pre-conditions);
  • ensure the function is correct (post-conditions);
  • ExContract;

Data specification

  • Norm (see the sketch after this list);
  • can be used in combination with ExContract;
  • supports schemas and optionality;
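
A small sketch of what specifying and conforming data with Norm looks like, based on its public API (the fields are made up):

    import Norm

    user_schema =
      schema(%{
        name: spec(is_binary()),
        age: spec(is_integer() and &(&1 >= 0))
      })

    conform!(%{name: "Ada", age: 36}, user_schema)
    #=> %{name: "Ada", age: 36}

    conform(%{name: "Ada", age: -1}, user_schema)
    #=> {:error, ...}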

Testing strategy

  • Norm uses StreamData to generate data for testing;
  • it can be used with property-based tests;

Writing an Ecto adapter, introducing MyXQL: Wojtek Mach

Even though I am not personally interested in MySQL, as I prefer and use only Postgres, I wanted to know more about the internals of Ecto and how it uses adapters to connect to and interact with databases.

Driver

  • :gen_tcp for database connection;
  • the binpp library for pretty-printing binaries;
  • Wireshark app;

MySQL packet

  • payload length;
  • sequence id;
  • packet payload;
  • leverages Elixir binary pattern matching (sketched after this list);
  • initial handshake packet;
  • handshake response;
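
The packet header described above (a 3-byte little-endian payload length followed by a 1-byte sequence id) maps naturally onto a binary pattern match; a rough, illustrative sketch rather than MyXQL's actual code:

    defmodule PacketSketch do
      # <<payload length (3 bytes, little endian), sequence id (1 byte),
      #   payload (length bytes), anything left over>>
      def decode(<<length::24-little, sequence_id::8,
                   payload::binary-size(length), rest::binary>>) do
        {%{length: length, sequence_id: sequence_id, payload: payload}, rest}
      end
    end

    PacketSketch.decode(<<5::24-little, 0, "hello", "next">>)
    #=> {%{length: 5, sequence_id: 0, payload: "hello"}, "next"}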

Building the adapter

  • encoding and decoding: a function for every data type, OK_Packet defined with defrecord;
  • DBConnection behaviour: maintains a connection pool, does not overload the database, reconnects, supports common database features, and it needs to be fast;

Connection

  • start N connections using DBConnection based on the pool configuration;
  • fetches results by preparing and executing the query;
  • other functions such as disconnect, checkout, ping, and so on;

Ecto integration

  • all the features Postgres adapter has;
  • implements Ecto.Adapter, Ecto.Adapter.Queryable, Ecto.Adapter.Schema, Ecto.Adapter.Storage, and Ecto.Adapter.Transaction;
  • constraints;
  • data normalization: booleans, for example, are stored as 1 and 0 in MySQL (see the sketch after this list);
  • integration tests;
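
For the boolean example, an adapter can plug conversion functions into Ecto's loaders/dumpers pipeline. A heavily simplified sketch of the idea (the real implementation lives in the MyXQL adapter inside ecto_sql):

    defmodule BoolConversionSketch do
      # In a real adapter these are the Ecto.Adapter loaders/2 and
      # dumpers/2 callbacks; MySQL stores booleans as 1/0.
      def loaders(:boolean, type), do: [&bool_decode/1, type]
      def loaders(_primitive, type), do: [type]

      def dumpers(:boolean, type), do: [type, &bool_encode/1]
      def dumpers(_primitive, type), do: [type]

      defp bool_decode(1), do: {:ok, true}
      defp bool_decode(0), do: {:ok, false}
      defp bool_decode(other), do: {:ok, other}

      defp bool_encode(true), do: {:ok, 1}
      defp bool_encode(false), do: {:ok, 0}
      defp bool_encode(other), do: {:ok, other}
    end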

Kubernetes at small scale: Phil Toland

Kubernetes has great benefits, even though it is not so easy to implement. Some of those benefits are improved resource efficiency, reduced cost and operational scalability. In this talk Phil described the process of implementing Kubernetes at Hippware.

Main components

  • control plane (leave it alone);
  • workers;
  • workload:
    - pod;
    - deployment (how your pod runs in the cluster);
    - service (expose a pod to outside, expose a port);
    - config map;

Lessons learned

  • outsource as much as possible;
  • automate all the things;
  • pods are ephemeral;
  • automatic clustering: via libraries such as libcluster and peerage (see the config sketch after this list);
  • deploying: via tools such as kubernetes-deploy and swarm;
  • one instance per node: anti-affinity specification;
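
For the automatic clustering point, libcluster is driven by a topology in config; a hedged sketch using its Kubernetes strategy (the selector and basename values are illustrative):

    # config/releases.exs or config/prod.exs
    import Config

    config :libcluster,
      topologies: [
        k8s: [
          strategy: Cluster.Strategy.Kubernetes,
          config: [
            kubernetes_selector: "app=myapp",
            kubernetes_node_basename: "myapp"
          ]
        ]
      ]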

ETS Versus ElasticSearch for Queryable Caching: David Schainks

In this talk David compares ElasticSearch, well known as a great solution for search and cached data, with out-of-the-box Elixir/Erlang solutions such as ETS and Persistent Term, listing the pros and cons of each option.

ElasticSearch

  • filtering, auto completion and full text search;
  • performant;
  • queryable cache;
  • operational cost: another DSL, integration time, expensive, and configuration gotchas;

ETS

  • no garbage collection;
  • use cases: preset configuration, cached data, message buffer;
  • filtering with match/2, match_object/1 and fun2ms/1 (see the example after this list);
  • auto completion with fun2ms/1;
  • full text search: using Joe Armstrong's elib1 library;
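
A small example of the ETS side: an in-memory table filtered with match_object/2 and with a hand-written match spec passed to select/2 (:ets.fun2ms/1, mentioned above, builds such specs from a fun):

    table = :ets.new(:products, [:set, :public, read_concurrency: true])

    :ets.insert(table, {1, "hoover", :appliances})
    :ets.insert(table, {2, "kettle", :appliances})
    :ets.insert(table, {3, "jacket", :fashion})

    # Filtering: every {id, name, category} row in :appliances
    # (result order is not guaranteed for a set table).
    :ets.match_object(table, {:_, :_, :appliances})
    #=> [{1, "hoover", :appliances}, {2, "kettle", :appliances}]

    # The same filter as a match spec, returning only the names.
    :ets.select(table, [{{:"$1", :"$2", :appliances}, [], [:"$2"]}])
    #=> ["hoover", "kettle"]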

Real world concerns

  • performance with large data sets;
  • data ranking;

Operational concerns

  • high availability;
  • synchronization;
  • index changes;

Persistent term

  • no filtering;
  • much faster than ETS;
  • garbage collection on update;

UI Testing is Ruff; Hound Can Help: Vanessa Lee

Whether you call it UI testing, end-to-end testing, end-to-user testing, or acceptance testing, it is often an intensely manual and time-consuming process. An Elixir library, Hound, can carry some of the load through browser automation.

Hound takes screenshots during tests and stores them in the test folder.

Library test helpers

  • check page elements;
  • fill in forms, execute actions and submit them (see the sketch after this list);
  • manage sessions;
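
A hedged sketch of a Hound browser test; the helper names come from Hound's public API, while the page, selectors and assertion text are made up:

    defmodule MyAppWeb.LoginFlowTest do
      use ExUnit.Case, async: false
      use Hound.Helpers

      # Starts a browser session for each test and ends it afterwards.
      hound_session()

      test "logging in through the UI" do
        navigate_to("http://localhost:4000/login")

        fill_field(find_element(:name, "email"), "user@example.com")
        fill_field(find_element(:name, "password"), "secret")
        click(find_element(:css, "button[type=submit]"))

        assert page_source() =~ "Welcome back"

        # Screenshots can be captured for debugging failures.
        take_screenshot()
      end
    end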

Property-based testing

  • combining Hound with property-based libraries is very straightforward;
  • using a custom StreamData generator built with the bind/2 helper;

While handy, there are some gotchas when elements are not ready yet or when the application depends heavily on JavaScript events and side effects.
