Rabbit’s Anatomy - Understanding Topic Exchanges

Topic Exchange Intro/Overview

In RabbitMQ (and AMQP in general), an exchange is the routing abstraction to which external applications publish messages. There are four exchange types, one of which is the TOPIC exchange. Amongst these four, the TOPIC exchange offers the most flexible routing mechanism. Queues are bound to a TOPIC exchange with a Binding Key pattern, and routing is a pattern-matching process: a message is routed to a queue when the message’s Routing Key matches the pattern of the queue’s binding.

A Routing Key is made up of words separated by dots, e.g. floor_1.bedroom.temperature. The Binding Key for a Topic Exchange looks similar to a Routing Key, but it allows two special characters: the asterisk * and the hash #. These are wildcards which let us create bindings in a smarter way. An asterisk * matches exactly one word and a hash # matches zero or more words. Here are example patterns matching messages from (a small code sketch of the matching rules follows the examples):

  • - all devices in bedrooms on the first floor: floor_1.bedroom.*,
  • - all devices on the first floor: floor_1.#.
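
Here is a minimal, self-contained sketch of those matching rules. It is an illustration of the wildcard semantics only, not RabbitMQ’s implementation (which is described in the rest of this post):

%% Toy matcher illustrating how a Binding Key pattern with * and #
%% relates to a Routing Key. Both keys are split on dots.
-module(topic_match_demo).
-export([matches/2]).

matches(BindingKey, RoutingKey) ->
    match(string:split(BindingKey, ".", all),
          string:split(RoutingKey, ".", all)).

%% A trailing '#' matches whatever words remain (including none).
match(["#"], _Words) -> true;
%% '#' matches zero or more words: either skip it, or consume one word and retry.
match(["#" | RestB] = Pattern, Words) ->
    match(RestB, Words) orelse
        (Words =/= [] andalso match(Pattern, tl(Words)));
%% '*' matches exactly one word.
match(["*" | RestB], [_Word | RestW]) -> match(RestB, RestW);
%% A literal part must equal the corresponding word.
match([Same | RestB], [Same | RestW]) -> match(RestB, RestW);
match([], []) -> true;
match(_, _) -> false.

For example, matches("floor_1.#", "floor_1.bedroom.temperature") returns true, while matches("floor_1.bedroom.*", "floor_1.bathroom.temperature") returns false.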

It’s clear that using a Topic Exchange allows for much simpler and more specific Routing. Now let’s take a look at how a Topic Exchange actually works.

Trie

Understanding the trie data structure is key to understanding how a Topic Exchange works. A trie is a tree that holds ordered data and is typically used for storing string values. Each node in the tree represents a prefix of a string and holds links to child nodes which share the same prefix. The child nodes are addressed with the character following the prefix of the parent node.

This data structure allows us to search for a specific string in time that depends on the length of that string rather than on the number of entries stored. The characters of the string are used to traverse the tree while searching it. In a Topic Exchange, the trie is used for storing Binding Keys. A Binding Key is split on dots and the resulting parts are used as pointers to the next nodes. Every time a new binding is added to a Topic Exchange, the trie associated with it is updated, and each time a new message has to be routed, the trie is queried to look up the message’s destinations.
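
As a toy illustration of this idea (purely in memory, whereas the real structure lives in Mnesia tables as shown in the next section), a word-level trie could be built like this:

%% Minimal in-memory word trie: each node is a map from word to child node,
%% plus a list of bindings attached to that node.
-module(word_trie_demo).
-export([new/0, insert/2]).

new() -> #{bindings => []}.

%% Insert a Binding Key such as "floor_1.bedroom.temperature".
insert(Trie, BindingKey) ->
    insert_words(Trie, string:split(BindingKey, ".", all), BindingKey).

insert_words(Node, [], Binding) ->
    Node#{bindings := [Binding | maps:get(bindings, Node)]};
insert_words(Node, [Word | Rest], Binding) ->
    Child = maps:get(Word, Node, new()),
    Node#{Word => insert_words(Child, Rest, Binding)}.

For example, inserting "floor_1.#" and "floor_1.bedroom.temperature" into word_trie_demo:new() yields a root with a single floor_1 child, which in turn has a "#" child and a bedroom child.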

Implementation of the trie

A Topic Exchange trie is implemented on top of Mnesia, the built-in distributed database of Erlang. Nodes and edges of a trie are stored in rabbit_topic_trie_node and rabbit_topic_trie_edge tables respectively.

#topic_trie_node{
    trie_node = #trie_node{
        exchange_name,
        node_id
    },
    edge_count,
    binding_count
}
#topic_trie_edge{
    trie_edge = #trie_edge{
        exchange_name,
        node_id, % parent
        word
    },
    node_id % child
}

In this case the trie_node and trie_edge records are the primary keys used to identify records. Both nodes and edges are assigned to one particular Topic Exchange by the exchange_name field in the primary key. Nodes are also used to identify bindings; the bindings are not stored in the nodes table directly, so they have to be looked up by node_id in the rabbit_topic_trie_binding table. Edges store information about the connections between parent and child nodes, and also contain the part of the Binding Key (word) which is used to traverse the tree. Therefore, traversing the tree requires a sequence of Mnesia queries.
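
For example, a single traversal step could be expressed along these lines against the tables above. This is a hedged sketch using the record names shown; the actual helper in RabbitMQ’s rabbit_exchange_type_topic module may differ in details:

%% Look up the child reached from ParentNodeId by following the edge
%% labelled Word, for the trie belonging to ExchangeName.
%% Assumes the record definitions shown above are in scope and that the
%% call runs inside an Mnesia transaction.
child_node(ExchangeName, ParentNodeId, Word) ->
    Key = #trie_edge{exchange_name = ExchangeName,
                     node_id       = ParentNodeId,
                     word          = Word},
    case mnesia:read(rabbit_topic_trie_edge, Key) of
        [#topic_trie_edge{node_id = ChildId}] -> {ok, ChildId};
        []                                    -> error
    end.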

Topic Exchange internals

An Exchange type is created by implementing the rabbit_exchange behaviour. In the context of tries in a Topic Exchange, there are two interesting operations: add_binding/3 and route/2. The first implements adding a new binding to the structure and the latter is used to determine the routing targets for a message.

Binding Operation

The arguments needed to create a binding are:

  • - source Exchange
  • - Binding Key
  • - destination

Every trie starts with the root node, which represents the empty Binding Key. This makes sense, as an empty string is a prefix of any string. The first operation is pretty straightforward: the Binding Key is split on dots . and stored in a list. For example, the key “a.b.c” is transformed into [“a”, “b”, “c”]. Let’s call this list Words; it will be used for traversing the data structure. Then the tree is traversed down recursively, starting with the root as the current node. The steps below are also sketched in code right after the list.

  1. Repeat until the Words list is empty:
     1.1. Take the head of the Words list and query Mnesia for a child edge matching it.
     1.2. If a node is found, use it as the new current node and go back to 1.1 with the rest of the Words list. Otherwise go to 2.
  2. Create child nodes using the rest of the Words list.
  3. When the Words list is exhausted, create a rabbit_topic_trie_binding entry for the current node. It signals that there are bindings associated with it.
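
In code, the walk could look roughly like this. It is a hedged sketch reusing the child_node/3 helper from the earlier snippet; node id generation, the parent’s edge_count update and the surrounding Mnesia transaction are simplified, and new_child/3 is a hypothetical helper:

%% Follow existing nodes as far as possible, creating the missing ones,
%% and return the node corresponding to the full Binding Key.
follow_or_create(_X, Node, []) ->
    Node;
follow_or_create(X, Node, [Word | RestW]) ->
    Child = case child_node(X, Node, Word) of
                {ok, Existing} -> Existing;
                error          -> new_child(X, Node, Word)
            end,
    follow_or_create(X, Child, RestW).

%% Hypothetical helper: inserts the missing edge. The real code also
%% writes a node record and increments the parent's edge_count.
new_child(X, Parent, Word) ->
    ChildId = erlang:unique_integer(),
    ok = mnesia:write(rabbit_topic_trie_edge,
                      #topic_trie_edge{
                          trie_edge = #trie_edge{exchange_name = X,
                                                 node_id       = Parent,
                                                 word          = Word},
                          node_id   = ChildId},
                      write),
    ChildId.

Once the final node is returned, step 3 above attaches the binding to it via rabbit_topic_trie_binding.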

Here is an example binding operation. Let’s assume there is a Topic Exchange with two existing bindings, floor_1.bedroom.temperature and floor_1.#. Our example trie structure would look like this:

Let’s add a new binding with the Binding Key floor_1.bedroom.air_quality. First we split it on dots: [floor_1, bedroom, air_quality]. Nodes for floor_1 and bedroom already exist, but the last part is missing, so the rest of the key, [air_quality], is used to create new nodes. Finally a new binding is associated with the newly created node and we have a structure that looks like this:

As you can see, to insert the new node, three read queries were executed to look up the edges between consecutive pairs of nodes: {root, floor_1}, {floor_1, bedroom} and {bedroom, air_quality}. The last edge was not present, so two write operations were executed: the first updates the edge_count of the bedroom node and the second inserts the new edge. At this point the trie structure is ready for the final node to be created, which requires another two write operations:

  • - one to create the actual node, which corresponds to the given Binding Key,
  • - the second to create the entry in rabbit_topic_trie_binding, which binds the node to the destination for messages.

The Mnesia tables used here are of the ordered_set type, which means they are implemented with a binary search tree. Thus, both read and write operations have complexity O(log(n)), where n is the size of the table. It can be observed that the first phase of traversing the trie requires:

  1. a read operation when a node exists,
  2. a write operation when a node does not exist.

The final phase of inserting the actual binding requires two extra operations. In the worst case, where none of the nodes exist and all of them have to be created, the complexity is O(n*2*log(m) + 2*log(k)), where n is the length of the Binding Key, m is the number of nodes in the table and k is the number of actual bindings. Both m and k are global, so the efficiency of queries depends on the global number of bindings/nodes, not just on those of the given Exchange. For simplicity it is assumed that the numbers of edges and nodes are equal, because in this structure (number of edges) = (number of nodes - 1).

Routing Operation

Routing happens when a new message arrives at a Topic Exchange. The trie structure needs to be queried using the Routing Key associated with the message. However, traversing the trie is not straightforward, as the wildcards * and # need to be taken into account.

As with the binding operation, the Routing Key is first split on dots. Again, let’s call the result the Words list, with [ Word | RestW ] = Words. The process starts at the root node. Then the algorithm discovers bindings by recursively exploring the tree in three ways (sketched in code right after the list):

  • - Look for a # child node. If one is found, it is treated as a new root and new scans are started with all remaining suffixes of Words, e.g. if Words is [a,b,c], then the search starts again with [a,b,c], [b,c], [c] and [].
  • - Look for a * child node. If one is found, continue with the found node as the root and RestW.
  • - Look for a Word child node. If one is found, continue with the found node as the root and RestW.
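
Putting the three cases together with the end-of-Words handling described in the next paragraph, here is a hedged sketch of the recursion. It reuses the child_node/3 helper from above and ignores Mnesia transaction details; it is an illustration, not RabbitMQ’s actual code:

%% Return the trie nodes whose bindings should receive a message whose
%% Routing Key was split into Words, starting the scan at Node.
match_nodes(X, Node, Words) ->
    %% A '#' child restarts the scan with every remaining suffix of Words,
    %% including Words itself and the empty list.
    HashMatches = case child_node(X, Node, "#") of
                      {ok, H} -> lists:append([match_nodes(X, H, W)
                                               || W <- suffixes(Words)]);
                      error   -> []
                  end,
    %% A '*' child consumes exactly one word.
    StarMatches = case {Words, child_node(X, Node, "*")} of
                      {[_ | RestW], {ok, S}} -> match_nodes(X, S, RestW);
                      _                      -> []
                  end,
    %% A literal child consumes the matching word.
    WordMatches = case Words of
                      [Word | Rest] -> case child_node(X, Node, Word) of
                                           {ok, C} -> match_nodes(X, C, Rest);
                                           error   -> []
                                       end;
                      []            -> []
                  end,
    %% When Words is exhausted, the current node itself is a result.
    Self = case Words of [] -> [Node]; _ -> [] end,
    Self ++ HashMatches ++ StarMatches ++ WordMatches.

suffixes([])          -> [[]];
suffixes([_ | T] = L) -> [L | suffixes(T)].

Note that the trailing # check mentioned in the next paragraph is covered here as well: when Words is empty, the # branch still fires with the empty suffix, so a # child of a result node is also collected.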

Exploration of a path finishes when its Words list is exhausted. The last step in the process is to look for extra child nodes connected through the hash # wildcard, because # stands for “zero or more” words. So here is an example of the search algorithm. Let the Topic Exchange have the following bindings: floor_1.*.air_quality, floor_1.bedroom.air_quality and floor_1.bathroom.temperature. Now, let’s examine the routing of a message published with the Routing Key floor_1.bedroom.air_quality, which will match the first two bindings. Here is the trie representation, where the current node is marked in blue and the number on a node represents its number of bindings.

The first step is to find out if there is a hash # child node. In the example above it is not present. Then the asterisk * child node is queried, but it is also not present. Finally, the algorithm finds a node matching the head of the Words list - floor_1:

Now the algorithm considers the blue node as a new root and starts again. Again there is no hash # child, but an asterisk is found. The head of the Words list is consumed and the algorithm moves down:

Here there is only one option available - air_quality:

The Words list is exhausted, so the current node is a result. There is one extra step - the hash # child node has to be queried again, because it also accepts an empty list. However, it is not found, so only the current blue node is considered a result. Let’s mark the found node in green and go back to the previous node:

That node was reached using an asterisk, but there is one step left: checking whether there is a bedroom child node. In this case there is one:

There is one word left and the child node is present:

The Words list is empty again, so the current node is a result:

The final step is to query for any bindings associated with the nodes we found. According to the numbers on the found nodes, there are two bindings. They are the final result of the route/2 function.

Now, let’s consider another example, with hash bindings present. There are three bindings: #.air_quality, floor_1.# and floor_1.bedroom.air_quality.#. Again the floor_1.bedroom.air_quality Routing Key will be used:

In this example we have found a hash node. This will cause the algorithm to go down to that node with all available Routing Key parts:

Let’s emphasise this again: the current node was reached via a # edge, so the algorithm visits the current blue node 4 times with different Words lists. They are presented in the figure. One of the Words lists is the empty one [], so this node is also appended to the results. There is no floor_1 or bedroom edge going out of this node, but there is an air_quality one. So the algorithm goes to the leaf node using the third Words list. Then:

The current Words list is empty, so this node is also a result of the search. There are no hash child nodes, so the current branch is finished. The algorithm will go back to the root node:

The only option for the algorithm to go down is the head of the Words list:

Again there is a hash child node, so it needs to be visited with all tails of the Words list. Three times in this case:

One of the Words lists is empty, so the current blue node is appended to the result list. As there are no child nodes, the algorithm goes back:

Now the algorithm will go down two times consuming all remaining words. Let’s jump directly to it:

The Words list is empty, so the current node is also part of the result. However there is also a hash # child node. According to the algorithm, if any node is considered as a result, its child hash nodes are matched. So finally there are 5 nodes found:

The final step is to find the bindings associated with the found nodes. The final result of the route/2 function is all 3 bindings.
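
The shape of that last lookup might be roughly as follows. The record layout below is an assumption made purely for illustration (the text only names the rabbit_topic_trie_binding table), so RabbitMQ’s actual definitions will differ in details:

%% Assumed record shapes for the binding table (illustration only).
-record(trie_binding, {exchange_name, node_id, destination}).
-record(topic_trie_binding, {trie_binding}).

%% Collect the destinations bound to the matched trie nodes.
destinations(ExchangeName, NodeIds) ->
    lists:append(
      [ [ (B#topic_trie_binding.trie_binding)#trie_binding.destination
          || B <- mnesia:match_object(
                    rabbit_topic_trie_binding,
                    #topic_trie_binding{
                        trie_binding = #trie_binding{exchange_name = ExchangeName,
                                                     node_id       = NodeId,
                                                     destination   = '_'}},
                    read) ]
        || NodeId <- NodeIds ]).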

In terms of complexity, this is hard to estimate precisely. The algorithm performs 3 queries for each node. The # nodes cause query paths to be duplicated, as the whole algorithm is restarted with every remaining suffix of Words. However, all operations depend on two factors - the length of the Words list and the total number of nodes in the table.

So assuming the worst case, where the bindings are #, #.#, #.#.# ... k*#, each level will run with all possible suffixes of Words, and some nodes will be visited many times with exactly the same Words. The first node is visited n times, the second sum(1,n) times, the third sum(1,sum(1,n)) times and so on. We can rewrite this as:
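
Written as a recurrence (a hedged reconstruction from the description above):

k_1 = n
k_(m+1) = sum(1, k_m) = k_m * (k_m + 1) / 2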

The total number of operations is k_1 + k_2 + … + k_k. When this recursive equation is unwrapped, every new level contains twice as many multiplications as the previous one, so level k contains 2^k multiplications. This term dominates, so we can bound the complexity by O(2^k * n * log(m)), where k is the maximum trie depth, n is the length of Words and m is the total number of nodes.

However, the above example is extreme and bindings like #.#.# make no sense. The average complexity is therefore closer to O(n*log(m)), because it makes no sense to put two subsequent hash # parts in a key. The overhead introduced by a single hash node should not be significant, because in that case the traversal with different Words lists stops right after the hash edge.

Evaluation

This section will cover the performance of a Topic Exchange. The experiments will demonstrate the characteristics of a Topic Exchange under different conditions.

Synthetic tests

Two experiments will be presented, in order to experimentally confirm or reject the assumptions made in the previous Routing Operation section:

  • First, the relation between the Routing Key length and the routing operation time. The set of bindings is fixed and the Routing Key length is varied. A linear dependency is expected.
  • Secondly, the Routing Key length is fixed and the number of bindings is varied. A logarithmic dependency is expected.

Both experiments are performed under the following conditions (a sketch of the Binding Key generation follows the list):

  • - Tests are run on a single RabbitMQ node.
  • - Routing time is measured as the evaluation time of rabbit_exchange_type_topic:route/2.
  • - Measurements are repeated 50 times and the average results are presented in the figures.
  • - The Binding Keys are random, ensuring that no Binding Key contains two subsequent hashes.
  • - Each Binding Key part has a 70% chance of being a word, a 20% chance of being an asterisk and a 10% chance of being a hash.
  • - The Routing Keys are created from existing Binding Keys - for example, a Routing Key of length n is created from an existing Binding Key of length n, with any hashes or asterisks replaced by random strings. This ensures that the operation must traverse at least n levels of the trie.
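
For illustration, a Binding Key generator following these probabilities could look like the sketch below. This is an assumption about the setup, not the actual benchmarking code:

%% Generate a random Binding Key of Len parts: ~70% words, ~20% '*', ~10% '#',
%% never producing two '#' parts in a row.
-module(binding_key_gen).
-export([binding_key/1]).

binding_key(Len) ->
    lists:flatten(lists:join(".", parts(Len, undefined))).

parts(0, _Prev) ->
    [];
parts(N, Prev) ->
    Part = case {rand:uniform(100), Prev} of
               {R, _} when R =< 70 -> random_word();
               {R, _} when R =< 90 -> "*";
               {_, "#"}            -> random_word();  %% avoid two subsequent hashes
               _                   -> "#"
           end,
    [Part | parts(N - 1, Part)].

random_word() ->
    [$a + rand:uniform(26) - 1 || _ <- lists:seq(1, 8)].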

The above figure presents the results of the three experiments. Now, let’s slightly modify the conditions to visualize the impact of a # hash key in the trie structure. Only one binding is added, consisting of just two subsequent hashes: #.#. Then the performance looks like this:

The red curve bends, as expected. When there are more # bindings on the query path, the relation between the Routing Key length and query time is no longer linear. This effect can also be observed in the 10k bindings series - the green curve also bends slightly. This can be explained in the same way: there are more Binding Keys starting with a #, which increases query time for all queries.

Let’s check it in the RabbitMQ Management UI:

Here we have roughly 50 bindings like the ones above. If we replace them, we see a more linear relation and get a better overview of the performance impact of the hash # wildcard, as seen below:

Again, the time to find relevant routes has improved. Now, let’s examine how the number of bindings impacts query time. As explained in the previous section, a logarithmic relation is expected:

This example also follows the expected behaviour. All the bindings are stored in a single Mnesia table, and every node query has its own cost: when there are more entries in the table, the query time grows. As the table is of the ordered_set type, the query time has logarithmic complexity, which is what is actually observed.

Summing up, the experiments align with the theory we started with. The expectations about the impact of the Routing Key length and the number of bindings on the routing operation time were confirmed. The huge impact of a hash # wildcard has also been confirmed, and its scale was presented.

Real world example

The two previous examples measured the time of a single query. While this is still valuable, it does not necessarily reflect a real world use case. The test is synthetic and focuses on a single query - but is the Topic Exchange’s performance overhead also observable when the overall performance of RabbitMQ is taken into account?

This section presents a performance evaluation of the RabbitMQ integration in MongooseIM. MongooseIM is Erlang Solutions’ highly scalable instant messaging server. The RabbitMQ component in MongooseIM simply reports each user’s activity, which may be:

  • - User became online/offline
  • - User sent/received message

Only sent/received message events will be discussed in this case. The Routing Key of a message activity event follows a simple pattern: <username>.{sent, received}.

In order to evaluate the performance of the component, a load test was designed. Simulated XMPP clients connected to the MongooseIM server and exchanged messages with each other. Each message generated events, which were published to RabbitMQ. A number of AMQP clients then connected to RabbitMQ to consume the generated events.

This is the outline of the experiment’s architecture:

For the purpose of this post, only results directly connected to Topic Exchange performance are covered. Let’s define the key performance indicator as the Time To Delivery: the amount of time between a message being sent by an XMPP user and it being received by a consumer of a RabbitMQ queue. This value is presented in the figures that follow.

The test conditions were as follows:

  • - 90k XMPP users
  • - 10k AMQP consumers
  • - ~4.8k payloads/s from MongooseIM to RabbitMQ
  • - Payload size ~120B
  • - topic exchange with Binding Keys like user_N.*
  • - 20k bindings in total

In this case, the performance is presented in the following graph. It shows the 95th - 99th percentiles of the Time To Delivery, as well as the maximum observed Time To Delivery in a given time window.

The second test had similar conditions. The only difference was the Exchange type:

  • - 90k XMPP users
  • - 10k AMQP consumers
  • - ~4.8k payloads/s from MongooseIM to RabbitMQ
  • - Payload size ~120B
  • - direct exchange with Binding Keys like user_N.chat_msg_sent, user_N.chat_msg_recv
  • - 40k bindings in total

Under those conditions the performance was better, as illustrated in the following figure.

While the previous section showed the performance characteristics of a Topic Exchange in isolation, these examples provide an overview at a bigger scale. Both tests had identical characteristics apart from the exchange type being direct or topic (and consequently the number of bindings). However, the difference in performance is significant, in favor of the Direct Exchange. It allowed us to effectively decrease the Time To Delivery, which was the measure of efficiency in the presented tests.

Summary

You have now seen the basics of Topic Exchange internals: a brief overview of the implementation, the theoretical overhead introduced by traversing the trie structure, as well as some performance evaluation. As observed, a Topic Exchange is not the fastest routing method and there are many factors which may influence its performance. However, it is not true that Topic Exchanges are slow - in fact, they are generally fast in typical RabbitMQ usage; the test conditions here were deliberately specific. If there are few bindings or the trie is not deep, the Topic Exchange overhead is usually negligible. Still, it is important to understand the underlying mechanism, as the MongooseIM RabbitMQ component example showed - switching to a different Exchange type resulted in a significant improvement in performance.

A few useful links

  1. Get some expert help or training on RabbitMQ
  2. Check the RabbitMQ and MongooseIM demos
  3. Find out more about MongooseIM

Permalink

The Best Code BEAM SF talks from the 2010s

Preparations for Code BEAM SF 2020 are well and truly underway. This year marks the 9th anniversary of the conference, meaning that Code BEAM has been bringing the best of the BEAM to the Bay Area for the best part of a decade. To whet your appetite for this year’s event, and to say goodbye to the decade gone by, we thought it was a timely opportunity to look back at the talks that have made the biggest waves in the community every year since its launch in 2012. From the launch of Elixir to lessons from WhatsApp, we’ve got all the major BEAM events covered. So sit back, relax and get excited with this comprehensive selection of top-class talks.

Highlights from Code BEAM SF 2019

Operable Erlang and Elixir by Fred Hebert

Successful systems grow, and as they grow, they become more complex. It’s not just the code that gets messier either; it is also the people who are part of the system who need to handle a growing level of complexity. In his talk, Fred Hebert explains why it is not enough to take a code-centric approach. Instead, he argues for a holistic approach to making Erlang and Elixir systems truly operator-friendly. This encompasses how our mental models work, what makes for best practice automation, and, what tools exist on the Erlang VM to help us deal with the unexpected. To learn more head to the page for Fred Hebert’s talk on the Code Sync website.

Announcing Broadway - keynote speech by José Valim

The development of Broadway was a significant step forward for Elixir in 2019. The library produced by Plataformatec streamlines data processing, making concurrent, multi-stage pipelines easier than ever. In his talk, José Valim explained how Broadway leverages GenStage to provide back-pressure and how Elixir depends on OTP for its famed fault-tolerance. Learn more about José’s talk at Code BEAM SF 2019 on the Code Sync website.

Highlights from Code BEAM SF 2018

The Forgotten Ideas in Computer Science - keynote speech by Joe Armstrong

Some things are just ahead of their time. In his 2018 keynote, Joe Armstrong looks at ideas from the early days of computing and reflects upon the good ideas that could be helpful to revisit and the bad ideas that we can still learn something from. This thought-provoking talk is one that still holds relevance today and is worth revisiting. Interested in what the forgotten ideas of computer science are? Learn more about Joe Armstrong’s talk at Code BEAM SF 2018.

A Reflection on Building the WhatsApp Server by Anton Lavrik

WhatsApp is arguably the BEAM’s most famous success story. In 2014, when Facebook purchased the company, stories of 10 server-side engineers managing a platform with 450 million active users sending 54 billion messages a day were shared far and wide.
In his talk, Anton Lavrik described some of the tools they use for developing reliable and scalable servers in Erlang. The talk included tools that were not widely used at the time and methods that went against conventional Erlang practices. Want to find out what those tools and practices were? Learn more about Anton Lavrik’s talk at Code BEAM SF.

Highlights from Erlang Factory SF Bay Area 2017

Building a web app in Erlang by Garrett Smith

The long-held belief has been that Erlang is not suitable for web applications. The arrival of LiveView may have given Elixir the edge, but that doesn’t mean you can’t build web applications in Erlang. In this talk, Garrett Smith shows that Erlang can not only be used to build web applications, but that it is actually great for it. The discussion is inspired by a presentation from Ian Bicking entitled “Building a Web Framework from Scratch”. It shows that web apps can be built without monolithic frameworks, starting with nothing and gradually adding functionality using abstractions that closely mirror the Web’s underlying protocols. This particular example may be built in Erlang, but the lessons and principles in the talk apply to developing web applications in general. Learn more about Garrett Smith’s talk at Erlang Factory 2017.

Highlights from Erlang Factory 2016

Why Functional Programming Matters, keynote speech by John Hughes

Nearly a decade before Erlang was released as open source, John Hughes published “Why Functional Programming Matters”, a manifesto for functional programming. In this talk, John takes a deep dive into the history of functional programming and explains why functional programming is more important than ever. Get a more detailed insight into John Hughes’ talk at Erlang Factory 2016.

Highlights from Erlang Factory 2015

Building And Releasing A Massively Multiplayer Online Game With Elixir by Jamie Winsor

The team from Undead Labs launched State of Decay to rave reviews, but they did have one piece of persistent criticism: ‘why was there no multiplayer option?’ Building the infrastructure to handle the massive scale of concurrent users required for massively multiplayer online games has traditionally demanded large engineering and support teams, as well as significant time and financial investment. In this talk, Jamie Winsor explains how they decided on Erlang as the right tool for the job to empower them to fast track their multiplayer offering without jeopardising company culture or their product. Learn more about Jamie Winsor’s talk at Erlang Factory 2015.

Highlights from Erlang Factory 2014

That’s 'Billion’ with a 'B’: Scaling to the Next Level at WhatsApp by Rick Reed

In 2014, WhatsApp was the toast of the tech community with its $19 billion sale to Facebook. The news brought with it an influx of interest in Erlang, as people looked into how such an agile team of developers was able to build such a robust system. In this talk, Rick Reed explains how the team was able to meet the challenge of running hundreds of nodes, thousands of cores and hundreds of terabytes of RAM to scale and handle billions of users. Get more information on Rick Reed’s talk at Erlang Factory 2014.

Highlights from Erlang Factory 2013

The How and Why of Fitting Things Together by Joe Armstrong

Software is complex, and things get complicated when the parts don’t fit together. But how does this happen? And what can we do to prevent this happening? In his talk, Joe Armstrong answers these questions and explains where Erlang comes in to save the day. See the talk descriptions and more details about Joe’s talk from Erlang Factory 2013.

Highlights from Erlang Factory 2012

Scaling to Millions of Simultaneous Connections by Rick Reed

Just two years before they had to scale to billions, WhatsApp needed to figure out how to scale to millions. In this talk, Rick Reed explains how Erlang helps them meet the growing user-demand while continuing to keep a small server footprint. You can see more details from Rick Reed’s talk at Erlang Factory 2012 here.

Summary

For nearly a decade, Code BEAM SF has been the beacon of BEAM related news in the U.S.A. It provides a space for the Erlang and Elixir community to share great ideas, be inspired, connect and increase their knowledge base. As we enter a new decade, we are excited to see how these technologies will grow and become important players in growth sectors such as IoT, FinTech and machine learning. The first Code BEAM SF of the decade takes place in March, tickets are on sale now and you can see the full line up of speakers at codesync.global.

Permalink

Packaging and Distributing/Deploying Erlang GUI apps with ZX

In the last two posts I wrote walkthroughs for how to create new CLI and GUI apps in Erlang from scratch using a tool called ZX. Today I want to show how to package apps and publish them using Zomp (the package distribution system) and get them into the hands of your users with minimal fuss.

To understand how packages are distributed to users (the “why does anything do anything?” part of grokking ZX/Zomp), one must first understand a bit about how Zomp views the world.

Packages in the system are organized into “realms”. A code realm is like a package repository for a Linux distribution. The most important thing about realm/repository configuration is that you can tell by each package’s signature whether it really came from the realm it claims as its origin. Just like Linux repositories, anyone can create or host a Zomp realm, and realms can be mirrored.

(As you will see in a future tutorial, though, administration and mirroring with Zomp is far easier and more flexible than with traditional Linux repositories. As you will see below, packaging with ZX is just a single command — ZX already knows everything it needs from the zomp.meta file.)

In this example I am going to put the example GUI project, Termifier GUI (a toy example app that converts JSON to Erlang terms), into the default FOSS realm, “otpr”. Because I am the sysop I have packaging and maintenance permissions for every package in the realm, as well as the sole authority to add projects and “accept” a package into the write-only indexes (packagers have “submit” authority, maintainers have “review”, “reject” and “approve” authorities).

[Note: The indexes are write only because dependencies in ZX are statically defined (no invisible updates) and the indexes are the only complete structure that must be mirrored by every mirroring node. Packages are not copied to new mirrors, they are cached the first time they are requested, with mirror nodes connected in a tree instead of a single hub pushing to all mirrors at once. This makes starting a new mirror very light weight, even for large realms, as no packages need to be copied to start (only the realm’s update history, from which the index is constructed), and packages in high demand experience “trickle down” replication, allowing mirrors to be sparse instead of complete. Only the “prime node” for a given realm must have a complete copy of everything in that particular realm. Nodes can mirror an arbitrary number of realms, and a node that is prime for one or more realms may mirror any number of others at the same time, making hosting of private project code mixed with mirrored public FOSS code a very efficient arrangement for organizations and devops.]

In the original Termifier GUI tutorial I simply created it and launched it from the command line using ZX’s zx rundir [path] and zx runlocal commands. The package was implicitly defined as being in the otpr realm because I never defined any other, but otpr itself was never told about this, so it merely remained a locally created project that could use packages hosted by Zomp as dependencies, but was not actually available through Zomp. Let’s change that:

ceverett@okonomiyaki:~/vcs$ zx add package otpr-termifierg

Done. That’s all there is to it. I’m the sysop, so this command told ZX to send a signed instruction (signed with my sysop key) to the prime node of otpr to create an entry for that package in the realm’s index.

Next we want to package the project. Last time we messed with it it was located at ~/vcs/termifierg/, so that’s where I’ll point ZX:

ceverett@okonomiyaki:~/vcs$ zx package termifierg/
Packaging termifierg/
Writing app file: ebin/termifierg.app
Wrote archive otpr-termifierg-0.1.0.zsp

Next I need to submit the package:

ceverett@okonomiyaki:~/vcs$ zx submit otpr-termifierg-0.1.0.zsp

The idea behind submission is that normally there are two cases:

  1. A realm is a one-man show.
  2. A realm has a lot of people involved in it and there is a formal preview/approval, review/acceptance process before publication (remember, the index is write-only!).

In the case where a single person is in charge, rushing through the acceptance process only involves three commands (no problem). In the case where more than one person is involved, the acceptance of a package should be a staged process where everyone has a chance to see each stage of the acceptance process.

Once a package has been submitted it can be checked by anyone with permissions on that project:

ceverett@okonomiyaki:~/vcs$ zx list pending otpr-termifierg
0.1.0
ceverett@okonomiyaki:~/vcs$ zx review otpr-termifierg-0.1.0
ceverett@okonomiyaki:~/vcs$ cd otpr-termifierg-0.1.0
ceverett@okonomiyaki:~/vcs/otpr-termifierg-0.1.0$ 

What the zx review [package_id] command does is download the package, verify that the signature belongs to the actual submitter, and unpack it in a directory so you can inspect it (or, more likely, run it with zx rundir [unpacked directory]).

After a package is reviewed (or if you’re flying solo and already know about the project because you wrote it) then you can “approve” it:

ceverett@okonomiyaki:~/vcs$ zx approve otpr-termifierg-0.1.0

If the sysop is someone other than the packager then the review step is actually necessary, because the next step is re-signing the package with the sysop’s key as part of acceptance into the realm. That is, the sysop runs zx review [package_id], actually reviews the code, and then once satisfied runs zx package [unpacked_dir], which results in a .zsp file signed by the sysop. If the sysop is the original packager, though, the .zsp file that was created in the packaging step above is already signed with the sysop’s key.

The sysop is the final word on inclusion of a package. If the green light is given, the sysop must “accept” the package:

ceverett@okonomiyaki:~/vcs$ zx accept otpr-termifierg-0.1.0.zsp

Done! So now let’s see if we can search the index for it, maybe by checking for the “json” tag since we know it is a JSON project:

ceverett@okonomiyaki:~/vcs/termifierg$ zx search json
otpr-termifierg-0.1.0
otpr-zj-1.0.5
ceverett@okonomiyaki:~/vcs/termifierg$ zx describe otpr-termifierg-0.1.0
Package : otpr-termifierg-0.1.0
Name    : Termifier GUI
Type    : gui
Desc    : Create, edit and convert JSON to Erlang terms.
Author  : Craig Everett zxq9@zxq9.com
Web     : 
Repo    : https://gitlab.com/zxq9/termifierg
Tags    : ["json","eterms"]

Yay! So we can now already do zx run otpr-termifierg and it will build itself and execute from anywhere, as long as the system has ZX installed.

I notice above that the “Web” URL is missing. The original blog post is as good a reference as this project is going to get, so I would like to add it. I do that by running the “update meta” command in the project directory:

ceverett@okonomiyaki:~/vcs/termifierg$ zx update meta

DESCRIPTION DATA
[ 1] Project Name             : Termifier GUI
[ 2] Author                   : Craig Everett
[ 3] Author's Email           : zxq9@zxq9.com
[ 4] Copyright Holder         : Craig Everett
[ 5] Copyright Holder's Email : zxq9@zxq9.com
[ 6] Repo URL                 : https://gitlab.com/zxq9/termifierg
[ 7] Website URL              : 
[ 8] Description              : Create, edit and convert JSON to Erlang terms.
[ 9] Search Tags              : ["json","eterms"]
[10] File associations        : [".json"]
Press a number to select something to change, or [ENTER] to continue.
(or "QUIT"): 7
... [snip] ...

The “update meta” command is interactive so I’ll spare you the full output, but if you followed the previous two tutorials you already know how this works.

After I’ve done that I need to increase the “patch” version number (the “Z” part of the “X.Y.Z” semver scheme). I can do this with the “verup” command, also run in the project’s base directory:

ceverett@okonomiyaki:~/vcs/termifierg$ zx verup patch
Version changed from 0.1.0 to 0.1.1.

And now time to re-package and put it into the realm. Again, since I’m the sysop this is super fast for me working alone:

ceverett@okonomiyaki:~/vcs$ zx submit otpr-termifierg-0.1.1.zsp 
ceverett@okonomiyaki:~/vcs$ zx approve otpr-termifierg-0.1.1
ceverett@okonomiyaki:~/vcs$ zx accept otpr-termifierg-0.1.1.zsp

And that’s that. It can immediately be run by anyone anywhere as long as they have ZX installed.

BONUS LEVEL!

“Neat, but what about the screenshot of it running?”

Up until now we’ve been launching code using ZX from the command line. Since Termifier GUI is a GUI program and usually the target audience for GUI programs is not programmers, yesterday I started on a new graphical front end for ZX intended for ordinary users (you know, people expert at things other than programming!). This tool is called “Vapor” and is still an ugly duckling in beta, but workable enough to demonstrate its usefulness. It allows people to graphically browse projects from their desktop, and launch by clicking if the project is actually launchable.

Vapor is like low-pressure Steam, but with a strong DIY element to it, as anyone can become a developer and host their own code.

I haven’t written the window manager/desktop registration bits yet, so I will start Vapor from the command line with ZX:

You’ll notice a few things here:

  • Termifier GUI’s latest version is already selected for us, but if we click that button it will become a version selector and we can pick a specific version.
  • Observer is listed, but only as a “virtual package” because it is part of OTP, not actually a real otpr package. For this reason it lacks a version selector. (More on this below.)
  • Vapor lacks a “run” button of its own because it is already running (ZX is similarly special-cased)

When I click Termifier’s “run” button Vapor’s window goes away and we see that the termifierg-0.1.1 package is fetched from Zomp (along with deps, if they aren’t already present on the system), built and executed. If we run it a second time it will run immediately from the local cache since it and all deps are already built.

When Termifier terminates Vapor lets ZX know it is OK to shutdown the runtime.

A special note on Observer and “Virtual Packages”

[UPDATE 2020-01-12: The concept of virtual packages is going away, observer will have a different launch method soon, and a rather large interface change is coming to Vapor soon. The general principles and function the system remain the same, but the GUI will look significantly different in the future — the above is the day-2 functioning prototype.]

When other programs are run by Vapor the main Vapor window is closed. Remember, each execution environment is constructed at runtime for the specific application being run, so if we run two programs that have conflicting dependencies there will be confusion about the order to search for modules that are being called! To prevent contamination Vapor only allows a single application to be run at once from a single instance of Vapor (you can run several Vapor instances at once, though, as each invocation of ZX creates an independent Erlang runtime with its own context and environment — the various zx_daemons coordinate locally to pick a leader, though, so resource contention is avoided by proxying through the leader). If you want several inter-related apps to run at once within the same Erlang runtime, create a meta-package that has the sole function of launching them all together with commonly defined dependencies.

Because Observer is part of OTP it does not suffer from dependency or environmental conflict issues, so running Observer is safe and the “run” button does just that: it runs Observer. Vapor will stay open while Observer is running, allowing you to pick another application to run, and you can watch what it is up to using Observer as a monitoring tool, which can be quite handy (and interesting!).

If you want to run an Erlang network service type application using Vapor while using Observer (like a chat server, or even a Zomp node) you should start Vapor using the zxh command (not just plain zx), because that provides an Erlang shell on the command line so you can still interact with the program from there. You can also run anything using plain old zx run, and when the target application terminates that instance of the runtime will shut down (this is why ZX application templates define applications as “permanent“).

Cool story, bro. What Comes Next?

The next step for this little bundle of projects is to create an all-encompassing Windows installer for Erlang, ZX and Vapor (so it can all be a one-shot install for users), and add a desktop registration feature to Vapor so that Erlang applications can be “installed” on the local system, registered with desktop icons, menu entries and file associations in FreeDesktop and Windows conformant systems (I’ll have to learn how to do it on OSX, but that would be great, too!). Then users could run applications without really knowing about Vapor, because authors could write installation scripts that invoke Vapor’s registration routines directly.

If I have my way (and I always get my way eventually) Erlang will go from being the hardest and most annoying language to deploy client-side to being one of the easiest to deploy client-side across all supported platforms. BWAHAHAHA! (I admit, maybe this isn’t a world-changing goal, but for me it would be a world-changing thing…)

Permalink

NEWS: Erlang Solutions partners with EMQ

EMQ – the world’s Leading Provider of IoT Messaging and Streaming Platform – and Erlang Solutions – the world’s leading provider of Erlang and Elixir Solutions, sign channel distribution agreement across the USA and Europe.

Press Release | San Jose – April 9th 2019

San Jose US, London UK and Hangzhou, China: As IoT continues its explosive growth across the world, from autonomous vehicles to smart home devices, from Smart City Infrastructure to telecom grade IoT solutions, enterprise systems seek a more efficient, secure and faster communication mechanism. Companies are continuously turning to MQTT – and specifically EMQ – for stable enterprise class solutions. And supporting customers on a local basis has become an absolute prerequisite.

“We are delighted to announce this partnership. Our business is growing rapidly and Erlang Solutions possess the architectural and network expertise and additional product solutions that complement our EMQ X broker software solutions perfectly. They have offices all over Europe and presence in the US that will enable us to support our customers with greater efficiency.” - says Feng Lee, CEO and Founder of EMQ.

“We have demonstrated that we are the go-to solution for MQTT and the only open source provider to offer full MQTT V.5.0 support. We back that up with world class support and consulting services. Now EMQ and Erlang Solutions experts will be on hand in local geographies to have detailed conversations about solutions, and demonstrate how we can jointly help our customers build massively scalable and secure IoT systems.” - says Dylan Kennedy, VP of Global Operations at EMQ.

“We have been following the EMQ X broker since its inception, seeing it go from strength to strength. Alongside our expertise in the XMPP and AMQP, we are delighted to add MQTT to the suite of messaging protocols we officially support.” - says Francesco Cesarini, Founder and Technical Director at Erlang Solutions. “It will allow us to jointly help our customers with the development and deployment of end to end solutions in the IoT space.”

About EMQ

Founded in 2012, the Hangzhou EMQ Technologies Co., Ltd.’s mission was to create a massively scalable, highly available, MQTT-based message broker using the best tool for the job, the Erlang open source language. EMQ X Open Source was released in 2013, and is deployed in millions of IoT-connected devices worldwide.

EMQ X Enterprise is the commercial version that offers a highly extensible, feature-rich security software suite, as well as world-class support deployed to support tens of millions of devices. EMQ is headquartered in Hangzhou, China, and has offices in Beijing, Frankfurt and Silicon Valley, California. Visit our partner’s website www.emqx.io for further information. To contact the team, email contact@emqx.io or call Dylan Kennedy, SVP of Global Operations, at +1 (408) 476-1873 email: dylan@emqx.io.

About Erlang Solutions

Founded in 1999, Erlang Solutions Ltd. is an international technology company building trusted, fault-tolerant systems that can scale to billions of users. Erlang Solutions offers world-leading consultancy in the Erlang and Elixir - open source programming languages and for a range of leading messaging protocols: AMQP, MQTT and XMPP.

With over 80 experts based across London (HQ), Stockholm, Krakow, Budapest and satellite offices in the Americas, Erlang Solutions works with clients ranging from startups to Fortune 500, including WhatsApp, Klarna, Pivotal, Motorola, Toyota Connected, Ericsson and aeternity the blockchain company to mention a few. For more information, visit www.erlang-solutions.com or get in touch with Stuart Whitfield, CEO, at stuart.whitfield@erlang-solutions.com.

We thought you might also be interested in:

EMQ product page

Erlang & Elixir consultancy

Erlang & Elixir development

Permalink

Starting a simple GUI project in Erlang with ZX

A few days ago I wrote a tutorial about how to create a CLI program in Erlang using a new code utility called ZX that makes launching Erlang a little bit more familiar for people accustomed to modern dynamic language tooling.

Today I want to do another one in the same spirit, but with a simple GUI program as the example.

In the previous tutorial I did a CLI utility that converts files containing JSON to files containing Erlang terms. It accepts two arguments: the input file path and the desired output file path.

Today we’ll do a slightly more interesting version of this: a GUI utility that allows hand creation/editing of both the input before conversion and the output before writing. The program will have a simple interface with just three buttons at the top: “Read File”, “Convert” and “Save File”; and two text editing frames as the main body of the window: one on the left with a text control for JSON input, and one on the right a text control for Erlang terms after conversion.

First things first: we have to choose a name and create the project. Since we made “Termifier” with a package and module name of “termifier” before, today we’ll call this one “Termifier GUI”, with a package name and appmod of “termifierg” and a project prefix of “tg_”. I’ve clipped out the creation prompt output for brevity like before, but it can be found here: zx_gui_creation.txt.

ceverett@okonomiyaki:~/vcs$ zx create project

### --snip snip--
### Prompts for project meta
### --snip snip--

Writing app file: ebin/termifierg.app
Project otpr-termifierg-0.1.0 initialized.
ceverett@okonomiyaki:~/vcs$

If we run this using ZX’s zx rundir command we see a GUI window appear and some stuff happen in the terminal (assuming you’re using a GUI desktop and WX is built into the Erlang runtime you’re using).

The default templated window we see is a GUI version of a “Hello, World!”:

If we try the same again with some command line arguments we will see the change in the text frame:

The output occurring in the terminal is a mix of ZX writing to stdout about what it is building and WX’s GTK bindings warning that it can’t find an optional style module (which isn’t a big deal and happens on various systems).

So we start out with a window that contains a single multi-line text field and accepts the “close window” event from the window manager. A modest, but promising start.

What we want to do from here is make two editable text fields side by side, which will probably require increasing the main window’s size for comfort, and add a single sizer with our three utility buttons at the top for ease of use (and our main frame, being derived from wxEvtHandler, will of course need to connect to the button click events to make them useful!). The text fields themselves should probably have fixed-width fonts, since the user will be staring at indented lines of declarative code, and it might even be handy to have a more natural “code editor” feel to the text field interface, so we can’t do any better than to use the Scintilla-type text control widget for the text controls.

Now that we know basically what we want to do, we need to figure out where to do it! To see where to make these changes we need to take a little tour of the program. It is four modules, which means it is a far different beast than our previous single-module CLI program was.

Like any project, the best way to figure out what is going on is to establish two things:

  1. How is the code structured (or is there even a clear structure)?
  2. What is called to kick things off? (“Why does anything do anything?”)

When I go into termifierg/src/ I see some very different things than before, but there is a clear pattern occurring (though it is somewhat different than the common Erlang server-side “service -> worker” pattern):

ceverett@okonomiyaki:~/vcs$ cd termifierg/src/
ceverett@okonomiyaki:~/vcs/termifierg/src$ ls -l
合計 16
-rw-rw-r-- 1 ceverett ceverett 1817 12月 27 12:50 termifierg.erl
-rw-rw-r-- 1 ceverett ceverett 3166 12月 27 12:50 tg_con.erl
-rw-rw-r-- 1 ceverett ceverett 3708 12月 27 12:50 tg_gui.erl
-rw-rw-r-- 1 ceverett ceverett 1298 12月 27 12:50 tg_sup.erl

We have the main application module termifierg.erl, the name of which we chose during the creation process, and then we also have three more modules that use the tg_ prefix we chose during creation: tg_con, tg_gui and tg_sup. As any erlanger knows, anything named *_sup.erl is going to be a supervisor, so it is logical to assume that tg_sup.erl is the top (in this case the only) supervisor for the application. It looks like there are only two “living” modules, the *_con one, which seems short for a “control” module, and the *_gui one, which seems likely to be just the code or process that controls the actual window itself.

We know that we picked termifierg as the appmod for the project, so it should be the place to find the typical OTP AppMod:start/2 function… and sure enough, there it is: termifierg:start/2 simply kicks things off by calling tg_sup:start_link/0. So next we should see what tg_sup does. Being a supervisor, its entire definition should be a very boring declaration of what children the supervisor has, how they depend on one another (order), and what restart strategy is being employed by that supervisor.

(Protip: magical supervision is evil; boring, declarative supervision is good.)

init([]) ->
     RestartStrategy = {one_for_one, 0, 60},
     Clients   = {tg_con,
                  {tg_con, start_link, []},
                  permanent,
                  5000,
                  worker,
                  [tg_con]},
     Children  = [Clients],
     {ok, {RestartStrategy, Children}}.

Here we see only one thing is defined: the “control” module called tg_con. Easy enough. Knowing that we have a GUI module as well, we should expect that the tg_con module probably links to the GUI process instead of monitoring it, though it is possible that it might monitor it or maybe even use the GUI code module as a library of callback functions that the control process itself uses to render a GUI on its own.

[NOTE: Any of these approaches is valid, but which one makes the most sense depends entirely on the situation and type of program that is being written. Is the GUI a single, simple interface to a vast and complex system underneath? Does each core control component of the system have its own window or special widget or tab to render its data? Are there lots of rendered “views” on the program data, or a single view on lots of different program data? Is it OK for updates to the GUI to block on data retrieval or processing? etc.]

Here we see a program that is split between interface code and operation code. Hey! That sounds a lot like the “View” and “Control” from the classic “MVC” paradigm! And, of course, this is exactly the case. The “Model” part in this particular program being the data we are handling which is defined by the Erlang syntax on the one hand and JSON’s definition on the other (and so are implicit, not explicit, in our program).

The tg_con process is the operation code that does things, and it is spawn_linking the interface that is defined by tg_gui. If either one crashes it will take the other one down with it — easy cleanup! For most simple programs this is a good model to start with, so we’ll leave things as they are.

The tg_gui process is the one that interfaces with the back-end. In a program this simple we could easily glom them all together without getting confused, but if we added even just a few interesting features we would bury our core logic under the enormous amount of verbose, somewhat complex code inherent in GUI applications — and that becomes extremely frustrating to separate out later (so most people don’t do it and their single-module-per-thingy WX code becomes a collection of balls of mud that are difficult to refactor later, and that stinks!).

Since we already know what we want to do with the program and we already proved the core code necessary to accomplish it in the previous tutorial, we can move on to building an interface for it.

This is what the final program looks like, using the same example.json from the CLI example:

At this point we are going to leave the blog and I will direct you instead to the repository code, and in particular, the commit that shows the diff between the original template code generated by ZX and the modified commit that implements the program described at the beginning of this tutorial. The commit has been heavily commented to explain every part in detail — if you are curious about how this Wx program works I strongly recommend reading the code commit and comments!

Permalink

Starting a simple CLI project in Erlang with ZX

Yesterday I wrote a post about a new tooling suite for developers and users that makes dealing with Erlang more familiar to people from other languages. Using the tool for packaging and deployment/launch makes writing and deploying end-user programs in Erlang non-mysterious as well, which is a great benefit as Erlang provides a wonderful paradigm for making use of modern overwhelmingly multi-core client systems.

It is still in beta, but works well for my projects, so I’ll leave a quick tutorial here that shows the basic flow of writing a simple CLI utility in Erlang using ZX.

In this example we’ll make a program that accepts two arguments: a path to a file with JSON in it and a path to a file where the data should be written back after being converted to Erlang terms.

To start a project we do zx create project and follow the prompts.
(The snippet below excludes the full output for brevity, but you can view the entire creation prompt log here: zx_cli_creation.txt.)

ceverett@okonomiyaki:~/vcs$ zx create project

### --snip snip--
### Prompts for project meta
### --snip snip--

Writing app file: ebin/termifier.app
Project otpr-termifier-0.1.0 initialized.
ceverett@okonomiyaki:~/vcs$ 

After the project is created we see a new directory in front of us called “termifier” (or whatever the project is named). We can execute this now just to make sure everything is going as expected:

ceverett@okonomiyaki:~/vcs$ ls
termifier
ceverett@okonomiyaki:~/vcs$ zx rundir termifier
Recompile: src/termifier
Hello, World! Args: []
ceverett@okonomiyaki:~/vcs$ zx rundir termifier foo bar baz
Hello, World! Args: ["foo","bar","baz"]

Ah! So we already have something that builds and runs, very similar to how an escript works, except that using ZX we can easily add dependencies from Zomp package realms and package and execute this program on any system in the world that has ZX on it via Zomp ourselves.

…not that we have any reason to deploy a “Hello, World!” program to the wider public, of course.

Notice here that the first time we run it we see a message Recompile: src/termifier. That means the module termifier is being compiled and cached. On subsequent runs this step is not necessary unless the source file has changed (the compiler detects this on its own).

Next let’s search Zomp for the tag “json” to see if there are any packages that list it as a tag, and if there are, let’s get a description so maybe we can find a website or docs for it:

ceverett@okonomiyaki:~/vcs$ zx search json
otpr-zj-1.0.5
ceverett@okonomiyaki:~/vcs$ zx describe otpr-zj
Package : otpr-zj-1.0.5
Name    : zj
Type    : lib
Desc    : A tiny JSON encoder/decoder in pure Erlang.
Author  : Craig Everett zxq9@zxq9.com
Web     : https://zxq9.com/projects/zj/docs/
Repo    : https://gitlab.com/zxq9/zj
Tags    : ["json","json encoder","json decoder"]

Ah. Checking the website, it is clear we can use this to decode JSON by simply calling zj:decode(JSON). Easy. So let's add it to the project as a dependency and invoke it in src/termifier.erl:

ceverett@okonomiyaki:~/vcs$ cd termifier
ceverett@okonomiyaki:~/vcs/termifier$ zx set dep otpr-zj-1.0.5
Adding dep otpr-zj-1.0.5
ceverett@okonomiyaki:~/vcs/termifier$ cd src
ceverett@okonomiyaki:~/vcs/termifier/src$ vim termifier.erl

Inside termifier.erl we can see the templated code for start/1:

start(ArgV) ->
    ok = io:format("Hello, World! Args: ~tp~n", [ArgV]),
    zx:silent_stop().

Let's change it so it does what we want (note that I'm going a bit further here and writing a function write_terms/2, based on an older post of mine, which performs the inverse of file:consult/1):

start([InPath, OutPath]) ->
    {ok, Bin} = file:read_file(InPath),
    {ok, Terms} = zj:decode(Bin),
    ok = write_terms(OutPath, Terms),
    zx:silent_stop();
start(_) ->
    ok = io:format("ERROR: Two arguments are required."),
    zx:silent_stop().


write_terms(Path, Terms) when is_list(Terms) ->
    Format = fun(Term) -> io_lib:format("~tp.~n", [Term]) end,
    Text = lists:map(Format, Terms),
    file:write_file(Path, Text);
write_terms(Path, Term) ->
    write_terms(Path, [Term]).

Note that we are calling zj:decode/1 in the body of start/1 above, knowing that ZX will find it for us and configure the environment at execution time. And now let’s give it a go!

ceverett@okonomiyaki:~/vcs$ zx rundir termifier example.json example.eterms
Recompile: src/zj
Recompile: src/termifier
ceverett@okonomiyaki:~/vcs$ cat example.json 
{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red"
}
ceverett@okonomiyaki:~/vcs$ cat example.eterms 
{"color" => "Red","fruit" => "Apple","size" => "Large"}.

From here I could run zx package termifier, submit it to a realm (either the default public realm, or a new one I can create and host myself by doing zx create realm and then zx run zomp), and anyone could then run the command zx run termifier [in path] [out path] and ZX will take care of finding and building the necessary packages and launching the program.

That’s all there is to it. ZX’s template for CLI applications is quite minimal (as you can see) and is more similar to an escript than a more traditional OTP-style, supervised Erlang application. ZX has templates, however, for full-blown OTP applications, GUI code (structured also in the OTP paradigm), minimalist CLI programs like we see above, pure library code, and escripts (sometimes escript is exactly the tool you need!).

Happy coding!


Supercharge Your Elixir and Phoenix Navigation with vim-projectionist

If you came to Elixir from Rails-land, you might miss the navigation that came with vim-rails. If you’re not familiar with it, vim-rails creates commands like :A, :AV, :AS, and :AT to quickly toggle between a source file and its test file and commands like :Econtroller, :Emodel, and :Eview to edit files based on their type.

The good news is that the same person who made vim-rails also made vim-projectionist (thanks Tim Pope). And with it, we can supercharge our navigation in Elixir and Phoenix just like we had in Rails with vim-rails.

Projecting Back to the Future

The easiest way to use vim-projectionist is to set up projections in a .projections.json file at the root of your project. This is a basic file for Elixir projections:

{
  "lib/*.ex": {
    "alternate": "test/{}_test.exs",
    "type": "source"
  },
  "test/*_test.exs": {
    "alternate": "lib/{}.ex",
    "type": "test"
  }
}

With this configuration, projectionist allows us to alternate between test and source files using :A, and it can open that alternate file in a separate pane with :AS or :AV, or if you’re a tabs person, in a separate tab with :AT. Note that we define the "alternate" both ways so that both the source and test files have alternates.

If you’re wondering how it works, projectionist is grabbing any directory and files matched by * — from a globbing perspective it acts like **/* — and expanding it with {}. So the alternate of lib/project/sample.ex is test/project/sample_test.exs (and vice versa).

With that simple configuration, projectionist also defines two :E commands based on the "type":

  • :Esource project/sample will open lib/project/sample.ex, and
  • :Etest project/sample will open test/project/sample_test.exs.

Pretty neat, right? But wait! There’s more.

Templating

Projectionist has another really interesting feature — defining templates to use when creating files. Add the following templates to each projection:

{
  "lib/*.ex": {
    "alternate": "test/{}_test.exs",
    "type": "source",
+   "template": [
+     "defmodule {camelcase|capitalize|dot} do",
+     "end"
+   ]
  },
  "test/*_test.exs": {
    "alternate": "lib/{}.ex",
    "type": "test",
+   "template": [
+     "defmodule {camelcase|capitalize|dot}Test do",
+     "  use ExUnit.Case, async: true",
+     "",
+     "  alias {camelcase|capitalize|dot}",
+     "end"
+   ]
  }
}

The "template" key takes an array of strings to use as the template. In them, projectionist allows us to define a series of transformations that will act upon whatever is captured by *. We use {camelcase|capitalize|dot}, so if * captures project/super_random, projectionist will do the following transformations:

  • camelcase: project/super_random -> project/superRandom,
  • capitalize: project/superRandom -> Project/SuperRandom,
  • dot: Project/SuperRandom -> Project.SuperRandom
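Putting those together, for a file lib/project/super_random.ex the source template above would expand to (a quick illustration, continuing the same hypothetical name):

defmodule Project.SuperRandom do
end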

Example workflow

Let’s put it all together in a sample MiddleEarth project.

We can create a new file via :Esource middle_earth/minas_tirith. It will create a file lib/middle_earth/minas_tirith.ex with this template:

defmodule MiddleEarth.MinasTirith do
end

We can then create a test file by attempting to navigate to the (non-existing) alternate file. Typing :A will give us something like this:

Create alternate file?
1 /dev/middle_earth/test/middle_earth/minas_tirith_test.exs
Type number and <Enter> or click with mouse (empty cancels):

Typing 1 and <Enter> will create the test file test/middle_earth/minas_tirith_test.exs with this template:

defmodule MiddleEarth.MinasTirithTest do
  use ExUnit.Case, async: true

  alias MiddleEarth.MinasTirith
end

Here it is in gif form:

gif of the flow we just talked about

Very cool, right? But wait. There’s more.

Supercharge Phoenix Navigation

That simple configuration works for Elixir projects. And since Phoenix projects (beginning with Phoenix 1.3) have their files under lib/, it also works okay for Phoenix projects.

But without further changes, creating a Phoenix controller or a Phoenix channel will give us an extra Controllers or Channels namespace in our modules because of the directory structure. For example, creating lib/project_web/controllers/user_controller.ex will create a module ProjectWeb.Controllers.UserController instead of the desired ProjectWeb.UserController.

It would also be nice to have controller-specific templates that include use ProjectWeb, :controller in controllers and use ProjectWeb.ConnCase in controller tests (since we always need those use declarations). And, it would be extra nice to have access to an :Econtroller command.

We can make that happen by adding Phoenix-specific projections to our .projections.json file. Start with controllers:

{
  "lib/**/controllers/*_controller.ex": {
    "type": "controller",
    "alternate": "test/{dirname}/controllers/{basename}_controller_test.exs",
    "template": [
      "defmodule {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}Controller do",
      "  use {dirname|camelcase|capitalize}, :controller",
      "end"
    ]
  },
  "test/**/controllers/*_controller_test.exs": {
    "alternate": "lib/{dirname}/controllers/{basename}_controller.ex",
    "type": "test",
    "template": [
      "defmodule {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}ControllerTest do",
      "  use {dirname|camelcase|capitalize}.ConnCase, async: true",
      "end"
    ]
  },
  # ... other projections
}

Note that these projections no longer use the single * matcher for globbing. They use ** and * separately. And instead of simply using {} in alternate files, they explicitly use {dirname} and {basename}.

Why the change? Here’s what the projectionist documentation says:

For advanced cases, you can include both globs explicitly: "test/**/test_*.rb". When expanding with {}, the ** and * portions are joined with a slash. If necessary, the dirname and basename expansions can be used to split the value back apart.

Controller templates

By separating the globbing, we are able to create templates that do not include the extra Controllers namespace even though the path includes /controllers.

We get the project name with **, and we get the file name after /controllers with *_controller.ex. We then generate the namespace ProjectWeb by grabbing dirname (i.e. project_web) and putting it through a series of transformations. Similarly, we generate the rest of the module’s name by using basename, putting it through a series of transformations, and appending either Controller or ControllerTest.

We are also able to create more helpful controller templates since the projections are specific to controllers. Note the inclusion of " use {dirname|camelcase|capitalize}, :controller" and " use {dirname|camelcase|capitalize}.ConnCase, async: true" in our templates. Our controllers will now automatically include use ProjectWeb, :controller and our controller tests will automatically include use ProjectWeb.ConnCase, async: true.

:Econtroller command

Finally, we set the "type": "controller". That gives us the :Econtroller command. We can now create a controller with :Econtroller project_web/user. And for existing controllers, projectionist has smart tab completion. So typing :Econtroller user and hitting tab should expand to :Econtroller project_web/user or give us more options if there are multiple matches.

For example, in the MiddleEarth project we can edit the default PageController that ships with Phoenix by using :Econtroller page along with tab completion. And we can create a new MinasMorgul controller and controller test with our fantastic templates by typing :Econtroller middle_earth_web/minas_morgul and then going to its alternate file.

gif of using :Econtroller to open page controller

Projecting All the Things

I think you get the gist of it, so I will not go through all the projections. But just like we added the projections for the controllers, we can do the same for views, channels, and even feature tests if you frequently write those.

Below I included a sample file to get you started with controllers, views, channels, and feature tests. Take a look at it. If you prefer it in github-gist form, here’s a link to one. The best thing is that if my sample file does not fit your needs, you can always adjust it!

If you find any improvements, I would love to hear about them. I’m always looking for better ways to navigate files.

{
  "lib/**/views/*_view.ex": {
    "type": "view",
    "alternate": "test/{dirname}/views/{basename}_view_test.exs",
    "template": [
      "defmodule {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}View do",
      "  use {dirname|camelcase|capitalize}, :view",
      "end"
    ]
  },
  "test/**/views/*_view_test.exs": {
    "alternate": "lib/{dirname}/views/{basename}_view.ex",
    "type": "test",
    "template": [
      "defmodule {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}ViewTest do",
      "  use ExUnit.Case, async: true",
      "",
      "  alias {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}View",
      "end"
    ]
  },
  "lib/**/controllers/*_controller.ex": {
    "type": "controller",
    "alternate": "test/{dirname}/controllers/{basename}_controller_test.exs",
    "template": [
      "defmodule {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}Controller do",
      "  use {dirname|camelcase|capitalize}, :controller",
      "end"
    ]
  },
  "test/**/controllers/*_controller_test.exs": {
    "alternate": "lib/{dirname}/controllers/{basename}_controller.ex",
    "type": "test",
    "template": [
      "defmodule {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}ControllerTest do",
      "  use {dirname|camelcase|capitalize}.ConnCase, async: true",
      "end"
    ]
  },
  "lib/**/channels/*_channel.ex": {
    "type": "channel",
    "alternate": "test/{dirname}/channels/{basename}_channel_test.exs",
    "template": [
      "defmodule {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}Channel do",
      "  use {dirname|camelcase|capitalize}, :channel",
      "end"
    ]
  },
  "test/**/channels/*_channel_test.exs": {
    "alternate": "lib/{dirname}/channels/{basename}_channel.ex",
    "type": "test",
    "template": [
      "defmodule {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}ChannelTest do",
      "  use {dirname|camelcase|capitalize}.ChannelCase, async: true",
      "",
      "  alias {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}Channel",
      "end"
    ]
  },
  "test/**/features/*_test.exs": {
    "type": "feature",
    "template": [
      "defmodule {dirname|camelcase|capitalize}.{basename|camelcase|capitalize}Test do",
      "  use {dirname|camelcase|capitalize}.FeatureCase, async: true",
      "end"
    ]
  },
  "lib/*.ex": {
    "alternate": "test/{}_test.exs",
    "type": "source",
    "template": [
      "defmodule {camelcase|capitalize|dot} do",
      "end"
    ]
  },
  "test/*_test.exs": {
    "alternate": "lib/{}.ex",
    "type": "test",
    "template": [
      "defmodule {camelcase|capitalize|dot}Test do",
      "  use ExUnit.Case, async: true",
      "",
      "  alias {camelcase|capitalize|dot}",
      "end"
    ]
  }
}


The Changeset API Pattern


Over time, as you gain experience with software development, you start noticing paths that lead to much smoother sailing. These are called design patterns: formalized best practices that can be used to solve common problems when implementing a system.

One of these patterns that I have been having great success with while working on web applications in Elixir is what I am calling, for lack of a better name, the Changeset API Pattern.

Before I start with the pattern itself, I'd like to outline what I consider the motivation behind its usage: Data Integrity.

Data integrity is the maintenance of, and the assurance of the accuracy and consistency of, data over its entire life-cycle, and is a critical aspect to the design, implementation and usage of any system which stores, processes, or retrieves data.
The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended. In short, data integrity aims to prevent unintentional changes to information. Data integrity is not to be confused with data security, the discipline of protecting data from unauthorized parties.

Overall Goal

Facilitate the main goal of data integrity: ensure that data is recorded exactly as intended.

The Changeset API Pattern is not solely responsible for achieving this goal; however, once used in conjunction with data modeling best practices such as column types and constraints, default values, and so on, the pattern becomes an important application layer on top of an already established data layer, aiming for better overall data integrity.

Database Data Integrity

As mentioned above, having good database specifications will facilitate data integrity. In Elixir, this is commonly achieved through Ecto, the most common library for interacting with application data stores, and its migration DSL:

defmodule Core.Repo.Migrations.CreateUsersTable do
  use Ecto.Migration

  def change do
    create table(:users) do
      add :company_id, references(:companies, type: :binary_id), null: false
      add :first_name, :string, null: false
      add :last_name, :string, null: false
      add :email, :string, null: false
      add :age, :integer, null: false
      timestamps()
    end

    create index(:users, [:email], unique: true)
    create constraint(:users, :age_must_be_positive, check: "age > 0")
  end
end

In the migration above we are specifying:

  • column data types;
  • columns can't have null values;
  • company_id is a foreign key;
  • email column is unique;
  • age has to be greater than zero.

Depending on your datastore and column type you can apply a variety of data constraints to fulfill your needs. Ideally, the specifications defined in the migration should align with your Ecto Schema and generic changeset:

defmodule Core.User do
  use Ecto.Schema
  import Ecto.Changeset
  
  alias Core.Company

  @primary_key {:id, :binary_id, autogenerate: true}
  @timestamps_opts [type: :utc_datetime]
  schema "users" do
    belongs_to(:company, Company, type: :binary_id)
    field(:first_name, :string)
    field(:last_name, :string)
    field(:email, :string)
    field(:age, :integer)
    timestamps()
  end
  
  @required_fields ~w(company_id first_name last_name email age)a
  
  def changeset(struct, params) do
    struct
    |> cast(params, @required_fields)
    |> validate_required(@required_fields)
    |> validate_number(:age, greater_than: 0)
    |> unique_constraint(:email)
    |> assoc_constraint(:company)
  end
end

These should be considered your main gate in terms of data integrity, as they ensure data will only be stored if all checks pass. From there you can add other layers on top, for example, the Changeset API Pattern.

The Changeset API Pattern

Once you have a good foundation, it is time to tackle your application's API scenarios regarding data integrity. While a generic changeset like the one above is sufficient to ensure that data integrity matches what is defined in the database in a general sense (all inserts and all updates), usually not all changes are equal from the application's standpoint.

The Problem

For example, let's assume that besides the existing columns in the users table example above, we also have a column called hashed_password for user authentication. In our application, we have the following endpoints in our API that modify data:

  • Register User;
  • Update User Profile;
  • Change User Password.

Having a generic changeset in our schema will allow all three of these operations to happen as desired; however, it opens up some data integrity concerns for the two update operations:

  • While updating my first name as part of the Update User Profile flow, I can also change my password;
  • While changing my password as part of the Change User Password flow, I can also update my age.

As long as the fields conform with the generic changeset validations, these unexpected changes will be allowed. You can remedy this behavior by applying filters in your API or your controller, but that becomes brittle as your application evolves. Beyond that, the Ecto.Schema and Ecto.Changeset modules provide lots of functions for field validation, casting, and database constraint checks; not leveraging them would require lots of code duplication, at least in terms of functionality.
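To make the first concern concrete, here is a minimal sketch (the params are hypothetical, and it assumes hashed_password has been added to the generic changeset's field list): nothing distinguishes a legitimate profile update from one that also smuggles in a new password hash.

# Hypothetical params for an "Update User Profile" request:
params = %{"first_name" => "Jane", "hashed_password" => "whatever-was-sent"}

user                            # an existing %Core.User{} loaded from the repo
|> Core.User.changeset(params)  # generic changeset: casts every field it knows
|> Core.Repo.update()
# => {:ok, user} with first_name *and* hashed_password changed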

The Solution

The Changeset API Pattern states that:

For each API operation that modifies data, a specific Ecto Changeset is implemented, making explicit the desired changes and all validations to be performed.

Instead of a generic changeset, we will implement three changesets, each with a very clear combination of casts, validations, and database constraint checks.

Register User Changeset

defmodule Core.User do
  # Code removed

  schema "users" do
    # Code removed
    field(:hashed_password, :string)
    # Code removed
  end

  @register_fields ~w(company_id first_name last_name email age hashed_password)a

  def register_changeset(struct, params) do
    struct
    |> cast(params, @register_fields)
    |> validate_required(@register_fields)
    |> validate_number(:age, greater_than: 0)
    |> unique_constraint(:email)
    |> assoc_constraint(:company)
  end
end

Update User Profile Changeset

defmodule Core.User do
  # Code removed

  @update_profile_fields ~w(first_name last_name email age)a

  def update_profile_changeset(struct, params) do
    struct
    |> cast(params, @update_profile_fields)
    |> validate_required(@update_profile_fields)
    |> validate_number(:age, greater_than: 0)
    |> unique_constraint(:email)
  end
  
  # Code removed
end

Change User Password Changeset

defmodule Core.User do
  # Code removed

  @change_password_fields ~w(hashed_password)a

  def change_password_changeset(struct, params) do
    struct
    |> cast(params, @change_password_fields)
    |> validate_required(@change_password_fields)
  end
end

In your API functions, even if extra data comes in, you are safe, because the intent and output expectation of each operation is already defined at the closest point to the data store interaction from the application's standpoint: in our case, the schema definition module.
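For example, a hypothetical context function (the Core.Accounts module and update_profile/2 name are illustrative, not part of the original example) only needs to hand the raw params to the dedicated changeset; anything outside @update_profile_fields is simply ignored by cast/3:

defmodule Core.Accounts do
  alias Core.{Repo, User}

  # Only first_name, last_name, email and age can change here,
  # no matter what else arrives in `params`.
  def update_profile(%User{} = user, params) do
    user
    |> User.update_profile_changeset(params)
    |> Repo.update()
  end
end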

Caveat

One thing I noticed when I started implementing this pattern is that I was sometimes doing a little more than I had initially intended within the changeset functions.

Instead of only performing data type casting, validations, and database checks, in a few cases I was also setting field values. For the sake of illustration (but it can be anything along these lines), let's take the example of a user schema with a column verified_at that is null when the user is registered and stores the date and time at which the user was verified.

The changeset for this operation should only allow the verified_at field to be cast with the proper data type, but beyond that, the current date and time were also being set in the changeset using Ecto.Changeset.put_change/3.

Instead, the responsibility for setting the value of verified_at should be delegated to the API; that value is then validated in the changeset like any other update.
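A minimal sketch of that delegation, assuming a verified_at field exists on the schema (the verify_changeset/2 and verify_user/1 names are hypothetical): the API computes the timestamp, and the changeset only casts and validates it.

defmodule Core.User do
  # Code removed

  def verify_changeset(struct, params) do
    struct
    |> cast(params, [:verified_at])
    |> validate_required([:verified_at])
  end
end

# In the API layer, not in the schema module:
def verify_user(%Core.User{} = user) do
  user
  |> Core.User.verify_changeset(%{verified_at: DateTime.utc_now()})
  |> Core.Repo.update()
end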

Another common example is hashing the plain-text password (defined as a virtual field) during user registration or password change inside the schema module. The schema should not need to know about password hashing libraries, modules, or functions; that should be delegated to the API functions.
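A similar sketch for the password case (Bcrypt.hash_pwd_salt/1 is just one example of a hashing function, from the bcrypt_elixir package; use whatever your application already depends on):

# In the API layer, not in Core.User:
def change_password(%Core.User{} = user, plain_password) do
  params = %{hashed_password: Bcrypt.hash_pwd_salt(plain_password)}

  user
  |> Core.User.change_password_changeset(params)
  |> Core.Repo.update()
end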

There is nothing wrong with Ecto.Changeset.put_change/3; in some cases it makes sense to use it, for example for values that can't come through the API for some reason, when you need a mapping between the value sent via the API and your datastore, or when you need to nullify a field.

Advantages

  • pushes data integrity concerns upfront in the development process;
  • protects the schema against unexpected data updates;
  • adds explicitness for allowed data changes and checks to be performed per use-case;
  • complements the commonly present data integrity checks in schema modules with use-case checks;
  • leverages Ecto.Schema and Ecto.Changeset functions for better data integrity overall;
  • concentrates all data integrity checks in a single place, and the best place: the schema module;
  • simplifies data changes testing per use-case;
  • simplifies input data handling in the API functions or controller actions.

Disadvantages

  • adds extra complexity to the schema modules;
  • can mislead you into handling more than data integrity in the schema modules, as mentioned in the Caveat section.

When the pattern is not needed

Even though this pattern presents itself to me as a great way to achieve better data integrity, there is one scenario in which I find myself skipping it:

  • usually, the entity (model) is much simpler;
  • the API only provides two types of change (create and a generic update);
  • both create and update require the same data integrity checks.

Conclusion

Data is a very important asset in any software application, and data integrity is a critical component of achieving data quality. Using this pattern has so far given me much more reliability and control over the data handled by my applications. Beyond that, it makes me think ahead in the development process about how I structure the data and how the application interacts with it.


Copyright © 2016, Planet Erlang. No rights reserved.
Planet Erlang is maintained by Proctor.