Rabbit’s Anatomy - Understanding Topic Exchanges

Topic Exchange Intro/Overview

In RabbitMQ, an Exchange is the abstraction to which messages are published. There are a few types of Exchanges, and the Topic Exchange is one of them. The Topic Exchange provides the most flexible routing mechanism. It depends on the Binding Key, which is provided when a queue is bound to a Topic Exchange. The Binding Key works as a pattern: a message is routed onwards when its Routing Key matches that pattern.

A Routing Key is made up of words separated by dots, e.g.: floor_1.bedroom.temperature. The Binding Key for a Topic Exchange looks similar to a Routing Key, but it may contain two special characters, namely the asterisk * and the hash #. These are wildcards, allowing a binding to be created in a smarter way: * matches exactly one word and # matches zero or more words. Here are example patterns matching messages from:

  • all devices in the bedroom on the first floor: floor_1.bedroom.*,
  • all devices on the first floor: floor_1.#.

It is clearly visible that a Topic Exchange can greatly simplify routing: a single pattern binding can match a whole family of Routing Keys.
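To make the wildcard semantics concrete, here is a minimal sketch of the matching rules in Elixir (my own illustration - the module and function names are invented, and, as the following sections explain, RabbitMQ does not match keys pair-by-pair like this, but via a trie):

defmodule TopicMatch do
  # Compare a Binding Key pattern against a Routing Key, word by word.
  def matches?(binding_key, routing_key) do
    do_match(String.split(binding_key, "."), String.split(routing_key, "."))
  end

  # An exhausted pattern matches an exhausted key.
  defp do_match([], []), do: true

  # "#" matches zero or more words: drop the "#", or consume one word and retry.
  defp do_match(["#" | rest], words) do
    do_match(rest, words) or (words != [] and do_match(["#" | rest], tl(words)))
  end

  # "*" matches exactly one word.
  defp do_match(["*" | rest], [_word | words]), do: do_match(rest, words)

  # A literal pattern word must equal the next Routing Key word.
  defp do_match([word | rest], [word | words]), do: do_match(rest, words)

  defp do_match(_, _), do: false
end

TopicMatch.matches?("floor_1.bedroom.*", "floor_1.bedroom.temperature") #=> true
TopicMatch.matches?("floor_1.#", "floor_1.bathroom.humidity")           #=> true
TopicMatch.matches?("floor_1.bedroom.*", "floor_1.bedroom")             #=> false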

Trie

To understand how a Topic Exchange works, the trie data structure has to be introduced. A trie is a tree that holds ordered data and is typically used for storing strings. Each node in the tree represents a prefix of a string and holds links to child nodes, which share that prefix. A child node is addressed by the next character following the parent's prefix.

Such a data structure makes searching for a specific string depend on the length of the string rather than on the number of stored entries: the characters of the string are used to traverse the tree. The trie in a Topic Exchange stores Binding Keys: a Binding Key is split by dots and the resulting words are used as pointers to the next node. So each time a new binding is added to the Topic Exchange, the trie associated with it is updated, and each time a new message has to be routed, the trie is queried to look up the message's destinations.
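As a mental model, here is a toy version of such a word-based trie in Elixir, kept as nested maps (an invented sketch for intuition only; RabbitMQ's actual storage is described in the next section):

defmodule WordTrie do
  # Each node holds the bindings attached to it and a word-indexed map of children.
  def new, do: %{bindings: [], children: %{}}

  # Follow one child per word; returns the node addressed by the full key, or nil.
  def lookup(node, key) when is_binary(key), do: lookup(node, String.split(key, "."))
  def lookup(node, []), do: node

  def lookup(%{children: children}, [word | rest]) do
    case children do
      %{^word => child} -> lookup(child, rest)
      _ -> nil
    end
  end
end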

Implementation of the trie

The Topic Exchange trie is implemented on top of Mnesia. Nodes and edges of the trie are stored in the rabbit_topic_trie_node and rabbit_topic_trie_edge tables, respectively.

#topic_trie_node{
    trie_node = #trie_node{
        exchange_name,
        node_id
    },
    edge_count,
    binding_count
}
#topic_trie_edge{
    trie_edge = #trie_edge{
        exchange_name,
        node_id, % parent
        word
    },
    node_id % child
}

Here, the trie_node and trie_edge records are the primary keys used to identify records. Both nodes and edges are tied to one particular Topic Exchange via the exchange_name field in the primary key. Nodes are also used to identify bindings: the bindings are not stored directly in the nodes' table, but they can easily be obtained by node_id from the rabbit_topic_trie_binding table. Edges store information about connections between parent and child nodes; an edge also contains the part of the Binding Key (word) used to traverse the tree. Therefore, traversing the tree requires a sequence of Mnesia queries. Reading edges and nodes is done using dirty operations.
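For intuition, a single traversal step could look roughly like this from Elixir, assuming the record shapes shown above (a hedged sketch, not the actual rabbit_exchange_type_topic code):

defmodule TrieWalk do
  # One step down the trie: fetch the child reachable from node_id via word.
  # The edge's primary key is its #trie_edge{} part, so one dirty read suffices.
  def child(exchange_name, node_id, word) do
    key = {:trie_edge, exchange_name, node_id, word}

    case :mnesia.dirty_read(:rabbit_topic_trie_edge, key) do
      [{:topic_trie_edge, ^key, child_id}] -> {:ok, child_id}
      [] -> :error
    end
  end
end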

Topic Exchange internals

An Exchange type is created by implementing the rabbit_exchange_type behaviour. In the context of tries in the Topic Exchange, two operations are interesting: add_binding/3 and route/2. The first implements adding a new binding to the structure, and the latter is used to determine the targets for routing.

Binding Operation

The arguments needed to create the binding are:

  • the source Exchange,
  • the Binding Key,
  • the destination.

Every trie starts with the root node, representing the empty Binding Key. This makes sense, as the empty string is a prefix of any string. The first operation is pretty straightforward: the Binding Key is split by dots and stored in a list. For example, the key “a.b.c” is transformed into [“a”, “b”, “c”]. Let's call this list Words; it will be used for traversing the data structure. Then the tree is traversed down recursively, starting with the root as the current node (a code sketch follows the list):

  1. Repeat until the Words list is empty:
     1.1. Take the head of the Words list and query Mnesia for a child matching it.
     1.2. If a node is found, use it as the new current node and go to 1.1 with the rest of the Words list. Otherwise go to 2.
  2. Create child nodes using the rest of the Words list.
  3. When the Words list is exhausted, create a rabbit_topic_trie_binding for the current node. It signals that there are bindings associated with it.
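Here is how those three steps might look on the toy WordTrie from earlier (again my illustration, not RabbitMQ's implementation, which performs one Mnesia query per word instead of a map lookup):

defmodule WordTrie.Insert do
  def add_binding(node, key, destination) when is_binary(key),
    do: add_binding(node, String.split(key, "."), destination)

  # Step 3: Words is exhausted - attach the binding to the current node.
  def add_binding(node, [], destination),
    do: %{node | bindings: [destination | node.bindings]}

  # Steps 1 and 2: follow the edge for the head word if it exists,
  # otherwise create a fresh child, then recurse with the rest of Words.
  def add_binding(%{children: children} = node, [word | rest], destination) do
    child = Map.get(children, word, %{bindings: [], children: %{}})
    %{node | children: Map.put(children, word, add_binding(child, rest, destination))}
  end
end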

Here is an example binding operation. Let's assume there is a Topic Exchange with two existing bindings: floor_1.bedroom.temperature and floor_1.#. The trie structure looks as follows:

Let's add a new binding with the Binding Key floor_1.bedroom.air_quality. First we split it by dots: [floor_1, bedroom, air_quality]. There are already edges for floor_1 and bedroom, but air_quality is missing, so a new node has to be created. The rest of the key, [air_quality], is used to create nodes. Finally, the new binding is associated with the newly created node, and the structure ends up with the following shape:

Summarizing: to insert the new node, 3 read queries were executed to retrieve the edges between consecutive pairs of nodes: {root, floor_1}, {floor_1, bedroom} and {bedroom, air_quality}. The last edge was not present, so 2 write operations were executed: one which updates edge_count for the bedroom node and one which inserts the new edge. At this point the trie structure is ready for the final node, so another two write operations happen:

  • creating the actual node, which corresponds to the given Binding Key,
  • creating an entry in rabbit_topic_trie_binding, which binds the node to the destination for messages.

The Mnesia tables used here are of the ordered_set type, which means they are implemented with a balanced binary search tree. Thus, both read and write operations have complexity O(log(n)), where n is the size of the table. It can be observed that the first phase of traversing the trie requires:

  1. a read operation when the node exists,
  2. a write operation when the node does not exist.

The final phase of inserting the actual binding requires two extra operations. In the worst case none of the nodes exist and all of them need to be created; then the complexity is O(n*2*log(m) + 2*log(k)), where n is the length of the Binding Key, m is the number of nodes in the table and k is the number of actual bindings. Note that m and k are global, so the efficiency of queries depends on the total number of bindings/nodes, not only on those of the given Exchange. For simplicity it is assumed that the numbers of edges and nodes are equal, because in this structure (number of edges) = (number of nodes) - 1.

Routing Operation

Routing happens when a new message arrives at a Topic Exchange. The trie structure is queried using the Routing Key associated with the message. However, traversing the trie is not straightforward, as the wildcards * and # need to be taken into account.

As in the binding operation, the Routing Key is first split by dots. Again, let's call the result the Words list ([Word | RestW] = Words). The process starts at the root node. The algorithm then discovers bindings by recursively exploring the tree in three ways:

  • Look for # in the child nodes. If such a node is found, it is considered a new root and new scans are started with all remaining suffixes of Words, e.g.: if Words is [a,b,c], then searching starts again with [a,b,c], [b,c], [c] and [].
  • Look for * in the child nodes. If such a node is found, continue with the found node as a root and RestW.
  • Look for Word in the child nodes. If such a node is found, continue with the found node as a root and RestW.

Exploration along each branch finishes when Words is exhausted. At the end there is one more step: looking for extra child nodes connected through a hash # edge. This has to be done because the # wildcard stands for “zero or more” words.

So here is an example of the searching algorithm. Let the Topic Exchange have the following bindings: floor_1.*.air_quality, floor_1.bedroom.air_quality, floor_1.bathroom.temperature. Let's examine the routing of a message published with the Routing Key floor_1.bedroom.air_quality, which is going to match all of the bindings. Here is the trie representation, where the current node is marked blue and the number on a node represents its number of bindings.

The first step is to check whether there is a hash # child node, but it is not present. Then an asterisk * child node is queried, but it is also not present. Finally, the algorithm finds the node matching the head of the Words list - floor_1:

Now the algorithm considers the blue node as a new root and starts again. Again there is no hash # child, but an asterisk is found. The head of the Words list is consumed and the algorithm moves down:

Here there is only one option available - air_quality:

The Words list is exhausted, so the current node is a result. There is one extra step: the hash # child node has to be queried again, because it also accepts the empty list. It is not found, however, so only the current blue node is considered a result. Let's mark the found node green and go back to the previous node:

That node was found using the asterisk, but there is one step left: it has to be checked whether there is a bedroom child node. And actually there is one:

There is one word left and the child node is present:

The Words list is empty again, so the current node is a result:

The final step is to query for the bindings associated with the found nodes. According to the numbers on the found nodes, there are two bindings; they are the final result of the route/2 function.
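Putting the traversal rules together, here is a sketch of the lookup on the toy WordTrie (my illustration of the algorithm just described; RabbitMQ's route/2 performs the equivalent walk with Mnesia queries, and duplicate results are dealt with separately):

defmodule WordTrie.Route do
  # Collect the bindings of every node matched by the Routing Key.
  def route(node, routing_key) when is_binary(routing_key),
    do: collect(node, String.split(routing_key, "."))

  # Words exhausted: this node is a result; a trailing "#" child matches too.
  defp collect(node, []), do: node.bindings ++ hash_tail(node)

  defp collect(%{children: children}, [word | rest] = words) do
    # Way 1: a "#" child restarts the scan with every suffix of Words.
    from_hash =
      case children do
        %{"#" => hash_child} -> Enum.flat_map(suffixes(words), &collect(hash_child, &1))
        _ -> []
      end

    # Way 2: a "*" child consumes exactly one word.
    from_star =
      case children do
        %{"*" => star_child} -> collect(star_child, rest)
        _ -> []
      end

    # Way 3: a literal edge for the head word.
    from_word =
      case children do
        %{^word => child} -> collect(child, rest)
        _ -> []
      end

    from_hash ++ from_star ++ from_word
  end

  # suffixes([a, b]) => [[a, b], [b], []]
  defp suffixes([]), do: [[]]
  defp suffixes([_ | rest] = words), do: [words | suffixes(rest)]

  defp hash_tail(%{children: %{"#" => hash_child}}), do: collect(hash_child, [])
  defp hash_tail(_), do: []
end

With the three bindings of the example above, the toy version reproduces the result:

trie =
  WordTrie.new()
  |> WordTrie.Insert.add_binding("floor_1.*.air_quality", :q1)
  |> WordTrie.Insert.add_binding("floor_1.bedroom.air_quality", :q2)
  |> WordTrie.Insert.add_binding("floor_1.bathroom.temperature", :q3)

WordTrie.Route.route(trie, "floor_1.bedroom.air_quality") #=> [:q1, :q2]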

Now let's consider another example, with hash bindings present. There are three bindings: #.air_quality, floor_1.# and floor_1.bedroom.air_quality.#. Again the floor_1.bedroom.air_quality Routing Key will be used:

This time a hash node is found. The algorithm goes down to that node with all available suffixes of the Routing Key:

Let's emphasise this again: the current node was reached via a # edge, so the algorithm visits the current blue node 4 times with different Words lists, as presented in the figure. One of the Words lists is the empty one [], so this node is also appended to the results. There is no floor_1 or bedroom edge going out of this node, but there is an air_quality one, so the algorithm goes down to the leaf node using the third Words list. Then:

The current Words list is empty, so this node is also a result of the search. There are no hash child nodes, so this branch is finished and the algorithm goes back to the root node:

The only option to go down is using the head of the Words list:

Again there is a hash child node, so it needs to be visited with all tails of the Words list - three times in this case:

One of the Words lists is empty, so the current blue node is appended to the result list. As there are no further child nodes, the algorithm goes back:

Now the algorithm goes down twice, consuming all the remaining words. Let's jump directly to the end:

The Words list is empty, so the current node is also part of the result. However, there is also a hash # child node. According to the algorithm, if a node is considered a result, its hash child nodes are matched as well. So finally there are 5 nodes found:

The final step is to find the bindings associated with the found nodes. The final result of the route/2 function is all 3 bindings.

In terms of complexity, it is quite hard to estimate precisely. The algorithm makes 3 queries for each visited node, and # nodes cause the query path to be duplicated, as the whole algorithm is restarted with every remaining suffix of Words. Overall, the cost depends on two factors: the length of the Words list and the total number of nodes in the table.

So let's assume the worst case, where the bindings are #, #.#, #.#.#, ..., up to k hashes. Each level then runs with all possible suffixes of Words, and some nodes are visited many times with exactly the same Words. The first node is visited n times, the second sum(1, n) times (the sum of 1 through n), the third sum(1, sum(1, n)) times, and so on. We can rewrite this as a recurrence: k_1 = n and k_(i+1) = sum(1, k_i) = k_i(k_i + 1)/2.

The total number of operations is k_1 + k_2 + … + k_k. When this recursive equation is unwrapped, every new level contains twice as many multiplications as the previous one, so level k contains 2^k multiplications. That term is dominant, so we can bound the complexity by O(2^k * n * log(m)), where k is the maximum trie depth, n is the length of Words and m is the total number of nodes.

However, the above example is extreme, and bindings like #.#.# make little sense. The average complexity should be close to O(n*log(m)), because there is no reason to put two subsequent hash # parts in a key. The overhead introduced by single hash nodes should not be significant, because in that case traversing the trie with the different Words lists stops right after the hash edge.

Evaluation

This section covers the performance of the Topic Exchange. The experiments are designed to capture the characteristics of the Topic Exchange under different conditions.

Synthetic tests

Two experiments are presented, in order to experimentally confirm or reject the assumptions made in the Routing Operation section above:

  • First, the relation between the Routing Key length and the routing operation time: the bindings are fixed while the Routing Key length is adjusted. A linear dependency is expected.
  • Second, the Routing Key length is fixed and the number of bindings varies.

Both experiments are performed under the following conditions:

  • Tests are run on a single RabbitMQ node.
  • Routing time is measured by timing the evaluation of rabbit_exchange_type_topic:route/2 (see the sketch after this list).
  • Measurements are repeated 50 times and the averaged results are presented in the figures.
  • The Binding Keys are random, ensuring that no Binding Key contains two subsequent hashes.
  • Each Binding Key part has a 70% chance of being a word, a 20% chance of being an asterisk and a 10% chance of being a hash.
  • The Routing Keys are created from the existing Binding Keys - for example, a Routing Key of length n is created from an existing Binding Key of length n, with any hashes or asterisks replaced by random strings. This ensures that the operation must traverse at least n levels of the trie.
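Such a measurement boils down to :timer.tc; a sketch of the idea from an Elixir shell on the broker node (assuming an exchange record and a message delivery are already in hand - their construction is omitted here):

# Time a single routing operation; micros is the evaluation time in microseconds.
{micros, _destinations} = :timer.tc(:rabbit_exchange_type_topic, :route, [exchange, delivery])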

The figure above presents the results of the three experiments. Now let's slightly modify the conditions in order to visualize the impact of the hash # key on the trie structure. A single binding is added, consisting of just two subsequent hashes: #.#. Then the performance looks like this:

The red curve bends - this was actually predicted. When there are more # bindings on the query path, the relation between Routing Key length and query time is no longer linear. The effect is also observable in the 10k-bindings series: the green curve also bends slightly. It can be explained in the same way - there are more Binding Keys starting with #, which increases the query time for all queries.

Let’s check it in RabbitMQ Management UI:

Actually, there are around 50 such bindings. Let's replace them, which should result in a more linear relation and give an overview of the hash's impact on performance:

Indeed, the time to find the routes improves. Now let's examine the impact of the number of bindings on query time. As explained in the previous section, a logarithmic relation is expected:

This example also follows the expected behaviour. All bindings are stored in a single Mnesia table, and every node query has its own cost: the more entries in the table, the longer a query takes. As the table is of the ordered_set type, query time grows logarithmically with table size, which is exactly what is observed.

Summing up, the experiments align with the theory from the previous section. The expected impact of the Routing Key length and the number of bindings on routing time was confirmed, and the large impact of the hash # wildcard was also confirmed and quantified.

Real world example

The two previous experiments measured the time of a single query. While this is valuable, it does not reflect real-life conditions, since the test is synthetic and focuses on one query. Is the Topic Exchange's overhead also observable when the overall performance of RabbitMQ is taken into account?

This chapter presents a performance evaluation of the RabbitMQ integration in MongooseIM, Erlang Solutions' highly scalable instant messaging server. The RabbitMQ component in MongooseIM simply reports each user's activity, which may be:

  • a user became online/offline,
  • a user sent/received a message.

Only sent/received messages are discussed here. The Routing Key of a message activity follows the simple pattern <username>.{sent, received}.

In order to evaluate the performance of the component, a load test was designed. Simulated XMPP clients connect to the MongooseIM server and exchange messages with each other. Each message generates events, which are published to RabbitMQ. Then a number of AMQP clients connect to RabbitMQ to consume the generated events.

This is the outline of the experiment’s architecture:

For the purpose of this post, only the results directly connected to Topic Exchange performance are covered. Let's define the key performance indicator, Time To Delivery, as the amount of time between a message being sent by an XMPP user and it being received by a consumer from a RabbitMQ queue. This value is presented in the figures below.

The test conditions were as follows:

  • 90k XMPP users
  • 10k AMQP consumers
  • ~4.8k payloads/s from MongooseIM to RabbitMQ
  • payload size ~120B
  • a Topic Exchange with Binding Keys like user_N.*
  • 20k bindings in total

The performance in this case is presented in the following figure. It shows the 95th and 99th percentiles of Time To Delivery, as well as the maximum Time To Delivery observed in a given time window.

The second test had similar conditions; the only difference was the Exchange type:

  • 90k XMPP users
  • 10k AMQP consumers
  • ~4.8k payloads/s from MongooseIM to RabbitMQ
  • payload size ~120B
  • a Direct Exchange with Binding Keys like user_N.chat_msg_sent, user_N.chat_msg_recv
  • 40k bindings in total

Under those conditions, performance was better, as the following figure illustrates.

While the previous section showed the performance characteristics of the Topic Exchange in isolation, these examples provide an overview on a bigger scale. Both tests had identical characteristics apart from the Exchange type (and, consequently, the number of bindings), yet the difference in performance is significant, in favor of the Direct Exchange. Switching effectively decreased the Time To Delivery, which is the measure of efficiency in the presented tests.

Summary

This post covered a few areas: the Topic Exchange internals, a brief overview of their implementation, the theoretical overhead introduced by traversing the trie structure, and some performance evaluation. As observed, the Topic Exchange is not the fastest Exchange type, and many factors may influence its performance. However, it is not true that Topic Exchanges are slow - in fact, they are really fast in typical RabbitMQ usage; the test conditions here were deliberately specific. If there are only a few bindings, or the trie is not deep, the Topic Exchange's overhead is usually negligible. Still, it is important to understand the underlying mechanism, as the example with MongooseIM's RabbitMQ component showed: using a different Exchange type resulted in much better performance.

A few useful links

  1. Get some expert help or training on RabbitMQ
  2. Check RabbitMQ and MongooseIM demo
  3. Find out more about MongooseIM

Permalink

10 Unusual Blockchain Use Cases

Digital assets have increased in the last few years, and the FinTech industry is set to continue growing throughout 2019. Blockchain obviously falls into this category, with Bitcoin and Ethereum. Crypto startups and established companies are looking to develop their business strategies by incorporating FinTech. By allowing transparency on the ledger, blockchain offers more secure online transactions, which are increasing at a rapid rate. But what unusual blockchain use cases have been emerging? In this blog post we will explore some cases that caught our eye. It's safe to say blockchain's transparency and accountability is a huge factor and can lend itself to a number of industries in unusual and impressive ways. Here, we look at how:

1 Reducing Identity Theft

Forbes magazine states that 2.6 BILLION records were lost or stolen in 2017, and that identity theft accounts for 69% of all data breaches. Using blockchain to protect records offers a new level of security in the form of identification checks such as verifiable user IDs and multi-factor authentication. This means the need for countless passwords and usernames you'll inevitably end up forgetting is gone. Civic's Secure Identity Platform (SIP) is the perfect example of this.

The entire blockchain is decentralised, meaning there is no single point of weakness and therefore a very limited chance of hackers breaking in. This includes the self-sovereign IDs mentioned above and the removal of countless passwords and paperwork connected to accounts. The result is a single key that is matched to an immutable ledger. Your digital ID can include social security information, social media data, and medical records, but also all the private data you generate through online actions.

2 Buying virtual cats and gaming

Blockchain can be used in gaming in general by creating digital and analog gaming experiences. CryptoKitties emerged in 2017 and generated more than $1.3 million in approximately one month. By March 2018, it had raised $12 million. What's the point? It turns a game into an economy. By investing in CryptoKitties, players can invest in, build and extend their gaming experience.

BitPainting is another example where you can create and share your digital art in the form of “crypto-collectables”. A third example is a gaming platform called Chimaera that converts gaming into virtual assets to allow developers to manage, share and trade their games.

3 Cannabis

Legal cannabis is a booming business in the United States, with an estimated $9 billion spent in 2017. This estimate is set to grow, no pun intended. With this amount of money, a cashless solution could offer business owners further security. Transactions are easily trackable and offer a transparency and accountability that traditional banking doesn't.

The link between cryptocurrencies and cannabis isn’t a new revelation; it has been used in the industry for a number of years. Cryptocurrencies have been created including PotCoin, DopeCoin, and ParagonCoin. However, none have fully reached their potential.

There are some key differences between why blockchain would be more appropriate for the cannabis trade than cryptocurrencies, and this is similar to traditional banking. Campaignlive sums it up perfectly here: “By integrating blockchain into supply chain management, business owners could provide legislators, consumers, and banks with the data they need to build what they need to gain mainstream acceptance: trust.”

Transparency is a key benefit to using blockchain within the cannabis industry. But what other advantages does it hold?

Blockchain offers anonymity. As legal cannabis is still a fairly new industry, erring on the side of caution is sensible business acumen. Secondly, the tax put in place for legal marijuana in the US can be crippling to small businesses: the account fee for cannabis in California can be up to $60,000 per annum. Blockchain can result in less tax, because the individual will be taxed on property, not currency. Thirdly, many cannabis consumers aren't aware of how cryptocurrencies work, offering further security.

It makes sense for traditional banking too. Along with the various taxes on individual businesses, the federal government has also put in place rules and regulations for banks. This places a risk on banks investing in the cannabis business; as a result, the California state treasurer John Chiang has proposed opening a public bank dedicated to the cannabis industry.

4 Sharing solar power

Siemens has partnered with the startup LO3 Energy on an app called Brooklyn Microgrid. It allows residents of Brooklyn who own solar panels to transfer their energy to others who don't have this capability. Consumers and solar panel owners are in control of the entire transaction.

5 Marriage

Same-sex marriage is still banned in 87% of all countries in the world, according to figures from campaignlive. Bearing that in mind, the Swedish sportswear brand Björn Borg has discovered an ingenious way for loved ones to be joined in holy matrimony, regardless of sexual orientation, beliefs or country of residence. But how?

Blockchain is stereotypically linked with money, but take away those connotations and all you have is an effective ledger that can record events as well as transactions - no finance necessarily required. It has the ability to record and preserve events without the need for a third party, so Björn Borg has put this loophole to extremely good use by forming the digital platform Marriage Unblocked, where you can propose, marry and exchange vows all on the blockchain. What's more, the records can be kept anonymous, offering security for those in potential danger. And of course, you can request a certificate to display proudly too!

The first couple to do this was David Mondrus and Joyce Bayo in 2014, who engraved their nuptials on the Bitcoin blockchain. The plus side of a blockchain marriage is the flexibility; it is claimed that when billionaire Brock Pierce wed his wife last year, they created a contract that could be “renewed, changed and dissolved annually”. Put simply, blockchain offers smart contracts.

Whilst this doesn't carry any legal weight, everything is produced and stored online. If religion or government isn't a primary concern of yours, where's the harm in a blockchain marriage? As Marriage Unblocked says, “no state or religion should control love”.

6 Simplifying the Internet of Things (IoT)

Blockchain offers ledgers that can record the huge amounts of data produced by IoT systems, and yep, you can guess it - it's the transparency that offers a level of trust that other services cannot.

The Internet of Things is one of the most exciting elements to come out of technology; these connected ecosystems can record and share various interactions, and blockchain lends itself perfectly to this. It can transfer data and provide identification for both public and private sector use cases. Blockchain can be used for the following:

  • Public sector: infrastructure management, taxes (and other municipal services).
  • Private sector: logistical upgrades, warehouse tracking, greater efficiency, and enhanced data capabilities.

There is even a blockchain built specifically for IoT, which handles machine-to-machine micropayments, by the name of Tangle.

Tangle is the data structure behind a micro-transaction crypto token purposely optimised and developed for IoT. It differs from other blockchains and cryptocurrencies by having a much lighter and more efficient way of dealing with tens of billions of devices. Its creators - David Sønstebø, Sergey Ivancheglo, Serguei Popov and Dominik Schiener - built a decentralised peer-to-peer network that relies on a Directed Acyclic Graph (DAG), which creates a distributed ledger rather than “blocks”. There are no transaction fees, no mining, and no external consensus process. It also allows data to be transferred securely between digital devices. Serguei Popov has provided a whitepaper on The Tangle if you'd like further information.

7 Improving Supply Chains

Blockchain provides the real-time tracking that is essential for any company with a significant number of supply chains. This is incredibly useful for the consumer and pharmaceutical industries. In fact, Forbes reports that Walmart partnered with IBM to produce a blockchain based on Hyperledger Fabric to track food from the supplier to the shop shelf. This allows for accountability and a clear process from the start of the business trail right to the end. The whole supply chain is transparent and allows multiple parties to access the same database whilst providing one truthful point of origin. You can find examples of this all over the web; we particularly enjoy this one from origintrail.io.

8 Elections and Voter Fraud

Voting on a blockchain offers full transparency and therefore reduces the chance of fraudulent voting. One example of this is the app Sovereign, created by the nonprofit organisation Democracy Earth. This blockchain produces tokens that represent votes rather than money.

Another example is Sierra Leone, which became the first country to run a blockchain-based election last year, with 70% of the pollers using the technology to anonymously store votes in an immutable ledger. This offered the public instant access to the election results.

These results were placed on Agora's blockchain, and by allowing anyone to view them, the government aimed to build trust with its citizens, offering a platform that reduces controversy as well as the costs incurred when using paper ballots. The result is a trustworthy and legitimate outcome that also limits the amount of hearsay from opposition voters and parties - especially relevant in Sierra Leone, which has had corruption claims in the past.

Leonardo Gammar, who created the Agora blockchain, raises a very good point about switching to blockchain for the next voting platform: while online security threats have kept paper ballots in use, blockchain again offers transparency from start to finish.

Whilst we still have a long way to go, the potential for using a blockchain voting platform is exciting.

9 Healthcare

With the emphasis on keeping many records in a secure manner, blockchain lends itself nicely to medical records and healthcare.

MedRec is one business using blockchain to keep medical records secure, using a decentralised CMS and smart contracts. This also allows transparency of data and the ability to make secure payments connected to your health.

Blockchain can also be used to track dental care in much the same way. One example is Dentacoin, which uses an ERC20 token. It can be used for dental records, but also to ensure that dental tools and materials are sourced appropriately and that tools are used on the correct patients, to build networks that can transfer information to each other quickly, and as a compliance tool.

10 Luxury items and art selling

With the ability to see everything that's going on, and to track the data and transactions within the blockchain, the technology lends itself nicely to luxury items such as diamonds and art.

Deals made on a blockchain reduce the chances of fraudulent behaviour. One business offering this is Everledger - active in many other lines of business too, as we mention in our post Blockchain Myth vs Reality. Everledger works by verifying provenance and limiting the risk of fraud by certifying assets (for example, a piece of artwork) and then storing these records publicly. This offers transparency by displaying the verified ownership of the piece in question. Another example is Chronicled, a start-up using blockchain-based technology for supply chain management to create a registry that tracks every item and places it in a single product line.

Whilst fraud is by no means impossible on a blockchain, the risks are reduced. For example, if a hacker does, by some small chance, break into the blockchain, it will be impossible for them to resell an item without the blockchain network noticing.

It's safe to say we've worked in the FinTech industry for a while now. Whilst there is a buzz around blockchain, it's important to note that the industry is well-established, and these surprising blockchain use cases display the broad and exciting nature of the industry as a whole. There are still other advantages of blockchain that we haven't delved into in this article, such as decentralised apps, banking and the energy market. Perhaps this calls for a follow-up!

The most important element of a blockchain is its transparency, from displaying votes during an election to protecting an expensive painting. Blockchain is also secure, and can aid new lines of business.

If you or your business are working on an unusual blockchain case, let us know - we would love to hear about it! Also, if you are looking for reliable FinTech or blockchain experts, give us a shout - we offer many services to fix issues of scale!

Permalink

MongooseIM 3.3.0: Supporting happy relations

Have you ever tried to use Mnesia with datasets larger than your transient memory? Are you confused by the Erlang data types stored within Mnesia? We know some of you were. That is why we have answered these problems by introducing something that is familiar to mainstream developers and also efficient with larger datasets - RDBMS backends for our PubSub plugin. Now, let's bundle that up with full support for XEP-0178, as well as a RabbitMQ backend for our Event Pusher, for a more complete package. Ladies and gentlemen, welcome MongooseIM version 3.3.0.

Relations meet PubSub

The Publish-Subscribe plugin was one of our main focal points for recent weeks thanks to our friends from Safaricom Alpha, who have sponsored this work. The rationale behind implementing it is simple, and yet not obvious, so let’s dig in.

The XMPP protocol features an extension called the Personal Eventing Protocol (PEP), which is a special case of PubSub where any entity (e.g. a user) may become a PubSub node. In turn, this can be used for many purposes. For instance, a client may publish information to its contacts about the music currently being played, or it may announce microblog entries (a personal Twitter, of sorts). Nevertheless, end-to-end encryption is the use of PEP that matters most to us and to many MongooseIM users.

End-to-end encryption is often a selling point for modern IM software. For comparison, the popular TLS encryption ensures that nobody is able to eavesdrop on your communication with a bank website. There are many more properties, but the crucial fact remains: it secures the data you exchange with the server. Therefore, when you connect to MongooseIM with TLS enabled, your transmission is safe. You have to bear in mind, though, that everything you write is still readable on the server side, so if you would like to improve your privacy even further, you need to encrypt your message content as well.

There are several protocols you can use to achieve this, but in most cases you will need a way to announce public data (e.g. a public key) that others may use to establish a secure session with your device. This is where PEP comes in: it provides a storage and distribution facility that can hold, for instance, OMEMO keys and metadata for your contacts.

Since this data is retrieved and updated fairly often, we realised that the classic Mnesia backend is no longer sufficient for this purpose, and we have put a lot of work into developing an efficient RDBMS backend for PubSub. Besides performance in high-volume scenarios, it allows developers to use databases other than Mnesia - ones they are more familiar with.

During development we were also able to pinpoint bottlenecks in the core PubSub code. That is why we have parallelised the distribution of PubSub messages. The pre-3.3 extension used only a single Erlang process to handle all requests, as well as the broadcasts triggered by them. Now, the notification distribution is done by a new, short-lived process for every request. Requests themselves are still processed in a single queue - since parallel execution led to transaction deadlocks in Mnesia - but the extension may be configured to process them in several queues. This fixes an issue we observed on several occasions, where the PubSub process was simply overwhelmed with messages, with no reasonable overflow control and back pressure, which greatly impaired the user experience.
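As a generic illustration of that pattern (a sketch in Elixir, not MongooseIM's actual Erlang code), requests can stay serialized in one process while each fan-out is handed off to a short-lived process:

defmodule PubSubQueue do
  use GenServer

  def init(state), do: {:ok, state}

  def handle_call({:publish, notification, subscribers}, _from, state) do
    # The request is handled in this single queue, but the broadcast runs in a
    # throwaway process, so slow fan-out does not block subsequent requests.
    Task.start(fn ->
      Enum.each(subscribers, &send(&1, {:pubsub, notification}))
    end)

    {:reply, :ok, state}
  end
end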

Standardised PKI

Password-based authentication is still the basic method in many places. Why? Try remembering your RSA 4096-bit public key, not to mention the whole public+private key pair. (Yes, we know about Keepass, but you most probably have it configured to use a master password, right?)

PKI authentication is the current industry standard though. It is more secure than a password and, if implemented properly, much more convenient for the average user. Support for this method debuted in MongooseIM 2.2.x and has received a batch of improvements in every subsequent release. This one adds one of the last important pieces, i.e. compliance with XEP-0178, the official PKI authentication method specification for XMPP. It describes how the SASL EXTERNAL mechanism should behave - in other words, which certificate fields should be used in the process and how.

The pre-3.3 implementation verified only the Common Name field, while now the xmppAddr fields are verified (there may be more than one such field), with CN optionally used as a fallback. What is more, a full JID is verified instead of just the username.

Integration with RabbitMQ

The Event Pusher extension emerged in MongooseIM 2.1.x as a unification of several channels that our server used to deliver data to external endpoints - e.g. delivering user messages to a push notification service. It has been extended over time, and in MIM 3.3 it receives yet another backend: RabbitMQ.

It is especially beneficial for developers who need to digest IM events in an asynchronous manner. Since AMQP is a popular, powerful and pretty easy-to-learn protocol, it may be used, for example, to build a spam detection component. Events published via RabbitMQ may also be consumed for big data analysis (finding patterns in user preferences, behaviour etc.). These are only two examples - we imagine that every application may find innovative uses for a stream of events coming in from MongooseIM. What is more, using another Erlang-based piece of software ensures the reliability and performance of this tandem.

Consider the demo of the spam detection mechanism below to be an inspiration for you.

Changelog

Please feel free to read the detailed changelog. Here, you can find a full list of source code changes and useful links.

Contributors

Special thanks to our contributors:

Test our work on MongooseIM 3.3.0 and share your feedback

Help us improve the MongooseIM platform:

  1. Star our repo: esl/MongooseIM
  2. Report issues: esl/MongooseIM/issues
  3. Share your thoughts via Twitter
  4. Download the Docker image with the new release
  5. Sign up to our dedicated mailing list to stay up to date about MongooseIM, messaging innovations and industry news.
  6. Check out our MongooseIM product page for more information on the MongooseIM platform.

Permalink

Erlang OTP 21.3 is released


OTP 21.3

Erlang/OTP 21.3 is the third service release for the 21 major release, with improvements as well as a few new features!

Highlights

Kernel:

  • The standard logger handler, logger_std_h, now has a new internal feature for log rotation. For full information see the documentation.

SSL:

  • The Reason part of the error return from the functions connect and handshake has a better and documented format. This is a potential incompatibility. See the documentation.
  • Refactoring of state handling has improved the TLS application data throughput and reduced CPU overhead.
  • Code optimizations have reduced the CPU load for encryption/decryption, especially for Erlang's distribution protocol over TLS.
  • Now supports active N.

Erl_interface:

  • Support for plugging in a user-supplied socket implementation has been added.

OTP:

  • The HTML reference documentation now shows the OTP version where modules and functions were first introduced.
  • Versions of OTP older than R13B04 are not shown in the reference documentation.

For a full list of details see:
http://erlang.org/download/otp_src_21.3.readme

Pre built versions for Windows can be fetched here:
http://erlang.org/download/otp_win32_21.3.exe
http://erlang.org/download/otp_win64_21.3.exe

Online documentation can be browsed here:
http://erlang.org/documentation/doc-10.3/doc

The Erlang/OTP source can also be found at GitHub on the official Erlang repository, here: OTP-21.3

Please report any new issues via Erlang/OTP's public issue tracker.

 
We want to thank all of those who sent us patches, suggestions and bug reports!

Thank you!

The Erlang/OTP Team at Ericsson

Permalink

Idea: GenServers with Map-based state

I recently gave a talk at Empex LA in which I talked about my desire to see simplifications and enhancements to using some of the OTP behaviors offered in Elixir. In this post I’m going to explore a simple improvement to the GenServer API that would make it a little easier to work with.

GenServers are processes with state that can be transformed when the GenServer receives a message. This state is represented as a single value that is passed into the handle_call or handle_cast callback.

This is easy to manage if your GenServer only needs to manage a single piece of information. But as soon as you find that your GenServer needs multiple pieces of information in state, you need to substantially refactor it.

Suppose we have a GenServer that wraps an integer value, and you can increment it by sending it the increment message:
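The original listing is not reproduced here, so below is a sketch of what such a module presumably looks like (the Counter name and the message atoms are my own):

defmodule Counter do
  use GenServer

  def start_link(initial_value) do
    GenServer.start_link(__MODULE__, initial_value)
  end

  @impl true
  def init(initial_value), do: {:ok, initial_value}

  # Read the current value.
  @impl true
  def handle_call(:value, _from, state), do: {:reply, state, state}

  # The whole state is the integer, so incrementing is trivial.
  @impl true
  def handle_cast(:increment, state), do: {:noreply, state + 1}
end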

State is simple enough to work with here.

But what if we wanted to be able to increment by values other than just 1?

First we'll make a struct for the module, and we'll also need a type:
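Something along these lines (a sketch - the field names are assumptions based on the surrounding prose):

defmodule Counter do
  defstruct value: 0, increment_by: 1

  @type t :: %__MODULE__{value: integer(), increment_by: pos_integer()}

  # ... callbacks follow ...
end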

Now, when we start the GenServer, we want to specify that increment_by value - defaulting to 1 for backwards compatibility - and the init function needs updating too:
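A sketch of both pieces, as fragments of the Counter module (the option handling is my assumption):

def start_link(initial_value, opts \\ []) do
  # Default to an increment_by of 1 for backwards compatibility.
  increment_by = Keyword.get(opts, :increment_by, 1)
  GenServer.start_link(__MODULE__, %Counter{value: initial_value, increment_by: increment_by})
end

@impl true
def init(%Counter{} = state), do: {:ok, state}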

And when we want to check the current value of the integer, we can no longer just return state; we need to specifically grab the value part of our struct:
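For instance (sketch):

@impl true
def handle_call(:value, _from, %Counter{} = state) do
  # The state is now a struct, so we reply with just its value field.
  {:reply, state.value, state}
end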

And, of course, when we increment, we can no longer just use state + 1 to increment, because state is no longer just the integer.

Here is our now-refactored GenServer:
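Assembled from the pieces above, a sketch of the full module (names remain my assumptions, since the original listing is not reproduced here):

defmodule Counter do
  use GenServer

  defstruct value: 0, increment_by: 1

  @type t :: %__MODULE__{value: integer(), increment_by: pos_integer()}

  def start_link(initial_value, opts \\ []) do
    increment_by = Keyword.get(opts, :increment_by, 1)
    GenServer.start_link(__MODULE__, %__MODULE__{value: initial_value, increment_by: increment_by})
  end

  @impl true
  def init(%__MODULE__{} = state), do: {:ok, state}

  @impl true
  def handle_call(:value, _from, %__MODULE__{} = state) do
    {:reply, state.value, state}
  end

  @impl true
  def handle_cast(:increment, %__MODULE__{} = state) do
    {:noreply, %{state | value: state.value + state.increment_by}}
  end
end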

To recap, we had to add a defstruct, a type for the module, and we had to modify all four functions in this module to make this work!

What I’d like to see

I'd like to take some inspiration from how ExUnit handles test contexts and propose a new way of managing GenServer state.

State as map

First off, instead of GenServer state being an any, let's make it a map. Furthermore, when you return from handle_call or handle_cast, you would be expected to return a map that is then merged into the existing state map. If there are no changes to make, you could either return an empty map in your tuple, or just exclude the new state from your tuple. Excluding the state from the return tuple to indicate not updating it isn't strictly needed for this to work, but it addresses a mild annoyance I've always had with GenServers, and since we're just talking hypothetically here, let's do it!

Now your individual functions no longer need to be concerned with the overall structure of state; you can instead pattern match on just the pieces of state that your function needs, returning a map of the values that you want to have merged into state. This gives your GenServer state the freedom to store more than a single value without burdening all of your handle_call and handle_cast callbacks with the complexity of that.
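Purely hypothetically, a callback under this proposal might look like this - pattern match only the keys you need, return only the keys you changed:

# Hypothetical API: the returned map would be merged into the full state.
def handle_cast(:increment, %{value: value, increment_by: by}) do
  {:noreply, %{value: value + by}}
end

# Hypothetical API: no state map in the tuple means "no changes to merge".
def handle_call(:value, _from, %{value: value}) do
  {:reply, value}
end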

In case you have a use case where it would be better to replace the entire state with a brand new state (which is how GenServers currently work), this could be accomplished with a different reply atom, like :reply_replace_state or :noreply_replace_state.

It may also be worth considering adding helper functions that can construct these return tuples, saving the developer from needing to remember a specific tuple structure.

Separating config data from state data

Let’s take this a step further. In our example GenServer, we have 2 types of data in our state: static configuration data that affects the GenServer’s behavior, and state data.

If our config is not stateful, let’s not keep it in state at all! To do this, we have handle_call and handle_cast take a parameter for state, and a separate parameter for config.

Now we can treat our config as an immutable property of the GenServer, because your message handler callbacks aren't expected to update the config. As an added bonus, this provides some safety in knowing that your config won't accidentally get overwritten by a bad merge of data into state.

The config can be static in this case, but that doesn’t mean we can never update it! This GenServer can provide a built-in update_config callback. After sending an update_config message to the GenServer, subsequent messages would be processed with the new config.

You could have the option to implement your own version of this in case you needed special handling (for instance, maybe you want to write the change to a log or a monitoring service).

Putting it all together

Let’s take a look at how our theoretical GenServer might now look with these characteristics:
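A hypothetical rendering (none of this API exists today - TheoreticalGenServer and its conventions are inventions for the sake of the argument):

defmodule Counter do
  use TheoreticalGenServer  # hypothetical behaviour

  # increment_by is static config; only value lives in state.
  def start_link(initial_value, opts \\ []) do
    TheoreticalGenServer.start_link(__MODULE__,
      state: %{value: initial_value},
      config: %{increment_by: Keyword.get(opts, :increment_by, 1)}
    )
  end

  # Callbacks receive config as a separate, read-only argument.
  def handle_cast(:increment, %{value: value}, %{increment_by: by}) do
    {:noreply, %{value: value + by}}
  end

  def handle_call(:value, _from, %{value: value}, _config) do
    {:reply, value}
  end
end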

By having the state data always be a map, it’s never a big leap to add additional values. And if we want to change the amount we increment by, it’s nice and easy:
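Using the hypothetical built-in update_config callback:

TheoreticalGenServer.update_config(pid, %{increment_by: 5})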

This would be a simple quality of life improvement to using the GenServer API, and I’d like to see some simple improvements like this in the near future and have these improvements pave the way for more dramatic simplifications of these APIs.

In a future blog post, we will implement our own GenServer OTP behavior that follows these semantics.

Permalink

Personal notes from Lonestar Elixir Conf 2019

I just came back from Lonestar Elixir Conf 2019 in Austin, Texas. The conference was single track over 2 days, with an extra day for optional training (which I didn't attend this time). Even though it is not considered the most important conference in the Elixir ecosystem in the US, the lineup was really great, with 4 keynotes - including José Valim (Elixir creator), Chris McCord (Phoenix creator) and Justin Schneck (Nerves co-creator) - and great talks.

If you want to watch the talks and keynotes, they are available in the Lonestar Elixir Conf YouTube channel.

As in the latest editions of Elixir Conf (the main conference in the US), the keynotes focused on the 3 main areas: the Elixir language itself, the Phoenix web framework and the Nerves project for embedded devices. Besides the keynotes, the talks were also mostly around these 3 subjects in some way.
As always, I like to take notes when attending conferences, and these are my highlights from Lonestar Elixir Conf 2019:

Elixir, distributed systems and future

Following José Valim's statement in his keynote at Elixir Conf US 2018, when he said that Elixir 2.0 is not in the plans, Elixir is getting to a point of maturity and stability. With a stable language and platform, the efforts are now directed at:

  • how we can show to companies that Elixir is a really strong option;
  • how we can facilitate the adoption and keep teams moving forward;
  • how we can solve hard problems, the ones not specific to Elixir, but in the high level, architecture way, such as distributed systems, data replication, and integration with other technologies;

José Valim introduced Broadway, a library for concurrent, multi-stage data ingestion and data processing. Broadway is based on GenStage, and its goal is to facilitate integration with other data pipelines (SQS, Kafka and so on) and to leverage the concurrency and performance that Elixir offers, with controlled back-pressure.

In another part of his keynote, José Valim presented the Erlang Ecosystem Foundation. Its main goal is to enhance the BEAM platform in areas such as interoperability, tooling, documentation, embedded systems, and marketing, facilitating general adoption. One of the foundation's elements will be the working groups, allowing forward thinking and projects that can be community-driven in some sense. Companies will be able to sponsor the foundation and help the ecosystem by funding projects, but the foundation teams are composed of individual members, so projects will not be in the hands of a single company, for example.

Another common subject was Elixir in distributed systems, with all the complexities regarding data synchronization, network issues and so on. Distributed systems are not new, and the common problems are well known to anyone who has experienced a distributed system environment.

Paul Schoenfelder, best known for his amazing contributions to the Elixir community with Distillery, Libcluster, Timex and many others, presented the challenges of distributed systems, how they are solved today in Elixir, and the important things to consider when implementing one. One of the challenges is how to test and ensure that the guarantees needed for the desired solution are in place. He highlighted the Raft consensus algorithm as a very good solution, and he is working on a library called Cadre that will concentrate on the guarantees you need and on testing them.

Another interesting talk was given by Dan Dresselhaus. It was more related to data replication and focused on the questions we need to ask before deciding what type of solution to implement - things like where to store the data, how durable, consistent and performant it needs to be, what the data size is, and what the desired access patterns are.

Phoenix

I attended Elixir Conf US last year when Chris McCord presented his library Phoenix LiveView. It is a really exciting project and it will open so many doors for applications that fit its context.
LiveView allows client-side changes to be implemented in pure Elixir, leveraging Phoenix channels and minimizing the need for Javascript code. As mentioned by Chris, LiveView will not replace Javascript development for all use-cases, but it will allow many interactive client applications to be developed entirely on the server side in Elixir.

Since last year, LiveView has seen very nice improvements in templating, through LiveEEx, and in the amount of data that is moved from the server to the client. Another great highlight is the LiveView.js library, which is very lightweight compared with React, Vue or Ember.
LiveView is coming with lots of expectations, and I am really interested in using it at FootBroker, my personal project using Event Sourcing.

Nerves

In his keynote, Justin Schneck showed some details of the new version 1.4 and how a firmware update happens behind the scenes. The ability to update firmware in many ways opens a good path for manufacturing devices at large scale, thanks to Nerves Hub, an open source service that can be self-managed by anyone if firmware privacy is a concern.

Other important news concerns the libraries for interacting with different board circuits, such as GPIO, I2C and SPI, which are receiving some good polishing.

In another talk, Todd Resudek showed his personal Nerves project: a smart sprinkler controller for his home. An awesome talk, showing his development process and the project's evolution, with a failover strategy in case something goes bad.

Besides some initial play with Nerves, I have never built anything with it so far; however, Nerves is on my list of things to try this year.

Hallway Track

Conferences are great - meeting some friends and making new connections is always an advantage compared with only watching the talks at home after the fact.

I enjoyed every table and hallway conversation at Lonestar Elixir, and see you at Elixir Conf in Denver, CO.

Permalink


Copyright © 2016, Planet Erlang. No rights reserved.
Planet Erlang is maintained by Proctor.