Anyone who has ever built a distributed system must have considered the issue of communication between the components of the system. A message broker can typically be used to provide certain communication features or properties, such as temporal decoupling or reliability. This especially applies to data exchange between applications, where the distributed system consists of a collection of systems that exchange messages with each other.
Next comes the tricky matter of choosing a broker. Ideally, this choice should be based on the expected communication properties. Different views and different ways of understanding a problem can lead to a great deal of discussion about this choice. Moreover, with human organizations being what they are, sometimes less factual considerations come into play:
NOTE 1: An inappropriate choice doesn’t mean that “it won’t work”; it just means you will introduce some biases and/or require compensation mechanisms that might be clumsy and of uncertain effectiveness on the one hand, and expensive to implement on the other hand.
NOTE 2: This could easily lead to the conclusion that several types of brokers are required over the whole perimeter of the distributed system. It is even possible that several types of brokers could also be required for a given communication (message flow).
In my experience, the two biases just described are especially relevant when the choice is between a log-based broker and a queue-based or subscription-based broker (as defined by the Gartner classification, see the note below). This is the main subject of this article.
Let’s start with a quick review of the terminology related to types of brokers.
Let’s add a brief note on the different message categories.
At first sight, this distinction does not seem very useful if stated in this way. These shortcuts are mnemonic devices that encompass several considerations, which are described below. Finally, you could compare it to Tortuga Island: you don’t know how to go there unless you have already been there. These shortcuts are only effective if you know what’s behind them.
We will gradually get to the bottom of this in the following sections.
One of the first interesting questions to ask is: does the sender system have a specific intention when it sends one message in particular? Is it expecting something to happen at the other end of the line? Or does it need something to happen so it can continue doing its job?
If not, this system can be considered a data supplier, which informs the rest of the world of something only it knew about at a time T. This could be:
If not – in other words, if the sender has an expectation for the future – then writing in a log is probably not the most efficient solution. Instead, the “producer” will issue a command and in fact will then behave like a client consuming a service. This command might need:
Here I have focused on the producer’s expectations, but the same type of question can and should be asked from the consumer’s point of view. In the next paragraph, we will see that expectations can differ widely.
In the previous paragraph, I indirectly introduced the problem of relationships between systems via the concept of expectation or intention (implication: the producer’s intention with regard to the consumer(s)).
But this problem goes much further than just finding out what the producer expects. The consumer can also have expectations that do not necessarily align perfectly with what the producer wants. This comes as no surprise, since all of this is well documented and much discussed in the field of Domain Driven Design (DDD which, incidentally, is incredibly rich, informative, and efficient on many levels). Here I am going to take a rough but effective shortcut. For the purposes of our discussions, a message broker will materialize (or rather, drive) a border between two contexts (more concretely, and as a first approximation, between two systems).
Except that, in reality, it’s a bit like the border between two countries: a border has two sides, and the formalities for going from one country (context) to the other depend on the agreements (relationships) between the countries.
Let’s take an example: a warehouse management system provides an almost real-time flow of stock updates. In return for these stock updates, it expects nothing in particular from any consumer. It just says ‘to whom it may concern’ that a particular item has left a certain warehouse in a stated quantity (the same for incoming stock). Whether or not another system is listening to these updates makes no difference to it. No matter what, the stock movement took place, regardless of whether or not anybody is interested.
On the other side of the border, there are systems or partners (such as retail outlets or shopping websites) that need updates, but only for a specific subset of the stock, and possibly at a lower frequency, to supply their own internal cache (for example, once every night). We could also imagine another consumer being interested in all the updates, in order to perform predictive calculations with a view to adjusting the activity of a production line.
In that scenario, the producer clearly cannot accommodate the conflicting demands of the different consumers. You could take it one step further and say that those demands are not the producer’s problem.
Concretely, what effect does this have on the driving of borders? Realistically, relying exclusively on a log-based broker, which could be an appropriate solution from the warehouse management system’s standpoint (see next chapter), is too restrictive because it does not allow an efficient response to the demands of all consumers. You are then faced with the choice of making the consumers carry some of the features (e.g., filtering the flow to take only what interests them) or combining two types of brokers to provide a more complete service to a greater number of consumers.
To spice up the discussion, you can add the question of responsibility: who does what in all of this? This does mean people: which team is responsible for implementing this or that feature. Every development is a socio-technical construction, meaning that beyond the ‘code’ itself, certain areas of responsibility are involved. This subject is far from trivial, and it is crucial for the smooth running of the organization. Among other things, it’s the difference between tactical integration and strategic integration, but that’s a subject for another article.
To take it further, a DDD-type context can need a broker to drive its own internal needs/mechanisms. In the example above, the warehouse management system might consist of several micro-services that produce (and consume) each of the events. This context might make it necessary to publish these events in a log-based broker.
This creates a strong temptation to kill two birds with one stone by simply exposing this broker to outside consumers. And why not – but be careful not to confuse internal needs with the needs of external consumers. If you do, you might shift the load of certain features onto the consumer when they could have been offered on the producer side and therefore shared.
For example: in our scenario, if consumers only need some subsets of the events that are published, then each one will have to read the whole log and do its own filtering. This is quite inefficient for one consumer, and when multiplied over several consumers it becomes a significant waste of resources. In this case, it might be more beneficial to use the functionality of a queue-based or subscription-based broker to drive the context borders.
NOTE : it is technically possible to perform ‘routing’ based on the partition in a log-based broker. But be careful: it’s a form of coupling! This solution might be acceptable within the boundaries of one context, but far less acceptable for a third-party context.
Up to now, we have talked about events, but without addressing the distinction between discrete events and event flows. Let’s illustrate the difference using an example from the field of physics: position vs. velocity.
Each position measurement has its own value that is useful for certain applications (e.g., monitoring presence in an area), but the sequence of positions could provide information that is needed for another purpose. You can obtain the velocity by extrapolating from the sequence of positions: this effectively lets you monitor the velocity based on the position measurements.
In other words, the flow of positions has its own meaning, that you can use in addition to each of the separate positions. Another way to look at it is that you are going up one level of abstraction from the basic data (like the derivative in mathematics).
In this type of situation, a log-based broker is strongly recommended.
NOTE : although this is far beyond the scope of this discussion, be sure to bear in mind the limitations of real-time flow analysis (if real-time is needed), because the only effective analysis dimensions (indexes) are time and the partition key. Also be careful of the analysis depth, which should be restricted.
A related point to consider is the importance of the sequence. For additional clarity, think about managing a non-nominal case in a flow of messages: what should you do if you ‘miss’ a message or event (for any reason)? Or when the order is changed?
And, most importantly: what is the real impact on the consumer? What is the functional significance?
Take the example of warehouse management: an integration error on one of the stock events (or an inversion of events) can cause stock calculation errors, leading to the replenishment of a particular product. This results in a loss of efficiency over the whole processing chain and it wastes company resources.
In a case like this, it can be risky to choose a solely subscription-oriented broker. A log-oriented broker is clearly more advisable. Because the concept of order is important, however, partitioning should be done very carefully.
Furthermore, how should the non-nominal case be processed? In other words: what should you do about the message that couldn’t be processed? There are four possible options.
These problems of managing non-nominal cases give rise to a point that is often made and often aims to tip the scales in favor of a log-based broker: that messages can be replayed from a certain point in time. This is obviously a tempting argument, but be careful not to confuse a gadget with a requirement.
The question to ask is therefore: for which known use case is this actually needed?
This overview of the main issues illustrates the importance of integration analysis in the architecture of distributed systems. Note also that I have barely scratched the surface of some fundamental subjects such as the organizational impact of the integration subjects.
By way of conclusion, I offer a rough shortcut below.
Nevertheless, this is just a very personal and limited shortcut, that is only of any use to someone who has already followed the entire path (cf. Tortuga 🙂 ).