Messaging is a critical aspect of any distributed architecture. Even more vital in modern architectures is the need to decouple the actors or systems within your ecosystem. Pub/Sub is one model that lets you achieve this. In this model one component in your environment broadcasts/publishes a message, which can then be consumed/subscribed to by one or more components or consumers. The component that acts as the middleman in this model is called a broker, and that is the role MQTT and Kafka play.
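To make the model concrete, here is a toy in-memory broker in Python. It is just a sketch of the pattern, not how MQTT or Kafka actually work internally: the publisher only ever talks to the broker, and the broker fans the message out to whoever subscribed.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory broker: routes published messages to subscribers."""
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher never talks to consumers directly -- only to the broker.
        for callback in self._subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("sensors/temp", received.append)   # consumer A
broker.subscribe("sensors/temp", lambda m: None)    # consumer B
broker.publish("sensors/temp", "21.5")              # both consumers get it
```

The key property is visible even in this toy: adding or removing a consumer never touches the producer's code.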

MQTT is a protocol made for lightweight messages. It is great for situations where low bandwidth is a constraint, and with a fixed header as small as 2 bytes it is well suited for small devices like an Arduino or Raspberry Pi.

Key Characteristics of MQTT

  • Lightweight (as little as 2 bytes of overhead per message)
  • Easy to parse for machines (length-prefixed fields)
  • Broker keeps track of client state
  • Guarantees message delivery through retries
  • Security (username/password over TLS)
  • Dynamic topics
  • Runs on TCP
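To make the "2-byte overhead" and "length-prefixed" points concrete, here is a rough sketch of what a minimal MQTT 3.1.1 QoS 0 PUBLISH packet looks like on the wire. The topic and payload are made up for illustration:

```python
def mqtt_publish_packet(topic: str, payload: bytes) -> bytes:
    """Build a minimal MQTT 3.1.1 QoS 0 PUBLISH packet by hand."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: length-prefixed topic name (no packet ID at QoS 0).
    variable = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    remaining = variable + payload
    # Fixed header: packet type 3 (PUBLISH) in the high nibble, then the
    # remaining length (a single byte here, since the packet is under 128 bytes).
    return bytes([0x30, len(remaining)]) + remaining

packet = mqtt_publish_packet("sensors/temp", b"21.5")
```

For a 12-byte topic and a 4-byte payload the whole packet is 20 bytes, only 2 of which are the fixed header; everything else is length-prefixed data, which is what makes it so cheap to parse on a microcontroller.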

Kafka is essentially a distributed commit log. Kafka’s claim to fame is its support for high throughput and message durability even at scale.

Key Characteristics of Kafka

  • Distributed Log aggregator
  • Client keeps track of what messages it has processed
  • Meant for server side environments (server to server)
  • Static topics (topic - static, key - dynamic)
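The "client keeps track of what it has processed" point is the big mental shift coming from MQTT. A toy sketch of the commit-log idea (not Kafka's actual implementation, just the concept):

```python
class CommitLog:
    """Toy append-only log: the broker only stores and serves records."""
    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]

class Consumer:
    """Each consumer remembers its own position (offset) in the log."""
    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        batch = self._log.read(self.offset)
        self.offset += len(batch)  # "commit" by advancing our own offset
        return batch

log = CommitLog()
for reading in ("21.5", "21.7", "21.6"):
    log.append(reading)

consumer = Consumer(log)
first = consumer.poll()   # all three records
second = consumer.poll()  # nothing new yet
```

Because the log itself is never mutated on read, a second consumer could start at offset 0 and replay everything, which is exactly the property that makes this model durable at scale.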

Common features across both solutions

  • Both act as brokers for pub/sub
  • Both decouple consumers from producers
  • Both offer a measure of reliability (e.g. MQTT’s QoS levels 0, 1 and 2)

Both solutions address the key needs I have for messaging, so which did I choose? Well, I chose both. The essential distinction between the two is that:

  • MQTT is suited for dumb clients
  • Kafka for smart clients

In my ecosystem the sensors at the edge capturing electricity usage, temperature and humidity readings etc. are simply observing an event and transmitting lightweight data. In this case I use MQTT to pipe in their data. Kafka comes into the picture as the main broker through which all collected data is streamed over to the ELK Stack. I foresee Kafka playing a much bigger role when I pull in data at rest or batch data for analysis.
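To give a feel for the glue between the two, here is a rough sketch of the kind of mapping a bridge between them needs. The topic scheme (`sensors/<type>/<device>`) and the `telemetry.*` Kafka topic names are made up for illustration:

```python
def mqtt_to_kafka(mqtt_topic: str, payload: bytes):
    """Map an MQTT message onto a Kafka (topic, key, value) record.

    MQTT topics are dynamic (one per device), while Kafka topics are static:
    many MQTT topics fold into one Kafka topic per sensor type, with the
    device ID as the key so all readings from one device land in the same
    partition and stay ordered.
    """
    _, sensor_type, device_id = mqtt_topic.split("/")
    return (f"telemetry.{sensor_type}", device_id, payload)

record = mqtt_to_kafka("sensors/temperature/device42", b"21.5")
```

This is the "dynamic topics vs. static topics" contrast from the bullet lists above in code form: the dynamic part of the MQTT topic becomes the Kafka record key rather than a topic of its own.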

I’m still working with these two to really grasp where they fit, but so far I feel comfortable about how they are used. I did come across some material on the web about problems with scaling MQTT. Scale is not something I’m worried about at the edge, so MQTT does what I need. Kafka helps me manage the scale I have at the heart of the ecosystem.

The other choice I had to make recently was how to persist the data, which I’ll cover in a later post.