Incoming MQTT messages from clients, as well as client state, are saved directly in Kafka without intermediate storage.
Why Waterstream
Waterstream is an MQTT broker that uses Kafka as its sole storage and distribution engine, combining the most popular IoT protocol with the de facto standard streaming API.
MQTT
MQTT is the most popular IoT protocol for very good reasons: it’s lightweight, it’s supported by all programming languages, and it’s built for poor connectivity scenarios such as mobile networks. But it’s not designed for stream processing and does not support the reprocessing of events.
Apache Kafka
Apache Kafka is a streaming platform that enables large scale, high availability, long-term storage, and seamless integration with other technologies. But it’s not designed for IoT: it requires a stable network, does not support tens of thousands of connections, and lacks IoT-specific features such as Keep Alive or Last Will.
The best of both ecosystems
Waterstream gives you the best of both ecosystems in one, making MQTT and Kafka streaming work together perfectly with high scalability, millions of connections, real-time stream processing, and easy integration with databases, key-value stores, search indexes, and file systems.
How it works
Waterstream can read records from Kafka topics and then send them to clients using MQTT or WebSockets.
Every Waterstream node is stateless because everything is stored in Kafka. This allows low latency and excellent scalability.
Reference architecture
A typical deployment runs multiple Waterstream instances, also called nodes, to provide fault tolerance and scalability. Waterstream nodes do not store any information; everything is persisted in Kafka, so nodes can be added or removed dynamically according to the load. A load balancer is required between MQTT clients and Waterstream nodes to distribute network traffic.
Waterstream persists incoming MQTT messages to the configured Kafka topics. Once in Kafka, the data can be consumed by any Kafka client, such as a Kafka consumer, Kafka Connect, or a Kafka Streams application. Kafka producers can send messages back to the MQTT clients by writing to designated Kafka topics.
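To make the idea of "configured Kafka topics" concrete, here is a minimal sketch of how an MQTT topic could be routed to a Kafka topic. It is an illustration only and not Waterstream's actual configuration format: the rule list, the Kafka topic names, and the functions mqtt_filter_matches and kafka_topic_for are all hypothetical. The matching follows the standard MQTT wildcard semantics (+ for one level, # for all remaining levels) and, for simplicity, ignores the special handling of topics that start with $.

```python
from typing import List, Tuple


def mqtt_filter_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a topic filter with + and # wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent level and everything below it.
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # Without a trailing '#', the filter and the topic must have the same depth.
    return len(filter_levels) == len(topic_levels)


# Hypothetical routing rules: (MQTT topic filter, Kafka topic). First match wins.
RULES: List[Tuple[str, str]] = [
    ("sensors/+/temperature", "iot_temperature"),
    ("vehicles/#", "iot_vehicles"),
]
# Hypothetical fallback Kafka topic for messages that match no rule.
DEFAULT_KAFKA_TOPIC = "mqtt_messages"


def kafka_topic_for(mqtt_topic: str,
                    rules: List[Tuple[str, str]] = RULES,
                    default: str = DEFAULT_KAFKA_TOPIC) -> str:
    """Pick the Kafka topic for an incoming MQTT message topic."""
    for topic_filter, kafka_topic in rules:
        if mqtt_filter_matches(topic_filter, mqtt_topic):
            return kafka_topic
    return default


if __name__ == "__main__":
    for t in ("sensors/kitchen/temperature", "vehicles/truck-7/gps", "alerts/fire"):
        print(t, "->", kafka_topic_for(t))
```

Under these assumed rules, any Kafka consumer subscribed to iot_temperature would see only temperature readings, while everything unmatched lands in the fallback topic.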
Waterstream provides integrated observability through Prometheus and Grafana. Custom metrics solutions can be added through a plugin system.
Manage millions of clients
Waterstream scales out linearly. For most operations, its nodes don’t depend on each other and more nodes can be added to support an increasing number of clients.
Several scalability tests have been run to measure and tune Waterstream performance. As shown in the graph below, Waterstream managed more than one million connected devices using only 12 nodes of modest computing power (2 CPUs, 7.5 GB RAM).
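As a rough back-of-the-envelope aid, the reported figure can be turned into a node-count estimate. The sketch below assumes perfectly linear scaling and the same node size as in the test; the function estimate_nodes and the headroom parameter are hypothetical and not part of Waterstream. Because the test reported more than one million devices on 12 nodes, the per-node figure used here is a conservative lower bound, and real sizing should be validated with your own load tests.

```python
# Figures taken from the reported scalability test.
REPORTED_CLIENTS = 1_000_000  # "more than one million" connected devices
REPORTED_NODES = 12           # nodes used in that test


def estimate_nodes(target_clients: int, headroom_percent: int = 0) -> int:
    """Estimate the node count for target_clients, assuming linear scaling.

    headroom_percent adds spare capacity (e.g. 20 means plan for 20% more clients).
    Integer arithmetic is used so the ceiling is exact (no float rounding).
    """
    numerator = target_clients * (100 + headroom_percent) * REPORTED_NODES
    denominator = REPORTED_CLIENTS * 100
    return -(-numerator // denominator)  # ceiling division


if __name__ == "__main__":
    print(REPORTED_CLIENTS // REPORTED_NODES, "clients per node (lower bound)")
    print(estimate_nodes(2_500_000), "nodes for 2.5M clients")
    print(estimate_nodes(2_500_000, headroom_percent=20), "nodes with 20% headroom")
```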
Deploy everywhere with any Kafka-compatible platform
Waterstream is distributed as a Docker image (x86/ARM64) with minimal RAM and CPU requirements. Waterstream can be deployed at the edge, on-premises, and in the cloud as a standalone process or inside a Kubernetes cluster. To learn more, see our documentation.
Waterstream requires Apache Kafka version 1.1.0 or later. Several Kafka distributions support this, such as Confluent Cloud and IBM Event Streams. Waterstream is a Confluent Verified Integration, meeting the standard quality and functional requirements to work with Confluent Cloud.
Waterstream also works with alternative implementations of the Kafka protocol, such as Redpanda. To learn more, check out our Redpanda integration demo.
Ready to get started?
Request a demo or talk to our technical sales team to get your questions answered.