Waterstream is a full-featured MQTT broker that runs on top of Apache Kafka and requires nothing else. It enables bidirectional communication with devices and stores device MQTT state. Mission-critical IoT use cases can be deployed in the cloud or on premises with minimal effort, and out of the box Waterstream supports millions of connected clients.
MQTT is the most popular IoT protocol for very good reasons: it’s lightweight, client libraries exist for practically every programming language, and it’s built for poor connectivity scenarios such as mobile networks. On the other hand, it’s not designed for stream processing and offers no way to reprocess past events.
Apache Kafka is a streaming platform that offers large scale, high availability, long-term storage, and very good integration with enterprise technologies. Yet it’s not designed for IoT: it requires a stable network, it’s not built to hold tens of thousands of direct device connections, and it lacks IoT-specific features such as Keep Alive and Last Will.
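To make the two IoT-specific features concrete: under the MQTT 3.1.1 specification, if a client sets a non-zero Keep Alive and the broker receives no control packet from it within one and a half times that period, the broker must drop the connection as if the network had failed, and then publishes the client’s Last Will message (if one was set). A clean DISCONNECT, by contrast, discards the Will. The following is a minimal Python sketch of that broker-side bookkeeping; the class and method names (`Session`, `KeepAliveMonitor`, `check`) are hypothetical and do not describe how Waterstream implements it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Session:
    """Hypothetical broker-side view of one MQTT client connection."""
    client_id: str
    keep_alive_s: int                 # Keep Alive from the CONNECT packet (0 = disabled)
    last_packet_at: float             # time the last control packet was received
    will_topic: Optional[str] = None  # Last Will topic, if the client set one
    will_payload: bytes = b""


class KeepAliveMonitor:
    """Illustrative sketch of Keep Alive / Last Will handling, not Waterstream code."""

    def __init__(self, publish: Callable[[str, bytes], None]):
        self.sessions: dict[str, Session] = {}
        self.publish = publish  # callback used to deliver Will messages

    def connect(self, session: Session) -> None:
        self.sessions[session.client_id] = session

    def on_packet(self, client_id: str, now: float) -> None:
        # Any control packet (PUBLISH, PINGREQ, ...) resets the client's timer.
        self.sessions[client_id].last_packet_at = now

    def disconnect(self, client_id: str) -> None:
        # A clean DISCONNECT discards the Will: it must not be published.
        self.sessions.pop(client_id, None)

    def check(self, now: float) -> list[str]:
        """Drop clients silent for more than 1.5 x Keep Alive and publish their Wills."""
        expired = [
            s for s in self.sessions.values()
            if s.keep_alive_s > 0 and now - s.last_packet_at > 1.5 * s.keep_alive_s
        ]
        for s in expired:
            del self.sessions[s.client_id]
            if s.will_topic is not None:
                self.publish(s.will_topic, s.will_payload)
        return [s.client_id for s in expired]
```

For example, a sensor with a 60-second Keep Alive that stays silent for 95 seconds exceeds the 90-second limit, so `check(95.0)` drops it and publishes its Will (say, `offline` on `devices/sensor-1/status`), while a sensor that sent a packet at 50 seconds stays connected.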
With Waterstream, you get the best of both ecosystems in one. MQTT and Kafka streaming work together, making it possible to combine high scalability to millions of connections, real-time stream processing, enterprise integration, and analytics of IoT data. Better still, Waterstream does this in the simplest way possible.
Waterstream is an MQTT broker that runs natively on Kafka as a Kafka Streams application. There are no external MQTT clusters to manage and no integration pipelines to develop to move your data from devices to Kafka topics, because Waterstream itself works as a bidirectional layer between Kafka and IoT devices.
Waterstream doesn’t require an intermediate persistence layer because it uses Kafka “under the hood” as its only persistence layer, and it inherits Kafka’s built-in benefits: high availability, high throughput, and low latency. Messages coming from MQTT clients are written to Kafka immediately. Likewise, as soon as a message is retrieved from Kafka, it’s sent to the subscribed MQTT clients. MQTT state is stored in Kafka as well, so no additional storage solution is needed.
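The idea of Kafka as the only persistence layer can be illustrated with an in-memory stand-in for Kafka topics. This is a sketch under simplifying assumptions (single-partition topics, no real Kafka client, hypothetical names such as `Topic`, `BrokerNode`, and `latest_by_key`), not Waterstream’s actual topic layout. Incoming MQTT messages are appended to a log, session state is written as keyed records and rebuilt the way consumers read a compacted topic (last value per key wins, a `None` tombstone deletes the key), and consumers can re-read from any retained offset, which is what makes reprocessing possible.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """In-memory stand-in for a single-partition Kafka topic (append-only log)."""
    records: list = field(default_factory=list)  # (key, value) pairs; list index = offset

    def append(self, key: str, value: Optional[bytes]) -> int:
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset: int) -> list:
        # Kafka consumers can start at any retained offset, which enables replay.
        return self.records[offset:]


def latest_by_key(topic: Topic) -> dict:
    """Rebuild state the way a consumer reads a compacted topic:
    the last value per key wins, and a None value (tombstone) deletes the key."""
    state = {}
    for key, value in topic.records:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


class BrokerNode:
    """Hypothetical broker node with no local storage: every write goes to the log."""

    def __init__(self, messages: Topic, sessions: Topic):
        self.messages = messages
        self.sessions = sessions

    def on_publish(self, mqtt_topic: str, payload: bytes) -> int:
        # An incoming MQTT message is written straight to the log, keyed by MQTT topic.
        return self.messages.append(mqtt_topic, payload)

    def on_session_change(self, client_id: str, session: Optional[bytes]) -> None:
        # Session state (e.g. subscriptions) is just another keyed record.
        self.sessions.append(client_id, session)
```

Because a node holds nothing locally, a freshly started node pointed at the same topics sees exactly the same messages and sessions: `latest_by_key(new_node.sessions)` returns each client’s most recent session record.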
With Waterstream, you have to manage and monitor only one distributed infrastructure for your IoT projects. IoT can’t be any simpler than that.
In a typical deployment, devices send data to Waterstream through a load balancer. Because Waterstream nodes are stateless and elastic, they can be added or removed dynamically according to load or other needs. Waterstream persists incoming data to a Kafka topic that any Kafka tool can consume, such as plain Kafka consumers, Kafka Connect, and Kafka Streams applications. Producers can send messages back to the devices by writing to designated Kafka topics, and Waterstream delivers these messages to the edge as soon as it consumes them.
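On the way back to the devices, a broker has to decide which connected clients should receive each record it consumes. The sketch below assumes, hypothetically, that each record read from the designated Kafka topic carries its target MQTT topic (for example as the record key), and routes it using the standard MQTT topic-filter rules: `+` matches exactly one level, `#` matches the parent level and everything below it, and wildcards in the first level do not match `$`-prefixed topics such as `$SYS/...`. The function names `matches` and `deliver` are illustrative only.

```python
def matches(topic_filter: str, topic: str) -> bool:
    """MQTT topic-filter matching with '+' (one level) and '#' (multi-level) wildcards."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Wildcards at the first level must not match $-prefixed topics such as $SYS/...
    if t_levels[0].startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # matches the parent level and all remaining levels
        if i >= len(t_levels):
            return False  # the filter is longer than the topic
        if f != "+" and f != t_levels[i]:
            return False  # a literal level differs
    return len(f_levels) == len(t_levels)


def deliver(records, subscriptions):
    """Route (mqtt_topic, payload) records consumed from Kafka to subscribed clients.

    subscriptions maps client_id -> list of topic filters (hypothetical layout).
    Returns client_id -> list of (mqtt_topic, payload) to send to that client.
    """
    outbox = {}
    for mqtt_topic, payload in records:
        for client_id, filters in subscriptions.items():
            if any(matches(f, mqtt_topic) for f in filters):
                outbox.setdefault(client_id, []).append((mqtt_topic, payload))
    return outbox
```

For instance, a command written for `devices/sensor-1/cmd` reaches a client subscribed to that exact topic, a dashboard subscribed to `devices/+/cmd`, and a monitoring client subscribed to `#`, but not a client subscribed to `plant/#`.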
Waterstream provides integrated observability through Prometheus and Grafana; check out our demo page to see an example in action. The architecture is pluggable, which means, for example, that you can plug in your own metrics solution.
Waterstream scales out linearly. For most operations its nodes don’t depend on each other, so you can add more machines as the number of clients you need to support grows.
We ran several scalability tests to measure and tune Waterstream’s performance. For your project’s benefit, we also provide a simple toolset that you can use to compare Waterstream with other streaming solutions. The test methodology will be described in an upcoming blog post.
As shown in the graph below, Waterstream managed more than one million connected devices using only 12 nodes of modest computing power (2 CPUs and 7.5 GB RAM each).
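As a rough capacity-planning aid, the result works out to about 1,000,000 / 12 ≈ 83,000 clients per node. The sketch below turns that into a node estimate, assuming that scale-out stays linear and that per-node capacity matches this test; real capacity also depends on message rates, QoS levels, and payload sizes, so treat the output as a starting point only. The function `nodes_needed` and its `headroom_pct` parameter are hypothetical. Because the test reached *more than* one million clients, using exactly one million errs on the conservative side.

```python
# Figures from the test: at least 1,000,000 clients on 12 nodes (2 CPUs, 7.5 GB RAM each).
BENCH_CLIENTS = 1_000_000
BENCH_NODES = 12


def nodes_needed(target_clients: int, headroom_pct: int = 0,
                 bench_clients: int = BENCH_CLIENTS,
                 bench_nodes: int = BENCH_NODES) -> int:
    """Estimate node count under a linear-scaling assumption.

    nodes = target_clients * (bench_nodes / bench_clients) * (1 + headroom_pct / 100),
    rounded up. Integer arithmetic avoids float rounding pushing e.g. 12.0 to 13.
    """
    numerator = target_clients * bench_nodes * (100 + headroom_pct)
    denominator = bench_clients * 100
    return -(-numerator // denominator)  # ceiling division on integers


per_node = BENCH_CLIENTS // BENCH_NODES  # about 83,333 clients per node
```

For example, `nodes_needed(250_000)` gives 3 nodes, and `nodes_needed(3_000_000, headroom_pct=25)` gives 45 nodes, keeping a quarter of spare capacity for bursts or node failures.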
Waterstream can also scale in to support smaller deployments at the edge, closer to the monitored devices. This delivers two business advantages: it reduces the latency between the devices and Waterstream, and data can be pre-filtered and pre-aggregated at the edge before payloads are sent to the data center, reducing networking and storage needs.
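Pre-filtering and pre-aggregation at the edge can be as simple as discarding implausible readings and summarizing the rest per device over fixed (tumbling) time windows, so that one summary record travels to the data center instead of every raw reading. The following is a minimal Python sketch of that technique; the `Summary` fields, the `pre_aggregate` name, the 60-second default window, and the `valid_range` filter are illustrative assumptions, not a Waterstream feature.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Summary:
    device: str
    window_start: int  # start of the tumbling window, in seconds
    count: int
    minimum: float
    maximum: float
    mean: float


def pre_aggregate(readings: Iterable[Tuple[str, float, float]],
                  window_s: int = 60,
                  valid_range: Optional[Tuple[float, float]] = None) -> list:
    """Filter raw (device_id, timestamp_s, value) readings, then summarize them
    per device per tumbling window of window_s seconds."""
    buckets = defaultdict(list)
    for device, ts, value in readings:
        # Pre-filter: drop readings outside the plausible range (e.g. sensor glitches).
        if valid_range is not None and not (valid_range[0] <= value <= valid_range[1]):
            continue
        window_start = int(ts // window_s) * window_s
        buckets[(device, window_start)].append(value)
    # Pre-aggregate: one summary per (device, window), in a stable order.
    return [
        Summary(device, start, len(vals), min(vals), max(vals), sum(vals) / len(vals))
        for (device, start), vals in sorted(buckets.items())
    ]
```

With five raw readings, one of them a glitch value of -999 filtered out by `valid_range=(-50, 150)`, the function emits three summaries, one per device and minute, which is the kind of reduction in payloads sent upstream that the paragraph describes.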
Consider an outstanding example of Waterstream throughput on minimal hardware. On a computer with just 1 CPU and 3.7 GB RAM that hosted both Kafka and ZooKeeper, Waterstream processed one message from each of 20,000 devices within 10 seconds, an average of at least 2,000 messages per second. Waterstream gives you the edge you require for edge computing.