Product

Explore the reference architecture and advanced features of Waterstream for large-scale IoT

How it works

Incoming MQTT messages from clients, as well as client state, are saved directly into Kafka without intermediate storage.

Waterstream can read records from Kafka topics and deliver them to clients over MQTT or WebSocket.

Every Waterstream node is stateless because everything is stored in Kafka. This allows low latency and excellent scalability.

Reference architecture

A typical scenario requires deploying multiple Waterstream instances, also called nodes, to provide fault tolerance and scalability. Waterstream nodes do not store any information; everything is persisted in Kafka, so nodes can be added or removed dynamically according to the load. A load balancer is required between MQTT clients and Waterstream nodes to distribute network traffic.

Waterstream persists incoming MQTT messages to the configured Kafka topics. Once in Kafka, the data can be consumed by any Kafka client, such as a Kafka consumer, Kafka Connect, or a Kafka Streams application. Kafka producers can send messages back to the MQTT clients by writing to designated Kafka topics.
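To make the idea of "configured Kafka topics" concrete, here is a minimal Python sketch of how MQTT topics could be routed to Kafka topics using standard MQTT topic filters. It is an illustration only: the route table, the Kafka topic names, and the functions `mqtt_filter_matches` and `kafka_topic_for` are hypothetical and do not represent Waterstream's actual configuration format or API. The matcher is also simplified and ignores the special handling of topics that start with `$`.

```python
from typing import List, Tuple

def mqtt_filter_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT topic filter match: '+' is one level, '#' is the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent and everything below it.
            return True
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # Without '#', filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)

# Hypothetical routing rules: (MQTT topic filter, Kafka topic). First match wins.
ROUTES: List[Tuple[str, str]] = [
    ("sensors/+/temperature", "iot_temperature"),
    ("alerts/#", "iot_alerts"),
]
DEFAULT_KAFKA_TOPIC = "mqtt_messages"  # hypothetical fallback topic

def kafka_topic_for(mqtt_topic: str) -> str:
    """Return the Kafka topic an incoming MQTT message would be written to."""
    for topic_filter, kafka_topic in ROUTES:
        if mqtt_filter_matches(topic_filter, mqtt_topic):
            return kafka_topic
    return DEFAULT_KAFKA_TOPIC
```

With these assumed rules, a message published to `sensors/dev42/temperature` would land in `iot_temperature`, while any topic that matches no rule would fall through to the default Kafka topic.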

Waterstream provides integrated observability through Prometheus and Grafana. Customized metric solutions can be added through a plugin system.

Manage millions of clients

Waterstream scales out linearly. For most operations, its nodes don’t depend on each other and more nodes can be added to support an increasing number of clients.

Several scalability tests have been run to measure and tune Waterstream performance. As shown in the graph, Waterstream was capable of managing more than one million connected devices, using only 12 nodes of modest computing power (2 CPU, 7.5 GB RAM).
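As a rough back-of-the-envelope aid, the figures above can be turned into a node-count estimate. The sketch below assumes that scaling stays linear, that nodes have the same size as in the test (2 CPU, 7.5 GB RAM), and that the workload is comparable; the function name `estimate_nodes` is hypothetical, and real sizing also depends on message rates, payload sizes, and the headroom you want for failover.

```python
# Reference point taken from the scalability test described above.
REFERENCE_CLIENTS = 1_000_000  # connected devices handled in the test (at least)
REFERENCE_NODES = 12           # nodes used in the test

def estimate_nodes(target_clients: int) -> int:
    """Estimate the node count for target_clients, assuming linear scaling."""
    if target_clients <= 0:
        raise ValueError("target_clients must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_clients * REFERENCE_NODES // REFERENCE_CLIENTS)

if __name__ == "__main__":
    for clients in (500_000, 2_500_000, 5_000_000):
        print(f"{clients:>9} clients -> about {estimate_nodes(clients)} nodes")
```

Because the test reports "more than one million" devices on 12 nodes, this estimate is conservative under the stated assumptions.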

Key Features

MQTT-Kafka integration

Waterstream enables transparent and bi-directional integration between MQTT clients and Kafka, allowing data collected via MQTT to be used, processed and distributed by Kafka using streaming tools (e.g. Apache Flink, Kafka Streams). At the same time, it can feed MQTT clients with data from Kafka.

Absence of additional dependencies

No extra dependencies beyond Kafka are required, thus simplifying the implementation and integration process. This reduces operational complexity and facilitates maintenance.

Stateless and scalable

Waterstream is a stateless application, which makes it easily scalable and resilient to failures. It can be deployed across multiple instances behind a load balancer, ensuring high availability and support for millions of MQTT clients.

Multi-cloud and hybrid environments support

Waterstream is not tied to a single cloud provider, making it suitable for multi-cloud and hybrid scenarios. This offers greater flexibility for businesses operating in diverse cloud environments.

Optimization for unstable networks

Thanks to the MQTT protocol, Waterstream is well suited to scenarios where clients operate on unstable or intermittently connected networks, while still guaranteeing reliable data access on Kafka.

Advanced features through Kafka

By integrating Kafka, Waterstream adds advanced features such as rewind (the ability to go back in message history) and message validation through Kafka's Schema Registry.
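Rewind is possible because Kafka keeps messages in an append-only log addressed by offset, so a reader can move its position back and read older records again. The following toy model illustrates that idea in plain Python. It is not the Kafka client API and not Waterstream code: `ToyLog` and `ToyReader` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToyLog:
    """In-memory stand-in for a single append-only log partition."""
    records: List[str] = field(default_factory=list)

    def append(self, value: str) -> int:
        """Store a record and return the offset it was written at."""
        self.records.append(value)
        return len(self.records) - 1

@dataclass
class ToyReader:
    """Reader that tracks its own position (offset) in the log."""
    log: ToyLog
    position: int = 0

    def poll(self) -> List[str]:
        """Return all records from the current position and advance past them."""
        batch = self.log.records[self.position:]
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Rewind (or skip ahead) by moving the read position."""
        if not 0 <= offset <= len(self.log.records):
            raise ValueError("offset out of range")
        self.position = offset

if __name__ == "__main__":
    log = ToyLog()
    for reading in ("t=20.1", "t=20.4", "t=21.0"):
        log.append(reading)
    reader = ToyReader(log)
    print(reader.poll())  # reads everything written so far
    reader.seek(1)        # rewind to the second record
    print(reader.poll())  # reads the second and third records again
```

Reading never removes records from the log, which is what allows the same history to be replayed for a new or recovering consumer.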

WebSocket Support

Waterstream supports MQTT over WebSocket, allowing data to be streamed from Kafka directly to web clients in the browser, further expanding application potential.

Bridge mode

Waterstream can operate as a bridge to an existing MQTT broker, bringing only a subset of topics to Kafka, allowing a gradual and flexible integration.

Support for MQTT 3.1 and 5

Waterstream fully implements the specifications of MQTT protocol versions 3.1 and 5, ensuring full compatibility with applications using these standards.

Deploy everywhere with any Kafka compatible platform

Waterstream is distributed as a Docker image (x86/ARM64) with minimal RAM and CPU requirements. It can be deployed at the edge, on-premises, and in the cloud, as a standalone process or inside a Kubernetes cluster. To learn more, see our documentation.

Waterstream requires Apache Kafka version 1.1.0 or later. Several Kafka distributions meet this requirement, such as Confluent Cloud and IBM Event Streams. Waterstream is a Confluent Verified Integration, meeting the standard quality and functional requirements to work with Confluent Cloud.

Waterstream also works with alternative implementations of the Kafka protocol, such as Redpanda. To learn more, check out our Redpanda integration demo.

Ready to get started?

Request a demo or talk to our technical sales team to get your questions answered.