Waterstream.io
The MQTT platform at scale

What if the Waterstream node goes down?

Apr 16, 2021
MQTT

If you’ve ever wondered how a Waterstream cluster behaves when one of its members goes down, here are some tests we’ve run.

The test setup consists of Kafka provided by Confluent Cloud, 5 Waterstream nodes and a load balancer running on Google Cloud Platform (GCP), and 5 nodes of the MQTT load simulator, also on GCP.

The scripts we used to create topics in Confluent Cloud and to run Waterstream on Google Cloud are here: https://github.com/simplematter/waterstream-gcp-terraform. This setup has a single Kafka topic for MQTT messages with 10 partitions (in Confluent Cloud, cluster capacity depends on the number of partitions). For the Waterstream deployment, we used 5 n1-standard-1 nodes (1 CPU, 3.75 GB RAM). A separate VM hosts Prometheus and Grafana, which we use to monitor Waterstream's behavior during the simulated node failure.

The MQTT load generator also comes with scripts for launching it on Google Cloud: https://github.com/simplematter/simplematter-mqtt-load-simulator/tree/master/toolbelt/terraform_gcp. We configured it to run 5 nodes on the same machine type, n1-standard-1. Each node spawns 20k clients with a ramp-up time of 120 seconds, which makes 100k clients in total. When the ramp-up completes, each client sends a 0.8 to 1.2 KB QoS 2 (exactly once) PUBLISH message every 10 seconds. The Clean Session flag is false for all clients, so that we could also test the loading of session data when a client reconnects.
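To get a feel for what these settings mean in aggregate, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the function name `load_profile` is made up for illustration, and it assumes that clients connect at a steady rate during ramp-up, that payload sizes average out to the midpoint of the range, and that 1 MB = 1000 KB. None of these assumptions come from the load simulator itself.

```python
def load_profile(simulator_nodes=5, clients_per_node=20_000, ramp_up_s=120,
                 publish_interval_s=10, payload_kb=(0.8, 1.2)):
    """Estimate aggregate load from the simulator settings used in this test."""
    total_clients = simulator_nodes * clients_per_node
    # Assumption: connections are opened at a constant rate during ramp-up.
    connects_per_s = total_clients / ramp_up_s
    # Each client publishes one message per interval once ramp-up is over.
    messages_per_s = total_clients / publish_interval_s
    # Assumption: payload sizes average out to the midpoint of the range.
    avg_payload_kb = sum(payload_kb) / 2
    throughput_mb_per_s = messages_per_s * avg_payload_kb / 1000
    return {
        "total_clients": total_clients,
        "connects_per_s": connects_per_s,
        "messages_per_s": messages_per_s,
        "throughput_mb_per_s": throughput_mb_per_s,
    }


if __name__ == "__main__":
    for name, value in load_profile().items():
        print(f"{name}: {value:,.1f}")
```

With the values from this test, that works out to 100k clients publishing roughly 10k messages per second in total, or about 10 MB/s of payload before protocol overhead.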

With all this infrastructure started, we waited a few minutes to see all the clients connect and produce messages as expected. Then we opened a console on one of the Waterstream nodes and shut down the Waterstream Docker container:

Then we watched the Waterstream Grafana dashboard to see the effect:

As you can see, after a while the clients started to notice that their connections were broken and that they needed to reconnect. 1 minute 45 seconds after the start of the simulated incident, all the clients had successfully connected to the surviving nodes. Looking at the connection details in the Load Simulator Waterstream dashboard, we can see that there were 19.9k connections to the node that went down, and that some reconnection attempts failed while the load balancer had not yet detected the node failure:
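Failed attempts like these are why MQTT clients usually retry rather than give up after the first error. The sketch below shows one common approach, exponential backoff with jitter, using only the Python standard library. It is not the load simulator's actual reconnect logic, which this post does not describe. The names `backoff_delays` and `reconnect`, the delay values, and the `connect` callable are all hypothetical.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base_s: float = 1.0, cap_s: float = 30.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield random delays whose upper bound doubles each time, up to cap_s."""
    rng = rng or random.Random()
    ceiling = base_s
    while True:
        # Jitter spreads retries out so clients don't all reconnect at once.
        yield rng.uniform(0, ceiling)
        ceiling = min(cap_s, ceiling * 2)


def reconnect(connect: Callable[[], bool], max_attempts: int = 10,
              sleep: Callable[[float], None] = time.sleep,
              rng: Optional[random.Random] = None) -> int:
    """Call connect() until it returns True; return the number of attempts used.

    `connect` is a placeholder for whatever opens the MQTT connection
    (hypothetical here). Raises ConnectionError if every attempt fails.
    """
    delays = backoff_delays(rng=rng)
    for attempt in range(1, max_attempts + 1):
        if connect():
            return attempt
        sleep(next(delays))
    raise ConnectionError(f"gave up after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulate a load balancer that still routes to the dead node twice.
    outcomes = iter([False, False, True])
    attempts = reconnect(lambda: next(outcomes), sleep=lambda s: None)
    print(f"connected after {attempts} attempts")
```

With 19.9k clients losing their connection at the same moment, the jitter matters: without it, every client would retry on the same schedule and hit the surviving nodes in synchronized waves.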

And here is the part of the Waterstream dashboard that shows the session loading metrics:

It shows that the existing sessions were successfully loaded for the clients that were reconnecting.

Once the tests are complete, shut down the load generator and Waterstream, and remove the topics from Confluent Cloud so that you stop being charged.

This test demonstrates how the surviving Waterstream nodes can take over the load of the failed one, keeping the cluster running and client sessions available. If you want to repeat these tests yourself, you can request an evaluation license here: https://waterstream.io/contact/ and get support on our forum: https://dev.waterstream.io/.
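As a rough illustration of what "taking over the load" means for each surviving node, the following Python sketch computes the average number of connections per node before and after a failure. It assumes the load balancer spreads clients evenly, which is an idealization. The function name `survivor_load` is made up for this example.

```python
def survivor_load(total_clients, nodes, failed=1):
    """Return (connections per node before, after, relative increase).

    Assumption: the load balancer distributes clients evenly across nodes.
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("at least one node must survive")
    before = total_clients / nodes
    after = total_clients / survivors
    return before, after, after / before - 1


if __name__ == "__main__":
    before, after, increase = survivor_load(total_clients=100_000, nodes=5)
    print(f"before: {before:,.0f} per node, after: {after:,.0f} per node "
          f"(+{increase:.0%})")
```

For the 5-node cluster in this test, each survivor ends up with about a quarter more connections than before, which is worth keeping in mind when sizing nodes for failure tolerance.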

Enjoy your IoT!

Category: MQTT | By Paul Lysak | April 16, 2021
Tags: Mqtt Broker Kafka, Multi Node Mqtt Broker Kafka, Resilient Kafka Mqtt

