Event streaming with Apache Kafka at the edge is commonly used to provide the same open, flexible, and scalable architecture at the edge as in the cloud or data centre. Possible locations for a Kafka edge deployment include retail stores, cell towers, trains, small factories, and restaurants. “Edge Kafka” is not simply another IoT project using Kafka in a remote location.
Edge Kafka is an essential component of a streaming nervous system that spans IoT (or OT in Industrial IoT) and non-IoT environments (traditional data centre and cloud infrastructure). Running Kafka clients and brokers at the edge enables edge processing, integration, decoupling, low latency, and cost-efficient data processing.
What is different at the edge?
Business continuity must be maintained even when the connection to the central data centre or cloud is unavailable. Disconnected sites often neither require nor provide high availability. They rely on local pre-processing and low-latency real-time analytics, and are online only from time to time or over low-bandwidth links.
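The offline behaviour described above follows a store-and-forward pattern: events are kept locally while the uplink is down and forwarded in order once it returns. The following is a minimal, library-free sketch of that pattern, not Kafka's own API. In a real deployment the local broker and a replication tool would play this role. The names `StoreAndForwardBuffer`, `FlakyUplink`, and `max_events` are hypothetical and exist only for illustration.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Keeps events locally while the uplink is down and forwards them in order later."""

    def __init__(self, send, max_events=10_000):
        # `send` is a hypothetical uplink callable that raises ConnectionError when offline.
        self._send = send
        # Bounded buffer: when full, the oldest events are discarded first.
        self._pending = deque(maxlen=max_events)

    def publish(self, event):
        """Store the event locally, then try to forward everything that is pending."""
        self._pending.append(event)
        return self.flush()

    def flush(self):
        """Forward buffered events in arrival order; return how many were sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # still offline: keep the event and retry on the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


class FlakyUplink:
    """Stand-in for the connection to the cloud (assumption for this example)."""

    def __init__(self):
        self.online = False
        self.received = []

    def __call__(self, event):
        if not self.online:
            raise ConnectionError("uplink down")
        self.received.append(event)


uplink = FlakyUplink()
buffer = StoreAndForwardBuffer(uplink)
buffer.publish({"sensor": "t1", "value": 21.5})  # offline: stays in the local buffer
buffer.publish({"sensor": "t1", "value": 21.7})
uplink.online = True
buffer.flush()  # connection is back: both events are forwarded in order
```

The bounded buffer is a deliberate choice for small edge hardware: it caps memory use at the price of dropping the oldest events during a long outage.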
Kafka brokers often need to be deployed across hundreds of locations. A single broker without high availability is often good enough, since its job is to provide back pressure handling and local processing. Low-footprint, low-touch, easy installations of Kafka brokers are mandatory for many of these use cases, because there are usually no IT experts available on-site to operate Kafka. Many edge use cases revolve around sensor and telemetry data. An application that processes millions of messages per second can usually tolerate losing a few of them, because the loss does not materially affect the outcome of the calculation.
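To make the loss-tolerance argument concrete, here is a small arithmetic sketch on synthetic data. The readings, the drop rate of one message in 10,000, and the choice of an average as the "calculation" are all assumptions made for this example. The argument applies to aggregates such as averages, not to data where every single record matters.

```python
from statistics import fmean

# Synthetic telemetry (assumption): 1,000,000 readings cycling between 20.0 and 29.9.
readings = [20.0 + (i % 100) / 10 for i in range(1_000_000)]

# Simulate losing every 10,000th message, i.e. 100 of 1,000,000 readings.
received = [value for i, value in enumerate(readings) if i % 10_000 != 0]

full_mean = fmean(readings)
lossy_mean = fmean(received)

print(f"messages lost: {len(readings) - len(received)}")
print(f"mean with all messages: {full_mean:.4f}")
print(f"mean after loss:        {lossy_mean:.4f}")
```

In this setup every dropped reading happens to be the minimum value, which biases the result as much as this drop pattern can, and the two averages still differ by less than 0.001.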
Consumer IoT always involves users, while Industrial IoT always involves tangible goods. Both are examples of hybrid systems that are not completely cloud-based, with thousands or tens of thousands of connected interfaces: sensors, machines, mobile devices, and so on. Using a single technical infrastructure makes it possible to build edge and hybrid architectures. Avoiding a mix of different frameworks and products is a huge benefit from a development, testing, operations, and support point of view.
Use Cases for Kafka at the Edge
Data integration, pre-processing and replication to the cloud, and big and small data processing and analytics are all good examples of what can be done at the edge. Working at the edge can be the best solution in disconnected or offline scenarios, where footprint requirements are very low, where there are hundreds of locations, or where high availability is not required.
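As an illustration of pre-processing before replication to the cloud, the sketch below collapses raw sensor readings into one summary per sensor and time window, so that only the summaries need to cross a low-bandwidth link. It is plain Python rather than a Kafka API. The function `aggregate_by_window`, the tuple layout `(timestamp, sensor_id, value)`, and the 60-second window are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_by_window(readings, window_seconds=60):
    """Collapse (timestamp, sensor_id, value) readings into one summary per sensor and window."""
    buckets = defaultdict(list)
    for timestamp, sensor_id, value in readings:
        # Align each reading to the start of its fixed-size (tumbling) window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[(sensor_id, window_start)].append(value)

    return [
        {
            "sensor_id": sensor_id,
            "window_start": window_start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": fmean(values),
        }
        for (sensor_id, window_start), values in sorted(buckets.items())
    ]


# Hypothetical raw readings collected at the edge site.
raw = [
    (0, "temp-1", 20.0),
    (15, "temp-1", 21.0),
    (45, "temp-1", 22.0),
    (61, "temp-1", 30.0),
    (10, "temp-2", 5.0),
]
summaries = aggregate_by_window(raw)
for summary in summaries:
    print(summary)
```

Here five raw readings become three summaries. With real telemetry rates the reduction is far larger, which is what makes intermittent or low-bandwidth replication practical.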
Kafka works well alongside Waterstream in fields such as automotive (connected car systems monitoring, automated emergency management), agriculture (sensor-based field and resource mapping, remote crop monitoring), smart energy (fault detection, smart lighting, smart grid asset monitoring), logistics (real-time fleet management, smart labels, predictive maintenance), Industrial IoT (predictive maintenance, failure mitigation and safety control, workforce tracking), and sport and fitness (integration of fitness trackers with mobile devices, data integration from different sports gear, support for tiny sportswear sensors).