Technology 20 Aug 2021
Why and How You Should Deploy Kafka Clusters on Kubernetes
Apache Kafka is a popular open-source tool for message streaming. You can use it to store and process vast amounts of data and let applications consume data streams in real time. Kafka is typically used with Apache ZooKeeper to build scalable, fault-tolerant clusters that make application messaging seamless.
Meanwhile, Kubernetes is a platform that lets a team deploy, manage, automate and scale workloads like Kafka. Deploying Kafka on Kubernetes is a venture that promises significant returns: it lets organisations simplify operations such as upgrades, monitoring and restarts, because Kubernetes has built-in mechanisms for them, such as rolling updates, health probes and automatic restarts of failed containers.
This post walks you through the steps needed to deploy Kafka on Kubernetes.
Kafka on Kubernetes: Check out the process
1. Configuring the namespace
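This step is optional but advisable. A Kubernetes namespace scopes resource names and keeps the Kafka and ZooKeeper objects separate from the other workloads in the cluster, which also makes it easier to apply access controls and resource quotas to them.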
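As a minimal sketch of this step, the snippet below builds a Namespace manifest in Python and prints it as JSON, a format that kubectl apply -f accepts alongside YAML. The namespace name kafka is a hypothetical choice, not one the process requires.

```python
import json


def namespace_manifest(name: str) -> dict:
    """Return a Kubernetes Namespace object as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name},
    }


if __name__ == "__main__":
    # "kafka" is a hypothetical name; pick whatever suits your cluster.
    print(json.dumps(namespace_manifest("kafka"), indent=2))
```

Saving the output to a file and running kubectl apply -f on it has the same effect as kubectl create namespace kafka.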
2. Node-based deployment
The main reason for node-based deployment is to run Kafka brokers on different machines and in different availability zones. That way, if one machine or one availability zone goes down, the cluster stays available and keeps serving data to applications.
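One way to express this spread in a broker pod spec is sketched below, assuming the brokers carry a hypothetical app=kafka pod label. A required pod anti-affinity rule on the well-known kubernetes.io/hostname label keeps two brokers off the same machine, and a topology spread constraint on topology.kubernetes.io/zone balances brokers across availability zones. It assumes your nodes carry the standard zone label, which most cloud providers set.

```python
import json

# Hypothetical label carried by every broker pod.
BROKER_LABELS = {"app": "kafka"}


def broker_placement() -> dict:
    """Scheduling rules to merge into a broker pod template's spec."""
    selector = {"matchLabels": BROKER_LABELS}
    return {
        "affinity": {
            "podAntiAffinity": {
                # Never schedule two brokers onto the same machine.
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {"labelSelector": selector,
                     "topologyKey": "kubernetes.io/hostname"}
                ]
            }
        },
        # Keep the broker count per availability zone within 1 of each other.
        "topologySpreadConstraints": [
            {
                "maxSkew": 1,
                "topologyKey": "topology.kubernetes.io/zone",
                "whenUnsatisfiable": "DoNotSchedule",
                "labelSelector": selector,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(broker_placement(), indent=2))
```

With DoNotSchedule, a broker that cannot be placed without breaking the spread stays Pending rather than doubling up in one zone; ScheduleAnyway relaxes that trade-off.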
First, list the nodes in the cluster. Then identify the nodes you want to deploy Kafka on and label them, so that the Kafka pods can be scheduled onto those nodes only.
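The sketch below shows one way to script this step. It takes the parsed output of kubectl get nodes -o json, picks one node per availability zone using the standard topology.kubernetes.io/zone label, prints the matching kubectl label nodes commands, and builds the nodeSelector that the broker pods would use. The label dedicated=kafka and the sample node names are hypothetical.

```python
import json

# Hypothetical label used to mark Kafka nodes; choose your own key/value.
KAFKA_LABEL_KEY = "dedicated"
KAFKA_LABEL_VALUE = "kafka"
# Well-known label that most cloud providers set on every node.
ZONE_KEY = "topology.kubernetes.io/zone"


def pick_nodes_per_zone(node_list: dict) -> list[str]:
    """Choose the alphabetically first node in each zone.

    node_list is the parsed output of `kubectl get nodes -o json`.
    """
    chosen: dict[str, str] = {}
    for item in node_list.get("items", []):
        meta = item["metadata"]
        zone = meta.get("labels", {}).get(ZONE_KEY)
        if zone is None:
            continue  # no zone label, so the node's zone is unknown
        name = meta["name"]
        if zone not in chosen or name < chosen[zone]:
            chosen[zone] = name
    # Order the result by zone name for stable output.
    return [chosen[zone] for zone in sorted(chosen)]


def label_commands(node_names: list[str]) -> list[str]:
    """kubectl commands that attach the Kafka label to each chosen node."""
    return [
        f"kubectl label nodes {name} {KAFKA_LABEL_KEY}={KAFKA_LABEL_VALUE}"
        for name in node_names
    ]


def node_selector() -> dict:
    """Pod spec fragment that restricts brokers to the labelled nodes."""
    return {"nodeSelector": {KAFKA_LABEL_KEY: KAFKA_LABEL_VALUE}}


if __name__ == "__main__":
    # Sample input shaped like `kubectl get nodes -o json` (names are made up).
    sample = {"items": [
        {"metadata": {"name": "node-b", "labels": {ZONE_KEY: "zone-a"}}},
        {"metadata": {"name": "node-a", "labels": {ZONE_KEY: "zone-a"}}},
        {"metadata": {"name": "node-c", "labels": {ZONE_KEY: "zone-b"}}},
    ]}
    for cmd in label_commands(pick_nodes_per_zone(sample)):
        print(cmd)
    print(json.dumps(node_selector()))
```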
3. Deploy ZooKeeper
Kafka uses ZooKeeper for service discovery of the brokers that form the cluster. ZooKeeper notifies Kafka of any change in topology, so each node in the cluster learns when a broker joins or dies and when a topic is added or removed. ZooKeeper also keeps a consistent, in-sync view of the Kafka cluster configuration. In short, ZooKeeper is a core dependency of Apache Kafka, so it is crucial to deploy it first.
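ZooKeeper only stays available while a strict majority (a quorum) of its servers is up, which is why ensembles usually have an odd number of members. The sketch below works through that arithmetic and generates the server.N lines of a zoo.cfg file for servers run as a StatefulSet behind a headless Service. The names zk, zk-hs and kafka, and the default cluster.local cluster domain, are assumptions.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a strict majority (quorum) remains."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    quorum = ensemble_size // 2 + 1
    return ensemble_size - quorum


def server_lines(statefulset: str, headless_service: str,
                 namespace: str, size: int) -> list[str]:
    """zoo.cfg server.N entries using stable StatefulSet pod DNS names."""
    lines = []
    for ordinal in range(size):
        # Pod DNS: <pod>.<headless-service>.<namespace>.svc.<cluster-domain>
        host = f"{statefulset}-{ordinal}.{headless_service}.{namespace}.svc.cluster.local"
        # ZooKeeper ids start at 1; 2888 = quorum port, 3888 = leader election.
        lines.append(f"server.{ordinal + 1}={host}:2888:3888")
    return lines


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "servers tolerate", tolerated_failures(n), "failure(s)")
    print("\n".join(server_lines("zk", "zk-hs", "kafka", 3)))
```

A three-server ensemble tolerates one failure and a five-server ensemble two, while a fourth server adds no extra tolerance over three. Each server's myid file must contain its own number (the pod ordinal plus one in this sketch).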
4. Deploy Kafka
Once the ZooKeeper cluster is running, point Kafka at it using the ZooKeeper service names so that the brokers can communicate with the ensemble. You can build your own Kafka image with a specific configuration or use a ready-made one. To let external applications publish messages to Kafka, you can expose the Kafka pods through a Kubernetes Service of type LoadBalancer.
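A minimal sketch of both ideas follows. broker_properties writes the zookeeper.connect setting from a ZooKeeper service name, and external_service builds one LoadBalancer Service per broker pod. A Kafka client talks to each broker directly, so a single load balancer in front of all brokers is usually not enough, and each broker's advertised.listeners must include the address of its own load balancer. The service name zk-cs, the StatefulSet name kafka and port 9094 are hypothetical; the statefulset.kubernetes.io/pod-name selector relies on the label the StatefulSet controller adds to every pod.

```python
import json


def broker_properties(broker_id: int, zk_hosts: list[str]) -> str:
    """server.properties lines that point a broker at the ZooKeeper ensemble."""
    return "\n".join([
        f"broker.id={broker_id}",
        # Comma-separated host:port list; a single Service name also works.
        f"zookeeper.connect={','.join(zk_hosts)}",
    ])


def external_service(statefulset: str, ordinal: int, namespace: str,
                     port: int = 9094) -> dict:
    """A LoadBalancer Service that exposes exactly one broker pod."""
    pod = f"{statefulset}-{ordinal}"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{pod}-external", "namespace": namespace},
        "spec": {
            "type": "LoadBalancer",
            # The StatefulSet controller sets this label on each of its pods.
            "selector": {"statefulset.kubernetes.io/pod-name": pod},
            "ports": [{"name": "external", "port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    # zk-cs is a hypothetical client Service in front of the ZooKeeper pods.
    print(broker_properties(0, ["zk-cs.kafka.svc.cluster.local:2181"]))
    for ordinal in range(3):  # three brokers: kafka-0, kafka-1, kafka-2
        print(json.dumps(external_service("kafka", ordinal, "kafka")))
```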
Other Factors When Running Kafka on Kubernetes
Low-latency network and storage
Kafka performs best with little contention for network bandwidth, little interference from other workloads when accessing storage, and high throughput on both.
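On Kubernetes, one common way to keep storage interference down is to give every broker its own persistent volume from a storage class backed by fast disks. The sketch below builds the volumeClaimTemplates fragment of a broker StatefulSet; the storage class fast-ssd and the 100Gi size are hypothetical and depend on what your cluster offers.

```python
import json


def broker_volume_claim(storage_class: str, size: str) -> dict:
    """PersistentVolumeClaim template: one dedicated volume per broker pod."""
    return {
        "metadata": {"name": "data"},
        "spec": {
            # Each volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def statefulset_storage(storage_class: str = "fast-ssd",
                        size: str = "100Gi") -> dict:
    """Fragment to merge into a broker StatefulSet's spec."""
    return {"volumeClaimTemplates": [broker_volume_claim(storage_class, size)]}


if __name__ == "__main__":
    # "fast-ssd" and "100Gi" are placeholders for your own class and size.
    print(json.dumps(statefulset_storage(), indent=2))
```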
Disaster Recovery Strategy
Kafka provides replication of topics within a cluster and data mirroring between clusters. Even so, it’s essential to consider how long it takes to rebuild replicas and what disaster recovery strategy is in place for when a cluster or zone fails.
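Two quick calculations help when planning for this, shown below as a sketch. The first estimates how long a replacement broker needs to re-replicate its data, assuming replication throughput is the only bottleneck (real rebuilds are often slower). The second counts how many replicas of a partition survive the loss of its worst-hit zone, given which zone each replica lives in. The figures used are illustrative, not measurements.

```python
from collections import Counter


def rebuild_hours(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours to copy data_bytes at a sustained replication throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_s / 3600


def replicas_left_after_zone_loss(replica_zones: list[str]) -> int:
    """Replicas remaining if the zone holding the most replicas fails."""
    if not replica_zones:
        return 0
    worst_zone_count = max(Counter(replica_zones).values())
    return len(replica_zones) - worst_zone_count


if __name__ == "__main__":
    # Illustrative numbers: 2 TB on the failed broker, 100 MB/s for replication.
    print(f"{rebuild_hours(2e12, 1e8):.1f} h")  # 5.6 h
    print(replicas_left_after_zone_loss(["a", "b", "c"]))  # 2
    print(replicas_left_after_zone_loss(["a", "a", "b"]))  # 1
```

Spreading each partition's replicas over three zones leaves two copies after a zone outage, whereas placing two of them in one zone can leave just one.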
Data Security
Kafka’s built-in security features include SSL/TLS encryption of traffic between brokers and clients, authentication, and access control lists (ACLs) that authorise operations. It’s equally important to consider how well the data is protected in the disks’ file systems, because Kafka itself does not encrypt data at rest. If the data is not adequately protected, bad actors can gain access to it and manipulate it.
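As a sketch of the broker side, the function below renders server.properties lines that turn on SSL for client and inter-broker traffic, require client certificates, and enable ACL-based authorisation with AclAuthorizer (available in ZooKeeper-based Kafka from version 2.4). Because unmatched operations are denied, the brokers' own certificate identity is listed in super.users. The keystore paths, the principal name and the environment variable are hypothetical, and using one password for every store is a simplification. Data at rest is usually protected outside Kafka, for example by encrypting the underlying volumes.

```python
import os


def security_properties(keystore: str, truststore: str, password: str,
                        broker_principal: str) -> str:
    """server.properties lines for SSL, mutual TLS and ACL authorisation."""
    props = {
        # Encrypt both client and inter-broker traffic with SSL/TLS.
        "listeners": "SSL://:9093",
        "security.inter.broker.protocol": "SSL",
        # Ask every client for a certificate (mutual TLS authentication).
        "ssl.client.auth": "required",
        "ssl.keystore.location": keystore,
        "ssl.keystore.password": password,
        "ssl.key.password": password,
        "ssl.truststore.location": truststore,
        "ssl.truststore.password": password,
        # Check operations against ACLs and deny when no ACL matches.
        "authorizer.class.name": "kafka.security.authorizer.AclAuthorizer",
        "allow.everyone.if.no.acl.found": "false",
        # Brokers need full access to talk to each other.
        "super.users": broker_principal,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical Secret mount paths, principal and environment variable.
    print(security_properties(
        "/etc/kafka/secrets/kafka.keystore.jks",
        "/etc/kafka/secrets/kafka.truststore.jks",
        os.environ.get("KAFKA_STORE_PASSWORD", ""),
        "User:CN=kafka-broker",
    ))
```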
Download our whitepaper
Want to learn more about deploying Kafka, how we built a platform based on Apache Kafka and what we learned along the way? Download our whitepaper.