20 Feb 2020
Why Should You Deploy Kafka Clusters on Kubernetes?
Apache Kafka is a popular open-source platform for message streaming. It can store and process huge amounts of data and lets applications consume streams of data in real time. Kafka is typically used together with Apache Zookeeper, which coordinates the Kafka brokers so that together they form scalable, fault-tolerant clusters for application messaging.
Kubernetes, meanwhile, is a container orchestration platform that lets teams deploy, manage, automate and scale workloads such as Kafka. Deploying Kafka on Kubernetes takes some effort, but it can pay off: it simplifies operations such as upgrades, monitoring and restarts, because Kubernetes provides built-in mechanisms for rolling updates, health checks and restarting failed containers.
In this blog post, you will learn how to deploy Kafka on Kubernetes, step by step. If you are ready, here we go.
Kafka on Kubernetes: Step by Step
1. Configuring the namespace
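This step is optional but recommended. A Kubernetes namespace provides a separate scope for resource names, which keeps the Kafka and Zookeeper objects isolated from other workloads in the cluster and lets you apply quotas and access policies to them as a group.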
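As a minimal sketch, the namespace can be expressed as a Kubernetes manifest. The example below builds it in Python as a plain dictionary and prints it as JSON, which kubectl accepts as well as YAML. The namespace name kafka is an illustrative assumption, not something the setup above prescribes.

```python
import json


def namespace_manifest(name: str) -> dict:
    """Build a minimal Kubernetes Namespace object as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name},
    }


if __name__ == "__main__":
    # "kafka" is an illustrative namespace name; pipe the output into
    # `kubectl apply -f -` to create it.
    print(json.dumps(namespace_manifest("kafka"), indent=2))
```

Running python namespace.py | kubectl apply -f - (the file name is hypothetical) would create the namespace; later kubectl commands can then target it with -n kafka.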
2. Node-based deployment
The main reason for node-based deployment is to run Kafka brokers on different machines and in different availability zones. That way, if one machine or an entire availability zone goes down, the remaining brokers keep the cluster active and continue serving data to applications, provided the topics are replicated across those brokers.
To do this, first list the nodes in the cluster (for example, with kubectl get nodes). Then identify the nodes you want to deploy Kafka on and tag them with a label that the Kafka pods can later select.
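The labelling and placement steps can be sketched as follows. Nodes are labelled with kubectl label nodes <node-name> kafka-node=true, where the kafka-node=true label is a hypothetical choice. The Python below builds the equivalent patch body and an affinity block for the broker pod template that only schedules brokers on labelled nodes, forbids two brokers on the same machine, and prefers spreading them across zones using the well-known topology.kubernetes.io/zone node label (older clusters use failure-domain.beta.kubernetes.io/zone). The app: kafka pod label is also an assumption.

```python
import json

# Hypothetical label that marks nodes allowed to run Kafka brokers.
KAFKA_NODE_KEY, KAFKA_NODE_VALUE = "kafka-node", "true"
HOSTNAME_KEY = "kubernetes.io/hostname"   # one value per machine
ZONE_KEY = "topology.kubernetes.io/zone"  # one value per availability zone


def node_label_patch() -> dict:
    """Patch body equivalent to `kubectl label nodes <node> kafka-node=true`."""
    return {"metadata": {"labels": {KAFKA_NODE_KEY: KAFKA_NODE_VALUE}}}


def broker_affinity(app_label: str = "kafka") -> dict:
    """Affinity block for the broker pod template (spec.template.spec.affinity)."""
    same_app = {"labelSelector": {"matchLabels": {"app": app_label}}}
    return {
        # Hard rule: only schedule brokers on nodes carrying the Kafka label.
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": KAFKA_NODE_KEY,
                        "operator": "In",
                        "values": [KAFKA_NODE_VALUE],
                    }]
                }]
            }
        },
        "podAntiAffinity": {
            # Hard rule: never put two brokers on the same machine.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {**same_app, "topologyKey": HOSTNAME_KEY}
            ],
            # Soft rule: prefer placing brokers in different zones.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100,
                 "podAffinityTerm": {**same_app, "topologyKey": ZONE_KEY}}
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(broker_affinity(), indent=2))
```

The zone rule is deliberately "preferred" rather than "required", so brokers can still be scheduled when there are more brokers than zones.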
3. Deploy Zookeeper
Kafka needs Zookeeper to handle service discovery for the brokers that form the cluster. Zookeeper notifies Kafka of any change in the cluster topology, so every node learns when a broker joins or dies, when a topic is added or removed, and so on. In effect, Zookeeper maintains a consistent, in-sync view of the Kafka cluster's configuration. Because Zookeeper is a primary dependency of Apache Kafka, it is crucial to deploy it first.
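Zookeeper is commonly run as a Kubernetes StatefulSet behind a headless Service, because each pod then gets a stable DNS name that the ensemble configuration can refer to. The sketch below assumes a StatefulSet named zk, a headless Service named zk-headless, the kafka namespace and the default cluster.local DNS domain (all illustrative). It generates the server.N entries for zoo.cfg and shows how many member failures an ensemble of a given size survives.

```python
from typing import List


def zk_pod_dns(ordinal: int, statefulset: str = "zk",
               service: str = "zk-headless", namespace: str = "kafka",
               domain: str = "cluster.local") -> str:
    # Pods of a StatefulSet behind a headless Service resolve as
    # <statefulset>-<ordinal>.<service>.<namespace>.svc.<domain>.
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{domain}"


def zoo_cfg_server_lines(replicas: int) -> List[str]:
    # ZooKeeper server ids (the myid file) must be 1..255, so pod ordinal 0
    # maps to id 1. Port 2888 carries follower-to-leader traffic and port
    # 3888 is used for leader election.
    return [f"server.{i + 1}={zk_pod_dns(i)}:2888:3888" for i in range(replicas)]


def tolerated_failures(replicas: int) -> int:
    # The ensemble stays available only while a strict majority (quorum)
    # of its members is up.
    return (replicas - 1) // 2


if __name__ == "__main__":
    print("\n".join(zoo_cfg_server_lines(3)))
    print("failures tolerated by 3 servers:", tolerated_failures(3))
```

Because a four-server ensemble tolerates no more failures than a three-server one, ensembles are usually sized at an odd number of servers.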
4. Deploy Kafka
Once the Zookeeper cluster is running, configure Kafka to reach it through the Zookeeper Service names, which resolve via the cluster's DNS. You can either build your own Kafka image with a specific configuration or use a ready-made one. To let applications outside the cluster publish messages to Kafka, create a Service of type LoadBalancer for the Kafka pods.
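A minimal sketch of the Kafka side, under stated assumptions: a Zookeeper client Service named zk-client in the kafka namespace, broker pods labelled app: kafka, and Kafka's default port 9092 (the names are illustrative). It renders per-broker server.properties entries that reach Zookeeper through the Service's DNS name, plus a Service of type LoadBalancer for external clients.

```python
import json


def broker_properties(broker_id: int, advertised_host: str,
                      zk_service: str = "zk-client",
                      namespace: str = "kafka") -> str:
    props = {
        "broker.id": str(broker_id),
        # Reach Zookeeper through its Kubernetes Service name (cluster DNS).
        "zookeeper.connect": f"{zk_service}.{namespace}.svc.cluster.local:2181",
        # Bind on all interfaces inside the pod...
        "listeners": "PLAINTEXT://0.0.0.0:9092",
        # ...but tell clients an address they can actually reach.
        "advertised.listeners": f"PLAINTEXT://{advertised_host}:9092",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


def external_service(name: str = "kafka-external", namespace: str = "kafka",
                     app: str = "kafka") -> dict:
    # A LoadBalancer Service asks the cloud provider for an external address
    # that forwards port 9092 to the pods matching the selector.
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app},
            "ports": [{"name": "kafka", "port": 9092, "targetPort": 9092}],
        },
    }


if __name__ == "__main__":
    print(broker_properties(0, "kafka-0.example.com"))
    print(json.dumps(external_service(), indent=2))
```

One caveat with a single load balancer: Kafka clients use it only to bootstrap, then connect directly to the broker that leads each partition, using that broker's advertised.listeners address. In practice each broker therefore needs its own externally reachable address, for example one LoadBalancer Service per broker.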
Other Factors to Consider When Running Kafka on Kubernetes
Low latency network and storage
Kafka works best with low contention for bandwidth on the network, little interference ("noise") from other workloads when accessing storage, and high throughput on both.
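Two settings that commonly serve these goals are sketched below as an illustration, not a tuning guide: a per-broker PersistentVolumeClaim template (for the Kafka StatefulSet's volumeClaimTemplates) backed by a low-latency storage class, and container resources with requests equal to limits, which reserves those resources for the broker and, when every container in the pod is set this way, puts the pod in Kubernetes' Guaranteed QoS class. The storage class name fast-ssd and all sizes are hypothetical.

```python
import json


def data_volume_claim(storage_class: str = "fast-ssd",
                      size: str = "100Gi") -> dict:
    # Entry for the StatefulSet's spec.volumeClaimTemplates: each broker gets
    # its own volume, so brokers do not compete for the same disk.
    return {
        "metadata": {"name": "data"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical SSD-backed class; it must exist in your cluster.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def guaranteed_resources(cpu: str = "2", memory: str = "8Gi") -> dict:
    # Requests equal to limits for both CPU and memory.
    amounts = {"cpu": cpu, "memory": memory}
    return {"requests": dict(amounts), "limits": dict(amounts)}


if __name__ == "__main__":
    print(json.dumps(data_volume_claim(), indent=2))
    print(json.dumps(guaranteed_resources(), indent=2))
```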
Disaster Recovery Strategy
Kafka replicates topic partitions across brokers and can mirror data between clusters (for example, with MirrorMaker). Even so, you need to consider how long it takes to rebuild replicas after a broker fails, and to have a disaster recovery strategy in place for when an entire cluster or availability zone goes down.
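The rebuild-time concern comes down to back-of-the-envelope arithmetic: a replacement replica has to copy every byte the lost broker held, so rebuild time is roughly data size divided by the replication throughput you can spare. The sketch below makes that estimate and also checks whether a topic stays writable for producers using acks=all, where writes require at least min.insync.replicas in-sync replicas. The 500 GiB and 100 MiB/s figures are illustrative assumptions, not measurements.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def rebuild_seconds(data_bytes: float, throughput_bytes_per_s: float) -> float:
    # Lower bound: ignores new writes that arrive while the replica catches up.
    return data_bytes / throughput_bytes_per_s


def writable_with_acks_all(replication_factor: int, min_insync_replicas: int,
                           failed_replicas: int) -> bool:
    # With acks=all, the leader rejects writes once the in-sync replica set
    # shrinks below min.insync.replicas (assumes all survivors are in sync).
    return replication_factor - failed_replicas >= min_insync_replicas


if __name__ == "__main__":
    seconds = rebuild_seconds(500 * GIB, 100 * MIB)
    print(f"~{seconds / 60:.0f} minutes to rebuild a 500 GiB replica at 100 MiB/s")
    print("RF=3, min ISR=2, one broker lost:", writable_with_acks_all(3, 2, 1))
```

With those figures the estimate is 5,120 seconds, about 85 minutes, during which a topic with replication factor 3 and min.insync.replicas=2 can still accept acks=all writes but cannot survive a second failure without rejecting them.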
Security
Kafka's built-in security features include SSL/TLS encryption (between clients and brokers as well as between brokers), authentication, and access control lists (ACLs) for operations. However, Kafka does not encrypt the data it stores on disk, so it is equally important to consider how well the file systems holding that data are protected. If they are not adequately protected, unauthorised users could read or tamper with the data.
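For the encryption, authentication and access-control features mentioned above, a broker configuration might look like the sketch below. It enables TLS for client and inter-broker traffic, requires client certificates (mutual TLS), and turns on ACL authorisation with a deny-by-default policy. The host name, file paths, port 9093 and the broker principal are hypothetical. kafka.security.authorizer.AclAuthorizer is the authoriser class for Zookeeper-based clusters in Kafka 2.4 and later (earlier versions use kafka.security.auth.SimpleAclAuthorizer). In Kubernetes, the keystores and passwords would typically be mounted from a Secret rather than hard-coded.

```python
def secure_broker_properties(host: str, keystore_password: str,
                             truststore_password: str,
                             broker_principal: str) -> str:
    props = {
        # TLS for clients and for broker-to-broker replication traffic.
        "listeners": "SSL://0.0.0.0:9093",
        "advertised.listeners": f"SSL://{host}:9093",
        "security.inter.broker.protocol": "SSL",
        # Hypothetical mount paths for keystore/truststore from a Secret.
        "ssl.keystore.location": "/etc/kafka/secrets/kafka.keystore.jks",
        "ssl.keystore.password": keystore_password,
        "ssl.truststore.location": "/etc/kafka/secrets/kafka.truststore.jks",
        "ssl.truststore.password": truststore_password,
        # Authenticate clients by their TLS certificates (mutual TLS).
        "ssl.client.auth": "required",
        # Enforce ACLs and deny anything no ACL explicitly allows.
        "authorizer.class.name": "kafka.security.authorizer.AclAuthorizer",
        "allow.everyone.if.no.acl.found": "false",
        # Brokers must be super users (or hold ACLs), otherwise their own
        # inter-broker requests would be denied.
        "super.users": broker_principal,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(secure_broker_properties("kafka-0.example.com", "changeit",
                                   "changeit", "User:CN=kafka-broker"))
```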