Why Should You Deploy Kafka Clusters on Kubernetes?

Apache Kafka is a popular open-source platform for message streaming. It can store and process large volumes of data and lets applications utilise streams of data in real time. Kafka is typically used together with Apache Zookeeper to create scalable, fault-tolerant clusters for application messaging.

Meanwhile, Kubernetes is a platform that allows a team to deploy, manage, automate and scale workloads like Kafka. Running Kafka on Kubernetes takes some effort up front, but it lets organisations simplify operations such as upgrades, monitoring and restarts, because these are built into the Kubernetes platform.

In this piece, you will learn the process involved in deploying Kafka on Kubernetes. If you are ready, here we go.

Kafka on Kubernetes: Check Out the Process

  1. Configure Namespace

This step is not mandatory, but it is advisable. A namespace in Kubernetes scopes resource names and keeps the Kafka resources separate from other workloads running in the same cluster.
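As a minimal sketch, the namespace can be described as a manifest and handed to `kubectl apply -f -`, which accepts JSON as well as YAML. The namespace name `kafka` is an assumption chosen for illustration, not something the setup requires.

```python
import json


def namespace_manifest(name: str) -> dict:
    """Build a Kubernetes Namespace manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name},
    }


if __name__ == "__main__":
    # "kafka" is a hypothetical namespace name used for illustration.
    print(json.dumps(namespace_manifest("kafka"), indent=2))
```

If the script is saved as, say, `namespace.py` (a hypothetical file name), its output can be piped straight into the cluster with `python namespace.py | kubectl apply -f -`.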

  2. Node-based deployment

The primary reason for node-based deployment is to run Kafka brokers on different machines and in different availability zones. That way, if one machine or one availability zone goes down, the cluster stays active and keeps serving data to applications.

To do this, first list the nodes in the cluster. Then identify the nodes you want to deploy Kafka on and label them so that the brokers can be scheduled onto them.
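Nodes can be listed with `kubectl get nodes` and labelled with `kubectl label nodes <node-name> <key>=<value>`. The sketch below shows one possible pod-spec fragment that pins brokers to the labelled nodes and asks the scheduler to spread them across zones. The label `workload=kafka` and the pod label `app=kafka` are assumptions made for illustration; `topology.kubernetes.io/zone` is the standard Kubernetes zone label.

```python
import json

# Well-known label that Kubernetes sets on nodes to identify their zone.
ZONE_LABEL = "topology.kubernetes.io/zone"


def broker_placement(node_label: dict, pod_labels: dict) -> dict:
    """Return a pod-spec fragment that restricts brokers to labelled nodes
    and spreads them evenly across availability zones."""
    return {
        # Only schedule onto nodes carrying this label.
        "nodeSelector": dict(node_label),
        # Keep the number of brokers per zone within a skew of one.
        "topologySpreadConstraints": [
            {
                "maxSkew": 1,
                "topologyKey": ZONE_LABEL,
                "whenUnsatisfiable": "DoNotSchedule",
                "labelSelector": {"matchLabels": dict(pod_labels)},
            }
        ],
    }


if __name__ == "__main__":
    # Both labels are hypothetical; use whatever you tagged your nodes with.
    fragment = broker_placement({"workload": "kafka"}, {"app": "kafka"})
    print(json.dumps(fragment, indent=2))
```

The fragment would be merged into the pod template of whatever workload runs the brokers.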

  3. Deploy Zookeeper

Kafka needs Zookeeper to manage service discovery for the brokers that form the cluster. Zookeeper sends any changes in topology to Kafka, so each node in the cluster knows when a broker joins or dies, when a topic is added or removed, and so on. Zookeeper provides an in-sync view of the Kafka cluster configuration. In a nutshell, Zookeeper is a primary dependency for Apache Kafka, so it is crucial to deploy it first.
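One common way to run a Zookeeper ensemble on Kubernetes is a StatefulSet behind a headless Service, which gives every pod a stable DNS name. The sketch below assumes that layout and generates the headless Service plus the `server.N` lines for `zoo.cfg`. The names `zk`, `zk-headless` and `kafka`, and the default cluster domain `cluster.local`, are assumptions; 2181, 2888 and 3888 are Zookeeper's conventional client, peer and leader-election ports.

```python
import json


def headless_service(name: str, namespace: str, selector: dict) -> dict:
    """Headless Service (clusterIP: None) giving each pod its own DNS record."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterIP": "None",
            "selector": dict(selector),
            "ports": [
                {"name": "client", "port": 2181},
                {"name": "peer", "port": 2888},
                {"name": "election", "port": 3888},
            ],
        },
    }


def zookeeper_servers(replicas: int, statefulset: str, service: str,
                      namespace: str) -> list:
    """Build the zoo.cfg server lines for a StatefulSet-based ensemble."""
    lines = []
    for ordinal in range(replicas):
        # StatefulSet pods are named <statefulset>-<ordinal>.
        host = f"{statefulset}-{ordinal}.{service}.{namespace}.svc.cluster.local"
        # Zookeeper server ids start at 1, pod ordinals at 0.
        lines.append(f"server.{ordinal + 1}={host}:2888:3888")
    return lines


if __name__ == "__main__":
    # All names below are hypothetical.
    print(json.dumps(headless_service("zk-headless", "kafka", {"app": "zk"}), indent=2))
    print("\n".join(zookeeper_servers(3, "zk", "zk-headless", "kafka")))
```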

  4. Deploy Kafka

After the Zookeeper cluster has been deployed, use its service names to let Kafka communicate with it. You can build your own Kafka image with a specific configuration or use a ready-made one. To allow external applications to publish messages to Kafka, create a Service of type LoadBalancer for the Kafka pods.
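The two pieces described above can be sketched as follows: a value for the broker's `zookeeper.connect` property built from the Zookeeper service names, and a LoadBalancer Service in front of the Kafka pods. The resource names, the `app=kafka` selector and the listener port 9092 are assumptions for illustration.

```python
import json


def zookeeper_connect(replicas: int, statefulset: str, service: str,
                      namespace: str, port: int = 2181) -> str:
    """Comma-separated host:port list for Kafka's zookeeper.connect property."""
    hosts = [
        f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    ]
    return ",".join(hosts)


def external_service(name: str, namespace: str, selector: dict,
                     port: int = 9092) -> dict:
    """LoadBalancer Service exposing the Kafka pods to external clients."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "LoadBalancer",
            "selector": dict(selector),
            "ports": [{"name": "kafka", "port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    # All names below are hypothetical.
    print("zookeeper.connect=" + zookeeper_connect(3, "zk", "zk-headless", "kafka"))
    print(json.dumps(external_service("kafka-external", "kafka", {"app": "kafka"}), indent=2))
```

External clients are also likely to need the brokers' `advertised.listeners` set to an address they can reach, since that is the address Kafka hands back to clients after the first connection.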

 

Other Factors to Consider When Running Kafka on Kubernetes

  • Low Latency Network and Storage

Kafka performs best when there is little contention for data on the wire, little interference when accessing storage, and high throughput.

  • Disaster Recovery Strategy

Kafka provides replication of topics and data mirroring between clusters. Even so, it’s important to consider how long it takes for replicas to be rebuilt and what disaster recovery strategy is in place when a cluster or zone fails.
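A rough, back-of-the-envelope way to reason about rebuild time is to divide the amount of data a replacement broker must copy by the replication throughput it can sustain. The figures below (500 GB of data, 50 MB/s of throughput) are purely hypothetical, and the estimate ignores throttling and competing traffic.

```python
def rebuild_seconds(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Estimate how long it takes to re-replicate data onto a new broker."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    # Hypothetical figures: 500 GB to copy at a sustained 50 MB/s.
    seconds = rebuild_seconds(500e9, 50e6)
    print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")
```

With these numbers the estimate is 10,000 seconds, a little under three hours, during which the affected partitions run with reduced redundancy.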

  • Data Security

The built-in security features in Kafka include SSL encryption between brokers, authentication, and access controls for operations. Even so, it’s equally important to consider how well the data is protected on the disk’s file system. If it’s not adequately protected, unauthorised users can gain access to the data and manipulate it.
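As an illustration of the SSL part, the sketch below renders a fragment of broker properties that turns on SSL for inter-broker traffic and requires client certificates. The property names are standard Kafka broker settings; the file paths and the port 9093 are assumptions, and the keystore and truststore passwords are deliberately left out because they would normally come from a secret rather than from code.

```python
def ssl_broker_properties(keystore_path: str, truststore_path: str) -> dict:
    """Broker settings enabling SSL between brokers and client authentication."""
    return {
        "listeners": "SSL://:9093",
        "security.inter.broker.protocol": "SSL",
        "ssl.keystore.location": keystore_path,
        "ssl.truststore.location": truststore_path,
        "ssl.client.auth": "required",
    }


def render_properties(props: dict) -> str:
    """Render a dictionary in Java .properties style, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Both paths are hypothetical mount points inside the broker container.
    props = ssl_broker_properties(
        "/etc/kafka/secrets/broker.keystore.jks",
        "/etc/kafka/secrets/broker.truststore.jks",
    )
    print(render_properties(props))
```

None of this protects data at rest, so disk or file-system level protection still needs to be handled separately.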

 
