Apache Kafka is a very powerful and flexible platform for event streaming. The use of event streaming often grows to every corner of the organization, which means that sooner or later it will handle event data that is considered sensitive or confidential. This is where the security setup of our deployment becomes very important.
Kafka offers many security-related configuration options, at both the cluster and the topic level.
But with all these options, the question we need to ask ourselves is:
did I miss something?
The generic nature of the question brings another fact into focus: security itself is a generalization. It consists of very different subjects that each address a different aspect. To make sure I capture most requirements for securing a Kafka cluster, I usually split it into four subjects:
- Communication security
- Authentication
- Authorization
- Message Security
Communication security
Communication security is the part of security I usually start with, because it's a fairly standard and straightforward subject in Kafka.
What do I mean by communication security?
Simply put, it's how the communication between a client application and the Kafka cluster is set up and secured. If the event data is sensitive, we don't want every system on the same network as the cluster or client to be able to capture the data while it's in transit. Kafka supports Transport Layer Security, or TLS, to make sure that the transmitted data is encrypted and that client applications can verify that nobody is impersonating the server.
There are several versions of TLS; Kafka has supported the latest version, TLS 1.3, since Kafka 2.5.0. Kafka allows us to specify which TLS versions are supported and which version is used by default. We can also specify which cipher suites are allowed, letting us match the TLS settings to the TLS capabilities of the client applications.
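As a minimal sketch, a Java client configured for TLS could look like the following; the bootstrap address, truststore path, and password are placeholders, and restricting the protocol version and cipher suite is optional.

```java
import java.util.Properties;

public class TlsClientConfig {
    public static Properties build() {
        // Sketch of Kafka client settings for TLS; paths and passwords are placeholders.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:9093");
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        // Restrict the handshake to TLS 1.3 and, optionally, a specific cipher suite.
        props.put("ssl.enabled.protocols", "TLSv1.3");
        props.put("ssl.cipher.suites", "TLS_AES_256_GCM_SHA384");
        return props;
    }
}
```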
Authentication
Authentication in Kafka is about determining the identity of connecting clients. Kafka has several ways of authenticating clients.
TLS Client Authentication, or Mutual TLS, can be activated as part of the TLS configuration. It requires connecting clients to provide a certificate signed by a certificate authority that the Kafka cluster trusts. Kafka can extract the distinguished name, or parts of it, from the client certificate to serve as the principal name for the authorization phase.
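Assuming the brokers have been set up to require client certificates (ssl.client.auth=required on the broker side), a client adds a keystore containing its own certificate to the TLS settings. The sketch below uses placeholder paths and passwords.

```java
import java.util.Properties;

public class MutualTlsClientConfig {
    public static Properties build() {
        // Sketch of mutual TLS client settings; assumes the broker requires client certificates.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:9093");
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        // The client certificate, signed by a CA the cluster trusts, lives in the keystore.
        props.put("ssl.keystore.location", "/etc/kafka/client.keystore.jks");
        props.put("ssl.keystore.password", "changeit");
        props.put("ssl.key.password", "changeit");
        return props;
    }
}
```

On the broker side, ssl.principal.mapping.rules can be used to map the certificate's distinguished name to a shorter principal name.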
Kafka can also be configured to use SASL, or Simple Authentication and Security Layer. This is a standard framework for authentication in internet protocols. SASL can be combined with TLS (the SASL_SSL security protocol) to make sure that the data cannot be read by other systems on the network. The Kafka SASL implementation supports several mechanisms for authentication, which can be enabled in parallel (a client configuration sketch follows this list):
- PLAIN, which is based on a simple cleartext username and password exchange. With the default Kafka implementation, all users and passwords need to be defined in the cluster configuration files.
- SCRAM, or Salted Challenge Response Authentication Mechanism. This is a more modern password-based authentication mechanism, where passwords are never stored or exchanged as plain text, but salted and hashed using either SHA-256 or SHA-512.
- OAUTHBEARER, which allows us to use an OAuth2 framework for obtaining and verifying authentication tokens. Both the client application and the Kafka cluster need access to the OAuth2 provider at all times.
- GSSAPI / Kerberos. Kafka can use existing Kerberos servers to handle the authentication of clients. Each broker in the Kafka cluster and each client application needs to be registered in the Kerberos server.
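To make this concrete, here is a hedged sketch of a client configured for SCRAM-SHA-512 on top of TLS (the SASL_SSL security protocol); the username, password, and truststore details are placeholders.

```java
import java.util.Properties;

public class ScramClientConfig {
    public static Properties build() {
        // Sketch of a client using SASL/SCRAM over TLS; credentials are placeholders.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:9094");
        props.put("security.protocol", "SASL_SSL");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"alice\" password=\"alice-secret\";");
        return props;
    }
}
```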
The Kafka SASL implementation also supports customising or externalising these mechanisms through callback handlers. The libraries containing the custom implementations need to be on the Kafka classpath before the callback handler configurations are set.
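As a sketch of what such an externalised setup can look like on the client side, the callback handler setting points at a custom class; MyLoginCallbackHandler below is a hypothetical name, not something shipped with Kafka.

```java
import java.util.Properties;

public class CustomSaslClientConfig {
    public static Properties build() {
        // Sketch only: MyLoginCallbackHandler is a hypothetical custom class that must
        // implement org.apache.kafka.common.security.auth.AuthenticateCallbackHandler
        // and be available on the classpath.
        Properties props = new Properties();
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "OAUTHBEARER");
        props.put("sasl.login.callback.handler.class", "com.example.auth.MyLoginCallbackHandler");
        return props;
    }
}
```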
Authorization
Authorization is the part of security that determines which operations a connecting client is allowed to perform and which operations are denied.
Kafka allows this logic to be customised by setting the authorizer class name (authorizer.class.name) in the broker configuration. The default implementation is provided by the AclAuthorizer class.
This class is based on Access Control Lists (ACLs). An entry, or rule, consists of seven parts (a sketch of creating such a rule follows the list):
- Principal name, or the name that the authentication implementations provided as the identity of the client
- Resource type, like topic, group, transactional id, cluster
- Resource name
- Pattern-type, or the mode of matching the resource name with the actual names on the cluster. Literal or prefixed matching is supported
- Operation
- Host
- ACL type, whether we want to allow or deny with this rule.
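To make the seven parts concrete, here is a sketch that creates a single allow rule with Kafka's Java AdminClient; the bootstrap address, principal, and topic name are placeholders.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class CreateReadAclExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:9093"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Resource type + resource name + pattern type: the "orders" topic, matched literally.
            ResourcePattern resource =
                new ResourcePattern(ResourceType.TOPIC, "orders", PatternType.LITERAL);
            // Principal + host + operation + ACL type: allow User:alice to read from any host.
            AccessControlEntry entry = new AccessControlEntry(
                "User:alice", "*", AclOperation.READ, AclPermissionType.ALLOW);

            admin.createAcls(Collections.singletonList(new AclBinding(resource, entry)))
                 .all().get(); // wait until the rule has been applied
        }
    }
}
```

The same rule could also be created with the kafka-acls command line tool that ships with Kafka.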
The default ACL approach gives us fine-grained access control. Don't forget to disable the cluster setting that grants access to everyone when no ACLs are found for a resource (allow.everyone.if.no.acl.found should be false).
And don't forget that the brokers themselves need specific access entries as well, or must be listed as super users (super.users) in the cluster configuration.
Message Security
The fourth and final subject I examine is message security.
I usually handle three parts of message security.
1. Access to the data as it’s stored on the brokers
2. Producer identity
3. Integrity check of record content
Kafka persists the record data it receives to disk, which means that anyone with access to the disk could read the data. Kafka itself doesn't offer encrypted storage; it relies on the operating system to encrypt the storage drive if needed. Most operating systems and cloud providers have their own implementations of disk or volume encryption, and it is good practice to weigh the advantages and disadvantages of each.
The second and third parts of message security, determining producer identity and checking the integrity of a record's content, can be important to some organisations. A common approach is to use message signatures: a producer-specific key generates a signature that is attached to the message, and the consumer uses that information to verify that the record's contents have not changed.
Kafka does not support this out of the box. If it is needed, you have to develop or deploy third-party implementations that add the relevant information to the produced record and verify the message content and signature on the consumer side.
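As one possible shape of such an implementation, and assuming an HMAC with a shared secret is an acceptable signature scheme, a producer could attach the signature as a record header. The header name, topic, and key handling below are illustrative only.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SignedRecordFactory {
    // Placeholder header name and secret handling; a real setup would manage keys per producer.
    private static final String SIGNATURE_HEADER = "x-signature";
    private final SecretKeySpec key;

    public SignedRecordFactory(byte[] producerSecret) {
        this.key = new SecretKeySpec(producerSecret, "HmacSHA256");
    }

    public ProducerRecord<String, String> sign(String topic, String recordKey, String value)
            throws Exception {
        // Compute an HMAC-SHA256 signature over the record value.
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        byte[] signature = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));

        // Attach the signature as a header; the consumer recomputes and compares it.
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, recordKey, value);
        record.headers().add(SIGNATURE_HEADER, signature);
        return record;
    }
}
```

The consumer repeats the same computation on the received value and compares it with the header; a mismatch means the content changed or the wrong key was used. A public/private key scheme would additionally let consumers verify producer identity without holding the producer's secret.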
As a final note, Kafka supports multiple listeners.
A listener binds to a specific port, and each listener can have its own TLS and authentication configuration (defined through the listeners and listener.security.protocol.map broker settings). This allows us to create a listener for external networks with stricter TLS and authentication options, while having a separate listener for internal networks that can use internal authentication mechanisms. The authorisation model does not change per listener.
These are the security options of Kafka in a nutshell.
If you’re struggling with setting up security for Apache Kafka, feel free to reach out to us – we’re here to help.