Apache Kafka 23 Aug 2024

Understanding Kafka: Message Size, Producer Examples, and Consumer Groups

Understanding Kafka can seem challenging, but in this blog, we simplify the concepts of Kafka’s maximum message size, how to use Kafka producers, and what consumer groups do. Ideal for beginners and those looking to expand their knowledge.


Apache Kafka is a powerful tool for handling real-time data streams, and understanding its components can greatly enhance your ability to manage and process data efficiently. In this blog, we’ll break down three essential aspects of the streaming framework: maximum message size, Kafka producers, and consumer groups. Let’s dive in!

Kafka Maximum Message Size

One of the key parameters you need to understand is the maximum message size. This is the largest message that Kafka will allow a producer to send to a topic.

What is the Default Maximum Message Size?

By default, the maximum message size is roughly 1 MB (the broker setting message.max.bytes defaults to 1048588 bytes). This limit helps ensure that brokers can handle messages without running into memory issues. However, depending on your use case, you might need to send larger messages.

How to Increase the Maximum Message Size

If you need to increase this limit, you can adjust the message.max.bytes setting on the broker. For example, if you want to increase the maximum size to 10 MB, you would set message.max.bytes=10485760. Similarly, the producer and consumer also have corresponding settings (max.request.size for the producer and fetch.max.bytes for the consumer) that might need to be adjusted to handle larger messages.
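As a sketch, the three settings might be raised together like this (a hypothetical 10 MB limit; the exact values and the idea of keeping all three aligned are illustrative, not a recommendation):

```properties
# Broker (server.properties): largest record batch the broker will accept
message.max.bytes=10485760

# Producer: largest request the producer will send
max.request.size=10485760

# Consumer: maximum bytes fetched per request (should be at least as large)
fetch.max.bytes=10485760
```

Raising only the broker limit is not enough: a producer with a smaller max.request.size will still refuse to send large records, and a consumer with a smaller fetch.max.bytes may struggle to fetch them.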

Why Not Always Set a Large Size?

While it might be tempting to set a very high limit, be cautious. Large messages strain the broker's memory, storage, and network resources, which can lead to performance issues. A generous limit can also blindside you: a producer that normally sends one 10 MB message per minute might suddenly start sending dozens per second overnight. It's generally better to keep messages small and break larger data into smaller parts.

Kafka Producer Examples

Producers are responsible for sending data (messages) to topics. Understanding how to properly configure and use a producer is crucial for efficiently sending data to a cluster.

Basic Kafka Producer Example

Here’s a simple example in Java that shows how to send a message to a topic:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("my-topic", "key", "Hello world!"));
        producer.close();
    }
}

Explanation of the Code

In the example above, the producer is configured with the broker address (bootstrap.servers) and string serializers for the message key and value. It then sends a single message, "Hello world!", to the topic "my-topic" and is closed to free up resources.

Advanced Producer Configurations

Kafka producers can be configured with various settings to optimize performance, such as acks for controlling the acknowledgment mechanism (acks=0 for no acknowledgment, acks=1 for the partition leader only, or acks=all for all in-sync replicas), and retries for handling transient failures. Tuning these settings allows you to control the trade-offs between throughput, latency, and reliability.
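A producer tuned for reliability might use a configuration like the following sketch (the concrete values are illustrative assumptions, not recommendations):

```properties
acks=all                  # wait for all in-sync replicas to acknowledge
retries=2147483647        # retry transient failures indefinitely
enable.idempotence=true   # avoid duplicates introduced by retries
linger.ms=5               # small batching delay, trading latency for throughput
```

The combination of acks=all, a high retries count, and idempotence favors durability; a latency-sensitive pipeline might instead choose acks=1 and linger.ms=0.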

Consumer Groups in Kafka

Kafka consumer groups are a critical concept that allows multiple consumers to process data from a topic together. Consumers within a consumer group all use the same group id, which allows for horizontal scaling (running more consumers) as opposed to just vertical scaling (adding more resources to a single consumer).

What is a Consumer Group?

A consumer group is a collection of consumers that coordinate to consume data from topics. Each consumer in the group reads data from one or more partitions of the topic. Kafka ensures that each partition is read by only one consumer in the group, providing load balancing and fault tolerance.

How Consumer Groups Work

When a consumer joins a group, Kafka assigns partitions to it. If a consumer leaves the group (either due to failure or manual shutdown), Kafka will reassign its partitions to the remaining consumers in the group. This ensures that the data is continuously processed even if some consumers go down.
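You can observe these partition assignments with the consumer-groups CLI that ships with Kafka. As a sketch, assuming a broker on localhost:9092 and a group named my-group:

```
# Lists each partition of the subscribed topics together with the
# consumer instance currently assigned to it, and its current lag
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-group
```

Running this before and after a consumer joins or leaves the group shows the rebalance in action.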

Why Use Consumer Groups?

Consumer groups are essential for scaling the processing of data. By adding more consumers to a group, you can increase the processing throughput since more consumers can read from the topic’s partitions simultaneously.

Example of a Consumer in a Group

Here’s a simple example of a consumer:

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            records.forEach(record -> System.out.printf("Consumed record with key %s and value %s%n", record.key(), record.value()));
        }
    }
}

Explanation of the Code

The consumer subscribes to the topic “my-topic” and continuously polls the cluster for new records. Each record is then processed by the consumer.

Conclusion

Understanding maximum message size, how to effectively use producers, and the importance of consumer groups can significantly improve your ability to manage data streams efficiently. With these basics under your belt, you’re well on your way to mastering Kafka. Happy streaming!

Axual’s all-in-one Kafka platform

For those looking to simplify the implementation of Apache Kafka and optimize event streaming, Axual offers an effective platform. Axual provides a managed, secure, and scalable event streaming service that integrates seamlessly with existing microservices architectures. With Axual, you can focus on building your business logic while leveraging powerful tools for event processing, monitoring, and governance. Axual handles the complexities of Kafka, enabling you to implement real-time data with ease and ensuring reliable, consistent, and scalable event delivery across your system.


How to calculate Kafka average message size?

Metrics within Apache Kafka don't expose the average message size directly; however, it is possible to calculate it. It is important to base the calculation on the messages that flow into the system, because one produced message can be consumed zero to n times, which would skew the data, and there is no "messages out" metric, only "bytes out". First, you need to expose the broker metrics, for example via the Prometheus JMX exporter. You can then read the number of messages in per second from the MBean below

kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=([-.\w]+)

Next, you can read the number of bytes in per second from the following MBean

kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=([-.\w]+)

To calculate the average message size, divide the bytes in per second by the messages in per second.

This calculation can be as fine grained as you desire, as the metric is enriched with the broker and topic names.
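As a concrete illustration of the division, with made-up rates standing in for values scraped from the two MBeans above:

```java
public class AverageMessageSize {
    public static void main(String[] args) {
        // Hypothetical per-topic rates; in practice these come from
        // BytesInPerSec and MessagesInPerSec for the same topic.
        double bytesInPerSec = 524_288.0;
        double messagesInPerSec = 512.0;

        // Average message size = bytes in / messages in
        double avgBytes = bytesInPerSec / messagesInPerSec;
        System.out.println("Average message size: " + avgBytes + " bytes");
        // prints "Average message size: 1024.0 bytes"
    }
}
```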

How to check Kafka message size?

The average message size is calculated by dividing the bytes in per second by the messages in per second. You do this by exposing the broker metrics. The messages-in rate is available through the MBean below

kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=([-.\w]+)

After this, the bytes-in rate is available through the following MBean

kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=([-.\w]+)

What is the max message size in Kafka?

By default, the maximum message size is set to 1 MB (megabyte). This limit ensures that the brokers can handle the load without running into memory or storage issues. You can change the maximum message size in the broker and topic configurations.
