August 23, 2024

Understanding Kafka: Message Size, Producer Examples, and Consumer Groups

Understanding Kafka can seem challenging, but in this blog, we simplify the concepts of Kafka’s maximum message size, how to use Kafka producers, and what consumer groups do. Ideal for beginners and those looking to expand their knowledge.

Apache Kafka is a powerful tool for handling real-time data streams, and understanding its components can greatly enhance your ability to manage and process data efficiently. In this blog, we’ll break down three essential aspects of the streaming framework: maximum message size, Kafka producers, and consumer groups. Let’s dive in!

Kafka Maximum Message Size

One of the key parameters you need to understand is the maximum message size. This is the largest message that Kafka will allow a producer to send to a topic.

What is the Default Maximum Message Size?

By default, the maximum message size is roughly 1 MB (megabyte). The limit exists so that brokers can handle messages without running into memory issues. However, depending on your use case, you might need to send larger messages.

How to Increase the Maximum Message Size

If you need to raise this limit, adjust the message.max.bytes setting on the broker (or max.message.bytes on an individual topic). For example, to allow messages of up to 10 MB, set message.max.bytes=10485760. The clients have corresponding settings that may also need to change to handle larger messages: max.request.size on the producer, and fetch.max.bytes and max.partition.fetch.bytes on the consumer.
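To make the client side of this concrete, here is a minimal sketch of the matching producer and consumer properties for a 10 MB limit. The class and method names are hypothetical, and the broker address and group id are simply reused from the examples in this post. One assumption to flag: recent consumer versions already default fetch.max.bytes to about 50 MB, so this sketch raises the per-partition limit max.partition.fetch.bytes (which defaults to about 1 MB) and leaves fetch.max.bytes alone. Check the defaults of the client version you actually run.

```java
import java.util.Properties;

// Hypothetical helper; the class and method names are illustrative, not part of Kafka.
class LargeMessageSettings {

    // 10 MB, the same value used for message.max.bytes on the broker.
    static final int TEN_MB = 10 * 1024 * 1024; // 10485760

    // Producer settings: allow a single request of up to 10 MB.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("max.request.size", String.valueOf(TEN_MB));
        return props;
    }

    // Consumer settings: allow up to 10 MB to be fetched per partition.
    // fetch.max.bytes is left at its default (assumed to be larger than 10 MB).
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("max.partition.fetch.bytes", String.valueOf(TEN_MB));
        return props;
    }

    public static void main(String[] args) {
        System.out.println("producer: " + producerProps());
        System.out.println("consumer: " + consumerProps());
    }
}
```

These properties would be merged with the serializer settings before being passed to a KafkaProducer or KafkaConsumer. The broker (or topic) limit still has to be raised separately, otherwise the broker rejects the larger records.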

Why Not Always Set a Large Size?

While it might be tempting to set a very high limit, be cautious. Large messages can strain the broker’s memory, storage, and network resources, leading to performance issues. A high limit can also blindside you: if a producer that normally sends one 10 MB message per minute starts pumping out dozens per second during the night, the cluster has to absorb all of it. It’s generally better to keep messages small and break larger data down into smaller parts.
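One way to break larger data into smaller parts is to split the payload into fixed-size chunks before producing. The sketch below shows only the splitting step, using plain Java. The class name, method name, and chunk size are assumptions made for illustration, not a Kafka API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the Kafka client library.
class PayloadChunker {

    // Splits the payload into consecutive chunks of at most chunkSize bytes.
    static List<byte[]> split(byte[] payload, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < payload.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, payload.length);
            chunks.add(Arrays.copyOfRange(payload, start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[2_500_000];            // a 2.5 MB payload
        List<byte[]> chunks = split(payload, 1_000_000); // 1 MB chunks (assumed size)
        System.out.println(chunks.size() + " chunks");   // prints "3 chunks"
    }
}
```

In a real pipeline, each chunk would be sent as its own record, typically with the same key so that all chunks land in the same partition in order. The consumer would also need some extra metadata, such as a chunk index and a total count, to reassemble the original payload. Those details are left out of this sketch.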

Kafka Producer Examples

Producers are responsible for sending data (messages) to topics. Understanding how to properly configure and use a producer is crucial for efficiently sending data to a cluster.

Basic Kafka Producer Example

Here’s a simple example in Java that shows how to send a message to a topic:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class SimpleProducer {
   public static void main(String[] args) {
       Properties props = new Properties();
       // Broker address the producer uses to connect to the cluster
       props.put("bootstrap.servers", "localhost:9092");
       // Serializers convert the key and value to bytes
       props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
       props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

       KafkaProducer<String, String> producer = new KafkaProducer<>(props);
       // send() is asynchronous: the record is buffered and sent in the background
       producer.send(new ProducerRecord<>("my-topic", "key", "Hello world!"));
       // close() waits for buffered records to be sent, then frees resources
       producer.close();
   }
}

Explanation of the Code

In the example above:

  • Bootstrap Servers: This tells the producer where the Kafka broker is located.
  • Key and Value Serializer: These convert the key and value to bytes so that they can be sent to Kafka.

This basic producer sends a single message, “Hello world!” to the topic “my-topic.” The producer is then closed to free up resources.

Advanced Producer Configurations

Kafka producers can be configured with various settings to optimize performance. Two common ones are acks, which controls how many acknowledgments the producer waits for (0 for none, 1 for the leader of the partition only, or all for every in-sync replica), and retries, which controls how often a send is retried after a transient failure. Tuning these settings allows you to control the trade-offs between throughput, latency, and reliability.
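As a rough illustration of that trade-off, the sketch below builds two sets of producer properties: one leaning toward reliability and one leaning toward throughput. The class and method names are hypothetical and the values are starting points chosen for the example, not recommendations. Settings beyond acks and retries (enable.idempotence, linger.ms, batch.size, compression.type) are standard producer options added here for illustration.

```java
import java.util.Properties;

// Hypothetical helper; names and values are illustrative assumptions.
class ProducerTuning {

    // Leans toward reliability: wait for all in-sync replicas and retry on failure.
    static Properties reliabilityFirst() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");                // every in-sync replica must acknowledge
        props.put("retries", "5");               // retry transient failures up to 5 times
        props.put("enable.idempotence", "true"); // avoid duplicates caused by retries
        return props;
    }

    // Leans toward throughput: leader-only acks, larger batches, compression.
    static Properties throughputFirst() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "1");                  // only the partition leader acknowledges
        props.put("linger.ms", "20");            // wait up to 20 ms to fill a batch
        props.put("batch.size", "65536");        // batch up to 64 KB per partition
        props.put("compression.type", "lz4");    // compress batches on the wire
        return props;
    }

    public static void main(String[] args) {
        System.out.println("reliability: " + reliabilityFirst());
        System.out.println("throughput:  " + throughputFirst());
    }
}
```

Either set would still need the key and value serializers from the basic example before it can be passed to a KafkaProducer.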

Consumer Groups in Kafka

Kafka consumer groups are a critical concept that allows multiple consumers to process data from a topic together. Consumers within a consumer group all use the same group id, which allows for horizontal scaling (running more consumers) as opposed to only vertical scaling (adding more resources to a single consumer).

What is a Consumer Group?

A consumer group is a collection of consumers that coordinate to consume data from topics. Each consumer in the group reads data from one or more partitions of the topic. Kafka ensures that each partition is read by only one consumer in the group, providing load balancing and fault tolerance.

How Consumer Groups Work

When a consumer joins a group, Kafka assigns partitions to it. If a consumer leaves the group (either due to failure or manual shutdown), Kafka will reassign its partitions to the remaining consumers in the group. This ensures that the data is continuously processed even if some consumers go down.
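The effect of a consumer leaving can be pictured with a small simulation. The sketch below deals partitions out to group members in round-robin order and then recomputes the assignment after one member leaves. It is a simplified model written for this post with hypothetical names, not Kafka's actual assignment logic, which is pluggable and more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of partition assignment; not Kafka's real assignor.
class AssignmentSketch {

    // Deals partitions 0..partitionCount-1 to the consumers in round-robin order.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>(
                Arrays.asList("consumer-a", "consumer-b", "consumer-c"));

        // Three consumers share six partitions.
        System.out.println(assign(group, 6));
        // {consumer-a=[0, 3], consumer-b=[1, 4], consumer-c=[2, 5]}

        // consumer-b leaves; its partitions are spread over the remaining members.
        group.remove("consumer-b");
        System.out.println(assign(group, 6));
        // {consumer-a=[0, 2, 4], consumer-c=[1, 3, 5]}
    }
}
```

Calling assign with three consumers and only two partitions leaves the third consumer with an empty list. That mirrors a real limit of consumer groups: members beyond the number of partitions have nothing to read.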

Why Use Consumer Groups?

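Consumer groups are essential for scaling the processing of data. By adding more consumers to a group, you can increase the processing throughput, since more consumers can read from the topic’s partitions simultaneously. The number of partitions is the upper bound: because each partition is read by only one consumer in the group, any consumers beyond the partition count sit idle.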

Example of a Consumer in a Group

Here’s a simple example of a consumer:

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
   public static void main(String[] args) {
       Properties props = new Properties();
       props.put("bootstrap.servers", "localhost:9092");
       // All consumers sharing this group id split the topic's partitions
       props.put("group.id", "my-group");
       // Deserializers convert the bytes back into Strings
       props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
       props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

       KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
       consumer.subscribe(Collections.singletonList("my-topic"));

       // Poll forever; each call waits up to 100 ms for new records
       while (true) {
           // poll(Duration) replaces the deprecated poll(long) overload
           ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
           records.forEach(record -> System.out.printf("Consumed record with key %s and value %s%n", record.key(), record.value()));
       }
   }
}

Explanation of the Code

  • Group ID: This identifies the consumer group to which this consumer belongs.
  • Deserializers: These convert the bytes back into their original format (e.g., String).

The consumer subscribes to the topic “my-topic” and continuously polls the cluster for new records. Each record is then processed by the consumer.

Conclusion

Understanding maximum message size, how to effectively use producers, and the importance of consumer groups can significantly improve your ability to manage data streams efficiently. With these basics under your belt, you’re well on your way to mastering Kafka. Happy streaming!

Axual’s all-in-one Kafka platform

For those looking to simplify the implementation of Apache Kafka and optimize event streaming, Axual offers an effective platform. Axual provides a managed, secure, and scalable event streaming service that integrates seamlessly with existing microservices architectures. With Axual, you can focus on building your business logic while leveraging powerful tools for event processing, monitoring, and governance. Axual handles the complexities of Kafka, enabling you to implement real-time data with ease and ensuring reliable, consistent, and scalable event delivery across your system.

Answers to your questions about Axual’s All-in-one Kafka Platform

Are you curious about our All-in-one Kafka platform? Dive into our FAQs
for all the details you need, and find the answers to your burning questions.

What are message sizes in Kafka?

In Apache Kafka, message size refers to the size of the individual messages that are produced, transmitted, and consumed within the Kafka ecosystem. Understanding message sizes is crucial for performance tuning, resource allocation, and efficient system design.

What are Kafka Consumer Groups?

Kafka Consumer Groups are a fundamental concept in Apache Kafka that allows for the scalable consumption of messages from Kafka topics. They enable multiple consumers to work together to read messages from a topic, with each partition assigned to a single consumer within the group so that its messages are not handled by several group members at once.

Rachel van Egmond
Senior content lead
