Managing topics, partitions, and offsets is central to working with Apache Kafka. A topic is a named stream of records, partitions split a topic so that it can be written and read in parallel, and offsets record each consumer's position within a partition. This section walks through techniques and code samples for managing all three.

  1. Creating and Configuring Topics:
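    We will cover how to create topics and configure their properties, such as the replication factor, the number of partitions, and retention policies. Partition count and replication factor are set with dedicated flags at creation time, while retention is controlled by topic-level configs such as retention.ms, which kafka-topics.sh accepts through its --config option.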

Code Sample 1: Creating a Topic using Kafka CLI

Bash
$ kafka-topics.sh --create --bootstrap-server localhost:9092 --topic my-topic --partitions 3 --replication-factor 1

  2. Listing and Describing Topics:
    We will learn how to list all the topics in a Kafka cluster and retrieve detailed information about a specific topic.

Code Sample 2: Listing Topics using Kafka CLI

Bash
$ kafka-topics.sh --list --bootstrap-server localhost:9092

Code Sample 3: Describing a Topic using Kafka CLI

Bash
$ kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-topic
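
  3. Managing Partitions:
    We will explore techniques for managing partitions. Kafka only allows the partition count of an existing topic to be increased, not decreased. Adding partitions raises the available parallelism, but it also changes which partition a given key maps to, so records with the same key written before and after the change can land in different partitions. Existing records are not moved.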

Code Sample 4: Altering Partition Count of a Topic using Kafka CLI

Bash
$ kafka-topics.sh --alter --bootstrap-server localhost:9092 --topic my-topic --partitions 5
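
To see why a change in partition count affects data distribution, here is a minimal sketch in plain Java. It is not Kafka client code: Kafka's default partitioner hashes the serialized key with murmur2, whereas this sketch uses String.hashCode() as a stand-in to stay dependency-free, so the actual partition numbers in a real cluster will differ. The class and method names are hypothetical.

```java
import java.util.List;

// Simplified stand-in for key-based partitioning (hypothetical names).
// Assumption: String.hashCode() replaces Kafka's murmur2 hash of the key bytes.
class PartitionMappingSketch {

    // Map a key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // Compare the mapping before (3 partitions) and after (5 partitions) the alter.
        for (String key : List.of("a", "b", "c")) {
            System.out.printf("key=%s: 3 partitions -> %d, 5 partitions -> %d%n",
                key, partitionFor(key, 3), partitionFor(key, 5));
        }
    }
}
```

With this stand-in hash, key "a" maps to partition 1 when the topic has 3 partitions and to partition 2 when it has 5. The same effect occurs with the real partitioner: records for a key written after the change may go to a different partition than earlier records for that key.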

  4. Working with Offsets:
    We will cover how to work with offsets, including setting consumer offsets manually, committing offsets, and resetting offsets to a specific position.

Code Sample 5: Manually Committing Consumer Offsets in Java

Java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-consumer-group");
// Deserializers are required; without them the KafkaConsumer constructor throws.
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
// Turn off auto-commit so the manual commits below are the only ones.
props.put("enable.auto.commit", "false");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));

try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            // Process the record

            // Commit the offset of the next record to read (hence + 1).
            // Committing per record is simple but slow; once per poll is a common alternative.
            consumer.commitSync(Collections.singletonMap(
                new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1)
            ));
        }
    }
} finally {
    consumer.close();
}
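
The offset + 1 convention in the sample above, and the effect of resetting an offset, can be easier to see in a toy model. The following is a minimal sketch in plain Java, not Kafka client code: the PartitionLog class and its methods are hypothetical names invented for illustration. It models one partition as a list and the committed offset as the index of the next record to read.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of committed offsets (hypothetical, not the Kafka API).
class OffsetCommitSketch {

    static class PartitionLog {
        private final List<String> records = new ArrayList<>();
        private long committed = 0; // offset of the next record to read

        void append(String value) {
            records.add(value);
        }

        // Mirror the "offset + 1" convention: commit the position after the processed record.
        void commitAfterProcessing(long lastProcessedOffset) {
            committed = lastProcessedOffset + 1;
        }

        // Move the committed position, for example back to 0 to replay the partition.
        void resetTo(long offset) {
            if (offset < 0 || offset > records.size()) {
                throw new IllegalArgumentException("offset out of range: " + offset);
            }
            committed = offset;
        }

        long committed() {
            return committed;
        }

        // What a consumer would see after restarting from the committed position.
        List<String> readFromCommitted() {
            return new ArrayList<>(records.subList((int) committed, records.size()));
        }
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("a");
        log.append("b");
        log.append("c");

        log.commitAfterProcessing(0);                // processed offset 0, so commit 1
        System.out.println(log.readFromCommitted()); // prints [b, c]

        log.resetTo(0);                              // reset to the beginning
        System.out.println(log.readFromCommitted()); // prints [a, b, c]
    }
}
```

Committing record.offset() itself, rather than record.offset() + 1, would cause the last processed record to be read again after a restart, which is why the consumer sample adds 1.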

Reference Link: Apache Kafka Documentation – Managing Topics – https://kafka.apache.org/documentation/#topics

Helpful Video: “Apache Kafka for Beginners – Managing Topics, Partitions, and Offsets” by Learn with Sumit – https://www.youtube.com/watch?v=NclY-y7ZzII

Conclusion:

Managing topics, partitions, and offsets is essential for working effectively with Apache Kafka. The samples above show how to create and configure a topic, list and describe topics, increase a topic's partition count, and commit consumer offsets manually. Together these cover the basics of controlling data distribution, parallel processing, and consumer progress.

The Kafka documentation and the video linked above go into more depth on each of these tasks.