Introduction

At the heart of every Kafka system lie Kafka topics. Topics are categories or feeds under which messages or data records are published and stored. Understanding their creation and configuration is paramount to leveraging Kafka effectively. In this post, we dive deep into the anatomy of a Kafka topic, dissecting the process of creating and configuring topics for optimal performance.

Kafka Topic Basics

A Kafka topic consists of one or more partitions. Each partition is an ordered, immutable sequence of records that is continually appended to, much like a commit log. Each record within a partition is assigned a sequential ID number called the offset, which uniquely identifies it within that partition.
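
Once a topic exists (created in the next section), you can watch offsets being assigned by writing a few records with the console producer and reading them back with the console consumer. This is a minimal sketch against a local broker; the print.partition and print.offset properties assume a reasonably recent Kafka release (roughly 2.7 or later).

Bash
# Write a few records interactively (Ctrl+C to exit)
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-first-topic

# Read them back, printing each record's partition and offset
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-first-topic --from-beginning --property print.partition=true --property print.offset=true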

1. Creating a Kafka Topic

You can create a topic with the kafka-topics.sh script. Here’s a basic command:

Bash
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic my-first-topic

In this command, --create indicates that a new topic should be created, --bootstrap-server specifies the Kafka broker to connect to, --replication-factor sets how many copies of each partition are kept across brokers, --partitions sets the number of partitions for the topic, and --topic gives the topic its name.
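
As a sketch of a more production-like setup (assuming a cluster with at least three brokers; the topic name orders is just an example), you might use more partitions, a replication factor of 3, and --if-not-exists so the command can be safely re-run from provisioning scripts:

Bash
kafka-topics.sh --create --if-not-exists --bootstrap-server localhost:9092 --replication-factor 3 --partitions 6 --topic orders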

2. Describing a Kafka Topic

You can get details about a topic using the --describe option.

Bash
kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-first-topic

This command displays information about the my-first-topic topic, including its partition count, replication factor, and the leader and in-sync replicas (ISR) for each partition.
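
Two related forms of --describe are often handy: omitting --topic describes every topic on the cluster, and --under-replicated-partitions narrows the output to partitions whose in-sync replica set has fallen behind:

Bash
# Describe every topic on the cluster
kafka-topics.sh --describe --bootstrap-server localhost:9092

# Show only partitions that are currently under-replicated
kafka-topics.sh --describe --bootstrap-server localhost:9092 --under-replicated-partitions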

Kafka Topic Configuration

There are numerous configurations for Kafka topics that you can set to meet your specific requirements.

3. Increasing the Number of Partitions

You can increase the number of partitions for a topic using the --alter option with the --partitions flag (Kafka does not support reducing the partition count of an existing topic):

Bash
kafka-topics.sh --alter --bootstrap-server localhost:9092 --topic my-first-topic --partitions 3

This command increases the number of partitions for my-first-topic to 3. Keep in mind that adding partitions changes how keys map to partitions, so producers that rely on key-based ordering may see existing keys land on different partitions afterwards.

4. Configuring the Replication Factor

While you can’t change the replication factor of a topic directly once it’s created, you can do it indirectly using partition reassignment:

Bash
kafka-reassign-partitions.sh --generate --bootstrap-server localhost:9092 --topics-to-move-json-file topics-to-move.json --broker-list "1,2"

This command generates a proposed reassignment for the topics listed in topics-to-move.json across the brokers given in --broker-list. The proposal keeps the current replication factor; to change it, edit the replica lists in the generated JSON (adding or removing broker IDs) and apply the edited file with the --execute option.
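
As a concrete sketch (assuming brokers with IDs 1 and 2, and a hypothetical edited file named reassignment.json), the input file lists the topics to work on, the proposal is edited so each partition has two replicas, and the result is applied and then monitored:

Bash
# topics-to-move.json
# {"version": 1, "topics": [{"topic": "my-first-topic"}]}

# reassignment.json, edited so partition 0 is replicated on brokers 1 and 2
# {"version": 1, "partitions": [{"topic": "my-first-topic", "partition": 0, "replicas": [1, 2]}]}

kafka-reassign-partitions.sh --execute --bootstrap-server localhost:9092 --reassignment-json-file reassignment.json

# Check progress until the reassignment completes
kafka-reassign-partitions.sh --verify --bootstrap-server localhost:9092 --reassignment-json-file reassignment.json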

5. Setting Topic-Level Configurations

Kafka provides various topic-level configurations such as retention.ms, max.message.bytes, and compression.type. You can change these on an existing topic with kafka-configs.sh and its --add-config option:

Bash
kafka-configs.sh --alter --bootstrap-server localhost:9092 --entity-type topics --entity-name my-first-topic --add-config max.message.bytes=128000

This command sets the largest record batch size that the topic my-first-topic will accept to 128,000 bytes (about 125 KB).
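
You can set several configurations in one call by comma-separating them; the values below (seven-day retention, LZ4 compression) are just illustrative choices:

Bash
kafka-configs.sh --alter --bootstrap-server localhost:9092 --entity-type topics --entity-name my-first-topic --add-config retention.ms=604800000,compression.type=lz4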

6. Removing Topic-Level Configurations

You can remove any added configuration using the --delete-config option:

Bash
kafka-configs.sh --alter --bootstrap-server localhost:9092 --entity-type topics --entity-name my-first-topic --delete-config max.message.bytes

This command removes the max.message.bytes configuration from my-first-topic.

7. Checking Topic-Level Configurations

To check the current configurations for a topic, use the --describe option:

Bash
kafka-configs.sh --describe --bootstrap-server localhost:9092 --entity-type topics --entity-name my-first-topic
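
By default, --describe lists only the overrides that have been set explicitly on the topic. Newer Kafka versions also accept an --all flag to include the effective defaults; treat the flag as an assumption to check against your installed version:

Bash
kafka-configs.sh --describe --all --bootstrap-server localhost:9092 --entity-type topics --entity-name my-first-topic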

Advanced Topic Configuration

Apart from the basic configurations, there are several advanced configurations that can help optimize your Kafka topics based on your use case.

8. Configuring Log Compaction

Log compaction is a retention mechanism that guarantees Kafka keeps at least the most recent value for every record key within a partition, rather than expiring data purely by age or size. Here’s how to enable it when creating a topic:

Bash
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic compacted-topic --config "cleanup.policy=compact"

This command creates compacted-topic with cleanup.policy=compact, so older records for a key are eventually compacted away while the latest value for each key is retained.
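
Compaction only makes sense for keyed records, since the key determines which older records can be discarded. A quick way to experiment is the console producer’s key parsing, sketched here with a colon-separated key:value input format:

Bash
# Send keyed records such as  user-42:active  (the part before the colon is the key)
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic compacted-topic --property parse.key=true --property key.separator=: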

9. Controlling the Rate of Message Insertion

To keep producers from overwhelming the cluster, older Kafka brokers let you set a default producer quota with the quota.producer.default property in server.properties:

Properties
# server.properties
quota.producer.default=10485760

This setting limits each producer client to writing at most about 10 MB per second, enforced per broker.
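
Note that quota.producer.default has long been deprecated in favor of dynamic client quotas, which newer brokers let you set at runtime with kafka-configs.sh. A rough equivalent of the static setting above, applied as a default for all client IDs, looks like this:

Bash
kafka-configs.sh --alter --bootstrap-server localhost:9092 --entity-type clients --entity-default --add-config producer_byte_rate=10485760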

10. Controlling the Rate of Message Consumption

Similarly, you can control the rate of message consumption using the quota.consumer.default configuration:

Properties
# server.properties
quota.consumer.default=10485760

This setting limits each consumer client to reading at most about 10 MB per second, enforced per broker.
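
The consumer-side equivalent using dynamic client quotas on newer brokers:

Bash
kafka-configs.sh --alter --bootstrap-server localhost:9092 --entity-type clients --entity-default --add-config consumer_byte_rate=10485760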

Conclusion

Understanding Kafka topic creation and configuration is essential for developing efficient Kafka-based systems. Topics are more than just named channels for sending and receiving messages – their configurations can significantly impact the performance and reliability of your Kafka system.

Through this post, you have seen how to create, describe, alter, and configure Kafka topics, alongside some advanced settings to optimize topic behavior. These techniques will empower you to design and manage Kafka topics with confidence, and ultimately build more robust Kafka-based applications and data pipelines. The true power of Kafka lies in its flexibility, and this flexibility is partly realized through a detailed understanding of topics and their configurations. Happy Kafka-ing!