In this section, we will explore the architecture of Apache Kafka, understanding its key components such as brokers, topics, partitions, and replication. Having a solid understanding of Kafka’s architecture is crucial for building scalable and fault-tolerant real-time data streaming applications.

  1. Brokers: Kafka runs as a cluster of servers called brokers. Brokers are responsible for handling incoming data streams from producers, storing the data in partitions, and serving it to consumers. Each broker in the cluster can handle a subset of the data and client requests.
  2. Topics: In Kafka, data streams are organized into topics. A topic is a category or feed name to which records are published by producers. Topics can be thought of as logical containers for organizing related data streams. For example, a topic could be “sensor_data” or “user_logs.”
  3. Partitions: Each topic in Kafka is divided into one or more partitions. Partitions allow data to be distributed and processed in parallel across multiple brokers. Each partition is an ordered, immutable sequence of records, and the number of partitions sets the upper bound on how many consumers in a group can read the topic in parallel, and therefore on its throughput (see the short partitioning sketch after this list).
  4. Replication: Kafka provides a replication mechanism for fault tolerance and data durability. Each partition has a configurable number of replicas, which are copies of the partition stored on different brokers. One replica acts as the leader that handles reads and writes while the others follow it, so the data remains available even if a broker fails.
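
A quick way to build intuition for partitioning is to hash a record's key and take it modulo the partition count; this is why records with the same key always land in the same partition and keep their per-key ordering. The sketch below is illustrative only: Kafka's real default partitioner uses a murmur2 hash of the serialized key bytes rather than String.hashCode(), and the keys and partition count here are made up.

Java

public class PartitioningSketch {

    public static void main(String[] args) {
        int numPartitions = 3; // same partition count as the topic created below

        String[] keys = {"sensor-1", "sensor-2", "sensor-3", "user-42"};
        for (String key : keys) {
            // Stand-in hash: Kafka's default partitioner applies murmur2 to the
            // serialized key bytes; masking the sign bit keeps the index non-negative.
            int partition = (key.hashCode() & 0x7fffffff) % numPartitions;
            System.out.println("Key '" + key + "' -> partition " + partition);
        }
    }
}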

Code Sample:

To illustrate the concepts of brokers, topics, partitions, and replication, consider the following code examples:

Creating a Kafka Topic (Command Line):

Bash
bin/kafka-topics.sh --create --topic my_topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 2
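
Once the topic exists, you can inspect how its partitions and replicas are laid out across brokers. The exact output format varies by Kafka version, but it lists the leader broker and the replica set for each of the three partitions:

Bash
bin/kafka-topics.sh --describe --topic my_topic --bootstrap-server localhost:9092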

Producing Data to a Kafka Topic (Java):

Java
import org.apache.kafka.clients.producer.*;

import java.util.Properties;

public class KafkaProducerExample {

    public static void main(String[] args) {
        // Configure Kafka producer
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Create Kafka producer
        Producer<String, String> producer = new KafkaProducer<>(props);

        // Produce a sample record
        ProducerRecord<String, String> record = new ProducerRecord<>("my_topic", "my_key", "Hello, Kafka!");
        producer.send(record, (metadata, exception) -> {
            if (exception == null) {
                System.out.println("Message sent successfully. Offset: " + metadata.offset());
            } else {
                System.out.println("Failed to send message: " + exception.getMessage());
            }
        });

        // Close the producer
        producer.close();
    }
}
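
Replication ties directly into producer durability. As a small fragment building on the configuration in the example above, setting acks=all makes the broker acknowledge a write only after the partition's in-sync replicas have persisted it, trading a little latency for stronger durability:

Java

// Optional addition to the producer configuration above:
// with "acks=all", the leader waits for all in-sync replicas
// to persist the record before acknowledging the send.
props.put("acks", "all");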

Consuming Data from a Kafka Topic (Java):

Java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaConsumerExample {

    public static void main(String[] args) {
        // Configure Kafka consumer
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("group.id", "my_consumer_group");

        // Create Kafka consumer
        Consumer<String, String> consumer = new KafkaConsumer<>(props);

        // Subscribe to a topic
        consumer.subscribe(Collections.singleton("my_topic"));

        // Consume records
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("Received message: " + record.value() + ", Offset: " + record.offset());
            }
        }
    }
}
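
Because each partition is consumed by at most one member of a consumer group, starting a second copy of this program with the same group.id spreads the topic's three partitions across the two instances. To see which partitions a given instance ended up with, one option is to print the current assignment; the fragment below is meant to go inside the poll loop above and reuses its consumer variable:

Java

// Fragment for the poll loop above: print the partitions this instance currently owns.
for (TopicPartition tp : consumer.assignment()) {
    System.out.println("Assigned partition: " + tp.topic() + "-" + tp.partition());
}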

Conclusion:
In this section, we have explored the core components of Apache Kafka’s architecture: brokers, topics, partitions, and replication. We have seen that brokers handle incoming data streams, topics organize related streams, partitions enable parallel processing, and replication provides fault tolerance and data durability. Through the code examples, we have learned how to create Kafka topics, produce data to them, and consume data from them using the Kafka Java API.

Understanding Kafka’s architecture is essential for designing and building scalable, fault-tolerant real-time data streaming applications. With this knowledge, you are equipped to use Kafka’s distributed streaming platform to handle high-volume, high-velocity data streams in real time.