Introduction
Apache Kafka is a distributed event streaming platform designed for high throughput, fault tolerance, and low latency. One of its most remarkable features is its ability to integrate with a multitude of other data systems, both as a source and as a sink. That’s where Kafka Connect comes in.
Kafka Connect is a framework that provides scalable and resilient integration between Kafka and other data systems. With its pluggable connector architecture, it can move data into and out of Kafka without requiring you to write additional code. This blog post explores how to leverage Kafka Connect’s power, effectively bridging the gap between Kafka and your other data systems.
Kafka Connect Architecture
Before we delve into code, it’s crucial to understand Kafka Connect’s architecture. It operates in two modes: Standalone and Distributed.
- Standalone mode runs a single worker process and is ideal for development and testing.
- Distributed mode runs multiple workers and is used in production, as it provides scalability and fault tolerance.
1. Starting Kafka Connect
To start Kafka Connect, you’ll need to specify either standalone or distributed mode. Here’s how to start it in standalone mode:
connect-standalone.sh connect-standalone.properties file-source.properties file-sink.properties

In this command, connect-standalone.properties specifies the Kafka Connect worker configuration, while file-source.properties and file-sink.properties specify the configurations for the individual connectors.
2. Configuration for Standalone Mode
Let’s consider an example of a standalone mode configuration:
# connect-standalone.properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets

In this configuration, we define the Kafka bootstrap servers and the JSON converters used for record keys and values. Standalone mode also requires offset.storage.file.filename, a local file in which the worker persists source connector offsets.
Source Connectors
Source connectors import data from another system into Kafka. They are responsible for splitting the data into records and writing them to Kafka.
3. Configuring a File Source Connector
The following configuration describes a simple file source connector:
# file-source.properties
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=test.txt
topic=test-topic

This connector reads lines from test.txt and writes them to test-topic in Kafka.
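Before starting the connector, you can seed the input file with a couple of lines so there is something to read (the file name matches the file setting above):

echo "line one" > test.txt
echo "line two" >> test.txt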
4. Verifying Source Connector
You can verify whether data has been produced to Kafka using a console consumer:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning
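Because the worker is configured with JsonConverter and schemas enabled, each line of the file arrives wrapped in a JSON schema envelope. The output should look roughly like this (illustrative):

{"schema":{"type":"string","optional":false},"payload":"line one"}
{"schema":{"type":"string","optional":false},"payload":"line two"}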
Sink Connectors
Sink connectors export data from Kafka to another system. They read records from Kafka topics and convert them into a format suitable for the destination system.
5. Configuring a File Sink Connector
Here’s a basic configuration for a file sink connector:
# file-sink.properties
name=local-file-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
file=test.sink.txt
topics=test-topic

This connector reads records from test-topic in Kafka and writes them to test.sink.txt.
6. Verifying Sink Connector
The sink file test.sink.txt should now contain the records consumed from test-topic.
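A quick way to confirm this is to print the file and compare it with the input file (give the connector a moment to flush first):

cat test.sink.txt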
Distributed Mode
For scalable and fault-tolerant production deployments, Kafka Connect should be run in distributed mode.
7. Starting Kafka Connect in Distributed Mode
connect-distributed.sh connect-distributed.properties

This command starts Kafka Connect in distributed mode, using the configuration specified in connect-distributed.properties.
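Unlike standalone mode, distributed workers store offsets, configurations, and statuses in Kafka topics instead of local files. A minimal connect-distributed.properties might look like the following sketch; the group.id and topic names here are illustrative, and replication factors of 1 are only suitable for a single-broker test cluster:

# connect-distributed.properties
bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
offset.storage.replication.factor=1
config.storage.replication.factor=1
status.storage.replication.factor=1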
8. Configuring a Connector in Distributed Mode
Connectors in distributed mode are configured via a REST API. The following curl command configures a file source connector:
curl -X POST -H "Content-Type: application/json" --data '{"name": "distributed-file-source", "config": {"connector.class":"org.apache.kafka.connect.file.FileStreamSourceConnector", "tasks.max":"1", "file":"test.txt", "topic":"test-topic-distributed"}}' http://localhost:8083/connectors
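Once the POST succeeds, the new connector should appear when you list the connectors registered with the worker:

curl http://localhost:8083/connectors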
9. Checking Connector Status
You can check the status of a connector using the Connect REST API:
curl http://localhost:8083/connectors/distributed-file-source/status
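A healthy connector reports a RUNNING state for itself and for each of its tasks. The response is JSON along these lines (abridged and illustrative):

{"name":"distributed-file-source","connector":{"state":"RUNNING","worker_id":"127.0.0.1:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"127.0.0.1:8083"}],"type":"source"}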
10. Deleting a Connector
If you need to delete a connector, you can do so with the following command:
curl -X DELETE http://localhost:8083/connectors/distributed-file-source

Deleting a connector stops its tasks and removes its configuration from the cluster; it does not delete the data the connector has already written to Kafka.

Conclusion
Kafka Connect represents a robust, scalable solution to link Kafka with other data systems. With its standardized API, pre-built connectors, and distributed nature, it makes data ingestion and export tasks manageable and efficient.
Understanding Kafka Connect is a significant step towards leveraging Kafka’s full potential. The ability to connect to a variety of data sources and sinks turns Kafka into a central hub for your real-time data pipelines. The examples provided in this blog post should help you get started with creating your own connectors in both standalone and distributed modes. Happy streaming!
