Streaming Data with Kafka in Python: A small walkthrough
Kafka is a powerful distributed event streaming platform that allows systems and applications to exchange data efficiently.
In this blog post, we will walk through setting up a Kafka environment, creating a topic, and writing both a consumer and a producer in Python.
Setting up a Kafka environment:
To get started, you will need Docker installed on your machine. Installation instructions are available here:
https://docs.docker.com/desktop/install/windows-install/
The compose file sets up the Kafka and ZooKeeper images, along with the necessary environment variables for the broker and ZooKeeper.
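A minimal sketch of such a compose file, assuming the Confluent images and the default ports (the image tags and service names are my own choices, not the only option):

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181   # port the broker uses to reach ZooKeeper
  kafka:
    image: confluentinc/cp-kafka:7.3.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"                 # expose the broker to the host
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1  # single broker, so no replication
```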
After starting the stack with docker compose up -d, we can check whether the containers are running with: docker ps
Creating a topic:
Topics are named data streams that provide a way for multiple systems to read from and write to the same stream of messages.
To create a topic, we run a short Python script.
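A minimal sketch using the kafka-python package; the topic name and broker address are assumptions that the later snippets reuse:

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Connect to the broker exposed by the compose file
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# demo_topic is a placeholder name; a single-broker setup
# cannot have a replication factor greater than 1
topic = NewTopic(name="demo_topic", num_partitions=3, replication_factor=1)
admin.create_topics(new_topics=[topic])
```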
We specify the number of partitions and the replication factor for the topic.
Partitions split the data stream into smaller chunks that can be processed in parallel, while the replication factor controls how many copies of each partition are kept to ensure data durability.
If we run the code now, the topic gets created.
Creating a consumer:
The consumer is responsible for fetching the data from the Kafka cluster.
The consumer subscribes to the topic we created and continuously polls for new messages.
In this example, we are printing the consumed messages, but in a real-world scenario, these messages can be processed and acted upon by the consuming application.
By setting auto_offset_reset to 'earliest', the consumer will start reading messages from the beginning of the topic; otherwise, it will start from the latest message.
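A minimal consumer sketch with kafka-python, reusing the topic name and broker address assumed above:

```python
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    "demo_topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Poll indefinitely and print every message that arrives
for message in consumer:
    print(message.value)
```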
Now we can start our consumer, which will wait for incoming data.
Creating a producer:
A producer is responsible for sending data to the Kafka cluster.
The producer continuously sends messages to the topic, where they can be picked up by the consumer.
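Again a minimal sketch with kafka-python; the JSON payload and the one-second pause are assumptions, chosen to match the 20 random messages we will see below:

```python
from kafka import KafkaProducer
import json
import random
import time

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Send 20 messages with random payloads, one per second
for i in range(20):
    producer.send("demo_topic", {"number": random.randint(0, 100)})
    time.sleep(1)

# Make sure all buffered messages reach the broker before exiting
producer.flush()
```

Note that the value_serializer here mirrors the consumer's value_deserializer, so both sides agree on the JSON encoding.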
Let's have a look at what happens after we start the script.
The messages are being sent to our Kafka cluster. Nice!
But let's check the consumer now to see whether it received any data.
The consumer fetched the 20 messages with random data.
Mission accomplished!
Overall, this is a basic example of how to use Kafka for message streaming with Python.
There are many more advanced features and configurations that can be applied to a Kafka environment, but this should give you a good starting point for working with Kafka and Python.
Thank you for reading.
Greetings
Patrick :)