Apache Kafka Under The Hood | A Quick Hands On
If you went through my previous post, you should have a good understanding of Kafka architecture. But how good is understanding a technology without getting hands-on? So we take a step forward and see in practice how these architectural components interact and work under the hood. The hands-on covers the following points:
- Cluster Setup
- Topic Creation
- Consumer
- Producer
At the end of this post, you will also know enough to set up your own working Kafka cluster. So let's get started!
Note: Throughout this post we use Docker for the hands-on. Make sure that Docker is set up and running in your working environment. The code used for the hands-on is available in this repository. Please clone the repository into your working environment before you proceed.
Cluster Setup
To make the cluster setup easy, we use docker-compose. This way, the cluster can be launched with a single command, provided we specify all the dependencies and requirements in a docker-compose.yml file. For this purpose, I created a docker-compose.yml file with all the necessary configuration. Let's now go through the components used in the cluster setup.
Kafka Brokers
We use the Bitnami Kafka Docker image for the setup. A Kafka cluster is a collection of Kafka brokers. In this working example, our Kafka cluster contains two brokers, broker-0 and broker-1, which can be seen in the docker-compose.yml file.
When launching a broker, we need to provide a configuration file that determines the broker's properties. The Bitnami Kafka Docker image reads the /opt/bitnami/kafka/conf/server.properties file when setting up the broker. For this hands-on, we use a custom configuration file for each broker: broker-0.properties and broker-1.properties. Docker volumes map these files into the containers so that the Bitnami Kafka image uses them to set up the brokers.
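The actual contents of broker-0.properties and broker-1.properties live in the repository and are not reproduced in this post. As a rough sketch only, a minimal per-broker file could be generated as below. The helper name broker_config is hypothetical. The keys are standard Kafka broker settings, but the values are assumptions inferred from the hostnames and ports used later in this post (broker-0 on 9092, broker-1 on 9093, zookeeper on 2181).

```shell
# Hypothetical helper: prints a minimal server.properties for one broker.
# The values are assumptions, not the contents of the repository's files.
broker_config() {
  id=$1     # numeric broker id, e.g. 0
  port=$2   # port this broker listens on, e.g. 9092
  cat <<EOF
broker.id=${id}
listeners=PLAINTEXT://:${port}
advertised.listeners=PLAINTEXT://broker-${id}:${port}
zookeeper.connect=zookeeper:2181
EOF
}

# Usage (writes one file per broker):
#   broker_config 0 9092 > broker-0.properties
#   broker_config 1 9093 > broker-1.properties
```

Each broker needs its own broker.id and an advertised address that the other containers can resolve, which is why the two files cannot simply be copies of each other.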
ZooKeeper
Kafka, in the setup used here, depends on ZooKeeper. ZooKeeper is a distributed coordination service used to manage a set of interconnected systems. Kafka uses it to coordinate brokers and to store metadata about topics, partitions and so on. The internal working of ZooKeeper is out of this post's scope. Since Kafka depends on ZooKeeper, we make sure ZooKeeper is up and running before starting the Kafka cluster. We use the Bitnami ZooKeeper Docker image to set it up. We can see from the docker-compose.yml file that the Kafka brokers broker-0 and broker-1 depend on zookeeper.
Now that we have defined all the dependencies, it's time to launch our Kafka cluster. All you have to do is navigate to the cloned repository and execute the commands below:
cd hands-on
sh cluster-start.sh
This should start the Kafka cluster, which comprises ZooKeeper and two Kafka brokers. You can verify this by listing the running Docker containers with the command below in a new terminal window.
docker container ls
CONTAINER ID | IMAGE | COMMAND | CREATED | STATUS | PORTS | NAMES |
---|---|---|---|---|---|---|
1c26dfb264db | bitnami/kafka:latest | “/entrypoint.sh /run…” | 7 seconds ago | Up 6 seconds | 9092/tcp, 0.0.0.0:9093->9093/tcp | broker-1 |
aa0f4f7073b1 | bitnami/kafka:latest | “/entrypoint.sh /run…” | 8 seconds ago | Up 7 seconds | 0.0.0.0:9092->9092/tcp | broker-0 |
96f169a92a19 | bitnami/zookeeper:latest | “/entrypoint.sh /run…” | 9 seconds ago | Up 8 seconds | 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, 8080/tcp | zookeeper |
You should see three containers (one ZooKeeper and two Kafka) up and running as shown above. With this, we have finished setting up the Kafka cluster.
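The visual check of the container list can also be scripted. The sketch below assumes the container names shown in the table (zookeeper, broker-0, broker-1). The function name check_cluster is made up for illustration. It reads container names on standard input, so it can be tried without a running Docker daemon.

```shell
# Reads running container names (one per line) on stdin and reports
# whether the three containers of this hands-on are all present.
check_cluster() {
  running=$(cat)
  for name in zookeeper broker-0 broker-1; do
    # -x matches the whole line, so "broker-0" does not match "broker-01"
    if ! printf '%s\n' "$running" | grep -qx -- "$name"; then
      echo "missing: $name"
      return 1
    fi
  done
  echo "cluster is up"
}

# Usage against a live Docker daemon:
#   docker container ls --format '{{.Names}}' | check_cluster
```

The --format '{{.Names}}' option makes docker container ls print only the container names, which is easier to match than the full table.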
Topic Creation
Now that we have the cluster up and running, we would like to see how it works. Before we start producing and consuming messages, we need a topic. A topic is created in Kafka by providing the topic name and some configuration details. You can find the list of topic configurations in the Apache Kafka documentation. We create a topic using only two parameters, --partitions and --replication-factor, which set the number of partitions and the number of replicas per partition respectively. Note that the --zookeeper option used here was removed from kafka-topics.sh in Kafka 3.0; on newer images, pass --bootstrap-server broker-0:9092 instead. The topic is created by executing the command below:
docker run --network=hands-on_kafka-tier -ti bitnami/kafka:latest kafka-topics.sh --create --zookeeper zookeeper:2181 --replication-factor 2 --partitions 2 --topic cars
This command creates a topic named cars with two partitions, each having two replicas. The details of the topic can be seen by using the command below:
docker run --network=hands-on_kafka-tier -ti bitnami/kafka:latest kafka-topics.sh --zookeeper zookeeper:2181 --describe --topic cars
Topic: cars PartitionCount: 2 ReplicationFactor: 2 Configs:
Topic: cars Partition: 0 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: cars Partition: 1 Leader: 1 Replicas: 1,0 Isr: 1,0
This output can be mapped as follows
Component | Description |
---|---|
Partition: 0 | 0th partition of topic |
Partition: 1 | 1st partition of topic |
Leader: 0 | broker-0 |
Leader: 1 | broker-1 |
Replicas | List of brokers (broker-0, broker-1) holding a replica of this partition |
In Sync Replicas (Isr) | List of brokers (broker-0, broker-1) that are in sync with the leader |
As mentioned in the previous post, each partition is assigned a leader. From the above result we can see that the leader for Partition: 0 is broker-0. Similarly, the leader for Partition: 1 is broker-1. For the sake of simplicity, let us call Partition: 0 and Partition: 1 of the cars topic cars-0 and cars-1 respectively.
Note: Because leader assignment can vary between runs, the leader for Partition: 0 might be broker-1 and that of Partition: 1 might be broker-0. In that case the output and mapping will differ slightly.
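Since the leader of each partition may differ from run to run, it can help to extract the mapping from the describe output instead of reading it by eye. The sketch below assumes the field layout shown above (Topic, Partition, Leader, Replicas, Isr in that order). The function name partition_leaders is illustrative.

```shell
# Reads kafka-topics.sh --describe output on stdin and prints one line
# per partition in the form: <topic>-<partition> leader=broker-<id> isr=<ids>
partition_leaders() {
  # The summary line has "PartitionCount:" as its third field, so only
  # the per-partition lines match the condition below.
  awk '$3 == "Partition:" { print $2 "-" $4 " leader=broker-" $6 " isr=" $10 }'
}

# Usage: pipe the output of the describe command shown above into it:
#   <describe command> | partition_leaders
```

The output uses the cars-0 and cars-1 naming introduced above, so it lines up with the rest of this post whichever broker ends up as leader.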
Consumer
We first set up consumers subscribed to the topic created above and then write messages to the topic. This way we get to see how each message is processed by the consumers. We will launch two consumer instances that belong to the consumer group car-group and subscribe to the topic cars. In this working example, each consumer runs as a separate Docker container. Now, open a new terminal window and enter the command below, which should launch the first consumer instance with the name consumer-0.
docker run --network=hands-on_kafka-tier --name=consumer-0 -ti bitnami/kafka:latest kafka-console-consumer.sh --bootstrap-server broker-0:9092,broker-1:9093 --topic cars --consumer-property group.id=car-group
In order to launch the second consumer instance, open another terminal window and execute the same command with a different name, say consumer-1, as shown below.
docker run --network=hands-on_kafka-tier --name=consumer-1 -ti bitnami/kafka:latest kafka-console-consumer.sh --bootstrap-server broker-0:9092,broker-1:9093 --topic cars --consumer-property group.id=car-group
Now that we have launched two consumers ready to process messages from cars, let's check the status of these consumers through their consumer group.
docker run --network=hands-on_kafka-tier -ti bitnami/kafka:latest kafka-consumer-groups.sh --bootstrap-server broker-0:9092,broker-1:9093 --describe --group car-group
TOPIC | PARTITION | CURRENT-OFFSET | LOG-END-OFFSET | LAG | CONSUMER-ID | HOST | CLIENT-ID |
---|---|---|---|---|---|---|---|
cars | 1 | 0 | 0 | 0 | consumer-1-306edd7e-18b3-4502-9783-093b9b3597c9 | /172.27.0.5 | consumer-car-group-1 |
cars | 0 | 0 | 0 | 0 | consumer-1-0a67dcb0-8687-47af-9f3b-a6eb82a7bb4b | /172.27.0.6 | consumer-car-group-1 |
As discussed in the previous post, the partitions of a topic are distributed evenly across the consumers of a consumer group to allow parallel processing of the topic's messages. From the above result we can see that one consumer is assigned to cars-1 and the other to cars-0. Both consumers have a CURRENT-OFFSET of 0, which indicates that they have not yet processed any messages in their respective partitions.
Producer
Now that we have the topic set up and consumers subscribed to it, it's time for some action. Let's produce! Open a new terminal window and enter the command below.
docker run --network=hands-on_kafka-tier --name=producer -ti bitnami/kafka:latest kafka-console-producer.sh --broker-list broker-0:9092,broker-1:9093 --topic cars
This should result in a command prompt where we can write messages to our topic. Let’s write our first message.
> Mercedes
Open the terminal windows where we launched the consumer instances. You can see that this message is consumed by one of those consumers. Let us also verify this by checking the status of the consumers.
docker run --network=hands-on_kafka-tier -ti bitnami/kafka:latest kafka-consumer-groups.sh --bootstrap-server broker-0:9092,broker-1:9093 --describe --group car-group
GROUP | TOPIC | PARTITION | CURRENT-OFFSET | LOG-END-OFFSET | LAG | CONSUMER-ID | HOST | CLIENT-ID |
---|---|---|---|---|---|---|---|---|
car-group | cars | 1 | 0 | 0 | 0 | consumer-car-group-1-3cf14661-c5cc-43f6-8320-88ee3381e658 | /172.27.0.5 | consumer-car-group-1 |
car-group | cars | 0 | 1 | 1 | 0 | consumer-car-group-1-32e408ce-6ba3-4506-92e9-5f4827d083fe | /172.27.0.6 | consumer-car-group-1 |
We can see that the CURRENT-OFFSET of the consumer reading from cars-0 is 1. This indicates that it has processed the above message. Let's write another message from the terminal window running the producer instance.
> BMW
From the terminal windows running the consumer instances, we can see that this message is processed by the other consumer. Let us verify this by checking the status of the consumers again.
docker run --network=hands-on_kafka-tier -ti bitnami/kafka:latest kafka-consumer-groups.sh --bootstrap-server broker-0:9092,broker-1:9093 --describe --group car-group
GROUP | TOPIC | PARTITION | CURRENT-OFFSET | LOG-END-OFFSET | LAG | CONSUMER-ID | HOST | CLIENT-ID |
---|---|---|---|---|---|---|---|---|
car-group | cars | 1 | 1 | 1 | 0 | consumer-car-group-1-3cf14661-c5cc-43f6-8320-88ee3381e658 | /172.22.0.5 | consumer-car-group-1 |
car-group | cars | 0 | 1 | 1 | 0 | consumer-car-group-1-32e408ce-6ba3-4506-92e9-5f4827d083fe | /172.22.0.5 | consumer-car-group-1 |
The CURRENT-OFFSET of the consumer reading from cars-1 is now also 1, which indicates that it has processed the message.
From the above results, we can see that the messages are shared between the consumers, each reading from its own partition. In this run the two messages landed on different partitions; how messages without a key are spread across partitions depends on the producer's partitioning strategy.
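The offsets in these tables can be turned into a quick progress check: the number of unread messages in a partition is LOG-END-OFFSET minus CURRENT-OFFSET. The sketch below assumes the raw, whitespace-separated output of kafka-consumer-groups.sh with the GROUP column first, as in the last two tables. The function name partition_lag is illustrative.

```shell
# Reads raw kafka-consumer-groups.sh --describe output on stdin and prints,
# per partition, the committed offset and the number of unread messages.
partition_lag() {
  # Skip the header and any row whose offsets are not plain numbers
  # (for example a "-" when nothing has been committed yet).
  awk '$1 != "GROUP" && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
    printf "%s-%s consumed=%s lag=%d\n", $2, $3, $4, $5 - $4
  }'
}

# Usage: pipe the output of the consumer group command shown above into it:
#   <consumer group describe command> | partition_lag
```

A lag of 0 on every partition, as in the tables above, means the consumers have caught up with everything the producer has written so far.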
Chaos Engineering
What if the broker goes down?
So far so good. But, what if one of our brokers goes down? Such situations are very common in production environments. Since kafka is known for its fault tolerant behaviour, let’s take one of our brokers down & see how kafka handles the situation.
docker stop broker-0
This command stops broker-0, which can be verified with the docker container ls command. Now let's check the status of our cars topic.
docker run --network=hands-on_kafka-tier -ti bitnami/kafka:latest kafka-topics.sh --zookeeper zookeeper:2181 --describe --topic cars
Topic: cars PartitionCount: 2 ReplicationFactor: 2 Configs:
Topic: cars Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1
Topic: cars Partition: 1 Leader: 1 Replicas: 1,0 Isr: 1
Comparing this with the previous status of the cars topic, we can see that for Partition: 0 the leader has changed from Leader: 0 to Leader: 1, i.e. from broker-0 to broker-1. Since broker-0 is down, it is not in sync with the new leader (broker-1), and therefore the Isr has changed from 0,1 to 1. Thus the cluster keeps working even after one of the brokers is down, with the other broker acting as the sole leader for both partitions. You can check this further by writing more messages and verifying that they are processed by the consumers.
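The shrunken Isr also means that each partition now has only one up-to-date copy. A small sketch for spotting this from the describe output is given below. It assumes the field layout shown above, and the function name under_replicated is illustrative. It prints every partition whose Isr list is shorter than its Replicas list.

```shell
# Reads kafka-topics.sh --describe output on stdin and prints the partitions
# that currently have fewer in-sync replicas than assigned replicas.
under_replicated() {
  awk '$3 == "Partition:" {
    rep_n = split($8, reps, ",")    # number of assigned replicas
    isr_n = split($10, isrs, ",")   # number of in-sync replicas
    if (isr_n < rep_n)
      print $2 "-" $4 " leader=broker-" $6 " in-sync " isr_n "/" rep_n
  }'
}

# Usage: pipe the output of the describe command shown above into it:
#   <describe command> | under_replicated
```

kafka-topics.sh also offers an --under-replicated-partitions option for --describe, which reports the same condition directly from the tool.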
What if one of the consumers goes down?
Well, we saw the Kafka cluster keep working after one of the brokers went down. Now, let's add more chaos to the system. We take down one of the consumers and see whether Kafka can handle this failure.
docker stop consumer-0
This command stops consumer-0, which can be verified with the docker container ls command. Now let's check the status of the consumers.
docker run --network=hands-on_kafka-tier -ti bitnami/kafka:latest kafka-consumer-groups.sh --bootstrap-server broker-1:9093 --describe --group car-group
GROUP | TOPIC | PARTITION | CURRENT-OFFSET | LOG-END-OFFSET | LAG | CONSUMER-ID | HOST | CLIENT-ID |
---|---|---|---|---|---|---|---|---|
car-group | cars | 1 | 1 | 1 | 0 | consumer-car-group-1-32e408ce-6ba3-4506-92e9-5f4827d083fe | /172.22.0.5 | consumer-car-group-1 |
car-group | cars | 0 | 1 | 1 | 0 | consumer-car-group-1-32e408ce-6ba3-4506-92e9-5f4827d083fe | /172.22.0.5 | consumer-car-group-1 |
From the above result we can see that both partitions are now assigned to a single consumer. This shows that Kafka rebalances the partitions across the remaining consumers so that the consumer group keeps reading all messages of the topic. You can verify that the cluster still works by writing some more messages and seeing them processed by the remaining consumer.
Both the above demonstrations indicate the fault tolerant behaviour of kafka.
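The rebalance seen after stopping consumer-0 can be mimicked with a few lines of shell. This is a simplified sketch of a range-style assignment, not Kafka's actual implementation: it ignores member-id ordering and multiple topics. The function name assign_partitions is made up for illustration. It splits the partitions of one topic over the consumers that are still alive, with earlier consumers taking any remainder.

```shell
# Simplified range-style assignment for a single topic.
# Arguments: <topic> <number of partitions> <consumer> [<consumer> ...]
assign_partitions() {
  topic=$1
  num_partitions=$2
  shift 2
  num_consumers=$#
  if [ "$num_consumers" -eq 0 ]; then
    echo "no live consumers" >&2
    return 1
  fi
  per_consumer=$((num_partitions / num_consumers))
  remainder=$((num_partitions % num_consumers))
  next=0    # next partition number to hand out
  index=0   # position of the current consumer in the list
  for consumer in "$@"; do
    count=$per_consumer
    if [ "$index" -lt "$remainder" ]; then
      count=$((count + 1))   # earlier consumers absorb the remainder
    fi
    assigned=0
    while [ "$assigned" -lt "$count" ]; do
      echo "${topic}-${next} -> ${consumer}"
      next=$((next + 1))
      assigned=$((assigned + 1))
    done
    index=$((index + 1))
  done
}

assign_partitions cars 2 consumer-0 consumer-1   # both consumers alive
assign_partitions cars 2 consumer-1              # after consumer-0 stops
```

With two consumers each gets one partition, and with one consumer left it takes both, which matches what the consumer group status showed. Which consumer ends up with which partition in a real cluster may differ from this sketch.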
Conclusion
This post tried to give a practical explanation of the interactions between the architectural components within a Kafka cluster. In addition, the hands-on demonstrated the fault-tolerant behaviour of Kafka. Hopefully this gave you a good understanding of how things work under the hood. You now also have enough knowledge to set up your own Kafka cluster. Please share your feedback in the comments section below. In the upcoming post we'll go over the internals of Apache Kafka. Stay tuned and happy coding!