What is Apache Kafka?
Apache Kafka is an open-source, community-developed distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log.
Installation
You must have at least 4 GB of RAM in your Ubuntu VM.
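If you are not sure how much memory the VM has, a quick check along these lines may help. It is only a sketch: it assumes a Linux system that exposes /proc/meminfo, and the 3.5 GiB warning threshold is my own allowance for the memory a "4 GB" VM typically loses to the kernel, not a Kafka requirement.

```shell
# Read total memory (in kB) from /proc/meminfo and convert it to GiB.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_gib=$(awk -v kb="$mem_kb" 'BEGIN {printf "%.1f", kb / 1024 / 1024}')
echo "Total memory: ${mem_gib} GiB"

# A VM configured with 4 GB usually reports slightly less, so warn below 3.5 GiB.
# The 3.5 GiB cutoff (3670016 kB) is an assumption, not a Kafka requirement.
if [ "$mem_kb" -lt 3670016 ]; then
    echo "Warning: less than the recommended 4 GB of RAM"
fi
```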
Step 1. Creating a User for Kafka
Because Kafka can handle requests over a network, your first step is to create a dedicated user for the service. This minimizes damage to your Ubuntu machine in the event that someone compromises the Kafka server. We will create a dedicated kafka user in this step.
Logged in as your non-root sudo user, create a user called kafka:
sudo adduser kafka
Follow the prompts to set a password and create the kafka user.
Next, add the kafka user to the sudo group with the adduser command. You need these privileges to install Kafka’s dependencies:
sudo adduser kafka sudo
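To confirm that the group change took effect, you can inspect the user's group list. The helper below is a small sketch; the function name in_group is made up for this example and is not part of any tool.

```shell
# in_group USER GROUP: succeed if USER is a member of GROUP.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Check the kafka user, if it exists on this machine.
if id kafka >/dev/null 2>&1; then
    if in_group kafka sudo; then
        echo "kafka is in the sudo group"
    else
        echo "kafka is NOT in the sudo group"
    fi
fi
```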
Your kafka user is now ready. Log into the account using su:
su -l kafka
Now that you’ve created a Kafka-specific user, you are ready to download and extract the Kafka binaries.
Step 2. Downloading and Extracting the Kafka Binaries
Let’s download and extract the Kafka binaries into dedicated folders in our kafka user’s home directory.
To start, create a directory in /home/kafka called Downloads to store your downloads:
mkdir ~/Downloads
Use curl to download the Kafka binaries:
curl "https://downloads.apache.org/kafka/2.6.2/kafka_2.13-2.6.2.tgz" -o ~/Downloads/kafka.tgz
Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation:
mkdir ~/kafka && cd ~/kafka
Extract the archive you downloaded using the tar command:
tar -xvzf ~/Downloads/kafka.tgz --strip 1
We specify the --strip 1 flag to ensure that the archive’s contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.13-2.6.2/) inside of it.
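If the effect of the flag is unclear, you can try it on a throwaway archive first. The sketch below uses the flag's full spelling, --strip-components=1, and every file and folder name in it (demo-1.0, bin/run.sh) is invented for the demonstration.

```shell
# Build a throwaway archive with a single top-level folder, like the Kafka tarball.
# All names here (demo-1.0, bin/run.sh) are made up for the demonstration.
work=$(mktemp -d)
mkdir -p "$work/src/demo-1.0/bin"
echo 'echo hello' > "$work/src/demo-1.0/bin/run.sh"
tar -czf "$work/demo.tgz" -C "$work/src" demo-1.0

# Without stripping, the top-level folder is recreated inside the target.
mkdir "$work/plain" && tar -xzf "$work/demo.tgz" -C "$work/plain"
ls "$work/plain"        # demo-1.0

# With --strip-components=1, the folder's contents land directly in the target.
mkdir "$work/stripped" && tar -xzf "$work/demo.tgz" -C "$work/stripped" --strip-components=1
ls "$work/stripped"     # bin
```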
Now that we’ve downloaded and extracted the binaries successfully, we can start configuring our Kafka server.
Step 3. Configuring the Kafka Server
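A Kafka topic is the category, group, or feed name to which messages can be published. Whether topics can be deleted is controlled by the delete.topic.enable setting. Older Kafka releases disabled it by default, while releases from 1.0 onward enable it, so setting it explicitly in the configuration file guarantees the behavior you want.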
Kafka’s configuration options are specified in server.properties. Open this file with nano or your favorite editor:
nano ~/kafka/config/server.properties
First, add a setting that will allow us to delete Kafka topics. Add the following to the bottom of the file:
delete.topic.enable = true
Second, change the directory where the Kafka logs are stored by modifying the log.dirs property:
log.dirs=/home/kafka/logs
Save and close the file. Now that you’ve configured Kafka, your next step is to create systemd unit files for running and enabling the Kafka server on startup.
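If you would rather script these two edits than make them in an editor, something like the following could work. It is a sketch under assumptions: set_prop is a hypothetical helper written for this example, it relies on GNU sed (the default on Ubuntu), and it is shown against a scratch file rather than the real server.properties.

```shell
# set_prop FILE KEY VALUE: replace the KEY= line if present, otherwise append it.
# Note: KEY is used as a regular expression, so the dots in it match any character.
set_prop() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}[[:space:]]*=" "$file"; then
        sed -i "s|^${key}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Try it on a scratch copy first; point it at ~/kafka/config/server.properties for real use.
props=$(mktemp)
printf 'broker.id=0\nlog.dirs=/tmp/kafka-logs\n' > "$props"
set_prop "$props" delete.topic.enable true
set_prop "$props" log.dirs /home/kafka/logs
cat "$props"
```

Because the helper replaces an existing line instead of always appending, running it twice leaves a single log.dirs entry in the file.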
Step 4. Creating Systemd Unit Files and Starting the Kafka Server
In this section, you will create systemd unit files for the Kafka service. This will help you perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.
ZooKeeper is a service that Kafka uses to manage its cluster state and configurations. It is used in many distributed systems. If you would like to know more about it, visit the official ZooKeeper docs.
Create the unit file for zookeeper:
sudo nano /etc/systemd/system/zookeeper.service
Enter the following unit definition into the file:
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
The kafka service needs a unit file of its own at /etc/systemd/system/kafka.service. Its [Unit] section should specify that the unit depends on zookeeper.service, which ensures that zookeeper gets started automatically when the kafka service starts.
Its [Service] section should specify that systemd use the kafka-server-start.sh and kafka-server-stop.sh shell files for starting and stopping the service, and that Kafka be restarted if it exits abnormally.
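A kafka.service unit matching that description might look like the one below. Treat it as a reconstruction, not the author's exact file: it mirrors the zookeeper unit above, and the ExecStart line is taken from the command that appears in the systemctl status output in this step. The snippet writes a draft into the current directory so you can review it before installing.

```shell
# Write a draft unit file to the current directory for review.
cat > kafka.service <<'EOF'
[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
EOF

# Once it looks right, install it and let systemd re-read its unit files:
#   sudo cp kafka.service /etc/systemd/system/kafka.service
#   sudo systemctl daemon-reload
```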
Now that you have defined the units, start Kafka with the following command:
sudo systemctl start kafka
To ensure that the server has started successfully, check the status of the kafka unit:
sudo systemctl status kafka
● kafka.service
     Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-02-10 00:09:38 UTC; 1min 58s ago
   Main PID: 55828 (sh)
      Tasks: 67 (limit: 4683)
     Memory: 315.8M
     CGroup: /system.slice/kafka.service
             ├─55828 /bin/sh -c /home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1
             └─55829 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=>
You now have a Kafka server listening on port 9092.
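You can double-check the port with ss, which lists listening sockets. The has_listener helper below is a hypothetical convenience written for this example; it simply filters the output of ss -ltn for a local address ending in the given port.

```shell
# has_listener PORT: read `ss -ltn` output on stdin and succeed if any
# TCP socket is listening on PORT (IPv4 or IPv6, any address).
has_listener() {
    awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") {found = 1} END {exit !found}'
}

if command -v ss >/dev/null 2>&1; then
    if ss -ltn | has_listener 9092; then
        echo "something is listening on port 9092"
    else
        echo "nothing is listening on port 9092"
    fi
fi
```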
You have started the kafka service. But if you rebooted your server, Kafka would not restart automatically. To enable the kafka service on server boot, run the following commands:
sudo systemctl enable zookeeper
sudo systemctl enable kafka
In this step, you started and enabled the kafka and zookeeper services.
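A quick way to check the installation is to publish a message and read it back with the command-line tools that ship in Kafka's bin directory. The following is a sketch, not part of the original steps: the KAFKA_HOME variable and the topic name TutorialTopic are assumptions, and it expects the broker from the previous step to be reachable on localhost:9092.

```shell
# Smoke test: create a topic, publish one message, read it back.
# KAFKA_HOME and the topic name TutorialTopic are assumptions; adjust to taste.
KAFKA_HOME=${KAFKA_HOME:-$HOME/kafka}
BROKER=localhost:9092
TOPIC=TutorialTopic

if [ -x "$KAFKA_HOME/bin/kafka-topics.sh" ]; then
    "$KAFKA_HOME/bin/kafka-topics.sh" --create --bootstrap-server "$BROKER" \
        --replication-factor 1 --partitions 1 --topic "$TOPIC"
    echo "Hello, World" | "$KAFKA_HOME/bin/kafka-console-producer.sh" \
        --bootstrap-server "$BROKER" --topic "$TOPIC" > /dev/null
    # Stop after one message, or after 10 seconds if nothing arrives.
    "$KAFKA_HOME/bin/kafka-console-consumer.sh" --bootstrap-server "$BROKER" \
        --topic "$TOPIC" --from-beginning --max-messages 1 --timeout-ms 10000
else
    echo "Kafka scripts not found under $KAFKA_HOME"
fi
```

If everything is working, the consumer should print the message the producer sent.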
That’s it, you have successfully installed Apache Kafka on your Ubuntu Server and you can start using it!