
Kafka Configuration

Kafka Configuration Types

Kafka is configured through key-value properties. The configuration can be supplied either from a properties file or programmatically, so it is either taken from a default file or built up in code.
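
As a rough sketch of these two approaches (the file name and property values below are only illustrative), the same settings can either be loaded from a properties file or assembled in code:

import java.io.FileInputStream;
import java.util.Properties;

public class KafkaConfigLoading {
    public static void main(String[] args) throws Exception {
        // Option 1: load the configuration from a properties file
        Properties fromFile = new Properties();
        try (FileInputStream in = new FileInputStream("producer.properties")) {
            fromFile.load(in);
        }

        // Option 2: build the same configuration programmatically
        Properties inCode = new Properties();
        inCode.put("bootstrap.servers", "localhost:9092");
        inCode.put("acks", "all");
    }
}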

Some configurations have both a default global setting and Topic-level overrides. The Topic-level properties use a CSV format (e.g., “xyz.per.Topic=Topic1:value1,Topic2:value2”), and for the Topics listed there the override replaces the default value.

  • Broker Configs
  • Producer Configs
  • Consumer Configs
    • Old Consumer Configs
    • New Consumer Configs
  • Kafka Connect Configs

1. Broker Configs

The important configurations are the following:

  • broker.id
  • log.dirs
  • zookeeper.connect
Property Description
broker.id A unique integer id for each broker, which identifies the broker within the cluster.
log.dirs A comma-separated list of directories in which the log data is stored. Each new partition is placed in the directory that currently holds the fewest partitions.
zookeeper.connect The ZooKeeper connection string, specified as hostname:port pairs, that the broker uses to connect to the ZooKeeper ensemble.
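
As a minimal sketch, a broker's server.properties could set these three properties as follows (the id, paths, and hosts are illustrative):

broker.id=0
log.dirs=/var/kafka-logs-1,/var/kafka-logs-2
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181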

 

2. Producer Configs

The important producer configurations are the following:

Name Explanation
bootstrap.servers A list of host/port pairs used to establish the initial connection to the Kafka cluster. The client uses these servers to discover the full set of brokers.
key.serializer The serializer class for record keys; it must implement the Serializer interface.
value.serializer The serializer class for record values, implementing the same interface.
acks When set to 0, the producer does not wait for any acknowledgement. When set to 1, the leader acknowledges without waiting for the other replicas. When set to all, the leader waits until all in-sync replicas have acknowledged the record, which gives the strongest guarantee.
buffer.memory The total memory the producer can use to buffer records waiting to be sent to the server. If records are produced faster than they can be delivered, the producer blocks. Not all of this memory is used for buffering; part of it is reserved for compression and in-flight requests.
compression.type The compression type for all data generated by the producer. Compression works on full batches of data, so better batching improves the compression ratio. Valid values include none, gzip, and snappy.
retries If sending a record fails, the client resends it when retries is set to a value greater than zero. Use this carefully: if the producer sends two records and the first fails and is retried, the first may be delivered after the second, changing the ordering.
ssl.key.password The password of the private key in the key store file.
ssl.keystore.location The location of the key store file.
ssl.keystore.password The store password for the key store file; it is needed only when ssl.keystore.location is configured.
ssl.truststore.location The location of the trust store file.
ssl.truststore.password The password for the trust store file.
batch.size The producer attempts to batch records headed for the same partition into fewer, larger requests, which helps the performance of both the servers and the clients. The size should be chosen carefully: a batch that is too small can reduce throughput, while one that is larger than required wastes memory.
client.id An id string passed to the broker with every request. It is sent to the servers so that they can track the source of requests.
connections.max.idle.ms Connections that have been idle for this long are closed so that no resources are wasted on them.
max.block.ms How long the producer may block, for example when the buffer memory is full or metadata is unavailable, before giving up with an error.
max.request.size The maximum size of a request, which also caps how big a single record can be. The producer makes sure its requests do not exceed this limit.
partitioner.class A class that implements the Partitioner interface, used to decide which partition each record is sent to.
receive.buffer.bytes The size of the TCP receive buffer used when reading data.
request.timeout.ms The maximum amount of time the client waits for the response to a request. If the response does not arrive before the timeout elapses, the client can resend the request, or fail it once the retries are exhausted.
sasl.kerberos.service.name The Kerberos principal name under which Kafka runs, as defined in the JAAS configuration.
security.protocol The security protocol used to communicate with the brokers, such as PLAINTEXT, SSL, or SASL_SSL.
send.buffer.bytes The size of the TCP send buffer that holds the data the producer sends to the broker.
ssl.enabled.protocols The list of protocols enabled for SSL connections.
ssl.keystore.type The file format of the key store file.
ssl.protocol The SSL protocol used to generate the SSLContext. TLS is the default; TLSv1.1 and TLSv1.2 are also valid, while SSL, SSLv2, and SSLv3 may work in older virtual machines but are not recommended.
ssl.provider The name of the security provider used for SSL connections. Its default value is the default security provider of the virtual machine.
ssl.truststore.type The file format of the trust store file.
timeout.ms The maximum time the broker waits for acknowledgements from the followers to meet the requirement specified by acks. If the required acknowledgements are not received when the timeout elapses, an error is returned.
metadata.fetch.timeout.ms Before sending data to a Topic for the first time, the producer must fetch the Topic's metadata; this property caps how long that fetch may block. When blocking is not acceptable, set it to zero so that an error is returned immediately instead.
metadata.max.age.ms Before sending data, the producer uses the Topic metadata to find which broker hosts each partition. This property forces a periodic refresh of that metadata so that stale information is not used.
metric.reporters A list of classes to use as metrics reporters. Plugging in such classes allows them to be notified whenever a new metric is created.
metrics.num.samples The number of samples maintained to compute metrics.
metrics.sample.window.ms The window of time over which each metrics sample is computed.
reconnect.backoff.ms The amount of time to wait before attempting to reconnect to a given host, which avoids reconnecting in a tight loop.
retry.backoff.ms The amount of time to wait before attempting to retry a failed fetch request to a given Topic partition
sasl.kerberos.min.time.before.relogin The login thread's sleep time between refresh attempts.
sasl.kerberos.ticket.renew.jitter The percentage of random jitter added to the ticket renewal time.
sasl.kerberos.ticket.renew.window.factor The login thread sleeps until this factor of the ticket's lifetime has been reached, at which point it attempts to renew the ticket.
ssl.cipher.suites A list of cipher suites, i.e., combinations of authentication, encryption, MAC, and key-exchange algorithms, used to negotiate the security settings of a TLS/SSL connection. By default, all available cipher suites are supported.
ssl.endpoint.identification.algorithm The algorithm used to validate the server hostname against the server certificate.
ssl.keymanager.algorithm The algorithm used by the key manager factory for SSL connections.
ssl.trustmanager.algorithm The algorithm used by the trust manager factory for SSL connections. Its default value is the trust manager factory algorithm of the virtual machine.
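
Putting a few of these properties together, a minimal producer sketch might look like the following (the broker address, Topic name, and values are illustrative, and error handling is omitted):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // initial connection to the cluster
        props.put("acks", "all");                          // wait for all in-sync replicas
        props.put("retries", 3);                           // resend failed records
        props.put("batch.size", 16384);                    // batch size per partition, in bytes
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Send a single record and close the producer, flushing any buffered records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key1", "value1"));
        }
    }
}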


3. Consumer Configs

The essential consumer configurations are the following:

  • group.id
  • zookeeper.connect
group.id All the processes belonging to the same consumer group share a single group id. Setting the same id on several processes means that they all belong to the same group.
zookeeper.connect The ZooKeeper connection string, listing the hosts and ports of the ZooKeeper nodes. The connection string can also include a chroot path, which keeps the data under a separate path; the same chroot path must be used by the consumer as well as the producer. When a chroot path is used, the string has the form hostname1:port1,hostname2:port2/chroot/path.
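
A minimal consumer.properties sketch for this ZooKeeper-based consumer, using placeholder hosts and a placeholder group name, would then look like:

group.id=my-consumer-group
zookeeper.connect=hostname1:port1,hostname2:port2/chroot/path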

The producer configuration takes care of compression, synchronous versus asynchronous sending, and batch sizes, while the consumer configuration takes care of fetch sizes.

An example production server (broker) configuration is as follows:

#Replication configurations

num.replica.fetchers=4
replica.fetch.max.bytes=1048576
replica.fetch.wait.max.ms=500
replica.high.watermark.checkpoint.interval.ms=5000
replica.socket.timeout.ms=30000
replica.socket.receive.buffer.bytes=65536
replica.lag.time.max.ms=10000
controller.socket.timeout.ms=30000
controller.message.queue.size=10

#Log configuration

num.partitions=8
message.max.bytes=1000000
auto.create.topics.enable=true
log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
log.retention.hours=168
log.flush.interval.ms=10000
log.flush.interval.messages=20000
log.flush.scheduler.interval.ms=2000
log.roll.hours=168
log.retention.check.interval.ms=300000
log.segment.bytes=1073741824

#ZK configuration

zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000

# Socket server configuration

num.io.threads=8
num.network.threads=8
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100

