Monitoring Kafka

How to monitor Kafka?

Kafka uses Yammer Metrics to report metrics from both the brokers and the clients. The table below lists the key broker metrics and the values to expect from a healthy cluster.

| Description | Normal value |
| --- | --- |
| Number of under-replicated partitions | 0 |
| Whether the controller is active on the broker | Only one broker in the cluster should report 1 |
| Leader election rate | Non-zero when there are broker failures |
| Unclean leader election rate | 0 |
| Partition counts | Mostly even across brokers |
| Leader replica counts | Mostly even across brokers |
| ISR shrink rate | Normally 0, as is the ISR expansion rate. If a broker goes down, the ISR of some partitions shrinks. Once that broker is up again and its replicas have caught up, the ISR expands again. |
| ISR expansion rate | Same as above |
| Max lag in messages between follower and leader replicas | Lag should be proportional to the maximum batch size of a produce request |
| Lag in messages per follower replica | Lag should be proportional to the maximum batch size of a produce request |
| Requests waiting in the producer purgatory | Non-zero if acks=-1 is used |
| Time the request waits for the follower | Non-zero for produce requests when acks=-1 |
| Average fraction of time the network processors are idle | Between 0 and 1, ideally greater than 0.3 |
| Average fraction of time the request handler threads are idle | Between 0 and 1, ideally greater than 0.3 |
| Quota metrics per client-id | Two attributes: throttle-time is the time for which the client was throttled (ideally 0), and byte-rate is the rate at which the client produces or consumes data, in bytes/sec |
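
The normal values listed above lend themselves to an automated check. The following is a minimal Python sketch, assuming the metric values have already been collected into a dictionary; the dictionary keys and function names are hypothetical and are not Kafka MBean names.

```python
from typing import Dict, List


def check_broker_metrics(m: Dict[str, float]) -> List[str]:
    """Return warnings for values outside the normal ranges listed above.

    The keys used here are hypothetical labels for this sketch,
    not real Kafka metric or MBean names.
    """
    warnings = []
    if m.get("under_replicated_partitions", 0) != 0:
        warnings.append("under-replicated partitions present")
    if m.get("unclean_leader_election_rate", 0) != 0:
        warnings.append("unclean leader elections occurred")
    # Idle fractions lie between 0 and 1 and should ideally exceed 0.3.
    for key in ("network_processor_idle", "request_handler_idle"):
        idle = m.get(key, 1.0)
        if idle <= 0.3:
            warnings.append(f"{key} is {idle:.2f}, expected above 0.3")
    # A throttle time above 0 means the client hit its quota.
    if m.get("throttle_time_ms", 0) > 0:
        warnings.append("client is being throttled")
    return warnings


def exactly_one_controller(active_controller_counts: List[int]) -> bool:
    """Only one broker in the cluster should report an active controller."""
    return sum(active_controller_counts) == 1


if __name__ == "__main__":
    snapshot = {"under_replicated_partitions": 2, "network_processor_idle": 0.1}
    for warning in check_broker_metrics(snapshot):
        print(warning)
    print(exactly_one_controller([0, 1, 0]))
```

Leader election rate and ISR shrink rate are left out on purpose: they are expected to be non-zero during broker failures, so alerting on them needs context that a single snapshot does not carry.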

New Producer Monitoring

| Description | Metric name |
| --- | --- |
| Number of user threads blocked waiting for buffer memory to enqueue their records | waiting-threads |
| Maximum amount of buffer memory the client can use | buffer-total-bytes |
| Total amount of buffer memory that is currently not in use | buffer-available-bytes |
| Fraction of time an appender waits for space allocation | bufferpool-wait-time |
| Average number of bytes sent per partition per request | batch-size-avg |
| Maximum number of bytes sent per partition per request | batch-size-max |
| Average compression rate of record batches | compression-rate-avg |
| Average time in ms that records spend queued in the producer | record-queue-time-avg |
| Maximum time in ms that records spend queued in the producer | record-queue-time-max |
| Average number of retried record sends per second | record-retry-rate |
| Average number of record sends per second that resulted in errors | record-error-rate |
| Maximum record size | record-size-max |
| Average record size | record-size-avg |
| Age in seconds of the current producer metadata | metadata-age |
| Connections closed per second | connection-close-rate |
| New connections established per second | connection-creation-rate |
| Average number of network operations (reads or writes) per second on all connections | network-io-rate |
| Average number of outgoing bytes sent per second to all servers | outgoing-byte-rate |
| Average number of requests sent per second | request-rate |
| Average size of all requests sent | request-size-avg |
| Maximum size of any request sent | request-size-max |
| Bytes read per second from all sockets | incoming-byte-rate |
| Responses received per second | response-rate |
| Number of times per second the I/O layer checked for new I/O to perform | select-rate |
| Fraction of time the I/O thread spent waiting | io-wait-ratio |
| Average length of time for I/O per select call, in nanoseconds | io-time-ns-avg |
| Fraction of time the I/O thread spent doing I/O | io-ratio |
| Current number of active connections | connection-count |
| Average number of bytes sent per second to a node | outgoing-byte-rate |
| Average number of requests sent per second to a node | request-rate |
| Average size of the requests sent to a node | request-size-avg |
| Maximum size of any request sent to a node | request-size-max |
| Average number of bytes received per second from a node | incoming-byte-rate |
| Average request latency for a node | request-latency-avg |
| Maximum request latency for a node | request-latency-max |
| Responses received per second from a node | response-rate |
| Average number of records sent per second to a topic | record-send-rate |
| Average number of bytes sent per second to a topic | byte-rate |
| Average compression rate of record batches for a topic | compression-rate |
| Average number of retried record sends per second for a topic | record-retry-rate |
| Average number of record sends per second to a topic that resulted in errors | record-error-rate |
| Maximum time in ms a request was throttled by a broker | produce-throttle-time-max |
| Average time in ms a request was throttled by a broker | produce-throttle-time-avg |
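
A few of the producer metrics above combine into simple health signals. The sketch below assumes a snapshot of metric values keyed by the metric names from the table; the function name and the 0.9 buffer-usage threshold are illustrative assumptions, not Kafka defaults.

```python
from typing import Dict, List


def producer_warnings(m: Dict[str, float], max_buffer_use: float = 0.9) -> List[str]:
    """Flag common producer problems from a snapshot of metric values.

    Keys are the metric names from the table; max_buffer_use is an
    assumed threshold chosen for this sketch.
    """
    warnings = []
    total = m["buffer-total-bytes"]
    available = m["buffer-available-bytes"]
    # Share of the buffer in use, derived from the two buffer metrics.
    used = 1.0 - available / total if total > 0 else 0.0
    if used > max_buffer_use:
        warnings.append(f"buffer {used:.0%} full")
    if m.get("waiting-threads", 0) > 0:
        warnings.append("threads blocked waiting for buffer memory")
    if m.get("record-error-rate", 0.0) > 0:
        warnings.append("record sends are failing")
    if m.get("produce-throttle-time-avg", 0.0) > 0:
        warnings.append("broker is throttling produce requests")
    return warnings


if __name__ == "__main__":
    snapshot = {
        "buffer-total-bytes": 100,
        "buffer-available-bytes": 25,
        "waiting-threads": 3,
    }
    for warning in producer_warnings(snapshot, max_buffer_use=0.5):
        print(warning)
```

A buffer that stays nearly full together with a non-zero waiting-threads count suggests the application produces records faster than the producer can send them.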

For a consumer to be healthy, the max lag should stay below a threshold and the fetch rate should always be greater than 0.
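
The consumer rule can be written as a one-line check. This is a minimal sketch: the function and parameter names are hypothetical, and the lag threshold is an assumption that each deployment has to choose for itself.

```python
def consumer_is_healthy(max_lag: float, fetch_rate: float, lag_threshold: float) -> bool:
    """Healthy means lag is under the chosen threshold and the consumer is still fetching."""
    return max_lag < lag_threshold and fetch_rate > 0


if __name__ == "__main__":
    # The threshold of 1000 messages is an arbitrary value for illustration.
    print(consumer_is_healthy(max_lag=120, fetch_rate=4.5, lag_threshold=1000))
```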

About the Author

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.