Tuning Cassandra Performance

By Abhijit | Last updated on November 18, 2024

Cassandra Performance Tuning: Methodologies

Cassandra performance can be tuned along several dimensions. Some of them are described below:
Write Operations:
Keep the commit log and the data directories (SSTables) on different disks. The commit log is written sequentially; if SSTables share a drive with the commit log, I/O contention between the two may degrade both commit log writes and SSTable reads.
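
One way to sanity-check this separation is to compare the device IDs of the two directories. The following is a minimal Python sketch. The helper name on_same_device and the example paths are assumptions for illustration and are not part of Cassandra; substitute the directories from your own configuration.

```python
import os


def on_same_device(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same filesystem device ID.

    Hypothetical helper: compares the st_dev field returned by os.stat().
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Example paths are assumptions; use the directories you configured.
    commitlog_dir = "/var/lib/cassandra/commitlog"
    data_dir = "/var/lib/cassandra/data"
    if os.path.isdir(commitlog_dir) and os.path.isdir(data_dir):
        if on_same_device(commitlog_dir, data_dir):
            print("Commit log and data directories share a device")
        else:
            print("Commit log and data directories report different devices")
```

Note that different device IDs only show that the directories are on different filesystems. Two partitions of one physical disk would also pass this check, so the physical layout still needs to be confirmed separately.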

Read Operations:
A good rule of thumb is 4 concurrent_reads per processor core. The value may be increased for systems with fast I/O storage.
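
The rule of thumb above is simple arithmetic, sketched here in Python. The function name suggested_concurrent_reads is hypothetical, and the result is only a starting point to be adjusted for the storage in use.

```python
import os


def suggested_concurrent_reads(cores: int, per_core: int = 4) -> int:
    """Apply the rule of thumb: per_core concurrent reads per processor core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * per_core


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    print(f"concurrent_reads: {suggested_concurrent_reads(cores)}")
```

For an 8-core machine this gives 32; passing a larger per_core value models the higher setting suggested for fast I/O storage.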

Cassandra Compaction Contention:
Reduce the frequency of memtable flushes by increasing the memtable size or by preventing premature flushing. Less frequent flushes produce fewer SSTable files and therefore less compaction. Less compaction reduces SSTable I/O contention and so improves read performance. Bigger memtables also absorb more overwrites for updates to the same keys, and therefore accommodate more read/write operations between flushes.

Memory Cache:
Do not increase the Cassandra cache size unless there is enough physical memory (RAM) to hold it. Avoid memory swapping at all costs.

Row Cache:
The row cache holds the entire content of a row in memory, so reads are served from the cache instead of from disk. It works well when column data is small enough that the cache can hold most of the hot data. It works poorly when column data is so large that the cache cannot hold most of the hot data, and it is also a poor fit for workloads with a high write/read ratio. It is off by default. If the hit ratio is below 30%, the row cache should be disabled.
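
The 30% guideline can be expressed as a small check. This is a minimal Python sketch that assumes the hit and request counts have already been collected from your monitoring; the function names are hypothetical.

```python
def row_cache_hit_ratio(hits: int, requests: int) -> float:
    """Return the fraction of row cache requests that were served from cache."""
    if requests <= 0:
        raise ValueError("requests must be positive to compute a hit ratio")
    return hits / requests


def should_disable_row_cache(hits: int, requests: int,
                             threshold: float = 0.30) -> bool:
    """Return True when the hit ratio is below the threshold (30% by default)."""
    return row_cache_hit_ratio(hits, requests) < threshold


if __name__ == "__main__":
    # Illustrative counts only, not measurements.
    print(should_disable_row_cache(hits=250, requests=1000))
```

With 250 hits out of 1000 requests the ratio is 25%, which falls below the threshold, so the sketch recommends disabling the row cache.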

Key Cache Tuning:
The key cache holds, in memory, the location of data for each column family. It is effective when there are hot spots in the data and the row cache cannot be used effectively because the columns are large. By default, Cassandra caches 200000 keys per column family. Specify keys_cached as an absolute number rather than a percentage.

JVM:
The minimum and maximum Java heap sizes should both be set to half of the available physical memory. The young generation should be 1/4 of the Java heap. Do NOT increase these sizes without confirming that enough physical memory is available; always reserve memory for the OS file cache.
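
The sizing rule above can be worked through as follows. This is a minimal Python sketch: the function name jvm_heap_flags is hypothetical, and it assumes the sizes are passed to the JVM through the standard -Xms, -Xmx and -Xmn options. Where those options are set depends on the Cassandra version and is not covered here.

```python
from typing import List


def jvm_heap_flags(physical_memory_mb: int) -> List[str]:
    """Derive heap flags from the rule: heap = RAM / 2, young gen = heap / 4."""
    if physical_memory_mb < 8:
        raise ValueError("physical_memory_mb is too small to split")
    heap_mb = physical_memory_mb // 2   # minimum and maximum heap are equal
    young_mb = heap_mb // 4             # young generation is 1/4 of the heap
    return [f"-Xms{heap_mb}M", f"-Xmx{heap_mb}M", f"-Xmn{young_mb}M"]


if __name__ == "__main__":
    # 16 GB of physical memory, used purely as an illustration.
    print(" ".join(jvm_heap_flags(16384)))
```

For a machine with 16384 MB of RAM this yields an 8192 MB heap and a 2048 MB young generation, leaving the other half of memory for the OS file cache.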

About the Author

Abhijit

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.