Cassandra Performance Tuning: Methodologies
There are multiple dimensions where Cassandra performance can be tuned. Some of them are described below:
Commit log and data dirs (sstables) should be on different disks. Commit log uses sequential write however, if SSTables share the same drive with commit log , I/O contention between commit log & SSTables may deteriorate commit log writes and SSTable reads.
A good rule of thumb is 4 concurrent_reads per processor core. May increase the value for systems with fast I/O storage.
Cassandra Compaction Contention:
Reduce the frequency of memtable flush by increasing the memtable size or preventing too pre-mature flushing. Less frequent memtable flush results in fewer SSTables files and less compaction. Fewer compaction reduces SSTables I/O contention, and therefore improves read operations. Bigger memtables absorb more overwrites for updates to the same keys, and therefore accommodating more read/write operations between each flushes.
Do not increase Cassandra cache size unless there is enough physical memory (RAM). Avoid memory swapping at any cost.
The row cache holds the entire content of a row in memory. It provides data caching instead of reading data from the disk. good if column’s data is small so the cache is big enough to hold most of the hotspot data. Bad if column’s data is too large so the cache is not big enough to hold most of the hotspot data. It’s bad for high write/read ratios. By default, it is off. If hit ratio is below 30%, row cache should be disabled.
Key Cache Tuning:
The key cache holds the location of data in memory for each column family. Its Effective if there are hot data spot & cannot use row cache effectively because of the large column size. By default, Cassandra caches 200000 keys per column family. Use absolute number for keys_cached instead of percentage.
Minimum and Maximum Java Heap Size should be half of available physical memory. Size of young generation heap should be 1/4 of Java Heap. Do NOT increase the size without confirming there are enough available physical memory- Always reserves memory for OS File cache.
A detailed understanding of Apache Cassandra is available in this blog post for your perusal!