Tuning Cassandra Performance

Cassandra Performance Tuning: Methodologies

Cassandra performance can be tuned along several dimensions. Some of them are described below:
Write Operations:
Keep the commit log and the data directories (SSTables) on different disks. The commit log is written sequentially; if SSTables share the same drive with it, I/O contention between the two can slow down both commit log writes and SSTable reads.
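
As a rough check of the advice above, the following Python sketch compares the device IDs of two directories. The paths in the usage section are hypothetical placeholders, not values taken from this article, and the function name is an assumption made for illustration. Note that a matching device ID only shows the paths are on the same filesystem; two partitions of one physical disk would still report different IDs, so treat the result as a hint rather than proof.

```python
import os


def share_device(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical locations; substitute the directories configured on your node.
    commitlog_dir = "/var/lib/cassandra/commitlog"
    data_dir = "/var/lib/cassandra/data"

    if os.path.isdir(commitlog_dir) and os.path.isdir(data_dir):
        if share_device(commitlog_dir, data_dir):
            print("Commit log and data directories share a device.")
        else:
            print("Commit log and data directories are on different devices.")
    else:
        print("Example paths not found; adjust them for your installation.")
```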

Read Operations:
A good rule of thumb is 4 concurrent_reads per processor core. The value may be increased on systems with fast I/O storage.
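
The rule of thumb above is simple arithmetic. A minimal Python sketch, assuming the core count reported by the operating system is the one that matters for the node (the function name is hypothetical):

```python
import os

READS_PER_CORE = 4  # rule of thumb quoted in the text


def suggested_concurrent_reads(cores: int, per_core: int = READS_PER_CORE) -> int:
    """Return a starting value for concurrent_reads given a core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * per_core


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() may return None
    print(f"cores={cores}, suggested concurrent_reads={suggested_concurrent_reads(cores)}")
```

The result is only a starting point; as noted above, nodes with fast storage may tolerate a higher value.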

Cassandra Compaction Contention:
Reduce the frequency of memtable flushes by increasing the memtable size or by preventing premature flushing. Less frequent flushes produce fewer SSTable files and therefore less compaction. Less compaction reduces SSTable I/O contention, which improves read operations. Bigger memtables also absorb more overwrites to the same keys, so they accommodate more read/write operations between flushes.
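
The effect of memtable size on flush frequency can be illustrated with a toy model in Python. This is not how Cassandra is implemented: the sketch assumes a memtable that flushes once it holds a fixed number of distinct keys, whereas a real memtable is flushed based on memory use and other thresholds. It only shows why overwrites to the same keys are absorbed by a larger memtable.

```python
def count_flushes(write_keys, memtable_capacity):
    """Count flushes in a toy model where the memtable is flushed as soon
    as it holds `memtable_capacity` distinct keys (illustrative assumption)."""
    if memtable_capacity < 1:
        raise ValueError("memtable_capacity must be at least 1")
    memtable = {}
    flushes = 0
    for sequence, key in enumerate(write_keys):
        memtable[key] = sequence  # an overwrite replaces the old value in place
        if len(memtable) >= memtable_capacity:
            flushes += 1          # each flush would produce one SSTable
            memtable.clear()
    return flushes


if __name__ == "__main__":
    # 100 writes that keep updating the same 10 keys.
    writes = [i % 10 for i in range(100)]
    for capacity in (5, 20):
        print(f"capacity={capacity}: {count_flushes(writes, capacity)} flushes")
```

In this model the small memtable flushes repeatedly, while the larger one holds all of the hot keys and never fills up, so fewer SSTables would be left for compaction.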

Memory Cache:
Do not increase the Cassandra cache size unless there is enough physical memory (RAM). Avoid memory swapping at all costs.

Row Cache:
The row cache holds the entire content of a row in memory, so reads are served from the cache instead of from disk. It works well when the column data is small enough for the cache to hold most of the hot data. It works poorly when the column data is too large for the cache to hold most of the hot data, and it is a poor fit for workloads with a high write/read ratio. It is off by default. If the hit ratio is below 30%, the row cache should be disabled.
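
The 30% guideline above can be expressed as a small check. This Python sketch assumes the hit and request counts have already been collected from the node's cache statistics; how they are obtained is outside its scope, and the function names are hypothetical.

```python
ROW_CACHE_MIN_HIT_RATIO = 0.30  # threshold quoted in the text


def row_cache_hit_ratio(hits: int, requests: int) -> float:
    """Return hits / requests, or 0.0 when there were no requests."""
    if requests == 0:
        return 0.0
    return hits / requests


def should_disable_row_cache(hits: int, requests: int,
                             threshold: float = ROW_CACHE_MIN_HIT_RATIO) -> bool:
    """Apply the guideline: disable the row cache if the hit ratio is below
    the threshold. With no requests there is nothing to judge, so return False."""
    if requests == 0:
        return False
    return row_cache_hit_ratio(hits, requests) < threshold


if __name__ == "__main__":
    for hits, requests in ((20, 100), (45, 100)):
        ratio = row_cache_hit_ratio(hits, requests)
        print(f"hit ratio {ratio:.0%}: disable={should_disable_row_cache(hits, requests)}")
```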

Key Cache Tuning:
The key cache holds, in memory, the location of row data on disk for each column family. It is effective when the data has hot spots and the row cache cannot be used effectively because of large column sizes. By default, Cassandra caches 200000 keys per column family. Use an absolute number for keys_cached instead of a percentage.

Java Heap Size:
The minimum and maximum Java heap size should both be set to half of the available physical memory. The young generation should be 1/4 of the Java heap. Do NOT increase these sizes without confirming that enough physical memory is available; always reserve memory for the OS file cache.
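
The heap sizing rule above can be worked through in a few lines of Python. This sketch applies only the ratios stated in this article (heap = 1/2 of physical memory, young generation = 1/4 of heap) and formats them as the standard JVM options -Xms, -Xmx, and -Xmn. The function names are hypothetical, and the physical memory figure is supplied by the caller rather than detected.

```python
def heap_sizes_mb(physical_mb: int) -> tuple[int, int]:
    """Return (heap_mb, young_gen_mb) using the ratios from the text."""
    if physical_mb < 8:
        raise ValueError("physical_mb is too small to split")
    heap_mb = physical_mb // 2   # half of physical memory
    young_mb = heap_mb // 4      # a quarter of the heap
    return heap_mb, young_mb


def jvm_flags(physical_mb: int) -> list[str]:
    """Format the sizes as JVM options; min and max heap are set equal."""
    heap_mb, young_mb = heap_sizes_mb(physical_mb)
    return [f"-Xms{heap_mb}M", f"-Xmx{heap_mb}M", f"-Xmn{young_mb}M"]


if __name__ == "__main__":
    # Example: a machine with 16 GB of RAM.
    print(" ".join(jvm_flags(16 * 1024)))
```

The other half of physical memory is deliberately left unallocated here so that it remains available to the OS file cache, as the text recommends.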

About the Author

Technical Research Analyst - Data Engineering

Abhijit is a Technical Research Analyst specializing in Deep Learning. He holds a degree in Computer Science with a focus on Data Science. Being proficient in Python, Scala, C++, Dart, and R, he is passionate about new-age technologies. Abhijit crafts insightful analyses and impactful content, bridging the gap between cutting-edge research and practical applications.
