• Articles
  • Tutorials
  • Interview Questions

Cassandra Architecture - The Definitive Guide

What is Cassandra Architecture

Cassandra’s architecture is based on the understanding that system and hardware failures occurs eventually. Before talking about Cassandra lets first talk about terminologies used in architecture design.
Node: Is computer (server) where you store your data. It is the basic infrastructure component of Cassandra.
Data center :A collection of related nodes. A data center can be a physical data center or virtual data center. Replication is set by data center. Depending on the replication factor, data can be written to multiple data centers. However, data centers should never span physical locations.
Cluster: A cluster contains one or more data centers. It can span physical locations.
Commit log: Is a crash-recovery mechanism.All data is written first to the commit log (file) for durability. After all its data has been flushed to SSTables, it can be archived, deleted, or recycled.
Table: A collection of ordered columns fetched by row.
SSTable: A sorted string table (SSTable) is an immutable data file to which Cassandra writes memtables periodically. SSTables are append only and stored on disk sequentially and maintained for each Cassandra table.

Certification in Bigdata Analytics

Gossip: Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about. The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster.
Bloom filter: These are quick, nondeterministic, algorithms for testing whether an element is a member of a set. Bloom filters are accessed after every query.
A detailed understanding of Apache Cassandra is available in this blog post for your perusal!

Cassandra addresses the problem of failures by using a peer-to-peer distributed system across homogeneous nodes where data is distributed among all nodes in the cluster. Each node exchanges information across the cluster every second. A sequentially written commit log on each node captures write activity to ensure data durability. Data is then indexed and written to an in-memory structure, called a memtable, which resembles a write-back cache. Once the in-memory data structure is full, the data is written to disk in an SSTable data file. All writes are automatically partitioned and replicated throughout the cluster.

Using a process called compaction Cassandra periodically consolidates SSTables, discarding obsolete data and tombstones (an indicator that data was deleted). Client’s read or write requests can be sent to any node in the cluster. When a client connects to a node with a request, that node serves as the coordinator for that particular client operation. The coordinator acts as a proxy between the client application and the nodes that own the data being requested. The coordinator determines which nodes in the ring should get the request based on how the cluster is configured

img 1Learn everything about Cassandra by enrolling in Cassandra Training.

Course Schedule

Name Date Details
Big Data Course 20 Jul 2024(Sat-Sun) Weekend Batch
View Details
Big Data Course 27 Jul 2024(Sat-Sun) Weekend Batch
View Details
Big Data Course 03 Aug 2024(Sat-Sun) Weekend Batch
View Details

About the Author

Technical Reseach Analyst - Data Engineering

Abhijit is a Technical Research Analyst specializing in Deep Learning. He holds a degree in Computer Science with a focus on Data Science. Being proficient in Python, Scala, C++, Dart, and R, he is passionate about new-age technologies. Abhijit crafts insightful analyses and impactful content, bridging the gap between cutting-edge research and practical applications.

Find MongoDB Training in Other Regions

Bangalore Chennai Delhi Hyderabad London Sydney United States