Introduction to Apache Cassandra

Apache Cassandra is an open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra is a non-relational, highly scalable, eventually consistent, distributed data store built around structured column families. It uses a peer-to-peer architecture, so every node plays the same role and there is no master node.

What is Cassandra?

  •  Originally developed at Facebook
  •  Written in Java
  •  Open-source
  •  The name comes from Greek mythology
  •  Cassandra combines concepts from Google’s Bigtable (the data model) and the Distributed Hash Table (DHT) design of Amazon’s Dynamo
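
The Dynamo-style distributed hash table mentioned in the last point can be illustrated with a small consistent-hashing ring. This is only a minimal sketch under stated assumptions: the `HashRing` class, the node names, and the use of MD5 as the hash are illustrative choices, not Cassandra's actual partitioner or token allocation.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a position on the ring (MD5 is a stand-in hash here)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy ring: each node owns the token range that ends at its own token."""

    def __init__(self, nodes):
        # Sort (token, node) pairs so lookups can use binary search.
        self._ring = sorted((token_for(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, key: str, replication_factor: int = 1):
        """Walk clockwise from the key's token and collect distinct nodes."""
        start = bisect.bisect_left(self._tokens, token_for(key))
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


# Hypothetical four-node cluster; any node can compute the same answer.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas_for("user:42", replication_factor=3)
print(owners)
```

Because placement depends only on the hash of the key and the list of nodes, no central coordinator is needed to locate data, which fits the peer-to-peer design described above.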

Now, let’s discuss what changed with the introduction of NoSQL:

  • Massive data volumes
  • Extreme query load
  • Flexible schema evolution: the schema is never fixed and keeps evolving
  • Schema changes can be introduced into the system gradually

Cassandra falls under the columnar, or extensible record, category, where each key is associated with many attributes. It still uses tables but has no joins: Cassandra does not support joins or sub-queries, except for batch analysis via Hadoop. Instead, it emphasizes denormalization through features like collections. Unlike a traditional relational database, Cassandra is a wide-column store: rows are grouped into partitions by key, and a row only stores the columns that actually have values.
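
The idea of denormalizing into a single key lookup, rather than joining tables at read time, can be sketched in plain Python. This is an in-memory illustration only: `UserRow`, `users_by_id`, and `add_post` are hypothetical names, and the `set` and `list` fields merely stand in for collection columns.

```python
from dataclasses import dataclass, field


@dataclass
class UserRow:
    """One denormalized row: everything a read needs lives in the row itself."""
    name: str
    emails: set = field(default_factory=set)          # stands in for a collection column
    recent_posts: list = field(default_factory=list)  # copied in, not joined


# Hypothetical table keyed by user id; each key maps to many attributes.
users_by_id = {}


def add_post(user_id: str, name: str, title: str) -> None:
    """Write the post title into the user's row instead of a separate table."""
    row = users_by_id.setdefault(user_id, UserRow(name=name))
    row.recent_posts.append(title)


add_post("u1", "Ada", "Hello Cassandra")
add_post("u1", "Ada", "Tuning consistency")
users_by_id["u1"].emails.add("ada@example.com")

# A single key lookup answers the query; no join is involved.
profile = users_by_id["u1"]
print(profile.name, sorted(profile.emails), profile.recent_posts)
```

The trade-off is that writes do more work (data may be duplicated across rows) so that reads stay a single lookup.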

What is CAP Theorem?

The CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

  • Consistency (all nodes see the same data at the same time)
  • Availability (a guarantee that every request receives a response about whether it was successful or failed)
  • Partition Tolerance (the system continues to operate despite arbitrary message loss)

According to this theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three.

So, we need to understand that network partitions are unavoidable: at some point the network fails and nodes can’t communicate with each other. At that point the system should remain operational, but each distributed system has to decide whether, while partitioned, it holds on to availability or to consistency.
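
The choice a system faces during a partition can be made concrete with a toy model. The sketch below is purely illustrative and assumes a hypothetical `TinyCluster` with two replicas; it does not describe how Cassandra or any real database is implemented.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class TinyCluster:
    """Two replicas and a flag saying whether they can reach each other."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.replicas = [Replica("r1"), Replica("r2")]
        self.partitioned = False

    def write(self, replica_index, value):
        """Return True if the write is accepted."""
        if not self.partitioned:
            for r in self.replicas:      # healthy network: update everyone
                r.value = value
            return True
        if self.mode == "CP":
            return False                 # refuse rather than let replicas diverge
        self.replicas[replica_index].value = value  # AP: accept locally
        return True

    def is_consistent(self):
        return self.replicas[0].value == self.replicas[1].value


cp, ap = TinyCluster("CP"), TinyCluster("AP")
for cluster in (cp, ap):
    cluster.write(0, "v1")
    cluster.partitioned = True

cp_accepted = cp.write(0, "v2")   # rejected: consistency kept, availability lost
ap_accepted = ap.write(0, "v2")   # accepted: available, replicas now disagree
print(cp_accepted, cp.is_consistent())
print(ap_accepted, ap.is_consistent())
```

In the AP case the replicas have to be reconciled once the partition heals, which is where eventual consistency comes in.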

Cassandra offers what is called a tunable consistency model: the consistency level can be chosen for each read or write operation. This means Cassandra can be set up to favor either availability or consistency, rather than being locked into a single mode.
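
One common way to reason about tunable consistency is the overlap rule: if the number of replicas that acknowledge a write (W) plus the number consulted on a read (R) exceeds the replication factor (RF), every read touches at least one replica that has the latest write. The sketch below works through that arithmetic for a few consistency level names; the helper functions are hypothetical and it does not talk to a real cluster.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge for a few common consistency levels."""
    table = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return table[level]


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when read and write replica sets must overlap: R + W > RF."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


rf = 3
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    overlap = reads_see_latest_write(write_level, read_level, rf)
    print(write_level, read_level, "consistent" if overlap else "eventual")
```

With a replication factor of 3, writing and reading at `ONE` favors availability and latency, while `QUORUM` on both sides favors consistency at the cost of needing two live replicas per operation.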

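Cassandra is generally a poor fit for banking or financial systems that depend on strict transactional guarantees, such as OLTP or payment workloads. It is more commonly used for workloads like social media, where eventual consistency is acceptable.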

As the diagram below shows, Cassandra falls under the Available and Partition-tolerant (AP) category.

[Figure: NoSQL databases and the CAP theorem]

About the Author

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big Data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.