Apache Cassandra- A Brief Intro
Apache Cassandra database is an open source distributed database management system designed to handle large amounts of Big data across many commodity servers, providing high availability with no single point of failure. Cassandra is a non-relational highly scalable, eventually consistent, distributed, structured column family based data store. Cassandra is a peer to peer architecture.
What is Cassandra?
- Originally developed at Facebook organization
- Written in Java
- Name came from Greek Mythology
- Cassandra uses mixture of concepts from Google’s BigTable and Distributed Hash Table (DHT) of Amazon’s Dynamo
Now, let’s discuss, what has changed with introduction of NoSQL?
- Massive data volumes
- Extreme query load
- Flexible schema evolution- schema never gets fixed and it gets evolved
- Schema changes can be gradually introduced in the system.
Cassandra falls under the Columnar or extensible record category where each key is associated with many associates. It still uses tables but have no joins. Cassandra does not support joins or sub-queries, except for batch analysis via Hadoop. Rather, Cassandra emphasizes denormalization through features like collections. Cassandra stores data by columns, not like traditional row oriented databases. To know more about Cassandra training courses, you can visit intellipaat.com
What is CAP Theorem?
It is also knows as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition Tolerance (the system continues to operate despite arbitrary message loss)
According to this theorem, a distributed system can satisfy any two of these guarantee at the same time, but not all three.
So, we need to understand that partitioning is unavoidable when a network partition fails, i.e. systems can’t communicate with each other, in that particular point, the system should be operational but while it is operational whether it has to hold on availability or hold on to consistency is what each distributed system has to decide.
Cassandra has a concept called Tunable consistency model, this is the only database that has this particular concept, so you can setup Cassandra either for availability or consistency, Cassandra can work in both the modes unlike other databases.
We can’t build banking or financial systems using Cassandra instead it is used in social media. OLTP or payment models can’t be used in Cassandra.
If you look at the diagram below, it is very clear that Cassandra falls under the category of Available & tolerant partition.
Click To Enlarge