NoSQL Database Overview
A NoSQL (often interpreted as "Not only SQL") database provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling, and finer control over availability.
What is NoSQL?
NoSQL databases started out as in-house solutions to real problems at companies such as Amazon (Dynamo) and Google (Bigtable). These companies found that traditional relational (SQL) databases didn't meet their requirements.
In particular, these companies faced three primary issues: unprecedented transaction volumes, expectations of low-latency access to massive datasets, and nearly perfect service availability while operating in an unreliable environment. Initially, companies tried the traditional approach: they added more hardware or upgraded to faster hardware as it became available.
When that didn’t work, they tried to scale existing relational solutions by simplifying their database schema, de-normalizing the schema, relaxing durability and referential integrity, introducing various query caching layers, separating read-only from write-dedicated replicas, and, finally, data partitioning in an attempt to address these new requirements. None fundamentally addressed the core limitations, and they all introduced additional overhead and technical tradeoffs.
NoSQL Database Examples
Let’s take a look at some examples of how NoSQL databases are used in practice.
1. Document Database
Document databases store data in a document data model, typically as JSON or XML objects. Each document contains markup that identifies its fields and their values. Values can be of various types, including strings, numbers, Booleans, arrays, and nested data.
This type of database has become popular among developers because JSON documents capture structures that typically align with the objects developers work with in code.
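As a minimal sketch of what such a document might look like, the following builds a customer record as a native Python object and round-trips it through JSON. The field names and values are invented for illustration and do not come from any particular database.

```python
import json

# A hypothetical "customer" document; every field name here is illustrative.
customer = {
    "_id": "cust-1001",
    "name": "Ada Lopez",                 # string
    "active": True,                      # Boolean
    "loyalty_points": 420,               # number
    "tags": ["newsletter", "premium"],   # array
    "address": {                         # nested data
        "city": "Lisbon",
        "postal_code": "1100-148",
    },
}

# Serialize to JSON text, the form a document database would typically accept.
as_json = json.dumps(customer, indent=2)

# Parse it back into native objects; no table mapping layer is involved.
restored = json.loads(as_json)
```

The nested `address` object and the `tags` array stay inside the one document, whereas a normalized relational design would usually split them into separate tables.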
2. Key-value Database
Key-value databases have a very simple schema: a unique key is paired with a value, which can be a string, a large binary object, or anything else. Because there are no complex queries, databases that use this structure tend to perform well.
Key-value stores are often used to hold small chunks of arbitrary data, such as the results of database calls, API calls, or page rendering. For example, an application can improve performance by caching the result of a database query in Memcached for a set period of time and then checking Memcached first on later requests.
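The caching pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions: `TTLCache` is an in-memory stand-in written for this example, not the API of Memcached or of any client library, and `FakeDatabase` is a hypothetical slow data source.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache such as Memcached (not a real client API)."""

    def __init__(self):
        self._items = {}  # key -> (expiry timestamp, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (time.monotonic() + ttl_seconds, value)


class FakeDatabase:
    """Hypothetical slow data source that counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def load_user(self, user_id):
        self.calls += 1
        return {"user_id": user_id, "name": f"user-{user_id}"}


def get_user(cache, database, user_id, ttl_seconds=60):
    """Check the cache first; query the database only on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = database.load_user(user_id)
    cache.set(key, user, ttl_seconds)
    return user


cache = TTLCache()
database = FakeDatabase()
first = get_user(cache, database, 7)   # miss: goes to the database
second = get_user(cache, database, 7)  # hit: served from the cache
```

After both calls, `database.calls` is 1, because the second lookup never reaches the database.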
3. Wide-column Store
A wide-column store uses a modified table model. Data is stored in rows identified by a row key, and each row can be associated with one or more dynamic columns. The model is flexible because the set of columns can vary from row to row. Wide-column stores can hold very large amounts of data, on the order of billions of rows with millions of columns.
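One way to picture this model is as a map from row key to a per-row map of columns. The sketch below is a toy in-memory illustration of that idea only; the function names `put` and `get_row` are assumptions made for this example and do not mirror any product's API.

```python
from collections import defaultdict

# row key -> {column name -> value}; each row may carry a different set of columns.
table = defaultdict(dict)


def put(row_key, column, value):
    """Write one cell; the column does not need to be declared in advance."""
    table[row_key][column] = value


def get_row(row_key):
    """Return a copy of the row's columns, or an empty dict for an unknown row."""
    return dict(table.get(row_key, {}))


put("user:1", "name", "Ada")
put("user:1", "email", "ada@example.com")
put("user:2", "name", "Lin")
put("user:2", "last_login", "2024-01-15")  # a column that user:1 does not have
```

Here `user:1` and `user:2` share the `name` column but otherwise hold different columns, which is the row-to-row variation the paragraph describes.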
4. Graph Database
Connecting data in relational databases requires JOINs between tables. Such JOINs can become slow when many tables or many levels of relationship are involved. When an application needs to traverse the connections between data rapidly, a graph database is a suitable choice.
For example, for real-time recommendations on an e-commerce site, the application needs to connect data about the user search, purchase history, purchases by similar users, preferences and interests of the user, suitable product combinations, in-stock products, and more.
Connecting all of the relevant data in real time lets the application tailor the user's experience. In the best case, this holds the user's attention and leads to a new sale or an add-on to an existing order.
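To make the recommendation scenario concrete, here is a small sketch that walks user-to-product connections held in memory. The users, products, and scoring rule are all invented for illustration; a real graph database would express this as a traversal query rather than Python loops.

```python
from collections import Counter

# Hypothetical purchase edges: user -> set of products bought.
purchases = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "keyboard"},
    "carol": {"laptop", "monitor", "keyboard"},
    "dave": {"phone"},
}
in_stock = {"laptop", "mouse", "keyboard", "phone"}


def recommend(user, limit=3):
    """Suggest in-stock products bought by users with overlapping purchases."""
    owned = purchases.get(user, set())
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)  # how similar the other user is
        if overlap == 0:
            continue
        for product in items - owned:
            if product in in_stock:
                scores[product] += overlap
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [product for product, _ in ranked[:limit]]
```

With this data, `recommend("alice")` returns `["keyboard"]`: both similar users bought a keyboard, and the monitor is skipped because it is out of stock.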
Features of NoSQL
The following are the main features of NoSQL databases:
1. Non-relational
- NoSQL databases don't follow the relational model, nor do they provide tables with flat, fixed-column records.
- They work with self-contained aggregates or BLOBs.
- A NoSQL database doesn't require data normalization or object-relational mapping.
- Many omit complex features such as a full query language, referential integrity, joins, query planners, and ACID transactions.
2. Schema-free
- NoSQL databases are either schema-free or have relaxed schemas.
- They do not require any kind of schema definition for the data and allow heterogeneous data structures within the same domain.
3. Simple API
- NoSQL databases have easy-to-use interfaces for storage and querying.
- The APIs allow for low-level data manipulation and selection methods.
- Many NoSQL databases are web-enabled and can run as internet-facing services.
- There is no standard NoSQL query language.
- Text-based protocols are common, typically HTTP REST with JSON.
4. Distributed
- Several NoSQL databases can run in a distributed fashion.
- Many offer auto-scaling and fail-over capabilities.
- For scalability and throughput, ACID guarantees are often relaxed.
- Replication comes in two forms: master-slave replication and peer-to-peer replication.
- Many NoSQL databases provide eventual consistency by default, although some offer tunable or strong consistency.
- Shared Nothing Architecture leads to less coordination and higher distribution.
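A shared-nothing design can be illustrated with simple hash partitioning, where each key is owned by exactly one node. The sketch below is an assumption-laden toy: the node names are made up, each "node" is just a dictionary, and the modulo scheme is chosen for brevity.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names

# Each node owns its own private storage; nothing is shared between nodes.
storage = {node: {} for node in NODES}


def node_for(key, nodes=NODES):
    """Map a key to a node with a stable hash (the same key always maps to the same node)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def shard_put(key, value):
    storage[node_for(key)][key] = value


def shard_get(key):
    return storage[node_for(key)].get(key)


for example_key in ("user:1", "user:2", "order:9"):
    shard_put(example_key, {"key": example_key})
```

Because a read or write touches only the node that owns the key, nodes need little coordination. Note that plain modulo hashing moves most keys when the node count changes, which is why production systems commonly use consistent hashing or a similar scheme instead.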
Types of NoSQL Databases
There are primarily four types of NoSQL databases and each category has its unique attributes and limitations. Users can choose the database as per their product requirements.
1. Key-Value Pair Based
In this type, data is stored as key/value pairs, and the database is designed to handle heavy data loads. The data is stored as a hash table in which each key is unique and the value can be a string, JSON, a BLOB (binary large object), etc.
2. Column-based
Column-oriented databases are based on Google's Bigtable paper and work on columns, each of which is treated separately. The values of a single column are stored contiguously.
As the data is readily available in a column, these databases deliver high performance on aggregation queries.
Column-based NoSQL databases are popular for data warehouses, CRM, business intelligence, library card catalogs, etc. HBase, Hypertable, and Cassandra are a few examples of column-based databases.
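The aggregation benefit described above can be illustrated by laying out the same records by row and by column. The order data here is invented for the example, and plain Python lists stand in for on-disk column storage.

```python
# Three hypothetical order records in row-oriented form.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# The same data in column-oriented form: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate reads only the "amount" column and never touches the others.
total = sum(columns["amount"])
average = total / len(columns["amount"])
```

In the row layout, computing `total` means visiting every record and skipping past `order_id` and `region`; in the column layout, the values to be summed already sit next to each other.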
3. Document-Oriented
This type of NoSQL database stores and retrieves data as key-value pairs in which the value is a document in JSON or XML format. Unlike a plain key-value store, the database understands the structure of the value, so its fields can be queried. It should not be used for complex transactions that require multiple operations or for queries against varying aggregate structures.
The document type is mostly used for real-time analytics, content management systems, e-commerce applications, and blogging platforms. MongoDB, CouchDB, Amazon SimpleDB, and Lotus Notes are some of the popular document-oriented databases.
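To show what it means for the database to understand the value, the sketch below filters an in-memory list of documents by field, including a nested field. The helper names `get_path` and `find`, the dotted-path convention, and the sample documents are assumptions made for this illustration, not the query API of any specific product.

```python
# A hypothetical collection of CMS documents with differing shapes.
collection = [
    {"_id": 1, "type": "post", "title": "Hello", "author": {"name": "Ada"}},
    {"_id": 2, "type": "post", "title": "NoSQL notes", "author": {"name": "Lin"}},
    {"_id": 3, "type": "page", "title": "About"},  # no author field at all
]


def get_path(document, dotted_path):
    """Follow a dotted path such as 'author.name'; return None if any part is missing."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(documents, criteria):
    """Return documents whose fields equal every value in `criteria`."""
    return [
        document
        for document in documents
        if all(get_path(document, path) == value for path, value in criteria.items())
    ]


posts_by_lin = find(collection, {"type": "post", "author.name": "Lin"})
```

A plain key-value store could only fetch a document by its key; here the lookup reaches inside the value, and documents that lack the `author` field are simply skipped.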
4. Graph-Based
A graph-based database stores entities as nodes and the relationships among those entities as edges. Every node and edge has a unique identifier, and each edge describes a relationship between two nodes. This type of database is multi-relational in nature, and traversing relationships is fast because they are stored directly in the database rather than calculated at query time.
Graph-based databases are mostly used for logistics, social networks, and spatial data.
Infinite Graph, FlockDB, Neo4J, OrientDB, etc. are examples of popular graph-based databases.
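The node-and-edge model can be sketched with two dictionaries keyed by identifier plus an adjacency index. The identifiers, the `FOLLOWS` and `BLOCKS` relation names, and the `reachable` helper are hypothetical and exist only for this example.

```python
from collections import deque

# Nodes and edges each carry a unique identifier.
nodes = {
    "n1": {"name": "Ada"},
    "n2": {"name": "Lin"},
    "n3": {"name": "Sam"},
    "n4": {"name": "Kim"},
}
edges = {
    "e1": ("n1", "FOLLOWS", "n2"),
    "e2": ("n2", "FOLLOWS", "n3"),
    "e3": ("n3", "FOLLOWS", "n4"),
    "e4": ("n1", "BLOCKS", "n4"),
}

# Adjacency index: relationships are stored up front, not computed per query.
adjacency = {}
for source, relation, target in edges.values():
    adjacency.setdefault(source, []).append((relation, target))


def reachable(start, relation, max_hops):
    """Breadth-first traversal along one relation type, up to max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for edge_relation, neighbor in adjacency.get(node, []):
            if edge_relation == relation and neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, hops + 1))
    return found
```

Calling `reachable("n1", "FOLLOWS", 2)` returns `["n2", "n3"]`; each hop is a direct lookup in the adjacency index rather than a join across tables.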
Advantages of NoSQL
- It can serve as a primary data source for online applications or as an analytic data source
- It handles big data, managing data volume, velocity, variety, and complexity
- It has a flexible schema design that can be altered without downtime or service disruption
- Distributed deployments can avoid a Single Point of Failure (SPOF)
- Replication is easy
- It often removes the need for a separate caching layer
- It allows horizontal scalability and fast performance
- It can handle structured, semi-structured, and unstructured data
- Its data models map naturally to objects in object-oriented code, which makes it flexible and easy to use
- It is well suited to distributed databases and multi-data center operations
- It doesn't require a dedicated high-performance server
- It supports the key languages and platforms used by developers
- It is often simpler to implement than an RDBMS
Disadvantages of NoSQL
- There are no common standards across NoSQL databases
- Many NoSQL databases offer limited query capabilities
- RDBMS databases and tools are comparatively more mature than NoSQL
- It does not have traditional database capabilities, like consistency during simultaneous multiple transactions
- As data volume grows, it becomes harder to maintain unique keys
- It doesn’t work well with relational data
- The learning curve is steep for new developers
Understanding CAP Theorem
When evaluating NoSQL or other distributed systems, you’ll inevitably hear about the “CAP theorem.” In 2000 Eric Brewer proposed the idea that in a distributed system you can’t continually maintain perfect consistency, availability, and partition tolerance simultaneously. CAP is defined as:
Consistency: all nodes see the same data at the same time
Availability: a guarantee that every request receives a response about whether it was successful or failed
Partition tolerance: the system continues to operate despite arbitrary message loss
The theorem states that you cannot simultaneously have all three; you must make tradeoffs among them. The CAP theorem is sometimes incorrectly described as a simple design-time decision—“pick any two [when designing a distributed system]”—when in fact the theorem allows for systems to make tradeoffs at run-time to accommodate different requirements.
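The tradeoff can be made concrete with a toy two-replica cluster. This is a sketch under simplifying assumptions: the `Cluster` class, its `"CP"` and `"AP"` mode labels, and the manually set `partitioned` flag are invented for illustration and greatly simplify how real systems detect and handle partitions.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy two-replica cluster; `mode` is 'CP' or 'AP' (labels used only in this sketch)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.left = Replica("left")
        self.right = Replica("right")
        self.partitioned = False  # True when the replicas cannot reach each other

    def write(self, replica, key, value):
        """Return True if the write was accepted, False if it was refused."""
        other = self.right if replica is self.left else self.left
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write so the replicas never disagree.
                return False
            # AP: stay available, accept locally, and let the other replica go stale.
            replica.data[key] = value
            return True
        # No partition: apply the write to both replicas.
        replica.data[key] = value
        other.data[key] = value
        return True


cp = Cluster("CP")
cp.write(cp.left, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(cp.left, "x", 2)  # refused: consistency is kept

ap = Cluster("AP")
ap.write(ap.left, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(ap.left, "x", 2)  # accepted: the replicas now differ
```

During the partition, the CP cluster gives up availability (the write fails) while the AP cluster gives up consistency (`left` holds 2 and `right` still holds 1). A real system would also need a reconciliation step once the partition heals, which this sketch leaves out.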
NoSQL vs SQL
Parameters | NoSQL | SQL |
Definition | Non-relational database | Relational database |
Schema | Dynamic | Static |
Representation | Represented as key-value pairs, documents, graphs, wide-column stores, etc. | Represented as tables |
Scalability | Typically horizontal | Typically vertical |
Complex Queries | Less suited to complex queries | Best for complex queries |
Language | Language varies from database to database | Uses SQL, a powerful standard language |
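The "Schema" row of the comparison can be illustrated with Python's built-in `sqlite3` module on the relational side and a plain list of dictionaries standing in for a schema-free document collection. The table and field names are made up for this sketch.

```python
import sqlite3

# Relational side: the table's columns are fixed when it is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # 'nickname' was never declared, so this statement is rejected.
    conn.execute(
        "INSERT INTO users (id, name, nickname) VALUES (?, ?, ?)",
        (2, "Lin", "L"),
    )
    relational_accepted = True
except sqlite3.OperationalError:
    relational_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.close()

# Document side: a list of dicts stands in for a schema-free collection.
documents = []
documents.append({"id": 1, "name": "Ada"})
documents.append({"id": 2, "name": "Lin", "nickname": "L"})  # extra field is fine
```

On the relational side, the extra column would first require a schema change such as `ALTER TABLE`. The document side accepts the new field immediately, at the cost of leaving it to the application to cope with records of different shapes.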