Microsoft Azure is a cloud computing platform that has grown significantly in recent years as more and more organizations adopt it. Azure offers services in many domains, including database management, content delivery, and networking. Interest in Azure Cosmos DB, from introductions and basics to how it works, has grown as well. Cosmos DB is a proprietary, globally distributed, multi-model database service that takes advantage of Azure’s tools and technologies to provide high throughput, high availability, and low latency.
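This article gives an overview of Cosmos DB basics: what a database is, the database types relevant to Cosmos DB, how Cosmos DB works, its key benefits, backup and restore, and common use cases.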
Before going deeper into Azure Cosmos DB, we need to understand what a database is.
What is a Database?
A database is a collection of data that is organized so that it can be easily retrieved and managed. A database is typically stored in a computer system so that the applications associated with it can access it. It is also associated with a database management system, which is used to create, edit, or update the data. The term database is most often used to refer to both the database and the database management system.
There are different types of databases, depending on the way we intend to use them. They are:
- Relational databases
- Object-oriented databases
- Distributed databases
- Data warehouses
- NoSQL databases
- Graph databases
- OLTP databases
- Open source databases
- Cloud databases
- Multi-model databases
- Document/JSON databases
- Self-driving databases
For our understanding of Cosmos DB, we need to know what relational, NoSQL, distributed, and Multi-model databases are.
A relational database is used to store data items that are related to one another. In a relational model, data is stored in tables made up of rows and columns. Each column stores one kind of information across all objects, and each row holds the details of one object. Rows are uniquely identified by primary keys, and different tables are connected to one another through foreign keys.
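The role of primary and foreign keys can be seen in a small sketch using Python's built-in `sqlite3` module. The table and column names (`customers`, `orders`) are made up for illustration and are not part of the original article.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " item TEXT NOT NULL)"
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders (id, customer_id, item) VALUES (10, 1, 'keyboard')")

# Connect the two tables through the foreign key.
rows = conn.execute(
    "SELECT c.name, o.item FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()


def insert_orphan_order():
    """Try to insert an order whose customer does not exist; report whether it succeeded."""
    try:
        conn.execute("INSERT INTO orders (id, customer_id, item) VALUES (11, 99, 'mouse')")
        return True
    except sqlite3.IntegrityError:
        return False
```

Here the join returns one row per order together with its customer, and the database rejects the order that points at a missing customer, which is the kind of guarantee a predefined relational schema provides.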
NoSQL databases are used for data that is modeled differently from the tables of a relational database. They support data structures such as key-value pairs, wide columns, graphs, and documents. This makes them more flexible and can make some operations on them faster.
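As a rough illustration of that flexibility, the sketch below uses a plain Python dictionary as a key-value store whose values are JSON documents. The `put` and `get` helpers and the sample documents are hypothetical and only stand in for a real NoSQL engine.

```python
import json

# A key-value store keyed by document id; each value is a serialized JSON document.
store = {}


def put(doc):
    """Store a document under its 'id' property."""
    store[doc["id"]] = json.dumps(doc)


def get(doc_id):
    """Load a document back by id."""
    return json.loads(store[doc_id])


put({"id": "u1", "name": "Asha", "tags": ["admin"]})
# A document with a different shape can be stored without changing any schema.
put({"id": "u2", "name": "Ben", "address": {"city": "Oslo"}})
```

The two documents do not share the same set of properties, yet both are stored and read back the same way.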
Differences between Relational and NoSQL Database
|Relational Database|NoSQL Database|
|---|---|
|Data is stored in tables|Data can be stored as documents, graphs, key-value pairs, etc.|
|Vertically scalable|Horizontally scalable|
|Predefined schema|No predefined schema, hence easier to update|
|Supports powerful query language|Supports simple query language|
|Can handle data in moderate volumes|Can handle data in very high volumes|
|Has a centralised structure|Has a decentralised structure|
|Data can be written from one or a few locations|Data can be written from many locations|
A database whose data is stored across multiple computers is called a distributed database. The computers can be in one physical location or in different, interconnected locations.
There are two ways in which data is stored in different locations:
- Replication: Redundant copies of the database are stored in every location. Hence every update made in one location needs to be made in every other location also.
- Fragmentation: The database is split into fragments, and each fragment is stored in a different location.
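The difference between the two approaches can be sketched in a few lines of Python. The location names and the hash-based placement rule are assumptions made for this example; real systems choose fragments in many different ways.

```python
import hashlib

LOCATIONS = ["east", "west", "north"]  # hypothetical location names


def replicate(records):
    """Replication: every location holds a full copy of the data."""
    return {loc: dict(records) for loc in LOCATIONS}


def fragment(records):
    """Fragmentation: each record lives in exactly one location, chosen by hashing its key."""
    placement = {loc: {} for loc in LOCATIONS}
    for key, value in records.items():
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        loc = LOCATIONS[digest[0] % len(LOCATIONS)]
        placement[loc][key] = value
    return placement


records = {"a": 1, "b": 2, "c": 3, "d": 4}
replicated = replicate(records)
fragmented = fragment(records)
```

With replication, the storage cost is multiplied by the number of locations and every update must reach all of them. With fragmentation, each record is stored once, but a read must first work out which location holds it.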
Multi-Model Database
A database that supports more than one data model within a single system, for example the relational model alongside NoSQL models such as documents, graphs, and key-value pairs, is called a multi-model database.
Azure Cosmos DB
Azure Cosmos DB is a globally distributed, low-latency, multi-model database for managing data at large scale. It is a cloud-based NoSQL database offered as a PaaS (Platform as a Service) on Microsoft Azure. It is a highly available, high-throughput, reliable database and is often called a serverless database. Cosmos DB grew out of Azure DocumentDB and is available in all Azure regions.
Key Benefits of Azure Cosmos DB
The key features of Cosmos DB are:
- Globally Distributed: With Azure regions spread out globally, the data can be replicated globally.
- Scalability: Cosmos DB is horizontally scalable to support hundreds of millions of reads and writes per second.
- Schema-Agnostic Indexing: This enables the automatic indexing of data without schema and index management.
- Multi-Model: It can store data as key-value pairs, documents, graphs, and column families. Global distribution, horizontal partitioning, and automatic indexing capabilities are the same irrespective of the data model.
- High Availability: The SLA covers 99.99% availability for reads and writes on single-region accounts and up to 99.999% on multi-region accounts.
- Low Latency: Because Azure regions are spread around the world, data can be distributed globally and served from the region nearest to each customer. This reduces the latency of retrieving data.
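To give a feel for what schema-agnostic indexing means, here is a minimal sketch that indexes every path of every document without being told the schema in advance. It is only an illustration of the idea; the `PathIndex` class, the path notation, and the sample documents are assumptions of this example and do not describe how Cosmos DB implements its index internally.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (path, value) pairs for every leaf value in a nested dict/list document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}/{key}")
    elif isinstance(doc, list):
        for value in doc:
            yield from flatten(value, f"{prefix}/[]")
    else:
        yield prefix, doc


class PathIndex:
    """Inverted index from (path, value) to the ids of the documents that contain it."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, doc_id, doc):
        # No schema is declared: whatever paths the document has get indexed.
        for path, value in flatten(doc):
            self._index[(path, value)].add(doc_id)

    def find(self, path, value):
        return sorted(self._index.get((path, value), set()))


index = PathIndex()
index.add("d1", {"name": "Asha", "address": {"city": "Oslo"}})
index.add("d2", {"name": "Ben", "tags": ["admin", "ops"]})
index.add("d3", {"city": "Oslo", "address": {"city": "Oslo"}})
```

A lookup such as `index.find("/address/city", "Oslo")` can then be answered from the index alone, even though the three documents have different shapes.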
How does Azure Cosmos DB work?
If a website used by people all over the world writes its data into a primary database in one location (non-multi-master mode), people near that location can retrieve the data faster than the rest of the world because of network latency.
Cosmos DB, however, has multi-master support, where data can be written simultaneously to databases spread out globally. The data is then replicated to the region nearest to each user so that it can be accessed faster. There can still be a delay of milliseconds before a write reaches every replica, and this affects consistency.
Consistency indicates whether the copies of the data are in sync, that is, in the same state at any given point in time. Cosmos DB offers multiple levels of consistency with varying performance and availability.
The various consistency levels offered are:
- Eventual: Data is written on the primary node and propagated to the read-only secondary nodes eventually, so users may read stale data for some time.
- Consistent Prefix: Reads may lag behind the latest writes, but clients always see the writes in the order in which they were made.
- Session: Within a session, users always see the data they just committed, but it takes some time for others to get that version.
- Bounded Staleness: A staleness bound can be set, as a number of versions or a time interval, and reads from the secondary nodes never lag behind the writes by more than that bound.
- Strong: Every user reads the latest committed copy of the data, at the cost of relatively lower performance.
Cosmos DB lets us set a default consistency level when creating the account, and the application can override it for individual read requests.
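The trade-offs between these levels can be sketched with a toy model of one write region and one lagging read replica. Everything here is an assumption made for illustration: the `ReplicatedStore` class, the integer version used as a session token, and the way the staleness bound is enforced by replicating eagerly. It is not the Cosmos DB replication protocol.

```python
class Region:
    def __init__(self):
        self.version = 0   # number of the last write applied in this region
        self.data = {}


class ReplicatedStore:
    """Toy model: one write region and one read replica that lags until sync() is called."""

    def __init__(self, max_lag=2):
        self.primary = Region()
        self.replica = Region()
        self.pending = []        # writes not yet applied to the replica
        self.max_lag = max_lag   # staleness bound, in number of writes

    def write(self, key, value):
        self.primary.version += 1
        self.primary.data[key] = value
        self.pending.append((self.primary.version, key, value))
        # Bounded staleness (simplified): replicate as soon as the lag would exceed the bound.
        while len(self.pending) > self.max_lag:
            self._apply_one()
        return self.primary.version  # used as a session token by the caller

    def _apply_one(self):
        version, key, value = self.pending.pop(0)
        self.replica.data[key] = value
        self.replica.version = version

    def sync(self):
        """Apply all pending writes to the replica."""
        while self.pending:
            self._apply_one()

    def read(self, key, level="eventual", session_token=0):
        if level == "strong":
            return self.primary.data.get(key)
        if level == "session" and self.replica.version < session_token:
            # The replica has not caught up with this session's own writes yet.
            return self.primary.data.get(key)
        return self.replica.data.get(key)  # may be stale


store = ReplicatedStore(max_lag=2)
token = store.write("score", 10)
eventual_before = store.read("score")                 # replica has not seen the write yet
session_read = store.read("score", "session", token)  # the writer sees its own write
strong_read = store.read("score", "strong")
store.sync()
eventual_after = store.read("score")
```

The eventual read is the cheapest because it never waits for the primary, but it can return stale data until replication catches up. The session and strong reads return the new value immediately, at the price of going back to the write region.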
With an Azure Cosmos DB account, we can manage the data by creating multiple databases, containers and items.
- Cosmos Database: The rich API support of Azure Cosmos DB architecture allows us to create and manage databases using SQL API, Cassandra API, Azure Cosmos DB API for MongoDB, etc. These can be used to enumerate, read, create and update the databases.
- Cosmos Containers: They are horizontally partitioned and replicated across multiple regions for scalability and throughput. They are schema-agnostic and are indexed automatically. Containers can also be created and updated using the Cosmos DB APIs.
- Cosmos Items: Depending on the API in use, an item can be a row in a table, a document, or a part of a graph. Operations such as insert, delete, update, read, and replace can be performed on these items using the APIs.
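The account, database, container, and item hierarchy described above can be modeled with a few in-memory classes. This is a sketch only: the class and method names, the `region` partition key, and the sample items are hypothetical, and the code does not use or mirror the Azure SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    partition_key: str  # name of the item property used to partition the container
    partitions: dict = field(default_factory=dict)  # partition value -> {item id -> item}

    def upsert(self, item):
        """Insert the item, or replace an existing item with the same id and partition."""
        bucket = self.partitions.setdefault(item[self.partition_key], {})
        bucket[item["id"]] = item

    def read(self, item_id, partition_value):
        return self.partitions.get(partition_value, {}).get(item_id)


@dataclass
class Database:
    name: str
    containers: dict = field(default_factory=dict)

    def create_container(self, name, partition_key):
        return self.containers.setdefault(name, Container(name, partition_key))


@dataclass
class Account:
    name: str
    databases: dict = field(default_factory=dict)

    def create_database(self, name):
        return self.databases.setdefault(name, Database(name))


account = Account("demo-account")
players = account.create_database("game").create_container("players", partition_key="region")
players.upsert({"id": "p1", "region": "eu", "score": 120})
players.upsert({"id": "p2", "region": "us", "score": 95})
players.upsert({"id": "p1", "region": "eu", "score": 150})  # same id and partition: replaces the item
```

Grouping items by a partition key value is what allows a container to be split horizontally: each group can live on a different machine, and a read that supplies the partition value only needs to look in one group.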
Backup and Restore
Automatic backups of the data are taken at regular intervals without affecting the performance or availability of the databases. The backups are stored in a separate service and help in scenarios such as an accidental deletion or update of the database.
There are two types of backup:
- Periodic Backup Mode: This is the default mode, in which backups are taken periodically. We can set the backup interval and the retention period.
- Continuous Backup Mode: In this mode, the data is backed up continuously and can be restored to any point in time within the last 30 days.
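The idea behind restoring to a point in time can be sketched as replaying a log of changes up to the chosen moment. The `ChangeLog` class, the integer timestamps, and the use of `None` to mark a delete are assumptions of this example, not a description of how Cosmos DB stores its backups.

```python
class ChangeLog:
    """Toy continuous backup: every write is recorded with a timestamp and can be replayed."""

    def __init__(self):
        self.entries = []  # (timestamp, key, value); a value of None marks a delete

    def record(self, timestamp, key, value):
        self.entries.append((timestamp, key, value))

    def restore(self, point_in_time):
        """Rebuild the data as it was at point_in_time by replaying entries in time order."""
        state = {}
        for timestamp, key, value in sorted(self.entries, key=lambda entry: entry[0]):
            if timestamp > point_in_time:
                break
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state


log = ChangeLog()
log.record(1, "order-1", "created")
log.record(2, "order-2", "created")
log.record(3, "order-1", None)  # accidental delete at time 3
before_delete = log.restore(point_in_time=2)
after_delete = log.restore(point_in_time=3)
```

Restoring to time 2 brings back the accidentally deleted item. A periodic backup, by contrast, could only return the state as of the most recent snapshot.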
Common Use Cases
Cosmos DB allows throughput to be scaled up greatly and data to be stored globally. Some of the common Cosmos DB use cases are:
- IoT: IoT use cases deal with large volumes of data arriving from locations around the world. This data needs to be written, analyzed, and retrieved quickly, which Cosmos DB is well suited for.
- Retail and Marketing: Cosmos DB enables the smooth addition, updating, and retrieval of huge volumes of data related to product catalogs, logistics, inventory, etc.
- Gaming: Popular games like The Walking Dead: No Man’s Land by Next Games, and Halo 5: Guardians, use Azure Cosmos DB to provide low latency in-game stats, scoreboards, social media integration, etc.
- Web and mobile applications: It can be used in web and mobile applications for modeling social interactions, integrating with third-party services, and building rich personalized experiences.
Conclusion of Azure Cosmos DB tutorial
With Cosmos DB, data can be stored globally, which places databases near customers and reduces latency. The various features provided by Cosmos DB also make it easier to index and scale data and to achieve high availability.