What is MongoDB?
MongoDB is a document-oriented NoSQL database designed to store high-volume data. Its development began in 2007, and the first public release followed in 2009.
It is written in C++ and is designed for scalability and performance. It is built around the concepts of collections and documents.
MongoDB stores data as JSON-like documents, so fields can vary from document to document, and the data structure can change over time.
Its document model maps to the objects in your application code, which makes data easy to work with.
The MongoDB Community edition is available free of cost. Versions released before October 16, 2018 were published under the AGPL. All versions released after that date, including patch fixes for prior versions, are published under the Server Side Public License (SSPL) v1.
Below are the main features of MongoDB that we are going to cover in this tutorial:
- Databases, collections, and documents: A MongoDB server hosts multiple databases, each of which is a container for collections; each collection, in turn, is a group of documents. A document is a set of key-value pairs with a dynamic schema, i.e., documents in the same collection need not share the same set of fields or structure, and the fields they have in common may hold different data types.
- Document-oriented: Its document structure is more in line with how developers construct classes and objects in their programming languages. Application objects are not organized in rows and columns; they have a clear structure of key-value pairs.
- Schema-less Database: Documents (the MongoDB counterpart of rows) do not need a schema that is defined beforehand. MongoDB data modelling lets you represent hierarchical relationships and store arrays and other more complex structures with ease.
- Ad-hoc queries: It supports ad-hoc queries, i.e., you can search by field, by range, and by regular expression.
- Indexing: It supports indexes, and any field in a document can be indexed.
- Sharding: For horizontal scalability, it supports auto-sharding. It distributes data across the shards in a cluster and balances the load among them automatically.
- GridFS: GridFS is used for storing and retrieving large files such as images, audio, and video. It is not a separate file system: it splits each file into chunks and stores the chunks and the file metadata in MongoDB collections. This lets it store files larger than the 16 MB BSON document size limit.
- High Performance: Together, the features above make it a high-performance database and set it apart from other databases.
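To make the dynamic schema idea from the list above more concrete, here is a minimal sketch in plain JavaScript. The users array and its field names are hypothetical; the two objects simply stand in for documents that could sit side by side in one collection.

```javascript
// Two documents that could live in the same (hypothetical) "users" collection.
// They share some fields, differ in others, and "phone" holds different types.
const users = [
  { _id: 1, name: "Asha", email: "asha@example.com", phone: "555-0100" },
  {
    _id: 2,
    name: "Ben",
    phone: ["555-0101", "555-0102"],
    address: { city: "Pune", zip: "411001" },
  },
];

// Collect the top-level field names used by each document.
const fieldsPerDoc = users.map((doc) => Object.keys(doc));

// Every field used anywhere, and the fields that all documents have in common.
const allFields = [...new Set(fieldsPerDoc.flat())].sort();
const commonFields = allFields.filter((f) => users.every((doc) => f in doc));

console.log(allFields);    // [ '_id', 'address', 'email', 'name', 'phone' ]
console.log(commonFields); // [ '_id', 'name', 'phone' ]
```

In the MongoDB shell, an array like this could be passed to db.users.insertMany(), assuming a users collection without schema validation rules; both documents would be accepted even though their fields differ.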
Another important feature of MongoDB is that it supports MapReduce and aggregation tools. A MapReduce operation can write its results to a collection. If you run subsequent MapReduce operations with the same output collection, the new results can replace, be merged with, or be reduced with the previous results. Note that MapReduce has been deprecated since MongoDB 5.0 in favor of the aggregation pipeline.
MongoDB has some more features. Let’s check them out:
- It supports server-side JavaScript in place of stored procedures.
- It stores files of any size easily without complicating your stack.
- It supports various data formats:
  - JSON data model with dynamic schemas
  - BSON data model, etc.
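The point above about storing files of any size relies on GridFS splitting a file into chunks. The arithmetic can be sketched as follows, assuming the default GridFS chunk size of 255 KB; planChunks and the 20 MB file are hypothetical names and values chosen for illustration.

```javascript
// Default GridFS chunk size (255 KB) and the BSON document size limit (16 MB).
const CHUNK_SIZE_BYTES = 255 * 1024;
const MAX_BSON_DOC_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: describe how a file of the given size would be chunked.
function planChunks(fileSizeBytes, chunkSize = CHUNK_SIZE_BYTES) {
  const chunkCount = Math.ceil(fileSizeBytes / chunkSize);
  // Every chunk is full except possibly the last one.
  const lastChunkBytes =
    chunkCount === 0 ? 0 : fileSizeBytes - (chunkCount - 1) * chunkSize;
  return {
    chunkCount,
    lastChunkBytes,
    exceedsDocLimit: fileSizeBytes > MAX_BSON_DOC_BYTES,
  };
}

// A hypothetical 20 MB video file.
const plan = planChunks(20 * 1024 * 1024);
console.log(plan);
// { chunkCount: 81, lastChunkBytes: 81920, exceedsDocLimit: true }
```

Each chunk is small enough to fit in an ordinary document, which is why a file larger than 16 MB can still be stored. By default, GridFS keeps the chunks in a collection named fs.chunks and the file metadata in fs.files.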
Now that we know the features of MongoDB, let’s move ahead and look at its architecture in detail.
MongoDB Architecture
MongoDB was designed to meet the demands of modern-day apps, and its architecture is what makes this possible.
The MongoDB architecture provides a document data model, which offers a natural way to work with data:
- Easy: You can work with data in a natural and intuitive way.
- Fast: You get great performance from it without much effort.
- Flexible: It adapts easily to changing requirements, so changes can be made quickly.
- Versatile: It supports a wide variety of data and queries.
Its distributed systems design lets users place data intelligently wherever they want it to be:
- Availability: It keeps data available to globally distributed, resilient apps through sophisticated replication and self-healing recovery.
- Scalability: With native sharding, the database scales horizontally as data grows.
- Workload Isolation: It can run operational and analytical workloads in the same cluster.
- Locality: You can place your data on particular devices and in a specific geographical location, for governance, class of service, and low-latency access.
Another important aspect of the MongoDB architecture is a unified experience that gives you the freedom to run your applications anywhere:
- Portability: The same database can run everywhere.
- Cloud Agnostic: Users can pursue a multi-cloud strategy without being tied to a single provider.
- Global Coverage: It is available as a service in 50+ regions from the major public cloud providers, such as AWS, Azure, etc.
With these capabilities and resources, you can build what MongoDB promotes as an Intelligent Operational Data Platform.
So far, we have seen what MongoDB is, its features, and its architecture in detail. Now, let’s see why to use it.
Why MongoDB?
There are many databases to choose from, including SQL-based relational databases, so why MongoDB? We already know that MongoDB is a NoSQL database, and that is a large part of the reason to learn it.
These are some of the features that have made MongoDB popular:
- Aggregation Framework
- BSON Format
- Sharding
- Ad-hoc Queries
- Indexing
Let’s have a look at each one of them in detail.
Aggregation Framework
MongoDB provides an efficient aggregation framework. Aggregation operations can also be expressed with MapReduce, a model in which large datasets are processed and results are generated with the help of parallel, distributed algorithms on clusters.
MapReduce consists of two operations, map() and reduce(). Let’s see what these are:
- map(): It is applied to each input document, filters the data, and emits key-value pairs, which are then grouped by key.
- reduce(): After map() has run, reduce() summarizes the values emitted for each key.
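The two operations above can be sketched in plain JavaScript. This is only an in-memory illustration of the MapReduce idea, not MongoDB's implementation; the orders data, its field names, and the mapReduce driver function are hypothetical.

```javascript
// Hypothetical order documents.
const orders = [
  { cust_id: "A", amount: 50, status: "done" },
  { cust_id: "B", amount: 30, status: "done" },
  { cust_id: "A", amount: 20, status: "done" },
  { cust_id: "B", amount: 99, status: "pending" },
];

// map(): filter out unfinished orders and emit a [key, value] pair for the rest.
function map(order) {
  return order.status === "done" ? [[order.cust_id, order.amount]] : [];
}

// reduce(): summarize all the values emitted for one key.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Tiny in-memory driver: run map, group the emitted pairs by key, then reduce.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    for (const [key, value] of mapFn(doc)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(key, values);
  return result;
}

const totals = mapReduce(orders, map, reduce);
console.log(totals); // { A: 70, B: 30 }

// The same summary written as an aggregation pipeline, shown here as plain data.
const pipeline = [
  { $match: { status: "done" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
];
```

In the MongoDB shell, the pipeline could be run with db.orders.aggregate(pipeline), assuming an orders collection with these fields. The aggregation pipeline is the approach MongoDB recommends over MapReduce for this kind of summary.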
BSON Format
BSON stands for Binary JSON. It is a binary-encoded serialization of JSON-like documents, and MongoDB uses it to store documents in collections.
Every document uses the _id field as its primary key, so _id must hold a value that is unique within the collection. By default, this value is an ObjectId, which is generated either by the application driver or by the MongoDB server. The following example shows such a document, written in the JSON-like notation of the MongoDB shell.
[
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "collection": "collection",
    "content": {
      "k": {
        "maxInt": 10,
        "minInt": 0,
        "type": "int"
      }
    },
    "count": 10
  }
]
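The BSON format has other advantages as well. It is designed to be traversed efficiently, which helps MongoDB index documents and look up their fields, and it extends JSON with additional data types such as dates and binary data. This contributes to the read/write throughput of MongoDB.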
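An ObjectId such as the one in the example document above is a 12-byte value: a 4-byte timestamp in seconds, a 5-byte random value, and a 3-byte counter. A minimal sketch of taking one apart is shown below; parseObjectId is a hypothetical helper written for this tutorial, not a MongoDB API.

```javascript
// Hypothetical helper: split a 24-character ObjectId hex string into its parts.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  // First 4 bytes (8 hex characters): creation time in seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    createdAt: new Date(seconds * 1000),
    // Next 5 bytes: random value generated once per process.
    randomValue: hex.slice(8, 18),
    // Last 3 bytes: incrementing counter.
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = parseObjectId("5a934e000102030405000000");
console.log(parts.createdAt.toISOString()); // 2018-02-26T00:00:00.000Z
console.log(parts.randomValue);             // 0102030405
console.log(parts.counter);                 // 0
```

Because the timestamp comes first, ObjectIds created later sort after earlier ones, at a granularity of one second.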
Sharding
A major challenge for any web or mobile application is scaling. To address it, MongoDB provides sharding, a method in which data is distributed across multiple machines. With the help of sharding, it is able to offer horizontal scalability.
A sharded cluster consists of several shards. Each shard holds a subset of the data and functions as a separate database; together, the shards form a single logical database. Operations are routed to the appropriate shards by query routers (mongos instances).
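The routing idea can be illustrated with a toy sketch in plain JavaScript. It assumes three shards and a hypothetical user_id shard key, and it uses a simple hash-modulo scheme; MongoDB itself assigns ranges of (optionally hashed) shard key values to shards, so treat this only as an illustration of the principle.

```javascript
const SHARD_COUNT = 3; // assumed number of shards

// Toy string hash for illustration (not the hash function MongoDB uses).
function toyHash(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.codePointAt(0)) >>> 0;
  return h;
}

// Decide which shard owns a given shard key value.
function shardFor(shardKeyValue) {
  return toyHash(shardKeyValue) % SHARD_COUNT;
}

// Each shard holds only its own subset of the documents.
const shards = Array.from({ length: SHARD_COUNT }, () => []);
const docs = Array.from({ length: 12 }, (_, i) => ({
  user_id: `u${i}`,
  visits: i * 10,
}));
for (const doc of docs) shards[shardFor(doc.user_id)].push(doc);

// "Query router": a query that includes the shard key visits only one shard.
function findByUserId(userId) {
  return shards[shardFor(userId)].filter((d) => d.user_id === userId);
}

console.log(shards.map((s) => s.length)); // number of documents per shard
console.log(findByUserId("u7"));          // [ { user_id: 'u7', visits: 70 } ]
```

A query that does not include the shard key would have to be sent to every shard and the results merged, which is why the choice of shard key matters.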
Ad-hoc Queries
As mentioned earlier, MongoDB supports range queries, regular expression searches, and many other types of searches. Queries can include user-defined JavaScript functions, and they can return only specific fields from the matching documents. Because BSON documents can be indexed, these ad-hoc queries can be served efficiently.
Let us compare a SQL query and a MongoDB query by fetching all the records whose employee name contains 'XYZ' from an employee table (a collection, in MongoDB).
SQL:
SELECT * FROM employee WHERE emp_name LIKE '%XYZ%';
MongoDB:
db.employee.find({ emp_name: /XYZ/ });
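To show how a regular expression, a range condition, and a field selection combine in one ad-hoc query, here is a minimal in-memory sketch. The employees data and the matches and project helpers are hypothetical, and the matcher handles only a regular expression, $gte, $lte, and equality, which is a very small subset of the real query language.

```javascript
// Hypothetical employee documents.
const employees = [
  { emp_name: "XYZ Kumar", dept: "ops", salary: 52000 },
  { emp_name: "A. XYZ", dept: "eng", salary: 91000 },
  { emp_name: "Jo Lee", dept: "eng", salary: 60000 },
];

// The same filter shape that could be passed to db.employee.find(...).
const filter = { emp_name: /XYZ/, salary: { $gte: 50000, $lte: 80000 } };

// Minimal matcher: supports only RegExp, $gte, $lte, and plain equality.
function matches(doc, conditions) {
  return Object.entries(conditions).every(([field, cond]) => {
    const value = doc[field];
    if (cond instanceof RegExp) {
      return typeof value === "string" && cond.test(value);
    }
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, bound]) => {
        if (op === "$gte") return value >= bound;
        if (op === "$lte") return value <= bound;
        throw new Error(`unsupported operator: ${op}`);
      });
    }
    return value === cond;
  });
}

// Projection: keep only the listed fields of a document.
function project(doc, fields) {
  return Object.fromEntries(
    fields.filter((f) => f in doc).map((f) => [f, doc[f]])
  );
}

const result = employees
  .filter((e) => matches(e, filter))
  .map((e) => project(e, ["emp_name", "salary"]));
console.log(result); // [ { emp_name: 'XYZ Kumar', salary: 52000 } ]
```

In the MongoDB shell, the equivalent query would be db.employee.find({ emp_name: /XYZ/, salary: { $gte: 50000, $lte: 80000 } }, { emp_name: 1, salary: 1, _id: 0 }), assuming an employee collection with these fields.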
Indexing
A MongoDB index is used to improve the performance of searches. Every collection has a primary index on the _id field, and any other field in a document can be given a secondary index. This lets the database engine resolve queries efficiently.
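The benefit of an index can be sketched in plain JavaScript by comparing a full scan with a lookup through a prebuilt structure. The staff data and the buildIndex helper are hypothetical, and a Map is used only as a stand-in: MongoDB's own indexes are B-tree structures maintained by the server.

```javascript
// Hypothetical collection of 10,000 employee documents.
const staff = Array.from({ length: 10000 }, (_, i) => ({
  _id: i,
  emp_name: `emp-${i}`,
  dept: i % 2 ? "eng" : "ops",
}));

// Without an index: every document has to be examined (a collection scan).
let examined = 0;
const scanned = staff.filter((doc) => {
  examined++;
  return doc.emp_name === "emp-9000";
});

// Stand-in for a secondary index: map each field value to its documents.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

// With the index: one lookup finds the matching documents directly.
const byName = buildIndex(staff, "emp_name");
const indexed = byName.get("emp-9000") || [];

console.log(examined);       // 10000 documents examined by the scan
console.log(indexed.length); // 1 document found through the index
```

In the MongoDB shell, a comparable secondary index could be created with db.employee.createIndex({ emp_name: 1 }), assuming an employee collection with an emp_name field.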
In this MongoDB tutorial, we have seen what MongoDB is, its features, its architecture, and why to use it. These are some important facts to keep in mind before diving deeper into MongoDB. I hope this motivates you to learn more about MongoDB and that this tutorial helps you choose the right path.