4.1 Data Modeling Introduction
Data in MongoDB has a flexible schema. Unlike SQL databases, where you must determine and declare a table’s schema before inserting data, MongoDB’s collections do not enforce document structure. This flexibility facilitates the mapping of documents to an entity or an object. Each document can match the data fields of the represented entity, even if the data has substantial variation. In practice, however, the documents in a collection share a similar structure.
The key challenge in data modeling is balancing the needs of the application, the performance characteristics of the database engine, and the data retrieval patterns. When designing data models, always consider the application usage of the data (i.e. queries, updates, and processing of the data) as well as the inherent structure of the data itself.
4.1.1 Document Structure
The key decision in designing data models for MongoDB applications revolves around the structure of documents and how the application represents relationships between data. There are two tools that allow applications to represent these relationships: references and embedded documents. References capture relationships by storing a link, typically the _id of one document, inside another document. Embedded documents capture relationships between data by storing related data in a single document structure.
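The difference between the two tools can be sketched with plain Python dictionaries standing in for documents. This is a minimal illustration, not driver code: the user and address entities, every field name, and the two helper functions are hypothetical, and nothing here talks to a MongoDB server.

```python
# Embedded model: the related address lives inside the user document,
# so one document holds everything needed to answer the question.
user_embedded = {
    "_id": "user1",
    "name": "Ada",
    "address": {"street": "1 Main St", "city": "Springfield"},
}

# Referenced model: the address is a separate document that points
# back at the user through the user's _id (a manual reference).
user_referenced = {"_id": "user1", "name": "Ada"}
addresses = [
    {"_id": "addr1", "user_id": "user1", "street": "1 Main St", "city": "Springfield"},
]


def city_embedded(user):
    """One lookup: the city is already inside the user document."""
    return user["address"]["city"]


def city_referenced(user, address_docs):
    """Two lookups: the reference is resolved by scanning the other 'collection'."""
    for address in address_docs:
        if address["user_id"] == user["_id"]:
            return address["city"]
    return None


embedded_city = city_embedded(user_embedded)
referenced_city = city_referenced(user_referenced, addresses)
```

Both helpers return the same value; the embedded form needs a single document, while the referenced form needs a second lookup to follow the link.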
4.1.2 Atomicity of Write Operations
In MongoDB, write operations are atomic at the document level, and no single write operation can atomically affect more than one document or more than one collection. A denormalized data model with embedded data combines all related data for a represented entity in a single document. This facilitates atomic write operations since a single write operation can insert or update the data for an entity.
Normalizing the data would split the data across multiple collections and would require multiple write operations that are not atomic collectively. However, schemas that facilitate atomic writes may limit ways that applications can use the data or may limit ways to modify applications. The Atomicity Considerations documentation describes the challenge of designing a schema that balances flexibility and atomicity.
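The trade-off can be sketched in plain Python, with dictionaries standing in for documents and lists standing in for collections. This is only an illustration of which data each write touches, under the assumption of a hypothetical book-checkout entity; the function names, field names, and the simulated failure are invented for the example and no MongoDB server is involved.

```python
import copy

# Denormalized: the stock count and the checkout list share one document.
book = {"_id": 123456789, "title": "Example Title", "available": 3, "checkout": []}


def checkout_embedded(doc, patron):
    """Build the new state of ONE document; both fields change together,
    mirroring a single-document write that either applies fully or not at all."""
    if doc["available"] <= 0:
        return doc
    updated = copy.deepcopy(doc)
    updated["available"] -= 1
    updated["checkout"].append({"by": patron})
    return updated


def checkout_normalized(stock_doc, checkout_docs, patron, fail_between=False):
    """Two separate writes to two 'collections'; nothing ties them together."""
    stock_doc["available"] -= 1  # write 1
    if fail_between:
        raise RuntimeError("simulated failure between the two writes")
    checkout_docs.append({"book_id": stock_doc["_id"], "by": patron})  # write 2


after = checkout_embedded(book, "patron-1")

stock = {"_id": 123456789, "available": 3}
checkouts = []
try:
    checkout_normalized(stock, checkouts, "patron-1", fail_between=True)
except RuntimeError:
    pass  # stock was decremented, but no checkout record exists
```

After the simulated failure, the normalized data is out of sync (one fewer copy available, no checkout recorded), which is the situation the embedded model avoids.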
4.1.3 Document Growth
Some updates, such as pushing elements to an array or adding new fields, increase a document’s size. For the MMAPv1 storage engine, if the document size exceeds the allocated space for that document, MongoDB relocates the document on disk. When using the MMAPv1 storage engine, document growth can therefore affect the decision to normalize or denormalize data. See Document Growth Considerations for more about planning for and managing document growth for MMAPv1.
4.1.4 Data Use and Performance
When designing a data model, consider how applications will use your database. For instance, if your application only uses recently inserted documents, consider using Capped Collections. Or, if your application mainly performs read operations on a collection, adding indexes to support common queries can improve performance.
4.2 Data Modeling Concepts
4.2.1 Data Model Design
Embedded Data Models: In general, use embedded data models when:
- you have “contains” relationships between entities. See Model One-to-One Relationships with Embedded Documents.
- you have one-to-many relationships between entities. In these relationships the “many” or child documents always appear with or are viewed in the context of the “one” or parent documents.
Normalized Data Models: In general, use normalized data models:
- when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
- to represent more complex many-to-many relationships.
- to model large hierarchical data sets.
4.2.2 Operational Factors and Data Models
Modeling application data for MongoDB depends on both the data itself and the characteristics of MongoDB. For example, different data models may allow applications to use more efficient queries, increase the throughput of insert and update operations, or distribute activity to a sharded cluster more effectively. These factors are operational: they address requirements that arise outside of the application but affect the performance of MongoDB-based applications.
GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16 MB. Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document. By default GridFS limits chunk size to 255 kB. GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.
When you query a GridFS store for a file, the driver or client will reassemble the chunks as needed. You can perform range queries on files stored through GridFS. You can also access information from arbitrary sections of files, which allows you to “skip” into the middle of a video or audio file. GridFS is useful not only for storing files that exceed 16 MB but also for storing any files that you want to access without having to load the entire file into memory.
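The chunking idea can be sketched in plain Python. This is not the driver's implementation and does not use a MongoDB server: dictionaries stand in for the metadata document and the chunk documents, the helper names are hypothetical, and the field names (length, chunkSize, files_id, n, data) follow the GridFS convention as an assumption of this sketch.

```python
CHUNK_SIZE = 255 * 1024  # default chunk size described above, in bytes


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Return one metadata document plus one document per chunk."""
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def read_range(files_doc, chunks, offset, size):
    """Read `size` bytes starting at `offset`, touching only the chunks needed."""
    chunk_size = files_doc["chunkSize"]
    end = min(offset + size, files_doc["length"])
    if offset >= end:
        return b""
    first, last = offset // chunk_size, (end - 1) // chunk_size
    # Select only the chunk documents whose sequence number falls in range.
    wanted = {c["n"]: c["data"] for c in chunks if first <= c["n"] <= last}
    joined = b"".join(wanted[n] for n in range(first, last + 1))
    start_in_first = offset - first * chunk_size
    return joined[start_in_first:start_in_first + (end - offset)]


payload = bytes(range(256)) * 3000  # 768,000 bytes of sample data
files_doc, chunks = split_into_chunks("file1", payload)
middle = read_range(files_doc, chunks, 261000, 500)  # spans the first chunk boundary
```

Because each chunk records its sequence number n, a reader can compute which chunks cover a byte range and skip the rest, which is what makes seeking into the middle of a large file cheap.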
4.3 Data Model Examples and Patterns
The following documents provide overviews of various data modeling patterns and common schema design considerations:
- Model One-to-One Relationships with Embedded Documents – Presents a data model that uses embedded documents to describe one-to-one relationships between connected data.
- Model One-to-Many Relationships with Embedded Documents – Presents a data model that uses embedded documents to describe one-to-many relationships between connected data.
- Model One-to-Many Relationships with Document References – Presents a data model that uses references to describe one-to-many relationships between documents.
- Model Tree Structures with Parent References – Presents a data model that organizes documents in a tree-like structure by storing references to “parent” nodes in “child” nodes.
- Model Tree Structures with Child References – Presents a data model that organizes documents in a tree-like structure by storing references to “child” nodes in “parent” nodes.
- Model Data for Atomic Operations – Illustrates how embedding fields related to an atomic update within the same document ensures that the fields are in sync.
- Model Data to Support Keyword Search – Describes one method for supporting keyword search by storing keywords in an array in the same document as the text field. Combined with a multi-key index, this pattern can support an application’s keyword search operations.
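As one illustration of the patterns listed above, the parent-references tree can be sketched with plain Python dictionaries. The category names, the "parent" field name, and both helper functions are assumptions made for this example; it runs without a MongoDB server and only shows how the stored references are followed.

```python
# Each "child" document stores the _id of its "parent"; the root has no parent.
categories = [
    {"_id": "Books", "parent": None},
    {"_id": "Programming", "parent": "Books"},
    {"_id": "Databases", "parent": "Programming"},
    {"_id": "Languages", "parent": "Programming"},
    {"_id": "MongoDB", "parent": "Databases"},
]

# Lookup table by _id, standing in for a query on the _id field.
by_id = {doc["_id"]: doc for doc in categories}


def ancestors(node_id):
    """Walk parent references upward, nearest ancestor first."""
    path = []
    parent = by_id[node_id]["parent"]
    while parent is not None:
        path.append(parent)
        parent = by_id[parent]["parent"]
    return path


def children(node_id):
    """Find direct children by matching on the parent field."""
    return [doc["_id"] for doc in categories if doc["parent"] == node_id]


mongodb_ancestors = ancestors("MongoDB")
programming_children = children("Programming")
```

Finding a node's parent or its direct children takes one lookup each in this layout, while retrieving a whole path requires one step per level of the tree.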
4.4 Data Model Reference
- Documents – MongoDB stores all data in documents, which are JSON-style data structures composed of field-and-value pairs.
- Database References – Define manual references and DBRefs, which MongoDB can use to represent relationships between documents.
- GridFS Reference – Convention for storing large files in a MongoDB database.
- ObjectId – A 12-byte BSON type that MongoDB uses as the default value for its documents’ _id field if the _id field is not specified.
- BSON Types – Outlines the unique BSON types used by MongoDB.
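To make the 12-byte ObjectId layout mentioned above more concrete, the following sketch decodes the creation time from an ObjectId's hexadecimal string using only the Python standard library. It relies on the first four bytes being a big-endian count of seconds since the Unix epoch and makes no assumption about the remaining eight bytes. The function name and the sample value are illustrative only.

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Sample 24-character hex string used purely for illustration.
created_at = objectid_timestamp("507f1f77bcf86cd799439011")
```

Because the timestamp occupies the leading bytes, sorting on a default _id field orders documents roughly by creation time.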