Data Warehouse Terms
This section will illustrate important and most frequently used terms used in Data Warehousing.
Metadata
In simple terms, metadata provides the answers to all your data-related questions in the data warehouse. It refers to data about data giving users detailed explanation of of the syntax and semantics and describing all relevant attributes of the data in DWH. For instance, the number of tables in a DB can be referred as metadata. It is commonly known as:
- Table of Contents for the data
- Data Catalog
- Data directory
- DWH Roadmap
- The nerve Center
Metadata is an important concept since it is essential for building, administering and using your data warehouse. With the support of metadata, developers and database administrators can create their own ad hoc reports, which is of prime significance in this era of Big Data. In fact, without metadata, the data warehouse is considered futile – a big box without any valuable and meaningful information. Thus, from Programmers to Administrators to Testers, metadata is the key to data warehousing.
OLAP
OLAP expands to Online Analytical Processing. As the name suggests, OLAP is computer processing allowing users to interactively analyze multidimensional data from multiple prospects. The various techniques of multidimensional model of OLAP constitute and encapsulate large volumes of data for rapid evaluation using online analysis tools. In a multidimensional environment, each attribute of data is considered a separate dimension, and OLAP can establish an intersection between these dimensions.
Learn more about Data Warehouse Quality Management and other various tools in this insightful tutorial now!
Dimension and Dimensional Model (DM)
As defined above, a dimension refers to single attribute of same data type. For instance, year, month, day, date, hour, minute, second are all divisions of time attribute. Thus, DWH supports dimensional model that enables users to store and analyze information on each dimension. Unlike entity-relationship (ER) model, DM does not involve a relational database every time. This type of modeling technique is useful for end-user queries in DWH.
OLAP Cube
Typically similar to the shape of a cube, an OLAP cube is the multi-dimensional representation of data optimized for data warehouse. It is an array or matrix of facts and dimensions, wherein facts refer to measurements and dimensions are the entity (a category of information). It is useful when businesses require presenting a spreadsheet containing two or three dimensions. For instance, in the below image, the data is summarized with respect to Time (month), Department and Store value.
Hypercube is an OLAP data cube with more than three dimensions
ETL
Extract, Transform and Load, which refers to movement of data from one location to another.
Drill across
Allows data analysis across multiple dimensions
Drill up
Allows data analysis up to the parent attribute
Drill down
Allows data analysis down to the child attribute
Drill Through
To analyze data from an OLAP cube to the relational DB
Read these Top Trending Data Warehouse Interview Q’s that helps you grab high-paying jobs !
OLAP vs OLTP Tutorial Video
Data Mart (DM)
Aligned to a specific business perspective, data mart is a subsection of data warehouse. While DW stores detailed information of multiple subject, the latter focuses on storing data about only one subject like Finance, Sales and other departments. DM may concentrate on storing more summarized form than complete data and integrating information within specific set of source systems.