
DWH External/Unstructured Data in Warehouse

Data Warehousing and the Unstructured Data

As discussed so far, most enterprises build their data warehouse from the data available in internal source systems. Besides being available within the organization, this data is structured and follows a regular format.
However, we sometimes encounter data that is useful to the organization but not available within it. This is termed external data; it is often unstructured and arrives in an unpredictable format. Storing such external data in the data warehouse is generally advised, as it can help business analysis and operations. If it is not stored in the DWH, several problems arise.


External data in data warehouse

There are primarily two types of external data:

  • Data records gathered by an external source such as a supermarket, pharmacy, or clothing store.
  • Data from articles and reports available on the internet.

Problems with external data

Several issues arise from the use and storage of external data:

  • Frequency of availability: External data does not appear on a fixed schedule, so it must be monitored constantly to make sure the appropriate data is captured in the warehouse.
  • Totally undisciplined data: Since external data is unformatted and unstructured, it has to be given some structure before it is meaningful and usable in the DWH. The external data is passed through simple checks, such as a domain check, and made compatible with the internally available data.
  • Unpredictability of data: External data may come from any source at any time, which makes the available data irregular and uncertain.
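The domain check mentioned above can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the field names (`region`, `units_sold`), the allowed values, and the helper functions are all hypothetical and would come from the warehouse's own reference data in practice.

```python
# Hypothetical domain of valid values; a real warehouse would read this
# from its own reference data.
ALLOWED_REGIONS = {"north", "south", "east", "west"}


def passes_domain_check(record: dict) -> bool:
    """Return True if every field falls inside its expected domain."""
    region = str(record.get("region", "")).strip().lower()
    if region not in ALLOWED_REGIONS:
        return False
    try:
        units = int(record.get("units_sold"))
    except (TypeError, ValueError):
        # Missing or non-numeric value: outside the domain.
        return False
    return units >= 0


def normalize(record: dict) -> dict:
    """Map an accepted external record onto the (assumed) internal layout."""
    return {
        "region": record["region"].strip().lower(),
        "units_sold": int(record["units_sold"]),
    }


external_records = [
    {"region": " North ", "units_sold": "120"},
    {"region": "moon", "units_sold": "5"},
    {"region": "south", "units_sold": "n/a"},
]

accepted = [normalize(r) for r in external_records if passes_domain_check(r)]
rejected = [r for r in external_records if not passes_domain_check(r)]
```

Only the first record survives the check here; the other two are set aside for review rather than loaded.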

 

Metadata and External Data

Metadata in a warehouse plays a major role when any external data is identified.

Figure: Metadata

Metadata lets users find information about the external data. Another type of data is associated with the metadata: notification data, a file that alerts users to the data they are interested in. When external data enters the data warehouse and is registered in the metadata, a check is made to see who is interested in it. The system notifies that person, and the external data is then captured in the warehouse.

Figure: Metadata with notification file
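The notification flow described above might look something like the following sketch. Everything here is an assumption made for illustration: the registry, the catalog layout, and the function names are hypothetical, and the sketch returns the list of users to notify instead of writing a notification file.

```python
from collections import defaultdict

# Hypothetical notification data: topic -> users who registered interest.
interest_registry = defaultdict(set)
metadata_catalog = []  # one metadata entry per external document
warehouse = []         # captured external data


def register_interest(user: str, topic: str) -> None:
    """Record that a user wants to hear about a topic."""
    interest_registry[topic.lower()].add(user)


def receive_external_data(doc_id: str, topic: str, source: str, content: str) -> list:
    """Register metadata, look up interested users, then capture the data."""
    entry = {"doc_id": doc_id, "topic": topic.lower(), "source": source}
    metadata_catalog.append(entry)
    # Check the notification data to see who is interested in this item.
    to_notify = sorted(interest_registry.get(entry["topic"], set()))
    warehouse.append({"doc_id": doc_id, "content": content})
    return to_notify


register_interest("alice", "Retail Prices")
register_interest("bob", "retail prices")
notified = receive_external_data(
    "ext-001", "Retail Prices", "trade report", "sample content"
)
```

In this run `notified` contains both registered users, and the document is recorded in both the metadata catalog and the warehouse.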

Storing the External Data

Figure: Storing the external data


Archiving External Data

Every piece of information becomes less interesting over time and eventually is not worth keeping in the warehouse. External data should therefore be archived once its useful lifetime, which has to be decided in advance, has passed. The dilemma remains whether that external data should be deleted or archived. The usual rule is to remove the external data from the warehouse and move it to less expensive storage. The metadata reference to the external data is then updated to indicate the new storage location, and the metadata entry itself stays in place.
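As a rough sketch of the archiving rule above: the one-year lifetime, the dictionary-based stores, and the `location` field are all hypothetical choices made for the example, not something the tutorial specifies.

```python
from datetime import date, timedelta

# Hypothetical useful lifetime; each organization decides its own.
USEFUL_LIFETIME = timedelta(days=365)


def archive_expired(warehouse: dict, archive: dict, metadata: dict, today: date) -> list:
    """Move expired external data to cheaper storage and update the metadata."""
    moved = []
    for doc_id, entry in metadata.items():
        expired = today - entry["received"] > USEFUL_LIFETIME
        if entry["location"] == "warehouse" and expired:
            archive[doc_id] = warehouse.pop(doc_id)
            # The metadata entry stays; only its reference changes.
            entry["location"] = "archive"
            moved.append(doc_id)
    return moved


warehouse_store = {"ext-001": "old report", "ext-002": "recent report"}
archive_store = {}
metadata = {
    "ext-001": {"received": date(2022, 1, 10), "location": "warehouse"},
    "ext-002": {"received": date(2024, 5, 1), "location": "warehouse"},
}

moved = archive_expired(warehouse_store, archive_store, metadata, today=date(2024, 6, 1))
```

After this call the older document lives only in `archive_store`, while its metadata entry is still present and points to the new location.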


Comparison between Internal and External data

One of the most meaningful things to do with external data is to compare it with internal data over a period of time. This comparison gives business users insights that would not be possible otherwise. For instance, a company can compare its own activities and trends with global trends. The comparison is usually done on a common key.
Storing external data in the DWH provides relevant information from outside the company and helps businesses run and update their processes accordingly.
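A comparison on a common key could be sketched as below. The figures are invented purely for illustration, and the month string is an assumed common key; any attribute shared by both data sets would do.

```python
# Hypothetical figures keyed by month (the common key).
internal_sales = {"2024-01": 100, "2024-02": 110, "2024-03": 150}
external_market = {
    "2024-01": 1000,
    "2024-02": 1200,
    "2024-03": 1250,
    "2024-04": 1300,  # no internal counterpart, so it is left out of the join
}


def compare_on_key(internal: dict, external: dict) -> list:
    """Join the two data sets on shared keys and compute the internal share."""
    rows = []
    for key in sorted(internal.keys() & external.keys()):
        if external[key] == 0:
            continue  # avoid dividing by zero on empty external totals
        share = round(internal[key] / external[key], 3)
        rows.append((key, internal[key], external[key], share))
    return rows


comparison = compare_on_key(internal_sales, external_market)
for month, ours, market, share in comparison:
    print(f"{month}: internal={ours} external={market} share={share}")
```

Only keys present on both sides are compared, which is why the external-only month does not appear in the result.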

About the Author

Data Analyst & Machine Learning Associate

As a Data Analyst & Machine Learning Associate, Nishtha uses a combination of her analytical skills and machine learning knowledge to interpret complicated datasets. She is a passionate storyteller who transforms crucial findings into gripping tales that further influence data-driven decision-making in the business frontier.