
DWH External/Unstructured Data in Warehouse

Data Warehousing and the Unstructured Data

As discussed so far, most enterprises build their data warehouse from the data available in their internal source systems. Besides being available inside the organization, this data is structured and held in a regular format.
However, an organization sometimes encounters data that is useful to it but is not available internally. This is termed external data, and it is usually unstructured and arrives in an unpredictable format. It is advisable to store such external data in the data warehouse, as it can help with business analysis and operations. If it is not stored in the DWH, several problems arise.


External data in data warehouse

There are primarily two types of external data:

  • Records gathered by an external source, such as a supermarket, medicine store, or clothing store.
  • Data from articles and reports available on the internet.

Problems with external data

Several issues arise from the use and storage of external data:

  • Frequency of availability: External data does not appear on a fixed schedule, so its sources must be monitored constantly to make sure the appropriate data is captured in the warehouse.
  • Undisciplined data: Since external data is unformatted and unstructured, it has to be given structure before it is meaningful and usable in the DWH. The external data is passed through simple checks, such as a domain check, and made compatible with the internally available data.
  • Unpredictability of data: External data may come from any source at any time, which makes the available data irregular and uncertain.
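The domain check mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`region`, `units_sold`), the allowed region values, and the internal record layout are all hypothetical and do not come from any particular warehouse.

```python
# Hypothetical domain rule: the only region values the internal data uses.
ALLOWED_REGIONS = {"north", "south", "east", "west"}


def domain_check(record: dict) -> list[str]:
    """Return the list of domain violations found in one external record."""
    errors = []
    region = str(record.get("region", "")).strip().lower()
    if region not in ALLOWED_REGIONS:
        errors.append(f"region out of domain: {record.get('region')!r}")
    try:
        units = int(record.get("units_sold"))
    except (TypeError, ValueError):
        errors.append("units_sold is not an integer")
    else:
        if units < 0:
            errors.append("units_sold is negative")
    return errors


def normalize(record: dict) -> dict:
    """Reshape an accepted record into the assumed internal layout."""
    return {
        "region": str(record["region"]).strip().lower(),
        "units_sold": int(record["units_sold"]),
    }


# Made-up external records, as they might arrive from an outside source.
raw = [
    {"region": " North ", "units_sold": "120"},
    {"region": "moon", "units_sold": "7"},
    {"region": "east", "units_sold": "many"},
]

accepted = [normalize(r) for r in raw if not domain_check(r)]
rejected = [(r, domain_check(r)) for r in raw if domain_check(r)]
```

Only the first record passes both checks. The other two are kept in `rejected` together with the reason, so they can be reviewed instead of being silently dropped.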

Metadata and External Data

Metadata in a warehouse plays a major role when any external data is identified.

[Figure: metadata]

Metadata lets users find out information about the external data. A related kind of data is notification data, a file that alerts users when data they are interested in becomes available. When external data enters the data warehouse and is recorded in the metadata, a check is made to see who is interested in it. The system notifies that person, and the external data is then captured in the warehouse.
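The notification flow described above can be sketched as a small in-memory catalog. The class names, fields, user names, and storage path below are hypothetical, chosen only for illustration. A real warehouse would keep both the metadata and the notification data in persistent storage.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """Metadata kept about one piece of external data (hypothetical fields)."""
    doc_id: str
    source: str
    topics: set[str]
    location: str


@dataclass
class MetadataCatalog:
    entries: dict[str, MetadataEntry] = field(default_factory=dict)
    # Notification data: which topics each user wants to hear about.
    interests: dict[str, set[str]] = field(default_factory=dict)

    def subscribe(self, user: str, topic: str) -> None:
        self.interests.setdefault(user, set()).add(topic)

    def register(self, entry: MetadataEntry) -> list[str]:
        """Record the entry and return the users who should be notified."""
        self.entries[entry.doc_id] = entry
        return sorted(
            user
            for user, topics in self.interests.items()
            if topics & entry.topics
        )


catalog = MetadataCatalog()
catalog.subscribe("asha", "retail")
catalog.subscribe("ben", "pharma")

notified = catalog.register(
    MetadataEntry(
        doc_id="ext-001",
        source="industry report",
        topics={"retail", "pricing"},
        location="warehouse/external/ext-001",
    )
)
```

Here `notified` contains only `asha`, because the new entry shares a topic with her interests and none with those of `ben`.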

[Figure: metadata with notification file]

Storing the External Data

[Figure: storing the external data]

Archiving External Data

Like any information, external data becomes less interesting over time and eventually is not worth keeping in the warehouse. Its useful lifetime must therefore be decided, and the data archived once that lifetime has passed. The dilemma is whether the external data should be deleted outright or archived. The usual rule is to remove it from the warehouse and store it on less expensive storage. The metadata reference to the external data is then updated to indicate the new storage area, so the entry itself stays in the metadata.
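The archiving rule above can be sketched as follows. The metadata layout, the `warehouse/` and `archive/` path prefixes, and the one-year lifetime are assumptions made for this example. The actual copy to cheaper storage is left out and only marked by a comment.

```python
from datetime import date, timedelta


def archive_expired(metadata: dict, today: date, lifetime: timedelta) -> list[str]:
    """Repoint expired entries to archive storage and return their ids.

    Entries are never deleted from the metadata; only their location changes.
    """
    moved = []
    for doc_id, entry in metadata.items():
        expired = today - entry["received"] > lifetime
        if expired and entry["location"].startswith("warehouse/"):
            # A real system would copy the payload to cheaper storage here.
            entry["location"] = "archive/" + entry["location"][len("warehouse/"):]
            moved.append(doc_id)
    return moved


# Hypothetical metadata entries for two pieces of external data.
metadata = {
    "ext-001": {"received": date(2023, 1, 15), "location": "warehouse/external/ext-001"},
    "ext-002": {"received": date(2024, 3, 1), "location": "warehouse/external/ext-002"},
}

moved = archive_expired(metadata, today=date(2024, 6, 1), lifetime=timedelta(days=365))
```

After the call, `ext-001` points to `archive/external/ext-001`, while `ext-002` is still within its lifetime and stays where it was. Both entries remain in the metadata.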


Comparison between Internal and External data

One of the most meaningful things to do with external data is to compare it with internal data over a period of time. This comparison gives business people insights that would not be possible otherwise. For instance, a company can compare its own activities and trends with global trends. In most cases the comparison is made on a common key.
Storing external data in the DWH makes relevant information from outside the company available, and helps the business run and update its processes accordingly.
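As a minimal sketch of a comparison on a common key, the example below matches internal and external figures by month. All numbers are made up for illustration, and the month string used as the key is an assumption. Any key shared by both data sets would work the same way.

```python
# Hypothetical monthly figures keyed by month (the common key).
internal_sales = {"2024-01": 120, "2024-02": 135, "2024-03": 128}
external_market = {"2024-02": 1500, "2024-03": 1600, "2024-04": 1580}


def market_share(internal: dict, external: dict) -> dict:
    """Compare internal and external figures for the keys both sides have."""
    common_keys = sorted(internal.keys() & external.keys())
    return {key: internal[key] / external[key] for key in common_keys}


share = market_share(internal_sales, external_market)

for month, value in share.items():
    print(f"{month}: {value:.1%} of the external market total")
```

Months present on only one side are left out of the result, which is why the common key matters: without it there is nothing to line the two data sets up against.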


About the Author

Data Analyst & Machine Learning Associate

As a Data Analyst and machine learning associate, Nishtha combines her analytical skills and machine learning knowledge to interpret complicated datasets. She is also a passionate storyteller who transforms crucial findings into gripping tales that further influence data-driven decision-making in the business frontier.