Top Data Warehouse Interview Questions And Answers
Top Data Warehouse Interview Questions And Answers
Top Answers to Data Warehouse Interview Questions
|Type of data||Relational or object-oriented data||Large volume with multiple data types|
|Data operations||Transaction processing||Data modeling and analysis|
|Dimensions of data||Two dimensional||Multi-dimensional|
|Data design||ER based and application-oriented||Star/Snowflake schema and subject-oriented|
|Size of data||Small ( in GB)||Large ( in TB)|
|Functionality||High availability & performance||High flexibility and user autonomy|
Purpose of cluster analysis:-
• Ability to deal with different kinds of attributes
• Discovery of clusters with attribute shape
• High dimensionality
• Ability to deal with noisy
• Agglomerative hierarchical method consists of objects in which each object creates its own clusters and these clusters are grouped together to create a large cluster. It defines a process of continuous merging until all the single clusters are merged together into a complete big cluster that will consist of all the objects of child clusters. However, in divisive clustering, the parent cluster is divided into smaller cluster and it keeps on dividing until each cluster has a single object to represent.Go through this Data Warehousing Tutorial to get a clear understanding of Data Warehousing.
This representation allows large dataset to be created and operated successfully. The method finds the clusters that are used in the dataset using two phase algorithm.
• The first phase consists of the graph partitioning that allows the clustering of the data items into large number of sub-clusters.
• Second phase uses an agglomerative hierarchical clustering algorithm to search for the clusters that are genuine and can be combined together with the sub-clusters that are produced.Interested in learning Data Warehousing? Well, we have the in-depth data modeling courses Course to give you a head start in your career.
• Virtual data warehousing is a ‘de facto’ information system strategy for supporting analytical decision making. It is one of the best ways for translating raw data and presenting it in the form that can be used by decision makers. It provides semantic map – which allows the end user for viewing as virtualized.
• A form of repository of captured transactional data is known as ‘active data warehousing’. Using this concept, trends and patterns are found to be used for future decision making. Active data warehouse has a feature which can integrate the changes of data while scheduled cycles refresh. Enterprises utilize an active data warehouse in drawing the company’s image in statistical manner.
• A snapshot is a process of knowing about the activities performed. It is stored in a report format from a specific catalog. The report is generated soon after the catalog is disconnected.
• XMLA is an industry standard for accessing data in analytical systems, such as OLAP. It is based on XML, SOAP and HTTP.
• XMLA specifies MDXML as the query language. In the XMLA 1.1 version, the only construct in MDXML is an MDX statement enclosed in the tag
• In ODS, data can be scrubbed, resolved for redundancy and checked for compliance with the corresponding business rules. This data store can be used for integrating disparate data from multiple sources so that business operations, analysis and reporting can be carried while business operations occur. This is the place where most of the data used in current operation is housed before it’s transferred to the data warehouse for longer term storage or archiving.
• An ODS is designed for relatively simple queries on small amounts of data (such as finding the status of a customer order), rather than the complex queries on large amounts of data typical of the data warehouse.
• An ODS is similar to your short term memory where it only stores very recent information. On the contrary, the data warehouse is more like long term memory storing relatively permanent information.Give your career a big boost by going through our data modeling videos now!
e.g.Employee performance is a very high level of granularity. Employee_performance_daily, employee_perfomance_weekly can be considered lower levels of granularity.
The granularity is the lowest level of information stored in the fact table. The depth of data level is known as granularity. In date dimension, the level could be year, month, quarter, period, week, day of granularity.
The process consists of the following two steps:
– Determining the dimensions that are to be included
– Determining the location to locate the hierarchy of each dimension of informationThe above factors of determination will be resent to the requirements.
– Tail raid data representation is provided by a view to access data from its table.
– It has logical structure that does not occupy space.
– Changes get affected in corresponding tables.
– Pre-calculated data persists in materialized view.
– It has physical data space occupation.
– Changes will not get affected in corresponding tables.
• A single dimension is formed by lumping a number of small dimensions. This dimension is called a junk dimension. Junk dimension has unrelated attributes. The process of grouping random flags and text attributes in dimension by transmitting them to a distinguished sub dimension is related to junk dimension.
Three types of SCDs are used in data warehousing, which are defined as:
– SCD1: It is a record that is used to replace the original record even there is only one record existing in the database. The current data will be replaced and the new data will take its place.
– SCD2: It is the new record file that is added to the dimension table. This record exists in the database with the current data and previous data that is stored in the history.
– SCD3: This uses the original data that is modified to the new data. This consists of two records: one record that exist in the database and another record that will replace the old database record with the new information.
MOLAP: Multi-dimensional OLAP
Data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats (one example is PowerOLAP’s .olp file). MOLAP products can be compatible with Excel, which can make data interactions easy to learn.
ROLAP: Relational OLAP
ROLAP products access a relational database by using SQL (structured query language), which is the standard language that is used to define and manipulate data in an RDBMS. Subsequent processing may occur in the RDBMS or within a mid-tier server, which accepts requests from clients, translates them into SQL statements, and passes them on to the RDBMS.
It may happen that in a table, some columns are important and we need to track changes for them i.e., capture the historical data for them whereas in some columns even if the data changes, we do not have to bother.
For such tables, we implement Hybrid SCDs, where in some columns are Type 1 and some are Type 2.
In case of ActionForm class, we can develop validate() method. This method will return the ActionErrors object. In this method we can write the validation code. If this method returns null or ActionErrors with size=0, the web container will call execute() as part of the Action class.
• If it returns size > 0, it will not call the execute() method. It will rather execute the jsp, servlet or html file as value for the input attribute as part of the attribute in struts-config.xml file.
• A conformed dimension is a dimension that has exactly the same meaning and content when being referred from different fact tables. A conformed dimension can refer to multiple tables in multiple data marts within the same organization.
• Kimball views data warehousing as a constituency of Data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence, a unified view of the enterprise can be obtained from the dimension modeling on a local departmental level.
• Inmon explains in creating a data warehouse on a subject-by-subject area basis. Hence, the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary.
• Hence, Kimball–First Data Marts–Combined way —Data warehouse
Inmon—First Data warehouse–Later—-Data martsTake charge of your career by going through our professionally designed Data Warehouse Certifications Course.