What is ETL?
ETL in computing stands for Extraction, Transformation and Loading. These terms are mainly used in data warehousing. In this ETL process, data is extracted from different sources, transformed to a required format, and finally loaded to a target. The usage of ETL cannot be described with any particular example as it varies from situation to situation.
|ETL source file||Database|
|ETL destination file||Data Warehouse|
|Parallel ETL tasks||Possible|
Now, we will discuss about one of the most important concept of ETL that is ETL lookup Stage.
ETL Lookup Stage:
ETL lookup stage enables us to evaluate data with various options, but it cannot be used in the case of a giant database as data can be analyzed only when it is in memory. Although, it is the more preferable one as compared to join and merge stages. Additionally, the lookup stage also permits the condition-based data analysis.
Properties of the Lookup Stage:
- It is a processing stage.
- It works on memory by reading data from a dataset.
- Direct lookup on DB2 and Oracle can also be performed.
- Lookup is also used for the validation of rows. It rejects a row if it doesn’t not have a corresponding entry.
- Lookup stage can have single or multiple reference links, but a single input link and a single output link.
This stage can be worked with three options:
- Equality Match: This is also known as the normal look. Here, data is looked at for the exact case-sensitive match.
- Casesless Match: This function looks for values with no importance to case sensitivity.
- Range Match: With the help of the range function, a lookup stage can be configured to look up for a range of values between two lookup columns.
Lookup stage is ideal for small amounts of data as compared to join and merge stages, as it uses memory to process data. However, it is not possible to use the lookup stage for a huge amount of data. For databases or datasets with huge values, a join stage or a merge stage is used. The drawbacks of the join stage include the fact that it cannot reject a row that is not having a corresponding entry; whereas, the merge stage can help us reject such values.
(ETL) Lookup toolbar has following buttons:
- Stage properties: This option helps define various stage properties like stage name, link properties, etc.
- Conditions: All conditions can be defined with the help of this button.
- Show all selected relations: This button helps us see all selected relations.
- Cut/Copy/Paste: These include general functions.
- Load/Save column definitions: This specific option is for loading and saving the column data.
- Find/Replace: They do the general functions of Find or Replace.
- Column auto match: It helps match columns automatically depending on mapping.
- Input/Output link execution order: This one displays the links used for execution and helps in reordering.
Apart from the lookup toolbar, we also have some editor components.
Link Area shows the details of input and output links.
Metadata Area shows the metadata details of column, input and output links.
Shortcut menus assist in opening all toolbar functions at one place. It also helps validate and clear details and append a new column to a selected link. All lookup stage conditions can be checked using this shortcut menu.