Top Answers to Informatica Interview Questions
Informatica PowerCenter is an ETL/Data Integration tool that has a wide range of applications. This tool allows users to connect and fetch data from different heterogenous sources and subsequently process the same.
For example, users can connect to an SQL Server Database or an Oracle Database, or both, and also integrate the data from both these databases to a third system.
Learn more about Business Objects v/s Informatica in this insightful blog!
There are many typical use cases of Informatica, but this tool is predominantly leveraged in the following:
- When organizations migrate from the existing legacy systems to new database systems
- When enterprises set up their data warehouse
- While integrating data from various heterogenous systems including multiple databases and file-based systems
- For data cleansing
|GUI for Development and Monitoring||Power Designer, Repository Manager, Workflow Designer, and Workflow Manager||DataStage Designer, Job Sequence Designer, and Director|
|Data Integration Solution||Step-by-step Solution||Project-based Integration Solution|
There are two ways to filter rows in Informatica.
- Source Qualifier Transformation: It filters rows while reading data from a relational data source. It minimizes the number of rows while mapping to enhance performance. Also, Standard SQL is used by the filter condition for executing in the database.
- Filter Transformation: It filters rows within a mapped data from any source. It is added close to the source to filter out the unwanted data and maximize performance. It generates true or false values based on conditions.
Go through the Informatica Course in London to get clear understanding of AWS.
|It is not possible to override the query.||It is possible to override the query.|
|Only the ‘=’ operator is available.||All operators are available for use.|
|Users cannot restrict the number of rows while reading relational tables.||Users can restrict the number of rows while reading relational tables.|
|It is possible to join tables based on Joins.||It behaves as Left Outer Join while connecting with the database.|
Depending upon the number of ports that are required, repositories can be created. In general, however, there can be any number of repositories.
There are four different types of lookup transformation:
- Relational or flat file lookup: Performs lookup on relational tables
- Pipeline lookup: Performs lookup on application sources
- Connected or unconnected lookup: While the connected lookup transformation receives data from source, performs lookup, and returns the result to the pipeline, the unconnected lookup happens when the source is not connected. It returns one column to the calling transformation.
- Cached or un-cached lookup: Lookup transformation can be configured to cache lookup data, or we can directly query the lookup source every time a lookup is invoked.
Learn more about Power BI in this Informatica training in New York to get ahead in your career!
A command task can be called as a pre- or post-session shell command for a session task. Users can run it as a pre-session command, a post-session success command, or a post-session failure command. Based on use cases, the application of shell command can be changed or altered.
Aggregator performance improves dramatically if records are sorted before passing to the aggregator and if the ‘sorted input’ option under Aggregator Properties is checked. The record set should be sorted on those columns that are used in the Group By operation. It is often a good idea to sort the record set in database level, e.g., inside a source qualifier transformation, unless there is a chance that the already sorted records from the source qualifier can again become unsorted before reaching the aggregator.
A target table can be updated without using ‘Update Strategy’. For this, we need to define the key in the target table in Informatica level and then we need to connect the key and the field we want to update in the mapping target. In the session level, we should set the target property as ‘Update as Update’ and check the ‘Update’ check box.
Let us assume, we have a target table ‘Customer’ with fields as ‘Customer ID’, ‘Customer Name’, and ‘Customer Address’. Suppose, we want to update ‘Customer Address’ without an Update Strategy. Then, we have to define ‘Customer ID’ as the primary key in Informatica level, and we will have to connect ‘Customer ID’ and ‘Customer Address’ fields in the mapping. If the session properties are set correctly as described above, then the mapping will only update the ‘Customer Address’ field for all matching customer IDs.
Basically, mapping parameters and mapping variables represent values in mappings and mapplets.
- Mapping parameters represent constant values that are defined before running a session.
- After creation, parameters appear in Expression Editor.
- These parameters can be used in source qualifier filter, user-defined joins, or for overriding.
- As opposed to mapping parameters, mapping variables can change values during sessions.
- The last value of a mapping variable is saved to the repository at the end of each successful session by the Integration Service. However, it is possible to override saved values with parameter files.
- Basically, mapping variables are used to perform incremental reads of data sources.
A surrogate key is basically an identifier that uniquely identifies modeled entities or objects in a database. Not being derived from any other data in the database, surrogate keys may or may not be used as primary keys. It is basically a unique sequential number. If an entity exists in the outside world and modeled within the database, or represents an object within the database, it is denoted by a surrogate key. In these cases, surrogate keys for specific objects or modeled entities are generated internally.
A session is nothing but a teaching set which is ought to be implemented to convert data from a source to a target. To carry out sessions, users need to leverage the session’s manager or use the pmcmd command. For combining sessions, in either a serial or a parallel manner, batch execution is used. Any number of sessions can be grouped in batches for migration.
Basically, incremental aggregation is the process of capturing changes in the source and calculating aggregations in a session. This process incrementally makes the integration service to update targets and avoids the process of calculating aggregations on the entire source.
Upon the first load, the table becomes:
On the next load, the data will be aggregated with the next session date.
Star schema is the simplest style of data mart schema in computing. It is an approach which is most widely used to develop data warehouses and dimensional data marts. It features one or more fact tables referencing to numerous dimension tables.
A logical arrangement of tables in a multidimensional database, snowflake schema is represented by centralized fact tables which are connected to multidimension tables. Dimension tables in a star schema are normalized using snowflaking. Once normalized, the resultant structure resembles a snowflake with the fact table at the middle. Low-cardinality attributes are removed, and separate tables are formed.
Fact Constellation Schema:
Fact constellation schema is a measure of online analytical processing (OLAP) and OLAP happens to be a collection of multiple fact tables sharing dimension tables and viewed as a collection of stars. It can be seen as an extension of the star schema.
We can delete duplicate rows from flat files by leveraging the sorter transformation and selecting the distinct option. Selecting this option will delete the duplicate rows.
From an Informatica developer’s perspective, some of the new features in Informatica Developer 9.1.0 are as follows:
- In the new version, lookup can be configured as an active transformation—it can return multiple rows on a successful match.
- Now, you can write SQL override on un-cached lookup also. Previously, you could do it only on cached lookup.
- Control over the size of your session log: In a real-time environment, you can control the session log file size or log file time.
- Database deadlock resilience feature: This will ensure that your session does not immediately fail if it encounters any database deadlock. It will retry the operation. You can configure the number of retry attempts.
First up, Informatica is a data integration tool, while Teradata is an MPP database with some scripting and fast data movement capabilities. Major advantages of Informatica over Teradata are:
1) It functions as a metadata repository for the organization’s ETL ecosystem. Informatica jobs (sessions) can be arranged logically into worklets and workflows in folders. It leads to an ecosystem which is easier to maintain and quicker for architects and analysts to analyze and enhance.
2) Job monitoring and recovery: Easy to monitor jobs using Informatica Workflow Monitor. Easier to identify and recover in the case of failed jobs or slow running jobs. It exhibits an ability to restart from failure row step.
3) Informatica Market Place: It is a one-stop shop for lots of tools and accelerators to make SDLC faster and improve application support.
4) Enables plenty of developers in the market with varying skill levels and expertise to interact.
5) Lots of connectors to various databases, including support for Teradata MLoad, TPump, FastLoad, and Parallel Transporter in addition to the regular (and slow) ODBC drivers
6) Surrogate key generation through shared sequence generators inside Informatica could be faster than generating them inside the database.
7) If the company decides to move away from Teradata to another solution, then vendors like Infosys can execute migration projects to move the data and change the ETL code to work with the new database quickly, accurately, and efficiently using automated solutions.
8) Pushdown optimization can be used to process the data in the database.
9) It has an ability to code ETL such that processing load is balanced between the ETL server and the database box—This is useful if the database box is ageing and/or in case the ETL server has a fast disk/large enough memory and CPU to outperform the database in certain tasks.
10) It has an ability to publish processes as web services.
OLAP or Online Analytical Processing is a specific category of software that allows users to analyze information from multiple database systems simultaneously. Using OLAP, analysts can extract and have a look at business data from different sources or points of view.
Types of OLAP:
- ROLAP: ROLAP or Relational OLAP is an OLAP server that maps multidimensional operations to standard relational operations.
- MOLAP: MOLAP or Multidimensional OLAP uses array-based multidimensional storage engines for multidimensional views on data. Numerous MOLAP servers use two levels of data storage representation to handle dense and sparse data sets.
- HOLAP: HOLAP of Hybrid OLAP combines both ROLAP and MOLAP for faster computation and higher scalability of data.
When a mapplet is used in a mapping, the Designer allows users to set target load order for all sources that pertain to the mapplet. In the Designer, users can set the target load order in which Integration Service sends rows to targets within the mapping. A target load order group is basically a collection of source qualifiers, transformations, and targets linked together in a mapping. The target load order can be set to maintain referential integrity while operating on tables that have primary and secondary keys.
Steps to set the target load order:
Step 1: Create a mapping that contains multiple target load order groups
Step 2:Click on Mappings and then select Target Load Plan
Step 3: The Target Load Plan dialog box lists all Source Qualifier transformations with targets that receive data from them
Step 4: Select a source qualifier and click on the Up and Down buttons to change the position of the source qualifier
Step 5: Repeat Steps 3 and 4 for other source qualifiers if you want to reorder them
Step 6: Click on OK after you are done
If we are required to perform ETL operations, we need source data, target tables, and required transformations. Target Designer in Informatica allows us to create target tables and modify the pre-existent target definitions.
Target definitions can be imported from various sources including Flat Files, Relational Databases, XML Definitions, Excel Worksheets, etc.
For opening the Target Designer, click on Tools menu and select the Target Designer option.
The advantages of Informatica as a Data Integration tool are many.
- It facilitates the effective and efficient communication and transformation of data between different and disparate sources.
- Informatica is faster, cheaper, and easy to learn.
- Monitoring jobs becomes easy with it, and so do recovering failed jobs and pointing out slow jobs.
- It has many robust features including database information, data validation, migration of projects from one database to another, etc.
- Repository Manager: An administrative tool which is used to manage repository folders, objects, groups, etc.
- Administration Console: Used to perform service tasks
- PowerCenter Designer: Contains several designing tools including source analyzer, target designer, mapplet designer, mapping manager, etc.
- Workflow Manager: Defines a set of instructions that are required to execute mappings
- Workflow Monitor: Monitors workflows and tasks
Interested in learning Informatica? Click here to learn more in this Informatica Training in Sydney!
Available in the Workflow Manager, sessions are configured by creating a session task. Within a mapping program, there can be multiple sessions which can be either reusable or non-reusable.
Properties of sessions:
- Session tasks can run concurrently or sequentially, as per requirement.
- They can be configured to analyze performance.
- Sessions include log files, test load, error handling, commit interval, target properties, etc.
- Aggregator transformation
- Expression transformation
- Normalizer transformation
- Rank transformation
- Filter transformation
- Joiner transformation
- Lookup transformation
- Stored procedure transformation
- Sorter transformation
- Update Strategy transformation
- XML Source Qualifier transformation
- Router transformation
- Sequence Generator transformation
The features of connected lookup are:
- It takes in the input directly from the pipeline.
- Actively participates in data flow and both dynamic and static cache are used.
- It caches all lookup columns and returns default values as the output when lookup condition does not match.
- It is possible to return more than one column value to the output port.
- It supports user-defined default values.
A structure that consists of a group of a few junk attributes such as random codes or flags, Junk Dimensions form a framework to store related codes with respect to a specific dimension at a single place, instead of creating multiple tables for the same.
Become Master of Informatica by going through this online Informatica course in Toronto.
Be it active or connected, Rank transformation is used to sort and rank a set of records either from top or bottom. It is also used to select data with the largest or smallest numeric value based on specific ports.
Available in both passive and connected configurations, the Sequence Generator transformation is responsible for the generation of primary keys, or a sequence of numbers for calculations or processing. It has two output ports that can be connected to numerous transformations within a mapplet. These ports are:
- NEXTVAL: This can be connected to multiple transformations for generating a unique value for each row or transformation.
- CURRVAL: This port is connected when NEXTVAL is already connected to some other transformation within the mapplet.
When invoked, the INITCAP function capitalizes the first character of each word in a string and converts all other characters to lowercase.
When the data of organization is developed at a single point of access it is known as enterprise data warehousing.
Learn more about Informatica in Informatica tutorial.
Database have a group of useful information which is brief in size as compared to data warehouse whereas in data warehouse their are set of every kind of data whether it is useful or not and data is extracted as the the requirement of customer.
Read this this blog to get a clear understanding of Data Warehousing
Domain is the term in which all interlinked relationship and nodes are under taken by sole organizational point.
Repository server mainly guarantees the repository reliability and uniformity while powerhouse server tackles the execution of many procedures between the factors of server’s database repository.
Learn Complete Informatica at Hyderabad in 24 Hrs.
It mainly depends upon the number of ports we required but as general there can be any number of repositories.
The main advantage of partitioning a session is to get better server’s process and competence. Other advantage is it implements the solo sequences within the session.
With the help of command task at session level we can create indexes after the load procedure.
Are you interested in learning Informatica course in Bangalore from Experts?
Session is a teaching group that requires to be to transform information from source to a target.
We can have any number of session but it is advisable to have lesser number of session in a batch because it will become easier for migration.
At the time values alter during the session’s implementation it is known as mapping variable whereas the values that don’t alter within the session implementation is called as mapping parameters.
Interested in learning Informatica? Well, we have the comprehensive Informatica Online Training Course.
The features of complex mapping are:
Many numbers of transformations
tricky needscompound business logic
With the help of debugging option we can identify whether mapping is correct or not without connecting sessions.
Yes, we can use mapping parameter or variables into any other reusable transformation because it doesn’t have any mapplet.
If extra memory is needed aggregator provides extra cache files for keeping the transformation values. It also keeps the transitional value that are there in local buffer memory.
The transformation that has entrance right to RDBMS Is known as lookup transformation.
The dimensions that are used for playing diversified roles while remaining in the same database domain are known as role playing dimensions.
We can access repository reports by using metadata reporter. No need of using SQL or other transformation as it is a web app.
The types of metadata which is stored in repository are Target definition, Source definition, Mapplet, Mappings, Transformations.
Transfer of data take place from one code page to another keeping that both code pages have the same character sets then data failure cannot occur.
At a time we can validate only one mapping. Hence mapping cannot be validated simultaneously.
It is different from expression transformation in which we can do calculations in set but here we can do aggregate calculations such as averages, sum, etc.
It is used for performing non aggregated calculations. We can test conditional statements before output results move to the target tables.
Filter transformation is a way of filtering rows in a mapping. It have all ports of input/output and the row which matches with that condition can only pass by that filter.
It combines two associated mixed sources located in different locations while a source qualifier transformation can combine data rising from a common source.
Lookup transformation is used for maintaining data in a relational table through mapping. We can use multiple lookup transformation in a mapping.
Go through our Informatica Training Video Course to learn more.
It is a different input group transformation that is used to combine data from different sources.
The incremental aggregation is done whenever a session is developed for a mapping aggregate.
In connected lookup inputs are taken straight away from various transformations in the pipeline it is called connected lookup. While unconnected lookup doesn’t take inputs straight away from various transformations, but it can be used in any transformations and can be raised as a function using LKP expression.
A mapplet is a recyclable object that is using mapplet designer.
This transformation is used various times in mapping. It is divest from other mappings which use the transformation as it is stored as a metadata.
Whenever the row has to be updated or inserted based on some sequence then update strategy is used. But in this condition should be specified before for the processed row to be tick as update or inserted.
When it faces DD_Reject in update strategy transformation then it sends server to reject files.
It is a substitute for the natural prime key. It is a unique identification for each row in the table.
In order to perform session partition one need to configure the session to partition source data and then installing the Informatica server machine in multifold CPU’s.
Errors log, Bad file, Workflow low and session log namely files are created during the session rums.
It is a mass of instruction that guides power center server about how and when to move data from sources to targets.
This task permits one or more than one shell commands in UNIX or DOS in windows to run during the workflow.
This task can be used anywhere in the workflow to run the shell commands.
Command task can be called as the pre or post session shell command for a session task. One can run it as pre session command r post session success command or post session failure command.
Predefined event are the file-watch event. It waits for a specific file to arrive at a specific location.
User defined event are a flow of tasks in the workflow. Events can be developed and then raised as need comes.
The group of directions that communicates server about how to implement tasks is known as work flow.
The different tools in workflow manager are:
‘CONTROL M’ is the third party tool for scheduling purpose other than workflow manager.
It is a process by which multi-dimensional analysis occurs.
Take charge of your career by going through our professionally designed Informatica Certification Course.
Different types of OLAP are ROLAP, HOLAP< DOLAP.
Worklet is said when the workflow tasks are collected in a group. It includes timer, decision, command, event wait, etc.
With the help of target designer we can create target definition
In workflow monitor we can find throughput option.
Right click on session, then press on get run properties and under source/target statistics we can find this option.
It is specified on the criteria of source qualifiers in a mapping. If there are many source qualifiers attached to various targets then we can entitle order in which informatica loads data in targets.
Informatica is a tool, supporting all the steps of Extraction, Transformation and Load process. Now days Informatica is also being used as an Integration tool.Informatica is an easy to use tool. It has got a simple visual interface like forms in visual basic. You just need to drag and drop different objects (known as transformations) and design process flow for Data extraction transformation and load.
These process flow diagrams are known as mappings. Once a mapping is made, it can be scheduled to run as and when required. In the background Informatica server takes care of fetching data from source, transforming it, & loading it to the target
Aggregator performance improves dramatically if records are sorted before passing to the aggregator and “sorted input” option under aggregator properties is checked. The record set should be sorted on those columns that are used in Group By operation.It is often a good idea to sort the record set in database level e.g. inside a source qualifier transformation, unless there is a chance that already sorted records from source qualifier can again become unsorted before reaching aggregator.
Informatica Lookups can be cached or un-cached (No cache). And Cached lookup can be either static or dynamic. A static cache is one which does not modify the cache once it is built and it remains same during the session run. On the other hand, A caches refreshed during the session run by inserting or updating the records in cache based on the incoming source data.
By default, Informatica cache is static cache.A lookup cache can also be divided as persistent or non-persistent based on whether Informatica retains the cache even after the completion of session run or deletes it.
A target table can be updated without using ‘Update Strategy’. For this, we need to define the key in the target table in Informatica level and then we need to connect the key and the field we want to update in the mapping Target. In the session level, we should set the target property as “Update as Update” and check the “Update” check-box.Let’s assume we have a target table “Customer” with fields as “Customer ID”, “Customer Name” and “Customer Address”.
Suppose we want to update “Customer Address” without an Update Strategy. Then we have to define “Customer ID” as primary key in Informatica level and we will have to connect Customer ID and Customer Address fields in the mapping. If the session properties are set correctly as described above, then the mapping will only update the customer address field for all matching customer IDs.
From an Informatica developer’s perspective, some of the new features in Informatica 9.x are as follows:Now Lookup can be configured as an active transformation – it can return multiple rows on successful match
Now you can write SQL override on un-cached lookup also. Previously you could do it only on cached lookup
You can control the size of your session log. In a real-time environment you can control the session log file size or time
Database deadlock resilience feature – this will ensure that your session does not immediately fail if it encounters any database deadlock, it will now retry the operation again. You can configure number of retry attempts.
First up, Informatica is a data integration tool, while Teradata is a MPP database with some scripting (BTEQ) and fast data movement (mLoad, FastLoad, Parallel Transporter, etc) capabilities.Informatica over Teradata1) Metadata repository for the organization’s ETL ecosystem.
Informatica jobs (sessions) can be arranged logically into worklets and workflows in folders.
Leads to an ecosystem which is easier to maintain and quicker for architects and analysts to analyze and enhance.2) Job monitoring and recovery-
Easy to monitor jobs using Informatica Workflow Monitor.
Easier to identify and recover in case of failed jobs or slow running jobs.
Ability to restart from failure row / step.3) InformaticaMarketPlace- one stop shop for lots of tools and accelerators to make the SDLC faster, and improve application support.4) Plenty of developers in the market with varying skill levels and expertise5) Lots of connectors to various databases, including support for Teradata mLoad, tPump, FastLoad and Parallel Transporter in addition to the regular (and slow) ODBC drivers.Some ‘exotic’ connectors may need to be procured and hence could cost extra.Examples – Power Exchange for Facebook, Twitter, etc which source data from such social media sources.6) Surrogate key generation through shared sequence generators inside Informatica could be faster than generating them inside the database.7) If the company decides to move away from Teradata to another solution, then vendors like Infosys can execute migration projects to move the data, and change the ETL code to work with the new database quickly, accurately and efficiently using automated solutions.8) Pushdown optimization can be used to process the data in the database.9) Ability to code ETL such that processing load is balanced between ETL server and the database box – useful if the database box is ageing and/or in case the ETL server has a fast disk/ large enough memory & CPU to outperform the database in certain tasks.10) Ability to publish processes as web services.Teradata over Informatica
- Cheaper (initially) – No initial ETL tool license costs (which can be significant), and lower OPEX costs as one doesn’t need to pay for yearly support from Informatica Corp.
- Great choice if all the data to be loaded are available as structured files – which can then be processed inside the database after an initial stage load.
- Good choice for a lower complexity ecosystem
- Only Teradata developers or resources with good ANSI/Teradata SQL / BTEQ knowledge required to build and enhance the system.
- What is Informatica ETL Tool?
Informatica ETL tool is market leader in data integration and data quality services. Informatica is successful ETL and EAI tool with significant industry coverage.ETL refers to extract, transform, load. Data integration tools are different from other software platforms and languages.
They have no inbuilt feature to build user interface where end user can see the transformed data. Informatica ETL tool “power center” has capability to manage, integrate and migrate enterprise data.
Its GUI tool, Coding in any graphical tool is generally faster than hand code scripting.
Can communicate with all major data sources (mainframe/RDBMS/Flat Files/XML/VSM/SAP etc).
Can handle vary large/huge data very effectively.
User can apply Mappings, extract rules, cleansing rules, transformation rules, aggregation logic and loading rules are in separate objects in an ETL tool. Any change in any of the object will give minimum impact of other object.
Reusability of the object (Transformation Rules)
Informatica has different “adapters” for extracting data from packaged ERP applications (such as SAP or PeopleSoft).
Availability of resource in the market.
Can be run on Window and Unix environment.
InformaticaPowerCenter is one of the Enterprise Data Integration products developed by Informatica Corporation. InformaticaPowerCenter is an ETL tool used for extracting data from the source, transforming and loading data in to the target.The Extraction part involves understanding, analyzing and cleaning of the source data.
Transformation part involves cleaning of the data more precisely and modifying the data as per the business requirements.
The loading part involves assigning the dimensional keys and loading into the warehouse.
- What is the need of an ETL tool?
The problem comes with traditional programming languages where you need to connect to multiple sources and you have to handle errors. For this you have to write complex code. ETL tools provide a ready-made solution for this. You don’t need to worry about handling these things and can concentrate only on coding the requirement part.
Gain expertise for Informatica Interview with our Informatica Training