
Top DataStage Interview Questions And Answers

In these DataStage interview questions you will get all the information you need to clear your DataStage interview with ease. As part of these interview questions and answers you will learn the difference between DataStage and Informatica, what a DataStage Routine is, a comparison of the Join, Merge and Lookup stages, QualityStage, job control, performance tuning of jobs, repository tables, data type conversions and more. Learn DataStage from Intellipaat DataStage training and excel in your career.


Top Answers to DataStage Interview Questions

1. DataStage Characteristics
Support for Big Data Hadoop – access Big Data on a distributed file system, JSON support and a JDBC integrator
Ease of use – improved speed, flexibility and efficacy for data integration
Deployment – on-premise or cloud, as the need dictates
2. Explain What is IBM DataStage?

DataStage is an Extract, Transform and Load (ETL) tool that is part of the IBM InfoSphere suite. It is used for working with large data warehouses and data marts, and for creating and maintaining such data repositories.


3. How is a DataStage source file filled?

We can develop an SQL query, or we can use a Row Generator stage, to fill the source file in DataStage.
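As a rough illustration of the Row Generator idea, here is a minimal Python sketch that fabricates rows and writes them to a source file; the column names and row count are hypothetical, not part of any DataStage API:

```python
import csv
import io

def generate_rows(n):
    """Emit synthetic rows, similar in spirit to a Row Generator stage."""
    for i in range(1, n + 1):
        yield {"id": i, "name": f"customer_{i}"}

# Write the generated rows to an in-memory stand-in for a source file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name"])
writer.writeheader()
for row in generate_rows(3):
    writer.writerow(row)

print(buf.getvalue())
```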



4. How is merging done in DataStage?

Merging is done when two or more tables are to be combined based on their primary key column. This is the basis for merging in DataStage.
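To make the idea concrete, here is a small Python sketch of combining two tables on a shared primary key; the table and column names are invented for illustration:

```python
# Two "tables" that share the primary key column `id`.
orders = [{"id": 1, "amount": 250}, {"id": 2, "amount": 75}]
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

# Index one table by its primary key, then combine matching rows.
by_id = {c["id"]: c for c in customers}
merged = [{**o, **by_id[o["id"]]} for o in orders if o["id"] in by_id]
print(merged)
```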

5. What is a data file and a descriptor file?

As their names indicate, these two files serve different purposes in DataStage. The descriptor file contains all the information or description, while the data file just contains the data.


6. How is DataStage different from Informatica?

DataStage and Informatica are both powerful ETL tools, but there are a few differences between the two. DataStage has the concepts of parallelism and partitioning for node configuration, whereas Informatica does not support parallelism in node configuration. DataStage is also simpler to use compared to Informatica.

7. What is a Routine in Data Stage?

A Routine is a collection of functions defined within the DataStage Manager. There are basically three types of Routines in DataStage, namely Job Control Routines, Before/After Subroutines and Transform Functions.

8. What is the process for removing duplicates in DataStage?

Duplicates in DataStage can be removed using the Sort stage. While running the sort, you need to set the option that allows duplicates to false.
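The sort-then-drop-duplicates behaviour can be sketched in plain Python; this is only an analogy for what the Sort stage does when duplicates are disallowed, not DataStage code:

```python
records = [("A", 1), ("B", 2), ("A", 1), ("C", 3)]

# Sort first so identical records become adjacent, then keep only the
# first record of each run -- the effect of disallowing duplicates.
records.sort()
deduped = [r for i, r in enumerate(records) if i == 0 or r != records[i - 1]]
print(deduped)
```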

9. What is the difference between Join, Merge & Lookup stage?

The fundamental difference between these three stages is the amount of memory they take; how they treat the input requirements and the various records is also a differentiating factor. Based on memory usage, the Join and Merge stages use comparatively little memory because they operate on sorted input, while the Lookup stage loads the reference data into memory and can therefore consume a large amount of it.
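The memory trade-off of a Lookup can be illustrated in Python: the entire reference dataset is held in memory as a keyed structure, so memory grows with the reference data. The dataset and key names here are hypothetical:

```python
# Lookup-style enrichment: the whole reference dataset sits in RAM
# as a dict, so memory use scales with the reference data size.
reference = {i: f"desc_{i}" for i in range(5)}

# The input stream is processed row by row against the in-memory reference.
stream = [3, 1, 4]
enriched = [(k, reference.get(k, "UNKNOWN")) for k in stream]
print(enriched)
```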

These stages are described in DataStage's online reference guide and in the DataStage community documentation.

10. What is QualityStage in DataStage?

QualityStage is used for cleansing data with the DataStage tool. It is a client-server software tool that is provided as part of IBM Information Server.

11. What is job control in DataStage?

Job control is used for controlling a job or executing multiple jobs in a parallel manner. It is deployed using the Job Control Language within the IBM DataStage tool.

12. How to do DataStage jobs performance tuning?

First you have to select the right configuration files. Then you need to select the right partitioning and buffer memory. You have to handle the sorting of data and the handling of null values carefully. Try to use the Modify, Copy or Filter stages instead of the Transformer, and reduce the propagation of unnecessary metadata between the various stages.

13. What is a repository table in DataStage?

The repository is another name for the data warehouse. It can be centralized or distributed. The repository table is used for answering ad hoc, historical, analytical or complex queries.


14. Compare massively parallel processing and symmetric multiprocessing.

In massively parallel processing (MPP), many computers are present in the same chassis, each with its own resources. In symmetric multiprocessing (SMP), many processors share the same hardware resources. MPP is called "shared nothing" because nothing is shared between the various computers, and it is faster than symmetric multiprocessing.

15. How can you kill the DataStage job?

To kill a DataStage job, you need to kill its individual process ID; this ensures that the job is stopped.

16. How do you compare the validated OK and Compiled processes in DataStage?

The Compiled step ensures that the important stage parameters are mapped and correct, and it creates an executable job. In Validated OK, we make sure that the connections are valid.

17. Explain the feature of data type conversion in DataStage?

If you want to do data type conversion in DataStage, you can use the data conversion function. For this to execute successfully, you need to ensure that the input and output to and from the operator are the same, and that the record schema is compatible with the operator.
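As a loose analogy for such a conversion function, here is a Python sketch that converts a string column to an integer and yields a NULL-like value when the input is incompatible; the function name and sample rows are invented for illustration:

```python
def to_int(value):
    """Convert a string column value to int; return None (a NULL)
    when the value is not compatible with the target type."""
    try:
        return int(value.strip())
    except (ValueError, AttributeError):
        return None

rows = ["42", " 7 ", "oops"]
converted = [to_int(v) for v in rows]
print(converted)
```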

18. What is the significance of the exception activity in DataStage?

Whenever an unfamiliar error occurs while executing the job sequencer, all the stages after the Exception activity are run. This is what makes the Exception activity so important in DataStage.


19. What are the various types of Lookups in DataStage?

There are different types of Lookups in DataStage: Normal, Sparse, Range and Caseless.

20. When do you use a parallel job and a server job?

Using a parallel job or a server job depends on the processing need, functionality, time to implement and cost. A server job usually runs on a single node; it executes on the DataStage Server Engine and handles small volumes of data. A parallel job runs on multiple nodes; it executes on the DataStage Parallel Engine and handles large volumes of data.

21. What is usage analysis in DataStage?

If you want to know whether a certain job is part of a sequence, right-click the job in the Manager and choose Usage Analysis.

22. How do you find the number of rows in a sequential file?

For counting the number of rows, we should use the @INROWNUM variable.
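Outside a DataStage job, the same count can be obtained by iterating over the file line by line, as this Python sketch shows (the file contents here are a stand-in):

```python
import io

# Stand-in for a sequential file on disk, one record per line.
seq_file = io.StringIO("row1\nrow2\nrow3\n")

# Count rows without loading the whole file into memory.
count = sum(1 for _ in seq_file)
print(count)
```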

23. What is the difference between a Sequential file and a Hash file?

The Hash file is based on a hash algorithm and can be used with a key value, while the Sequential file does not have any key column. The Hash file can be used as a reference for a Lookup, while a Sequential file cannot. Due to the presence of the hash key, the Hash file is faster to search than a Sequential file.
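The difference in search behaviour can be sketched in Python: a sequential file forces a full scan, while a keyed (hashed) structure gives direct access. The keys and values are invented for illustration:

```python
rows = [("K1", "alpha"), ("K2", "beta"), ("K3", "gamma")]

# Sequential-file style: no key, so a lookup is a full scan.
def scan_lookup(key):
    for k, v in rows:
        if k == key:
            return v
    return None

# Hash-file style: keyed access, so a lookup is a single probe.
hashed = dict(rows)

print(scan_lookup("K2"), hashed["K2"])
```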

24. How do you clean a DataStage Repository?

For cleaning a DataStage repository, you need to go to the DataStage Manager, open Job in the menu bar and go to Clean Up Resources. If you want to further remove the logs, then you need to go to the respective job and clean up its log files.

25. How do you call a Routine in DataStage?

Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. A Routine in DataStage can be one of the following: a Job Control Routine, a Before/After Subroutine or a Transform Function.

26. What is the difference between an Operational DataStage and a Data Warehouse?

An Operational DataStage can be considered a staging area for real-time analysis and user processing; thus it is a temporary repository. The data warehouse, on the other hand, is used for long-term data storage and holds the complete data of the entire business.

27. What does NLS mean in DataStage?

NLS means National Language Support. It means you can use the IBM DataStage tool in various languages, including multi-byte character languages such as Chinese or Japanese. You can read and write in any language and process it as per the requirement.
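The underlying concern NLS addresses, handling multi-byte text correctly, can be illustrated in Python with explicit encodings (the sample string is arbitrary Japanese text):

```python
# A multi-byte string: each character needs more than one byte in UTF-8.
text = "日本語データ"

encoded = text.encode("utf-8")   # bytes on disk / on the wire
decoded = encoded.decode("utf-8")  # round-trips back to the same string

# Character count and byte count differ for multi-byte text.
print(len(text), len(encoded))
```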
