Top DataStage Interview Questions and Answers
IBM's DataStage uses graphical notations to construct data integration solutions. It offers extensive enterprise connectivity, right-time data integration, market-leading scalability and flexibility, integrated design interfaces, and advanced development and maintenance features.
DataStage is an IBM tool used for designing, developing, and executing the applications that populate data warehouses. It can also fill multiple tables in data marts by extracting data from databases on Windows servers. It is one of the most powerful ETL tools and ships in several versions, including the MVS, Server, and Enterprise Editions; companies use the version that matches their requirements.
DataStage Interview Questions and Answers
Interviewers are likely to probe the details of the tool if you are going to work with it, so let's go through the most frequently asked DataStage interview questions.
Q1. What are the characteristics of DataStage?
Ans. Following are the characteristics of IBM DataStage:
- The tool can be deployed on local servers and the cloud as per the requirement.
- It is easy to use and can efficiently increase the speed and flexibility of data integration.
- It can support and access big data through JSON support, JDBC integrator and distributed file systems.
Q2. Describe the architecture of DataStage.
Ans. DataStage follows a client-server architecture, with variations depending on the version. The components of the client-server architecture are:
- Client components
- Servers
- Stages
- Table definitions
- Containers
- Projects
- Jobs
Q3. What are DataStage jobs?
Ans. DataStage jobs determine the data sources, the required transformations, and the data destinations. Jobs are compiled together to create reusable components and parallel job flows.
Q4. What is the need for exception activity in Datastage?
Ans. Exception activity is needed in DataStage because every stage placed after the Exception Activity is executed whenever an unknown error occurs while the job sequencer is running, which makes it the natural place to centralize error handling.
Q5. How to perform usage analysis in Datastage?
Ans. You can perform Usage Analysis in a few clicks: launch the DataStage Manager, right-click the job, and select Usage Analysis.
Q6. How can you perform date conversion in IBM Datastage?
Ans. DataStage performs date conversion with the function pair Oconv(Iconv(FieldName, "Existing Date Format"), "New Date Format").
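For instance, here is a minimal sketch of such a derivation; the literal date and the format codes are illustrative assumptions, not part of the original answer:

```
* Convert "2023-10-31" (YYYY-MM-DD) into "10/31/2023" (MM/DD/YYYY).
* Iconv parses the external date into DataStage's internal day number;
* Oconv then formats that internal value in the new format.
Oconv(Iconv("2023-10-31", "D-YMD[4,2,2]"), "D/MDY[2,2,4]")
```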
Q7. What are the different types of Lookups in DataStage?
Ans. DataStage has two types of Lookups: Normal lookup and Sparse lookup.
- Normal lookup: the reference data is first loaded into memory and the lookup is then performed against it.
- Sparse lookup: the lookup query is fired directly against the database for each input row. A Sparse lookup can be faster than a Normal lookup when the number of input rows is small relative to the size of the reference table.
Q8. What is APT_CONFIG?
Ans. APT_CONFIG is the environment variable that identifies the *.apt configuration file in the tool. This file stores the node information, disk storage information, and scratch (temporary) storage information.
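For illustration, a minimal single-node configuration file might look like the sketch below; the node name, fastname, and paths are hypothetical:

```
{
  node "node1"
  {
    fastname "etl_server"
    pools ""
    resource disk "/opt/ibm/ds/data" { pools "" }
    resource scratchdisk "/opt/ibm/ds/scratch" { pools "" }
  }
}
```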
Q9. Explain the OConv() and IConv() functions.
Ans. In DataStage, OConv() and IConv() are two functions used for converting data from one format to another. OConv() converts internal values into formats users can understand, whereas IConv() converts external formats into the internal form the system stores (see the date-conversion example under Q6).
Q10. What is the difference between Sequential and Hash files?
Ans. A sequential file does not have any key value for saving the data. A hash file stores data based on a hashing algorithm and a hash key value, which is why searching a hash file is faster than searching a sequential file.
Q11. Define the flow designer in IBM DataStage.
Ans. The flow designer is the web-based user interface used for creating, editing, loading, and running jobs in IBM DataStage. It has the following features:
- Performs jobs with a large number of stages.
- No need for migrating the jobs to use the flow designer.
- Lets you add and remove connectors and operators by dragging them from the palette onto the designer canvas.
Q12. Differentiate between an Operational Data Store (ODS) and a data warehouse.
Ans. An Operational Data Store is a mini data warehouse, since it does not contain information for more than one year. A data warehouse contains in-depth information related to the entire business.
Q13. What is the use of NLS in Datastage?
Ans. National Language Support (NLS) allows multiple languages to be included in the data, as required for data warehouse processing.
Q14. What are the different types of hash files in DataStage?
Ans. DataStage has two types of hash files: static and dynamic. A static hash file is used when a limited amount of data has to be loaded into the target database, while a dynamic hash file is used when an unknown amount of data is to be loaded from the source file.
Q15. Why do we need to use the surrogate key instead of the unique key?
Ans. A surrogate key is used instead of a unique key because it makes data retrieval operations faster.
Q16. Which command line is used for running a job in the tool?
Ans. You can run jobs using the "dsjob -run -jobstatus <projectname> <jobname>" command.
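For example, assuming a project called MyProject and a job called DailyLoad (both names hypothetical):

```
# Run the job and wait for it; -jobstatus makes the command's exit code
# reflect the job's finishing status, which is handy in scheduler scripts.
dsjob -run -jobstatus MyProject DailyLoad
```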
Q17. How can you find bugs in a job sequence?
Ans. You can use DataStage Director for finding bugs in the job sequence.
Q18. Name the types of views in the DataStage Director.
Ans. There are three types of views in the DataStage Director: Job View, Log View, and Status View.
Q19. How can you improve performance in Datastage?
Ans. It is advisable not to use more than 20 stages in a single job; if you need more than 20 stages, it is better to split the logic into another job.
Q20. Which functions are used for importing and exporting DS jobs?
Ans. "dsimport.exe" is used for importing DS jobs and "dsexport.exe" for exporting them.
Q21. How can you fix truncated data errors in Datastage?
Ans. You can fix truncated data errors by using the environment variable "IMPORT_REJECT_STRING_FIELD_OVERRUN".
Q22. What is the difference between a data file and a descriptor file?
Ans. A data file contains the data, while a descriptor file contains the description of the data held within the data file.
Q23. List down some functions that you can execute using the "dsjob" command.
Ans. You can execute the following functions using the "dsjob" command (a short usage sketch follows the list):
- $dsjob -run: to run DataStage job
- $dsjob -jobid: for providing the job information
- $dsjob -stop: to stop the job that is currently present in the process
- $dsjob -report: to display complete job report
- $dsjob -llinks: for listing all the links
- $dsjob -lprojects: for listing all present projects
- $dsjob -ljobs: to list all jobs present in the project
- $dsjob -lstages: for listing every stage of the current job
- $dsjob -lparams: for listing all parameters of the job
- $dsjob -projectinfo: to retrieve project information
- $dsjob -jobinfo: for retrieving information of the job
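Putting a few of these together, a brief sketch; the project and job names are hypothetical:

```
dsjob -lprojects                          # list all projects on the server
dsjob -ljobs MyProject                    # list all jobs in a project
dsjob -jobinfo MyProject DailyLoad        # show status information for one job
dsjob -report MyProject DailyLoad BASIC   # print a basic job report
```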
Q24. What are Routines and how can we call them in DataStage jobs?
Ans. Routines are collections of functions defined by the DS Manager. There are three types of routines: parallel, mainframe, and server routines. You can call a routine from the Transformer stage of the tool.
Q25. How can you write parallel routines?
Ans. Parallel routines are written in C or C++ and compiled before they are referenced from DataStage.
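As a minimal sketch (the function, file, and library names are assumptions), a parallel routine is an ordinary C-linkage function compiled into a shared library, which is then registered in the Designer as an external function and called from a Transformer derivation:

```cpp
// add_tax.cpp -- hypothetical parallel routine source.
// Build it into a shared library, for example:
//   g++ -shared -fPIC -o libadd_tax.so add_tax.cpp
// Then define a parallel routine of type "External Function" in the
// Designer that points at this library, and call add_tax() from a
// Transformer stage derivation.
extern "C"
{
    // Returns the price with a fixed tax rate applied (illustrative).
    double add_tax(double price)
    {
        return price * 1.18;
    }
}
```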
Q26. Why do we use an HBase connector in Datastage?
Ans. The HBase connector is used for connecting to tables in the HBase database. Through the HBase connector, you can read and write data in HBase, and you can also read the data in parallel mode.
Q27. What is the importance of Data partitioning?
Ans. Data partitioning is the process of segmenting records into partitions for processing. It increases processing efficiency in a near-linear fashion and is the foundation of the parallel approach to data processing.
Q28. How does Datastage manage rejected rows?
Ans. Rejected rows are managed through constraints in the Transformer. There are two ways to do so: either route the rejected rows through the Transformer's properties, or create temporary storage for these rejected rows using the REJECTED command.
Q29. What is the use of link partitioner and link connector in Datastage?
Ans. The Link Partitioner divides the data into different parts using partitioning methods. The Link Collector gathers the data from those segments into a single stream and saves it into the target table.
Q30. What are the tiers of the DataStage InfoSphere Information Server?
Ans. The four tiers of InfoSphere Information Server are:
- Client tier: consists of the client programs and consoles used for development and administration.
- Services tier: provides standard and module-specific services. It contains an application server, product modules, and product services.
- Engine tier: a set of logical components used for running jobs and other tasks for the product modules.
- Metadata repository tier: includes the metadata repository and the analysis database. The repository is used for sharing metadata, shared data, and configuration information.
Q31. How is performance tuning of DataStage jobs carried out?
Ans. For performance tuning, first select the appropriate configuration file. Then choose the right partitioning method and buffer memory. Next, deal with data sorting and the handling of null-time values. Use the Copy, Filter, or Modify stages and avoid the Transformer stage where possible. Finally, reduce the propagation of redundant metadata between stages.
Q32. How does DataStage handle merging?
Ans. The primary key column in the tables can be used to merge or join two or more tables.
Q33. What's the distinction between DataStage 7.5 and 7.0?
Ans. Many new stages, such as the Procedure Stage, Command Stage, and Generate Report, were added in DataStage 7.5 that were not present in version 7.0.
