Top Answers to Sqoop Interview Questions

1. Compare Sqoop and Flume

Criteria Sqoop Flume
Application Importing data from RDBMS Moving bulk streaming data into HDFS
Architecture Connector  – connecting to respective data Agent – fetching of the right data
Loading of data Event driven Not event driven

2. Name a few import control commands. How can Sqoop handle large objects?

Import control commands are used to import RDBMS data

Append: Append data to an existing dataset in HDFS. --append

Columns: columns to import from the table. --columns
<col,col……> • Where: where clause to use during import. --

where The common large objects are Blog and Clob.Suppose the object is less than 16 MB, it is stored inline with the rest of the data. If there are big objects, they are temporarily stored in a subdirectory with the name _lob. Those data are then materialized in memory for processing. If we set lob limit as ZERO (0) then it is stored in external memory.

3. How can we import data from particular row or column? What is the destination types allowed in Sqoop import command?

Sqoop allows to Export and Import the data from the data table based on the where clause. The syntax is

--columns
<col1,col2……> --where
--query

Example:

sqoop import –connect jdbc:mysql://db.one.com/corp --table INTELLIPAAT_EMP --where “start_date> ’2016-07-20’ ”
sqoopeval --connect jdbc:mysql://db.test.com/corp --query “SELECT * FROM intellipaat_emp LIMIT 20”
sqoop import –connect jdbc:mysql://localhost/database --username root --password aaaaa –columns “name,emp_id,jobtitle”

Sqoop supports data imported into following services:

  • HDFS
  • Hive
  • Hbase
  • Hcatalog
  • Accumulo

Learn about the complete Hadoop ecosystem of India in this blog post.

4. Role of JDBC driver in sqoop setup? Is the JDBC driver enough to connect the sqoop to the database?

Sqoop needs a connector to connect the different relational databases. Almost all Database vendors make a JDBC connector available specific to that Database, Sqoop needs a JDBC driver of the database for interaction.
No, Sqoop needs JDBC and a connector to connect a database.

5. Using Sqoop command how can we control the number of mappers?.

We can control the number of mappers by executing the parameter –num-mapers in sqoop command. The –num-mappers arguments control the number of map tasks, which is the degree of parallelism used. Start with a small number of map tasks, then choose a high number of mappers starting the performance may down on the database side.

Syntax: -m, –num-mappers

6.How will you update the rows that are already exported? Write sqoop command to show all the databases in MySQL server.

By using the parameter – update-key we can update existing rows. Comma-separated list of columns is used which uniquely identifies a row. All of these columns are used in the WHERE clause generated UPDATE query. All other table columns will be used in the SET part of the query.
The command below is used to show all the databases in MySQL server.

$ sqoop list –databases –connect jdbc:mysql://database.test.com/

Get 100% Hike!

Master Most in Demand Skills Now !

7. Define Sqoop metastore? What is the purpose of Sqoop-merge?

Sqoop meta store is a tool for using hosts in a shared metadata repository. Multiple users and remote users can define and execute saved jobs defined in metastore. End users configured to connect the metastore in sqoop-site.xml or with the

–meta-connect argument.

The purpose of sqoop-merge is:
This tool combines 2 datasets where entries in one dataset overwrite entries of an older dataset preserving only the new version of the records between both the data sets.

8. Explain the saved job process in Sqoop.

Sqoop allows us to define saved jobs which make this process simple. A saved job records the configuration information required to execute a Sqoop command at a later time. sqoop-job tool describes how to create and work with saved jobs. Job descriptions are saved to a private repository stored in $HOME/.sqoop/.

We can configure Sqoop to instead use a shared metastore, which makes saved jobs offered to multiple users across a shared cluster. Starting the metastore is covered by the section on the sqoop-metastore tool.

9. How Sqoop word came ? Sqoop is which type of tool and the main use of sqoop?

Sqoop word came from SQL+HADOOP=SQOOP. And Sqoop is a data transfer tool.
The main use of Sqoop is to import and export the large amount of data from RDBMS to HDFS and vice versa.

10. How to enter into Mysql prompt, and explain the command parameter indicates?

The command for entering into Mysql prompt is “mysql –u root –p”
-u indicatesthe user
Root indicates username
-p indicates password.

Career Transition

Intellipaat Job Guarantee Review | Intellipaat Job Assistance Review | Data Engineer Course
Career Transition to Big Data - Yogesh's Success Story | Big Data Hadoop Course - Intellipaat Review
Intellipaat Reviews | Big Data Analytics Course | Career Transformation to Big Data | Gayathri
Intellipaat Reviews | Big Data Analytics Course | Non-Tech To Big Data Expert - Kushagra Chugh
How Can A Non Technical Person Become Data Scientist | Intellipaat Review - Melvin
Non Tech to Data Scientist Career Transition | Data Science Course Review - Intellipaat
Upskilled & Got Job as Analyst After a Career Break |  Data Science Course Story - Shehzin Mulla

11. I am getting connection failure exception during connecting to Mysql through Sqoop, what is the root cause and fix for this error scenario?

This will happen when there is lack of permissions to access our Mysql database over the network. We can try the below command to confirm the connect to Mysql database from aSqoop client machine.
$ mysql –host=MySqlnode> –database=test –user= –password=
We can grant the permissions with below commands.

mysql> GRANT ALL PRIVILEGES ON *.* TO ‘%’@’localhost’;
mysql> GRANT ALL PRIVILEGES ON *.* TO ‘ ’@’localhost’;

12. I am getting java.lang.IllegalArgumentException: during importing tables from oracle database.what might be the root cause and fix for this error scenario?

Sqoop commands are case- sensitive of table names and user names.
By specifying the above two values in UPPER case, it will resolve the issue.
In case, the source table is created under different user namespace,then table name should be like USERNAME.TABLENAME as shown below
sqoop import
–connect jdbc:oracle:thin:@intellipaat.testing.com/INTELLIPAAT
–username SQOOP
–password sqoop
–table COMPANY.EMPLOYEES

13. How can you list all the columns of a table using Apache sqoop?

There is no straight way to list all the columns of a table in Apache Sqoop like sqoop-list-columns, so first we should retrieve the columns of the particular table and transform to a file containing the column names of particular table.Syntax is:

Sqoop import –m1 –connect ‘jdbc:sqlserver://servername;database=databasename;
Username-DeZyre;password=mypassword’ –query “SELECT column_name,DATA_TYPE FROM INFORMATION_SCHEMA columns WHEREtable_name=’mytableofinterest’ AND $CONDITIONS” –target-dir ‘mytableofinterest_column_name’.

14. How to create a table in Mysql and how to insert the values into the table ?

To create a table in mysql using the below command

mysql> create table tablename( col1 datatype, col2 datatype,…………);

Example –

mysql> create table INTELLIPAAT(emp_idint,emp_namevarchar(30),emp_salint);

Insert the values into the table

mysql> insert into table name(value1,value2,value3,………);

Example-

mysql> insert into INTELLIPAAT(1234,’aaa’,20000);
mysql> insert into INTELLIPAAT(1235,’bbb’,10000);
mysql>   insert into INTELLIPAAT(1236,’ccc’,15000);

Become a Big Data Architect

15. What are the basic commands in Hadoop Sqoop and its uses?

The basic commands of HadoopSqoop are

  • Codegen, Create-hive-table, Eval, Export, Help, Import, Import-all-tables, List-databases, List-tables,Versions.
  • Useof HadoopSqoop basic commands
  • Codegen- It helps to generate code to interact with database records.
  • Create-hive-table- It helps to Import a table definition into a hive
  • Eval- It helps to evaluateSQL statement and display the results
  • Export-It helps to export an HDFS directory into a database table
  • Help- It helps to list the available commands
  • Import- It helps to import a table from a database to HDFS
  • Import-all-tables- It helps to import tables from a database to HDFS
  • List-databases- It helps to list available databases on a server
  • List-tables-It helps to list tables in a database
  • Version-It helps to display the version information

16. Is sqoop same as to distcp in hadoop?

No. Because the only distcp import command is same as Sqoop import command and both the commands submit parallel map-only jobs but both command functions are different. Distcp is used to copy any type of files from Local filesystem to HDFS and Sqoop is used for transferring the data records between RDBMS and Hadoop eco- system service.

17. For each sqoop copying into HDFS how many MapReduce jobs and tasks will be submitted?

There are 4 jobs that will be submitted to each Sqoop copying into HDFS and no reduce tasks are scheduled.

18. How can Sqoop be used in Java programs?

In the Java code Sqoop jar is included in the classpath. The required parameters are created to Sqoop programmatically like for CLI (command line interface). Sqoop.runTool() method also invoked in Java code.

19. I am having around 500 tables in a database. I want to import all the tables from the database except the tables named Table 498, Table 323, and Table 199. How can we do this without having to import the tables one by one?

This can be proficient using the import-all-tables, import command in Sqoop and by specifying the exclude-tables option with it as follows-
sqoop import-all-tables
–connect –username –password –exclude-tables Table498, Table 323, Table 199

20. Explain the significance of using –split-by clause in Apache Sqoop?

split-by is a clause, it is used to specify the columns of the table which are helping to generate splits for data imports during importing the data into the Hadoop cluster. This clause specifies the columns and helps to improve the performance via greater parallelism. And also it helps to specify the column that has an even distribution of data to create splits,that data is imported. 

Course Schedule

Name Date Details
Big Data Course 23 Mar 2024(Sat-Sun) Weekend Batch
View Details
Big Data Course 30 Mar 2024(Sat-Sun) Weekend Batch
View Details
Big Data Course 06 Apr 2024(Sat-Sun) Weekend Batch
View Details