Top Answers to Sqoop Interview Questions
|Application||Importing data from RDBMS||Moving bulk streaming data into HDFS|
|Architecture||Connector – connecting to respective data||Agent – fetching of the right data|
|Loading of data||Event driven||Not event driven|
Import control commands are used to import RDBMS data
Append: Append data to an existing dataset in HDFS. –append
Columns: columns to import from the table. –columns
<col,col……> • Where: where clause to use during import. —
where The common large objects are Blog and Clob.Suppose the object is less than 16 MB, it is stored inline with the rest of the data. If there are big objects, they are temporarily stored in a subdirectory with the name _lob. Those data are then materialized in memory for processing. If we set lob limit as ZERO (0) then it is stored in external memory.
Sqoop allows to Export and Import the data from the data table based on the where clause. The syntax is
--columns <col1,col2……> --where --query
sqoop import –connect jdbc:mysql://db.one.com/corp --table INTELLIPAAT_EMP --where “start_date> ’2016-07-20’ ” sqoopeval --connect jdbc:mysql://db.test.com/corp --query “SELECT * FROM intellipaat_emp LIMIT 20” sqoop import –connect jdbc:mysql://localhost/database --username root --password aaaaa –columns “name,emp_id,jobtitle”
Sqoop supports data imported into following services:
Learn about the complete Hadoop ecosystem in this blog post.
Learn Hadoop in 85 hrs from experts
Sqoop needs a connector to connect the different relational databases. Almost all Database vendors make a JDBC connector available specific to that Database, Sqoop needs a JDBC driver of the database for interaction.
No, Sqoop needs JDBC and a connector to connect a database.
Interested in learning Sqoop? Well, we have the comprehensive Training Course to give you a head start in your career.
We can control the number of mappers by executing the parameter –num-mapers in sqoop command. The –num-mappers arguments control the number of map tasks, which is the degree of parallelism used. Start with a small number of map tasks, then choose a high number of mappers starting the performance may down on the database side.
Syntax: -m, –num-mappers
By using the parameter – update-key we can update existing rows. Comma-separated list of columns is used which uniquely identifies a row. All of these columns are used in the WHERE clause generated UPDATE query. All other table columns will be used in the SET part of the query.
The command below is used to show all the databases in MySQL server.
$ sqoop list –databases –connect jdbc:mysql://database.test.com/
Sqoop meta store is a tool for using hosts in a shared metadata repository. Multiple users and remote users can define and execute saved jobs defined in metastore. End users configured to connect the metastore in sqoop-site.xml or with the
The purpose of sqoop-merge is:
This tool combines 2 datasets where entries in one dataset overwrite entries of an older dataset preserving only the new version of the records between both the data sets.
Sqoop allows us to define saved jobs which make this process simple. A saved job records the configuration information required to execute a Sqoop command at a later time. sqoop-job tool describes how to create and work with saved jobs. Job descriptions are saved to a private repository stored in $HOME/.sqoop/.
We can configure Sqoop to instead use a shared metastore, which makes saved jobs offered to multiple users across a shared cluster. Starting the metastore is covered by the section on the sqoop-metastore tool.
Sqoop word came from SQL+HADOOP=SQOOP. And Sqoop is a data transfer tool.
The main use of Sqoop is to import and export the large amount of data from RDBMS to HDFS and vice versa.
The command for entering into Mysql prompt is “mysql –u root –p”
-u indicatesthe user
Root indicates username
-p indicates password.
This will happen when there is lack of permissions to access our Mysql database over the network. We can try the below command to confirm the connect to Mysql database from aSqoop client machine.
$ mysql –host=MySqlnode> –database=test –user= –password=
We can grant the permissions with below commands.
mysql> GRANT ALL PRIVILEGES ON *.* TO ‘%’@’localhost’; mysql> GRANT ALL PRIVILEGES ON *.* TO ‘ ’@’localhost’;
Give your career a big boost by going through our Apache Sqoop Training Course now!
Sqoop commands are case- sensitive of table names and user names.
By specifying the above two values in UPPER case, it will resolve the issue.
In case, the source table is created under different user namespace,then table name should be like USERNAME.TABLENAME as shown below
There is no straight way to list all the columns of a table in Apache Sqoop like sqoop-list-columns, so first we should retrieve the columns of the particular table and transform to a file containing the column names of particular table.Syntax is:
Sqoop import –m1 –connect ‘jdbc:sqlserver://servername;database=databasename; Username-DeZyre;password=mypassword’ –query “SELECT column_name,DATA_TYPE FROM INFORMATION_SCHEMA columns WHEREtable_name=’mytableofinterest’ AND \$CONDITIONS” –target-dir ‘mytableofinterest_column_name’.
To create a table in mysql using the below command
mysql> create table tablename( col1 datatype, col2 datatype,…………);
mysql> create table INTELLIPAAT(emp_idint,emp_namevarchar(30),emp_salint);
Insert the values into the table
mysql> insert into table name(value1,value2,value3,………);
mysql> insert into INTELLIPAAT(1234,’aaa’,20000); mysql> insert into INTELLIPAAT(1235,’bbb’,10000); mysql> insert into INTELLIPAAT(1236,’ccc’,15000);
The basic commands of HadoopSqoop are
- Codegen, Create-hive-table, Eval, Export, Help, Import, Import-all-tables, List-databases, List-tables,Versions.
- Useof HadoopSqoop basic commands
- Codegen- It helps to generate code to interact with database records.
- Create-hive-table- It helps to Import a table definition into a hive
- Eval- It helps to evaluateSQL statement and display the results
- Export-It helps to export an HDFS directory into a database table
- Help- It helps to list the available commands
- Import- It helps to import a table from a database to HDFS
- Import-all-tables- It helps to import tables from a database to HDFS
- List-databases- It helps to list available databases on a server
- List-tables-It helps to list tables in a database
- Version-It helps to display the version information
No. Because the only distcp import command is same as Sqoop import command and both the commands submit parallel map-only jobs but both command functions are different. Distcp is used to copy any type of files from Local filesystem to HDFS and Sqoop is used for transferring the data records between RDBMS and Hadoop eco- system service.
There are 4 jobs that will be submitted to each Sqoop copying into HDFS and no reduce tasks are scheduled.
In the Java code Sqoop jar is included in the classpath. The required parameters are created to Sqoop programmatically like for CLI (command line interface). Sqoop.runTool() method also invoked in Java code.
This can be proficient using the import-all-tables, import command in Sqoop and by specifying the exclude-tables option with it as follows-
–connect –username –password –exclude-tables Table498, Table 323, Table 199
split-by is a clause, it is used to specify the columns of the table which are helping to generate splits for data imports during importing the data into the Hadoop cluster. This clause specifies the columns and helps to improve the performance via greater parallelism. And also it helps to specify the column that has an even distribution of data to create splits,that data is imported.
Take charge of your career by going through this professionally designed Apache Hadoop Developer Course.