Top Hive Interview Questions – Most Asked
Top Hive Interview Questions – Most Asked
Top Answers to Hive Interview Questions
|Architecture||Procedural data flow language||SQL type declarative language|
|Application||Programming purposes||Report creation|
|Operational field||Client side||Server side|
|Support for avro files||Yes||No|
Get a better understanding of Hive by going through this Hive Tutorial now.
• Analyzing the relatively static data
• Less Responsive time
• No rapid changes in data.
Hive doesn’t provide fundamental features required for OLTP, Online Transaction Processing.Hive is suitable for data warehouse applications in large data sets.
Two types of tables in Hive
1. Managed table
2. External table Find out more about Hive in this blog post.
Example: The following commands shows buckets are occupied according to the table definition.
hive> SET hive.enforce.bucketing=true;
We can see the current value of any property by using SET with the property name. SET will list all the properties with their values set by Hive.
hive> SET hive.enforce.bucketing;
And this list will not include defaults of Hadoop. So we should use the below like
It will list all the properties including the Hadoop defaults in the system.Interested in learning Hive? Well, we have the comprehensive Hive Training Course to give you a head start in your career.
1)Take a new system create a new username and password
2) Install the SSH and with master node setup ssh connections
3) Add ssh public_rsa id key to the authorized keys file
4) Add the new data node host name, IP address and other details in /etc/hosts slaves file
192.168.1.102 slave3.in slave3
5) Start the Data Node on New Node
6) Login to the new node like suhadoop or ssh -X email@example.com
7) Start HDFS of a newly added slave node by using the following command
./bin/hadoop-daemon.sh start data node
8) Check the output of jps command on a new node
‘N’ number of strings separated by a comma.
CONCAT ('Intellipaat','-','is','-','a','-','eLearning',’-’,’provider’); Output: Intellipaat-is-a-eLearning-provider
So, every time we set the limits of the strings by ‘-‘. If it is common for every strings, then Hive provides another command
CONCAT_WS. In this case,we have to specify the set limits of operator first. CONCAT_WS ('-',’Intellipaat’,’is’,’a’,’eLearning’,‘provider’); Output: Intellipaat-is-a-eLearning-provider.
Learn more about the Apache Hive features, architecture, and HiveQL, in this article now.
TRIM(‘ INTELLIPAAT ‘); Output: INTELLIPAAT To remove the Leading space LTRIM(‘ INTELLIPAAT’); To remove the trailing space RTRIM(‘INTELLIPAAT ‘); In Reverse function, characters are reversed in the string. Example: REVERSE(‘INTELLIPAAT’); Output: TAAPILLETNI
The syntax is :
ALTER TABLE table_name CHANGE column_namecolumn_namenew_datatype;
Example: If we want to change the data type of the salary column from integer to bigint in the employee table.
ALTER TABLE employee CHANGE salary salary BIGINT;RLIKE: Its full form is Right-Like and it is a special function in the Hive. It helps to examine the two substrings. i.e, if the substring of A matches with B then it evaluates to true.
‘Intellipaat’ RLIKE ‘tell’ True ‘Intellipaat’ RLIKE ‘^I.*’ True (this is a regular expression)
Learn what is Hadoop Hive in this detailed blog post now.
By using below commands we can access sub directories recursively in Hive
hive> Set mapred.input.dir.recursive=true; hive> Set hive.mapred.supports.subdirectories=true;
Hive tables can be pointed to the higher level directory and this is suitable for the directory structure which is like /data/country/state/city/
In the above three lines of headers that we do not want to include in our Hive query. To skip header lines from our tables in the Hive,set a table property that will allow us to skip the header lines.
CREATE EXTERNAL TABLE employee ( name STRING, job STRING, dob STRING, id INT, salary INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ‘ STORED AS TEXTFILE LOCATION ‘/user/data’ TBLPROPERTIES("skip.header.line.count"="2”);
Hive supports the text file format by default and it supports the binary format Sequence files, ORC files, Avro Data files, Parquet files.
Sequence files: Splittable, compressible and row oriented are the general binary format.
ORC files: Full form of ORC is optimized row columnar format files. It is a Record columnar file and column oriented storage file. It divides the table in row split. In each split stores that value of the first row in the first column and followed sub subsequently.
AVRO data files: It is same as a sequence file splittable, compressible and row oriented, but except the support of schema evolution and multilingual binding support.Learn more about Hive in this Training Course to get ahead in your career!
1. SET Command in HIVE
2. The command line –hiveconf option
By default its value is minimal. Which optimize- SELECT *, FILTER on partition columns, LIMIT queries only, Where another value is more which optimize- SELECT, FILTER, LIMIT.
We can store the hive data in highly efficient manner in the Optimized Row Columnar file format. It can simplify many Hive file format limitations. We can improve the performance by using ORC files while reading, writing and processing the data.
Set hive.compute.query.using.stats-true; Set hive.stats.dbclass-fs; CREATE TABLE orc_table ( idint, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘\:’ LINES TERMINATED BY ‘\n’ STORES AS ORC;
Need a reason to learn Apache Hadoop and Hive? Well, go through this blog post to find out why Hadoop is the new black.
Instance of Java class
A standard Java object
A lazily initialized object
The Object-Inspector tells structure of the object and also ways to access the internal fields inside the object.
Currently used SerDe classes in the Hive are
Meta data Typed Column setSerDe
2. JDBC Driver: JDBC Driver supports the Type 4 (pure Java) JDBC Driver
3. ODBC Driver: ODBC Driver supports the ODBC protocol.
Example: The RegexDeserializer will deserialize the data using the configuration parameter ‘regex’, and a list of column names.
If our SerDe supports DDL, we probably want to implement a protocol based on DynamicSerDe. It’s non-trivial to write a “thrift DDL” parser.
Three collection data types in Hive
• STRUCTGo through this IBM article and learn how Hive works perfectly with Big SQL
We can execute Hive queries from the script files by using the source command.
Hive> source /path/to/file/file_with_query.hql
Take charge of your career by going through this professionally designed Hive Training Course.