Pig and Hive are open source platforms used for largely the same purpose: both ease the complexity of writing difficult Java-based MapReduce programs. Hive is a data warehouse that uses MapReduce to analyze data stored on HDFS. It provides a query language called HiveQL that is similar to standard Structured Query Language (SQL). Hive was developed at Facebook so that analysts with strong SQL skills but few Java programming skills could run queries on the large volumes of data that Facebook stored in HDFS. Apache Pig and Hive are two projects that sit on top of Hadoop and provide a higher-level language for using Hadoop's MapReduce library.
Hive offers a query language based on standard SQL instead of requiring you to develop map and reduce tasks by hand. Hive takes HiveQL statements and automatically transforms each query into one or more MapReduce jobs. It then runs the overall MapReduce program and returns the output to the user. Whereas Hadoop Streaming shortens the mandatory code, compile, and submit cycle, Hive removes it completely and requires only the composition of HiveQL statements.
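As a rough illustration of this translation step, the sketch below asks Hive for the execution plan of a simple aggregation and then runs it. The `employee` table is the one queried later in this article, while the `dept` column is a hypothetical name assumed only for this example. On a MapReduce-backed installation the plan would typically list one or more map-reduce stages; the exact output depends on the Hive version and execution engine.

```sql
-- Assumption: employee has a column named dept (hypothetical).
-- EXPLAIN prints the plan Hive generated instead of running the query.
EXPLAIN
SELECT dept, COUNT(*) AS headcount
FROM employee
GROUP BY dept;

-- Running the same statement without EXPLAIN submits the generated job(s)
-- and returns the result rows to the user.
SELECT dept, COUNT(*) AS headcount
FROM employee
GROUP BY dept;
```

No Java code is written, compiled, or submitted by hand here; the HiveQL statement is the whole program.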
This interface to Hadoop not only shortens the time required to produce results from data analysis but also significantly broadens the set of people for whom Hadoop and MapReduce are useful.
|Hive||Pig|
|Used for data analysis||Used for data and programs|
|Used for structured data||Used for semi-structured data|
|Hive has HiveQL||Pig has Pig Latin|
|Hive is used for creating reports||Pig is used for programming|
|Hive works on the server side||Pig works on the client side|
|Hive did not support Avro in early versions (later versions add it through the AvroSerDe)||Pig supports Avro|
hive> select * from employee;
hive> describe employee;
Hive is most suitable for the following data warehouse applications
Hive doesn't provide the fundamental features required for OLTP (Online Transaction Processing). It is best suited to data warehouse applications over large data sets.
The two types of tables in Hive are managed (internal) tables and external tables.
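Hive tables are commonly described as either managed (internal) or external. The sketch below shows how the two kinds might be declared. All table names, columns, the delimiter, and the HDFS path are hypothetical and chosen only for illustration.

```sql
-- Managed table: Hive owns both the metadata and the data files.
CREATE TABLE employee_managed (
  id     INT,
  name   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- External table: Hive tracks only the metadata; the files stay at LOCATION.
CREATE EXTERNAL TABLE employee_external (
  id     INT,
  name   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/employee';  -- hypothetical HDFS directory

-- DROP TABLE employee_managed;   -- would also delete the underlying data
-- DROP TABLE employee_external;  -- would remove only the table definition
```

The practical difference shows up on `DROP TABLE`: an external table is the safer choice when other tools also read the same files.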
We can change settings within a Hive session using the SET command. It is used to change Hive and job settings for a particular query so that the query behaves as intended.
Example: The following command makes Hive populate buckets according to the table definition.
hive> SET hive.enforce.bucketing=true;
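To show where this setting matters, here is a minimal sketch of a bucketed table being populated. The table `employee_bucketed`, the bucket count, and the `id` and `name` columns of `employee` are assumptions made for the example. As far as I know, newer Hive releases (2.0 and later) always enforce bucketing, so the SET line is relevant mainly on older versions.

```sql
-- Hypothetical bucketed table: rows are hashed on id into 4 buckets.
CREATE TABLE employee_bucketed (
  id   INT,
  name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- Ask Hive to honor the bucket definition when writing.
SET hive.enforce.bucketing=true;

-- Populate the table; with enforcement on, Hive plans the write
-- so that the output matches the declared number of buckets.
INSERT OVERWRITE TABLE employee_bucketed
SELECT id, name
FROM employee;
```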
We can see the current value of any property by using SET with just the property name. SET on its own lists all the properties, with their values, that are set by Hive.
hive> SET hive.enforce.bucketing;
hive.enforce.bucketing=true
The list produced by SET on its own does not include the Hadoop defaults. To include them, use the following:
hive> SET -v;
It lists all the properties in the system, including the Hadoop defaults.