Apache Hive

Pig and Hive are open source platforms used largely for the same purpose: they ease the complexity of writing difficult Java-based MapReduce programs. Hive is a data warehouse that uses MapReduce to analyze data stored on HDFS. It provides a query language called HiveQL that is similar to standard Structured Query Language (SQL). Hive was created at Facebook so that analysts with strong SQL skills but few Java programming skills could run queries on the large volumes of data that Facebook stored in HDFS. Apache Pig and Hive are two projects that sit in the top layer of the Hadoop stack and provide a higher-level language for using Hadoop's MapReduce library.

Why Hive?

Hive provides a query language based on standard SQL, so users do not have to develop map and reduce tasks by hand. Hive takes HiveQL statements and automatically transforms each query into one or more MapReduce jobs. It then runs the MapReduce program and returns the output to the user. Whereas Hadoop Streaming shortens the mandatory code, compile, and submit cycle, Hive removes it completely and requires only the composition of HiveQL statements.
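
As a rough illustration of this translation, prefixing a query with EXPLAIN asks Hive to print the plan it would run instead of executing it. The employee table is the one queried elsewhere in this article; the dept column is an assumption made for the example.

```sql
-- Show the execution plan for an aggregation without running it.
-- Assumes a table "employee" with a (hypothetical) column "dept".
EXPLAIN
SELECT dept, COUNT(*) AS headcount
FROM employee
GROUP BY dept;
```

On a MapReduce-backed Hive, the plan typically lists a map/reduce stage for the GROUP BY. The exact output varies with the Hive version and execution engine.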

This interface to Hadoop not only shortens the time required to produce results from data analysis but also significantly widens the range of people for whom Hadoop and MapReduce are useful.

What makes Hive popular on Hadoop?

  • Users are provided with powerful statistics functions.
  • It is similar to SQL, so its concepts are easy to understand.
  • It can be combined with HBase to query data stored in HBase. Pig does not offer this kind of querying; its HBaseStorage() function is mainly used for loading data from HBase.
  • It is supported by Hue.
  • It is used by various groups such as CNET, Last.fm, Facebook, and Digg.
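
The HBase integration mentioned in the list can be sketched roughly as follows. This is a minimal example, assuming the Hive HBase storage handler is available on the classpath and that an HBase table named employee with a column family info already exists; the table and column names are hypothetical.

```sql
-- Map a Hive table onto an existing HBase table (all names are hypothetical).
-- ":key" binds emp_id to the HBase row key; "info:name" binds name to
-- the "name" qualifier in the "info" column family.
CREATE EXTERNAL TABLE hbase_employee (
  emp_id STRING,
  name   STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name")
TBLPROPERTIES ("hbase.table.name" = "employee");

-- Query the HBase data with ordinary HiveQL.
SELECT name FROM hbase_employee WHERE emp_id = '1001';
```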

Difference between Hive and Pig

Hive                                    Pig
Used for data analysis                  Used for data and programs
Works with structured data              Works with semi-structured data
Uses HiveQL                             Uses Pig Latin
Used for creating reports               Used for programming
Works on the server side                Works on the client side
Reads Avro through the AvroSerDe        Reads Avro through AvroStorage

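HiveQL statements look much like SQL. For example, the following commands select all rows from an employee table and then describe its columns:

hive> SELECT * FROM employee;
hive> DESCRIBE employee;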

  • Apache Hive is data warehouse software that lets you read, write, and manage large datasets stored in a distributed environment using SQL. Structure can be projected onto data that is already in storage. Users can connect to Hive using a JDBC driver or a command-line tool.
  • Hive is an open source system. Use Hive for querying and analyzing large datasets stored in Hadoop files. Its language is similar to SQL. At the time of writing, the current version of Hive is 0.13.1.
  • Hive supports ACID transactions: Atomicity, Consistency, Isolation, and Durability. ACID semantics are provided at the row level for Insert, Delete, and Update operations.
  • Hive is not a complete database. The design constraints of Hadoop and HDFS restrict what Hive can do.
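
The row-level operations mentioned in the list can be sketched as below. This is only an illustration under several assumptions: a Hive release that accepts INSERT ... VALUES, UPDATE, and DELETE (0.14 or later, which is newer than the version named above), a metastore already configured for transactions, and a hypothetical table employee_txn. Depending on the release, further settings may be required.

```sql
-- Session settings commonly required for transactional tables.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Transactional tables are bucketed, stored as ORC, and flagged as transactional.
-- The table and its columns are hypothetical.
CREATE TABLE employee_txn (
  id     INT,
  name   STRING,
  salary DOUBLE
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Row-level insert, update, and delete.
INSERT INTO TABLE employee_txn VALUES (1, 'Asha', 50000.0);
UPDATE employee_txn SET salary = 55000.0 WHERE id = 1;
DELETE FROM employee_txn WHERE id = 1;
```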

Hive is most suitable for the following data warehouse applications:

  • Analyzing relatively static data
  • Workloads where fast response times are not required
  • Datasets that do not change rapidly

Hive does not provide the fundamental features required for OLTP (Online Transaction Processing). It is best suited to data warehouse applications over large data sets.

The two types of tables in Hive

  1. Managed table
  2. External table
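
A minimal sketch of the two table types follows. The table names, columns, and the HDFS path are hypothetical. The practical difference is who owns the data: dropping a managed table removes both its metadata and its files, while dropping an external table removes only the metadata.

```sql
-- Managed table: Hive owns the data under its warehouse directory.
CREATE TABLE employee_managed (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table: Hive only records metadata about files that live elsewhere.
-- The LOCATION below is a hypothetical HDFS directory.
CREATE EXTERNAL TABLE employee_external (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/employee';
```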

We can change settings within a Hive session using the SET command. It adjusts Hive job settings for the queries that follow in that session.

Example: the following command ensures that buckets are populated according to the table definition.

hive> SET hive.enforce.bucketing=true;
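
To make the effect of this setting concrete, here is a rough sketch of a bucketed table and an insert into it. The table employee_bucketed is hypothetical, and the example assumes that the employee table has id and name columns.

```sql
-- Declare a table whose rows are hashed on id into 4 buckets.
CREATE TABLE employee_bucketed (
  id   INT,
  name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- With hive.enforce.bucketing=true, Hive sizes the job so that this
-- insert produces one output file per declared bucket.
INSERT OVERWRITE TABLE employee_bucketed
SELECT id, name FROM employee;
```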

We can see the current value of any property by using SET with the property name. Running SET on its own lists all the properties and their values as set by Hive.

hive> SET hive.enforce.bucketing;
hive.enforce.bucketing=true

That list does not include the Hadoop defaults. To see those as well, use:

hive> SET -v;

It lists all the properties in the system, including the Hadoop defaults.