Apache Pig

 

Pig raises the level of abstraction for processing large datasets. It is a platform for analyzing large data sets, built around a high-level language for expressing data analysis programs. Pig is open source and was originally developed at Yahoo.

Advantages of Pig

  • Code reuse
  • Faster development
  • Fewer lines of code
  • Schema and type checking

Pig is made up of two pieces:

  • The language used to express data flows, known as Pig Latin.
  • The execution environment that runs Pig Latin programs. There are currently two environments: local execution in a single JVM and distributed execution on a Hadoop cluster.

A Pig Latin program is a series of operations, or transformations, that are applied to the input data to produce output. Taken together, the operations describe a data flow, which the Pig execution environment translates into an executable representation and then runs.


 

What makes Pig popular?

  • It is easy to learn, read and write if you know SQL.
  • It uses a multi-query approach: a whole script of statements is processed as one data flow, which keeps programs short.
  • It provides nested data types such as maps, tuples and bags, which MapReduce does not offer out of the box, along with data operations such as filtering, ordering and joins.
  • It has a broad user base. For instance, Yahoo reportedly runs about 90% of its MapReduce jobs through Pig and Twitter about 80%, and companies such as Salesforce, LinkedIn and Nokia also use Pig.
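The nested types and operations mentioned in the list can be sketched in a few lines of Pig Latin. This is a minimal illustration, not a recommended layout: the file names (students.txt, clubs.txt), the field names and the sample record are all assumptions made up for this example.

```pig
-- Hypothetical input: one student per line, tab-separated, with the nested
-- fields written in Pig's text notation, for example:
--   alice	{(81),(94)}	[city#Pune]
Students = LOAD 'students.txt'
           AS (name:chararray, scores:bag{t:tuple(score:int)}, info:map[]);

-- Filter: keep only students whose info map has a 'city' entry.
WithCity = FILTER Students BY info#'city' IS NOT NULL;

-- Work on the nested bag: compute one average score per student.
Averages = FOREACH WithCity GENERATE name, AVG(scores.score) AS avg_score;

-- Join with a second (also hypothetical) relation on the name field.
Clubs  = LOAD 'clubs.txt' AS (name:chararray, club:chararray);
Joined = JOIN Averages BY name, Clubs BY name;

-- Ordering: highest average first. After a join, fields can be
-- disambiguated with the relation::field notation.
Ranked = ORDER Joined BY Averages::avg_score DESC;

DUMP Ranked;
```

Here a bag is a collection of tuples, a tuple is an ordered set of fields, and a map is a set of key/value pairs looked up with the # operator.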

 

Running Pig Programs

There are three ways to execute Pig programs, all of which work in both local and MapReduce mode:

  • Script 

Pig can run a script file that contains Pig commands. For example, the command pig script.pig runs the commands in the local file script.pig. Alternatively, for very short scripts, you can use the -e option to run a script specified as a string on the command line.

  • Grunt

Grunt is an interactive shell for running Pig commands. Grunt is started when no file is specified for Pig to run, and the -e option is not used. It is also possible to run Pig scripts from within Grunt using run and exec.
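A short Grunt session might look like the sketch below. It assumes Grunt was started by running pig with no script argument, and that a file named script.pig exists in the working directory; the input path is borrowed from the word count example and is only a placeholder.

```pig
grunt> Lines = LOAD 'input/hadoop.log' AS (line:chararray);
grunt> DESCRIBE Lines;
grunt> exec script.pig
grunt> run script.pig
grunt> quit
```

The two script commands differ in scope: exec runs the script in batch mode, separately from the interactive session, so aliases defined in the script are not visible afterwards, whereas run executes the script as if its statements had been typed into Grunt, so its aliases remain available in the shell.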

  • Embedded

You can run Pig programs from Java using the PigServer class, much as you can use JDBC to run SQL programs from Java.

 

Example: Word count in Pig

Lines = LOAD 'input/hadoop.log' AS (line:chararray);

Words = FOREACH Lines GENERATE FLATTEN(TOKENIZE(line)) AS word;  -- one tuple per word

Groups = GROUP Words BY word;  -- one group per distinct word

Counts = FOREACH Groups GENERATE group AS word, COUNT(Words) AS total;

Results = ORDER Counts BY total DESC;  -- most frequent words first

Top5 = LIMIT Results 5;

STORE Top5 INTO '/output/top5words';
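To make the data flow concrete, the sketch below traces the relations of the word count example for a tiny, made-up input. The two input lines are an assumption for illustration only (they are not taken from a real hadoop.log), and the contents of each relation are written as comments. The order of tuples in the unsorted relations is not guaranteed.

```pig
-- Assumed contents of input/hadoop.log (two lines):
--   pig hadoop pig
--   hadoop pig
--
-- Lines   : (pig hadoop pig), (hadoop pig)
-- Words   : (pig), (hadoop), (pig), (hadoop), (pig)
-- Groups  : (hadoop,{(hadoop),(hadoop)}), (pig,{(pig),(pig),(pig)})
-- Counts  : (hadoop,2), (pig,3)
-- Results : (pig,3), (hadoop,2)     sorted by count, highest first
-- Top5    : same as Results here, since there are fewer than five words

-- While developing, DESCRIBE prints a relation's schema and DUMP prints
-- its contents to the console instead of storing them.
DESCRIBE Counts;
DUMP Top5;
```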

