Are you a developer looking for a high-level scripting language to work on Hadoop? If so, consider Apache Pig. This Pig cheat sheet is a handy reference for anyone who already knows a query language such as SQL and has started using Pig as a tool. If you are a beginner with no idea how Pig works, don't worry: it also gives you a quick reference to the basics you need to get started.
| Function type | Examples |
|---|---|
| EVAL functions | AVG, COUNT, COUNT_STAR, SUM, TOKENIZE, MAX, MIN, SIZE, etc. |
| LOAD or STORE functions | PigStorage, TextLoader, HBaseStorage, JsonLoader, JsonStorage, etc. |
| Math functions | ABS, COS, SIN, TAN, CEIL, FLOOR, ROUND, RANDOM, etc. |
| String functions | TRIM, RTRIM, SUBSTRING, LOWER, UPPER, etc. |
| DateTime functions | GetDay, GetHour, GetYear, ToUnixTime, ToString, etc. |
Syntax: A = LOAD 'Employee' USING PigStorage('\t') AS (name:chararray, age:int, gpa:float);
Loads and stores data as structured text files
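The PigStorage line above can be extended into a small load, filter, and store script. This is a minimal sketch: the 3.0 threshold and the 'output' directory are illustrative assumptions, not part of the original example.

```pig
-- Load a tab-delimited file and declare a schema for its three fields.
A = LOAD 'Employee' USING PigStorage('\t') AS (name:chararray, age:int, gpa:float);

-- Keep only the rows whose gpa is at least 3.0 (illustrative threshold).
B = FILTER A BY gpa >= 3.0F;

-- Write the result as comma-delimited text.
-- 'output' is a hypothetical directory and must not already exist.
STORE B INTO 'output' USING PigStorage(',');
```

The same function handles both directions: the delimiter passed to PigStorage on LOAD describes the input, and the one passed on STORE controls the output.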
Syntax: A = LOAD 'data' USING TextLoader();
Loads unstructured data in UTF-8 format
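Because TextLoader returns each input line as a single chararray field, it pairs naturally with TOKENIZE. The following word-count sketch assumes a plain-text input named 'data'; the alias and field names are illustrative.

```pig
-- Each line of the input becomes a tuple with one chararray field.
A = LOAD 'data' USING TextLoader() AS (line:chararray);

-- Split every line into words and flatten the resulting bag into rows.
B = FOREACH A GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words and count how often each one appears.
C = GROUP B BY word;
D = FOREACH C GENERATE group AS word, COUNT(B) AS occurrences;

DUMP D;
```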
Syntax: A = LOAD 'data' USING BinStorage();
Loads and stores data in machine-readable format
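Handling compression: Pig loads and stores compressed data through PigStorage and TextLoader, which support gzip and bzip compression, selected by the file extension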
Syntax: A = LOAD 'a.json' USING JsonLoader();
JsonLoader loads JSON data; the companion JsonStorage function stores it
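A JSON round trip might look like the sketch below. The schema string and the 'b.json' output name are assumptions made for illustration; when JsonLoader is called without a schema, it expects a .pig_schema file (as written by JsonStorage) in the input directory.

```pig
-- Load JSON records, passing the schema to JsonLoader as a string.
-- The field names and types here are hypothetical.
A = LOAD 'a.json' USING JsonLoader('name:chararray, age:int');

-- Store the relation back as JSON; JsonStorage also records the schema.
-- 'b.json' is a hypothetical output location.
STORE A INTO 'b.json' USING JsonStorage();
```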
Syntax: STORE X INTO 'output' USING PigDump();
Stores data in UTF-8 format
Syntax: ABS (expression)
It returns the absolute value of an expression.
Syntax: COS (expression)
It returns the trigonometric cosine of an expression.
Syntax: SIN (expression)
It returns the sine of an expression.
Syntax: CEIL (expression)
It returns the value of an expression rounded up to the nearest integer.
Syntax: TAN (expression)
It returns the trigonometric tangent of an angle.
Syntax: ROUND (expression)
It returns the value of an expression rounded to an integer (if the result type is float) or long (if the result type is double).
Syntax: RANDOM ()
It returns a pseudo-random number (type double) greater than or equal to 0.0 and less than 1.0.
Syntax: FLOOR (expression)
It returns the value of an expression rounded down to the nearest integer.
Syntax: CBRT (expression)
It returns the cube root of an expression.
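The math functions are normally applied per row inside a FOREACH ... GENERATE statement. A minimal sketch, assuming a hypothetical input file 'numbers' with one double value per line:

```pig
-- 'numbers' is a hypothetical file with a single numeric column.
A = LOAD 'numbers' AS (x:double);

-- Apply several math functions to the same field and name each result.
B = FOREACH A GENERATE
        x,
        ABS(x)   AS abs_x,
        CEIL(x)  AS ceil_x,
        FLOOR(x) AS floor_x,
        ROUND(x) AS round_x,
        CBRT(x)  AS cbrt_x;

DUMP B;
```

Since x is declared as a double here, ROUND(x) yields a long, in line with the description above.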
Syntax: INDEXOF (string, 'character', startIndex)
It returns the index of the first occurrence of a character in a string, searching forward from startIndex
Syntax: LAST_INDEX_OF (string, 'character')
It returns the index of the last occurrence of a character in a string, searching backward from the end of the string
Syntax: TRIM (expression)
It returns a copy of the string with leading and trailing whitespace removed
Syntax: SUBSTRING (string, startIndex, stopIndex)
It returns the substring that starts at startIndex and ends just before stopIndex (indexes start at 0)
Syntax: UCFIRST (expression)
It returns a string with the first character changed to upper case
Syntax: LOWER (expression)
Converts all characters in a string to lowercase
Syntax: UPPER (expression)
Converts all characters in a string to uppercase
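The string functions can be combined in the same way. The sketch below assumes a hypothetical input file 'names' with one string per line; the searched character and the index values are arbitrary choices for illustration.

```pig
-- 'names' is a hypothetical file with one string per line.
A = LOAD 'names' AS (name:chararray);

-- Strip surrounding whitespace first so later results are not skewed.
B = FOREACH A GENERATE TRIM(name) AS clean;

C = FOREACH B GENERATE
        clean,
        UPPER(clean)              AS upper_name,
        LOWER(clean)              AS lower_name,
        UCFIRST(clean)            AS capitalized,
        SUBSTRING(clean, 0, 3)    AS first_three,  -- characters at index 0, 1, 2
        INDEXOF(clean, 'a', 0)    AS first_a,      -- search forward from index 0
        LAST_INDEX_OF(clean, 'a') AS last_a;       -- search backward from the end

DUMP C;
```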
| Function | Syntax | Description |
|---|---|---|
| TOTUPLE | TOTUPLE(expression [, expression …]) | Converts one or more expressions to the type Tuple |
| TOBAG | TOBAG(expression [, expression …]) | Converts one or more expressions to individual tuples, which are then placed in a bag |
| TOMAP | TOMAP(key-expression, value-expression [, key-expression, value-expression …]) | Converts key/value expression pairs to a Map |
| TOP | TOP(topN,column,relation) | Returns the top-n tuples from a bag of tuples |
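As a rough illustration of these four functions, the sketch below reuses the Employee schema from the PigStorage example. The choice of the top 3 rows and of column index 2 (gpa, counting from 0) is an assumption made for the example.

```pig
A = LOAD 'Employee' USING PigStorage('\t') AS (name:chararray, age:int, gpa:float);

-- Build a tuple, a bag, and a map from the fields of each row.
-- Map keys must be chararray, so name is used as the key.
B = FOREACH A GENERATE
        TOTUPLE(name, age) AS name_age,
        TOBAG(name, gpa)   AS name_gpa_bag,
        TOMAP(name, gpa)   AS gpa_by_name;

-- TOP works on a bag, so group the relation first.
C = GROUP A ALL;

-- Return the 3 tuples with the highest value in column 2 (gpa).
D = FOREACH C GENERATE TOP(3, 2, A);

DUMP D;
```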
We have covered all the basics of Pig Built-in Functions in this cheat sheet. If you want to start learning Pig Built-in Functions in depth then check out the Hadoop Administrator Online Training and Certification by Intellipaat.
Not only will you learn and implement Pig Built-in Functions with step-by-step guidance and support from us, but you will also get 24/7 technical support for all your queries from experts in the respective technologies at Intellipaat throughout the certification period. So, why wait? Check out the training program and enroll today!