What is Apache Pig
Apache Pig is a high-level data flow platform for executing MapReduce
programs on Hadoop. The language used by Pig is called Pig Latin.
Pig scripts are internally converted to MapReduce jobs and executed
on data stored in HDFS. Apart from MapReduce, Pig can also run its
jobs on Apache Tez or Apache Spark.
Pig can handle any type of data, i.e., structured, semi-structured, or
unstructured, and stores the corresponding results in the Hadoop
Distributed File System (HDFS). Every task that can be achieved using Pig
can also be achieved by writing MapReduce programs in Java.
Features of Apache Pig
Let's see the various features of Pig technology.
1) Ease of programming
Writing complex Java programs for MapReduce is quite tough for non-
programmers. Pig makes this process easy: in Pig, the queries are
converted to MapReduce jobs internally.
2) Optimization opportunities
The way tasks are encoded permits the system to optimize their execution
automatically, allowing the user to focus on semantics rather than
efficiency.
3) Extensibility
Users can write user-defined functions (UDFs) containing their own logic
to execute over the data set.
4) Flexible
It can easily handle structured as well as unstructured data.
5) In-built operators
It provides various types of operators such as sort, filter, and join.
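As a sketch, the built-in operators can be used like this in Pig Latin (the file paths, relation names, and schemas below are hypothetical):

```pig
-- load a hypothetical employee file (path and schema are assumptions)
emps = LOAD '/data/employees.txt' USING PigStorage(',')
       AS (id:int, name:chararray, dept:chararray, salary:double);

-- FILTER: keep only employees earning above a threshold
rich = FILTER emps BY salary > 50000.0;

-- ORDER: sort them by salary, highest first
sorted = ORDER rich BY salary DESC;

-- JOIN: combine with a hypothetical departments relation
depts = LOAD '/data/departments.txt' USING PigStorage(',')
        AS (dept:chararray, location:chararray);
joined = JOIN sorted BY dept, depts BY dept;
```

Each operator takes one or more relations as input and produces a new relation, which is what allows Pig to chain them into a data flow.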
Differences between Apache MapReduce and PIG
Apache MapReduce:
o It is a low-level data processing tool.
o Here, it is required to develop complex programs using Java or Python.
o It is difficult to perform data operations in MapReduce.
o It doesn't allow nested data types.
Apache Pig:
o It is a high-level data flow tool.
o It is not required to develop complex programs.
o It provides built-in operators to perform data operations like union,
sorting, and ordering.
o It provides nested data types like tuple, bag, and map.
Advantages of Apache Pig
o Less code - Pig requires fewer lines of code to perform any
operation.
o Reusability - Pig code is flexible enough to be reused.
o Nested data types - Pig provides useful nested data types like tuple,
bag, and map.
Apache Pig Run Modes
Apache Pig executes in two modes: Local Mode and MapReduce Mode.
Local Mode
o It executes in a single JVM and is used for development,
experimenting, and prototyping.
o Here, files are installed and run from the local host.
o The local mode works on the local file system. The input and output
data are stored in the local file system.
The command for the local mode grunt shell:
$ pig -x local
MapReduce Mode
o The MapReduce mode is also known as Hadoop Mode.
o It is the default mode.
o In this mode, Pig translates Pig Latin into MapReduce jobs and
executes them on the cluster.
o It can be executed against a pseudo-distributed or fully distributed
Hadoop installation.
o Here, the input and output data are present on HDFS.
The command for MapReduce mode:
$ pig
Or,
$ pig -x mapreduce
Ways to execute Pig Program
The following are the ways of executing a Pig program in local and
MapReduce mode:
o Interactive Mode - In this mode, Pig is executed in the Grunt
shell. To invoke the Grunt shell, run the pig command. Once the Grunt
shell starts, we can enter Pig Latin statements and commands
interactively at the command line.
o Batch Mode - In this mode, we can run a script file having a .pig
extension. These files contain Pig Latin commands.
o Embedded Mode - In this mode, we can define our own functions,
called UDFs (User Defined Functions), using programming languages
like Java and Python.
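As a sketch, a Java UDF compiled into a jar might be registered and invoked from Pig Latin like this (the jar name, class name, file path, and schema are all hypothetical):

```pig
-- register the jar containing the hypothetical UDF class
REGISTER myudfs.jar;

-- give the UDF a short alias (the class name is an assumption)
DEFINE UPPER com.example.pig.UpperCase();

-- apply the UDF to a field of a hypothetical relation
names = LOAD '/data/names.txt' AS (name:chararray);
upper_names = FOREACH names GENERATE UPPER(name);
```

REGISTER makes the jar's classes available to the script, and DEFINE binds a short alias to the fully qualified class name so it can be called like a built-in function.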
Pig Latin
Pig Latin is a data flow language used by Apache Pig to analyze
data in Hadoop. It is a textual language that abstracts the programming
from the Java MapReduce idiom into a higher-level notation.
Pig Latin Statements
Pig Latin statements are used to process the data. Each statement is an
operator that accepts a relation as input and generates another relation
as output.
o A statement can span multiple lines.
o Each statement must end with a semicolon.
o A statement may include expressions and schemas.
o By default, these statements are processed using multi-query
execution.
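The rules above can be sketched in Pig Latin as follows (the path and schema are hypothetical):

```pig
-- one statement spread over two lines, terminated by a semicolon
A = LOAD '/data/students.txt' USING PigStorage(',')
    AS (name:chararray, age:int, gpa:float);

-- a second statement operating on the relation produced above
B = FOREACH A GENERATE name, gpa;
```

The first statement includes a schema (the AS clause); the second includes expressions (the fields projected by GENERATE).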
Pig Latin Conventions
() - The parentheses can enclose one or more items. They can also be
used to indicate the tuple data type.
Example - (10, xyz, (3,6,9))
[] - The straight brackets can enclose one or more optional items. They
can also be used to indicate the map data type.
Example - [INNER | OUTER]
{} - The curly brackets enclose two or more items, one of which is
required. They can also be used to indicate the bag data type.
Example - { block | nested_block }
... - The horizontal ellipsis points indicate that you can repeat a
portion of the code.
Example - cat path [path ...]
Pig Latin Data Types
Simple Data Types
int - It defines a signed 32-bit integer.
Example - 2
long - It defines a signed 64-bit integer.
Example - 2L or 2l
float - It defines a 32-bit floating-point number.
Example - 2.5F or 2.5f or 2.5e2f or 2.5E2F
double - It defines a 64-bit floating-point number.
Example - 2.5 or 2.5e2 or 2.5E2
chararray - It defines a character array in Unicode UTF-8 format.
Example - javatpoint
bytearray - It defines a byte array (blob).
boolean - It defines boolean values.
Example - true/false
datetime - It defines a date-time value.
Example - 1970-01-01T00:00:00.000+00:00
biginteger - It defines Java BigInteger values.
Example - 5000000000000
bigdecimal - It defines Java BigDecimal values.
Example - 52.232344535345
Complex Types
tuple - It defines an ordered set of fields.
Example - (15,12)
bag - It defines a collection of tuples.
Example - {(15,12), (12,15)}
map - It defines a set of key-value pairs.
Example - [open#apache]
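As a sketch, these complex types can appear together in a relation's schema (the file path and field names below are hypothetical):

```pig
-- load a hypothetical file whose fields use the complex types above
data = LOAD '/data/records.txt'
       AS (t:tuple(a:int, b:int),     -- a tuple of two ints, e.g. (15,12)
           bg:bag{tp:tuple(c:int)},   -- a bag of tuples, e.g. {(15),(12)}
           m:map[chararray]);         -- a map, e.g. [open#apache]
```

Because bags contain tuples and tuples can contain any type, these three types can be nested arbitrarily, which is what the "nested data types" advantage above refers to.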
Pig Example
Use case: Using Pig, find the most frequently occurring start letter.
Solution:
Step 1: Load the data into a bag named "lines". The entire line is loaded
into a single field named line of type chararray.
grunt> lines = LOAD '/user/Desktop/data.txt' AS (line:chararray);
Step 2: The text in the bag lines needs to be tokenized; this produces one
word per row.
grunt> tokens = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS token:chararray;
Step 3: To retain the first letter of each word, type the command below.
It uses the SUBSTRING function to take the first character.
grunt> letters = FOREACH tokens GENERATE SUBSTRING(token,0,1) AS letter:chararray;
Step 4: Create a group for each unique letter, where the grouped bag will
contain the same letter for each occurrence of that letter.
grunt> lettergrp = GROUP letters BY letter;
Step 5: The number of occurrences is counted in each group.
grunt> countletter = FOREACH lettergrp GENERATE group, COUNT(letters);
Step 6: Arrange the output according to count in descending order using
the command below.
grunt> OrderCnt = ORDER countletter BY $1 DESC;
Step 7: Limit to one to give the result.
grunt> result = LIMIT OrderCnt 1;
Step 8: Store the result in HDFS. The result is saved in the output
directory under the sonoo folder.
grunt> STORE result INTO 'home/sonoo/output';