FACULTY OF ENGINEERING & TECHNOLOGY
Big-Data Analysis (203105348)
B. Tech. 7th Sem
PRACTICAL-1
AIM: To understand the overall programming architecture using the MapReduce API.
A MapReduce task is mainly divided into two phases: the map phase and the reduce phase.
1. map(), filter(), and reduce() in Python.
2. These functions are most commonly used with lambda functions.
1. map():
“A map function executes a given instruction or function on every item of an iterable, which could be a list, tuple, set, etc.”
SYNTAX:
map(function, iterable)
example:
items=[1,2,3,4,5]
a=list(map((lambda x: x**3), items))
print(a)
output:
[1, 8, 27, 64, 125]
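Two supplementary points beyond the example above: map() also accepts an ordinary named function instead of a lambda, and it can take more than one iterable, in which case the function must accept that many arguments:

def cube(x):
    return x ** 3

print(list(map(cube, [1, 2, 3])))                       # [1, 8, 27]
print(list(map(lambda x, y: x + y, [1, 2], [10, 20])))  # [11, 22]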
2. filter():
“A filter function in Python tests a specific user-defined condition against each element of an iterable and returns an iterable of the elements that satisfy the condition or, in other words, for which the function returns True.”
SYNTAX:
filter(function, iterable)
example:
a=[1,2,3,4,5,6]
b=[2,5,0,7,3]
c=list(filter(lambda x: x in a, b))
print(c)
output:
[2, 5, 3]
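A small supplement to the example above: passing None as the function keeps only the truthy elements of the iterable:

print(list(filter(None, [0, 1, '', 'py', None, 3])))  # [1, 'py', 3]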
3. reduce():
“The reduce function applies a function of two arguments cumulatively to the items of an iterable and gives back a single value as the result.”
We have to import the reduce function from the functools module using the statement from functools import reduce.
SYNTAX:
reduce(function, iterable)
example:
from functools import reduce
a=reduce((lambda x, y: x*y), [1,2,3,4])
print(a)
output:
24
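As a bridge to the MapReduce model used in Practical-2, the following sketch chains map(), filter(), and reduce() to count words; the sample text and the length filter are illustrative assumptions, not part of the manual:

from functools import reduce

text = "big data map reduce map data data"  # made-up sample text

# Map: emit a (word, 1) pair for every word
pairs = list(map(lambda w: (w, 1), text.split()))

# Filter: keep only words longer than two characters
pairs = list(filter(lambda p: len(p[0]) > 2, pairs))

# Reduce: fold the pairs into a word -> count dictionary;
# the third argument ({}) is reduce()'s optional initializer
counts = reduce(lambda acc, p: {**acc, p[0]: acc.get(p[0], 0) + p[1]}, pairs, {})
print(counts)  # {'big': 1, 'data': 3, 'map': 2, 'reduce': 1}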
PRACTICAL-2
AIM: To write a word count program in MapReduce over HDFS.
Description:
MapReduce is a framework for processing large datasets using a large number of computers (nodes), collectively referred to as a cluster. Processing can occur on data stored in a file system (HDFS). It is a method for distributing computation across multiple nodes: each node processes the data that is stored at that node.
It consists of two main phases:
1. Mapper phase
2. Reducer phase
The input data set is split into independent blocks that are processed in parallel. Each input split is converted into key-value pairs. The mapper logic processes each key-value pair and produces intermediate key-value pairs based on the implementation logic; these can be of a different type from the input key-value pairs. The output of the mapper is passed to the reducer: the output of the map function is the input for the reduce function. The intermediate key-value pairs are sorted and grouped by key, the reducer logic is applied to each key's values, and the output is produced in the desired format and stored in HDFS.
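To make this data flow concrete, here is a minimal pure-Python sketch of the three stages (map, shuffle/sort, reduce); the input lines are made up for illustration, and no Hadoop is involved:

from itertools import groupby

# Two hypothetical input records (lines of text)
lines = ["hello world", "hello hadoop"]

# Mapper phase: turn each record into intermediate (key, value) pairs
intermediate = [(word, 1) for line in lines for word in line.split()]

# Shuffle/sort: group the intermediate pairs by key, as the framework does
intermediate.sort(key=lambda kv: kv[0])

# Reducer phase: apply the reducer logic (summing) to each key's values
for word, group in groupby(intermediate, key=lambda kv: kv[0]):
    print(word, sum(v for _, v in group))

# Output:
# hadoop 1
# hello 2
# world 1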
Execution Steps:
Download the training VM: http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip
Create the jar file of this program and name it wordcount.jar.
Create the input directory in HDFS and copy the input file into it:
hadoop fs -mkdir /input
hadoop fs -put /home/training/Desktop/sample.txt /input
Run the jar file (the output directory must not already exist):
hadoop jar wordcount.jar wordcount /input /output
Output:
hadoop fs -cat /output/part-00000
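For illustration, if sample.txt contained the single line "hello world hello" (a made-up example, not from the manual), the tab-separated part-00000 output would look like:
hello 2
world 1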
Word Count Java Program
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class wordcount extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
if(args.length<2)
{
System.out.println("Plz Give Input Output Directory Correctly");
return -1;
}
// Configure the job and set the input/output paths
JobConf conf = new JobConf(wordcount.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
// Wire up the mapper and reducer classes
conf.setMapperClass(wordmapper.class);
conf.setReducerClass(wordreducer.class);
// Intermediate and final (key, value) types: (word, count)
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
// Submit the job and wait for it to finish
JobClient.runJob(conf);
return 0;
}
public static void main(String args[]) throws Exception
{
int exitcode = ToolRunner.run(new wordcount(), args);
System.exit(exitcode);
}
}
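Note: ToolRunner.run() parses the standard Hadoop command-line options before invoking run(), which is why the driver extends Configured and implements Tool. The mapper and reducer below are public classes, so each belongs in its own source file.

Word Count Mapper (wordmapper.java):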
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class wordmapper extends MapReduceBase implements
Mapper<LongWritable,Text,Text,IntWritable>
{
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output, Reporter r)
throws IOException {
// Split the line into words on single spaces
String s = value.toString();
for(String word : s.split(" "))
{
if(word.length()>0)
{
output.collect(new Text(word), new IntWritable(1));
}
}
}
}
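The mapper emits a (word, 1) pair for every non-empty word in the line; the framework then groups these pairs by word before the reducer runs.

Word Count Reducer (wordreducer.java):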
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class wordreducer extends MapReduceBase implements
Reducer<Text,IntWritable,Text,IntWritable>
{
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter r)
throws IOException {
int count = 0;
// Sum all the counts emitted for this key (word)
while(values.hasNext())
{
IntWritable i= values.next();
count+= i.get();
}
output.collect(key, new IntWritable(count));
}
}
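The reducer receives each word together with an iterator over all of its 1 values, sums them, and emits the final (word, count) pair; these pairs form the part-00000 file read with hadoop fs -cat above.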