
Faculty of Engineering & Technology

Big Data Analytics (203105348)


B. Tech CSE 4th Year 7th Semester

PRACTICAL 2

Aim: Write a program of Word Count in Map Reduce over HDFS.

Description:
MapReduce is a framework for processing large datasets using a large number of computers
(nodes), collectively referred to as a cluster. Processing can occur on data stored in a file
system such as HDFS. It is a method for distributing computation across multiple nodes, where
each node processes the data stored on that node.

MapReduce consists of two main phases:

Map phase

Reduce phase

The input data set is split into independent blocks (input splits) that are processed in
parallel. Each input split is converted into key-value pairs. The mapper logic processes each
key-value pair and produces intermediate key-value pairs based on the implementation logic.
The resulting key-value pairs can be of a different type from the input key-value pairs. The
output of the mapper is the input to the reducer. The intermediate key-value pairs are sorted
and grouped by key, and the reducer logic is then applied to each group to produce the output
in the desired format. The output is stored in HDFS.
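
The flow described above can be sketched in plain Python. This is a minimal in-memory illustration, not Hadoop code: the function names mapper, shuffle, reducer and word_count, and the sample lines, are assumptions made for this example. It only shows how (word, 1) pairs are emitted, grouped by key, and summed.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit an intermediate (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle and sort: group intermediate values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    intermediate = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle(intermediate)]


if __name__ == "__main__":
    # Hypothetical sample input, standing in for one input split
    sample = ["the quick brown fox", "The lazy dog", "the end"]
    for word, count in word_count(sample):
        print(word, count)
```

On a real cluster the map and reduce steps run on different nodes and the framework performs the grouping between them; here all three steps run in one process for clarity.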

Enrollment No.: 2203051057106


Roll Number: 68
Div: 7A9(CSE)
Code:

import urllib.request

# Source text to count words in
story = 'http://sixty-north.com/c/t.txt'

# Download the text and collect every word
each_word = []
with urllib.request.urlopen(story) as response:
    for line in response:
        # Lines arrive as bytes; decode before splitting into words
        line_words = line.decode('utf-8').split()
        for word in line_words:
            each_word.append(word)

# Count occurrences of each word, ignoring case
same_words = {}
for words in each_word:
    key = words.lower()
    if key not in same_words:
        same_words[key] = 1
    else:
        same_words[key] += 1

# Print each word with its count
for each in same_words:
    print("word =", each, "count =", same_words[each])
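
The program above counts words in a single process. As a hedged sketch of how the same count could be split into separate map and reduce steps, the script below follows the convention used by Hadoop Streaming, where the mapper and reducer are programs that read lines from standard input and write tab-separated key/value lines to standard output. The function names map_lines and reduce_lines, and the "map"/"reduce" command-line argument, are assumptions made for this example. The reducer relies on its input being sorted by key, which the framework does between the two phases.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: turn each input line into tab-separated 'word<TAB>1' records."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts for each run of identical keys.

    Assumes the records are already sorted by key.
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass "map" or "reduce" as the only argument
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

The two steps can be tried locally without a cluster by piping the mapper output through a sort and into the reducer. Running them over HDFS requires submitting the script through the Hadoop Streaming jar, whose location and input/output paths depend on the installation and are not shown here.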

Output:
