
PRINCETON INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN

(Approved by AICTE, New Delhi & Affiliated to JNTU Hyderabad)


Chowdaryguda (V), Ghatkesar (M), Medchal-Malkajgiri(D).TS-500088 Phone: 9394544566 /
6305324412

e-mail: [email protected]

JNTUH Code(6M) CIVIL–EEE–ECE-CSE-CSE(AI&ML)-CSE(DS)-CSE(CS) EAMCET Code– PETW

DEPARTMENT OF COMPUTER SCIENCE ENGINEERING – DATA SCIENCE

BIG DATA ANALYTICS LAB


LAB MANUAL

Subject Code : C323


Regulation : R18/JNTUH
Academic Year : 2022-2023

III B. TECH II SEMESTER


COMPUTER SCIENCE AND ENGINEERING(DATA SCIENCE)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
PRINCETON INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN
(Approved by AICTE, New Delhi & Affiliated to JNTU Hyderabad)
Chowdaryguda (V), Ghatkesar (M), Medchal-Malkajgiri(D).TS-500088 Phone: 9394544566 /
6305324412

e-mail: [email protected]

JNTUH Code(6M) CIVIL–EEE–ECE-CSE-CSE(AI&ML)-CSE(DS)-CSE(CS) EAMCET Code– PETW

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


Academic Year 2022-2023
COLLEGE VISION, MISSION, CORE VALUES AND QUALITY POLICY:
VISION
To provide an educational environment that develops graduates with the strong academic and technical backgrounds needed to achieve distinction in the discipline, and to build the Institution into one of academic excellence of international standard.

MISSION

We transform individuals into well-rounded personalities through state-of-the-art infrastructure, time consciousness, quick response, and the best academic practices, supported by assessment and advice.

CORE VALUES

Attaining global eminence by achieving excellence in all that we do in life, education, and service.

VISION AND MISSION OF CSE(DATA SCIENCE) DEPARTMENT


VISION

To achieve innovation and research excellence in Data Science, producing lifelong learners with competence in the engineering and professional core, supported by continual updates to the curriculum.

MISSION

To impart knowledge of Data Science technologies that meets industrial standards, and to integrate research into practical, relevant solutions that address business and societal challenges.
PRINCETON INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN
(Approved by AICTE, New Delhi & Affiliated to JNTU Hyderabad)
Chowdaryguda (V), Ghatkesar (M), Medchal-Malkajgiri(D).TS-500088

Phone: 9394544566 / 6305324412

e-mail: [email protected]

JNTUH Code(6M) CIVIL–EEE–ECE-CSE-CSE(AI&ML)-CSE(DS)-CSE(CS) EAMCET Code– PETW

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

(DATA SCIENCE)
Academic Year 2022-2023

PROGRAM EDUCATIONAL OBJECTIVES AND PROGRAM SPECIFIC OUTCOMES:

PROGRAM EDUCATIONAL OBJECTIVES (PEOs)

PEOs  DESCRIPTION
PEO1  Graduates of the program will have a globally competent professional career in the Data Science domain.
PEO2  To prepare students to excel in Data Science with the technical skills and competency to carry out research and address the basic needs of society.
PEO3  Graduates of the program will have entrepreneurial skills and a lifelong learning attitude in order to support the growth of a country's economy.
PROGRAM SPECIFIC OUTCOMES (PSOs)

PSOs  DESCRIPTION
PSO1  Analyse and visualize data in the context of real-world problems, communicate findings, and interpret results using data analytics for decision making.
PSO2  Evaluate, analyse, and synthesize solutions for real-world problems in Data Science, conduct research in a wider theoretical and practical context, and analyse ethical issues in business related to intellectual property, data security, integrity, and privacy.
PSO3  Use technical expertise in the latest technologies and continuously update knowledge in Data Science to excel in one's career.
PROGRAM OUTCOMES:

PROGRAM OUTCOMES (POs)

POs   DESCRIPTION
PO1   Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2   Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3   Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
PO4   Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5   Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO6   The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7   Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
PO8   Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9   Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10  Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11  Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12  Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(DATA SCIENCE)

BIG DATA ANALYTICS LAB

COURSE OBJECTIVES:

1. The purpose of this course is to provide students with knowledge of Big Data Analytics principles and techniques.

2. This course is also designed to give an exposure to the frontiers of Big Data Analytics.

COURSE OUTCOMES:

CO1: Use Excel as an analytical and visualization tool.
CO2: Ability to program using Hadoop and MapReduce.
CO3: Ability to perform data analytics using ML in R.
CO4: Use Cassandra to perform social media analytics.
CO5: Ability to program an R project for data visualization of social media data.
LIST OF EXPERIMENTS

S.No  Name of the Experiment
1  Implement a simple map-reduce job that builds an inverted index on the set of input documents (Hadoop)
2  Process big data in HBase
3  Store and retrieve data in Pig
4  Perform social media analysis using Cassandra
5  Buyer event analytics using Cassandra on suitable product sales data visualization
6  Using Power Pivot (Excel), perform the following on any dataset:
   a) Big Data Analytics
   b) Big Data Charting
7  Use R-project to carry out statistical analysis of big data
8  Use R-project for data visualization of social media data


EXP NO: 1 Install Apache Hadoop

AIM: To install Apache Hadoop.

Hadoop software can be installed in three modes: stand-alone (local) mode, pseudo-distributed mode, and fully distributed mode.

Hadoop is a Java-based programming framework that supports the processing and storage of extremely large datasets on a cluster of inexpensive machines. It was the first major open-source project in the big data field and is sponsored by the Apache Software Foundation.

Hadoop is composed of four main layers:

 Hadoop Common is the collection of utilities and libraries that support other Hadoop
modules.
 HDFS, which stands for Hadoop Distributed File System, is responsible for persisting
data to disk.
 YARN, short for Yet Another Resource Negotiator, is the resource-management layer, often described as the "operating system" of a Hadoop cluster.
 MapReduce is the original processing model for Hadoop clusters. It distributes work across the cluster in the map phase, then organizes and reduces the results from the nodes into a response to a query. Many other processing models are available for the 2.x version of Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone mode which
is suitable for learning about Hadoop, performing simple operations, and debugging.

Procedure:

We'll install Hadoop in stand-alone mode and run one of the example MapReduce programs it includes to verify the installation.

Prerequisites:

Step 1: Install Java 8.

Running java -version should report the installed JDK, for example:

openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

This output verifies that OpenJDK has been successfully installed.
Note: Set the JAVA_HOME environment variable to the JDK installation path.
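If Java is not yet installed, a minimal sketch for a Debian/Ubuntu system follows (the package name and JDK path are assumptions for that platform; on Windows, install a JDK and set JAVA_HOME in the system environment variables instead):

# Install OpenJDK 8 (package name is the Ubuntu one).
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk

# Verify the installation.
java -version

# Point JAVA_HOME at the JDK and persist it for future shells.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> ~/.bashrc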

Step 2: Install Hadoop.

With Java in place, visit the Apache Hadoop Releases page to find the most recent stable release, and follow the link to the binary for the current release.

Download Hadoop from www.hadoop.apache.org
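For a Unix-like stand-alone installation, the download and unpacking step can be sketched as follows (the 2.2.0 version, mirror URL, and install path are assumptions; substitute the current stable release from the releases page):

# Download and unpack a Hadoop release (version/URL are assumptions).
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
tar -xzf hadoop-2.2.0.tar.gz
sudo mv hadoop-2.2.0 /usr/local/hadoop

# Put the Hadoop commands on the PATH for this shell.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Stand-alone mode needs no daemons; this should print the version banner.
hadoop version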


Procedure to Run Hadoop

1. Install Apache Hadoop 2.2.0 in Microsoft Windows OS

If Apache Hadoop 2.2.0 is not already installed, follow the post "Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS".

2. Start HDFS (Namenode and Datanode) and YARN (Resource Manager and Node
Manager)

Run the following commands:


Command Prompt:

C:\Users\abhijitg>cd c:\hadoop
c:\hadoop>sbin\start-dfs
c:\hadoop>sbin\start-yarn
starting yarn daemons

The Namenode, Datanode, Resource Manager and Node Manager will start in a few minutes, ready to execute Hadoop MapReduce jobs on the single-node (pseudo-distributed mode) cluster.
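You can confirm that the daemons are up with the JDK's jps tool, which lists running Java processes (the process IDs below are illustrative; yours will differ):

C:\hadoop>jps
4536 NameNode
5024 DataNode
5812 ResourceManager
6340 NodeManager
6916 Jps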
Resource Manager & Node Manager:

3. Run the wordcount MapReduce job

Now we'll run the wordcount MapReduce job available in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar

Create a text file with some content; we'll pass this file as input to the wordcount MapReduce job for counting words.

C:\file1.txt:
Install Hadoop
Run Hadoop Wordcount Mapreduce Example

Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt') to be used for counting words.

C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input

Copy the text file (say 'file1.txt') from the local disk to the newly created 'input' directory in HDFS.

C:\hadoop>bin\hdfs dfs -copyFromLocal c:/file1.txt input


4. Check the content of the copied file.

C:\hadoop>hdfs dfs -ls input
Found 1 items
-rw-r--r--   1 ABHIJITG supergroup   55 2014-02-03 13:19 input/file1.txt

C:\hadoop>bin\hdfs dfs -cat input/file1.txt
Install Hadoop
Run Hadoop Wordcount Mapreduce Example

Run the wordcount MapReduce job provided in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar

C:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output
14/02/03 13:22:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:03 INFO input.FileInputFormat: Total input paths to process : 1
14/02/03 13:22:03 INFO mapreduce.JobSubmitter: number of splits:1
:
:
14/02/03 13:22:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1391412385921_0002
14/02/03 13:22:04 INFO impl.YarnClientImpl: Submitted application application_1391412385921_0002 to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:04 INFO mapreduce.Job: The url to track the job: http://ABHIJITG:8088/proxy/application_1391412385921_0002/
14/02/03 13:22:04 INFO mapreduce.Job: Running job: job_1391412385921_0002
14/02/03 13:22:14 INFO mapreduce.Job: Job job_1391412385921_0002 running in uber mode : false
14/02/03 13:22:14 INFO mapreduce.Job: map 0% reduce 0%
14/02/03 13:22:22 INFO mapreduce.Job: map 100% reduce 0%
14/02/03 13:22:30 INFO mapreduce.Job: map 100% reduce 100%
14/02/03 13:22:30 INFO mapreduce.Job: Job job_1391412385921_0002 completed successfully
14/02/03 13:22:31 INFO mapreduce.Job: Counters: 43
File System Counters
  FILE: Number of bytes read=89
  FILE: Number of bytes written=160142
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=171
  HDFS: Number of bytes written=59
  HDFS: Number of read operations=6
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=2
Job Counters
  Launched map tasks=1
  Launched reduce tasks=1
  Data-local map tasks=1
  Total time spent by all maps in occupied slots (ms)=5657
  Total time spent by all reduces in occupied slots (ms)=6128
Map-Reduce Framework
  Map input records=2
  Map output records=7
  Map output bytes=82
  Map output materialized bytes=89
  Input split bytes=116
  Combine input records=7
  Combine output records=6
  Reduce input groups=6
  Reduce shuffle bytes=89
  Reduce input records=6
  Reduce output records=6
  Spilled Records=12
  Shuffled Maps =1
  Failed Shuffles=0
  Merged Map outputs=1
  GC time elapsed (ms)=145
  CPU time spent (ms)=1418
  Physical memory (bytes) snapshot=368246784
  Virtual memory (bytes) snapshot=513716224
  Total committed heap usage (bytes)=307757056
Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
File Input Format Counters
  Bytes Read=55
File Output Format Counters
  Bytes Written=59
http://abhijitg:8088/cluster

Result: We've installed Hadoop in stand-alone mode and verified it by running an example
program it provided.
EXP NO: 2 Implement a simple MapReduce job that builds an inverted index on the set of input documents (Hadoop)

AIM: To develop a MapReduce program to calculate the frequency of a given word in a given file; a sketch of the inverted-index variant appears at the end of this experiment.

Map Function – It takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key-value pairs).

Input (set of data):
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN, BUS, buS, caR, CAR, car, BUS, TRAIN

Output (converted into another set of data, as (key, value) pairs):
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1), (TRAIN,1), (BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)
Reduce Function – Takes the output from Map as input and combines those data tuples into a smaller set of tuples.

Example – (Reduce function in Word Count)

Input (set of tuples, the output of the Map function):
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1), (TRAIN,1), (BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)

Output (a smaller set of tuples; the mapper upper-cases and trims each word, so the case variants collapse into one key):
(BUS,7), (CAR,7), (TRAIN,4)
Work Flow of Program

The MapReduce workflow consists of 5 steps:
1. Splitting – the splitting parameter can be anything, e.g. splitting by space, comma, semicolon, or even by a new line ('\n').
2. Mapping – as explained above.
3. Intermediate splitting (shuffle) – the entire process runs in parallel on different nodes; to group records in the reduce phase, all data with the same key must end up on the same node.
4. Reduce – essentially a group-by-key aggregation phase.
5. Combining – the last phase, where all the partial results from each node are combined to form the final result.

Now let's see the Word Count program in Java.

Make sure that Hadoop is installed on your system with the Java JDK.

Steps to follow:

Step 1. Open Eclipse > File > New > Java Project > (Name it – MRProgramsDemo) > Finish.
Step 2. Right Click > New > Package (Name it – PackageDemo) > Finish.
Step 3. Right Click on Package > New > Class (Name it – WordCount).
Step 4. Add the following reference libraries (Right Click on Project > Build Path > Add External Archives):
 /usr/lib/hadoop-0.20/hadoop-core.jar
 /usr/lib/hadoop-0.20/lib/Commons-cli-1.2.jar

Step 5. Type the following program:

package PackageDemo;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);

        // Configure the job: jar, mapper, reducer, and output key/value types.
        Job j = new Job(c, "wordcount");
        j.setJarByClass(WordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    // Mapper: split each comma-separated line and emit (WORD, 1) per word.
    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(",");
            for (String word : words) {
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        }
    }

    // Reducer: sum the counts for each word and emit (WORD, total).
    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text word, Iterable<IntWritable> values, Context con)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}

Make Jar File

Right Click on Project > Export > Select export destination as Jar File > Next > Finish.

To move the input file into HDFS, open the terminal and enter the following command:

[training@localhost ~]$ hadoop fs -put wordcountFile wordCountFile

Run Jar file

(hadoop jar jarfilename.jar packageName.ClassName PathToInputTextFile PathToOutputDirectory)

[training@localhost ~]$ hadoop jar MRProgramsDemo.jar PackageDemo.WordCount wordCountFile MRDir1

Result: Open the output

[training@localhost ~]$ hadoop fs -ls MRDir1
Found 3 items
-rw-r--r--   1 training supergroup    0 2016-02-23 03:36 /user/training/MRDir1/_SUCCESS
drwxr-xr-x   - training supergroup    0 2016-02-23 03:36 /user/training/MRDir1/_logs
-rw-r--r--   1 training supergroup   20 2016-02-23 03:36 /user/training/MRDir1/part-r-00000

[training@localhost ~]$ hadoop fs -cat MRDir1/part-r-00000
BUS 7
CAR 4
TRAIN 6
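As noted in the aim, the experiment title asks for an inverted index rather than plain word counts. A minimal sketch of how the mapper and reducer change, using the same job wiring as WordCount above (class names are ours; set setOutputValueClass to Text.class and add imports for java.util.Set, java.util.TreeSet, and org.apache.hadoop.mapreduce.lib.input.FileSplit): the mapper emits (word, documentName) pairs and the reducer joins the distinct document names for each word.

// Mapper: emit (word, source document) for every word in the line.
public static class InvertedIndexMapper extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context con)
            throws IOException, InterruptedException {
        // The input file name serves as the document identifier.
        String docId = ((FileSplit) con.getInputSplit()).getPath().getName();
        for (String word : value.toString().split("\\s+")) {
            if (!word.isEmpty()) {
                con.write(new Text(word.toLowerCase()), new Text(docId));
            }
        }
    }
}

// Reducer: collect the distinct documents containing each word.
public static class InvertedIndexReducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text word, Iterable<Text> docIds, Context con)
            throws IOException, InterruptedException {
        Set<String> docs = new TreeSet<String>();
        for (Text id : docIds) {
            docs.add(id.toString());
        }
        con.write(word, new Text(String.join(",", docs)));
    }
}

Running this over a directory of documents produces one line per term, such as "hadoop  file1.txt,file2.txt", i.e. the posting list for that term.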
EXP NO: 3 Store and retrieve data in Pig

OBJECTIVE:

1. Installation of PIG.

STEPS FOR INSTALLING APACHE PIG

1) Extract pig-0.15.0.tar.gz and move it to the home directory.
2) Set the PIG environment variables in the .bashrc file.
3) Pig can run in two modes, local mode and Hadoop mode, started with "pig -x local" and "pig" respectively.
4) Grunt shell:
   grunt>
5) Load data into the Grunt shell:
   DATA = LOAD '<PATH>' USING PigStorage('<DELIMITER>') AS (attribute1:datatype1, attribute2:datatype2, ...);
6) Describe the data:
   DESCRIBE DATA;
7) Dump the data:
   DUMP DATA;

INPUT/OUTPUT:
Input as Website Click Count Data
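As a concrete illustration, assume a tab-delimited file clicks.txt with one URL and its click count per line (the file name, path, and schema here are assumptions):

-- Load the click data (path and schema are assumptions).
clicks = LOAD '/user/hadoop/clicks.txt' USING PigStorage('\t')
         AS (url:chararray, count:int);

-- Inspect the schema of the relation.
DESCRIBE clicks;

-- Print the tuples to the console.
DUMP clicks;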
EXP NO: 4 Install Apache Cassandra

AIM: To Install Apache Cassandra

Cassandra Setup and Installation:

Apache Cassandra and DataStax Enterprise are used by many organizations to store huge amounts of data.

Before installing Apache Cassandra, you must have the following: the DataStax Community Edition setup, an installed JDK, and a Windows platform.

Download and Install Cassandra

Run the DataStax Community Edition setup. After running the setup, the welcome page is displayed (the screenshots show the 64-bit version). Click the Next button to see the page specifying the installation location.

Press Next again and a page appears asking whether to start the DataStax DDC service automatically. Select the radio button and proceed.

Installation now begins. After it completes, go to the Program Files directory where DataStax is installed. Open DataStax-DDC and you will see Apache Cassandra; open Apache Cassandra and you will see the bin folder; open bin and you will see the Cassandra Windows batch file.

Run this file. It will start the Cassandra server.

Once the server is started, go to the Windows Start menu, search for "Cassandra CQL Shell", and run the Cassandra shell. After running it, you will see the CQL command line.
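Once the CQL shell is up, a minimal smoke test such as the following confirms the server is usable (the keyspace, table, and column names here are illustrative, not part of the installation itself):

-- Create a development keyspace on a single-node cluster.
CREATE KEYSPACE IF NOT EXISTS social_media
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

USE social_media;

-- A simple table of posts keyed by user.
CREATE TABLE IF NOT EXISTS posts (
  user_id text,
  post_id timeuuid,
  content text,
  PRIMARY KEY (user_id, post_id)
);

INSERT INTO posts (user_id, post_id, content)
VALUES ('user1', now(), 'hello cassandra');

SELECT * FROM posts WHERE user_id = 'user1';

The same keyspace can serve as a starting point for the social media analysis in the Cassandra experiments listed earlier.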
EXP NO: 5 Using Power Pivot (Excel), perform the following on any data set

A pivot table is essentially a data summarization tool that enables you to analyze data from various angles. By dragging and dropping fields, you can quickly aggregate, group, and visualize data without writing complex formulas or macros. This makes pivot tables an indispensable asset for data analysts, business professionals, and anyone dealing with large datasets.

How to Create a Pivot Table

Power Pivot is an Excel add-in you can use to perform powerful data analysis and create sophisticated data
models. With Power Pivot, you can mash up large volumes of data from various sources, perform information
analysis rapidly, and share insights easily.

In both Excel and in Power Pivot, you can create a Data Model, a collection of tables with relationships. The data
model you see in a workbook in Excel is the same data model you see in the Power Pivot window. Any data you
import into Excel is available in Power Pivot, and vice versa.

a) BIG DATA ANALYTICS:

Example: purchase analysis of items using a pivot table and charting for big data analytics.

EMPLOYEE     ITEM      PRICE   QUANTITY  TOTAL PRICE
MAHI         HDD        3000     20        3020
MAITHILI     LED        4000     50        4050
SRILATHA     LCD       18000     23       18023
SUSHMA       MOUSE       180     49         229
SHIVANI      KEYBOARD    320     14         334
MAITHILI     UPS        1420      8        1428
RANI         CABINET    2000     23        2023
RAVANILLA    HDD        3000     37        3037
BHAVYA       MOUSE       180     45         225
AKHILA       MOUSE       180     49         229
VARALAXMI    KEYBOARD    320     50         370
DHANALAXMI   HDD        3000     19        3019
POOJA        LCD       18000     23       18023
SUPERJA      LCD       18000     47       18047
AMMU         LED        4000     28        4028
ANVIKA       CABINET    2000     39        2039
JYOTHI       UPS        1420     36        1456
ARCHANA      UPS        1420     20        1440
SARITHA      MOUSE     18000     40       18040
MOUNIKA      LED        4000     30        4030
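As a sanity check before pivoting, an ordinary worksheet formula can reproduce any aggregated cell. Assuming the table above occupies cells A1:E21 with headers in row 1 (a layout assumption), the total for all MOUSE rows is:

=SUMIFS($E$2:$E$21, $B$2:$B$21, "MOUSE")

This should match the sum of the MOUSE entries in the pivot below (229 + 225 + 229 + 18040 = 18723).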

CREATING PIVOT TABLE:

The source table is placed in the pivot with EMPLOYEE as the report filter, QUANTITY and ITEM as row labels, PRICE as the column labels, and Sum of TOTAL PRICE as the value.

EMPLOYEE: (All)

Sum of TOTAL PRICE                 Column Labels (PRICE)
Row Labels        180     320    1420    2000    3000   18000   Grand Total
8                                 1428                                 1428
  UPS                             1428                                 1428
14                        334                                           334
  KEYBOARD                334                                           334
19                                                3019                 3019
  HDD                                             3019                 3019
20                                1440            3020                 4460
  HDD                                             3020                 3020
  UPS                             1440                                 1440
23                                        2023           36046        38069
  CABINET                                 2023                          2023
  LCD                                                    36046        36046
36                                1456                                 1456
  UPS                             1456                                 1456
37                                                3037                 3037
  HDD                                             3037                 3037
39                                        2039                         2039
  CABINET                                 2039                         2039
40                                                       18040        18040
  MOUSE                                                  18040        18040
45                225                                                   225
  MOUSE           225                                                   225
47                                                       18047        18047
  LCD                                                    18047        18047
49                458                                                   458
  MOUSE           458                                                   458
50                        370                                           370
  KEYBOARD                370                                           370
Grand Total       683     704    4324    4062    9076   72133        90982

CREATING A PIVOT CHART FOR ABOVE PURCHASE ITEMS:

[Pivot chart: a column chart of Sum of TOTAL PRICE for each quantity/item pair on the horizontal axis (quantities 8 to 50, items KEYBOARD, LCD, HDD, MOUSE, CABINET, UPS), with one series per PRICE value (180, 320, 1420, 2000, 3000, 18000); the vertical axis runs from 0 to 40000 in steps of 5000.]
EXP NO: 6 Installation - Store and retrieve data in Pig

OBJECTIVE:
1. Installation of PIG.

RESOURCES:
VMWare, Web browser, 4 GB RAM, Hard Disk 80 GB.

PROGRAM LOGIC:

STEPS FOR INSTALLING APACHE PIG

1) Extract pig-0.15.0.tar.gz and move it to the home directory.
2) Set the PIG environment variables in the .bashrc file.
3) Pig can run in two modes, local mode and Hadoop mode, started with "pig -x local" and "pig" respectively.
4) Grunt shell:
   grunt>
5) Load data into the Grunt shell:
   DATA = LOAD '<PATH>' USING PigStorage('<DELIMITER>') AS (attribute1:datatype1, attribute2:datatype2, ...);
6) Describe the data:
   DESCRIBE DATA;
7) Dump the data:
   DUMP DATA;

INPUT/OUTPUT: Input as Website Click Count Data
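The steps above cover loading and retrieval; to complete the "store" half of the experiment, results can be written back to HDFS with STORE (the relation name and output path below are assumptions):

-- Persist a relation back to HDFS as comma-delimited text.
STORE DATA INTO '/user/hadoop/click_output' USING PigStorage(',');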


EXP NO: 7 Use R-project to carry out statistical analysis of big data

AIM: To perform statistical analysis of a given data set of qualifications, where under graduate, post graduate, and graduate are coded as 0, 1, and 2 respectively.

# Qualification data: 0 = under graduate, 1 = post graduate, 2 = graduate
qual <- c(1,1,1,0,2,1,1,1,1,0,1,1,0,0,1,2,0)
qual                      # print the raw vector

# Bar plot of the raw codes, and of the frequency of each level
barplot(qual)
barplot(table(qual))

# Relative frequencies (proportions)
table(qual)/length(qual)

# 3D pie chart of the frequencies (requires the plotrix package)
library(plotrix)
pie3D(as.numeric(table(qual)), labels = names(table(qual)))
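A few extra one-liners round out the statistical analysis (these reuse the qual vector defined above; the category labels follow the coding given in the aim):

# Frequency and relative frequency of each qualification level
freq <- table(qual)
prop <- prop.table(freq)

# Descriptive statistics of the codes (meaningful here only as category codes)
mean(qual)
summary(qual)

# Bar plot of proportions with readable category labels
barplot(prop,
        names.arg = c("UG", "PG", "Graduate")[as.integer(names(freq)) + 1],
        ylab = "Proportion")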
