
PRINCETON INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN

(Approved by AICTE, New Delhi & Affiliated to JNTU Hyderabad)


Chowdaryguda (V), Ghatkesar (M), Medchal-Malkajgiri(D).TS-500088 Phone: 9394544566 /
6305324412

e-mail: [email protected]

JNTUH Code(6M) CIVIL–EEE–ECE-CSE-CSE(AI&ML)-CSE(DS)-CSE(CS) EAMCET Code– PETW

DEPARTMENT OF COMPUTER SCIENCE ENGINEERING – DATA SCIENCE

BIG DATA ANALYTICS LAB


LAB MANUAL

Subject Code : C323


Regulation : R18/JNTUH
Academic Year : 2022-2023

III B. TECH II SEMESTER


COMPUTER SCIENCE AND ENGINEERING(DATA SCIENCE)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
PRINCETON INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN
(Approved by AICTE, New Delhi & Affiliated to JNTU Hyderabad)
Chowdaryguda (V), Ghatkesar (M), Medchal-Malkajgiri(D).TS-500088 Phone: 9394544566 /
6305324412

e-mail: [email protected]

JNTUH Code(6M) CIVIL–EEE–ECE-CSE-CSE(AI&ML)-CSE(DS)-CSE(CS) EAMCET Code– PETW

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


Academic Year 2022-2023
COLLEGE VISION, MISSION, CORE VALUES AND QUALITY POLICY:
VISION
To provide an educational environment that develops graduates with the strong academic and technical backgrounds needed to achieve distinction in the discipline, and to build the Institution into one of academic excellence of international standard.

MISSION

We transform individuals into well-rounded personalities through state-of-the-art infrastructure, time consciousness, quick response, and the best academic practices, supported by assessment and advice.

CORE VALUES

Attaining global eminence by achieving excellence in all that we do in life, education, and service.

VISION AND MISSION OF CSE(DATA SCIENCE) DEPARTMENT


VISION

To achieve innovation and research excellence in Data Science, producing lifelong learners with competence in the engineering and professional core, supported by continual updates to the curriculum.

MISSION

To impart knowledge of Data Science technologies that meets industrial standards, and to integrate research into practical, relevant solutions that address business and societal challenges.
PRINCETON INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN
(Approved by AICTE, New Delhi & Affiliated to JNTU Hyderabad)
Chowdaryguda (V), Ghatkesar (M), Medchal-Malkajgiri(D).TS-500088

Phone: 9394544566 / 6305324412

e-mail: [email protected]

JNTUH Code(6M) CIVIL–EEE–ECE-CSE-CSE(AI&ML)-CSE(DS)-CSE(CS) EAMCET Code– PETW

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

(DATA SCIENCE)
Academic Year 2022-2023

PROGRAM EDUCATIONAL OBJECTIVES AND PROGRAM SPECIFIC OUTCOMES:

PROGRAM EDUCATIONAL OBJECTIVES (PEOs)

PEOs  DESCRIPTION
PEO1  Graduates of the program will have a globally competent professional career in the Data Science domain.
PEO2  To prepare students to excel in Data Science with the technical skills and competency to carry out research and address the basic needs of society.
PEO3  Graduates of the program will have entrepreneurial skills and a lifelong learning attitude in order to support the growth of a country's economy.
PROGRAM SPECIFIC OUTCOMES (PSOs)

PSOs  DESCRIPTION
PSO1  Analyse and visualize data in the context of real-world problems, communicate findings, and interpret results using data analytics for decision making.
PSO2  Evaluate, analyse, and synthesize solutions for real-world problems in Data Science, conduct research in a wider theoretical and practical context, and analyse ethical issues in business related to intellectual property, data security, integrity, and privacy.
PSO3  Use technical expertise in the latest technologies and continuously update knowledge in Data Science to excel in one's career.
PROGRAM OUTCOMES:

PROGRAM OUTCOMES (POs)

POs   DESCRIPTION
PO1   Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2   Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3   Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
PO4   Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5   Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO6   The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7   Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
PO8   Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9   Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10  Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11  Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12  Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(DATA SCIENCE)

BIG DATA ANALYTICS LAB

COURSE OBJECTIVES:

1. The purpose of this course is to provide students with knowledge of Big Data Analytics principles and techniques.

2. This course is also designed to give an exposure to the frontiers of Big Data Analytics.

COURSE OUTCOMES:

CO1: Use Excel as an analytical and visualization tool.
CO2: Ability to program using Hadoop and MapReduce.
CO3: Ability to perform data analytics using ML in R.
CO4: Use Cassandra to perform social media analytics.
CO5: Ability to program an R project for data visualization of social media data.
LIST OF EXPERIMENTS

S.No  Name of the Experiment
1  Implement a simple map-reduce job that builds an inverted index on the set of input documents (Hadoop)
2  Process big data in HBase
3  Store and retrieve data in Pig
4  Perform social media analysis using Cassandra
5  Buyer event analytics using Cassandra on suitable product sales data visualization
6  Using Power Pivot (Excel), perform the following on any dataset:
   a) Big Data Analytics
   b) Big Data Charting
7  Use R-project to carry out statistical analysis of big data
8  Use R-project for data visualization of social media data


EXP NO: 1 Install Apache Hadoop

AIM: To install Apache Hadoop.

Hadoop software can be installed in three modes: stand-alone (local) mode, pseudo-distributed mode, and fully distributed mode.

Hadoop is a Java-based programming framework that supports the processing and storage of extremely large datasets on a cluster of inexpensive machines. It was the first major open-source project in the big data field and is sponsored by the Apache Software Foundation.

Hadoop is composed of four main layers:

 Hadoop Common is the collection of utilities and libraries that support other Hadoop
modules.
 HDFS, which stands for Hadoop Distributed File System, is responsible for persisting
data to disk.
 YARN, short for Yet Another Resource Negotiator, is the resource-management layer, often described as the "operating system" of a Hadoop cluster.
 MapReduce is the original processing model for Hadoop clusters. It distributes work across the cluster in the map phase, then organizes and reduces the results from the nodes into a response to a query. Many other processing models are available for the 2.x version of Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone mode which
is suitable for learning about Hadoop, performing simple operations, and debugging.

Procedure:

We'll install Hadoop in stand-alone mode and run one of the example MapReduce programs it includes to verify the installation.

Prerequisites:

Step 1: Install Java 8.

Running java -version should report the installed JDK, for example:

openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

This output verifies that OpenJDK has been successfully installed.
Note: Set the JAVA_HOME environment variable to the JDK installation path.
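If Java is not yet installed, a minimal sketch for a Debian/Ubuntu system follows (the package name and JDK path are assumptions for that platform; on Windows, install a JDK and set JAVA_HOME in the system environment variables instead):

# Install OpenJDK 8 (package name is the Ubuntu one).
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk

# Verify the installation.
java -version

# Point JAVA_HOME at the JDK and persist it for future shells.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> ~/.bashrc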

Step 2: Install Hadoop.

With Java in place, visit the Apache Hadoop Releases page to find the most recent stable release, and follow the link to the binary for the current release.

Download Hadoop from www.hadoop.apache.org
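For a Unix-like stand-alone installation, the download and unpacking step can be sketched as follows (the 2.2.0 version, mirror URL, and install path are assumptions; substitute the current stable release from the releases page):

# Download and unpack a Hadoop release (version/URL are assumptions).
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
tar -xzf hadoop-2.2.0.tar.gz
sudo mv hadoop-2.2.0 /usr/local/hadoop

# Put the Hadoop commands on the PATH for this shell.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Stand-alone mode needs no daemons; this should print the version banner.
hadoop version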


Procedure to Run Hadoop

1. Install Apache Hadoop 2.2.0 in Microsoft Windows OS

If Apache Hadoop 2.2.0 is not already installed, follow the post "Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS".

2. Start HDFS (Namenode and Datanode) and YARN (Resource Manager and Node
Manager)

Run the following commands:


Command Prompt:

C:\Users\abhijitg>cd c:\hadoop
c:\hadoop>sbin\start-dfs
c:\hadoop>sbin\start-yarn
starting yarn daemons

The Namenode, Datanode, Resource Manager and Node Manager will start in a few minutes, ready to execute Hadoop MapReduce jobs on the single-node (pseudo-distributed mode) cluster.
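You can confirm that the daemons are up with the JDK's jps tool, which lists running Java processes (the process IDs below are illustrative; yours will differ):

C:\hadoop>jps
4536 NameNode
5024 DataNode
5812 ResourceManager
6340 NodeManager
6916 Jps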
Resource Manager & Node Manager:

3. Run the wordcount MapReduce job

Now we'll run the wordcount MapReduce job available in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar

Create a text file with some content; we'll pass this file as input to the wordcount MapReduce job for counting words.

C:\file1.txt:
Install Hadoop
Run Hadoop Wordcount Mapreduce Example

Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt') to be used for counting words.

C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input

Copy the text file (say 'file1.txt') from the local disk to the newly created 'input' directory in HDFS.

C:\hadoop>bin\hdfs dfs -copyFromLocal c:/file1.txt input


4. Check the content of the copied file.

C:\hadoop>hdfs dfs -ls input
Found 1 items
-rw-r--r--   1 ABHIJITG supergroup   55 2014-02-03 13:19 input/file1.txt

C:\hadoop>bin\hdfs dfs -cat input/file1.txt
Install Hadoop
Run Hadoop Wordcount Mapreduce Example

Run the wordcount MapReduce job provided in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar

C:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output
14/02/03 13:22:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:03 INFO input.FileInputFormat: Total input paths to process : 1
14/02/03 13:22:03 INFO mapreduce.JobSubmitter: number of splits:1
:
:
14/02/03 13:22:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1391412385921_0002
14/02/03 13:22:04 INFO impl.YarnClientImpl: Submitted application application_1391412385921_0002 to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:04 INFO mapreduce.Job: The url to track the job: http://ABHIJITG:8088/proxy/application_1391412385921_0002/
14/02/03 13:22:04 INFO mapreduce.Job: Running job: job_1391412385921_0002
14/02/03 13:22:14 INFO mapreduce.Job: Job job_1391412385921_0002 running in uber mode : false
14/02/03 13:22:14 INFO mapreduce.Job: map 0% reduce 0%
14/02/03 13:22:22 INFO mapreduce.Job: map 100% reduce 0%
14/02/03 13:22:30 INFO mapreduce.Job: map 100% reduce 100%
14/02/03 13:22:30 INFO mapreduce.Job: Job job_1391412385921_0002 completed successfully
14/02/03 13:22:31 INFO mapreduce.Job: Counters: 43
File System Counters
  FILE: Number of bytes read=89
  FILE: Number of bytes written=160142
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=171
  HDFS: Number of bytes written=59
  HDFS: Number of read operations=6
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=2
Job Counters
  Launched map tasks=1
  Launched reduce tasks=1
  Data-local map tasks=1
  Total time spent by all maps in occupied slots (ms)=5657
  Total time spent by all reduces in occupied slots (ms)=6128
Map-Reduce Framework
  Map input records=2
  Map output records=7
  Map output bytes=82
  Map output materialized bytes=89
  Input split bytes=116
  Combine input records=7
  Combine output records=6
  Reduce input groups=6
  Reduce shuffle bytes=89
  Reduce input records=6
  Reduce output records=6
  Spilled Records=12
  Shuffled Maps =1
  Failed Shuffles=0
  Merged Map outputs=1
  GC time elapsed (ms)=145
  CPU time spent (ms)=1418
  Physical memory (bytes) snapshot=368246784
  Virtual memory (bytes) snapshot=513716224
  Total committed heap usage (bytes)=307757056
Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
File Input Format Counters
  Bytes Read=55
File Output Format Counters
  Bytes Written=59
http://abhijitg:8088/cluster

Result: We've installed Hadoop in stand-alone mode and verified it by running an example
program it provided.
EXP NO: 2 Implement a simple MapReduce job that builds an inverted index on the set of input documents (Hadoop)

AIM: To develop a MapReduce program to calculate the frequency of a given word in a given file; a sketch of the inverted-index variant appears at the end of this experiment.

Map Function – It takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key-value pairs).

Input (set of data):
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN, BUS, buS, caR, CAR, car, BUS, TRAIN

Output (converted into another set of data, as (key, value) pairs):
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1), (TRAIN,1), (BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)
Reduce Function – Takes the output from Map as input and combines those data tuples into a smaller set of tuples.

Example – (Reduce function in Word Count)

Input (set of tuples, the output of the Map function):
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1), (TRAIN,1), (BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)

Output (a smaller set of tuples; the mapper upper-cases and trims each word, so the case variants collapse into one key):
(BUS,7), (CAR,7), (TRAIN,4)
Work Flow of Program

The MapReduce workflow consists of 5 steps:
1. Splitting – the splitting parameter can be anything, e.g. splitting by space, comma, semicolon, or even by a new line ('\n').
2. Mapping – as explained above.
3. Intermediate splitting (shuffle) – the entire process runs in parallel on different nodes; to group records in the reduce phase, all data with the same key must end up on the same node.
4. Reduce – essentially a group-by-key aggregation phase.
5. Combining – the last phase, where all the partial results from each node are combined to form the final result.

Now let's see the Word Count program in Java.

Make sure that Hadoop is installed on your system with the Java JDK.

Steps to follow:

Step 1. Open Eclipse > File > New > Java Project > (Name it – MRProgramsDemo) > Finish.
Step 2. Right Click > New > Package (Name it – PackageDemo) > Finish.
Step 3. Right Click on Package > New > Class (Name it – WordCount).
Step 4. Add the following reference libraries (Right Click on Project > Build Path > Add External Archives):
 /usr/lib/hadoop-0.20/hadoop-core.jar
 /usr/lib/hadoop-0.20/lib/Commons-cli-1.2.jar

Step 5. Type the following program:

package PackageDemo;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);

        // Configure the job: jar, mapper, reducer, and output key/value types.
        Job j = new Job(c, "wordcount");
        j.setJarByClass(WordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    // Mapper: split each comma-separated line and emit (WORD, 1) per word.
    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(",");
            for (String word : words) {
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        }
    }

    // Reducer: sum the counts for each word and emit (WORD, total).
    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text word, Iterable<IntWritable> values, Context con)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}

Make Jar File

Right Click on Project > Export > Select export destination as Jar File > Next > Finish.

To move the input file into HDFS, open the terminal and enter the following command:

[training@localhost ~]$ hadoop fs -put wordcountFile wordCountFile

Run Jar file

(hadoop jar jarfilename.jar packageName.ClassName PathToInputTextFile PathToOutputDirectory)

[training@localhost ~]$ hadoop jar MRProgramsDemo.jar PackageDemo.WordCount wordCountFile MRDir1

Result: Open the output

[training@localhost ~]$ hadoop fs -ls MRDir1
Found 3 items
-rw-r--r--   1 training supergroup    0 2016-02-23 03:36 /user/training/MRDir1/_SUCCESS
drwxr-xr-x   - training supergroup    0 2016-02-23 03:36 /user/training/MRDir1/_logs
-rw-r--r--   1 training supergroup   20 2016-02-23 03:36 /user/training/MRDir1/part-r-00000

[training@localhost ~]$ hadoop fs -cat MRDir1/part-r-00000
BUS 7
CAR 4
TRAIN 6
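As noted in the aim, the experiment title asks for an inverted index rather than plain word counts. A minimal sketch of how the mapper and reducer change, using the same job wiring as WordCount above (class names are ours; set setOutputValueClass to Text.class and add imports for java.util.Set, java.util.TreeSet, and org.apache.hadoop.mapreduce.lib.input.FileSplit): the mapper emits (word, documentName) pairs and the reducer joins the distinct document names for each word.

// Mapper: emit (word, source document) for every word in the line.
public static class InvertedIndexMapper extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context con)
            throws IOException, InterruptedException {
        // The input file name serves as the document identifier.
        String docId = ((FileSplit) con.getInputSplit()).getPath().getName();
        for (String word : value.toString().split("\\s+")) {
            if (!word.isEmpty()) {
                con.write(new Text(word.toLowerCase()), new Text(docId));
            }
        }
    }
}

// Reducer: collect the distinct documents containing each word.
public static class InvertedIndexReducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text word, Iterable<Text> docIds, Context con)
            throws IOException, InterruptedException {
        Set<String> docs = new TreeSet<String>();
        for (Text id : docIds) {
            docs.add(id.toString());
        }
        con.write(word, new Text(String.join(",", docs)));
    }
}

Running this over a directory of documents produces one line per term, such as "hadoop  file1.txt,file2.txt", i.e. the posting list for that term.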
EXP NO: 3 Store and retrieve data in Pig

OBJECTIVE:

1. Installation of PIG.

STEPS FOR INSTALLING APACHE PIG

1) Extract pig-0.15.0.tar.gz and move it to the home directory.
2) Set the PIG environment variables in the .bashrc file.
3) Pig can run in two modes, local mode and Hadoop mode, started with "pig -x local" and "pig" respectively.
4) Grunt shell:
   grunt>
5) Load data into the Grunt shell:
   DATA = LOAD '<PATH>' USING PigStorage('<DELIMITER>') AS (attribute1:datatype1, attribute2:datatype2, ...);
6) Describe the data:
   DESCRIBE DATA;
7) Dump the data:
   DUMP DATA;

INPUT/OUTPUT:
Input as Website Click Count Data
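As a concrete illustration, assume a tab-delimited file clicks.txt with one URL and its click count per line (the file name, path, and schema here are assumptions):

-- Load the click data (path and schema are assumptions).
clicks = LOAD '/user/hadoop/clicks.txt' USING PigStorage('\t')
         AS (url:chararray, count:int);

-- Inspect the schema of the relation.
DESCRIBE clicks;

-- Print the tuples to the console.
DUMP clicks;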
EXP NO: 4 Install Apache Cassandra

AIM: To Install Apache Cassandra

Cassandra Setup and Installation:

Apache Cassandra and DataStax Enterprise are used by many organizations to store huge amounts of data.

Before installing Apache Cassandra, you must have the following: the DataStax Community Edition setup, an installed JDK, and a Windows platform.

Download and Install Cassandra

Run the DataStax Community Edition setup. After running the setup, the welcome page is displayed (the screenshots show the 64-bit version). Click the Next button to see the page specifying the installation location.

Press Next again and a page appears asking whether to start the DataStax DDC service automatically. Select the radio button and proceed.

Installation now begins. After it completes, go to the Program Files directory where DataStax is installed. Open DataStax-DDC and you will see Apache Cassandra; open Apache Cassandra and you will see the bin folder; open bin and you will see the Cassandra Windows batch file.

Run this file. It will start the Cassandra server.

Once the server is started, go to the Windows Start menu, search for "Cassandra CQL Shell", and run the Cassandra shell. After running it, you will see the CQL command line.
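Once the CQL shell is up, a minimal smoke test such as the following confirms the server is usable (the keyspace, table, and column names here are illustrative, not part of the installation itself):

-- Create a development keyspace on a single-node cluster.
CREATE KEYSPACE IF NOT EXISTS social_media
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

USE social_media;

-- A simple table of posts keyed by user.
CREATE TABLE IF NOT EXISTS posts (
  user_id text,
  post_id timeuuid,
  content text,
  PRIMARY KEY (user_id, post_id)
);

INSERT INTO posts (user_id, post_id, content)
VALUES ('user1', now(), 'hello cassandra');

SELECT * FROM posts WHERE user_id = 'user1';

The same keyspace can serve as a starting point for the social media analysis in the Cassandra experiments listed earlier.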
EXP NO: 5 Using Power Pivot (Excel), perform the following on any data set

A pivot table is essentially a data summarization tool that enables you to analyze data from various angles. By dragging and dropping fields, you can quickly aggregate, group, and visualize data without writing complex formulas or macros. This makes pivot tables an indispensable asset for data analysts, business professionals, and anyone dealing with large datasets.

How to Create a Pivot Table

Power Pivot is an Excel add-in you can use to perform powerful data analysis and create sophisticated data
models. With Power Pivot, you can mash up large volumes of data from various sources, perform information
analysis rapidly, and share insights easily.

In both Excel and in Power Pivot, you can create a Data Model, a collection of tables with relationships. The data
model you see in a workbook in Excel is the same data model you see in the Power Pivot window. Any data you
import into Excel is available in Power Pivot, and vice versa.

a) BIG DATA ANALYTICS:

Example: purchase analysis of items using a pivot table and charting for big data analytics.

EMPLOYEE     ITEM      PRICE   QUANTITY  TOTAL PRICE
MAHI         HDD        3000     20        3020
MAITHILI     LED        4000     50        4050
SRILATHA     LCD       18000     23       18023
SUSHMA       MOUSE       180     49         229
SHIVANI      KEYBOARD    320     14         334
MAITHILI     UPS        1420      8        1428
RANI         CABINET    2000     23        2023
RAVANILLA    HDD        3000     37        3037
BHAVYA       MOUSE       180     45         225
AKHILA       MOUSE       180     49         229
VARALAXMI    KEYBOARD    320     50         370
DHANALAXMI   HDD        3000     19        3019
POOJA        LCD       18000     23       18023
SUPERJA      LCD       18000     47       18047
AMMU         LED        4000     28        4028
ANVIKA       CABINET    2000     39        2039
JYOTHI       UPS        1420     36        1456
ARCHANA      UPS        1420     20        1440
SARITHA      MOUSE     18000     40       18040
MOUNIKA      LED        4000     30        4030
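As a sanity check before pivoting, an ordinary worksheet formula can reproduce any aggregated cell. Assuming the table above occupies cells A1:E21 with headers in row 1 (a layout assumption), the total for all MOUSE rows is:

=SUMIFS($E$2:$E$21, $B$2:$B$21, "MOUSE")

This should match the sum of the MOUSE entries in the pivot below (229 + 225 + 229 + 18040 = 18723).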

CREATING PIVOT TABLE:

The source table is placed in the pivot with EMPLOYEE as the report filter, QUANTITY and ITEM as row labels, PRICE as the column labels, and Sum of TOTAL PRICE as the value.

EMPLOYEE: (All)

Sum of TOTAL PRICE                 Column Labels (PRICE)
Row Labels        180     320    1420    2000    3000   18000   Grand Total
8                                 1428                                 1428
  UPS                             1428                                 1428
14                        334                                           334
  KEYBOARD                334                                           334
19                                                3019                 3019
  HDD                                             3019                 3019
20                                1440            3020                 4460
  HDD                                             3020                 3020
  UPS                             1440                                 1440
23                                        2023           36046        38069
  CABINET                                 2023                          2023
  LCD                                                    36046        36046
36                                1456                                 1456
  UPS                             1456                                 1456
37                                                3037                 3037
  HDD                                             3037                 3037
39                                        2039                         2039
  CABINET                                 2039                         2039
40                                                       18040        18040
  MOUSE                                                  18040        18040
45                225                                                   225
  MOUSE           225                                                   225
47                                                       18047        18047
  LCD                                                    18047        18047
49                458                                                   458
  MOUSE           458                                                   458
50                        370                                           370
  KEYBOARD                370                                           370
Grand Total       683     704    4324    4062    9076   72133        90982

CREATING A PIVOT CHART FOR ABOVE PURCHASE ITEMS:

[Pivot chart: a column chart of Sum of TOTAL PRICE for each quantity/item pair on the horizontal axis (quantities 8 to 50, items KEYBOARD, LCD, HDD, MOUSE, CABINET, UPS), with one series per PRICE value (180, 320, 1420, 2000, 3000, 18000); the vertical axis runs from 0 to 40000 in steps of 5000.]
EXP NO: 6 Installation - Store and retrieve data in Pig

OBJECTIVE:
1. Installation of PIG.

RESOURCES:
VMWare, Web browser, 4 GB RAM, Hard Disk 80 GB.

PROGRAM LOGIC:

STEPS FOR INSTALLING APACHE PIG

1) Extract pig-0.15.0.tar.gz and move it to the home directory.
2) Set the PIG environment variables in the .bashrc file.
3) Pig can run in two modes, local mode and Hadoop mode, started with "pig -x local" and "pig" respectively.
4) Grunt shell:
   grunt>
5) Load data into the Grunt shell:
   DATA = LOAD '<PATH>' USING PigStorage('<DELIMITER>') AS (attribute1:datatype1, attribute2:datatype2, ...);
6) Describe the data:
   DESCRIBE DATA;
7) Dump the data:
   DUMP DATA;

INPUT/OUTPUT: Input as Website Click Count Data
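The steps above cover loading and retrieval; to complete the "store" half of the experiment, results can be written back to HDFS with STORE (the relation name and output path below are assumptions):

-- Persist a relation back to HDFS as comma-delimited text.
STORE DATA INTO '/user/hadoop/click_output' USING PigStorage(',');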


EXP NO: 7 Use R-project to carry out statistical analysis of big data

AIM: To perform statistical analysis of a given data set of qualifications, where under graduate, post graduate, and graduate are coded as 0, 1, and 2 respectively.

# Qualification data: 0 = under graduate, 1 = post graduate, 2 = graduate
qual <- c(1,1,1,0,2,1,1,1,1,0,1,1,0,0,1,2,0)
qual                      # print the raw vector

# Bar plot of the raw codes, and of the frequency of each level
barplot(qual)
barplot(table(qual))

# Relative frequencies (proportions)
table(qual)/length(qual)

# 3D pie chart of the frequencies (requires the plotrix package)
library(plotrix)
pie3D(as.numeric(table(qual)), labels = names(table(qual)))
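A few extra one-liners round out the statistical analysis (these reuse the qual vector defined above; the category labels follow the coding given in the aim):

# Frequency and relative frequency of each qualification level
freq <- table(qual)
prop <- prop.table(freq)

# Descriptive statistics of the codes (meaningful here only as category codes)
mean(qual)
summary(qual)

# Bar plot of proportions with readable category labels
barplot(prop,
        names.arg = c("UG", "PG", "Graduate")[as.integer(names(freq)) + 1],
        ylab = "Proportion")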
