Distributing SQL Queries With Hadoop

Here is the SQL query to find the movie with the highest average rating using Hive:

SELECT movieID, AVG(rating) AS average_rating
FROM ratings
GROUP BY movieID
HAVING COUNT(*) > 10
ORDER BY average_rating DESC
LIMIT 1;

This query:
1. Groups the ratings data by movieID
2. Calculates the average rating for each movie using AVG(rating)
3. Only includes movies with more than 10 ratings using the HAVING clause
4. Orders the results by average rating in descending order
5. Limits the output to the top 1 movie

So this returns the single movieID with the highest average rating among movies with more than 10 ratings.

Uploaded by

vignesh

HIVE

Distributing SQL queries with Hadoop

What is Hive?

[Diagram: Hive sits on top of MapReduce and Tez, which run on Hadoop YARN]

Translates SQL queries to MapReduce or Tez jobs on your cluster!
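
As a rough sketch of this translation step, you can ask Hive to print the plan it would run instead of running it. This assumes a `ratings` table already exists and that Tez is installed on the cluster; `hive.execution.engine` is the session setting that selects the engine.

```sql
-- Pick the execution engine for this session (mr = MapReduce, tez = Tez).
-- Assumption: Tez is available on this cluster.
SET hive.execution.engine=tez;

-- Show the stages Hive would generate, without executing the query.
EXPLAIN
SELECT movieID, COUNT(movieID) AS ratingCount
FROM ratings
GROUP BY movieID;
```

The output lists the stages of the job Hive would submit, which is a quick way to see how a plain SQL statement turns into distributed work.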

Why Hive?

■ Uses familiar SQL syntax (HiveQL)
■ Interactive
■ Scalable – works with “big data” on a cluster
– Really most appropriate for data warehouse applications
■ Easy OLAP queries – WAY easier than writing MapReduce in Java
■ Highly optimized
■ Highly extensible
– User defined functions
– Thrift server
– JDBC / ODBC driver

Why not Hive?

■ High latency – not appropriate for OLTP

■ Stores data de-normalized
■ SQL is limited in what it can do
– Pig and Spark allow more complex processing
■ No transactions by default
■ No record-level updates, inserts, deletes by default (later Hive versions add these for ACID tables)

HiveQL

■ Pretty much MySQL with some extensions

■ For example: views
– Can save a query as a “view”, which subsequent queries can use as a table
■ Allows you to specify how structured data is stored and partitioned
Let’s just dive into an example.
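
A minimal sketch of a view, assuming a `ratings` table with `movieID` and `rating` columns already exists. The view name `topMovieIDs` is made up for illustration.

```sql
-- Save a query under a name; Hive stores the definition, not the rows.
CREATE VIEW IF NOT EXISTS topMovieIDs AS
SELECT movieID, COUNT(movieID) AS ratingCount
FROM ratings
GROUP BY movieID
ORDER BY ratingCount DESC;

-- Later queries can read the view as if it were a table.
SELECT movieID, ratingCount
FROM topMovieIDs
LIMIT 10;

-- Remove the view when it is no longer needed.
DROP VIEW IF EXISTS topMovieIDs;
```

Because the view is only a stored query, it is re-evaluated against the current contents of `ratings` each time it is used.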
HOW HIVE WORKS
Schema On Read

■ Hive maintains a “metastore” that imparts a structure you define on the unstructured data that is stored on HDFS etc.

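-- Define the structure to impose on the tab-delimited ratings file
CREATE TABLE ratings (
    userID INT,
    movieID INT,
    rating INT,
    time INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Copy the local file into the table, replacing any existing contents
LOAD DATA LOCAL INPATH '${env:HOME}/ml-100k/u.data'
OVERWRITE INTO TABLE ratings;
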
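One way to see schema-on-read in action, sketched under the assumption that the `ratings` table above has been created and loaded: the file is not validated at load time, so a field that cannot be parsed as its declared type simply reads back as NULL.

```sql
-- Show the structure the metastore holds for the table.
DESCRIBE ratings;

-- The schema is applied only now, as rows are read.
SELECT userID, movieID, rating
FROM ratings
LIMIT 5;

-- Rows whose rating field did not parse as INT show up as NULL.
SELECT COUNT(*) AS unparsedRatings
FROM ratings
WHERE rating IS NULL;
```
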
Where is the data?

■ LOAD DATA
– MOVES data from a distributed filesystem into Hive
■ LOAD DATA LOCAL
– COPIES data from your local filesystem into Hive
■ Managed vs. External tables

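-- External table: Hive records only the metadata and leaves the files in place
-- Note: Hive generally expects LOCATION to be a directory holding the data files
CREATE EXTERNAL TABLE IF NOT EXISTS ratings (
    userID INT,
    movieID INT,
    rating INT,
    time INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/ml-100k/u.data';
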
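A short sketch of how the managed/external distinction plays out, assuming a `ratings` table exists. The HDFS path in the `LOAD DATA` statement is a hypothetical example, not taken from the slides.

```sql
-- Check the table kind: the output includes a "Table Type" field
-- reading MANAGED_TABLE or EXTERNAL_TABLE.
DESCRIBE FORMATTED ratings;

-- Without LOCAL, LOAD DATA moves (does not copy) a file that is
-- already on the distributed filesystem. Hypothetical path.
LOAD DATA INPATH '/user/someuser/ml-100k/u.data'
INTO TABLE ratings;

-- Managed table: the metadata AND the data files are deleted.
-- External table: only the metadata is removed; the files stay put.
DROP TABLE IF EXISTS ratings;
```

External tables are the safer choice when other tools also read the same files, since dropping the table in Hive leaves the data untouched.
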
Partitioning

■ You can store your data in partitioned subdirectories

– Huge optimization if your queries are only on certain partitions

CREATE TABLE customers (

name STRING,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING);

…/customers/country=CA/
…/customers/country=GB/
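
A minimal sketch of writing to and reading from one partition of the `customers` table above. The staging table `customers_staging` (with `name`, `address` and `country` columns of matching types) is hypothetical and exists only for this illustration.

```sql
-- Write rows into a single partition; the partition column is given
-- in the PARTITION clause rather than in the SELECT list.
INSERT INTO TABLE customers PARTITION (country = 'CA')
SELECT name, address
FROM customers_staging
WHERE country = 'CA';

-- Filtering on the partition column lets Hive read only the
-- country=CA subdirectory. Struct fields are accessed with a dot.
SELECT name, address.city
FROM customers
WHERE country = 'CA';

-- List the partitions Hive currently knows about.
SHOW PARTITIONS customers;
```
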
Ways to use Hive

■ Interactive via hive> prompt / Command line interface (CLI)

■ Saved query files
– hive -f /somepath/queries.hql
■ Through Ambari / Hue
■ Through JDBC/ODBC server
■ Through Thrift service
– But remember, Hive is not suitable for OLTP
■ Via Oozie
HIVE CHALLENGE
Find the movie with the highest average rating
■ Hint: AVG() can be used on grouped data, just as COUNT() can.
■ Extra credit: only consider movies with more than 10 ratings
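
One possible way to approach the challenge, sketched with a view rather than a single statement. It assumes the `ratings` table defined earlier; the view name `avgRatings` and the column aliases are made up for illustration.

```sql
-- Per-movie average and number of ratings, saved as a view.
CREATE VIEW IF NOT EXISTS avgRatings AS
SELECT movieID,
       AVG(rating) AS avgRating,
       COUNT(movieID) AS ratingCount
FROM ratings
GROUP BY movieID;

-- Extra credit: keep only movies with more than 10 ratings,
-- then take the one with the highest average.
SELECT movieID, avgRating, ratingCount
FROM avgRatings
WHERE ratingCount > 10
ORDER BY avgRating DESC
LIMIT 1;

-- Clean up the helper view.
DROP VIEW IF EXISTS avgRatings;
```

Keeping the aggregation in a view makes it easy to try different thresholds without rewriting the GROUP BY each time.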
