Lab6E - Creating Hive Partition Table

The document provides a comprehensive guide on creating partitioned tables in Hive, covering both static and dynamic partitioning methods. It includes multiple scenarios with step-by-step instructions for creating partitioned tables based on one or two columns, as well as exercises for practical application. Additionally, it outlines the benefits of partitioning and the command structure for creating and managing partitioned tables.

Uploaded by

2024740897

L6E-Creating Hive Partition Table

Outline
• Partitioning concept
• Types of partitioning
• Scenario 1 - creating a partitioned table (static partitioning) based on one column and loading data from HDFS
• Scenario 2 - creating a partitioned table (static partitioning) based on one column where the data comes from an existing table
• Scenario 3 - creating a partitioned table (dynamic partitioning) based on one column where the data comes from an existing table
• Scenario 4 - creating a partitioned table (static partitioning) based on two columns where the data comes from an existing table
• Scenario 5 - creating a partitioned table (dynamic partitioning) based on two columns where the data comes from an existing table
• Exercise 1
• Exercise 2

Partitioning concept
Reference: https://data-flair.training/blogs/apache-hive-partitions/

Purpose:
• to divide a table into smaller parts (partitions) based on partition keys
• a partition key can be any suitable column, such as gender, date, city, or department

Benefit:
• partitioning is an optimization technique in Hive that can improve query performance significantly
• performance improves because a query that filters on a partition key reads only the matching partitions instead of scanning the entire table
The command structure:

CREATE TABLE table_name (column1 data_type, column2 data_type) PARTITIONED BY (partition1 data_type, partition2 data_type, ...);
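As a minimal sketch of the command structure above, the following uses a hypothetical table (employee_part, with made-up columns and a made-up city value) that is not part of the lab datasets. It is meant only to show that the partition column appears in PARTITIONED BY and not in the regular column list.

```sql
-- Hypothetical table for illustration only (not one of the lab tables).
-- The partition column (city) is declared only in PARTITIONED BY.
CREATE TABLE employee_part (
  emp_id INT,
  emp_name STRING)
PARTITIONED BY (city STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- A filter on the partition key lets Hive read only the matching partition.
-- 'Shah Alam' is an assumed example value.
SELECT emp_id, emp_name FROM employee_part WHERE city = 'Shah Alam';
```

Although city is not in the column list, it can still be selected and filtered like any other column.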

Types of partitioning
Reference: https://data-flair.training/blogs/apache-hive-partitions/

Static Partitioning
• Input data files are inserted individually into a partition of the table, and the partition value is stated in the command
• Static partitioning is usually preferred when loading (big) files into Hive tables
• It takes less time to load data than dynamic partitioning
• Partitions are named explicitly, so they can be added or changed with ALTER TABLE

Dynamic Partitioning
• A single insert populates all partitions of the table
• Dynamic partitioning usually loads the data from a non-partitioned table
• It takes more time to load data than static partitioning
• It is suitable when a large amount of data is already stored in a table
• It is also suitable when you do not know in advance how many partitions (distinct partition values) the data will produce
• Partitions are created by Hive from the data during the insert, so they are not named or altered in the insert statement itself
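The point about managing partitions explicitly can be sketched as follows. The table name employee_part and the value 'Shah Alam' are hypothetical; the table is assumed to exist already and to be partitioned by a string column named city.

```sql
-- Hypothetical table, assumed to be partitioned by (city STRING).
-- Register an empty partition explicitly.
ALTER TABLE employee_part ADD IF NOT EXISTS PARTITION (city = 'Shah Alam');

-- List the partitions to confirm that it was created.
SHOW PARTITIONS employee_part;

-- Remove the partition again.
ALTER TABLE employee_part DROP IF EXISTS PARTITION (city = 'Shah Alam');
```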

Scenario 1 concepts
• To create a partitioned internal table (static partitioning) where the partition key is based on one column
• Then, to use the load data command to load data from HDFS into the partitioned table

Steps
• transfer the dataset into HDFS
• construct and execute an HQL command to create an empty partitioned table
• load the dataset into the partitioned table using the respective partition key (i.e. rating)
• check the outcome

Dataset: ratings.csv

This dataset contains four columns:
• user id
• movie id
• rating (will be used as the partition key)
• unixtime

Transfer the dataset into HDFS
Recall this tutorial - Transferring file into HDFS

Note:
• Make sure you have successfully transferred this file into HDFS before proceeding to the next task.
• Make sure the file exists in the directory.
Create a table
Note: remember to select your database, e.g. (use student_saXX), before creating any table.

Run this command in Hive:

create table movie_rating_part (
userid int,
movieid int,
unixtime string)
partitioned by (rating int)
row format delimited
fields terminated by ',' ;

Load the data
Note:
• We need to specify the partition key value for static partitioning
• Assume we are interested in rating = 5 and rating = 4

1) Load data where rating = 5

load data inpath '/user/student30/movie_rating/ratings.csv' overwrite into table movie_rating_part partition(rating=5);

2) Load data where rating = 4
• you need to update the command accordingly

Note: load data inpath moves the file out of its HDFS source directory, so make sure ratings.csv exists in the directory again before executing command 2).
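A possible form of command 2) is sketched below. It reuses the path from command 1), which you may need to adjust to your own HDFS directory. Note that load data only moves the file into the partition directory; it does not filter rows by rating, so every row in the file is reported with the partition value given in the command.

```sql
-- Sketch of step 2): the same statement with the partition value changed to 4.
-- The path is the one used in step 1); adjust it to your own HDFS directory.
load data inpath '/user/student30/movie_rating/ratings.csv'
overwrite into table movie_rating_part
partition (rating=4);
```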

Check the outcome
To check the created partition, run this command:
• show partitions movie_rating_part;

To check the actual directory where the data is stored in Hive, run this command:
• show create table movie_rating_part;

The LOCATION clause in the output shows the HDFS directory of the table.

To check the data for a specific partition, run this command:
• select userid, rating from movie_rating_part where rating=5 limit 5;
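Two further checks may be useful here; they are a sketch, not part of the original lab steps. The first shows the metadata of a single partition. The second assumes that ratings.csv keeps the four-column order listed earlier (user id, movie id, rating, unixtime): since the table declares only three regular columns and Hive maps delimited fields by position, the third field of each line is read as unixtime.

```sql
-- Show the metadata of one partition, including its Location.
describe formatted movie_rating_part partition (rating=5);

-- Inspect all columns. The rating column comes from the partition value,
-- while unixtime holds whatever is in the third field of each line of the file.
select userid, movieid, unixtime, rating
from movie_rating_part
where rating=5
limit 5;
```

If the unixtime values look like ratings, that is a consequence of the positional mapping and not of a failed load.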

Scenario 2 concepts
• To create a partitioned table (static partitioning) from an existing table where the partition key is based on one column

Steps
• make sure your existing table exists and contains data
• construct and execute an HQL command to create an empty partitioned table
• insert data from the existing table into the partitioned table by using the partition key (i.e. rating)
• check the outcome

Existing table

Note:
• in this exercise, the existing table (movie_rating) is not a partitioned table
• recall L6A (Scenario 1) if you have not created this table yet

Create a table
Note: remember to select your database, e.g. (use student_saXX), before creating any table.

Run this command in Hive:

create table movie_rating_part2 (
userid int,
movieid int,
unixtime string)
partitioned by (rating int)
row format delimited
fields terminated by ',' ;

Insert data from an existing table
Run this command:

insert into table movie_rating_part2 partition(rating=5)
select userid, movieid, unixtime from movie_rating
where rating=5;

Check the outcome
To check the created partition, run this command:
• show partitions movie_rating_part2;

To check the actual directory where the data is stored in Hive, run this command:
• show create table movie_rating_part2;

The LOCATION clause in the output shows the HDFS directory of the table.

To check the data for a specific partition, run this command:
• select userid, rating from movie_rating_part2 where rating=5 limit 5;

Insert data into another partition
Run this command:

insert into table movie_rating_part2 partition(rating=4)
select userid, movieid, unixtime from movie_rating
where rating=4;

Then, check the output.
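One way to check the output, sketched here as a suggestion, is to count the rows in each partition and compare the totals with the same ratings in the source table. Both table names are the ones used in this scenario.

```sql
-- Rows stored in each partition of the partitioned table.
select rating, count(*) as total
from movie_rating_part2
group by rating;

-- Rows with the same ratings in the source table, for comparison.
select rating, count(*) as total
from movie_rating
where rating in (4, 5)
group by rating;
```

The two result sets should agree if each insert was run exactly once.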

Scenario 3
• To create a partitioned table (dynamic partitioning) from an existing table where the partition key is based on one column

Note:
• With dynamic partitioning, we cannot load the data directly from HDFS into the partitioned table.
• Instead, load the data into a table first (a staging table), then use that table to insert data into the new table using dynamic partitioning.

Steps
• make sure your existing table is already created and contains data
• construct and execute an HQL command to create an empty partitioned table
• set for dynamic partitioning
• insert data from the existing table into the partitioned table by using the partition key (i.e. rating)
• check the outcome

Existing table

Note:
• in this exercise, the existing table (movie_rating) is not a partitioned table
• recall L6A (Scenario 1) if you have not created this table yet

Create a table
Note: remember to select your database, e.g. (use student_saXX), before creating any table.

Run this command in Hive:

create table movie_rating_dynpart (
userid int,
movieid int,
unixtime string)
partitioned by (rating int)
row format delimited
fields terminated by ',' ;

Set for dynamic partitioning
Run these commands in Hive:
• set hive.exec.dynamic.partition=true;
• set hive.exec.dynamic.partition.mode=nonstrict;

Note:
• The first setting enables dynamic partitioning
• The second setting allows all partitions to be dynamic; otherwise, at least one partition has to be statically defined
• Without the second setting, the insert may fail with an error stating that strict mode requires at least one static partition column

Insert data from an existing table
Run this command:

insert into table movie_rating_dynpart partition(rating)
select userid, movieid, unixtime, rating from movie_rating;
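The insert above relies on column position: Hive takes the dynamic partition value from the last column of the select list, not from a column with a matching name. The sketch below repeats the same statement with the two settings and explanatory comments, using only the tables from this scenario.

```sql
-- Enable dynamic partitioning and allow every partition column to be dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The first three select columns fill userid, movieid and unixtime in table order.
-- The last select column (rating) supplies the partition value for each row.
insert into table movie_rating_dynpart partition (rating)
select userid, movieid, unixtime, rating
from movie_rating;
```

Placing rating anywhere other than last in the select list would send a different column's values to the partition key.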

Check the outcome
To check the created partitions, run this command:
• show partitions movie_rating_dynpart;

To check the actual directory where the data is stored in Hive, run this command:
• show create table movie_rating_dynpart;

The LOCATION clause in the output shows the HDFS directory of the table.

To check the total records in one partition, run this command:
• select count(*) as total from movie_rating_dynpart where rating=1;

Scenario 4
• To create a partitioned table (static partitioning) from an existing table where the partition key is based on two columns

Steps
• prepare the data source
• construct and execute an HQL command to create an empty partitioned table
• insert data from the existing table into the partitioned table by using the partition keys
• check the output

Prepare the data source (recall sqoop tutorial)
The dataset refers to the orders table.

• This table is available in the retail_db database, in MariaDB
• You will need to sqoop this table from MariaDB into the Hive Metastore (if you have not done it yet)
• Recall this tutorial to guide you - L5C
• The partition keys to be used for this exercise are:
o order date
o order status

Create the partitioned table
Note: remember to select your database, e.g. (use student30), before creating any table.

Run this command:

create table orders_part (
order_id int,
order_customer_id int)
partitioned by (order_date string, order_status string)
row format delimited
fields terminated by ',' ;

Insert data from an existing table
Note:
• Assume we want to store data in the partition where order_date='2014-07-24' and order_status='COMPLETE'

Run this command:

insert overwrite table orders_part partition (order_date='2014-07-24', order_status='COMPLETE')
select order_id, order_customer_id from orders where
order_date = '2014-07-24 00:00:00.0' and order_status='COMPLETE';

Check the outcome
To check the created partition, run this command:
• show partitions orders_part;

To check the actual directory where the data is stored in Hive, run this command:
• show create table orders_part;

The LOCATION clause in the output shows the HDFS directory of the table.

To check the data, run this command:
• select count(*) as total from orders_part;

Exploration / exercise
There are other categories for order status.
• Insert another partition where order_date='2014-07-24' and order_status='PENDING'
• Tip: you need to use insert into, and don't forget to change the order_status in your command
• Then check the partitions created, the total count of records, and the number of newly added records
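The following queries are one possible way to explore the status categories and to check the result of the exercise. They use only the orders and orders_part tables and the date literal from the insert command in this scenario.

```sql
-- List the order status categories that occur on the chosen date.
select distinct order_status
from orders
where order_date = '2014-07-24 00:00:00.0';

-- Count the records stored in each partition of the partitioned table.
select order_date, order_status, count(*) as total
from orders_part
group by order_date, order_status;
```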

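Scenario 5
• To create a partitioned table (dynamic partitioning) from an existing table where the partition key is based on two columns

Steps
• prepare the data source
• construct and execute an HQL command to create an empty partitioned table
• set for dynamic partitioning
• insert data from the existing table into the partitioned table by using the partition keys
• check the outcome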

Prepare the data source (recall sqoop tutorial)
The dataset refers to the customers table.

• This table is available in the retail_db database, in MariaDB
• You will need to sqoop this table from MariaDB into the Hive Metastore
• Recall this tutorial - L5C
• The partition keys are:
o customer state
o customer city

Create the partitioned table
Note: remember to select your database, e.g. (use student_saXX), before creating any table.

Run this command in Hive:

create table customers_dynpart (
cust_id int,
cust_fname string,
cust_lname string,
cust_email string,
cust_zipcode string)
partitioned by (cust_state string, cust_city string)
row format delimited
fields terminated by ',' ;

Set for dynamic partitioning
Run these commands in Hive:
• set hive.exec.dynamic.partition=true;
• set hive.exec.dynamic.partition.mode=nonstrict;
• set hive.exec.max.dynamic.partitions.pernode=600;

Note:
• The first setting enables dynamic partitioning
• The second setting allows all partitions to be dynamic; otherwise, at least one partition has to be statically defined
• The third setting raises the maximum number of dynamic partitions that each mapper or reducer node may create (close to 600 partitions are expected here)
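To illustrate what "at least one partition has to be statically defined" means, here is a sketch of a mixed insert that is accepted in strict mode. The value 'CA' is an assumed example for customer_state; the table and column names are those used in this scenario. The static partition column (cust_state) is listed before the dynamic one (cust_city).

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;

-- cust_state is fixed in the partition clause (static);
-- cust_city is taken from the last select column (dynamic).
-- 'CA' is an assumed example value.
insert overwrite table customers_dynpart partition (cust_state='CA', cust_city)
select customer_id, customer_fname, customer_lname, customer_email, customer_zipcode,
       customer_city
from customers
where customer_state = 'CA';

-- Switch back so that a fully dynamic insert is allowed again.
set hive.exec.dynamic.partition.mode=nonstrict;
```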

Insert data from an existing table
Run this command:

insert overwrite table customers_dynpart partition (cust_state, cust_city)
select customer_id, customer_fname, customer_lname, customer_email, customer_zipcode, customer_state as cust_state, customer_city as cust_city from customers;

This process will take some time. You can monitor the progress via:
• YARN application monitor - http://10.5.19.231:8088/cluster/apps
• YARN job monitor - http://10.5.19.231:19888/jobhistory/app

You can also find out the number of mappers executed.

Check the outcome
To check the created partitions, run this command:
• show partitions customers_dynpart;

Also, notice the number of partitions created.

To check the actual directory where the data is stored in Hive, run this command:
• show create table customers_dynpart;

The LOCATION clause in the output shows the HDFS directory of the table.

To check the total number of records, run this command:
• select count(*) from customers_dynpart;
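The number of partitions can also be obtained with a query rather than by counting the lines of show partitions. The sketch below counts the distinct (state, city) pairs in the partitioned table and in the source table; the two numbers should match if the dynamic insert completed.

```sql
-- Number of (state, city) combinations, i.e. partitions, in the partitioned table.
select count(*) as total_partitions
from (select distinct cust_state, cust_city from customers_dynpart) p;

-- The same count taken from the source table, for comparison.
select count(*) as total_pairs
from (select distinct customer_state, customer_city from customers) s;
```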

Exercise 1
• Load this dataset into HDFS
• Create an external table to hold this dataset
• Create a partitioned table (dynamic partitioning) where the data comes from the previously created external table

Dataset: student_record.csv
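As a starting point for the second step, an external table over a comma-delimited file could look like the sketch below. The table name, the column names and the HDFS directory are all hypothetical, because the layout of student_record.csv is not given here; replace them with the actual ones.

```sql
-- Hypothetical names throughout: adapt the columns to student_record.csv
-- and the location to the HDFS directory that holds the file.
create external table student_record_ext (
  student_id int,
  student_name string,
  program string)
row format delimited
fields terminated by ','
location '/user/yourusername/student_record';
```

The partitioned table can then be created and filled in the same way as in Scenario 3.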

Exercise 2
• Load this dataset into HDFS
• Create an external table to hold this dataset, which contains Id, Url, Date, PubId, AdvertiserId
• Notice that AdvertiserId can be split into sub-fields
• Thus, create a new external table to store the processed dataset, which contains Id, Date, PubId, AdvertiserId, Keyword, Country
• Create a partitioned table (dynamic partitioning) based on Country where the data comes from the previously created external table
• The partitioned table should contain Id, Date, PubId, AdvertiserId, Keyword

Dataset: advertisement.txt

Sample output: sample of created partitions (figure not reproduced)
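For the step that splits a field into sub-fields, Hive's built-in split function may help. The sketch below applies it to an invented literal with an assumed underscore delimiter; the real format of AdvertiserId in advertisement.txt is not shown here and may differ.

```sql
-- 'adv01_laptop_MY' and the '_' delimiter are invented for illustration only.
-- split returns an array; elements are accessed by zero-based index.
select split('adv01_laptop_MY', '_')[0] as advertiserid,
       split('adv01_laptop_MY', '_')[1] as keyword,
       split('adv01_laptop_MY', '_')[2] as country;
```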

Accessing HUE
• to access HUE, go to https://bigdatalab-rm-en1.uitm.edu.my:8889/hue/accounts/login?next=/
• then log in using the given account

Accessing Hive
• to access Hive, execute the following command:
o beeline -u jdbc:hive2://bigdatalab-cdh-mn1.uitm.edu.my:10000 -n yourusername -p yourpassword
• then type in:
o use yourdatabasename;
• then, you can browse the available tables by typing in:
o show tables;

Accessing MariaDB
Type in the following:
• mysql -ustudent -pp@ssw0rd retail_db
