Apex Institute of Technology
Department of Computer Science & Engineering
Bachelor of Engineering (Computer Science & Engineering)
INTRODUCTION TO BDA (21CST-246)
Prepared By: Dr Md Nadeem Ahmed (E13733)
(Assistant Professor)
PARTITION CONCEPTS
Partitioning divides a table into related parts based on the values of particular columns, such as date, city, or
department. With partitions, a query can read only the relevant portion of the data instead of the whole table.
Why is Partitioning Important?
Data volumes in the petabyte range are now routinely stored in HDFS, which makes it difficult for Hadoop users
to query the data directly.
Hive was introduced to ease this burden. Apache Hive converts SQL queries into MapReduce jobs and submits
them to the Hadoop cluster. When a query runs against an unpartitioned table, Hive reads the entire data set.
Running MapReduce jobs over a large table in this way is inefficient. Partitioning the table solves the problem.
Hive makes partitions easy to implement: the partition columns are declared when the table is created, and Hive
then manages the partition layout itself.
How to Create Partitions in Hive?
Using PARTITIONED BY Clause
Example:
CREATE TABLE table_name (column1 data_type, column2 data_type)
PARTITIONED BY (partition1 data_type, partition2 data_type, ...);
Hive Data Partitioning Example
Now let’s understand data partitioning in Hive with an example. Consider a table named Tab1 that contains
client details: id, name, dept, and yoj (year of joining). Suppose we need to retrieve the details of all the clients
who joined in 2012.
Without partitions, the query searches the whole table for the required information. If we partition the client data
by year and store each year in a separate file, the query processing time is reduced. The following example shows
how to partition a file and its data.
Suppose a file named file1 contains the client data:
tab1/clientdata/file1
id, name, dept, yoj
1, sunny, SC, 2009
2, animesh, HR, 2009
3, sumeer, SC, 2010
4, sarthak, TP, 2010
Now let us partition the above data into two files by year:
tab1/clientdata/2009/file2
1, sunny, SC, 2009
2, animesh, HR, 2009
tab1/clientdata/2010/file3
3, sumeer, SC, 2010
4, sarthak, TP, 2010
Now when we are retrieving the data from the table, only the data of the specified partition will be queried.
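The effect of reading only one partition can be illustrated with a small Python sketch. This is a toy model, not Hive itself: the function names (partition_by_year, query_partition, full_scan) are hypothetical, and a dictionary stands in for the per-year directories shown above.

```python
from collections import defaultdict

# Client rows from the example above: (id, name, dept, yoj)
ROWS = [
    (1, "sunny", "SC", 2009),
    (2, "animesh", "HR", 2009),
    (3, "sumeer", "SC", 2010),
    (4, "sarthak", "TP", 2010),
]


def partition_by_year(rows):
    """Group rows by year of joining, one bucket per partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[3]].append(row)
    return dict(partitions)


def full_scan(rows, year):
    """Unpartitioned query: every row is read, then filtered."""
    matches = [row for row in rows if row[3] == year]
    return matches, len(rows)


def query_partition(partitions, year):
    """Partitioned query: only the bucket for the requested year is read."""
    bucket = partitions.get(year, [])
    return list(bucket), len(bucket)


if __name__ == "__main__":
    parts = partition_by_year(ROWS)
    print(full_scan(ROWS, 2009))         # matching rows, rows read
    print(query_partition(parts, 2009))  # same rows, fewer rows read
```

Both functions return the same matching rows, but the partitioned lookup touches only the rows stored under the requested year.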
A partitioned table is created as follows:
CREATE TABLE table_tab1 (id INT, name STRING, dept STRING, yoj INT)
PARTITIONED BY (year STRING);
For example, each file is then loaded into its own partition:
LOAD DATA LOCAL INPATH 'filepath/file2' OVERWRITE INTO TABLE table_tab1
PARTITION (year='2009');
LOAD DATA LOCAL INPATH 'filepath/file3' OVERWRITE INTO TABLE table_tab1
PARTITION (year='2010');
Types of Hive Partitioning
• Static Partitioning
• Dynamic Partitioning
Static Partitioning
In static partitioning, we decide manually how many partitions the table will have and specify the value of each
partition when loading data.
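As a rough illustration of the static case, the following Python sketch models a load in which the caller names the target partition explicitly. It is a toy model under stated assumptions: load_static and its overwrite flag are hypothetical names, a dictionary stands in for the table, and the keys follow the column=value form that Hive uses for partition directories.

```python
# Rows as they appear in the two example files: (id, name, dept, yoj)
FILE2 = [(1, "sunny", "SC", 2009), (2, "animesh", "HR", 2009)]
FILE3 = [(3, "sumeer", "SC", 2010), (4, "sarthak", "TP", 2010)]


def load_static(table, rows, year, overwrite=True):
    """Load rows into the partition named by the caller.

    The rows themselves are not inspected: whatever the file contains
    ends up in the partition the caller specified.
    """
    key = f"year={year}"
    if overwrite or key not in table:
        table[key] = []  # start from an empty partition
    table[key].extend(rows)
    return table


if __name__ == "__main__":
    table = {}
    load_static(table, FILE2, "2009")
    load_static(table, FILE3, "2010")
    print(sorted(table))
```

Because the partition value comes from the caller and not from the data, loading a file into the wrong partition goes unnoticed in this model; choosing the right value is the user's responsibility.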
Dynamic Partitioning
Dynamic partitioning provides flexibility: Hive creates partitions automatically, depending on the data that we
insert into the table.
By default, Hive restricts dynamic partitioning: older releases disable it outright, and the default strict mode
requires at least one static partition column. This protects us from accidentally creating a huge number of
partitions. With dynamic partitioning, we tell Hive which column to use, and Hive derives the partition values
from the data.
The following settings allow us to create dynamic partitions in the table without any static partition:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
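The idea behind dynamic partitioning can be sketched in Python as follows. This is a toy model, not Hive's implementation: insert_dynamic and its max_partitions parameter are hypothetical, and the cap only illustrates the kind of guard against runaway partition counts mentioned above.

```python
# Client rows: (id, name, dept, yoj)
ROWS = [
    (1, "sunny", "SC", 2009),
    (2, "animesh", "HR", 2009),
    (3, "sumeer", "SC", 2010),
    (4, "sarthak", "TP", 2010),
]


def insert_dynamic(table, rows, max_partitions=100):
    """Route each row to a partition derived from its own yoj value.

    Partitions are created on demand; the insert is refused once it
    would create more than max_partitions partitions.
    """
    for row in rows:
        key = f"year={row[3]}"
        if key not in table and len(table) >= max_partitions:
            raise ValueError(f"too many partitions, refusing to create {key}")
        table.setdefault(key, []).append(row)
    return table


if __name__ == "__main__":
    print(sorted(insert_dynamic({}, ROWS)))
```

Here the caller never names a partition value: the set of partitions follows from the data, which is convenient but also why a limit is useful.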
Show All Partitions on Hive Table
SHOW PARTITIONS table_name;
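Since each partition of a Hive table is stored in its own directory named column=value, listing partitions amounts to listing those directories. The Python sketch below imitates this on a temporary local directory; show_partitions and demo are hypothetical helpers, only single-level partitions are handled, and no Hive or HDFS API is involved.

```python
import tempfile
from pathlib import Path


def show_partitions(table_dir):
    """Return the sorted names of partition directories (column=value)."""
    return sorted(
        entry.name
        for entry in Path(table_dir).iterdir()
        if entry.is_dir() and "=" in entry.name
    )


def demo():
    """Create two partition directories in a temp folder and list them."""
    with tempfile.TemporaryDirectory() as tmp:
        for year in ("2010", "2009"):
            (Path(tmp) / f"year={year}").mkdir()
        return show_partitions(tmp)


if __name__ == "__main__":
    for name in demo():
        print(name)
```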
Add New Partition to the Hive Table
A new partition can be added to the table using the ALTER TABLE statement. You can also specify the location
where the partition data should be stored on HDFS:
ALTER TABLE table_name ADD PARTITION (partitionColumn = 'value1') LOCATION 'loc1';
Example: ALTER TABLE zipcodes ADD PARTITION (state='CA') LOCATION
'/user/data/zipcodes_ca';
Rename or Update Hive Partition
Using ALTER TABLE, you can also rename or update a specific partition.
Example: ALTER TABLE zipcodes PARTITION (state='AL') RENAME TO
PARTITION (state='NY');
Drop Hive Partition
A partition can also be dropped using ALTER TABLE table_name DROP PARTITION.
Example: ALTER TABLE sales DROP IF EXISTS PARTITION (year = 2020, quarter = 2);
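A partition specification with several columns, such as (year = 2020, quarter = 2), corresponds to nested directories of the form year=2020/quarter=2. The Python sketch below models this mapping and a drop with IF EXISTS semantics. It is a toy model: partition_path and drop_partition are hypothetical names, and raising an error for a missing partition is a choice made for this sketch, since Hive's actual behavior without IF EXISTS depends on its configuration.

```python
def partition_path(spec):
    """Build the directory path for a partition spec, in column order."""
    return "/".join(f"{col}={val}" for col, val in spec.items())


def drop_partition(partitions, spec, if_exists=False):
    """Remove one partition from the table model.

    Returns True if a partition was dropped. A missing partition raises
    KeyError unless if_exists is set, in which case False is returned.
    """
    path = partition_path(spec)
    if path not in partitions:
        if if_exists:
            return False
        raise KeyError(path)
    del partitions[path]
    return True


if __name__ == "__main__":
    sales = {"year=2020/quarter=1": [], "year=2020/quarter=2": []}
    drop_partition(sales, {"year": 2020, "quarter": 2}, if_exists=True)
    print(sorted(sales))
```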