Hive Commands

The document provides a comprehensive guide on using Apache Hive, including commands for starting Hive, creating databases and tables, inserting and selecting data, and managing partitions and buckets. It explains the differences between partitioning and bucketing, as well as data types in Hive such as arrays, maps, and structs. Additionally, it covers various SQL-like queries for data manipulation and retrieval in Hive.

Uploaded by

vaishnavi kumari

Steps to start Hive:

1. cd C:\hadoopsetup\hadoop-3.2.4\sbin (Hadoop sbin)
2. start-all.cmd
3. cd C:\hive\apache-hive-3.1.3-bin\apache-hive-3.1.3-bin\bin (Hive bin)
4. In a new cmd window: StartNetworkServer -h 0.0.0.0
(this is the Apache Derby network server script, which this setup presumably uses for the Hive metastore)
5. Back in the original cmd window: hive

Now start with Hive commands:

1. Create Database
CREATE DATABASE lpu;
What it does:
Creates a new database named lpu.
Purpose:
Databases are used to organize tables into separate logical groups.

2. Use Database
USE lpu;
What it does:
Switches the active database to lpu, so any new table you create will belong to it.

3. Create Table
CREATE TABLE students (id INT, name STRING);
What it does:
Creates a table students with two columns:
id → integer type
name → string type

4. Show Tables
SHOW TABLES;
What it does:
Lists all the tables available in the currently selected database.

5. Describe Table
DESCRIBE students;
What it does:
Displays the schema (columns and data types) of the students table.

6. Insert Data into Table

INSERT INTO students VALUES (1, "abc");
What it does:
Inserts one record into students table:
(id=1, name="abc").

7. Select Data
SELECT * FROM students;
What it does:
Fetches all rows and all columns from the students table.

8. Create Table with Custom Settings

CREATE TABLE customer(id INT, fname STRING, lname STRING, city STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
What it does:
Creates a customer table where:
Each row's fields are separated by commas.
Table is saved in plain text format.

9. Load Data into Table

LOAD DATA LOCAL INPATH 'C:/Users/ASUS/Desktop/HADOOPFILES/hive.txt'
INTO TABLE customer;
What it does:
Loads data from a local file hive.txt into the customer table.

10. Rename Table

ALTER TABLE customer RENAME TO employees;
What it does:
Changes the table name from customer to employees.

11. Add Column to Table

ALTER TABLE employees ADD COLUMNS (salary INT);
What it does:
Adds a new column salary (integer type) to the employees table.

12. Truncate Table
TRUNCATE TABLE employees;
What it does:
Removes all rows from the employees table but keeps the table structure.

13. Drop Table

DROP TABLE employees;
What it does:
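Deletes the employees table and removes all its data. This applies to a managed table like this one; for an external table, only the table definition is dropped and the files stay in place.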

Queries on student_data Table

Create student_data Table
CREATE TABLE student_data (
student_id INT,
student_name STRING,
department STRING,
marks INT,
advisor_id INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
What it does:
Creates a table for storing student records.

Insert Multiple Rows
INSERT INTO TABLE student_data VALUES
(1, 'Anya', 'CS', 88, 501),
(2, 'Brian', 'Math', 76, 502),
(3, 'Cara', 'CS', 92, 501),
(4, 'Daniel', 'Physics', 65, 503),
(5, 'Eva', 'Math', 81, NULL);
What it does:
Inserts multiple student records into the table at once.

Select CS Students with Marks > 90

SELECT * FROM student_data
WHERE department = 'CS' AND marks > 90;
Purpose:
Fetches CS students who scored more than 90 marks.

Students Not in Math Department

SELECT * FROM student_data
WHERE department != 'Math';
Purpose:
Fetches students whose department is NOT Math.

Students Whose Names Start with 'A'

SELECT * FROM student_data
WHERE student_name LIKE 'A%';
Purpose:
Fetches students whose names begin with the letter A.

Students in CS or Physics Department

SELECT * FROM student_data
WHERE department IN ('CS', 'Physics');
Purpose:
Fetches students enrolled either in CS or Physics departments.

Students with Marks Between 70 and 90

SELECT * FROM student_data
WHERE marks BETWEEN 70 AND 90;
Purpose:
Fetches students whose marks fall between 70 and 90, inclusive.
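
The five WHERE filters above can be checked by hand against the inserted rows. The following is a minimal Python sketch, not Hive itself: it copies the rows from the INSERT statement, uses None in place of NULL, and applies the equivalent condition for each query. The helper name names is made up for this illustration.

```python
# Rows copied from the INSERT statement above; None stands in for SQL NULL.
students = [
    (1, "Anya", "CS", 88, 501),
    (2, "Brian", "Math", 76, 502),
    (3, "Cara", "CS", 92, 501),
    (4, "Daniel", "Physics", 65, 503),
    (5, "Eva", "Math", 81, None),
]

def names(rows):
    """Return only the student_name column of the given rows."""
    return [row[1] for row in rows]

# WHERE department = 'CS' AND marks > 90
cs_top = [r for r in students if r[2] == "CS" and r[3] > 90]
# WHERE department != 'Math'
not_math = [r for r in students if r[2] != "Math"]
# WHERE student_name LIKE 'A%'
starts_a = [r for r in students if r[1].startswith("A")]
# WHERE department IN ('CS', 'Physics')
cs_or_physics = [r for r in students if r[2] in ("CS", "Physics")]
# WHERE marks BETWEEN 70 AND 90 (both ends are included)
mid_marks = [r for r in students if 70 <= r[3] <= 90]

print(names(cs_top))         # ['Cara']
print(names(not_math))       # ['Anya', 'Cara', 'Daniel']
print(names(starts_a))       # ['Anya']
print(names(cs_or_physics))  # ['Anya', 'Cara', 'Daniel']
print(names(mid_marks))      # ['Anya', 'Brian', 'Eva']
```

Running the same queries in Hive on this data should return the same students.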

Extra Useful Hive Commands (Added by me!)

Show All Databases
SHOW DATABASES;
Lists all databases available in Hive.

Drop Database
DROP DATABASE lpu;
Deletes the lpu database. This works only if the database is empty; if it still contains tables, the command fails unless you add CASCADE.

Drop Database with All Tables

DROP DATABASE lpu CASCADE;
Deletes the lpu database along with all its tables.

Create Table as Select (CTAS)

CREATE TABLE high_scorers AS
SELECT * FROM student_data WHERE marks > 85;
Creates a new table (high_scorers) with data from a SELECT query.

Count Rows in Table

SELECT COUNT(*) FROM student_data;
Returns the total number of rows in the student_data table.

What is Hive?

➔ Apache Hive is a data warehouse system built on top of Hadoop.

It is used for querying and managing big data stored in the Hadoop Distributed File System (HDFS) using an SQL-like language called HiveQL.

In simple words:

Hive = SQL for Hadoop.

What is HBase?

➔ Apache HBase is a NoSQL distributed database that runs on top of Hadoop HDFS.
It is designed to provide random real-time read/write access to big data.

In simple words:

HBase = NoSQL Database for Hadoop.

Partitioning in Hive:
hive> show tables;
(If the two-column students table from the earlier section still exists in this database, drop it first, otherwise the next statement fails.)
create table students(id INT, name STRING, branch STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

load data local inpath 'C:/Users/ASUS/Desktop/HADOOPFILES/hivepartitioning.txt' into table students;

create table part_stu_branch(id INT, name STRING)
partitioned by (branch STRING);

set hive.exec.dynamic.partition.mode = nonstrict;

By default, Hive is in strict mode for safety reasons: this avoids mistakenly creating too many partitions, which can slow down the system. In this case we want full dynamic partitioning, where Hive reads the branch values and creates the partitions automatically, so the mode is set to nonstrict.

insert overwrite table part_stu_branch partition(branch)
select id, name, branch from students;

3. Verifying Partitioning in HDFS
(Open another Command Prompt.)

Navigate to Hadoop's sbin directory:
cd C:\hadoopsetup\hadoop-3.2.4\sbin

Start Hadoop services (skip this if they are already running from the startup steps):
start-all.cmd
This starts HDFS (namenode, datanode) and YARN (resourcemanager, nodemanager).

Step 8:
hdfs dfs -ls /user/hive/warehouse/part_stu_branch
Lists the folders inside part_stu_branch.

You will see folders like:

/branch=CSE
/branch=ECE
/branch=MECH

These are partition folders.

Step 9:
hdfs dfs -ls "/user/hive/warehouse/part_stu_branch/branch=CSE"
Lists the files inside the partition folder for CSE branch.

Step 10:
hdfs dfs -cat "/user/hive/warehouse/part_stu_branch/branch=CSE/000000_0"
Displays the data for students in the CSE branch.

Static Partitioning:
In static partitioning, you must manually specify the partition column value during INSERT.
insert into table part_stu_branch partition(branch='CSE')
select id, name from students where branch='CSE';

You tell Hive exactly:

➔ "Put this data into the branch = CSE partition."

Dynamic Partitioning:
In dynamic partitioning, you don't specify partition values manually.
Hive automatically reads the partition column values from your SELECT statement; the partition column (branch) must be the last column selected.

insert overwrite table part_stu_branch partition(branch)
select id, name, branch from students;

Here, Hive looks at the branch column and creates partitions automatically, such as:
• branch = CSE
• branch = ECE
• branch = MECH, etc.

Mode              Meaning
strict (default)  Dynamic partitioning is restricted. You must specify at least one static partition value.
nonstrict         Dynamic partitioning is fully allowed. No need to specify any static partition values. Hive will create partitions dynamically for all data.
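
To picture what a fully dynamic insert does, here is a minimal Python sketch (an illustration outside Hive, not Hive's implementation). It groups rows by their last column and builds the branch=... directory names under the warehouse path used in these notes. The sample rows and the function name dynamic_partitions are hypothetical, since the contents of hivepartitioning.txt are not shown.

```python
from collections import defaultdict

# Hypothetical sample rows (id, name, branch); the real file is not shown.
rows = [
    (1, "Asha", "CSE"),
    (2, "Ravi", "ECE"),
    (3, "Meera", "CSE"),
    (4, "John", "MECH"),
]

def dynamic_partitions(rows, table_dir="/user/hive/warehouse/part_stu_branch"):
    """Group rows by their last column, the way PARTITION(branch) uses the
    last column of the SELECT, and return {partition_dir: [(id, name), ...]}."""
    partitions = defaultdict(list)
    for *data_cols, branch in rows:
        # The partition value becomes part of the directory name;
        # only the remaining columns go into the data file.
        partitions[f"{table_dir}/branch={branch}"].append(tuple(data_cols))
    return dict(partitions)

layout = dynamic_partitions(rows)
for path in sorted(layout):
    print(path, layout[path])
```

A static insert corresponds to filling just one of these directories, with the branch value fixed by you instead of read from the data.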

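Hive Bucketing:
SET hive.enforce.bucketing=true;
• Makes Hive respect bucketing rules during insert operations.
• Without this setting, older Hive versions might ignore buckets even if you define them. From Hive 2.0 onward bucketing is always enforced and this property no longer exists, so on the Hive 3.1.3 setup used here the statement should be unnecessary and may be rejected.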

create table st_bucket(id INT, name STRING, branch STRING)
clustered by (id) into 3 buckets
row format delimited
fields terminated by ',';

• Creates a table st_bucket.
• Data is bucketed into 3 files based on id (clustered by id).
• Bucketed tables can speed up queries such as joins and sampling by organizing the data.

insert overwrite table st_bucket select * from students;

• Inserts all the data from students into st_bucket and divides it into 3 buckets (files).

Verifying Bucketing in HDFS

(Open another CMD terminal.)
hdfs dfs -ls "/user/hive/warehouse/st_bucket"
• Lists all the bucket files created inside the st_bucket directory.
• You will find 3 files (buckets), named something like:
  ◦ 000000_0
  ◦ 000001_0
  ◦ 000002_0
hdfs dfs -cat "/user/hive/warehouse/st_bucket/000000_0"
• Displays the content of a bucket file.

What is Partitioning?
Partitioning means dividing the data into separate folders based on the value of a specific column.
Purpose of Partitioning:
• Faster Queries: When you query only CSE students, Hive goes directly to /branch=CSE/ instead of scanning the full table.
• Less I/O: Hive reads only the necessary partitions, not the entire data.
• Better Management: It is easier to maintain and delete specific partitions.
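
The "faster queries" and "less I/O" points can be sketched in a few lines of Python. This is only a toy model, assuming one directory per branch value: with a filter on the partition column only the matching directory is opened, while an unfiltered query opens all of them. The sample data and the function name scan are hypothetical.

```python
# Hypothetical partition contents: directory name -> rows stored in it.
partitions = {
    "branch=CSE": [(1, "Asha"), (3, "Meera")],
    "branch=ECE": [(2, "Ravi")],
    "branch=MECH": [(4, "John")],
}

def scan(partitions, branch=None):
    """Return (rows, directories_read). With a branch filter only the
    matching directory is opened; without one, every directory is read."""
    if branch is not None:
        wanted = [f"branch={branch}"]
    else:
        wanted = list(partitions)
    rows = [row for d in wanted for row in partitions.get(d, [])]
    return rows, len(wanted)

print(scan(partitions, branch="CSE"))  # 2 rows, 1 directory read
print(scan(partitions))                # 4 rows, 3 directories read
```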

What is Bucketing?
Bucketing means dividing data into a fixed number of files based on the hash of a column.
• Instead of organizing into folders, data is organized into N buckets (files).
• You specify how many buckets you want.
• Rows are assigned to a bucket based on the hash value of a column (e.g., id).
Purpose of Bucketing:
• Even Distribution: Distributes data more evenly, which is especially useful when data is skewed.
• Efficient Joins: If two tables are bucketed on the same column, join operations become much faster.
• Parallel Processing: MapReduce can process multiple buckets in parallel.
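
The "hash of a column, then pick a bucket" rule can be illustrated with a short Python sketch. It is a simplified stand-in, not Hive's actual code: it assumes the hash of an integer id is the id itself and assigns bucket = id mod 3. The hash Hive really applies depends on the Hive version and the column type, so the file a given id lands in may differ on a real cluster. The function name bucket_for and the sample ids are made up.

```python
NUM_BUCKETS = 3  # matches "into 3 buckets" in the st_bucket definition

def bucket_for(id_value, num_buckets=NUM_BUCKETS):
    """Simplified bucket assignment: hash(id) mod N, with the hash of an
    integer assumed to be the integer itself (an assumption, see above)."""
    return id_value % num_buckets

ids = [1, 2, 3, 4, 5, 6, 7]  # hypothetical id values
buckets = {b: [] for b in range(NUM_BUCKETS)}
for i in ids:
    buckets[bucket_for(i)].append(i)

print(buckets)  # {0: [3, 6], 1: [1, 4, 7], 2: [2, 5]}
```

The number of buckets stays fixed at 3 no matter how many distinct ids exist, which is the key contrast with partitioning.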

Partitioning vs Bucketing
Feature     Partitioning                                Bucketing
Divides by  Column value                                Hash of column value
Storage     Folders                                     Files inside a folder
Number      Depends on unique column values (dynamic)   Fixed number (you decide)
Best for    Filtering data (WHERE branch='CSE')         Efficient joins, sampling
Example     /branch=CSE/ folder                         000000_0, 000001_0 bucket files

Hive data types:

Array:
CREATE TABLE temperature(
sno INT,
place STRING,
temp ARRAY<DOUBLE>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';

LOAD DATA LOCAL INPATH 'D:/temperature.txt' INTO TABLE temperature;

SELECT temp[0] FROM temperature;
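
The two delimiters in the table definition decide how each line of the file is split. As a rough Python sketch (outside Hive): fields are separated by a tab and the array items by commas. The sample line and the function name parse_temperature_row are hypothetical, because the contents of D:/temperature.txt are not shown.

```python
def parse_temperature_row(line):
    # Fields are split on a tab (FIELDS TERMINATED BY) and the third
    # field is split on commas (COLLECTION ITEMS TERMINATED BY).
    sno, place, temps = line.rstrip("\n").split("\t")
    return int(sno), place, [float(t) for t in temps.split(",")]

# Hypothetical input line: sno <TAB> place <TAB> comma-separated readings.
row = parse_temperature_row("1\tjalandhar\t30.5,32.0,28.4\n")

print(row)        # (1, 'jalandhar', [30.5, 32.0, 28.4])
print(row[2][0])  # like SELECT temp[0]: 30.5
```

Array indexes start at 0, so temp[0] is the first reading in each row.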

Map:
CREATE TABLE country(
city STRING,
temp MAP<INT, INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';

LOAD DATA LOCAL INPATH 'D:/mapset.txt' INTO TABLE country;

SELECT * FROM country;

SELECT temp[2018] FROM country;

SELECT temp[2018] FROM country WHERE city='jalandhar';
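
For the map column there are three delimiters at work. A small Python sketch of the same parsing (an illustration only): tab between fields, comma between map entries, and colon between each key and its value. The sample lines and the function name parse_country_row are hypothetical, since D:/mapset.txt is not shown.

```python
def parse_country_row(line):
    """Parse 'city<TAB>key:value,key:value' into (city, {key: value})."""
    city, entries = line.rstrip("\n").split("\t")
    temp = {}
    for entry in entries.split(","):   # COLLECTION ITEMS TERMINATED BY ','
        key, value = entry.split(":")  # MAP KEYS TERMINATED BY ':'
        temp[int(key)] = int(value)
    return city, temp

# Hypothetical input lines: city <TAB> year:temperature pairs.
lines = ["jalandhar\t2018:42,2019:44", "delhi\t2018:45,2019:46"]
country = [parse_country_row(line) for line in lines]

# Like: SELECT temp[2018] FROM country WHERE city='jalandhar';
print([temp.get(2018) for city, temp in country if city == "jalandhar"])  # [42]
```

Looking up a key that is not in the map gives None here; Hive returns NULL in the same situation.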

Struct:
CREATE TABLE result(
name STRING,
city STRING,
marks STRUCT<subject:STRING, grade:FLOAT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';

LOAD DATA LOCAL INPATH 'D:/result.txt' INTO TABLE result;

Query struct elements:

SELECT * FROM result;
• Shows the entire table.

SELECT marks.grade FROM result;
• Fetches only grade from the struct.

SELECT marks.subject FROM result;
• Fetches only subject from the struct.
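
A struct is stored like an array in the file, but its items are read by name and in the order they were declared. A minimal Python sketch of this idea, using a NamedTuple as a stand-in for the struct: the sample line, the class name Marks and the function name parse_result_row are hypothetical, since D:/result.txt is not shown.

```python
from typing import NamedTuple

class Marks(NamedTuple):
    """Stand-in for STRUCT<subject:STRING, grade:FLOAT>."""
    subject: str
    grade: float

def parse_result_row(line):
    name, city, marks = line.rstrip("\n").split("\t")
    # Struct items are comma-separated and follow the declaration order.
    subject, grade = marks.split(",")
    return name, city, Marks(subject, float(grade))

# Hypothetical input line: name <TAB> city <TAB> subject,grade
row = parse_result_row("abc\tjalandhar\tmaths,8.5")

print(row[2].grade)    # like SELECT marks.grade: 8.5
print(row[2].subject)  # like SELECT marks.subject: maths
```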

To get the total sum of all transactions:

SELECT SUM(amount) AS total_amount_spent FROM transactions;

To get total amount spent per account:

SELECT account_number, SUM(amount) AS total_amount
FROM transactions
GROUP BY account_number;

CREATE TABLE transactions (
transaction_id INT,
account_number STRING,
amount DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Sample Input (CSV format):

1,ACC001,100.50
2,ACC002,250.00
3,ACC001,150.75
4,ACC003,300.25
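
To see what the two queries should return for this sample input, the same sums can be worked out in a few lines of Python (a cross-check outside Hive, using only the four rows above):

```python
import csv
import io
from collections import defaultdict

# The four sample rows from above.
sample = """1,ACC001,100.50
2,ACC002,250.00
3,ACC001,150.75
4,ACC003,300.25
"""

total = 0.0                       # SELECT SUM(amount)
per_account = defaultdict(float)  # ... GROUP BY account_number
for transaction_id, account_number, amount in csv.reader(io.StringIO(sample)):
    total += float(amount)
    per_account[account_number] += float(amount)

print(total)              # 801.5
print(dict(per_account))  # {'ACC001': 251.25, 'ACC002': 250.0, 'ACC003': 300.25}
```

So SUM(amount) should give 801.5 overall, and the grouped query should give one row per account with the totals shown.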
