Hive Assignment

The document outlines a Hive practice assignment for Fall 2024, detailing the creation and manipulation of Hive tables for sales and customers data. It includes SQL commands for creating tables, altering structures, inserting records, and performing various types of joins. Additionally, it specifies tasks related to partitioning a table based on zip codes and requires screenshots of directory structures and command results.

Uploaded by

beezosjeffery

Big Data – Fall 24 – Section C

Hive Practice Assignment

A.

Using the Sales.csv file, create a Hive table named Net-ID_sales (e.g. asp13_sales).

CREATE TABLE rcm8445_sales (
  customer_id INT,
  transaction_id INT,
  product_category STRING,
  product_name STRING,
  quantity INT,
  sales_amount INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
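
The statement above only defines the table; the rows from Sales.csv still have to be loaded into it. A minimal sketch, assuming the file sits at a hypothetical local path (/home/rcm8445/Sales.csv) and starts with one header row; neither detail is given in the assignment:

```sql
-- Assumption: Sales.csv has a single header line; skip it when reading.
ALTER TABLE rcm8445_sales SET TBLPROPERTIES ('skip.header.line.count'='1');

-- Assumption: hypothetical local path; adjust to where the file actually lives.
LOAD DATA LOCAL INPATH '/home/rcm8445/Sales.csv' INTO TABLE rcm8445_sales;

-- Quick check that the rows were parsed into the expected columns.
SELECT * FROM rcm8445_sales LIMIT 5;
```

If the file is already in HDFS, drop the LOCAL keyword; in that case Hive moves the file into the table directory rather than copying it.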

List the columns of the sales table with the DESCRIBE command.

DESCRIBE rcm8445_sales;

Add a birth date column with an appropriate data type.


ALTER TABLE rcm8445_sales ADD COLUMNS (birthdate DATE);

Create a test table, testsales by selecting all records from the sales table.

CREATE TABLE testsales AS SELECT * FROM rcm8445_sales;

Insert 5 new records into the test table.

INSERT INTO testsales VALUES
  (1, 1001, 'Electronics', 'Smartphone', 2, 1200, '1990-05-15'),
  (2, 1002, 'Clothing', 'T-Shirt', 5, 150, '1988-03-22'),
  (3, 1003, 'Groceries', 'Organic Apples', 10, 200, '1995-07-30'),
  (4, 1004, 'Electronics', 'Laptop', 1, 1500, '1982-11-02'),
  (5, 1005, 'Furniture', 'Office Chair', 3, 300, '1978-09-17');


Query all records from the test table.

SELECT * FROM testsales;

Write three queries with filters (WHERE clause) and show the results of the queries.

SELECT * FROM testsales WHERE sales_amount > 500;
SELECT * FROM testsales WHERE quantity >= 3;
SELECT * FROM testsales WHERE birthdate >= '1990-01-01' AND birthdate < '2000-01-01';

Show the list of tables.

SHOW TABLES;

Drop the test table.


DROP TABLE testsales;

Show the list of tables after dropping the test table.

SHOW TABLES;

B.

Use the following code to create a Hive table for customers named Net-ID_customers (e.g. asp13_customers).

CREATE TABLE asp_customers (
customer_id INT,
customer_name STRING,
customer_email STRING,
customer_address STRING
);

CREATE TABLE rcm8445_customers (
  customer_id INT,
  customer_name STRING,
  customer_email STRING,
  customer_address STRING
);

INSERT INTO TABLE customers
VALUES
(7001, 'John Doe', '[email protected]', '123 Main St'),
(7002, 'Alice Smith', '[email protected]', '456 Elm St'),
(7003, 'Bob Johnson', '[email protected]', '789 Oak St');

INSERT INTO TABLE rcm8445_customers VALUES
  (1001, 'John Doe', '[email protected]', '123 Main St'),
  (1002, 'Alice Smith', '[email protected]', '456 Elm St'),
  (1003, 'Bob Johnson', '[email protected]', '789 Oak St');

Using the Sales and Customers tables, write queries with INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. Submit the SQL queries and screenshots of their results.

SELECT s.customer_id, s.transaction_id, s.product_name, c.customer_name
FROM rcm8445_sales s
INNER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;

SELECT s.customer_id, s.transaction_id, s.product_name, c.customer_name
FROM rcm8445_sales s
LEFT OUTER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;

SELECT s.customer_id, s.transaction_id, s.product_name, c.customer_name
FROM rcm8445_sales s
RIGHT OUTER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;

SELECT s.customer_id, s.transaction_id, s.product_name, c.customer_name
FROM rcm8445_sales s
FULL OUTER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;
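
In the RIGHT and FULL OUTER JOIN results, s.customer_id is NULL for any customer that has no matching sales row, because that column comes from the sales side. One possible variant, offered as a sketch and not as part of the required answer, takes the id from whichever side has it:

```sql
-- COALESCE returns its first non-NULL argument, so the id is shown
-- whichever side of the join supplied the row.
SELECT COALESCE(s.customer_id, c.customer_id) AS customer_id,
       s.transaction_id,
       s.product_name,
       c.customer_name
FROM rcm8445_sales s
FULL OUTER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;
```

Rows that exist only in the customers table still show NULL for transaction_id and product_name, which is the behavior the outer joins are meant to demonstrate.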


C.

Using the Zipcodes.csv file, create a Hive table named Net-ID_zipcodes (e.g. asp13_zipcodes). This table should be partitioned by state and have 3 buckets by zipcode.

Provide screenshots of:
i) the HDFS directory and the subdirectories of the partitions, including the files under partition state='AL'

ii) the results of the following commands


SHOW PARTITIONS asp13_zipcodes;
DESCRIBE FORMATTED asp13_zipcodes PARTITION(state='AL');
SHOW TABLE EXTENDED LIKE asp13_zipcodes PARTITION(state='AL');
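
No answer for part C appears in the document. The following is a rough sketch of one common approach: load the CSV into a plain staging table, then copy it into the partitioned, bucketed table with a dynamic-partition insert. The column names and types, the header row, the staging table name, the local file path, and the warehouse path are all assumptions, since the assignment does not describe the layout of Zipcodes.csv.

```sql
-- Assumption: Zipcodes.csv has the columns below and one header line.
CREATE TABLE rcm8445_zipcodes_staging (
  record_number INT,
  country STRING,
  city STRING,
  zipcode INT,
  state STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');

-- Assumption: hypothetical local path.
LOAD DATA LOCAL INPATH '/home/rcm8445/Zipcodes.csv'
INTO TABLE rcm8445_zipcodes_staging;

-- The partition column is declared only in PARTITIONED BY, not in the column list.
CREATE TABLE rcm8445_zipcodes (
  record_number INT,
  country STRING,
  city STRING,
  zipcode INT
)
PARTITIONED BY (state STRING)
CLUSTERED BY (zipcode) INTO 3 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Let Hive create one partition per distinct state value.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- On Hive 1.x, bucketing may also need: SET hive.enforce.bucketing=true;

-- The dynamic partition column (state) must come last in the SELECT list.
INSERT OVERWRITE TABLE rcm8445_zipcodes PARTITION (state)
SELECT record_number, country, city, zipcode, state
FROM rcm8445_zipcodes_staging;

-- Assumption: default database and default warehouse directory;
-- check hive.metastore.warehouse.dir if these paths do not exist.
dfs -ls /user/hive/warehouse/rcm8445_zipcodes;
dfs -ls /user/hive/warehouse/rcm8445_zipcodes/state=AL;
```

The first dfs listing should show one state=... subdirectory per partition, and the second should normally show one file per bucket under state=AL. The three commands listed above can then be run against rcm8445_zipcodes in place of the example table name.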
