Big Data – Fall 24 – Section C
Hive Practice Assignment
A.
Using the Sales.csv file, create a Hive table named Net-ID_sales (e.g. asp13_sales).
CREATE TABLE rcm8445_sales (customer_id INT, transaction_id INT, product_category STRING,
product_name STRING, quantity INT, sales_amount INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
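The CREATE TABLE statement only defines the schema; the rows of Sales.csv still have to be loaded into the table. A minimal sketch, assuming the file sits at the hypothetical local path /tmp/Sales.csv and that its column order matches the table definition:

```sql
-- Hypothetical local path; replace it with the actual location of Sales.csv.
LOAD DATA LOCAL INPATH '/tmp/Sales.csv' INTO TABLE rcm8445_sales;

-- Quick check that the rows parsed into the expected columns.
SELECT * FROM rcm8445_sales LIMIT 5;
```

If the file starts with a header row, adding TBLPROPERTIES ('skip.header.line.count'='1') to the table definition keeps that row out of query results.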
List the columns of the sales table with the DESCRIBE command.
Describe rcm8445_sales;
Add a birthdate column with an appropriate data type.
ALTER TABLE rcm8445_sales ADD COLUMNS (birthdate DATE);
Create a test table, testsales, by selecting all records from the sales table.
CREATE TABLE testsales AS SELECT * FROM rcm8445_sales;
Insert 5 new records into the test table.
INSERT INTO testsales VALUES
(1, 1001, 'Electronics', 'Smartphone', 2, 1200, '1990-05-15'),
(2, 1002, 'Clothing', 'T-Shirt', 5, 150, '1988-03-22'),
(3, 1003, 'Groceries', 'Organic Apples', 10, 200, '1995-07-30'),
(4, 1004, 'Electronics', 'Laptop', 1, 1500, '1982-11-02'),
(5, 1005, 'Furniture', 'Office Chair', 3, 300, '1978-09-17');
Query all records from the test table.
SELECT * FROM testsales;
Write three queries with filters (WHERE clause) and show the results of the queries.
SELECT * FROM testsales WHERE sales_amount > 500;
SELECT * FROM testsales WHERE quantity >= 3;
SELECT * FROM testsales WHERE birthdate >= '1990-01-01' AND birthdate < '2000-01-01';
Show the list of tables.
Show tables;
Drop the test table.
DROP TABLE testsales;
Show the list of tables after dropping the test table.
show tables;
B.
Use the following code to create a Hive table, customers, named Net-ID_customers
(e.g. asp13_customers).
CREATE TABLE asp_customers (
customer_id INT,
customer_name STRING,
customer_email STRING,
customer_address STRING
);
CREATE TABLE rcm8445_customers (
customer_id INT,
customer_name STRING,
customer_email STRING,
customer_address STRING
);
INSERT INTO TABLE asp_customers
VALUES
(7001, 'John Doe', '[email protected]', '123 Main St'),
(7002, 'Alice Smith', '[email protected]', '456 Elm St'),
(7003, 'Bob Johnson', '[email protected]', '789 Oak St');
INSERT INTO TABLE rcm8445_customers VALUES
(1001, 'John Doe', '[email protected]', '123 Main St'),
(1002, 'Alice Smith', '[email protected]', '456 Elm St'),
(1003, 'Bob Johnson', '[email protected]', '789 Oak St');
Using the Sales and Customers tables, write queries with INNER JOIN, LEFT OUTER JOIN,
RIGHT OUTER JOIN, and FULL OUTER JOIN. Submit the SQL queries and screenshots of their
results.
SELECT s.customer_id, s.transaction_id, s.product_name, c.customer_name
FROM rcm8445_sales s
INNER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;
SELECT s.customer_id, s.transaction_id, s.product_name, c.customer_name
FROM rcm8445_sales s
LEFT OUTER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;
SELECT s.customer_id, s.transaction_id, s.product_name, c.customer_name
FROM rcm8445_sales s
RIGHT OUTER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;
SELECT s.customer_id, s.transaction_id, s.product_name, c.customer_name
FROM rcm8445_sales s
FULL OUTER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;
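In the RIGHT and FULL OUTER JOIN results, s.customer_id is NULL for any customer with no matching sale, because that column comes from the sales side of the join. One possible variant, offered as a sketch rather than a required part of the answer, takes the ID from whichever side is present:

```sql
-- COALESCE returns the first non-NULL argument, so customers without
-- sales still show their own ID instead of NULL.
SELECT COALESCE(s.customer_id, c.customer_id) AS customer_id,
s.transaction_id, s.product_name, c.customer_name
FROM rcm8445_sales s
FULL OUTER JOIN rcm8445_customers c ON s.customer_id = c.customer_id;
```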
C.
Using the Zipcodes.csv file, create a Hive table Net-ID_zipcodes (e.g. asp13_zipcodes).
This table should be partitioned by state and have 3 buckets by zipcode.
Provide screenshots of
i) the HDFS directory and the partition subdirectories, including the files under
partition state='AL'
ii) the results of the following commands
SHOW PARTITIONS asp13_zipcodes;
DESCRIBE FORMATTED asp13_zipcodes PARTITION(state='AL');
SHOW TABLE EXTENDED LIKE asp13_zipcodes PARTITION(state='AL');
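One way to approach part C is to load the CSV into a plain staging table and then insert from it into the partitioned, bucketed table. The following is a minimal sketch under several assumptions that the assignment text does not confirm: Zipcodes.csv has a header row and the columns RecordNumber, Country, City, Zipcode, State in that order; the file sits at the hypothetical local path /tmp/Zipcodes.csv; the staging table name rcm8445_zipcodes_stage is made up for illustration; and the table lives in the default database under the default warehouse directory /user/hive/warehouse.

```sql
-- Staging table matching the ASSUMED CSV layout (column names are assumptions).
CREATE TABLE rcm8445_zipcodes_stage (
recordnumber INT,
country STRING,
city STRING,
zipcode INT,
state STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');

-- Hypothetical path; point it at wherever Zipcodes.csv actually lives.
LOAD DATA LOCAL INPATH '/tmp/Zipcodes.csv' INTO TABLE rcm8445_zipcodes_stage;

-- Target table: one directory per state, 3 bucket files per partition.
-- The partition column is declared in PARTITIONED BY, not in the column list.
CREATE TABLE rcm8445_zipcodes (
recordnumber INT,
country STRING,
city STRING,
zipcode INT
)
PARTITIONED BY (state STRING)
CLUSTERED BY (zipcode) INTO 3 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Let Hive create partitions from the values found in the data.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- On Hive 1.x only, also run: SET hive.enforce.bucketing=true;
-- (Hive 2.x and later always enforce bucketing and no longer have this setting.)

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE rcm8445_zipcodes PARTITION (state)
SELECT recordnumber, country, city, zipcode, state
FROM rcm8445_zipcodes_stage;

-- List the partition directories, then the bucket files for state='AL'.
-- The warehouse path is the default and may differ on your cluster.
dfs -ls /user/hive/warehouse/rcm8445_zipcodes;
dfs -ls /user/hive/warehouse/rcm8445_zipcodes/state=AL;
```

The staging step is used because LOAD DATA only moves files into place and does not split rows into partitions or buckets; the INSERT ... SELECT is what distributes the rows. The three commands listed in ii) can then be run against rcm8445_zipcodes.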