# SQOOP IMPORT EXERCISE
=======================
SESSION - 1
============
Sqoop Import - Databases to HDFS (the most frequently used)
Sqoop Export - HDFS to Databases
Sqoop Eval - to run queries on the database
sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera
sqoop-list-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera
sqoop-eval \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera \
--query "select * from retail_db.customers limit 10"
SESSION - 2
============
INSERT INTO people values (101,'Raj','Pali','Itwara chowk','Yavatmal');
Sqoop import
=============
(transfers data from your relational db to HDFS)
It runs as a MapReduce job in which only mappers work; there is no reducer.
By default there are 4 mappers which do the work, and yes, we can change the
number of mappers.
These mappers divide the work based on the primary key.
What happens if there is no primary key? There are two options:
1. you change the number of mappers to 1.
2. you use a split-by column.
sqoop-eval \
--connect "jdbc:mysql://10.0.2.15:3306" \
--username retail_dba \
--password cloudera \
--query "describe retail_db.orders"
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username root \
--password cloudera \
--table orders \
--target-dir /queryresult
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/trendytech" \
--username root \
--password cloudera \
--table people \
--target-dir peopleresult
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/trendytech" \
--username root \
--password cloudera \
--table people \ {the people table doesn't contain a primary key,
so we set the number of mappers to 1}
-m 1 \ {if you don't set the mappers to 1, the import will give
an error}
--target-dir peopleresult
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/trendytech" \
--username root \
--password cloudera \
--table people \
-m 1 \
--warehouse-dir peopleresult1
Now the output path will be peopleresult1/people
Target dir vs. Warehouse dir
=============================
Suppose you are importing an employee table from MySQL.
With a target directory, the directory path mentioned is
the final path where the data is copied:
/data
With a warehouse directory, the system creates a
subdirectory named after the table:
/data/employee
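For comparison, a minimal sketch of the two options (assuming a hypothetical
employee table in a database named testdb) and where the files land:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/testdb" \
--username root \
--password cloudera \
--table employee \
--target-dir /data {files land directly under /data as part-m-* files}
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/testdb" \
--username root \
--password cloudera \
--table employee \
--warehouse-dir /data {files land under /data/employee/part-m-*}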
sqoop-import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--as-sequencefile \
-m 4 \
--warehouse-dir /user/cloudera/sqoopdir
SESSION - 3
============
sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera
sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
-P {the password will not be shown on the console when you type it}
How to Redirect the logs for later use ?
----------------------------------------
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /queryresult4 1>query.out 2>query.err
1>query.out will mostly contain the output content (e.g. the result of an eval command),
and 2>query.err will contain all the other logs and errors. (You can use any file
names; the files are stored in the cwd from where the command is run.)
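For example, the same redirection with an eval command (a sketch; query.out and
query.err are just placeholder file names):
sqoop-eval \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username retail_dba \
--password cloudera \
--query "select count(*) from retail_db.orders" 1>query.out 2>query.err
cat query.out {the query result is here}
cat query.err {the connection messages, warnings and errors are here}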
Boundary query
===============
In a sqoop import, the work is divided among the mappers based on the
primary key.
Employee table
===============
empId, empname, age, salary (empId is the primary key)
0
1
2
3
4
5
6
.
.
100000
By default there will be 4 mappers.
How do the mappers distribute the work on the basis of the P.K.?
Sqoop finds the max and the min of the primary key, then:
split size = (max_of_pk - min_of_pk)/Num_Mappers
           = (100000 - 0)/4
           = 25000
mapper1 0 - 25000
mapper2 25001 - 50000
mapper3 50001 - 75000
mapper4 75001 - 100000
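Internally, Sqoop first issues a boundary-value query to find this min and max,
roughly like the sketch below for the Employee example (the exact SQL Sqoop logs
may differ slightly):
SELECT MIN(empId), MAX(empId) FROM Employee {the "BoundingValsQuery" printed in the import log}
Each mapper then pulls only its own slice, approximately:
SELECT empId, empname, age, salary FROM Employee WHERE empId >= 0 AND empId <= 25000 {mapper1}
SELECT empId, empname, age, salary FROM Employee WHERE empId > 25000 AND empId <= 50000 {mapper2}
... and similarly for mapper3 and mapper4.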
SESSION - 4
============
sqoop-import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--compress \
--warehouse-dir /user/cloudera/compressresult
sqoop-import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--compression-codec BZip2Codec \
--warehouse-dir /user/cloudera/bzipcompresult
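To confirm the compression actually happened, list the output files (a sketch;
--compress alone uses the default gzip codec):
hadoop fs -ls /user/cloudera/compressresult/orders {part files ending in .gz expected}
hadoop fs -ls /user/cloudera/bzipcompresult/orders {part files ending in .bz2 expected}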
sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--columns order_id,order_customer_id,order_status \
--where "order_status in ('complete','closed')" \ {the where clause is also applied
inside the boundary-value query}
--warehouse-dir /user/cloudera/customimportresult
sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--boundary-query "SELECT 1, 68883" \ {here we hardcode the min & max for the
boundary-value query, e.g. because of an outlier}
--warehouse-dir /user/cloudera/ordersboundval
SESSION - 5
============
sqoop-import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--table orders \
--columns order_id,order_customer_id,order_status \
--where "order_status in ('processing')" \ {Where clause internally add
to boundary query, no matter what}
--warehouse-dir /user/cloudera/whereclauseresult
sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table order_no_pk \ {this will fail because the table has no primary key, so the
mappers don't know how to divide the work among themselves}
--warehouse-dir /ordersnopk
sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table order_no_pk \
--split-by order_id \
--target-dir /ordersnopk
sqoop import-all-tables \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--warehouse-dir /user/cloudera/autoreset1mresult \
--autoreset-to-one-mapper \ {uses one mapper if a table with no P.K. is
encountered}
--num-mappers 2
{If you have 100 tables and 98 of them have a primary key while the remaining 2
do not, then the tables with a primary key will be imported with 2 mappers, and
for the tables without a primary key the mapper count automatically falls back to 1.}
SESSION - 6
============
sqoop create-hive-table \ {creates an empty table in Hive based on the metadata
in MySQL}
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \ {by default the Hive table gets the same name as the
source table, but we can change it}
--hive-table emps \ {the table will be named emps in Hive and will carry
the metadata (schema) of the orders table in MySQL}
--fields-terminated-by ','
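Once it runs, the empty table can be checked from Hive (a quick sketch):
hive -e "describe emps;" {should list the columns of the mysql orders table with mapped Hive types}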
# SQOOP EXPORT EXERCISE
=======================
SESSION - 1
============
SQOOP EXPORT
IS USED TO TRANSFER DATA FROM HDFS TO RDBMS.
CREATE TABLE card_transactions (
transaction_id INT,
card_id BIGINT,
member_id BIGINT,
amount INT,
postcode INT,
pos_id BIGINT,
transaction_dt varchar(255),
status varchar(255),
PRIMARY KEY(transaction_id)
);
WE HAVE CARD_TRANS.CSV ON THE DESKTOP LOCALLY IN CLOUDERA.
WE SHOULD BE MOVING THIS FILE FROM LOCAL TO HDFS
hadoop fs -mkdir /data
hadoop fs -put Desktop/card_trans.csv /data
sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--export-dir /data/card_trans.csv \
--fields-terminated-by ","
2 IMPORTANT THINGS:
1. Why did the job fail? {check your job tracking URL}
2. If a job fails, how do we make sure that the target table is not
impacted? {that means nothing should be transferred if the job fails,
i.e. there should be no partial load}
Caused by:
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException:
Duplicate entry '345925144288000-10-10-2017 18:02:40' for key 'PRIMARY'
>>Concept : a staging table comes into play to avoid partial transfer of data.
>>First, create a table with the same schema in the MySQL database, with "stage"
attached to its name:
CREATE TABLE card_transactions_stage (
card_id BIGINT,
member_id BIGINT,
amount INT(10),
postcode INT(10),
pos_id BIGINT,
transaction_dt varchar(255),
status varchar(255),
PRIMARY KEY (card_id, transaction_dt)
);
>>Now, run the export command with --staging-table <table name>:
sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--staging-table card_transactions_stage \
--export-dir /data/card_transactions.csv \
--fields-terminated-by ','
>>If only some records get transferred, the partial data is kept in the stage table
and is not transferred to the main table.
>>If the data is successfully transferred to the staging table, MySQL migrates it
from the stage table to the main table, and the stage table becomes empty because
its data has been migrated.
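>>If an earlier failed run has left rows behind in the staging table, Sqoop's
--clear-staging-table option can be added to empty it before the export starts
(a sketch based on the command above):
sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--staging-table card_transactions_stage \
--clear-staging-table \
--export-dir /data/card_transactions.csv \
--fields-terminated-by ','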
SESSION - 8
============
sqoop export \
--connect jdbc:mysql://quickstart.cloudera:3306/banking \
--username root \
--password cloudera \
--table card_transactions \
--staging-table card_transactions_stage \
--export-dir /user/cloudera/data/card_transactions_new.csv \
--fields-terminated-by ','
SESSION - 9
============
Incremental Import
Suppose the orders table in MySQL has 50000 records and order_id is the primary key.
You have already imported those 50000 records using sqoop import,
and 100 new orders are coming into the orders table tomorrow.
Rather than importing everything again, in such a case you should go with
incremental import.
2 choices
==========
1. append mode - used when there are no updates to existing data,
and there are just new inserts.
2. lastmodified mode - used when we need to capture the updates also;
in this case we use a date column on the basis of which we fetch the data.
sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /data \
--incremental append \
--check-column order_id \
--last-value 0 {meaning: import every record whose order_id is > 0}
insert into orders values(68884,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68885,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68886,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68887,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68888,'2014-07-23 00:00:00',5522,'COMPLETE');
insert into orders values(68889,'2014-07-23 00:00:00',5522,'COMPLETE');
>>commit
sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /data \
--incremental append \
--check-column order_id \
--last-value 68883 \
--append
SESSION - 10
=============
incremental import using append mode - only inserts, no updates.
incremental import using lastmodified mode - when there are updates
as well.
sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental lastmodified \
--check-column order_date \ {here we specify a timestamp/date column}
--last-value 0 \ {normally this should be a date, but for the first load we
want to consider everything}
--append
>> '2023-02-07 22:35:59' {the last value reported at the end of this run; next time
we replace 0 with this value, which is why we note it down}
insert into orders values(68890,current_timestamp,5523,'COMPLETE');
insert into orders values(68891,current_timestamp,5523,'COMPLETE');
insert into orders values(68892,current_timestamp,5523,'COMPLETE');
insert into orders values(68893,current_timestamp,5523,'COMPLETE');
insert into orders values(68894,current_timestamp,5523,'COMPLETE');
update orders set order_status='COMPLETE',order_date = current_timestamp WHERE
ORDER_ID = 68862;
commit;
sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental lastmodified \
--check-column order_date \
--last-value '2023-02-07 22:35:59' \ {just save this date for the
next import}
--append {once we have imported and we run an incremental import again
over the same output dir, then
we have to choose either append or merge-key, depending on the
requirement}
If a record is updated in your table and we then use incremental
import with lastmodified, we will also get the updated record.
5000 oldtimestamp in hdfs
5000 newtimestamp in hdfs {it means HDFS ends up with 2 records for the same key,
one with the old timestamp and one with the new timestamp,
because we are using the --append parameter}
But you want the HDFS file to always be in sync with the table.
{e.g. if you have 1000 records in your MySQL table, there should be 1000
records in HDFS}
{i.e. we only want the latest version of an updated record, not the old one}
{that means when primary key 5000 has 2 records, only the record with the
latest timestamp should be kept in HDFS, so there is no duplicate entry
in HDFS. For that we use --merge-key}
sqoop-import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental lastmodified \
--check-column order_date \
--last-value '2023-02-07 22:35:59' \
--merge-key order_id {if we use merge-key instead of append, it makes sure that
for each key (order_id) only one record is kept in HDFS,
the one with the latest timestamp}
{The above import first brings in the records that were newly added as well as the
existing records that were updated in the table. It then merges the duplicate
records on the basis of the --merge-key column, and after the merge it produces
only 1 file in the output dir, a part-r file, because merging is a reduce activity.}
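One way to verify the merge is to list the output directory (a sketch; exact file
names may vary):
hadoop fs -ls /user/cloudera/data/orders {after a merge run, expect a single part-r-*
file instead of the multiple part-m-* files left by plain --append runs}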
2 modes
========
1. append - handles only new inserts
--incremental append
--check-column order_id
--last-value 0 {any order_id greater than 0 should be imported}
2. lastmodified - when we have updates as well
--incremental lastmodified
--check-column order_date {it should be a date/timestamp column}
--last-value previousdate {all records entered or updated after this date
should be imported}
>>After the first incremental import you must pass either --append or --merge-key,
otherwise Sqoop will show an error that the output dir already exists:
--append {will create duplicates in HDFS: the old record plus its updated version}
--merge-key order_id {will merge the duplicates through a reduce activity on the
basis of the primary key; the newer record, based on the
timestamp, replaces the older one}
SESSION - 11
=============
incremental import
In this session we will talk about
1. sqoop job
2. password management.
sqoop job \
--create job_orders \ {job name should be unique}
-- import \ {note: there must be one space after the two hyphens, before "import"}
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental append \
--check-column order_id \
--last-value 0
sqoop job --list : This command will show us all the created sqoop jobs.
sqoop job --exec job_orders
sqoop job --show job_orders : To see all the parameters saved for the job.
sqoop job --delete job_orders : Deleting a sqoop job
echo -n "cloudera" >> .password.file , it's is created in local cloudera
sqoop job \
--create job_orders \
-- import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username root \
--password-file file:///home/cloudera/.password.file \
--table orders \
--warehouse-dir /user/cloudera/data \
--incremental append \
--check-column order_id \
--last-value 0
With the password stored in a file, we expect the above job to run fully
automatically, without prompting for a password.
We have successfully created the job.
sqoop job --exec job_orders
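After each execution, the stored last value is updated in the Sqoop metastore; one
way to confirm (a sketch, assuming the property name shown by --show):
sqoop job --show job_orders | grep incremental.last.value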