MINI PROJECT ON BIG DATA
Contents
Prerequisites before initiating HDFS file operations
$ start-dfs.sh (to start all daemons)
$ jps (to check that all daemons are running)
1) Basic HDFS File Operations
put Command (import a file from the local file system to HDFS)
get Command (copy a file from HDFS to the local file system)
cp Command (copy a file from one HDFS directory to another HDFS directory)
mv command (move a file from one HDFS directory to a destination HDFS directory)
2) Sqoop Commands
Sqoop import command.
Sqoop import with Where clause command.
Sqoop export command
Sqoop Incremental append.
3) Hive Commands
Internal/Managed table Creation in Hive
External table Creation in Hive
Loading data from Local file system to Hive
Static partitioning in Hive
Dynamic partitioning in Hive
Bucketing in Hive
HDFS File Operations:
put Command
Loads a file from the local file system into a specified directory in HDFS.
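A minimal sketch, assuming a local file /home/sumit/File1.txt and an HDFS target directory /user/sumit (both paths are illustrative):
$ hdfs dfs -put /home/sumit/File1.txt /user/sumit/
$ hdfs dfs -ls /user/sumit        # verify that File1.txt now exists in HDFS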
get Command
The get command is used to copy data from the Hadoop file system to the local
file system: it copies data from directories stored in HDFS to the local file
system. The same can be done with the copyToLocal command.
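A minimal sketch, assuming File1.txt already exists in the HDFS directory /user/sumit and /home/sumit is the local destination (illustrative paths):
$ hdfs dfs -get /user/sumit/File1.txt /home/sumit/
$ hdfs dfs -copyToLocal /user/sumit/File1.txt /home/sumit/   # equivalent to -get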
cp Command
It copies a file from one directory of the HDFS file system to a destination
directory within HDFS itself.
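A minimal sketch, assuming an illustrative destination directory /user/sumit/backup inside HDFS:
$ hdfs dfs -cp /user/sumit/File1.txt /user/sumit/backup/
$ hdfs dfs -ls /user/sumit/backup    # the copy exists here; the original file is unchanged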
mv command
Using the mv command, File1.txt in the wep directory is moved to the destination
directory /user/sumit.
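A sketch of the command, assuming the wep directory sits at /user/wep (the full path is not given above, so this is an assumption):
$ hdfs dfs -mv /user/wep/File1.txt /user/sumit/
$ hdfs dfs -ls /user/sumit           # File1.txt is now here and no longer under /user/wep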
Sqoop Commands
Sqoop import command.
(RDBMS to HDFS)
The Sqoop import command copies data from the local MySQL database running on
localhost: the student table is imported into Sqoop's data directories in HDFS.
A JDBC connection to MySQL is established first, and the data is then written
as part files in the HDFS target directory.
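A sketch of such an import, assuming a MySQL database named studentdb and credentials root/password (database name, credentials, and target directory are illustrative; the student table name comes from the description above):
$ sqoop import \
    --connect jdbc:mysql://localhost/studentdb \
    --username root --password password \
    --table student \
    --target-dir /user/sumit/student_import \
    -m 1
# The rows of student are written as part files (part-m-00000, ...) under /user/sumit/student_import.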
Sqoop import with Where clause command.
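The --where option restricts the import to rows matching a condition. A sketch, reusing the assumed connection details above and an illustrative condition on a marks column:
$ sqoop import \
    --connect jdbc:mysql://localhost/studentdb \
    --username root --password password \
    --table student \
    --where "marks > 60" \
    --target-dir /user/sumit/student_where \
    -m 1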
Sqoop export command
(HDFS to RDBMS)
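A sketch of an export, assuming the HDFS directory /user/sumit/student_import holds the data and the student table already exists in the assumed studentdb database:
$ sqoop export \
    --connect jdbc:mysql://localhost/studentdb \
    --username root --password password \
    --table student \
    --export-dir /user/sumit/student_import \
    -m 1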
Sqoop Incremental append.
This command is used to load data from the local database into HDFS in an
incremental manner: Sqoop looks at the last value of the check column, and only
the rows whose check-column value comes after the specified last value are
loaded into HDFS.
Hive Commands:
Internal table creation
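A minimal sketch of creating an internal/managed table, assuming an illustrative student schema and a comma-delimited text format:
hive> CREATE TABLE student_internal (
          id INT,
          name STRING,
          marks INT)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;
-- Dropping this table deletes both its metadata and its data files in the Hive warehouse directory.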
External table creation
Creating an external table helps in the following way: if we drop the external
table, only the table definition (metadata) is deleted, while the data
associated with the table (the actual data files) remains in the table's
storage directory on HDFS.
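A minimal sketch of an external table with the same assumed schema, pointing at an illustrative HDFS location:
hive> CREATE EXTERNAL TABLE student_external (
          id INT,
          name STRING,
          marks INT)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LOCATION '/user/sumit/student_external';
-- DROP TABLE student_external removes only the metadata; the files under /user/sumit/student_external remain.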
Loading data from Local file system to Hive
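A sketch, assuming an illustrative local file /home/sumit/student.txt and the student_internal table sketched above:
hive> LOAD DATA LOCAL INPATH '/home/sumit/student.txt'
      INTO TABLE student_internal;
-- LOCAL reads from the local file system; omitting LOCAL would instead move a file that is already in HDFS.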
STEPS TO DO PARTITIONING
Data can be stored in either of two ways: with an
Internal/managed table or with an External table.
Step 1: Create a non-partitioned table (Internal/External)
Step 2: Load data into the created table
Step 3: Create the partitioned table
Step 4: For Dynamic Partitioning, set the required property
(for Static partitioning this is not needed)
Step 5: Load data into the partitioned table
Static partitioning in Hive
In static partitioning, you explicitly specify the partition column value, and
a corresponding directory for that value is created under the table's directory
in the hive/warehouse directory.
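A sketch of static partitioning, assuming an illustrative dept partition column and loading one explicitly named partition from a local file:
hive> CREATE TABLE student_part (id INT, name STRING, marks INT)
      PARTITIONED BY (dept STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

hive> LOAD DATA LOCAL INPATH '/home/sumit/student_cse.txt'
      INTO TABLE student_part PARTITION (dept='CSE');
-- Creates the directory .../student_part/dept=CSE under the Hive warehouse.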
Dynamic partitioning in Hive
Unlike static partitioning, where you explicitly specify partition values,
dynamic partitioning lets Hive determine these values automatically based
on the data itself. A separate directory is created implicitly for each
partition value in the hive/warehouse directory.
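A sketch of dynamic partitioning, assuming a non-partitioned staging table student_stage (the table from Step 1, assumed to contain a dept column):
hive> SET hive.exec.dynamic.partition=true;
hive> SET hive.exec.dynamic.partition.mode=nonstrict;

hive> CREATE TABLE student_dyn (id INT, name STRING, marks INT)
      PARTITIONED BY (dept STRING);

hive> INSERT INTO TABLE student_dyn PARTITION (dept)
      SELECT id, name, marks, dept FROM student_stage;
-- Hive reads dept from the data and creates one directory per distinct value (dept=CSE, dept=ECE, ...).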
Bucketing in Hive
Bucketing is based on a hashing technique.
For a given column value, compute the hash of that value modulo the
number of required buckets (say, F(x) % 3).
Based on the resulting value, the row is stored in the corresponding bucket.
Data is distributed approximately evenly across the buckets.
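A sketch of bucketing on an illustrative id column with 3 buckets, matching the F(x) % 3 example above; student_internal is the assumed source table:
hive> SET hive.enforce.bucketing=true;   -- needed on older Hive versions; newer versions enforce bucketing by default

hive> CREATE TABLE student_bucketed (id INT, name STRING, marks INT)
      CLUSTERED BY (id) INTO 3 BUCKETS
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

hive> INSERT INTO TABLE student_bucketed
      SELECT id, name, marks FROM student_internal;
-- Each row goes to bucket hash(id) % 3, producing 3 output files for the table.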