Hadoop Introduction

Hadoop is an Apache open-source framework for distributed processing of large datasets using Java, designed to scale from a single server to thousands of machines. It consists of two main components: MapReduce for processing and the Hadoop Distributed File System (HDFS) for storage, both optimized for fault tolerance and high throughput. Additionally, Hadoop includes modules like Hadoop Common and YARN for job scheduling and resource management, offering advantages such as ease of use, dynamic scalability, and cross-platform compatibility.

Uploaded by sujana s

Hadoop is an Apache open-source framework, written in Java, that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop application works in an environment that provides distributed storage and computation across clusters of machines. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

Hadoop Architecture

At its core, Hadoop has two major layers:

• Processing/computation layer (MapReduce)
• Storage layer (Hadoop Distributed File System, HDFS)

MapReduce

MapReduce is a parallel programming model, devised at Google, for writing distributed applications that efficiently process large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run on Hadoop, an Apache open-source framework.
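The model can be illustrated with the classic word-count example. The sketch below is a minimal single-process simulation of the map, shuffle, and reduce phases; it does not use the real Hadoop API (org.apache.hadoop.mapreduce), which runs these phases as distributed tasks across a cluster.

```java
import java.util.*;
import java.util.stream.*;

// A toy, single-process sketch of the MapReduce model (word count).
// Real Hadoop distributes the map and reduce tasks across many nodes.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all counts emitted for one key.
    static int reduce(String key, List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> input = List.of("hadoop stores data", "hadoop processes data");

        // Shuffle phase: group the mapped pairs by key, as the framework
        // does between the map and reduce tasks.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " " + reduce(e.getKey(), e.getValue()));
        }
    }
}
```

In real Hadoop, the mapper and reducer are written as classes extending the framework's Mapper and Reducer base types, and the shuffle is handled by the framework itself.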

Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and provides a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant: it is highly fault-tolerant, designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it suitable for applications with large datasets.
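To make the storage model concrete, the sketch below simulates how HDFS stores a file: the file is split into fixed-size blocks and each block is replicated on several DataNodes. The 128 MB block size and replication factor of 3 mirror common HDFS defaults, but the round-robin placement and node names here are simplified illustrations, not the real NameNode's rack-aware policy.

```java
import java.util.*;

// A toy sketch of HDFS block splitting and replica placement.
// Real placement is decided by the NameNode and considers racks,
// locality, and free space; this version is round-robin only.
public class HdfsPlacementSketch {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // common HDFS default: 128 MB
    static final int REPLICATION = 3;                  // common default replication factor

    // Number of blocks a file of the given size occupies (last block may be partial).
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Assign each block's replicas to distinct DataNodes, round-robin.
    static Map<Long, List<String>> place(long fileSize, List<String> dataNodes) {
        Map<Long, List<String>> placement = new LinkedHashMap<>();
        for (long b = 0; b < blockCount(fileSize); b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < REPLICATION; r++) {
                replicas.add(dataNodes.get((int) ((b + r) % dataNodes.size())));
            }
            placement.put(b, replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        long fileSize = 300L * 1024 * 1024; // a 300 MB file occupies 3 blocks
        List<String> nodes = List.of("node1", "node2", "node3", "node4");
        place(fileSize, nodes).forEach((block, replicas) ->
            System.out.println("block " + block + " -> " + replicas));
    }
}
```

Replication is what lets HDFS tolerate the loss of a disk or a whole node: as long as one replica of each block survives, the file remains readable.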

Apart from the two core components above, the Hadoop framework also includes the following two modules:

• Hadoop Common − the Java libraries and utilities required by other Hadoop modules.
• Hadoop YARN − a framework for job scheduling and cluster resource management.

Advantages of Hadoop

• The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient, automatically distributing the data and work across the machines and, in turn, exploiting the underlying parallelism of the CPU cores.
• Hadoop does not rely on hardware to provide fault tolerance and high availability (FTHA); rather, the Hadoop library itself is designed to detect and handle failures at the application layer.
• Servers can be added to or removed from the cluster dynamically, and Hadoop continues to operate without interruption.
• Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms, since it is Java-based.
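The fault-tolerance point deserves a concrete illustration. The sketch below shows the idea of application-layer failure handling: when a task fails on one node, the framework detects it and reschedules the task on another node. The node names and failure model are hypothetical; real Hadoop implements this through heartbeats and task re-execution in YARN and MapReduce.

```java
import java.util.*;
import java.util.function.*;

// A toy illustration of application-layer fault tolerance:
// detect a failed node and reschedule the task elsewhere,
// rather than relying on the hardware to never fail.
public class RetrySketch {

    // Try the task on each node in turn until one succeeds.
    static String runWithFailover(List<String> nodes, Predicate<String> nodeIsHealthy) {
        for (String node : nodes) {
            if (nodeIsHealthy.test(node)) {
                return "task completed on " + node; // success
            }
            System.out.println(node + " failed, rescheduling task");
        }
        throw new IllegalStateException("all nodes failed");
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3");
        Set<String> down = Set.of("node1"); // simulate one dead node
        System.out.println(runWithFailover(nodes, n -> !down.contains(n)));
    }
}
```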
