Hadoop Introduction

Hadoop is an Apache open-source framework for distributed processing of large datasets using Java, designed to scale from a single server to thousands of machines. It consists of two main components: MapReduce for processing and the Hadoop Distributed File System (HDFS) for storage, both optimized for fault tolerance and high throughput. Additionally, Hadoop includes modules like Hadoop Common and YARN for job scheduling and resource management, offering advantages such as ease of use, dynamic scalability, and cross-platform compatibility.

Uploaded by sujana s

Hadoop is an Apache open-source framework, written in Java, that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop application works in an environment that provides distributed storage and computation across clusters of machines. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

Hadoop Architecture

At its core, Hadoop has two major layers:

• Processing/computation layer (MapReduce)
• Storage layer (Hadoop Distributed File System, HDFS)

MapReduce

MapReduce is a parallel programming model, devised at Google, for writing distributed applications that efficiently process large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run on Hadoop, an Apache open-source framework.
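The model can be illustrated with the classic word-count example. The sketch below is a minimal single-process simulation of the map, shuffle, and reduce phases; it does not use the real Hadoop API (org.apache.hadoop.mapreduce), which runs these phases as distributed tasks across a cluster.

```java
import java.util.*;
import java.util.stream.*;

// A toy, single-process sketch of the MapReduce model (word count).
// Real Hadoop distributes the map and reduce tasks across many nodes.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all counts emitted for one key.
    static int reduce(String key, List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> input = List.of("hadoop stores data", "hadoop processes data");

        // Shuffle phase: group the mapped pairs by key, as the framework
        // does between the map and reduce tasks.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " " + reduce(e.getKey(), e.getValue()));
        }
    }
}
```

In real Hadoop, the mapper and reducer are written as classes extending the framework's Mapper and Reducer base types, and the shuffle is handled by the framework itself.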

Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and provides a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant: it is highly fault-tolerant, designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it suitable for applications with large datasets.
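To make the storage model concrete, the sketch below simulates how HDFS stores a file: the file is split into fixed-size blocks and each block is replicated on several DataNodes. The 128 MB block size and replication factor of 3 mirror common HDFS defaults, but the round-robin placement and node names here are simplified illustrations, not the real NameNode's rack-aware policy.

```java
import java.util.*;

// A toy sketch of HDFS block splitting and replica placement.
// Real placement is decided by the NameNode and considers racks,
// locality, and free space; this version is round-robin only.
public class HdfsPlacementSketch {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // common HDFS default: 128 MB
    static final int REPLICATION = 3;                  // common default replication factor

    // Number of blocks a file of the given size occupies (last block may be partial).
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Assign each block's replicas to distinct DataNodes, round-robin.
    static Map<Long, List<String>> place(long fileSize, List<String> dataNodes) {
        Map<Long, List<String>> placement = new LinkedHashMap<>();
        for (long b = 0; b < blockCount(fileSize); b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < REPLICATION; r++) {
                replicas.add(dataNodes.get((int) ((b + r) % dataNodes.size())));
            }
            placement.put(b, replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        long fileSize = 300L * 1024 * 1024; // a 300 MB file occupies 3 blocks
        List<String> nodes = List.of("node1", "node2", "node3", "node4");
        place(fileSize, nodes).forEach((block, replicas) ->
            System.out.println("block " + block + " -> " + replicas));
    }
}
```

Replication is what lets HDFS tolerate the loss of a disk or a whole node: as long as one replica of each block survives, the file remains readable.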

Apart from the two core components above, the Hadoop framework also includes the following two modules:

• Hadoop Common − the Java libraries and utilities required by other Hadoop modules.
• Hadoop YARN − a framework for job scheduling and cluster resource management.

Advantages of Hadoop

• The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient, automatically distributing the data and work across the machines and, in turn, exploiting the underlying parallelism of the CPU cores.
• Hadoop does not rely on hardware to provide fault tolerance and high availability (FTHA); rather, the Hadoop library itself is designed to detect and handle failures at the application layer.
• Servers can be added to or removed from the cluster dynamically, and Hadoop continues to operate without interruption.
• Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms, since it is Java-based.
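The fault-tolerance point deserves a concrete illustration. The sketch below shows the idea of application-layer failure handling: when a task fails on one node, the framework detects it and reschedules the task on another node. The node names and failure model are hypothetical; real Hadoop implements this through heartbeats and task re-execution in YARN and MapReduce.

```java
import java.util.*;
import java.util.function.*;

// A toy illustration of application-layer fault tolerance:
// detect a failed node and reschedule the task elsewhere,
// rather than relying on the hardware to never fail.
public class RetrySketch {

    // Try the task on each node in turn until one succeeds.
    static String runWithFailover(List<String> nodes, Predicate<String> nodeIsHealthy) {
        for (String node : nodes) {
            if (nodeIsHealthy.test(node)) {
                return "task completed on " + node; // success
            }
            System.out.println(node + " failed, rescheduling task");
        }
        throw new IllegalStateException("all nodes failed");
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3");
        Set<String> down = Set.of("node1"); // simulate one dead node
        System.out.println(runWithFailover(nodes, n -> !down.contains(n)));
    }
}
```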
