Competition questions
What is Big Data?
Big Data is a collection of data sets so large and complex that it becomes very tedious to capture, store, process, retrieve and analyze them with on-hand database management tools or traditional data processing techniques.
What are the three characteristics of Big Data?
Volume − Facebook, for example, generates 500+ terabytes of data per day.
Velocity − Analyzing 2 million records each day to identify the reason for losses.
Variety − Images, audio, video, sensor data, log files, etc.
A fourth characteristic, Veracity, covers biases, noise and abnormality in data.
How is analysis of Big Data useful for organizations?
Effective analysis of Big Data provides a lot of business advantage, as organizations learn which areas to focus on and which areas are less important. Big data analysis provides early key indicators that can prevent a company from a huge loss or help it grasp a great opportunity. A precise analysis of Big Data helps in decision making. For instance, nowadays people rely heavily on Facebook and Twitter before buying any product or service, all thanks to the Big Data explosion.
Ques1. What is Big Data, and where does it come from? How
does it work?
Ans: Big Data refers to extensive and often complicated data sets so huge that they are beyond the capacity of conventional software tools to manage. Big Data comprises unstructured and structured data sets such as videos, photos, audio, websites, and multimedia content.
Businesses collect the data they need in countless ways, such as:
Internet cookies
Email tracking
Smartphones
Smartwatches
Online purchase transaction forms
Website interactions
Transaction histories
Social media posts
Third-party trackers − companies that collect and resell customer and purchasing data
Working with big data involves three sets of activities:
Integration: This involves merging data, often from different sources, and molding it into a form that can be analysed to provide insights.
Management: Big data must be stored in a repository where it can be collected and readily accessed. Most Big Data is unstructured, making it ill-suited for conventional relational databases, which require data in a rows-and-columns format.
Analysis: The return on a Big Data investment is a range of valuable market insights, including details on buying patterns and customer preferences. These are surfaced by examining large data sets with tools driven by AI and machine learning.
Ques: Why do businesses use Big Data for competitive advantage?
Ans: Big data professionals also need to know what the company requires from
the application and how they plan to use the data to their advantage.
Confident decision-making: Analytics aims to improve decision making, and big data continues to support this. With so much data available, big data can help enterprises speed up their decision-making process while still being confident in their choices. In today's fast-paced world, moving quickly and reacting to broader trends and operational changes is a huge business benefit.
Asset optimisation: Big data means that businesses can monitor assets at an individual level. This implies they can properly optimise assets based on the data each one produces, improve productivity, extend the lifespan of assets, and reduce the downtime some assets may require. This gives a competitive advantage by ensuring the company is getting the most out of its assets, and it ties in with decreasing costs.
Cost reduction: Big data can help businesses reduce their outgoings. From analysing energy usage to assessing the effectiveness of staff operating patterns, the data collected by companies can help them recognise where they can make cost savings without having a negative impact on company operations.
Improve customer engagement: When browsing online, consumers make choices that reveal their decisions, habits, and tendencies. These can then be used to develop and tailor customer dialogue, which can translate into increased sales. Understanding what each client is looking for through the data collected on them means you can target them with specific products, and it also gives the personal feel that many consumers today have come to expect.
Identify new revenue streams: Analytics can further assist companies in identifying new revenue streams and expanding into other areas. For example, knowing customer trends and preferences allows firms to decide which direction to take. The data companies accumulate can also potentially be sold, adding income streams and the potential to build alliances.
Unit-2
Why do we need Hadoop?
Every day a large amount of unstructured data is dumped into our machines. The major challenge is not just to store large data sets in our systems but to retrieve and analyze the big data held by organizations, especially when that data is present on different machines at different locations. In this situation the necessity for Hadoop arises. Hadoop has the ability to analyze the data present on different machines at different locations very quickly and in a very cost-effective way. It uses the concept of MapReduce, which enables it to divide a query into small parts and process them in parallel. This is also known as parallel computing.
What is the basic difference between traditional RDBMS and Hadoop?
Traditional RDBMS is used for transactional systems to report and archive the data, whereas Hadoop is an approach to store huge amounts of data in a distributed file system and process it. RDBMS will be useful when you want to seek one record from Big Data, whereas Hadoop will be useful when you want Big Data in one shot and perform analysis on it later.
What is Fault Tolerance?
Suppose you have a file stored in a system, and due to some technical problem that file gets destroyed. Then there is no way to get back the data present in that file. To avoid such situations, Hadoop introduced the feature of fault tolerance in HDFS. In Hadoop, when we store a file, it automatically gets replicated at two other locations as well. So even if one or two of the systems collapse, the file is still available on the third system.
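To see this replication in practice, the replica placement of a file can be inspected with fsck. A minimal sketch, assuming a running cluster; the file path is a placeholder:
$ hadoop fsck /user/hadoop/sample.txt -files -blocks -locations   # path is hypothetical
The -locations flag lists, for each block, the datanodes holding a replica.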
Replication causes data redundancy, then why is it pursued in HDFS?
HDFS works with commodity hardware (systems with average configurations) that has a high chance of crashing at any time. Thus, to make the entire system highly fault-tolerant, HDFS replicates and stores data in different places. Any data on HDFS gets stored at at least 3 different locations. So, even if one of them is corrupted and another is unavailable for some time for any reason, the data can still be accessed from the third one. Hence, there is no chance of losing the data. This replication factor helps us attain the Hadoop feature called fault tolerance.
Since the data is replicated thrice in HDFS, does it mean that any
calculation done on one node will also be replicated on the other two?
No, calculations are done only on the original data. The master node knows which node exactly holds that particular data. If one of the nodes is not responding, it is assumed to have failed; only then is the required calculation done on the second replica.
What is a Namenode?
Namenode is the master node on which the job tracker runs, and it holds the metadata. It maintains and manages the blocks which are present on the datanodes. It is a high-availability machine and a single point of failure in HDFS.
Is Namenode also a commodity hardware?
No. Namenode can never be commodity hardware because the entire HDFS relies on it. It is the single point of failure in HDFS. Namenode has to be a high-availability machine.
What is a Datanode?
Datanodes are the slaves which are deployed on each machine and provide the actual storage.
These are responsible for serving read and write requests for the clients.
Why do we use HDFS for applications having large data sets and not
when there are lot of small files?
HDFS is more suitable for a large amount of data stored in a single file than for small amounts of data spread across multiple files. This is because the Namenode is a very expensive, high-performance system, so it is not prudent to fill its memory with the unnecessary metadata generated for a multitude of small files. When a large amount of data is kept in a single file, the Namenode occupies less space. Hence, for optimized performance, HDFS favours large data sets over multiple small files.
What is a job tracker?
Job tracker is a daemon that runs on the namenode for submitting and tracking MapReduce jobs in Hadoop. It assigns the tasks to the different task trackers. In a Hadoop cluster, there will be only one job tracker but many task trackers. It is the single point of failure for Hadoop and the MapReduce service. If the job tracker goes down, all the running jobs are halted. It receives heartbeats from the task trackers, based on which the job tracker decides whether an assigned task is completed or not.
What is a task tracker?
Task tracker is also a daemon, and it runs on the datanodes. Task trackers manage the execution of individual tasks on the slave nodes. When a client submits a job, the job tracker will initialize the job, divide the work and assign the pieces to different task trackers to perform MapReduce tasks. While performing this work, the task tracker keeps communicating with the job tracker by sending heartbeats. If the job tracker does not receive a heartbeat from a task tracker within the specified time, it will assume that the task tracker has crashed and assign that task to another task tracker in the cluster.
What is a heartbeat in HDFS?
A heartbeat is a signal indicating that a node is alive. A datanode sends heartbeats to the Namenode, and a task tracker sends its heartbeats to the job tracker. If the Namenode or job tracker does not receive a heartbeat, it decides that there is some problem with the datanode, or that the task tracker is unable to perform the assigned task.
What is a ‘block’ in HDFS?
A ‘block’ is the minimum amount of data that can be read or written. In HDFS, the default block size is 64 MB, in contrast to a block size of 8192 bytes in Unix/Linux. Files in HDFS are broken down into block-sized chunks, which are stored as independent units. HDFS blocks are large compared to disk blocks, mainly to minimize the cost of seeks. If a particular file is 50 MB, will the HDFS block still consume 64 MB as the default size? No, not at all! 64 MB is just the maximum unit in which the data can be stored. In this particular situation, only 50 MB will be consumed by the HDFS block and the remaining 14 MB is free to store something else. It is the master node that does data allocation in an efficient manner.
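As a hedged sketch, the block size can also be overridden for a single upload and the resulting blocks inspected with fsck. The property name dfs.block.size is the Hadoop 1.x setting, and the file names are placeholders:
$ hadoop fs -D dfs.block.size=134217728 -put localfile.txt /user/hadoop/localfile.txt   # file names are hypothetical
$ hadoop fsck /user/hadoop/localfile.txt -files -blocks
Here 134217728 bytes is 128 MB, i.e. twice the default 64 MB block size.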
What are the benefits of block transfer?
A file can be larger than any single disk in the network. There’s nothing that requires the blocks
from a file to be stored on the same disk, so they can take advantage of any of the disks in the
cluster. Making the unit of abstraction a block rather than a file simplifies the storage
subsystem. Blocks provide fault tolerance and availability. To insure against corrupted blocks
and disk and machine failure, each block is replicated to a small number of physically separate
machines (typically three). If a block becomes unavailable, a copy can be read from another
location in a way that is transparent to the client.
How indexing is done in HDFS?
Hadoop has its own way of indexing data. Depending upon the block size, once the data is stored, HDFS keeps storing the last part of the data, which indicates where the next part of the data is located.
Are job tracker and task trackers present in separate machines?
Yes, the job tracker and task trackers are present on different machines. The reason is that the job tracker is a single point of failure for the Hadoop MapReduce service; if it goes down, all running jobs are halted.
What is the communication channel between client and
namenode/datanode?
The mode of communication is SSH. SSH, or Secure Shell, is a network communication protocol that enables two computers to communicate and share data (cf. HTTP, the Hypertext Transfer Protocol, which is used to transfer hypertext such as web pages).
What is a rack?
A rack is a storage area with all the datanodes put together; it is a physical collection of datanodes stored at a single location. There can be multiple racks in a single location, and the datanodes of a cluster may be physically spread across different racks at different places.
What is a Secondary Namenode? Is it a substitute to the Namenode?
The Secondary Namenode periodically reads the file system metadata from the RAM of the Namenode and writes it into the hard disk or the file system. It is not a substitute for the Namenode, so if the Namenode fails, the entire Hadoop system goes down.
What does ‘jps’ command do?
It gives the status of the daemons which run the Hadoop cluster. The output lists the status of the namenode, datanode, secondary namenode, Jobtracker and Task tracker.
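A typical output on a single-node cluster looks roughly like the following; the process IDs are machine-specific and shown only for illustration:
$ jps
4201 NameNode
4356 DataNode
4512 SecondaryNameNode
4678 JobTracker
4843 TaskTracker
6021 Jps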
How to restart Namenode?
Step-1. Run stop-all.sh and then run start-all.sh, OR
Step-2. Switch to the hdfs user (sudo su - hdfs), change to the /etc/init.d/ directory, and then run /etc/init.d/hadoop-0.20-namenode start.
Which are the three modes in which Hadoop can be run?
The three modes in which Hadoop can be run are −
1. Standalone (local) mode
2. Pseudo-distributed mode
3. Fully distributed mode
What does /etc /init.d do?
/etc/init.d is where daemons (services) are placed, and it can also be used to see the status of these daemons. It is very Linux-specific and has nothing to do with Hadoop.
What if a Namenode has no data?
It cannot be part of the Hadoop cluster.
What happens to job tracker when Namenode is down?
When the Namenode is down, your cluster is OFF, because the Namenode is the single point of failure in HDFS.
Explain how ‘map’ and ‘reduce’ work.
The Namenode takes the input, divides it into parts and assigns them to datanodes. These datanodes process the tasks assigned to them, produce key-value pairs and return the intermediate output to the reducer. The reducer collects the key-value pairs from all the datanodes, combines them and generates the final output.
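As a rough local sketch of the same idea (not a real cluster run; mapper.sh, reducer.sh and input.txt are hypothetical files), a word-count map and reduce phase can be simulated with shell pipes, with sort standing in for the shuffle step:
$ cat mapper.sh
# map: emit a (word, 1) pair for every word read from stdin
tr -s '[:space:]' '\n' | awk 'NF {print $0 "\t1"}'
$ cat reducer.sh
# reduce: sum the counts for each word (input arrives sorted by key)
awk -F'\t' '{count[$1] += $2} END {for (w in count) print w "\t" count[w]}'
$ cat input.txt | sh mapper.sh | sort | sh reducer.sh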
Why is ‘reading’ done in parallel but ‘writing’ is not, in HDFS?
Through a MapReduce program, a file can be read in parallel by splitting it into blocks. While writing, however, the incoming values are not yet known to the system, so MapReduce cannot be applied and parallel writing is not possible.
Copy a directory from one node in the cluster to another
Use the distcp command to copy; an example is sketched below.
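Here the namenode hosts nn1 and nn2, the port and the directory paths are placeholders:
$ hadoop distcp hdfs://nn1:8020/user/hadoop/dir1 hdfs://nn2:8020/user/hadoop/dir2   # hosts and paths are hypothetical
distcp runs as a MapReduce job, so the copy itself is performed in parallel across the cluster.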
The default replication factor for a file is 3; it can be changed with setrep, for example:
hadoop fs -setrep -w 2 apache_hadoop/sample.txt
What is rack awareness?
Rack awareness is the way in which the namenode decides how to place blocks based on the rack definitions. Hadoop will try to minimize the network traffic between datanodes within the same rack and will only contact remote racks if it has to. The namenode is able to control this due to rack awareness.
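A minimal sketch of how this can be configured: Hadoop calls a user-supplied topology script that maps each datanode address to a rack id. The script path, the rack names and the Hadoop 1.x property name topology.script.file.name used to reference it from core-site.xml are assumptions for illustration.
$ cat /etc/hadoop/topology.sh
#!/bin/sh
# Illustrative topology script: map each datanode address passed as an argument to a rack id.
# Script path, subnets and rack names are placeholders.
for node in "$@"; do
  case "$node" in
    10.1.1.*) echo "/dc1/rack1" ;;
    10.1.2.*) echo "/dc1/rack2" ;;
    *)        echo "/default-rack" ;;
  esac
done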
Which file does the Hadoop-core configuration?
core-default.xml
Is there a hdfs command to see available free space in hdfs
hadoop dfsadmin –report
The requirement is to add a new data node to a running Hadoop
cluster; how do I start services on just one data node?
You do not need to shutdown and/or restart the entire cluster in this case.
First, add the new node's DNS name to the conf/slaves file on the master node.
Then log in to the new slave node and execute −
$ cd path/to/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
Then issue hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes so that the NameNode and JobTracker know about the node that has been added.
How do you gracefully stop a running job?
hadoop job -kill <jobid>
Does the name-node stay in safe mode till all under-replicated files
are fully replicated?
No. During safe mode, replication of blocks is prohibited. The name-node waits until all, or a majority of, the data-nodes report their blocks, and then leaves safe mode.
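Safe mode can be inspected and controlled from the command line (these dfsadmin subcommands assume HDFS superuser privileges):
$ hadoop dfsadmin -safemode get
$ hadoop dfsadmin -safemode wait
$ hadoop dfsadmin -safemode leave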
What happens if one Hadoop client renames a file or a directory
containing this file while another client is still writing into it?
A file will appear in the namespace as soon as it is created. If a writer is writing to a file and
another client renames either the file itself or any of its path components, then the original writer
will get an IOException either when it finishes writing to the current block or when it closes the
file.
How to make a large cluster smaller by taking out some of the nodes?
Hadoop offers the decommission feature to retire a set of existing data-nodes. The
nodes to be retired should be included into the exclude file, and the exclude file name
should be specified as a configuration parameter dfs.hosts.exclude.
The decommission process can be terminated at any time by editing the configuration or the exclude files and repeating the -refreshNodes command.
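A hedged walkthrough, in which the exclude file location and the datanode hostname are placeholders:
$ echo "datanode7.example.com" >> /etc/hadoop/conf/dfs.exclude   # hostname and file path are hypothetical
# dfs.hosts.exclude in hdfs-site.xml must point at /etc/hadoop/conf/dfs.exclude
$ hadoop dfsadmin -refreshNodes
$ hadoop dfsadmin -report
The report shows the node's decommission status while its blocks are re-replicated to the remaining datanodes.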
Can we search for files using wildcards?
Yes. For example, to list all the files which begin with the letter a, you could use the ls command with the * wildcard, as shown below.
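Here the listing is relative to the user's HDFS home directory, and quoting keeps the local shell from expanding the glob:
$ hadoop fs -ls 'a*'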
What happens when two clients try to write into the same HDFS file?
HDFS supports exclusive writes only.
When the first client contacts the name-node to open the file for writing, the name-node
grants a lease to the client to create this file. When the second client tries to open the same file for writing, the name-node will see that the lease for the file is already granted to another client, and will reject the open request for the second client.
What does "file could only be replicated to 0 nodes, instead of 1"
mean?
The namenode does not have any available DataNodes.
What is a Combiner?
The Combiner is a ‘mini-reduce’ process which operates only on data generated by a mapper. The Combiner receives as input all the data emitted by the Mapper instances on a given node. The output from the Combiner is then sent to the Reducers, instead of the output from the Mappers.
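A hedged Hadoop Streaming sketch, reusing the reducer script as a combiner so that counts are pre-aggregated on each mapper node. The jar path, script names and directories are placeholders, and it assumes a Hadoop version whose streaming -combiner option accepts a command (earlier releases required a Java class here):
$ hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -input /user/hadoop/input \
    -output /user/hadoop/output \
    -mapper mapper.sh \
    -combiner reducer.sh \
    -reducer reducer.sh \
    -file mapper.sh -file reducer.sh
    # jar path, scripts and directories above are hypothetical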
Consider this case scenario in an M/R system:
- HDFS block size is 64 MB
- Input format is FileInputFormat
- We have 3 files of size 64 KB, 65 MB and 127 MB
How many input splits will be made by the Hadoop framework?
Hadoop will make 5 splits as follows −
- 1 split for the 64 KB file
- 2 splits for the 65 MB file (64 MB + 1 MB)
- 2 splits for the 127 MB file (64 MB + 63 MB)
Suppose Hadoop spawned 100 tasks for a job and one of the tasks failed. What will Hadoop do?
It will restart the task on some other TaskTracker, and only if the task fails more than four times (the default setting, which can be changed) will it kill the job.
What are the problems with small files and HDFS?
HDFS is not good at handling a large number of small files, because every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies approximately 150 bytes. So 10 million files, each using a block, amount to roughly 20 million objects (one file object plus one block object per file) and would use about 3 gigabytes of memory. When we go to a billion files, the memory requirement on the namenode cannot be met.
What is speculative execution in Hadoop?
If a node appears to be running slowly, the master node can redundantly execute another instance of the same task on another node, and the output of whichever instance finishes first is taken. This process is called speculative execution.
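Speculative execution can be toggled per job. A minimal sketch, assuming the Hadoop 1.x property names and a job driver that uses ToolRunner so -D options are honoured (jar, class and paths are placeholders):
$ hadoop jar myjob.jar MyJobDriver \
    -D mapred.map.tasks.speculative.execution=false \
    -D mapred.reduce.tasks.speculative.execution=false \
    /user/hadoop/input /user/hadoop/output
    # jar name, driver class and directories above are hypothetical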
Can Hadoop handle streaming data?
Yes. Through technologies like Apache Kafka, Apache Flume, and Apache Spark it is possible to do large-scale streaming.
Why is Checkpointing Important in Hadoop?
As more and more files are added, the namenode accumulates large edit logs, which can substantially delay NameNode startup because the NameNode must reapply all the edits. Checkpointing is a process that takes the current fsimage and edit log and compacts them into a new fsimage. This way, instead of replaying a potentially unbounded edit log, the NameNode can load the final in-memory state directly from the fsimage. This is a far more efficient operation and reduces NameNode startup time.
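Relatedly, an administrator can force the current namespace to be saved into a fresh fsimage (which resets the accumulated edits); the namenode must be in safe mode for this, and the commands assume HDFS superuser privileges:
$ hadoop dfsadmin -safemode enter
$ hadoop dfsadmin -saveNamespace
$ hadoop dfsadmin -safemode leave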