Assignment-2
EVEN Semester Session 2024-2025
BIG DATA
(BCDS-601)
Max Marks: 10    Due Date: 25-04-2025
Note: 1. Mention your Name, Roll Number, Branch, Section, and
subject code.
2. Only hand-written answers will be accepted.
1. Explain how HDFS stores, reads, and writes files. Describe the sequence of operations
involved in storing a file in HDFS, retrieving data from HDFS, and writing data to
HDFS.
2. Describe the considerations for deploying Hadoop in a cloud environment. What are the
advantages and challenges of running Hadoop clusters on cloud platforms like Amazon
Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP)?
3. Discuss the process of developing a MapReduce application. What are the key steps
involved in writing, testing, and deploying a MapReduce program?
4. Describe the Hadoop Distributed File System (HDFS). How does HDFS manage the
storage and replication of data across a distributed cluster of machines?
5. Provide examples of real-world applications where Big Data analytics has been
instrumental. How do industries such as healthcare, finance, e-commerce, and
transportation leverage Big Data to gain insights and create value?
6. Explain the core concepts of HDFS, including NameNode, DataNode, and the file
system namespace. How do these components work together to manage data storage and
replication in Hadoop clusters?
7. Explain Apache Hadoop and its role in big data processing. What are the core
components of the Apache Hadoop ecosystem, and how do they work together to enable
distributed data storage and processing?
8. What are the benefits and challenges of using HDFS for distributed storage and
processing?
9. Describe the concepts of file sizes, block sizes, and block abstraction in HDFS.
10. What are the different types of digital data commonly encountered in Big Data
applications? Provide examples of structured, semi-structured, and unstructured data.
11. Explain the working of the following phases of MapReduce using one common example:
(i) Map Phase (ii) Combiner Phase (iii) Shuffle and Sort
12. Define heartbeat in HDFS.