Ch 5 Introduction to Big Data and Data Analytics

1. What does “Volume” refer to in the context of Big Data?


a) The variety of data types
b) The speed at which data is generated
c) The amount of data generated
d) The veracity of the data
Answer: c) The amount of data generated

2. Which of the following is a key characteristic of Big Data?


a) Structured format
b) Easily manageable size
c) Predictable patterns
d) Variety
Answer: d) Variety

3. Which of the following is NOT one of the V’s of Big Data?


a) Velocity
b) Volume
c) Verification
d) Variety
Answer: c) Verification

4. What is the primary purpose of data preprocessing in Big Data analytics?


a) To increase data volume
b) To reduce data variety
c) To improve data quality
d) To speed up data processing
Answer: c) To improve data quality

5. Which technique is commonly used for analyzing large datasets to discover patterns and
relationships?
a) Linear regression
b) Data mining
c) Decision trees
d) Naive Bayes
Answer: b) Data mining

6. Which term describes the process of extracting useful information from large datasets?
a) Data analytics
b) Data warehousing
c) Data integration
d) Data virtualization
Answer: a) Data analytics

7. Which of the following is a potential benefit of Big Data analytics?


a) Decreased data security
b) Reduced operational efficiency
c) Improved decision-making
d) Reduced data privacy
Answer: c) Improved decision-making

8. What role does Hadoop play in Big Data processing?


a) Hadoop is a programming language used for Big Data analytics.
b) Hadoop is a distributed file system for storing and processing Big Data.
c) Hadoop is a data visualization tool.
d) Hadoop is a NoSQL database management system.
Answer: b) Hadoop is a distributed file system for storing and processing Big Data.

9. What is the primary challenge associated with the veracity aspect of Big Data?
a) Handling large volumes of data
b) Ensuring data quality and reliability
c) Dealing with diverse data types
d) Managing data processing speed
Answer: b) Ensuring data quality and reliability

10. Which of the following types of data is most commonly used in Big Data analytics?
a) Structured data
b) Semi-structured data
c) Unstructured data
d) None of the above
Answer: c) Unstructured data

11. Which of the following is an example of unstructured data?


a) Customer information in a database
b) A CSV file containing product data
c) Social media posts
d) A sales report
Answer: c) Social media posts

12. What does “Velocity” refer to in the context of Big Data?


a) The volume of data
b) The speed at which data is generated
c) The variety of data
d) The value of data
Answer: b) The speed at which data is generated

13. Which of the following is an example of semi-structured data?


a) XML files
b) Customer database
c) Video files
d) Text documents
Answer: a) XML files

14. Which of the following tools is used for Big Data analytics?
a) Tableau
b) MS Excel
c) WordPress
d) Google Docs
Answer: a) Tableau

15. Which of the following is a disadvantage of Big Data?


a) Improved decision-making
b) High processing speed
c) Privacy and security concerns
d) Increased efficiency
Answer: c) Privacy and security concerns

16. What does “Veracity” refer to in Big Data?
a) The quantity of data
b) The speed of data processing
c) The accuracy and quality of data
d) The variety of data
Answer: c) The accuracy and quality of data

17. What does the “3V framework” for Big Data consist of?
a) Volume, Variety, Velocity
b) Value, Variety, Veracity
c) Variety, Velocity, Validation
d) Volume, Verification, Visualization
Answer: a) Volume, Variety, Velocity

18. What is the main focus of the “Value” characteristic of Big Data?
a) Ensuring the consistency of data
b) Ensuring the accuracy of data
c) Deriving business insights from data
d) Storing data effectively
Answer: c) Deriving business insights from data

19. What type of data does Big Data typically include?


a) Structured data only
b) Semi-structured data only
c) Unstructured data only
d) Structured, semi-structured, and unstructured data
Answer: d) Structured, semi-structured, and unstructured data

20. Which of the following is a tool used in Big Data analytics for processing data?
a) Hadoop
b) WordPress
c) Google Drive
d) Slack
Answer: a) Hadoop

21. What does “Batch Processing” refer to in Big Data analytics?


a) Analyzing small sets of data in real-time
b) Analyzing large blocks of data over time
c) Real-time processing of data streams
d) Preprocessing data into structured formats
Answer: b) Analyzing large blocks of data over time

22. What is “Data Stream Mining”?


a) Processing data in batches
b) Analyzing real-time data streams for patterns
c) Storing data for long-term analysis
d) Cleaning and organizing data for visualization
Answer: b) Analyzing real-time data streams for patterns

23. What is the main characteristic of unstructured data?


a) Organized in rows and columns
b) Lacks predefined structure
c) Easily searchable
d) Stored in relational databases
Answer: b) Lacks predefined structure

24. What is a “Logistic Regression” model used for in Big Data analytics?
a) Classifying data into distinct categories
b) Predicting continuous numerical outcomes
c) Analyzing the relationship between variables
d) Identifying clusters of data points
Answer: a) Classifying data into distinct categories

25. Which of the following types of Big Data analytics helps predict future outcomes?
a) Descriptive analytics
b) Diagnostic analytics
c) Predictive analytics
d) Prescriptive analytics
Answer: c) Predictive analytics

Important Questions with Answers

1. What is Big Data, and how does it differ from small data?
Answer: Big Data refers to vast, complex datasets that traditional database systems cannot handle
due to their size, speed, or structure. Unlike small data, which is manageable and easily understood,
Big Data requires specialized tools and techniques for analysis. It often includes transactional data,
machine-generated data, and social media data, making it harder to analyze using conventional
methods. The large scale and complexity of Big Data make it a significant resource for businesses and
researchers looking to extract valuable insights.

2. Explain the 3V’s of Big Data and their significance.


Answer: The 3V’s of Big Data are Volume, Velocity, and Variety.

 Volume refers to the massive amount of data generated, often in the range of petabytes or
even exabytes daily.

 Velocity describes the speed at which data is generated and must be processed, requiring
real-time analysis.

 Variety encompasses the different types of data, including structured (e.g., databases), semi-
structured (e.g., XML files), and unstructured data (e.g., social media posts, videos).
These three characteristics distinguish Big Data from traditional data and require advanced
analytics tools to extract meaningful insights.

3. What are the main types of Big Data, and how do they differ from each other?
Answer: The main types of Big Data are:

 Structured Data: Data that is highly organized and stored in databases or spreadsheets,
typically in rows and columns (e.g., customer information).

 Semi-structured Data: Data that does not have a fixed schema but has some form of
organization, such as XML files or JSON files.

 Unstructured Data: Data that lacks any predefined structure, including text, images, videos,
and social media posts.
Each type requires different methods for analysis and storage, with unstructured data being
the most challenging to process.
4. Describe the advantages and disadvantages of using Big Data.
Answer:
Advantages:

 Enhanced decision-making: Big Data enables organizations to make informed, data-driven
decisions based on vast datasets.

 Improved efficiency: By analyzing large volumes of data, businesses can optimize operations
and reduce inefficiencies.

 Better customer insights: Big Data allows for the personalization of services and targeted
marketing.

 Competitive advantage: Organizations can uncover trends and predict future outcomes.

Disadvantages:

 Privacy and security concerns: The collection and analysis of personal data raise ethical
issues and data protection risks.

 Data quality issues: Ensuring the accuracy and consistency of Big Data is challenging, as it
often includes unstructured and heterogeneous data.

 Technical complexity: Big Data requires specialized infrastructure, tools, and expertise,
which can be costly and resource-intensive.

 Compliance challenges: Companies must adhere to regulations such as GDPR, which can be
complex and costly to implement.

5. What are the 6V’s of Big Data, and how do they provide a more holistic view of Big Data?
Answer: The 6V’s of Big Data expand upon the 3V framework by adding three more characteristics:

 Volume: The sheer amount of data generated every day.

 Velocity: The speed at which data is produced and needs to be processed.

 Variety: The different types of data (structured, semi-structured, unstructured).

 Veracity: The accuracy and trustworthiness of the data, ensuring its suitability for analysis.

 Value: The insights and benefits derived from analyzing the data.

 Variability: The inconsistencies or unpredictability in data flows, requiring systems to adapt.


Together, these dimensions highlight the complexity of managing and analyzing Big Data
effectively.

6. What is the role of Hadoop in Big Data processing?


Answer: Hadoop is an open-source software framework used to store and process large datasets
across clusters of computers. Its distributed file system (HDFS) stores vast amounts of data in a
fault-tolerant manner, making it scalable and efficient. Hadoop’s MapReduce programming model
enables parallel processing, allowing data to be analyzed in chunks across multiple nodes. This is
especially useful for handling Big Data, where traditional data processing tools are insufficient.
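To make the MapReduce idea concrete, here is a minimal sketch in plain Python. It runs the map and reduce phases sequentially on a toy word-count task; in real Hadoop these phases run in parallel across cluster nodes, and the sample documents here are invented for illustration.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Shuffle + Reduce: group the pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["big data needs big tools", "data tools process data"]
print(reduce_phase(map_phase(documents)))
# {'big': 2, 'data': 3, 'needs': 1, 'tools': 2, 'process': 1}
```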

7. Explain the differences between structured, semi-structured, and unstructured data.


Answer:

 Structured Data is highly organized and easily searchable, typically stored in relational
databases with a fixed schema (e.g., customer names, addresses).
 Semi-structured Data does not have a fixed schema but contains tags or markers to separate
elements (e.g., XML, JSON).

 Unstructured Data lacks any predefined structure and is more difficult to process and
analyze (e.g., text, images, videos, audio files).
These differences impact how the data is stored, accessed, and analyzed.
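A short Python sketch of the three types, using only the standard library; the sample records are invented for illustration:

```python
import csv
import io
import json

# Structured: fixed schema, rows and columns (a CSV stands in for a database table).
table = io.StringIO("name,city\nAsha,Pune\nRavi,Delhi\n")
rows = list(csv.DictReader(table))           # [{'name': 'Asha', 'city': 'Pune'}, ...]

# Semi-structured: no rigid schema, but tagged fields give partial organization.
record = json.loads('{"user": "Asha", "tags": ["sale", "electronics"]}')

# Unstructured: free text; any structure must be inferred before analysis.
post = "Loving the new phone! Battery could be better though."
words = post.lower().split()                 # a crude first step toward structure

print(rows[0]["name"], record["tags"], len(words))
```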

8. What is Data Mining, and how does it apply to Big Data?


Answer: Data mining is the process of discovering patterns, trends, and relationships in large
datasets. In the context of Big Data, it involves applying machine learning algorithms, statistical
models, and data processing tools to extract meaningful insights. Data mining can identify customer
preferences, predict trends, and detect anomalies, all of which are valuable for business decision-
making.
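As a concrete illustration, here is a minimal Python sketch of one classic data-mining task, market-basket analysis, which finds items that frequently appear together; the baskets and the support threshold are invented for illustration:

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
]

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs appearing in at least half the baskets count as "frequent" patterns here.
support = len(baskets) / 2
print([pair for pair, count in pair_counts.items() if count >= support])
```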

9. How does Big Data Analytics work, and what are its key components?
Answer: Big Data Analytics involves collecting, storing, cleaning, processing, and analyzing large
datasets to uncover insights and trends. Key components include:

 Data collection: Gathering data from various sources (social media, sensors, transactions).

 Data storage: Using distributed storage solutions like Hadoop or cloud-based storage
systems.

 Data processing: Cleaning and organizing data to make it ready for analysis.

 Data analysis: Applying techniques such as machine learning, predictive analytics, and
statistical modeling to extract insights.

 Data visualization: Presenting the findings in a visual format, such as graphs and charts, to
help stakeholders make informed decisions.
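A toy end-to-end sketch of these components in Python; each function stands in for what would, at scale, be a dedicated tool (e.g., Hadoop or Spark for storage and processing, Tableau for visualization), and the sensor values are invented for illustration:

```python
def collect():
    # Data collection: pretend these rows arrived from sensors or transactions.
    return [{"temp": 21.5}, {"temp": None}, {"temp": 23.1}, {"temp": 22.0}]

def process(rows):
    # Data processing: clean the data by dropping records with missing values.
    return [r for r in rows if r["temp"] is not None]

def analyze(rows):
    # Data analysis: a simple mean stands in for ML or predictive models.
    temps = [r["temp"] for r in rows]
    return sum(temps) / len(temps)

def visualize(mean_temp):
    # Data visualization: a text "bar" stands in for charts and dashboards.
    print(f"mean temperature: {mean_temp:.1f} " + "#" * int(mean_temp))

visualize(analyze(process(collect())))
```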

10. What are some common types of Big Data Analytics?


Answer: The common types of Big Data Analytics include:

 Descriptive Analytics: Summarizes past data to identify patterns and trends.

 Diagnostic Analytics: Analyzes historical data to understand the causes behind specific
outcomes.

 Predictive Analytics: Uses historical data to forecast future trends or events.

 Prescriptive Analytics: Recommends actions based on data insights to achieve desired
outcomes.

11. What is Data Stream Mining, and how is it used?


Answer: Data Stream Mining refers to the real-time processing and analysis of continuous streams of
data. Unlike traditional data mining, which analyzes static datasets, stream mining analyzes data as it
arrives, without storing it completely. It is used in applications like monitoring social media feeds,
detecting fraud in financial transactions, or tracking sensor data in IoT devices.
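A minimal Python sketch of the idea: statistics are updated as each value arrives, only a small sliding window is retained, and readings far from the recent mean are flagged. The window size, threshold, and sample readings are invented for illustration:

```python
import statistics
from collections import deque

window = deque(maxlen=20)          # only the most recent readings are kept

def on_new_value(x):
    """Flag a reading that sits far from the recent mean, then update the window."""
    if len(window) >= 5:
        mean = statistics.mean(window)
        std = statistics.pstdev(window)
        if std > 0 and abs(x - mean) > 3 * std:
            print(f"possible anomaly: {x} (recent mean {mean:.1f})")
    window.append(x)

for reading in [10, 11, 9, 10, 12, 10, 45, 11, 10]:   # 45 is the planted outlier
    on_new_value(reading)
```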

12. What challenges are associated with the “Veracity” of Big Data?
Answer: Veracity in Big Data refers to the trustworthiness, quality, and accuracy of the data.
Challenges include:

 Data inconsistencies: Large volumes of data may contain errors, duplications, or missing
values, making it difficult to trust the insights derived from them.

 Data bias: Inaccurate or biased data sources can lead to misleading conclusions.
 Data cleaning issues: Ensuring that data is properly cleaned and formatted is a time-
consuming process, especially when dealing with unstructured data.
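A small pandas sketch (assuming pandas is available) of the cleaning work that veracity demands, run on an invented toy dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["Asha", "Asha", "Ravi", "Meena"],
    "age": ["34", "34", None, "29"],           # mixed strings and a missing value
    "amount": [120.0, 120.0, 89.5, None],
})

df = df.drop_duplicates()                      # duplicated Asha row removed
df["age"] = pd.to_numeric(df["age"])           # enforce a consistent numeric type
df["amount"] = df["amount"].fillna(df["amount"].median())  # impute missing value
print(df)
```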

13. How can Big Data be used in the healthcare industry?


Answer: Big Data analytics in healthcare can be used to predict disease outbreaks, personalize
patient care, and improve medical research. For example, predictive analytics can help hospitals
forecast patient admissions and optimize resource allocation. By analyzing patient data, doctors can
offer personalized treatments, improving outcomes. Big Data can also help detect fraud and improve
drug development by identifying trends in clinical trial data.

14. What is the significance of “Cloud Computing” in Big Data Analytics?


Answer: Cloud computing provides scalable and cost-effective infrastructure for storing and
processing Big Data. With cloud services, organizations can access powerful computing resources on-
demand, without the need for large upfront investments in hardware. This allows businesses to
analyze vast datasets quickly and efficiently, while also enabling collaboration across multiple
locations. Additionally, cloud-based tools offer flexibility, security, and reliability for Big Data
analytics.

15. How do the “Volume” and “Velocity” aspects of Big Data affect the analysis process?
Answer:

 Volume affects the storage and processing of data, as larger datasets require specialized
infrastructure and tools like distributed file systems (e.g., Hadoop). Handling high volumes
also requires more computational power to process data efficiently.

 Velocity refers to the speed at which data is generated and needs to be processed. Real-time
or near-real-time analytics are required to make quick decisions based on up-to-date data,
such as monitoring social media feeds or tracking financial transactions.

16. What role do Machine Learning algorithms play in Big Data analytics?
Answer: Machine Learning algorithms are essential for analyzing large datasets by automatically
detecting patterns and making predictions. In Big Data analytics, they can be used for classification,
clustering, regression, and anomaly detection. These algorithms enable businesses to forecast
trends, personalize customer experiences, and detect fraud, among other tasks. They can also learn
from new data over time, improving accuracy and efficiency.
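A minimal classification sketch using scikit-learn (assumed available); the features, labels, and fraud framing are invented for illustration, and production Big Data workloads would typically use distributed equivalents such as Spark MLlib:

```python
from sklearn.linear_model import LogisticRegression

# Features: [transaction amount, hour of day]; labels: 1 = fraudulent, 0 = legitimate.
X = [[120, 14], [15, 3], [980, 2], [45, 13], [875, 4], [60, 15]]
y = [0, 1, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)
print(model.predict([[900, 3], [50, 12]]))   # likely [1 0]
```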

17. How does “Batch Processing” differ from “Stream Processing” in Big Data analytics?
Answer:

 Batch Processing involves collecting large amounts of data and processing them in blocks or
batches over time. This method is more suitable for less time-sensitive tasks like analyzing
historical data.

 Stream Processing, on the other hand, processes data in real-time or near-real-time as it is
generated. This is used in scenarios where immediate analysis is required, such as
monitoring live data streams from sensors or social media feeds.
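The contrast in miniature, in plain Python; the events are invented for illustration:

```python
events = [3, 7, 2, 8, 5]

# Batch: wait for the complete block of data, then process it at once
# (e.g., a nightly report over historical data).
print("batch total:", sum(events))

# Stream: maintain a running result that updates as each event arrives
# (e.g., a live dashboard fed by sensors or social media).
running = 0
for e in events:
    running += e
    print("stream running total:", running)
```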

18. What are the ethical concerns associated with Big Data?
Answer: Ethical concerns around Big Data include:

 Privacy issues: Collecting personal data raises concerns about how that data is used and
whether it is adequately protected.

 Data misuse: There is a risk of using data for purposes other than what it was intended for,
such as targeting vulnerable populations for marketing or surveillance.
 Bias and discrimination: Algorithms based on biased data may result in discriminatory
practices, such as denying certain groups access to services or opportunities.

19. What tools are commonly used in Big Data analytics, and how do they help?
Answer: Common tools include:

 Hadoop: A framework for storing and processing large datasets using a distributed file
system.

 Tableau: A data visualization tool that helps present Big Data insights through graphs and
charts.

 R and Python: Programming languages widely used for statistical analysis and machine
learning in Big Data.

 Spark: A data processing engine that supports real-time analytics and machine learning.
These tools enable businesses to handle, process, analyze, and visualize Big Data effectively.

20. What are the future trends in Big Data Analytics?


Answer: Future trends include:

 Real-time analytics: With the increasing speed of data generation, real-time analytics will
allow businesses to make immediate decisions based on live data.

 AI and Machine Learning integration: These technologies will continue to enhance
predictive analytics, enabling more accurate forecasts.

 Quantum computing: This promises to accelerate data processing and enable the analysis of
even larger datasets.

 Data privacy regulations: As Big Data usage grows, more stringent regulations will be
implemented to protect user privacy and ensure ethical data practices.

Competency-Based Questions with Answers

1. A retail company wants to optimize its inventory management using Big Data. How would you
use Big Data analytics to solve this problem?
Answer:
To optimize inventory management using Big Data, I would first gather data from various sources,
such as sales transactions, customer preferences, and historical purchase patterns. By
using predictive analytics, I could forecast future demand for each product, helping the company to
maintain the right inventory levels. I would also analyze seasonal trends and regional variations to
ensure the inventory aligns with customer behavior and market fluctuations.
Additionally, descriptive analytics can be applied to identify patterns in past sales, highlighting which
products are overstocked or understocked. Finally, real-time data from customer interactions or web
searches can be analyzed to adjust inventory dynamically.
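As a minimal sketch of the predictive step, a 3-week moving average can stand in for a full demand-forecasting model; the weekly sales figures are invented for illustration:

```python
weekly_sales = [120, 135, 128, 150, 160, 155]   # units sold per week

def moving_average_forecast(history, window=3):
    """Forecast next week's demand as the mean of the last `window` weeks."""
    return sum(history[-window:]) / window

forecast = moving_average_forecast(weekly_sales)
print(f"forecast demand: {forecast:.0f} units")  # (150 + 160 + 155) / 3 = 155
```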

2. A healthcare organization is looking to use Big Data to predict disease outbreaks in specific
regions. What approach would you take to apply Big Data analytics in this case?
Answer:
To predict disease outbreaks using Big Data, I would collect data from a variety of sources, including
hospital records, social media, online health searches, and environmental factors (e.g., weather
conditions). Predictive analytics would be employed to identify early warning signs and detect
patterns that may signal an impending outbreak. For example, by analyzing historical data on past
outbreaks, weather trends, and mobility data, I could predict the likelihood of an outbreak occurring
in specific regions. Machine learning models could be used to refine predictions over time,
continuously learning from new data and improving the accuracy of future forecasts.
Additionally, real-time data streams from sensors, health reports, and news feeds can help monitor
and respond to outbreaks as they occur.

3. You are working for an e-commerce company that wants to personalize its marketing strategies
based on customer data. How would you leverage Big Data analytics to achieve this?
Answer:
To personalize marketing strategies, I would first analyze customer behavior data, including past
purchases, browsing history, and demographic information. Using cluster analysis and segmentation
techniques, I could identify distinct customer groups with similar preferences or
behaviors. Predictive analytics would then be used to forecast future purchases, and personalized
recommendations would be made to each group based on their past activity and preferences.
Additionally, sentiment analysis on customer reviews and social media posts could provide valuable
insights into how customers feel about certain products, which can be integrated into marketing
campaigns. A/B testing could be used to fine-tune personalized offers and promotions for maximum
engagement.
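A minimal sketch of the segmentation step using scikit-learn's KMeans (assumed available); the customer features and values are invented for illustration:

```python
from sklearn.cluster import KMeans

# Each customer is described by [annual spend, visits per month].
customers = [[200, 2], [220, 3], [1500, 12], [1600, 14], [800, 6], [760, 7]]

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(model.labels_)   # one cluster id per customer, e.g. budget / mid / premium
```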

4. An insurance company is dealing with fraudulent claims and wants to identify patterns in
fraudulent activities using Big Data. How would you approach this problem?
Answer:
To detect fraudulent claims, I would first collect data on past claims, customer profiles, and any
known fraudulent activities. Anomaly detection algorithms could be used to identify claims that
deviate from typical patterns, such as unusually high claim amounts or claims made in suspicious
circumstances. Using machine learning models, I could train the system to recognize characteristics
of fraudulent claims based on historical data, and continuously improve the model as more data
becomes available. I would also analyze social media and external databases to cross-check
customer information, helping to identify inconsistencies or patterns indicative of fraud. The use
of predictive modeling would help the insurance company proactively flag potentially fraudulent
claims before they are processed.
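A minimal sketch of the anomaly-detection step using scikit-learn's IsolationForest (assumed available); the claim records are invented for illustration:

```python
from sklearn.ensemble import IsolationForest

# Each claim is [claim amount, days since the policy started].
claims = [[1200, 400], [900, 350], [1100, 500], [15000, 5], [1000, 420]]

detector = IsolationForest(contamination=0.2, random_state=0).fit(claims)
print(detector.predict(claims))   # -1 marks anomalies, e.g. the 15000 claim
```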

5. A financial institution is looking to manage risks and optimize its investment strategies using Big
Data. How would you apply Big Data analytics to improve the institution’s decision-making
process?
Answer:
To optimize investment strategies, I would first gather diverse financial data, including stock market
trends, historical performance, economic indicators, and sentiment analysis from news and social
media. Using predictive analytics, I would forecast potential market movements, helping the
institution anticipate risks and identify profitable investment opportunities. I would apply descriptive
analytics to analyze past investment performance, detecting patterns that indicate high returns or
risk factors. Real-time data analytics could be leveraged to track market fluctuations and adjust
strategies dynamically. Furthermore, by incorporating machine learning algorithms, I could
continuously refine investment models based on changing market conditions, providing more
accurate recommendations for optimizing investment portfolios and minimizing risks.

Assertion-Reasoning Based Questions

For each question, choose the correct option:


a) Both A and R are true, and R is the correct explanation of A.
b) Both A and R are true, but R is not the correct explanation of A.
c) A is true, but R is false.
d) A is false, but R is true.

1. Assertion: Big Data refers to large, complex datasets that traditional data-processing applications
cannot handle.
Reasoning: Traditional data-processing systems are limited in their ability to manage and analyze Big
Data due to its size, complexity, and variety.
Answer: a) Both A and R are true, and R is the correct explanation of A.

2. Assertion: Hadoop is a key tool used for Big Data analytics.


Reasoning: Hadoop allows distributed storage and processing of large datasets across multiple
machines, making it scalable for Big Data applications.
Answer: a) Both A and R are true, and R is the correct explanation of A.

3. Assertion: Unstructured data is the easiest type of data to analyze in Big Data applications.
Reasoning: Unstructured data, such as text, images, and videos, does not follow a specific format or
structure, making it more challenging to process and analyze.
Answer: d) A is false, but R is true.

4. Assertion: The “Volume” characteristic of Big Data refers to the amount of data generated.
Reasoning: As data volume increases, the amount of computational resources required for
processing also increases, which can lead to challenges in managing Big Data.
Answer: a) Both A and R are true, and R is the correct explanation of A.

5. Assertion: Real-time analytics allows businesses to make decisions based on historical data.
Reasoning: Real-time analytics processes data as it is generated, providing immediate insights that
allow businesses to make timely decisions.
Answer: d) A is false, but R is true.

6. Assertion: The “Variety” of Big Data refers to the different formats of data, such as structured,
semi-structured, and unstructured.
Reasoning: The variety of data requires specialized tools and techniques for effective processing, as
each data type has different structures and storage needs.
Answer: a) Both A and R are true, and R is the correct explanation of A.

7. Assertion: Big Data analytics can only be applied to structured data.


Reasoning: Structured data is highly organized, which makes it easier to analyze compared to
unstructured or semi-structured data.
Answer: d) A is false, but R is true.

8. Assertion: Veracity in Big Data refers to the speed at which data is generated and analyzed.
Reasoning: Veracity is concerned with the accuracy and quality of data, rather than its speed,
ensuring that data is reliable for analysis.
Answer: d) A is false, but R is true.

9. Assertion: Predictive analytics is a type of Big Data analytics that uses historical data to forecast
future trends.
Reasoning: By analyzing past trends and behaviors, predictive analytics can anticipate future
outcomes, enabling businesses to plan accordingly.
Answer: a) Both A and R are true, and R is the correct explanation of A.

10. Assertion: The “Velocity” characteristic of Big Data is concerned with the variety of data sources.
Reasoning: Velocity refers to the speed at which data is generated and must be processed in real-
time or near-real-time, not the variety of data sources.
Answer: d) A is false, but R is true.

Data Preparation Methods:

1. Cleaning: Removing or correcting inconsistencies, errors, and missing data.
2. Validating: Ensuring data meets certain quality standards, such as format, range, and accuracy.
3. Transforming: Converting data from one format to another, such as standardizing dates or
converting text to numbers.
4. Enriching: Adding new information to the existing data, such as geographic coordinates or
demographic data.
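A compact pandas sketch (assuming pandas is available) walking through all four steps on an invented toy dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2024-01-05", "2024/01/06", None],
    "sales": ["100", "250", "175"],
    "store": ["S1", "S2", "S1"],
})

# 1. Cleaning: drop rows with missing dates.
df = df.dropna(subset=["date"])

# 2. Validating: enforce a numeric type and an expected range for sales.
df["sales"] = pd.to_numeric(df["sales"])
assert df["sales"].between(0, 10_000).all(), "sales out of range"

# 3. Transforming: standardize the date format.
df["date"] = pd.to_datetime(df["date"].str.replace("/", "-"))

# 4. Enriching: add a region column from an external lookup table.
region = {"S1": "West", "S2": "North"}
df["region"] = df["store"].map(region)
print(df)
```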
