
ARTIFICIAL INTELLIGENCE
CLASS XII

STUDENT HANDBOOK
2025-26

Subject Code: 843
UNIT 5: Introduction to Big Data and Data Analytics
Title: Introduction to Big Data and Data Analytics
Approach: Team discussion, Web search

Summary: Students will delve into the world of Big Data, a game-changer in today's
digital age. They will gain insights into the various types of data and their unique
characteristics, equipping them to understand how this vast information is managed
and analysed. The journey continues as students discover the real-world applications
of Big Data and Data Analytics in diverse fields, witnessing how this revolutionary
concept is transforming how we approach data analysis to unlock new possibilities.

Learning Objectives:
1. Students will develop an understanding of the concept of Big Data and its
development in the new digital era.
2. Students will appreciate the role of Big Data in AI and Data Science.
3. Students will understand the features of Big Data and how these features are
handled in Big Data Analytics.
4. Students will appreciate the applications of Big Data in various fields and how this
new concept has evolved to bring new dimensions to Data Analysis.
5. Students will understand the term mining data streams.

Key Concepts:
1. Introduction to Big Data
2. Types of Big Data
3. Advantages and Disadvantages of Big Data
4. Characteristics of Big Data
5. Big Data Analytics
6. Working on Big Data Analytics
7. Mining Data Streams
8. Future of Big Data Analytics

Learning Outcomes:
Students will be able to –
1. Define Big Data and identify its various types.
2. Evaluate the advantages and disadvantages of Big Data.
3. Recognize the characteristics of Big Data.
4. Explain the concept of Big Data Analytics and its significance.
5. Describe how Big Data Analytics works.
6. Explore future trends and advancements in Big Data Analytics.

Prerequisites: Understanding the concept of data and reasonable fluency in the
English language.

5.1. What is Big Data?
To understand Big Data, let us first understand small data.

Small data refers to datasets that are easily comprehensible by people, as they are
easily accessible, informative, and actionable. This makes it ideal for individuals and
businesses to find useful information and make better choices in everyday tasks. For
example, a small store might track daily sales to decide what products to restock.
Fig. 5.1 Sources of Big Data

Big Data refers to extremely large and complex datasets that regular computer programs
and databases cannot handle. It comes from three main sources: transactional data (e.g.,
online purchases), machine data (e.g., sensor readings), and social data (e.g., social media
posts). To analyze and use Big Data effectively, special tools and techniques are required.
These tools help organizations find valuable insights hidden in the data, which lead to
innovations and better decision-making. For example, companies like Amazon and Netflix
use Big Data to recommend products or shows based on users’ past activities.

5.2. Types of Big Data

Fig. 5.2

| Aspect | Structured Data | Semi-Structured Data | Unstructured Data |
| --- | --- | --- | --- |
| Definition | Quantitative data with a defined structure | A mix of quantitative and qualitative properties | No inherent structure or formal rules |
| Data Model | Dedicated data model | May lack a specific data model | Lacks a consistent data model |
| Organization | Organized in clearly defined columns | Less organized than structured data | No organization; exhibits variability over time |
| Accessibility | Easily accessible and searchable | Accessibility depends on the specific data format | Accessible but may be harder to analyze |
| Examples | Customer information, transaction records, product directories | XML files, CSV files, JSON files, HTML files, PDFs, semi-structured documents | Audio files, images, video files, emails, social media posts |
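To make the distinction concrete, here is a minimal Python sketch that loads one example of each type. The file names (customers.csv, product.json, review.txt) and field names are hypothetical, chosen only to illustrate the idea.

```python
# Minimal sketch: reading the three types of data in Python.
# All file names and fields below are hypothetical examples.
import csv
import json

# Structured: fixed rows and columns with a defined schema.
with open("customers.csv", newline="") as f:
    for row in csv.DictReader(f):             # each row maps column -> value
        print(row["Name"], row["Email"])

# Semi-structured: tagged fields, but no rigid tabular schema.
with open("product.json") as f:
    product = json.load(f)                    # nested keys may vary per record
    print(product.get("name"), product.get("price"))

# Unstructured: raw content with no inherent fields at all.
with open("review.txt") as f:
    text = f.read()                           # needs text mining / NLP to analyze
    print(len(text.split()), "words")
```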

5.3. Advantages and Disadvantages of Big Data:

Big Data is a key to modern innovation. It has changed how organizations analyze and use
information. While it offers great benefits, it also comes with challenges that affect its use
in different industries. In this section, we will be discussing a few pros and cons of big data.

Advantages:
● Enhanced Decision Making: Big Data analytics empowers organizations to make
data-driven decisions based on insights derived from large and diverse datasets.
● Improved Efficiency and Productivity: By analyzing vast amounts of data,
businesses can identify inefficiencies, streamline processes, and optimize resource
allocation, leading to increased efficiency and productivity.
● Better Customer Insights: Big Data enables organizations to gain a deeper
understanding of customer behavior, preferences, and needs, allowing for
personalized marketing strategies and improved customer experiences.
● Competitive Advantage: Leveraging Big Data analytics provides organizations with
a competitive edge by enabling them to uncover market trends, identify
opportunities, and stay ahead of competitors.
● Innovation and Growth: Big Data fosters innovation by facilitating the development
of new products, services, and business models based on insights derived from data
analysis, driving business growth and expansion.

Disadvantages:
● Privacy and Security Concerns: The collection, storage, and analysis of large
volumes of data raise significant privacy and security risks, including unauthorized
access, data breaches, and misuse of personal information.
● Data Quality Issues: Ensuring the accuracy, reliability, and completeness of data can
be challenging, as Big Data often consists of unstructured and heterogeneous data
sources, leading to potential errors and biases in analysis.
● Technical Complexity: Implementing and managing Big Data infrastructure and
analytics tools require specialized skills and expertise, leading to technical
challenges and resource constraints for organizations.
● Regulatory Compliance: Organizations face challenges in meeting data protection
laws like GDPR (General Data Protection Regulation) and The Digital Personal Data
Protection Act, 2023. These laws require strict handling of personal data, making
compliance essential to avoid legal risks and penalties.
● Cost and Resource Intensiveness: The cost of acquiring, storing, processing, and
analyzing Big Data, along with hiring skilled staff, can be high. This is especially
challenging for smaller organizations with limited budgets and resources.

Activity: Find the sources of big data using the link UNSTATS

5.4. Characteristics of Big Data


The “characteristics of Big Data” refer to the defining attributes that distinguish large
and complex datasets from traditional data sources. These characteristics are commonly
described using the "3Vs" framework: Volume, Velocity, and Variety. The extended 6Vs
framework provides a holistic view of Big Data, emphasizing not only its volume,
velocity, and variety but also its veracity, variability, and value. Understanding and
addressing these six dimensions is essential for effectively managing, analyzing, and
deriving value from Big Data in various domains.
Fig. 5.3 Characteristics of Big Data

5.4.1. Velocity: Velocity refers to the speed at which data is generated, delivered, and
analyzed. In the present world, where millions of people are accessing and storing
information online, the speed at which data gets generated and stored is huge. For
example, Google alone handles more than 40,000 search queries per second. See the
statistics in the figure provided. Isn't that huge!

Fig. 5.4 Speed of data generation from various sources

5.4.2. Volume: Every day a huge volume of data is generated, as the number of people
using online platforms has increased exponentially. Such a huge volume of data is
considered Big Data. Typically, if the data volume exceeds gigabytes, it falls into the
realm of big data. This volume can range from terabytes to petabytes or even exabytes,
based on surveys conducted by various organizations. According to the latest estimates,
328.77 million terabytes of data are created each day.

Fig. 5.5 Volume of data

5.4.3. Variety: Big data encompasses data in various formats, including structured,
unstructured, semi-structured, or highly complex structured data. These can range from
simple numerical data to complex and diverse forms such as text, images, audio, videos,
and so on. Storing and processing unstructured data through an RDBMS is challenging.
However, unstructured data often provides valuable insights that structured data cannot
offer. Additionally, the variety of data sources within big data provides information on
the diversity of data.

Fig. 5.6 Varieties in Big data

5.4.4. Veracity: Veracity is a characteristic of Big Data related to consistency, accuracy,
quality, and trustworthiness. Not all data that undergoes processing holds value.
Therefore, it is essential to clean data effectively before storing or processing it,
especially when dealing with massive volumes. Veracity addresses this aspect of big
data, focusing on the accuracy and reliability of the data source and its suitability for
analytical models.

Fig. 5.7

5.4.5. Value: The goal of big data analysis lies in extracting business value from the
data. Hence, the business value derived from big data is perhaps its most critical
characteristic. Without valuable insights, the other characteristics of big data hold little
significance. In simple terms, the value of Big Data refers to the benefits it can provide.

Fig. 5.8 The value of Big Data

5.4.6. Variability: Variability refers to establishing whether the contextualizing structure
of the data stream is regular and dependable, even in conditions of extreme
unpredictability. It reflects the need to obtain meaningful data under all possible
circumstances.

Fig. 5.9

Case Study: How a Company Uses 3V and 6V Frameworks for Big Data
Company: An OTT Platform ‘OnDemandDrama’
3V Framework:
Volume: OnDemandDrama processes huge amounts of data from millions of users, including watch
history, ratings, searches, and preferences to offer personalized content recommendations.
Velocity: Data is processed in real-time, allowing OnDemandDrama to immediately adjust
recommendations, track the patterns of the users, and offer trending content based on their
activity.
Variety: The platform handles diverse data such as user profiles, watch lists, video content, and
user reviews which are categorized as structured, semi-structured, and unstructured data.
6V Framework:
Along with the above three Vs of big data, the 6V framework involves three more
characteristics: Veracity, Value, and Variability.

Veracity: OnDemandDrama filters out irrelevant or low-quality data (such as incomplete profiles) to
ensure accurate content recommendations.
Value: OnDemandDrama uses the data to personalize user experiences, driving engagement and
retention by recommending shows and movies that match individual tastes.
Variability: OnDemandDrama handles changes or inconsistencies in data streams caused by factors
like user behavior, trends, or any other external events. For example, user preferences can vary
based on region, time, or trends.
By using the 3V and 6V frameworks, OnDemandDrama can manage, process, and derive valuable
insights from its Big Data, which enhances customer satisfaction and drives business decisions.

5.5. Big Data Analytics


Data Analytics

Data analytics involves analyzing datasets to uncover insights, trends, and patterns. It
can be applied to datasets of any size, from small to moderate volumes. Technologies
commonly used in data analytics include statistical analysis software, data visualization
tools, and relational database management systems (RDBMS).

Big data analytics uses advanced analytic techniques against huge, diverse datasets that
include structured, semi-structured, and unstructured data, from different sources, and in
various sizes from terabytes to zettabytes.

Big Data Analytics encompasses the methodologies, tools, and practices involved in
analyzing and managing data, covering tasks such as data collection, organization, and
storage. The primary objective of data analytics is to utilize statistical analysis and
technological methods to uncover patterns and address challenges. In the business
realm, big data analytics has gained significance as a means to assess and refine
business processes, as well as enhance decision-making and overall business
performance. It provides valuable insights and forecasts that help businesses make
informed decisions to improve their operations and outcomes. Different types of Big
Data Analytics can help businesses and organizations find insights from large and
complex datasets. Some of the common types are descriptive analytics, diagnostic
analytics, predictive analytics, and prescriptive analytics, which we have discussed in
Unit 2 of Data Science Methodology.

Big Data Analytics emerges as a consequence of four significant global trends:
1. Moore’s Law: The exponential growth of computing power as per Moore's Law has
enabled the handling and analysis of massive datasets, driving the evolution of Big
Data Analytics.
2. Mobile Computing: With the widespread adoption of smartphones and mobile
devices, access to vast amounts of data is now at our fingertips, enabling real-time
connectivity and data collection from anywhere.
3. Social Networking: Platforms such as Facebook, Foursquare, and Pinterest facilitate
extensive networks of user-generated content, interactions, and data sharing,
leading to the generation of massive datasets ripe for analysis.
4. Cloud Computing: This paradigm shift in technology infrastructure allows
organizations to access hardware and software resources remotely via the Internet
on a pay-as-you-go basis, eliminating the need for extensive on-premises hardware
and software investments.

5.6. Working on Big Data Analytics


Big data analytics involves collecting, processing, cleaning, and analyzing enormous
datasets to improve organizational operations. The working process of big data analytics
includes the following steps –

Step 1. Gather data


Each company has a unique approach to data collection. Organizations can now collect
structured and unstructured data from various sources, including cloud storage, mobile apps,
and IoT sensors.

Step 2. Process Data


Once data is collected and stored, it must be processed properly to get accurate results on
analytical queries, especially when it’s large and unstructured. Various processing options
are available:
● Batch processing, which looks at large blocks of data over time.
● Stream processing, which looks at small batches of data at once, shortening the
delay between collection and analysis for quicker decision-making.
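As a rough illustration of the difference (not a full streaming system), the Python sketch below computes an average both ways; the records() generator is a hypothetical source standing in for a feed of sales amounts.

```python
# Sketch: batch vs. stream processing of the same data.
# records() is a hypothetical data source yielding sales amounts.
from statistics import mean

def records():
    yield from [120.0, 80.5, 95.0, 210.0, 60.0, 150.0]

# Batch processing: accumulate a large block of data, then analyze it in one pass.
batch = list(records())
print("batch average:", mean(batch))

# Stream processing: analyze each record as it arrives, so insights are
# available before the full dataset exists.
total, count = 0.0, 0
for amount in records():
    total += amount
    count += 1
    print(f"running average after {count} records: {total / count:.2f}")
```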

Step 3. Clean Data


Scrubbing all data, regardless of size, improves quality and yields better results. Correct
formatting and elimination of duplicate or irrelevant data are essential. Erroneous and
missing data can lead to inaccurate insights.
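The pandas sketch below shows what this scrubbing can look like in practice; the miniature table and its column names are invented for illustration.

```python
# Sketch: basic data cleaning with pandas on a tiny, made-up dataset.
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "delhi ", "Mumbai", "Mumbai", None],
    "sales": [250, 250, 310, 310, 180],
})

df["city"] = df["city"].str.strip().str.title()  # fix inconsistent formatting
df = df.drop_duplicates()                        # remove duplicate records
df = df.dropna(subset=["city"])                  # drop rows missing key fields
print(df)
```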

Step 4. Analyze Data


Getting big data into a usable state takes time. Once it’s ready, advanced analytics processes
can turn big data into big insights.

Example: Data Analytics Tools – Tableau, Apache Hadoop, Cassandra, MongoDB, SAS

Using Orange Data Mining for Big Data Analytics

We will explore how big data analysis can be performed using Orange Data Mining.

Step 1: Gather Data


1. Use the File widget to load data into Orange.
2. Load the desired dataset. For demonstration, we will use the built-in Heart Disease
dataset.

It is important to carefully study the dataset and understand the features and target
variable.

● Features: age, gender, chest pain, resting blood pressure (rest_sbp), cholesterol,
resting ECG (rest_ecg), maximum heart rate (max_hr), etc.
● Target: diameter narrowing.

If the value for diameter narrowing is 1, it signifies significant narrowing of the arteries,
which is a risk factor for heart disease. If the value is 0, it indicates healthier arteries
with minimal or no narrowing.

Step 2: Process Data
Data processing involves preparing the data for accurate analysis. There are two methods:

1. Batch Processing: Use the Preprocess widget to normalize large chunks of
structured data at once.
2. Stream Processing (near-real-time): While Orange does not natively support live
stream data, you can incrementally process smaller subsets of the data in parallel
workflows.
Here, we will focus on the Normalization technique.
Normalization in data preprocessing refers to scaling numerical values to a specific
range (e.g., 0 to 1 or −1 to 1), making them comparable and improving the performance
of machine learning algorithms.
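As a minimal sketch of the formula behind this kind of min-max scaling (the cholesterol values below are illustrative, not taken from the dataset):

```python
# Sketch: min-max normalization, mapping values onto a chosen interval.
def normalize(values, low=0.0, high=1.0):
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:                       # all values identical: nothing to scale
        return [low] * len(values)
    # map each value from [vmin, vmax] onto [low, high]
    return [low + (v - vmin) / span * (high - low) for v in values]

cholesterol = [233, 286, 199, 354, 250]     # illustrative values only
print(normalize(cholesterol))               # all results lie between 0 and 1
print(normalize(cholesterol, -1, 1))        # or between -1 and 1
```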

Step 2.1: Normalize Data


1. Connect the Preprocess widget to the File or Data Table widget.
2. Double-click on the Preprocess widget and select "Normalize Features".
3. Choose an interval, such as 0 to 1 or −1 to 1.

Step 2.2: Verify Normalized Data


1. Connect the Data Table widget to the Preprocess widget.
2. Open the Data Table to observe the differences in values.

You will see that all numerical values are now scaled between 0 and 1.

Step 3: Clean Data
Data cleaning is essential to ensure quality results. We will use the Impute widget to
handle missing values by replacing them with the mean, median, mode, or a custom
value. In this dataset, some values are missing, as shown in the figure below. This
dataset with missing values is saved as heart data.xlsx in the computer folder.

Step 3.1: Upload Data


1. Use the File widget to upload a dataset with missing values.
2. Assign the role of "Target" to the feature you want to predict.

Step 3.2: Handle Missing Values


1. Connect the Impute widget to the File widget.
2. Double-click the Impute widget and select an imputation strategy:
Average (mean), Most frequent (mode), Fixed value, Random value

Step 3.3: Verify Cleaned Data
1. Connect the Data Table widget to the Impute widget.
2. Open the Data Table to confirm the missing values have been replaced.

Missing values are now filled with the chosen method (e.g., average values).
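For comparison, here is roughly what the Impute widget does, expressed in pandas; the two-column table and its values are made up for the example.

```python
# Sketch: filling missing values, mirroring two of the Impute strategies.
import pandas as pd

df = pd.DataFrame({
    "age": [63, 41, None, 56],
    "cholesterol": [233, None, 199, 286],
})

df["age"] = df["age"].fillna(df["age"].mean())      # average (mean)
df["cholesterol"] = df["cholesterol"].fillna(200)   # fixed value
print(df)  # no missing values remain
```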

Step 4: Analyze Data


After cleaning, Orange provides various advanced analytics tools to extract insights:

● K-Means: For segmenting data into clusters.
● Logistic Regression / Decision Tree: For predicting outcomes using labeled data.
● Scatter Plot / Box Plot / Heat Map: For visualizing data patterns and relationships.
Step 4.1: Build a Logistic Regression Model
1. Drag and drop the Logistic Regression widget.
2. Connect it to the cleaned and normalized data.

Step 4.2: Test the Model


1. Add the Test and Score widget.
2. Connect the Test and Score widget to:
a. The Logistic Regression widget (learner data)
b. The processed data.

Step 4.3: Choose a Validation Method
1. Double-click the Test and Score widget. Select a validation method (e.g.,
Cross-Validation).

Step 4.4: Generate Predictions


Connect the Predict widget to the Test and Score widget.

Check the predictions generated using the Logistic Regression model.
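If you prefer code to widgets, the scikit-learn sketch below mirrors the same workflow: normalize the features, fit a logistic regression model, and score it with cross-validation. It uses scikit-learn's built-in breast cancer dataset as a stand-in, since the Heart Disease file may not be available outside Orange.

```python
# Sketch: normalize -> logistic regression -> cross-validation in scikit-learn.
# The built-in breast cancer dataset stands in for the Heart Disease data.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)

# The pipeline plays the role of the Preprocess + Logistic Regression widgets.
model = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation, analogous to choosing Cross-Validation in Test and Score.
scores = cross_val_score(model, X, y, cv=5)
print("accuracy per fold:", scores.round(3))
```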

5.7. Mining Data Streams

To understand mining data streams, let us first understand what a data stream is. A data
stream is a continuous, real-time flow of data generated by various sources. These sources
can include sensors, satellite image data, Internet and web traffic, etc.
Mining data streams refers to the process of extracting meaningful patterns, trends,
and knowledge from a continuous flow of real-time data. Unlike traditional data mining, it
processes data as it arrives, without storing it completely. An example of an area where
data stream mining can be applied is website data. Websites typically receive continuous
streams of data daily. For instance, a sudden spike in searches for "election results" on a
particular day might indicate that elections were recently held in a region or highlight the
level of public interest in the results.
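A minimal sketch of this idea in Python, assuming a made-up feed of daily query counts: keep only a small sliding window of recent counts and flag a sudden spike as it arrives, without ever storing the full history.

```python
# Sketch: detecting a spike in a data stream with a small sliding window.
from collections import deque

WINDOW = 5          # how many recent counts to remember
SPIKE_FACTOR = 3.0  # flag counts this many times above the recent average

recent = deque(maxlen=WINDOW)
# Hypothetical daily counts of searches for "election results".
query_counts = [40, 38, 45, 42, 41, 300, 310, 50]

for count in query_counts:
    if len(recent) == WINDOW:
        avg = sum(recent) / WINDOW
        if count > SPIKE_FACTOR * avg:
            print(f"spike detected: {count} vs recent average {avg:.0f}")
    recent.append(count)  # old counts fall out; full history is never stored
```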

5.8. Future of Big Data Analytics

The future of Big Data Analytics is highly influenced by several key technological
advancements that will shape the way data is processed and analyzed. A few of them are:

Real-Time Analytics: It will allow businesses to process data instantaneously, providing
immediate insights for decision-making and enabling actions based on live data, such as
monitoring customer behavior or tracking supply chain activities.

Development of Advanced Models in Predictive Analytics: Predictive analytics will
evolve with the integration of more sophisticated machine learning and AI algorithms,
enabling organizations to forecast trends and behaviors with greater precision.

Quantum Computing: Quantum computing promises to revolutionize Big Data analytics
by offering unprecedented processing power. Quantum computers will be able to solve
complex problems much faster than classical computers.

-----------------------------------------------------------------------------------------------------

Activity 1: Note – This is a research-based group activity


i) Watch this video using the link https://www.youtube.com/watch?v=37x5dKW-X5U
ii) Form a group, explore the applications of Big Data & Data Analytics in the following
fields, and fill in the table given below:

| Field | Video resource | Insights drawn about this field and its futuristic development |
| --- | --- | --- |
| Education | | |
| Environmental Science | | |
| Media and Entertainment | | |

Activity-2
List the steps involved in the working process of Big Data analytics.
Step 1:

Step 2:

Step 3:

Step 4:

EXERCISES

A. Multiple Choice questions


1. What does "Volume" refer to in the context of big data?
a) The variety of data types b) The speed at which data is generated
c) The amount of data generated d) The veracity of the data

2. Which of the following is a key characteristic of big data?


a) Structured format b) Easily manageable size
c) Predictable patterns d) Variety

3. Which of the following is NOT one of the V's of big data?


a) Velocity b) Volume c) Verification d) Variety

4. What is the primary purpose of data preprocessing in big data analytics?


a) To increase data volume b) To reduce data variety
c) To improve data quality d) To speed up data processing

5. Which technique is commonly used for analyzing large datasets to discover patterns
and relationships?
a) Linear regression b) Data mining c) Decision trees d) Naive Bayes

6. Which term describes the process of extracting useful information from large
datasets?
a) Data analytics b) Data warehousing c) Data integration d) Data virtualization

7. Which of the following is a potential benefit of big data analytics?


a) Decreased data security b) Reduced operational efficiency
c) Improved decision-making d) Reduced data privacy

8. What role does Hadoop play in big data processing?
a) Hadoop is a programming language used for big data analytics.
b) Hadoop is a distributed file system for storing and processing big data.
c) Hadoop is a data visualization tool.
d) Hadoop is a NoSQL database management system.

9. What is the primary challenge associated with the veracity aspect of big data?
a) Handling large volumes of data
b) Ensuring data quality and reliability
c) Dealing with diverse data types
d) Managing data processing speed

B. True or False

1. Big data refers to datasets that are too large to be processed by traditional
database systems.
2. Structured data is the primary type of data processed in big data analytics, making
up the majority of datasets.
3. Veracity refers to the trustworthiness and reliability of data in big data analytics
4. Real-time analytics involves processing and analyzing data as it is generated, without
any delay.
5. Cloud computing is the only concept used in Big Data Analytics.
6. A CSV file is an example of structured data.
7. “Positive, Negative, and Neutral” are terms related to Sentiment Analysis.
8. Data preprocessing is a critical step in big data analytics, involving cleaning,
transforming, and aggregating data to prepare it for analysis.
9. To analyze vast collections of textual materials to capture key concepts, trends, and
hidden relationships, the concept of Text mining is used.

C. Short answer questions


1. Define the term Big Data.
2. What does the term Volume refer to in Big Data?
3. Mention some important benefits of big data in the health sector.
4. Enlist the four types of Big Data Analytics.

D. Long answer questions
1. Explain the 6 V’s related to Big data.
2. Explain the differences between structured, semi-structured, and unstructured data.
3. Explain the process of Big Data Analytics.
4. Why is Big Data Analytics important in modern industries and decision-making
processes?
5. A healthcare company is using Big Data analytics to manage patient records, predict
disease outbreaks, and personalize treatments. However, the company is facing
challenges regarding data privacy, as patient information is highly sensitive. What
are the potential risks to patient privacy when using Big Data in healthcare, and how
can these be mitigated?
6. Given the following list of data types, categorize each as Structured, Unstructured,
or Semi-Structured:
a) A customer database with fields such as Name, Address, Phone Number, and
Email.
b) A JSON file containing product information with attributes like name, price, and
specifications.
c) Audio recordings of customer service calls.
d) A sales report in Excel format with rows and columns.
e) A collection of social media posts, including text, images, and hashtags.
f) A CSV file with daily temperature readings for the past year.

E. Competency Based Questions:


1. A retail clothing store is experiencing a decline in sales despite strong marketing
campaigns. You are tasked with using big data analytics to identify the root cause.
a. What types of customer data can be analyzed?
b. How can big data analytics be used to identify buying trends and customer
preferences?
c. Can you recommend specific data visualization techniques to present insights
to stakeholders?
d. How might these insights be used to personalize customer experiences and
improve sales?
2. A research institute is conducting a study on public sentiment towards environmental
conservation efforts. They aim to gather insights from various data sources to
understand public opinions and perceptions. They collect data from diverse sources
such as news articles, online forums, blog posts, and social media comments. Which
type of data does this description represent?
