Deep Learning U6

The document discusses the significance of AI-powered Big Data Analytics in decision-making, highlighting enhanced predictive analytics, real-time decision-making, and improved accuracy. It also compares Apache Cassandra to traditional relational databases, emphasizing its scalability, fault tolerance, and flexible data model. Additionally, it explains Apache Spark's architecture and key components, defines dark data, and outlines streaming and real-time analytics.

Uploaded by

Robert Stark

Q7) a) Significance of the Rise of AI-powered Big Data Analytics in Data-driven Decision-making & Business Intelligence [6 Marks]
The integration of AI with Big Data Analytics has transformed how businesses make decisions and derive insights. Its significance includes:
1. Enhanced Predictive Analytics: AI algorithms can analyze historical data to forecast trends, consumer behavior, and market dynamics, aiding proactive decision-making.
2. Real-time Decision Making: AI enables rapid processing of massive data streams in real time, which is crucial for time-sensitive sectors like finance, healthcare, and e-commerce.
3. Improved Accuracy & Efficiency: AI minimizes human error and automates data processing, providing accurate insights faster than traditional methods.
4. Personalization & Customer Insights: AI analyzes customer data to deliver personalized recommendations and targeted marketing, boosting customer engagement and loyalty.
5. Cost Optimization: By identifying inefficiencies and optimizing operations, AI-powered analytics help reduce costs and allocate resources more effectively.
6. Competitive Advantage: Organizations using AI in analytics gain deeper insights than competitors relying solely on conventional analytics, enabling better strategic planning.
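As a minimal illustration of point 1 (forecasting trends from historical data), the sketch below fits an ordinary least-squares line to a short series and extrapolates one step ahead. The sales figures and the function names `fit_linear_trend` and `forecast` are hypothetical; real AI-powered systems use far richer models, and linear regression is used here only because it is the simplest predictive model.

```python
def fit_linear_trend(values):
    """Fit y = a + b*t by ordinary least squares, where t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Slope = covariance(t, y) / variance(t); intercept passes through the means.
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values))
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend line steps_ahead periods past the last observation."""
    intercept, slope = fit_linear_trend(values)
    t_next = len(values) - 1 + steps_ahead
    return intercept + slope * t_next


# Hypothetical monthly sales figures (illustrative only).
monthly_sales = [100, 110, 120, 130]
print(forecast(monthly_sales))  # 140.0
```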
b) Advantages of Cassandra over Traditional Relational Databases [6 Marks]
Apache Cassandra is a distributed NoSQL database designed for handling large volumes of data across many servers. Its advantages over traditional RDBMS include:
1. High Scalability: Cassandra scales horizontally, allowing nodes to be added without downtime, whereas traditional RDBMS typically scale vertically and are limited by the capacity of a single server.
2. Fault Tolerance & High Availability: Data is automatically replicated across multiple nodes, ensuring no single point of failure and continuous availability.
3. Decentralized Architecture: All nodes in Cassandra are equal peers (there is no master), avoiding the bottlenecks typical of master-slave relational databases.
4. Write & Read Performance: Optimized for high-speed writes, Cassandra can handle massive write loads with low latency, which makes it ideal for IoT, logs, and sensor data.
5. Flexible Data Model: Supports dynamic schema changes and complex data types without downtime, whereas RDBMS require rigid schemas.
6. Big Data Integration: Easily integrates with big data tools like Hadoop, Spark, and Kafka, making it well-suited for modern data pipelines and analytics.
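Points 1 to 3 rest on how Cassandra spreads data over peer nodes. The following is a minimal sketch under simplifying assumptions: each node owns one token on a hash ring, a row's partition key is hashed onto the ring, and the row is stored on the first node clockwise plus the next RF - 1 nodes (similar in spirit to Cassandra's SimpleStrategy). The `TokenRing` class, the MD5-based hash, and the node names are illustrative; Cassandra's default partitioner is Murmur3, and real clusters use virtual nodes.

```python
import bisect
import hashlib


class TokenRing:
    """Simplified consistent-hash ring: one token per node, no virtual nodes (hypothetical)."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._tokens = []  # sorted list of node tokens
        self._owner = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Illustrative hash only; Cassandra uses Murmur3 by default.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        """Scale out: insert the new node's token; no other node has to be reconfigured."""
        token = self._hash(node)
        bisect.insort(self._tokens, token)
        self._owner[token] = node

    def replicas(self, partition_key):
        """Nodes storing this key: the first token >= hash(key), wrapping around, plus the next RF-1."""
        if not self._tokens:
            return []
        size = len(self._tokens)
        start = bisect.bisect_left(self._tokens, self._hash(partition_key)) % size
        count = min(self.replication_factor, size)
        return [self._owner[self._tokens[(start + i) % size]] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct peers hold copies of this row
ring.add_node("node-e")          # adding a node only takes over keys adjacent to its token
print(ring.replicas("user:42"))
```

Because any node can compute the replica list for any key, there is no master to route through, and losing one node still leaves RF - 1 copies available.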
c) What is Apache Spark? Explain the main components of Spark architecture. [5]
Apache Spark is an open-source, distributed computing system designed for fast processing of large-scale data. It supports in-memory computing: intermediate results can be kept in memory instead of being written to disk between steps, which makes it significantly faster than disk-based tools like Hadoop MapReduce, especially for iterative workloads. Spark is widely used for data analytics, machine learning, stream processing, and graph computation.
Main Components of Spark Architecture:
1. Driver Program:
o Acts as the main controller of the Spark application.
o It converts user code into tasks, schedules them, and manages their execution on the cluster.
2. Cluster Manager:
o Responsible for managing the cluster resources.
o Spark can work with various cluster managers, such as:
• Standalone
• Hadoop YARN
• Kubernetes
o It allocates resources to different applications.
3. Executors:
o Worker processes that run individual tasks and store data for processing (in memory or on disk).
o Each Spark application has its own set of executors, which are launched once and run for the entire lifetime of the application.
4. Tasks:
o The smallest unit of work in Spark.
o Each task processes one partition of data; the driver assigns tasks to executors, and every task belongs to a stage of a job.
5. RDDs (Resilient Distributed Datasets):
o The core data structure in Spark.
o Immutable, distributed collections of objects that can be processed in parallel. They are "resilient" because Spark records each RDD's lineage and can recompute lost partitions after a failure.
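The interaction between these components can be sketched without a cluster. The toy below is a plain-Python analogy, not Spark's API: the `run_word_count` function plays the driver by splitting the input into partitions and creating one task per partition, a thread pool stands in for the executors, and a hash-based shuffle separates the map stage from the reduce stage, mirroring how Spark divides a job into stages. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Driver: divide the dataset into roughly equal partitions."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_task(partition):
    """Stage 1 task: count words within a single partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def shuffle(partial_counts, num_reducers):
    """Route every word to one reducer bucket by hashing it, as a shuffle does."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for counts in partial_counts:
        for word, n in counts.items():
            buckets[hash(word) % num_reducers][word].append(n)
    return buckets


def reduce_task(bucket):
    """Stage 2 task: sum the partial counts for each word in the bucket."""
    return {word: sum(ns) for word, ns in bucket.items()}


def run_word_count(lines, num_partitions=4, num_executors=2):
    """Driver: build tasks for each stage and hand them to the (simulated) executors."""
    partitions = split_into_partitions(lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial = list(executors.map(map_task, partitions))  # one task per partition
        buckets = shuffle(partial, num_reducers=num_partitions)
        reduced = list(executors.map(reduce_task, buckets))
    result = {}
    for part in reduced:
        result.update(part)  # buckets hold disjoint words, so nothing is overwritten
    return result


lines = ["spark runs tasks", "executors run tasks", "the driver schedules tasks"]
print(run_word_count(lines))
```

In real Spark the same shape appears as a `flatMap`/`reduceByKey` job, with the cluster manager supplying the executor processes.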
Q 8 a) What is Dark Data? Explain the Different Types of Dark Data [6 Marks]
Definition of Dark Data:
Dark Data refers to data that organizations collect, process, and store during regular business activities but do not use for analysis, decision-making, or business intelligence. This data remains "in the dark" due to a lack of awareness, tools, or perceived value.
Types of Dark Data:
1. Log Files: System, application, and security logs that are stored but often ignored unless issues arise.
2. Email & Communication Archives: Emails, chat logs, and call recordings stored for compliance but not analyzed for insights.
3. Sensor & Machine Data: IoT devices, manufacturing equipment, and network hardware generate data that is often stored without further processing.
4. Customer Support Records: Past service tickets, chat transcripts, and call recordings that could provide insights but are rarely analyzed.
5. Social Media Data: Unused interactions, likes, comments, and shares collected via social media monitoring tools.
6. Document Repositories: Reports, spreadsheets, and PDFs stored in file systems or SharePoint without metadata tagging or indexing.
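To illustrate how one type of dark data (log files, type 1) can be put to use, here is a minimal sketch that parses application log lines and summarizes error counts per component. The log format, the regular expression, and the `summarize_errors` function are assumptions made up for this example; real log formats vary by system.

```python
import re
from collections import Counter

# Assumed log format (illustrative only): "<date> <time> <LEVEL> [<component>] <message>"
LOG_PATTERN = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+) \[(?P<component>[^\]]+)\] (?P<message>.*)$"
)


def summarize_errors(log_lines):
    """Count ERROR entries per component; lines that do not match the format are skipped."""
    errors = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("level") == "ERROR":
            errors[match.group("component")] += 1
    return errors


sample_logs = [
    "2024-05-01 10:00:01 INFO [payments] request served",
    "2024-05-01 10:00:02 ERROR [payments] gateway timeout",
    "2024-05-01 10:00:03 ERROR [auth] token expired",
    "2024-05-01 10:00:04 ERROR [payments] gateway timeout",
    "malformed line",
]
print(summarize_errors(sample_logs).most_common())  # [('payments', 2), ('auth', 1)]
```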
b) Explain the Following Terms:
i) Streaming Analytics [6 Marks]
Streaming Analytics refers to the real-time processing and analysis of continuous data streams. It extracts insights from data as it is generated, rather than storing it first and analyzing it later in batches.
Key Features:
• Processes data in motion (e.g., sensor data, clickstreams, financial transactions).
• Supports real-time alerting, monitoring, and decision-making.
• Common tools: Apache Kafka, Apache Flink, Apache Storm, Spark Streaming.
Applications:
• Fraud detection in banking.
• Real-time recommendation engines.
• Network monitoring and cybersecurity.
• Predictive maintenance in manufacturing.
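A minimal sketch of streaming analytics for the fraud-detection application, assuming each transaction arrives as a `(timestamp_seconds, card_id)` event in time order: the detector handles events one at a time as they arrive, keeps only a sliding window of recent timestamps per card, and raises an alert when a card exceeds a threshold inside the window. The `VelocityDetector` class, the 60-second window, and the threshold of 3 are hypothetical illustrative values, not a real fraud rule; in production, logic like this would run inside one of the streaming tools listed above.

```python
from collections import defaultdict, deque


class VelocityDetector:
    """Flags a card that makes more than max_txns transactions within window_seconds."""

    def __init__(self, window_seconds=60, max_txns=3):
        self.window_seconds = window_seconds
        self.max_txns = max_txns
        self._recent = defaultdict(deque)  # card_id -> timestamps still inside the window

    def process(self, timestamp, card_id):
        """Handle one event as it arrives (assumes timestamp order); return True to alert."""
        window = self._recent[card_id]
        window.append(timestamp)
        # Evict events that have slid out of the window (t - window_seconds, t].
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) > self.max_txns


def event_stream():
    """Stand-in for an unbounded source such as a message queue (hypothetical data)."""
    yield from [(0, "card-1"), (10, "card-1"), (15, "card-2"),
                (20, "card-1"), (30, "card-1"), (100, "card-1")]


detector = VelocityDetector(window_seconds=60, max_txns=3)
for ts, card in event_stream():
    if detector.process(ts, card):
        print(f"ALERT t={ts}s: {card} exceeded 3 transactions in 60s")
```

Only the window's contents are kept in memory, so the detector never needs the full transaction history, which is the essential difference from batch analysis of stored data.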

ii) Real-time Analytics [6 Marks]
Real-time Analytics involves the immediate processing and analysis of data as it arrives, allowing users to gain insights and make decisions almost instantly.
Key Features:
• Uses live data or data with minimal latency.
• Provides up-to-date dashboards, KPIs, or alerts.
• May combine streaming and fast batch processing techniques.
Applications:
• Real-time traffic navigation systems.
• Stock market monitoring and trading.
• Healthcare monitoring (e.g., patient vitals).
• Live customer behavior tracking on e-commerce platforms.
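Real-time dashboards usually maintain KPIs incrementally rather than recomputing them over all stored data. A minimal sketch for the patient-vitals example, assuming heart-rate readings arrive one at a time: the hypothetical `VitalsKPI` class updates a running mean, minimum, and maximum in constant time per reading and flags readings outside an assumed range of 50 to 120 bpm. The thresholds are illustrative only, not clinical guidance.

```python
from dataclasses import dataclass


@dataclass
class VitalsKPI:
    """Running dashboard metrics for one patient's heart rate, updated per reading."""
    low_alert: float = 50.0    # assumed lower bound, illustrative only
    high_alert: float = 120.0  # assumed upper bound, illustrative only
    count: int = 0
    mean: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, bpm):
        """Fold one reading into the KPIs in O(1); return an alert message or None."""
        self.count += 1
        self.mean += (bpm - self.mean) / self.count  # incremental mean, no history stored
        self.minimum = min(self.minimum, bpm)
        self.maximum = max(self.maximum, bpm)
        if bpm < self.low_alert or bpm > self.high_alert:
            return f"ALERT: heart rate {bpm} bpm outside {self.low_alert}-{self.high_alert}"
        return None


kpi = VitalsKPI()
for reading in [72, 75, 80, 130, 78]:
    alert = kpi.update(reading)
    if alert:
        print(alert)
print(f"n={kpi.count} mean={kpi.mean:.1f} min={kpi.minimum} max={kpi.maximum}")
```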

c) What is Apache Cassandra? Explain Its Key Features [6 Marks]
What is Apache Cassandra?
Apache Cassandra is an open-source, distributed NoSQL database designed to handle large volumes of data across multiple servers with high availability, fault tolerance, and no single point of failure. It is best suited for applications that require scalability, performance, and 24/7 uptime.
Originally developed at Facebook and later open-sourced, Cassandra is now part of the Apache Software Foundation.
Key Features of Apache Cassandra:
1. High Availability & Fault Tolerance:
o Cassandra keeps data accessible by replicating it across multiple nodes and data centers. If one node fails, other replicas continue serving its data.
2. Scalability (Horizontal Scaling):
o Scales out easily by adding more nodes without downtime. Throughput grows roughly linearly as nodes are added.
3. Decentralized / Peer-to-Peer Architecture:
o All nodes in the cluster are equal; there is no master node. This eliminates bottlenecks and single points of failure.
4. High Write Performance:
o Optimized for high-speed writes, making it suitable for write-intensive applications like IoT, messaging apps, and real-time analytics.
5. Flexible Schema:
o Although CQL tables have a defined schema, it can be changed (for example, by adding columns) without downtime or affecting existing applications. Ideal for evolving applications and variable data formats.
6. Tunable Consistency:
o Developers choose a consistency level per operation (e.g., ONE, QUORUM, ALL), trading off between strong and eventual consistency based on application needs.
7. Support for Distributed Replication:
o Supports replication across multiple geographic locations (data centers), improving data locality and disaster recovery.
8. CQL (Cassandra Query Language):
o Uses a SQL-like language (CQL) for querying, making it easier for developers with RDBMS backgrounds to adapt.
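The tunable-consistency trade-off (feature 6) is commonly explained with a quorum rule: with replication factor RF, if a read contacts R replicas and a write is acknowledged by W replicas, then R + W > RF guarantees that every read set overlaps at least one replica holding the latest acknowledged write. The sketch below checks that rule for a few read/write level pairs and verifies it by brute force; `quorum` computes floor(RF / 2) + 1, which matches Cassandra's QUORUM level for a single data center. The function names are hypothetical.

```python
from itertools import combinations


def quorum(replication_factor):
    """Replicas needed for a majority: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """Quorum rule: reads are guaranteed to see the latest write when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


def always_overlap(r, w, rf):
    """Brute-force check: does every R-subset of replicas intersect every W-subset?"""
    replicas = range(rf)
    return all(set(a) & set(b)
               for a in combinations(replicas, r)
               for b in combinations(replicas, w))


RF = 3
settings = {  # read level / write level -> (R, W)
    "ONE / ONE": (1, 1),
    "QUORUM / QUORUM": (quorum(RF), quorum(RF)),
    "ONE / ALL": (1, RF),
    "ALL / ONE": (RF, 1),
}
for name, (r, w) in settings.items():
    mode = "strong" if is_strongly_consistent(r, w, RF) else "eventual"
    print(f"read/write = {name}: R={r}, W={w} -> {mode}")
```

QUORUM for both reads and writes is a common middle ground: it tolerates one replica being down at RF = 3 while still satisfying R + W > RF.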
Use Cases:
• Real-time analytics
• Internet of Things (IoT)
• Social media platforms
• Messaging apps
• Recommendation systems
