
Data Mining

1. Database Data

 Stored in: Relational databases (tables with rows and columns).

 Examples: Customer records, employee data, item listings.

 Structure: Uses a schema (e.g., customer(custID, name, age, income, ...)).

 Tools: SQL for querying and aggregating data.

 Data Mining Use: Identify trends, patterns, and deviations (e.g., predicting credit risk, analyzing sales).

2. Data Warehouses

 Stored in: Centralized repositories integrating data from multiple sources.

 Structure: Multidimensional (data cubes).

 Purpose: Historical analysis and business decision support.

 Features:

o Organized by subject (customer, item, time, etc.)

o Uses OLAP operations like roll-up and drill-down.

 Data Mining Use: Discover patterns at various levels of granularity; enable exploratory analysis.

3. Transactional Data

 Stored in: Flat files or tables representing individual transactions.

 Structure: Each record contains a transaction ID and the items involved (e.g., T100: I1, I3, I8).

 Examples: Retail sales, flight bookings, clickstream data.

 Data Mining Use:

o Market basket analysis.

o Frequent itemset mining to discover which products are often bought together.
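As a minimal sketch (with made-up transactions), the co-occurrence counting behind market basket analysis can be expressed in a few lines of Python:

from collections import Counter
from itertools import combinations

# Hypothetical transactions: each record is a transaction ID plus the items involved.
transactions = {
    "T100": {"I1", "I3", "I8"},
    "T200": {"I1", "I3"},
    "T300": {"I2", "I3", "I8"},
}

# Count how often each pair of items appears together -- the raw input
# to frequent itemset mining.
pair_counts = Counter()
for items in transactions.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(2))  # e.g., [(('I1', 'I3'), 2), (('I3', 'I8'), 2)]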

Other Forms of Data (Mentioned briefly)

 Data Streams
 Sequence Data

 Graph or Network Data

 Spatial, Text, Multimedia Data

 Web Data (WWW)

Feature-by-feature comparison: Database Data vs. Data Warehouse vs. Transactional Data

 Definition:

o Database data: Structured data stored in relational databases (RDBMS) using tables.

o Data warehouse: Integrated, historical data from multiple sources, stored for analytical purposes.

o Transactional data: Data representing real-world transactions or events, often sequential and time-stamped.

 Purpose:

o Database data: Real-time operations and day-to-day transactions (OLTP).

o Data warehouse: Strategic decision support and historical analysis (OLAP).

o Transactional data: Capturing and analyzing individual user or business actions (e.g., purchases).

 Structure:

o Database data: Tables (relations) with rows (tuples) and columns (attributes).

o Data warehouse: Multidimensional data cubes with summarized information.

o Transactional data: Flat files or nested tables with transaction IDs and lists of items or events.

 Storage System:

o Database data: Relational Database Management System (RDBMS) such as MySQL, Oracle, SQL Server.

o Data warehouse: Centralized data warehouse (e.g., Amazon Redshift, Snowflake, Google BigQuery).

o Transactional data: Flat files, NoSQL databases, or special transactional DB systems.

 Schema Type:

o Database data: Normalized schemas (3NF or ER models).

o Data warehouse: Star or snowflake schemas for fast aggregation and querying.

o Transactional data: Often denormalized or semi-structured (a list of items per transaction).

 Data Granularity:

o Database data: Fine-grained (detailed individual records).

o Data warehouse: Aggregated (summaries over time or groups).

o Transactional data: Fine-grained (detailed per transaction).

 Data Sources:

o Database data: A single operational system (e.g., a POS system).

o Data warehouse: Multiple heterogeneous sources (e.g., regional DBs, logs).

o Transactional data: Point-of-sale systems, sensors, web logs, etc.

 Update Frequency:

o Database data: Frequently updated (daily, hourly).

o Data warehouse: Periodically updated (daily, weekly, monthly).

o Transactional data: Continuously updated or appended.

 Examples:

o Database data: customer(custID, name, age, income); item(itemID, price, category).

o Data warehouse: Sales by region and time; quarterly product performance.

o Transactional data: T100: [I1, I3, I8]; clickstream log: UserID: [Page1, Page3, Page7].

 Query Type:

o Database data: SQL-based queries: SELECT, JOIN, GROUP BY, etc.

o Data warehouse: OLAP queries: drill-down, roll-up, slice, dice.

o Transactional data: Pattern mining: association rule mining, sequential pattern mining.

 Mining Techniques:

o Database data: Classification, clustering, outlier detection, regression.

o Data warehouse: Multidimensional pattern mining, trend analysis, anomaly detection.

o Transactional data: Association rule mining, sequential pattern mining, market basket analysis.

 Use Cases:

o Database data: Predicting customer churn, identifying fraud, customer segmentation.

o Data warehouse: Strategic decisions, such as identifying underperforming regions or analyzing product trends.

o Transactional data: Recommending products, detecting buying behavior, bundling promotions.

 Tools/Technologies:

o Database data: SQL, DBMSs (MySQL, PostgreSQL), Python (pandas).

o Data warehouse: OLAP tools (Tableau, Power BI), ETL pipelines, cube computation.

o Transactional data: Apache Hadoop, Spark, NoSQL, association rule mining (Apriori, FP-Growth).
Data Mining Functionalities

1. Characterization and Discrimination (Class/Concept Description)

➤ Characterization:

Describes the general features of data belonging to a target class.

 Provides a concise summary, usually through descriptive statistics, OLAP operations, or attribute-oriented induction.

 Example: A retail manager wants to know the profile of customers who spend over $5000/year. The result might show that they are typically middle-aged, employed, and have good credit ratings.

 Output can be presented in the form of:

o Charts (bar, pie)

o Generalized relations

o Characteristic rules (e.g., "If income > 50K → likely to spend > $5000")

 It's used for summarizing and understanding data patterns within a group.

➤ Discrimination:

 Compares the features of a target class against one or more contrasting classes.

 Example: Comparing customers who shop frequently for computer products vs. those who shop rarely. Differences may include age, education, etc.

 Helps identify features that distinguish between groups (e.g., age, occupation).

 Often results in discriminant rules, e.g., “If age between 20-40 and education = university → frequent buyer.”

Key Difference: Characterization describes one group; discrimination compares multiple groups.

2. Mining Frequent Patterns, Associations, and Correlations

 Aims to find repetitive patterns, associations, or correlations in large datasets.

 This includes:

o Frequent Itemsets: Sets of items that often appear together in transactions (e.g., bread and butter).

o Sequential Patterns: Items purchased in a sequence (e.g., laptop → camera → memory card).

o Substructures: Patterns in structural forms such as graphs or trees.

➤ Association Rule Mining:

 Example: “buys(X, 'computer') → buys(X, 'software') [support: 1%, confidence: 50%]”

o Means that 1% of transactions include both items, and 50% of computer buyers also buy software.

 Rules can be single-dimensional (same predicate) or multidimensional (age, income, buys).

 Used in market basket analysis, cross-selling, and product recommendations.

➤ Correlation Analysis:

 Goes beyond co-occurrence to measure the statistical significance of the relationship between items (e.g., via a chi-square test).

Importance: Helps identify what tends to happen together in data, enabling targeted marketing, inventory planning, and more.

3. Classification and Regression (Predictive Analysis)

➤ Classification:

 Builds a model (classifier) that assigns data to predefined categories or classes.

 Requires labeled training data.

 Output can be:

o IF-THEN rules

o Decision trees

o Neural networks

o SVMs, k-NN, Bayesian classifiers

 Example: Classifying items based on sales response (good, mild, none).

 Used in spam detection, credit scoring, disease diagnosis, etc.

➤ Regression:

 Predicts continuous numeric values, not categories.

 Example: Predicting the expected revenue from a product.

 Methods include linear regression, polynomial regression, and advanced ML techniques.

 Used in forecasting, pricing models, stock prediction.

Key Difference: Classification predicts discrete labels; regression predicts continuous values.
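To make the contrast concrete, here is a minimal sketch using scikit-learn; the toy ages, incomes, labels, and target values are invented for illustration:

from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

X = [[25, 30000], [40, 80000], [35, 50000], [50, 120000]]  # [age, income] (made up)

# Classification: predict a discrete label (e.g., a credit-risk class).
clf = DecisionTreeClassifier().fit(X, ["high", "low", "high", "low"])
print(clf.predict([[30, 40000]]))   # -> a class label such as ['high']

# Regression: predict a continuous value (e.g., expected yearly revenue).
reg = LinearRegression().fit(X, [1200.0, 5300.0, 2500.0, 7800.0])
print(reg.predict([[30, 40000]]))   # -> a numeric estimate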

4. Cluster Analysis

 Groups a set of objects into clusters so that:

o Intra-cluster similarity is high

o Inter-cluster similarity is low

 No prior labels are required (unsupervised learning).

 Each cluster can later be treated as a class for further analysis.

 Example: Segmenting customers based on purchasing behavior or geographic location.

 Results are often visualized using 2D/3D plots; common algorithms include k-means and DBSCAN.

 Applications include customer segmentation, image recognition, and bioinformatics.

Purpose: To discover natural groupings within data without predefined categories.
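A minimal k-means sketch (scikit-learn); the two-feature customer data below is invented:

from sklearn.cluster import KMeans

# Each customer: [store visits per year, spend per year] -- made-up values.
customers = [[5, 100], [6, 120], [50, 900], [55, 1000], [8, 150]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(km.labels_)           # cluster assignment per customer, e.g., [0 0 1 1 0]
print(km.cluster_centers_)  # centroid of each discovered segment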

5. Outlier Analysis

 Identifies data objects that deviate significantly from the general pattern.

 Such data points are called outliers or anomalies.

 Useful in applications where rare events are more important than common ones:

o Fraud detection (e.g., unusual credit card activity)

o Intrusion detection

o Medical anomalies

 Techniques:

o Statistical methods (assuming distribution models)

o Distance-based methods (objects far from others)

o Density-based methods (like LOF – Local Outlier Factor)

 Not all outliers are noise—many are insightful and can drive important decisions.
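As a hedged illustration of the density-based approach, scikit-learn's LocalOutlierFactor can flag the unusual point in a made-up list of transaction amounts:

from sklearn.neighbors import LocalOutlierFactor

amounts = [[20], [22], [19], [21], [23], [950]]  # one obviously unusual purchase
lof = LocalOutlierFactor(n_neighbors=3)
print(lof.fit_predict(amounts))  # 1 = inlier, -1 = outlier -> [ 1  1  1  1  1 -1]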

Technologies Used in Data Mining


Data mining is an application-driven field that integrates various
techniques from multiple disciplines to extract valuable insights from data.
These include:

1. Statistics

o Role in Data Mining: Statistics is used to model data and target classes. A statistical model describes the behavior of data using mathematical functions and probability distributions.

o Applications in Data Mining:

 Data Characterization and Classification: Statistical models can be used to classify and characterize data.

 Handling Noise and Missing Data: Statistics helps in modeling and handling noisy or missing data during the data mining process.

 Prediction and Forecasting: Statistical models are key for prediction tasks, providing a framework for making inferences about the data.

 Verifying Data Mining Results: After building classification or prediction models, statistical hypothesis testing helps verify their accuracy and significance.

o Challenges: Scaling statistical methods to large datasets is complex due to computational costs. This issue is exacerbated for online applications requiring real-time processing.

2. Machine Learning

o Role in Data Mining: Machine learning focuses on enabling computers to learn patterns and make decisions based on data. It is used in data mining for tasks such as classification and clustering.

o Types of Learning Methods:

 Supervised Learning (Classification): Involves training a model with labeled data to recognize patterns, such as recognizing postal codes from handwritten images.

 Unsupervised Learning (Clustering): The model learns from data without labels, finding hidden patterns or groups (e.g., recognizing different digits in handwritten data without predefined labels).

 Semi-supervised Learning: Combines both labeled and unlabeled data. Labeled data helps build models, while unlabeled data helps refine the model's boundaries, improving accuracy.

 Active Learning: The model actively queries humans (domain experts) to label uncertain data points, thus improving the model with minimal human input.

o Challenges: While machine learning focuses on accuracy, data mining also emphasizes efficiency, scalability, and handling diverse types of data.

3. Database Systems and Data Warehouses

o Role in Data Mining: Database systems handle the storage, management, and retrieval of data, and they play a crucial role in ensuring that data mining can scale to large datasets.

o Data Warehousing: A data warehouse integrates data from various sources and timeframes into a unified structure. It enables advanced data analysis by consolidating data into multidimensional space, known as data cubes.

o Data Mining Integration: Modern database systems often incorporate data mining capabilities to extend their analytic power. Data mining tools can operate directly on data stored in databases to identify patterns.

o Challenges: Data mining often involves working with real-time streaming data, which requires efficient database technologies to process large volumes of data quickly.

4. Information Retrieval (IR)

o Role in Data Mining: Information retrieval involves searching for and retrieving relevant documents or information from a large database or the web. Unlike database systems, IR deals with unstructured data (e.g., text or multimedia).

o Probabilistic Models: IR uses probabilistic models to measure the similarity between documents. Text documents are often represented as a bag of words, where the presence and frequency of words matter but word order does not.

o Topic Modeling: IR systems use models to identify underlying topics in collections of documents. These topics are represented as probability distributions over a vocabulary, and documents may belong to multiple topics.

o Integration with Data Mining: Combining IR with data mining techniques enables deeper analysis of text and multimedia data, facilitating better search and analysis of large, unstructured datasets (e.g., web data, digital libraries, healthcare records).

Applications of Data Mining


Data mining plays a vital role in various fields where large amounts of
data need to be analyzed. Here are two major applications:

1. Business Intelligence (BI)

o Purpose: To understand business contexts such as customers, market trends, and competitors.

o Key Techniques:

 Classification and prediction for sales, market analysis, and customer feedback.

 Clustering for Customer Relationship Management (CRM), grouping customers by similarities.

 Characterization mining for understanding customer groups and developing tailored programs.

o Importance: BI allows businesses to make smart decisions, retain valuable customers, and gain insights into competitors. Without data mining, effective market analysis would be difficult.

2. Web Search Engines

o Purpose: To retrieve information from the web in response to user queries.

o Techniques Used:

 Crawling: Deciding which web pages to crawl and how frequently.

 Indexing: Choosing which pages to index and how to structure the index.

 Ranking: Determining how to rank pages based on relevance and quality.

o Challenges:

 Data Volume: Search engines deal with massive amounts of data, requiring cloud computing for processing.

 Real-Time Processing: Search engines need to respond to user queries instantly, often requiring continuous updates and real-time data mining.

 Small Data Issues: Many queries are asked rarely, posing a challenge for mining methods designed for large datasets.

Major Issues in Data Mining


Data mining, being a rapidly evolving field, faces several challenges and
open research areas. These challenges can be categorized into five main
groups:

1. Mining Methodology

o New Knowledge Types: Data mining covers a broad range of tasks (e.g., classification, regression, clustering), and as new applications emerge, new mining techniques are developed.

o Multidimensional Data Mining: Mining knowledge across different dimensions, such as combining various attributes in data cubes.

o Interdisciplinary Approaches: Integrating methods from natural language processing, software engineering, and other fields enhances data mining.

o Handling Uncertainty: Dealing with noisy or incomplete data is a significant challenge, requiring techniques such as data cleaning and outlier detection.

2. User Interaction

o Interactive Mining: The mining process should be flexible and dynamic, allowing users to refine searches and explore data interactively.

o Incorporation of Background Knowledge: Including domain-specific knowledge, constraints, or rules can guide the mining process towards more useful results.

o Data Mining Query Languages: High-level languages or interfaces allow users to define and optimize ad hoc queries, making the process more user-friendly.

o Visualization of Results: Presenting mining results in an understandable, visually intuitive way is crucial for the usability of data mining systems.

3. Efficiency and Scalability

o Algorithm Efficiency: Data mining algorithms need to handle large datasets quickly and efficiently, especially as data volumes grow.

o Parallel and Distributed Mining: Large datasets often require parallel processing across distributed systems. Cloud and cluster computing are common ways to scale data mining.

o Incremental Mining: Incremental algorithms that can update models as new data arrives, without reprocessing all existing data, are a key area of research.

4. Diversity of Data Types

o Complex Data Types: Data mining must handle a variety of data types, from structured databases to unstructured data such as text and images.

o Dynamic Data: Some data, such as online data streams or real-time sensor data, change constantly, presenting challenges for traditional mining methods.

o Interconnected Data: Many datasets are linked (e.g., social networks, web data), requiring mining techniques that can handle and exploit these connections.

5. Data Mining and Society

o Social Impact: Data mining affects privacy, security, and social dynamics. How can data mining be used for societal benefit while preventing misuse?

o Privacy-Preserving Mining: Safeguarding individuals' privacy while conducting data mining is crucial. Ongoing research focuses on privacy-preserving data mining methods.

o Invisible Data Mining: Many systems perform data mining behind the scenes without users' awareness. For instance, e-commerce sites track user behavior to recommend products.

What is a Data Warehouse?


A data warehouse refers to a central repository where data from
different sources is stored and organized for analysis and decision-making.
It allows businesses to store historical data that supports strategic
decisions. Data warehouses are essential in today’s competitive world as
organizations use them to gain insights into various aspects of their
operations and make informed decisions.

Key Features of a Data Warehouse

1. Subject-Oriented:

o Data warehouses are designed around major subjects of interest such as customers, products, suppliers, and sales.

o Unlike operational databases, which focus on day-to-day transactions, data warehouses are structured to provide a more analytical view that helps decision-makers.

o Data is organized to reflect decision-support processes, not operational activities.

2. Integrated:

o A data warehouse integrates data from multiple heterogeneous sources such as relational databases, flat files, and transaction logs.

o It ensures consistency in naming conventions, data formats, and attribute measures.

o The data is cleaned and standardized before being loaded into the warehouse.

3. Time-Variant:

o Data in a warehouse is typically historical, covering several years (e.g., 5-10 years) to help analyze trends over time.

o Each data set within a warehouse includes a time element, either implicit or explicit, to track changes and trends over time.

4. Nonvolatile:

o Once data is stored in a warehouse, it is not changed. New data is only appended to the system.

o A data warehouse does not require the mechanisms for transaction processing, recovery, or concurrency control that are needed in operational databases.

o The primary operations in a data warehouse are data loading and querying.

Functions of a Data Warehouse

 A data warehouse consolidates large amounts of data for analysis and decision-making purposes. It is not primarily designed for transactional operations but rather for answering complex queries and providing insights into various business activities.

 A data warehouse is often constructed by integrating data from multiple sources, using processes such as data cleaning, data integration, and data consolidation.

 Decision-support technologies are used to query the data, generate reports, and make strategic decisions based on the insights derived from the data.

Difference between a Database (OLTP) and a Data Warehouse (OLAP)


 Purpose: OLTP handles day-to-day transactions and query processing; OLAP supports data analysis and decision making.

 Users: OLTP serves clerks, clients, and IT professionals (customer-oriented); OLAP serves managers, executives, and analysts (market-oriented).

 Data Content: OLTP holds current, detailed, real-time transactional data; OLAP holds historical, aggregated, summarized data.

 Data Volume: OLTP is typically smaller, focused on current data; OLAP is very large, including years of historical data.

 Database Design: OLTP uses an Entity-Relationship (ER), application-oriented model; OLAP uses a subject-oriented star or snowflake schema.

 View of Data: OLTP has a narrow view, specific to a department or enterprise; OLAP has a broad view, integrating data from multiple sources.

 Query Characteristics: OLTP runs simple, short, atomic queries and transactions; OLAP runs complex queries, often involving aggregation.

 Operations: OLTP performs frequent inserts, updates, and deletes (write-heavy); OLAP performs primarily read-only operations (read-heavy).

 Concurrency Control: OLTP requires concurrency control and recovery mechanisms; OLAP needs less concurrency control due to its read-only nature.

 Access Patterns: OLTP needs high transaction throughput and quick response times; OLAP focuses on complex query performance, where low latency is not critical.

 Performance Metrics: OLTP is measured by transactions per second; OLAP is measured by query response time and analytical capability.

 Data Granularity: OLTP stores very detailed, fine-grained data; OLAP stores data at multiple levels of granularity (from detailed to summarized).

 Frequency of Access: OLTP access is constant and very frequent; OLAP access is periodic, depending on analysis/reporting needs.

 Data Integration: OLTP integration is minimal (single-source systems); OLAP integration is high (multiple heterogeneous sources).

 Storage Medium: OLTP data is typically stored on a single system or server; OLAP uses distributed storage systems due to the large volume.

Data Warehousing: A Multitiered Architecture


Data warehouses are designed using a three-tier architecture, which
helps in separating data storage, data processing, and data presentation.
This architecture ensures scalability, flexibility, and efficient data
management. Here's a breakdown of each tier:

1. Bottom Tier: Data Warehouse Server

 Role: This tier is responsible for storing the actual data.

 Technology Used: Usually a relational database management system (RDBMS).

 Functions:

o Data Extraction: Pulls data from various operational and external sources.

o Data Cleaning: Removes errors, inconsistencies, and duplicates.

o Data Transformation: Converts data into a common, unified format.

o Data Loading: Transfers the processed data into the data warehouse.

o Data Refreshing: Periodically updates the data warehouse to reflect recent changes.

 Data Sources:

o Operational databases: e.g., banking systems, sales systems.

o External sources: e.g., market research reports, customer profiles from third parties.

 Tools Used:

o Gateways (APIs) to connect to and query the source systems:

 ODBC (Open Database Connectivity)

 OLEDB (Object Linking and Embedding Database)

 JDBC (Java Database Connectivity)

 Metadata Repository:

o Stores information about the data (such as source, format, and transformations applied).

o Acts as a directory for warehouse management and query optimization.
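As a hedged sketch of the extract-clean-transform-load cycle this tier performs (pandas and SQLAlchemy; the column names, sample values, and cleaning rules are assumptions):

import pandas as pd
from sqlalchemy import create_engine

warehouse = create_engine("sqlite:///warehouse.db")      # stand-in warehouse store

# Extract: in practice this comes from operational sources; a tiny inline batch here.
raw = pd.DataFrame({"custID": [1, 1, None, 2],
                    "amount": ["10.5", "10.5", "3.0", "7.25"],
                    "date":   ["2024-01-03"] * 4})

clean = raw.drop_duplicates().dropna(subset=["custID"])  # Clean: duplicates, missing keys
clean = clean.assign(amount=clean["amount"].astype(float),
                     date=pd.to_datetime(clean["date"])) # Transform to a unified format
clean.to_sql("fact_sales", warehouse, if_exists="append", index=False)  # Load / refresh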

2. Middle Tier: OLAP Server

 Role: Acts as the processing layer, converting data into a form suitable for analysis.

 Two Main OLAP Models:

1. ROLAP (Relational OLAP):

 Works on top of relational databases.

 Converts multidimensional operations into relational queries.

 Suitable for handling large volumes of data.

2. MOLAP (Multidimensional OLAP):

 Uses specialized multidimensional data structures (cubes).

 Faster for complex analytical queries, but may have storage limitations.

 Functionality:

o Supports advanced analytical processing, including summarization, aggregation, and complex computations.

o Optimized for read-heavy operations.

3. Top Tier: Front-End Tools

 Role: This is the user interface layer, where users interact with the system.

 Components:

o Query and Reporting Tools: For generating standard or custom reports.

o Data Analysis Tools: For ad hoc querying, slicing, dicing, and drill-down analysis.

o Data Mining Tools: For predictive modeling, clustering, trend analysis, etc.

 Users:

o Business analysts

o Executives and managers

o Decision makers

 Functionality:

o Provides a visual and interactive environment for exploring and analyzing data.

o Supports dashboards, charts, graphs, and other visualizations.


Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
1. Enterprise Data Warehouse (EDW)

 Definition: A centralized data warehouse that stores information from across the entire organization.

 Scope: Corporate-wide, cross-functional.

 Data:

o Includes both detailed and summarized data.

o Integrated from multiple operational systems and external sources.

 Implementation:

o Requires extensive business modeling.

o Typically built on mainframes, superservers, or parallel systems.

o May take months or years to design and deploy.

 Advantages:

o Single source of truth.

o High consistency and integration.

 Disadvantages:

o Time-consuming and expensive to build.

o Inflexible in dynamic environments.

2. Data Mart

 Definition: A smaller, focused version of a data warehouse that stores data for a specific business line or department (e.g., marketing, sales).

 Scope: Departmental or subject-specific.

 Data:

o Typically summarized and related to specific business needs.

 Types:

o Independent Data Mart: Sourced directly from operational systems or external providers.

o Dependent Data Mart: Sourced from an existing enterprise data warehouse.

 Implementation:

o Uses low-cost servers (e.g., Linux, Windows).

o Takes weeks to build (faster ROI).

 Advantages:

o Quick to implement.

o Cost-effective.

o Flexible and adaptable to specific needs.

 Disadvantages:

o Risk of data silos.

o Complex integration later if not aligned with the enterprise strategy.

3. Virtual Warehouse

 Definition: A set of virtual views over operational databases.

 Implementation:

o Does not store data physically.

o Queries are processed in real time using views.

 Advantages:

o Easy and fast to build.

o Cost-efficient (no extra storage).

 Disadvantages:

o Performance depends on operational systems.

o Requires high processing capacity for complex queries.

o Limited historical data analysis.


Top-Down vs. Bottom-Up Approaches to Data Warehouse Development

 Start Point: Top-down begins with the enterprise data warehouse; bottom-up starts with departmental data marts.

 Time & Cost: Top-down has high cost and long duration; bottom-up has low cost and faster implementation.

 Flexibility: Top-down is less flexible; bottom-up is more adaptable.

 Integration: Top-down minimizes integration issues later; bottom-up may lead to integration challenges.

 Suitability: Top-down is best for long-term strategic planning; bottom-up is best for tactical, quick solutions.

 Risk: Top-down involves a high initial investment with late returns; bottom-up delivers quick wins but may cause silo issues.

Recommended Approach: Incremental & Evolutionary

A hybrid approach is often best, combining top-down planning with bottom-up implementation. The steps are:

1. Define a High-Level Corporate Data Model

o Done within 1–2 months.

o Ensures a consistent view of data across the organization.

2. Implement Independent Data Marts

o Developed in parallel using the high-level model.

o Quick deployment for department-level use.

3. Construct Distributed Data Marts

o Integrate the various marts via hub servers.

o Enables data sharing across business units.

4. Build a Multitier Data Warehouse

o The centralized enterprise data warehouse becomes the primary data store.

o Distributes data to dependent data marts as needed.

📊 Data Warehouse Modeling: Data Cube and OLAP

✅ Overview

 Data warehouses and OLAP (Online Analytical Processing) tools are built on the multidimensional data model.

 This model visualizes data as a data cube, which allows for interactive analysis of multidimensional data.

 The model supports operations such as roll-up, drill-down, and slicing/dicing to enable deep business insights.

Data Cube: A Multidimensional Data Model

🔹 What is a Data Cube?

 A data cube allows data to be modeled in n dimensions (not just 3D).

 It is defined by:

o Dimensions: The perspectives for analysis (e.g., time, item, location).

o Facts/Measures: Quantitative data (e.g., dollars sold, units sold).

🔹 Key Concepts:

 Dimensions:

o Examples: time, item, branch, location.

o Each has a dimension table (e.g., for item: item name, brand, type).

 Fact Table:

o Contains numeric measures such as dollars sold, units sold, etc.

o Links to each dimension via foreign keys.

🧊 Representation:

 2-D Cube: Like a spreadsheet/table (e.g., time × item, for location = Vancouver).

 3-D Cube: time × item × location.

 4-D Cube: time × item × location × supplier; hard to visualize, but conceptually a series of 3-D cubes.

📐 Cuboids and Lattice

🔸 What is a Cuboid?

 A cuboid is a cube at a certain level of summarization (group-by).

 Base Cuboid: The lowest level (e.g., time, item, location, supplier).

 Apex Cuboid (0-D): The highest level, summarized over all dimensions.

🔸 Data Cube Lattice

 Given n dimensions, 2^n possible cuboids exist.

 They form a lattice structure representing all possible levels of summarization.
🔹 Example from the notes:

For dimensions: time, item, location, supplier, the lattice includes:

 0-D Cuboid: total sales (summarized across all dimensions)

 1-D Cuboids: {time}, {item}, {location}, {supplier}

 2-D Cuboids: {time, item}, {item, supplier}, etc.

 3-D Cuboids: {time, item, location}, etc.

 4-D Cuboid: {time, item, location, supplier} (base cuboid)
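A tiny sketch that enumerates this lattice (2^n cuboids for n dimensions):

from itertools import combinations

dims = ["time", "item", "location", "supplier"]
cuboids = [c for k in range(len(dims) + 1) for c in combinations(dims, k)]
print(len(cuboids))  # 2^4 = 16 cuboids
print(cuboids[0])    # () -> the apex (0-D) cuboid
print(cuboids[-1])   # ('time', 'item', 'location', 'supplier') -> the base cuboid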

🔷 Multidimensional Schema Models


⭐ 1. Star Schema

 Structure:

o Central fact table (large, non-redundant).

o Connected dimension tables (flat, possibly redundant).

 Pros:

o Simple, fast query performance.

o Easy to understand.

 Cons:

o Some redundancy in dimension tables.

 Use case: Most common in data marts.

❄️ 2. Snowflake Schema

 Structure:
o Like a star schema but dimension tables are normalized into
sub-tables.

 Pros:

o Reduces redundancy.

o Easier maintenance.

 Cons:

o More complex queries due to joins.

o Slight performance trade-off.

 Use case: Less common; used when storage efficiency is more critical.

🌌 3. Fact Constellation (Galaxy Schema)

 Structure:

o Multiple fact tables sharing dimension tables.

 Pros:

o Models multiple interrelated subjects.

o Captures enterprise-wide data.

 Cons:

o Complex structure.

 Use case: Suitable for enterprise data warehouses.


🌐 Concept Hierarchies for Dimensions

What is a Concept Hierarchy?

 Maps low-level values (e.g., city) to higher-level concepts (e.g., country).

 Helps summarize or roll up data in OLAP operations.

Types:

 Schema Hierarchy: Total/partial order (e.g., street < city < province < country).

 Lattice: Partial order where attributes don’t follow a single path (e.g., week < year; day < month).

 Set-grouping Hierarchy: Value ranges grouped (e.g., price ranges: ($0–$200], ($200–$400], ...).

 Hierarchies can be manually defined or automatically generated.
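A small sketch of a set-grouping hierarchy using pandas; the prices, bin edges, and labels are illustrative:

import pandas as pd

prices = pd.Series([35, 150, 280, 390, 520])
ranges = pd.cut(prices, bins=[0, 200, 400, 600],
                labels=["($0-$200]", "($200-$400]", "($400-$600]"])
print(ranges.tolist())  # ['($0-$200]', '($0-$200]', '($200-$400]', '($200-$400]', '($400-$600]']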

🧮 Measures in a Data Cube

Measures = numeric values aggregated over dimension values (e.g., total sales).

 Distributive: Can be computed from subaggregates and combined. Examples: sum(), count(), min(), max(). Very efficient.

 Algebraic: Computed using a fixed number of distributive aggregates. Examples: avg() = sum()/count(), stddev(). Efficient.

 Holistic: Requires a full data scan; cannot be broken into subaggregates. Examples: median(), mode(), rank(). Inefficient.

⚠️ Most OLAP tools focus on distributive and algebraic measures for performance.
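A minimal sketch of why distributive and algebraic measures are cheap: per-partition subaggregates combine directly (the partitioned data below is made up):

partitions = [[4, 7, 1], [9, 2], [5, 5, 5]]  # data split across three chunks

sub = [(sum(p), len(p), min(p), max(p)) for p in partitions]  # per-partition aggregates

total_sum = sum(s for s, _, _, _ in sub)    # distributive: sum of sums
total_cnt = sum(c for _, c, _, _ in sub)    # distributive: sum of counts
overall_min = min(m for _, _, m, _ in sub)  # distributive: min of mins
overall_avg = total_sum / total_cnt         # algebraic: derived from sum() and count()
print(total_sum, total_cnt, overall_min, overall_avg)  # 38 8 1 4.75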

OLAP Operations

🔁 1. Roll-Up

 Definition: Aggregates data by climbing up a concept hierarchy or by reducing dimensions.

 Example: Aggregating sales data from city to country (Toronto → Canada).

 Also called: Drill-up (by some vendors).

🔽 2. Drill-Down

 Definition: The reverse of roll-up; navigates from summary data to more detailed data.

 Example: Moving from quarterly sales data to monthly sales data.

 Also includes: Adding a new dimension (e.g., customer group) for more detail.

🧊 3. Slice

 Definition: Selects a single dimension value, resulting in a subcube.

 Example: Selecting data where time = Q1 only.

🧊🧊 4. Dice

 Definition: Selects a range of values on two or more dimensions, resulting in a subcube.

 Example: Data for location = Toronto or Vancouver, time = Q1 or Q2, and item = home entertainment or computer.

🔄 5. Pivot (Rotate)

 Definition: Rotates the cube to view data from different perspectives.

 Example: Swapping the item and location axes for an alternate visual layout.
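A hedged pandas analogy for these five operations on a tiny, made-up sales table:

import pandas as pd

sales = pd.DataFrame({
    "city":    ["Toronto", "Toronto", "Vancouver", "Vancouver"],
    "country": ["Canada", "Canada", "Canada", "Canada"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "item":    ["computer", "phone", "computer", "phone"],
    "dollars": [100, 150, 80, 120],
})

rollup = sales.groupby("country")["dollars"].sum()                       # roll-up: city -> country
drill  = sales.groupby(["country", "city", "quarter"])["dollars"].sum()  # drill-down
slice_ = sales[sales["quarter"] == "Q1"]                                 # slice: one dimension value
dice   = sales[sales["quarter"].isin(["Q1", "Q2"]) & sales["city"].isin(["Toronto"])]  # dice
pivot  = sales.pivot_table(values="dollars", index="item", columns="city", aggfunc="sum")  # pivot
print(rollup, pivot, sep="\n")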

🔍 6. Drill-Across

 Definition: Executes queries across multiple fact tables.

🧱 7. Drill-Through

 Definition: Accesses the bottom-level data in the data cube using SQL, typically reaching into backend relational tables.

📊 8. Other Advanced Operations

 Examples:

o Top-N/bottom-N ranking.

o Moving averages, growth rates, depreciation.

o Currency conversion, internal rates of return.

o Forecasting, trend/statistical analysis, variance calculations.

💡 Role of Concept Hierarchies in OLAP

 Enable aggregation and drilling at various levels of detail.

 Facilitate multilevel data exploration across dimensions.


Concept Hierarchies

1. Definition

 A concept hierarchy is a sequence of mappings from low-level (specific) concepts to high-level (general) concepts.

 Purpose: Allows multilevel data abstraction.

2. Types

 Schema Hierarchies (based on database attributes):

o Example for location: street < city < province/state < country

o Example for time: day < month < quarter < year

 Lattice Structure:

o Supports partial orders, e.g., day < month < quarter, and week < year.

 Set-Grouping Hierarchies:

o Created by grouping values into ranges or categories.

o Example for price: $0–$100, $100–$200, etc.

o User-defined groups: cheap, moderate, expensive.

 Multiple Hierarchies:

o A single attribute can have multiple concept hierarchies depending on the analysis (e.g., price can be grouped by range or by category).

3. Sources of Concept Hierarchies

 Manual: Provided by users or domain experts.

 Automatic: Generated using statistical analysis (e.g., clustering).

📏 Measures: Categorization & Computation

Measure: A numerical value computed for each multidimensional point (e.g., sales).
Categories of Measures:

 Distributive: Can be computed in parts and then aggregated. Examples: sum(), count(), min(), max(). Easy and efficient to compute.

 Algebraic: Computed from a fixed number (M) of distributive measures. Examples: avg() = sum()/count(), stddev(). Depends on multiple distributive functions.

 Holistic: Cannot be expressed with a bounded number of distributive results. Examples: median(), mode(), rank(). Complex; may require approximation.

🧠 OLAP Engine Capabilities

 Enables complex analytical computations.

 Supports:

o Aggregations, hierarchies, ratios

o Forecasting, trend and statistical analysis

 Provides a user-friendly, interactive environment for querying multidimensional data.

Detailed Definition of Mining Frequent Patterns

Frequent pattern mining is a fundamental task in data mining that involves discovering patterns (such as itemsets, sequences, or structures) that occur frequently in a dataset. These patterns reveal relationships and associations between data items that can be useful in decision-making, prediction, recommendation, and classification.
Key Concepts in Frequent Pattern Mining

1. Frequent Pattern

A frequent pattern is a set of items, subsequences, or structures that appear together frequently in a dataset.

Examples:

 Frequent Itemset: {milk, bread} appears together in many transactions.

 Frequent Sequential Pattern: <PC → Digital Camera → Memory Card> appears in many customer purchase histories.

 Frequent Structured Pattern: A frequently recurring subgraph in a chemical compound dataset.
2. Itemset

A collection of one or more items. For instance, in a supermarket:

 {milk}, {milk, bread}, {bread, butter, eggs} are itemsets.

A k-itemset contains k items.

3. Support

 The support of an itemset is the proportion (or count) of transactions that contain the itemset.

 It measures how frequently an itemset appears in the dataset.

Support(A) = (Number of transactions containing A) / (Total number of transactions)

4. Confidence

 The confidence of a rule A → B is the probability that transactions containing A also contain B.

Confidence(A → B) = Support(A ∪ B) / Support(A)

It shows how reliable the rule is.
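A minimal sketch computing these two quantities directly from their definitions (the five transactions are made up):

transactions = [
    {"milk", "bread"}, {"milk", "bread", "butter"},
    {"bread"}, {"milk"}, {"milk", "bread", "eggs"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

A, B = {"milk"}, {"bread"}
print(support(A | B))               # Support(A ∪ B) = 3/5 = 0.6
print(support(A | B) / support(A))  # Confidence(A -> B) = 0.6 / 0.8 = 0.75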

5. Association Rules

An association rule is an implication of the form:

A → B [support = s%, confidence = c%]

It means that if A occurs, B is likely to occur, with support s and confidence c.

6. Closed and Maximal Frequent Itemsets

 Closed Frequent Itemset: A frequent itemset that has no proper superset with the same support.

 Maximal Frequent Itemset: A frequent itemset that has no frequent supersets.
Frequent Itemsets, Closed Itemsets, and Association Rules

🔹 1. Basic Definitions

➤ Itemset:

 A group of items.

 A k-itemset contains k items (e.g., {bread, milk} is a 2-itemset).

➤ Transaction (T):

 A set of items bought together.

 Identified by a unique TID.

🔹 2. Support and Confidence

➤ Support:

 Fraction of transactions that contain an itemset.

Support(A ∪ B) = (Transactions containing both A and B) / (Total transactions)

➤ Confidence:

 Likelihood of item B occurring given item A.

Confidence(A → B) = Support(A ∪ B) / Support(A)

🔹 3. Association Rules

 Form: A → B

 Indicates a strong relationship: "If A occurs, B is likely to occur."


✔️ Strong Rules:

 Satisfy both:

o Minimum Support (minsup)

o Minimum Confidence (minconf)

🔹 4. Frequent Itemsets

 An itemset is frequent if its support ≥ minsup.

 Support count = number of transactions containing the itemset.

🔹 5. Closed Frequent Itemsets

 An itemset is closed if no proper superset has the same support count.

 Captures complete support information.

 Used to eliminate redundancy.

🔹 6. Maximal Frequent Itemsets

 An itemset is maximal frequent if it is frequent and none of its supersets are frequent.

 Represents the outer boundary of the frequent itemsets.

 More compact, but may lose the support details of subsets.

🔹 7. Why Use Closed or Maximal?

 Mining all frequent itemsets may produce an exponential number of patterns.

 Closed and maximal itemsets reduce computation and storage.

🔹 8. Example

Dataset:

 T1: {a1, a2, ..., a100}

 T2: {a1, a2, ..., a50}

 minsup = 1

➤ Frequent itemsets: All non-empty subsets of T1 (every subset of T2 is also a subset of T1).

Total = 2^100 − 1 → far too many to enumerate!

➤ Closed frequent itemsets:

 {a1, ..., a50} → support: 2

 {a1, ..., a100} → support: 1

➤ Maximal frequent itemset:

 {a1, ..., a100} only (it has no frequent superset)

🔹 9. Association Rule Mining Steps

1. Find all frequent itemsets (support ≥ minsup).

2. Generate strong association rules from those itemsets (confidence ≥ minconf).

Apriori Algorithm – Overview

 Purpose: To mine frequent itemsets for Boolean association rules.

 Proposed by: R. Agrawal and R. Srikant (1994).

 Name Origin: Uses prior knowledge of itemset properties.

🔁 Working Principle

 Level-wise iterative approach:

o Finds the frequent 1-itemsets (L1) from the database.

o Uses Lk-1 to generate Lk (the frequent k-itemsets).

o Iterates until no more frequent itemsets can be found.

✅ Apriori Property (Antimonotonicity)

 Definition: All non-empty subsets of a frequent itemset must also be frequent.

 Implication:

o If an itemset I is infrequent, then any superset I ∪ A is also infrequent.

o This helps prune the candidate space (reducing computation).

🧩 Two-Step Process (Join & Prune)

1. Join Step:

o Generate candidate itemsets Ck by self-joining Lk-1.

o Join l1 and l2 in Lk-1 if their first k-2 items are the same.

o Lexicographic ordering ensures no duplicates are generated.

2. Prune Step:

o Remove candidate c ∈ Ck if any of its (k-1)-subsets is not in Lk-1.
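A compact sketch of this candidate-generation step (join on a shared (k-2)-prefix, then prune by the Apriori property); the L2 used in the demo is made up:

from itertools import combinations

def apriori_gen(Lk_1, k):
    """Generate candidate k-itemsets Ck from the frequent (k-1)-itemsets Lk_1."""
    prev = sorted(tuple(sorted(s)) for s in Lk_1)   # lexicographic order, no duplicates
    Ck = set()
    for a, b in combinations(prev, 2):
        if a[:k - 2] == b[:k - 2]:                  # join: first k-2 items agree
            cand = frozenset(a) | frozenset(b)
            if len(cand) == k and all(              # prune: every (k-1)-subset frequent
                frozenset(sub) in Lk_1 for sub in combinations(cand, k - 1)
            ):
                Ck.add(cand)
    return Ck

L2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I2", "I3"), ("I2", "I4")]}
print(apriori_gen(L2, 3))  # {frozenset({'I1','I2','I3'})}; {I2,I3,I4} is pruned ({I3,I4} infrequent)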

📊 Example (Using AllElectronics DB)

 Database D: 9 transactions (T100–T900).

 min_sup = 2 (support count).

 Iterations:

o C1 → L1: All 1-itemsets satisfying min_sup.

o C2 → L2: 2-itemsets from L1 × L1; all subsets are frequent, so no pruning occurs.

o C3 → L3: Prune itemsets with infrequent subsets using the Apriori property.

o C4: Generated but pruned entirely due to an infrequent subset → termination.

Generating Association Rules

✅ Definitions

 Frequent Itemset: An itemset whose support ≥ the minimum support threshold.

 Association Rule: An implication of the form A → B, where A and B are itemsets.

 Support Count: The number of transactions containing a given itemset.

 Confidence: Measures how often the items in B appear in transactions that contain A:

Confidence(A → B) = Support(A ∪ B) / Support(A)

⚙️ Steps to Generate Association Rules

1. Find all frequent itemsets using an algorithm such as Apriori or FP-Growth.

2. For each frequent itemset l:

o Generate all non-empty subsets s of l.

o For each s, form the rule s → (l − s).

o Compute the confidence of each rule.

3. Filter strong rules:

o Keep only the rules with confidence ≥ min_conf.

o All such rules automatically satisfy min_support because they are derived from frequent itemsets.
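A short sketch of step 2, generating strong rules s → (l − s) from one frequent itemset, given precomputed support counts (the counts below match the worked example that follows):

from itertools import combinations

support_count = {                     # assumed counts from a 9-transaction database
    frozenset({"I1", "I2", "I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2, frozenset({"I2", "I5"}): 2,
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
}

l = frozenset({"I1", "I2", "I5"})
min_conf = 0.70
for r in range(1, len(l)):                        # every non-empty proper subset s of l
    for s in map(frozenset, combinations(l, r)):
        conf = support_count[l] / support_count[s]
        if conf >= min_conf:
            print(f"{set(s)} -> {set(l - s)}  (confidence {conf:.0%})")
# Prints the three strong rules: {I1,I5}->{I2}, {I2,I5}->{I1}, {I5}->{I1,I2}.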

📘 Example

Let X = {I1, I2, I5} be a frequent itemset.

Non-empty proper subsets of X:

 {I1}, {I2}, {I5}, {I1, I2}, {I1, I5}, {I2, I5}

Possible rules and their confidences (given the support counts):

 {I1, I2} → {I5}: 50%

 {I1, I5} → {I2}: 100%

 {I2, I5} → {I1}: 100%

 {I1} → {I2, I5}: 33%

 {I2} → {I1, I5}: 29%

 {I5} → {I1, I2}: 100%

With min_conf = 70%, the strong rules are:

 {I1, I5} → {I2}

 {I2, I5} → {I1}

 {I5} → {I1, I2}

FP-Growth Algorithm

✅ Motivation

The Apriori algorithm, though effective, suffers from:

 Huge candidate generation (e.g., 10⁴ frequent 1-itemsets can yield roughly 10⁷ candidate 2-itemsets).

 Multiple full database scans and expensive pattern matching.

💡 FP-Growth Solution

 Avoids candidate generation by using a divide-and-conquer strategy.

 Builds a compressed data structure called the FP-tree (Frequent Pattern Tree).

 Recursively mines conditional FP-trees for frequent patterns.

How FP-Growth Works

1. First Database Scan

 Count the support of all items → generate the frequent 1-itemsets.

 Sort items in descending order of support → list L.

2. Build FP-Tree

 Start with a null root.

 For each transaction:

o Sort its items according to L.

o Insert the path into the tree, sharing common prefixes.

o Increment node counts along existing prefixes.

 Maintain node-links for quick access via a header table.

3. Mine FP-Tree

For each item (starting from the least frequent in L):

 Construct its Conditional Pattern Base (CPB): the paths in the FP-tree ending with the item.

 Build a Conditional FP-Tree from the CPB.

 Recursively mine the conditional FP-tree.
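A hedged sketch of the two construction steps (scan, then insert transactions with shared prefixes); the toy database mirrors the nine-transaction example summarized below, and the recursive mining step is omitted for brevity:

from collections import Counter, defaultdict

transactions = [["I1","I2","I5"], ["I2","I4"], ["I2","I3"], ["I1","I2","I4"],
                ["I1","I3"], ["I2","I3"], ["I1","I3"], ["I1","I2","I3","I5"],
                ["I1","I2","I3"]]
min_sup = 2

counts = Counter(i for t in transactions for i in t)       # first scan: item supports
L = [i for i, c in counts.most_common() if c >= min_sup]   # items sorted by support

class Node:
    def __init__(self, item): self.item, self.count, self.children = item, 0, {}

root, header = Node(None), defaultdict(list)               # header table: item -> node-links
for t in transactions:                                     # second scan: build the tree
    node = root
    for item in sorted((i for i in t if i in L), key=L.index):  # order items by L
        if item not in node.children:
            node.children[item] = Node(item)
            header[item].append(node.children[item])
        node = node.children[item]
        node.count += 1                                    # shared prefixes reuse nodes

print({i: sum(n.count for n in header[i]) for i in L})     # {'I2': 7, 'I1': 6, 'I3': 6, 'I4': 2, 'I5': 2}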

📘 Example Summary

Frequent Items (sorted by support):

L = {I2:7, I1:6, I3:6, I4:2, I5:2}

Mining I5:

 CPB: {I2, I1:1}, {I2, I1, I3:1}

 Conditional FP-tree: I2:2, I1:2

 Patterns: {I2, I5}, {I1, I5}, {I2, I1, I5}

Mining I4:

 CPB: {I2, I1:1}, {I2:1}

 Conditional FP-tree: I2:2

 Pattern: {I2, I4}

Mining I3:

 CPB: {I2, I1:2}, {I2:2}, {I1:2}

 Conditional FP-tree: Two branches

 Patterns: {I2, I3}, {I1, I3}, {I2, I1, I3}

Mining I1:

 CPB: {I2:4}

 Conditional FP-tree: I2:4

 Pattern: {I2, I1}


🌲 FP-Tree Benefits

 Compact representation of the database.

 Reduces:

o Database scans

o Candidate generation

o Search space

🔄 Recursive Mining

 Focuses on smaller projected databases.

 Combines prefix path with suffix pattern during recursion.

🔍 Why Pattern Evaluation Matters

 Even strong rules (high support and confidence) can be misleading or uninteresting.

 This problem is especially common with:

o Low support thresholds

o Long patterns
📌 Strong Rules May Be Misleading

 Example: The rule "buys computer games → buys videos" has:

o Support: 40%

o Confidence: 66%

 It seems strong but is actually misleading, because:

o The base probability of buying videos is 75%.

o So buying games lowers the chance of buying videos (a negative correlation).

 Takeaway: Confidence does not imply true correlation.

⚙️ Correlation-Based Evaluation

To improve pattern evaluation, use correlation analysis.

✅ Lift:

 Formula:
lift(A → B) = P(A ∩ B) / (P(A) * P(B))

 Interpretation:

o Lift > 1: Positive correlation

o Lift < 1: Negative correlation

o Lift = 1: No correlation

 In the example, lift = 0.89 → negative correlation

✅ Chi-Square (χ²) Measure:

 Based on a contingency table of observed vs expected values

 Formula:
χ² = Σ (observed - expected)² / expected

 High χ² → Strong evidence of dependence

 In the example, χ² = 555.6, confirming negative correlation

✅ Conclusion: Which Measures Are Interesting?

 Support & confidence: Good for filtering, but insufficient on their own.

 Lift & chi-square (χ²): Better for identifying true interestingness.

o They capture actual dependencies (positive or negative).


Pattern Evaluation Methods in Association Rule Mining

🔹 Overview

 Most association rule mining algorithms use the support–confidence framework.

 Issue: Even with thresholds, many generated rules may still be uninteresting or misleading.

 This is especially problematic with:

o Low support thresholds

o Long patterns

 To tackle this, additional measures are used to assess the interestingness of patterns more effectively.

🧩 Strong Rules Are Not Necessarily Interesting

🔸 Subjective vs. Objective Interestingness:

 Subjective: Depends on user preferences, domain knowledge, etc.

 Objective: Based on statistical measures derived from the data.

❗ Objective measures help filter out misleading rules before presenting them to users.

🔸 Example 6.7 – A Misleading “Strong” Rule

Scenario:

 Data from AllElectronics about purchase behavior:

o Total transactions: 10,000

o Customers who bought:

 Computer games: 6000

 Videos: 7500

 Both: 4000

Discovered Rule:

buys(X, "computer games") → buys(X, "videos")

Support = 40%, Confidence = 66%

Analysis:

 The rule meets the minimum support (30%) and confidence (60%) thresholds → considered strong.

 BUT the actual probability of buying videos is 75%.

o Hence, the confidence (66%) < the base probability (75%).

o This indicates a negative correlation.

 Conclusion: The rule is misleading.

o Buying games actually reduces the likelihood of buying videos.

Key Insight:

 Confidence alone doesn't imply a meaningful association.

 Better metrics are needed to reveal true correlations.

🧪 From Association Analysis to Correlation Analysis

🔸 Why Use Correlation Measures?

 Support and confidence cannot detect:

o Independence

o Negative correlation

 Correlation measures evaluate the statistical dependency between items.

🔹 Lift Measure

Formula:

Lift(A → B) = P(A ∩ B) / (P(A) * P(B))

Interpretation:

 Lift > 1: Positive correlation (A implies B more often than by chance)

 Lift = 1: No correlation (independence)

 Lift < 1: Negative correlation (A implies B less often than by chance)

Example 6.8 – Applying Lift

 From the previous data:

o P(game) = 0.60

o P(video) = 0.75

o P(game ∩ video) = 0.40

Lift = 0.40 / (0.60 × 0.75) = 0.40 / 0.45 = 0.89

 Result: Since lift < 1 → negative correlation

✅ Lift detects negative correlation that confidence failed to reveal.

🔹 Chi-Square (χ²) Measure

Purpose:

 Tests independence between itemsets

 Based on observed vs. expected values in a contingency table

Formula:

χ² = Σ [(Observed − Expected)² / Expected]


Example 6.9 – Applying χ²

Contingency Table: Observed Values (Table 6.6)

            Game    ¬Game   Row Total
Video       4000    3500    7500
¬Video      2000    500     2500
Col Total   6000    4000    10000

Expected Values (Table 6.7)

            Game (Exp)   ¬Game (Exp)
Video       4500         3000
¬Video      1500         1000

χ² Calculation:

χ² = (4000−4500)²/4500 + (3500−3000)²/3000 + (2000−1500)²/1500 + (500−1000)²/1000

   = 500²/4500 + 500²/3000 + 500²/1500 + 500²/1000

   = 55.56 + 83.33 + 166.67 + 250.0

   = 555.6

Interpretation:

 A χ² value far above the critical value (3.84 at the 0.05 significance level for one degree of freedom) indicates a statistically significant deviation from independence.

 In this case:

o Observed joint occurrence (4000) < Expected (4500)

o Confirms negative correlation

✅ Like Lift, Chi-square also detects the negative correlation missed by the
confidence metric.
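As a cross-check, here is a small sketch that reproduces the lift and χ² values above (scipy's chi2_contingency applies the same observed-vs-expected formula):

from scipy.stats import chi2_contingency

N, game, video, both = 10000, 6000, 7500, 4000
lift = (both / N) / ((game / N) * (video / N))
print(round(lift, 2))  # 0.89 -> lift < 1: negative correlation

observed = [[4000, 3500],   # video:   game, ¬game
            [2000,  500]]   # ¬video
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 1))  # 555.6
print(expected)        # [[4500. 3000.] [1500. 1000.]]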
