Predictive Analytics Basics

The document discusses the fundamentals of predictive analytics in machine learning, focusing on data mining tasks categorized as descriptive and predictive. It explains the concepts of supervised and unsupervised learning, detailing the process of training models to make predictions based on labeled data. Key components of learning include representation, evaluation, and optimization, culminating in the development of a prediction rule from training data.

Uploaded by

Dhruv Jain

DS605: Fundamentals of Machine Learning

Lecture 07

Fundamentals of Predictive Analytics


[Representation, Evaluation, and Optimization]

Arpit Rana
5th August 2024
Disclaimer: Most images in these presentation slides have been taken from various sources on the web and from ML books.

Data Mining Tasks


The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract interesting patterns.

Descriptive
Find human-interpretable patterns that describe the data.
● Cluster Analysis
● Outlier Analysis
● Association Rule Mining
● Sequence Pattern Mining
In Machine Learning terminology, these tasks are categorised as “Unsupervised Learning”.

Predictive
Use some variables to predict future or unknown values of other variables.
● Regression
● Classification
In Machine Learning terminology, these tasks are categorised as “Supervised Learning”.
Machine Learning: Definition

Machine Learning is
● the science (and art) of programming computers
● so they can learn from data.
– Aurelien Geron, Google

[Figure: AI, ML, DL, Gen-AI]
Machine Learning: Example

A Spam Filter,
● a Machine Learning Program, given
○ examples of “spam” emails (e.g. flagged by
users), and
○ examples of “ham” (i.e. regular) emails
● can learn to flag spam
Machine Learning: A New Programming Paradigm

Traditional Programming (Symbolic AI): Data + Rules → Answers
● A long list of complex (hard coded) rules
● Keep writing new rules as the new phrases are introduced by spammers

Machine Learning: Data + Answers → Rules
● Automatically learns which words or phrases are good predictors of spam
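The contrast between the two paradigms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy emails, the hand-written word list, and the helper names (`rule_based_is_spam`, `learn_spam_words`, `learned_is_spam`) are hypothetical and do not come from the lecture, and counting words is just one simple way to "learn" rules.

```python
from collections import Counter

# Hypothetical toy data: (email text, answer) pairs with "spam"/"ham" answers.
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

def rule_based_is_spam(text):
    """Traditional programming: a person writes the rules by hand."""
    banned = {"free", "prize"}  # has to be extended manually as spammers adapt
    return any(word in banned for word in text.split())

def learn_spam_words(examples):
    """Machine learning: derive the rules (spam-indicating words) from data + answers."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, label in examples:
        target = spam_counts if label == "spam" else ham_counts
        target.update(set(text.split()))  # count each word once per email
    # Keep words that appear in more spam emails than ham emails.
    return {w for w in spam_counts if spam_counts[w] > ham_counts[w]}

def learned_is_spam(text, spam_words):
    """Apply the learned rules to a new email."""
    return any(word in spam_words for word in text.split())

spam_words = learn_spam_words(TRAINING_EMAILS)
```

With this toy data, an email such as "claim your money" is missed by the hand-written rule but flagged by the learned word set, because "claim" and "money" were picked up from the spam examples.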
Machine Learning: Definition Revisited

Machine Learning is the training of a model from data that generalises a decision against a performance measure.

● Training a model suggests training examples.
● A model suggests state acquired through experience.
● Generalises a decision suggests the capability to make a decision based on inputs and anticipating unseen inputs in the future for which a decision will be required.
● against a performance measure suggests a targeted need and directed quality to the model being prepared.

[Figure: Data + Answers → Machine Learning → Model]
Learning = Representation + Evaluation + Optimization

Representation
Choosing a representation of the learner: the hypothesis space or the model class, i.e., the set of models that it can possibly learn.

Evaluation
Choosing an evaluation function (also called objective function, utility function, loss function, or scoring function), which is needed to distinguish good classifiers from bad ones.

Optimization
Choosing a method to search among the models in the hypothesis space for the highest-scoring one.
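The three components can be made concrete with a deliberately tiny example. The sketch below assumes a one-dimensional feature, a hypothesis space of threshold rules, training accuracy as the evaluation function, and exhaustive search as the optimizer; the data and all names are hypothetical choices for illustration, not the lecture's own.

```python
# Hypothetical 1-D training data: (feature value, label in {0, 1}).
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

# Representation: the hypothesis space is the set of threshold rules
# h_t(x) = 1 if x >= t else 0, for a small set of candidate thresholds t.
def make_hypothesis(threshold):
    return lambda x: 1 if x >= threshold else 0

candidate_thresholds = [0.5, 1.5, 2.5, 3.5, 4.5]

# Evaluation: accuracy on the training data (higher is better).
def accuracy(h, examples):
    return sum(h(x) == y for x, y in examples) / len(examples)

# Optimization: exhaustive search for the highest-scoring hypothesis.
best_threshold = max(
    candidate_thresholds,
    key=lambda t: accuracy(make_hypothesis(t), data),
)
best_h = make_hypothesis(best_threshold)
```

Changing any one component changes the learner: a richer hypothesis space, a different scoring function, or a smarter search (for example gradient-based) would each give a different algorithm.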


Supervised Learning
Problem Settings and Examples
Supervised Learning: A Formal Model

The learner’s input:

● Domain set
An arbitrary set (instance space), $X$: the set of objects (a.k.a. instances, domain points) we may wish to label.

● Label set
A set of possible labels, $Y$, e.g., {0, 1} or {-1, 1}.

● Training data
$S = ((x_1, y_1), \ldots, (x_m, y_m))$ is a finite sequence of pairs in $X \times Y$, i.e., a sequence of labeled domain points.

The learner’s output:

● A prediction rule, $h : X \to Y$, also called a predictor, a hypothesis, or a classifier.
○ The learner returns $h$ upon receiving the training sequence $S$.
○ It can be used to predict the label of new domain points (like the past ones).
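The formal objects above map directly onto types. The following sketch assumes the domain set is the real numbers and the label set is {0, 1}, and uses a 1-nearest-neighbour rule as the learner purely for illustration; the type aliases and the function name `nearest_neighbour_learner` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

X = float                        # domain set (assumed: real numbers)
Y = int                          # label set (assumed: {0, 1})
Sample = Sequence[Tuple[X, Y]]   # training data S: labeled domain points
Hypothesis = Callable[[X], Y]    # prediction rule h : X -> Y

def nearest_neighbour_learner(S: Sample) -> Hypothesis:
    """Return h upon receiving S: label a new point like its closest training point."""
    def h(x: X) -> Y:
        _, label = min(S, key=lambda pair: abs(pair[0] - x))
        return label
    return h

# A hypothetical training sequence of m = 4 labeled points.
S = [(0.0, 0), (1.0, 0), (5.0, 1), (6.0, 1)]
h = nearest_neighbour_learner(S)
```

Here `h(0.4)` is labeled like the nearby training point 0.0, and `h(5.2)` like 5.0; the learner itself is just a function from training sequences to prediction rules.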
Supervised Learning: A Formal Model

Data-generation Model:
● Let $D$ be a probability distribution over $X \times Y$, i.e., $D$ is a joint probability distribution over domain points and labels. It can be viewed as being composed of two parts:
○ A distribution $D_x$ over unlabeled domain points (sometimes called the marginal distribution),
○ A conditional probability over labels for each domain point, $D((x, y) \mid x)$.

Independent and Identically Distributed (I.I.D.) Assumption

● Each domain point $x_i$ is sampled from the same prior probability distribution:
$P(x_i) = P(x_{i+1}) = P(x_{i+2}) = \cdots$,
and is independent of the previous examples:
$P(x_i) = P(x_i \mid x_{i-1}, x_{i-2}, \ldots)$.
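A small simulation may help make the data-generation model tangible. The sketch below assumes a specific, made-up joint distribution: the marginal over domain points is uniform on {0, 1, 2, 3}, and the conditional probability of label 1 is 0.9 when x >= 2 and 0.1 otherwise. The function names and the fixed seed are illustrative assumptions.

```python
import random

def sample_example(rng):
    """Draw one (x, y) pair from an assumed joint distribution D over X x Y."""
    x = rng.choice([0, 1, 2, 3])      # marginal D_x: uniform over four domain points
    p_one = 0.9 if x >= 2 else 0.1    # assumed conditional probability of label 1 given x
    y = 1 if rng.random() < p_one else 0
    return x, y

def sample_training_set(m, seed=0):
    """Draw m examples i.i.d.: the same sampler every time, ignoring earlier draws."""
    rng = random.Random(seed)
    return [sample_example(rng) for _ in range(m)]

S = sample_training_set(1000)
```

Each call to `sample_example` uses the same distribution (identically distributed) and never looks at previous examples (independent), which is exactly what the two equations above state.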
Supervised Learning: A Formal Model

More formally, the task of supervised learning can be defined as follows.

Given a training set $S$ of $m$ example input-output pairs,

$S = ((x_1, y_1), \ldots, (x_m, y_m))$,

where each pair was generated by an unknown function $y = f(x)$, discover a function $h$ that approximates the true function $f$.

We call the output $y_i$ the ground truth: the true answer we are asking our model to predict.
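As a worked instance of this definition, the sketch below pretends the unknown function is f(x) = 2x + 1, generates a noise-free training set from it, and recovers an approximation h by an ordinary least-squares line fit. The choice of f, the restriction to straight lines, and the name `fit_line` are assumptions made for illustration only; in practice f is not available and the data are noisy.

```python
def f(x):
    """The 'unknown' true function; visible here only so we can generate data."""
    return 2.0 * x + 1.0

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
S = [(x, f(x)) for x in xs]   # training set of m = 5 input-output pairs

def fit_line(S):
    """Least-squares fit of h(x) = a*x + b to the pairs in S."""
    m = len(S)
    mean_x = sum(x for x, _ in S) / m
    mean_y = sum(y for _, y in S) / m
    cov = sum((x - mean_x) * (y - mean_y) for x, y in S)
    var = sum((x - mean_x) ** 2 for x, _ in S)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b

h = fit_line(S)
```

Because the data here are noise-free and f happens to lie inside the hypothesis space of lines, h agrees with f even on inputs that were not in S, such as x = 10.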
Supervised Learning Process

Training Phase: the Learner (𝚪: S → h) searches the Hypothesis Space 𝓗 and returns the Final Hypothesis or Model (h).

Test Phase: A Test Instance is given to the Model (h), which outputs a Prediction.

Inductive Learning: given a set of observations, it finds a function that is applicable to the entire instance space.

Stationarity: A test instance follows the same distribution as the training instances.
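The two phases can be sketched end to end. In the example below, the learner is a hypothetical rule that places a threshold halfway between the two class means, and the training and test sets are small hand-made lists assumed to come from the same distribution; none of these specifics are taken from the lecture.

```python
def learner(S):
    """Gamma: S -> h. Puts a threshold halfway between the two class means."""
    xs0 = [x for x, y in S if y == 0]
    xs1 = [x for x, y in S if y == 1]
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x >= threshold else 0

# Training phase: the learner receives S and returns the final hypothesis h.
train = [(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)]
h = learner(train)

# Test phase: h predicts labels for unseen instances (assumed to follow
# the same distribution as the training instances).
test = [(1.5, 0), (2.5, 0), (5.5, 1), (6.5, 1)]
predictions = [h(x) for x, _ in test]
test_accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
```

If the stationarity assumption were violated, for example if test inputs were shifted far from the training range, the accuracy measured here would say little about how well h generalises.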
Next lecture: Choosing a Hypothesis Space (6th August 2024)
