
Part-1: Bayesian Learning

Intro:

- Bayesian learning is a probabilistic approach that applies Bayes' Theorem to update the probability estimate
for a hypothesis as more evidence or information becomes available.
- Bayesian learning is suitable for problems where plenty of prior/training data is available.
- That is, the probability of each hypothesis is re-evaluated and updated every time new data is added to the
considered dataset.
- Terminologies:
- Hypothesis Space (H): A set of all possible hypotheses that can explain the observed data.
- Prior Probability (P(H)): The initial probability assigned to a hypothesis before observing any data. This
represents prior knowledge or belief about the hypothesis.
- Likelihood (P(D|H)): The probability of observing the data D given that the hypothesis H is true. This
measures how well the hypothesis explains the data.
- Posterior Probability (P(H|D)): The updated probability of the hypothesis after considering the observed
data. This is the core of Bayesian updating.
- Bayesian Learning Process:
1. Define the Prior: Start with an initial belief about the hypotheses, represented by the prior distribution
P(H).
2. Collect Data: Gather observed data D.
3. Compute the Likelihood: Evaluate how likely the observed data is under each hypothesis using P(D|H).
4. Update Beliefs: Apply Bayes' Theorem to update the prior beliefs and obtain the posterior distribution
P(H|D).
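
A minimal sketch of this update over a small discrete hypothesis space (the hypotheses, priors, and likelihood
values below are made-up illustrative numbers, not from the notes):

# Bayesian update over a discrete hypothesis space (illustrative values only).
priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}        # P(H)
likelihoods = {"h1": 0.2, "h2": 0.5, "h3": 0.9}   # P(D|H) for the observed data D

# Evidence P(D) = sum over all hypotheses of P(D|H) * P(H)
p_d = sum(likelihoods[h] * priors[h] for h in priors)

# Posterior P(H|D) = P(D|H) * P(H) / P(D)
posteriors = {h: likelihoods[h] * priors[h] / p_d for h in priors}
print(posteriors)   # the beliefs after observing D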

Bayes’ Theorem:

- It is used to find the probability of occurrence of an event, based on previously provided evidence.
- It is stated as follows: for a given dataset (evidence) D, the probability that H is the correct hypothesis
for D is:

P(H | D) = P(D | H) P(H) / P(D)      (known as the posterior probability)


Where,
P(D | H) - Probability of observing the data D given that the hypothesis H is true
P(H) - Probability of hypothesis H
P(D) - Probability of the dataset D
- P(D | H) is termed the likelihood.
- P(H) is termed the prior probability, and P(D) is termed the evidence (the marginal probability of the data).
- From the set of posterior probabilities for each hypothesis, we can determine the most probable hypothesis
by using hMAP.
- hMAP (Maximum A Posteriori Hypothesis) is the hypothesis that maximizes the posterior probability given by
Bayes' theorem.
- It is given by:

hMAP = argmax (h ∈ H) P(h | D) = argmax (h ∈ H) P(D | h) P(h)

(P(D) is dropped from the maximization because it is the same for every hypothesis.)
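
Continuing the earlier sketch, hMAP can be read off the computed posteriors; since P(D) is the same for every
hypothesis, comparing P(D|H) * P(H) is enough (again, illustrative numbers):

# Selecting the MAP hypothesis: the one with the highest posterior probability.
priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
likelihoods = {"h1": 0.2, "h2": 0.5, "h3": 0.9}
scores = {h: likelihoods[h] * priors[h] for h in priors}   # proportional to P(H|D)
h_map = max(scores, key=scores.get)
print(h_map)   # hypothesis with the maximum a posteriori probability
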
Bayes’ Theorem and Concept Learning:

Maximum Likelihood and Least Squared Error Hypothesis:

- Maximum Likelihood Estimation (MLE) is a method used in statistics and machine learning to estimate the
parameters of a model.
- The principle behind MLE is to find the parameter values that maximize the likelihood(probability) of the
observed data.
- While hMAP considers both the likelihood and the prior probability, hML considers only the likelihood.
- It is given by:

hML = argmax (h ∈ H) P(D | h)

- The above relation also applies to learning a continuous-valued target function (the ML hypothesis), because
the target data values in D are continuous (real-valued).
- However, models such as linear regression, non-linear regression, and curve fitting cannot learn directly
from the above relation.
- Hence the maximum likelihood hypothesis relation is re-derived to obtain the LSE relation.
- The Least Squared Error (LSE) hypothesis is a method used in regression analysis to find the best-fitting line
to a set of observed data points.
- The principle behind LSE is to minimize the sum of squared differences between observed values and
predicted values.
- It is given by:

hML = argmin (h ∈ H) Σ i=1..m ( di − h(xi) )²

where di is the observed target value and h(xi) is the value predicted by hypothesis h for training example xi.
- By deriving the maximum likelihood hypothesis under the assumption that the errors in the training data follow
a normal (Gaussian) distribution, we obtain the minimized/least squared error hypothesis.
- Derivation: (Notes)
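
A minimal sketch of this MLE/LSE connection, fitting a straight line by minimizing the sum of squared errors
(the data below is synthetic and purely illustrative):

# Least squared error fit of a line y = w0 + w1*x.
# Under Gaussian noise, this fit is also the maximum likelihood hypothesis.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)   # illustrative noisy data

X = np.column_stack([np.ones_like(x), x])            # design matrix [1, x]
w = np.linalg.lstsq(X, y, rcond=None)[0]             # minimizes the sum of squared errors
print("w0, w1 =", w)                                 # roughly 1 and 2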
----------------------------------------------------------------------------------------------------
2. Locally Weighted Regression:
- Regression is a statistical approach used to analyze the relationship between a dependent variable (target
variable) and one or more independent variables (predictor variables). The objective is to determine the most
suitable function that characterizes the connection between these variables.
- When the data points in a dataset can be fitted with a single straight line, plain linear regression is used
and the data is said to follow a linear trend.
- Locally Weighted Regression is a technique for fitting data that does not follow a single global line
(non-linear data): a separate, locally weighted fit is built around each query point.
- In this method, a weight is assigned to each training input X (X1, X2, …… Xn) using a relation known as
kernel smoothing, based on how close X lies to X0, the query point (the value at which we are predicting);
a code sketch is given at the end of this section.

- The weight assigned to a data point is inversely related to its distance from the query point X0. That is,
the smaller the distance, the larger the weight.

- A weight matrix (holding one weight per training input X) has to be constructed for each query point.

Drawbacks:
- Computation cost is high.
- Memory requirements are high.
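
A minimal sketch of locally weighted regression at a single query point, assuming the Gaussian kernel that is
commonly used for kernel smoothing (the data and the bandwidth tau are illustrative):

# Locally weighted (linear) regression at one query point x0.
# Weights follow a Gaussian kernel, so points near x0 dominate the local fit.
import numpy as np

def lwr_predict(x, y, x0, tau=1.0):
    X = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
    w = np.exp(-((x - x0) ** 2) / (2 * tau ** 2))    # kernel-smoothed weights
    W = np.diag(w)                                   # one weight matrix per query point
    # Weighted least squares: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return theta[0] + theta[1] * x0

rng = np.random.default_rng(0)
x = np.linspace(0, 6, 80)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)   # illustrative non-linear data
print(lwr_predict(x, y, x0=3.0, tau=0.5))            # close to sin(3.0)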

3. Radial basis functions:

Data in Machine Learning can be of two types, viz:


1. Linearly Separable Data
2. Non-Linearly Separable Data
- A single-layer perceptron can be used to classify linearly separable data.
- A multi-layer perceptron is needed to classify non-linearly separable data, but it is very complex.
- Hence, the complexity of classifying non-linearly separable data with multi-layer perceptrons can be reduced
by using radial basis functions (RBFs) as activation functions.
- With RBFs, the data is compressed horizontally and expanded vertically.
- The two RBFs are:
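
A minimal sketch assuming the Gaussian RBF, one commonly used radial basis function, whose output depends only
on the distance of the input from a centre:

# Gaussian radial basis function (an assumed, commonly used example).
import numpy as np

def gaussian_rbf(x, c, gamma=1.0):
    # exp(-gamma * ||x - c||^2): 1 at the centre, decaying with distance
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(c)) ** 2))

# Inputs that are hard to separate linearly can become separable after being
# mapped through RBFs centred at representative points.
print(gaussian_rbf([0.1], [0.0]))   # close to 1 (near the centre)
print(gaussian_rbf([3.0], [0.0]))   # close to 0 (far from the centre)
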
4. Case Based Reasoning:

- Case-Based Reasoning (CBR) is a problem-solving approach in artificial intelligence and cognitive science
where new problems are solved by referencing and adapting solutions from previously encountered, similar
cases.
- In this method, every problem is treated as a separate case.
- The instances (cases) are represented symbolically, rather than as numeric feature vectors.
- CBR is similar to instance-based learning: just as new instances are classified by directly analyzing the
existing instances, CBR reuses previously solved cases while solving (classifying) a new case.
- An example CBR system is CADET (Case-based Design Tool), which works from a library of about 75 stored case
designs.
- Working Procedure of CBR:
1. Case Retrieval: When a new problem arises, the system retrieves cases from its database of previously
solved problems that are similar to the current problem. This involves finding cases that have similar features
or are in similar contexts.
2. Case Adaptation: Once similar cases are retrieved, the system adapts the solutions from these cases to fit
the new problem. This may involve modifying the solution to account for differences between the old and new
cases.
3. Solution Application: The adapted solution is then applied to the current problem. This step involves
executing the solution and potentially testing it to ensure it works in the new context.
4. Case Storage: After solving the problem, the system stores the new case and its solution in the database.
This allows the system to build up a repository of cases over time, improving its ability to solve future problems.
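
A minimal sketch of the retrieve-adapt-apply-store cycle described above (the case structure, the similarity
measure, and the adaptation step are simplified assumptions, not taken from any particular CBR system):

# A toy case-based reasoning cycle: retrieve, adapt, apply, store.
from dataclasses import dataclass

@dataclass
class Case:
    features: dict        # symbolic description of the problem
    solution: str         # solution that worked for this case

case_base = []            # repository of previously solved cases

def similarity(a, b):
    # Count matching feature-value pairs (a deliberately simple measure).
    return sum(1 for k, v in a.items() if b.get(k) == v)

def solve(problem):
    # 1. Case retrieval: find the most similar stored case.
    best = max(case_base, key=lambda c: similarity(c.features, problem))
    # 2. Case adaptation: reuse the old solution, adjusted for the new case.
    solution = best.solution + " (adapted)"
    # 3. Solution application would happen here (domain specific).
    # 4. Case storage: remember the new case for future problems.
    case_base.append(Case(problem, solution))
    return solution

case_base.append(Case({"type": "pipe", "material": "steel"}, "design-A"))
print(solve({"type": "pipe", "material": "copper"}))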
Remarks on lazy and eager learning:

Lazy Learning: Prediction is made directly by analyzing the pre-existing instances, i.e. without building a
model.
- Computation time and memory requirements are higher, because all existing instances must be examined for each
prediction, and every new instance must also be stored for use in future classifications.
Ex: k-Nearest Neighbours (KNN), etc.
Eager Learning: Prediction is made by a model trained beforehand on the provided instances, i.e. by building a
model.
- Computation time and memory requirements at prediction are relatively lower.
Ex: Decision Tree Learning, Naive Bayes Classifier, etc.
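
A minimal sketch of the lazy style using a k-nearest-neighbour classifier: "training" only stores the instances,
and all the work happens at prediction time (the data is illustrative):

# Lazy learning: no model is built; computation happens at prediction time.
import numpy as np

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" just stores the instances (hence the higher memory cost).
        self.X, self.y = np.asarray(X), np.asarray(y)
        return self

    def predict(self, x):
        # Every stored instance is examined for each query (higher compute cost).
        dists = np.linalg.norm(self.X - np.asarray(x), axis=1)
        nearest = self.y[np.argsort(dists)[:self.k]]
        values, counts = np.unique(nearest, return_counts=True)
        return values[np.argmax(counts)]

X = [[0, 0], [0, 1], [5, 5], [6, 5]]
y = ["A", "A", "B", "B"]
print(KNNClassifier(k=3).fit(X, y).predict([5, 6]))   # predicts "B"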
