Discovering Models / Theories
cs365 2015 mukerjee
Domain Theories
Agent:
given percept history p ∈ P,
select a decision from the set of choices a ∈ A
so as to meet a goal g (performance) –
maximize a utility function U()
Requires knowledge of how actions under different percepts
affect the goal
Model or Theory
Task domains: a) 8-puzzle [deterministic], b) Soccer [stochastic]
8-puzzle
• Percept = state
• Actions = moves
• Goal : T/F (goal state reached?)
• Utility : number of moves
8-puzzle
• State = [7,2,4,5,B,6,8,3,1] (B = blank)
• Actions = L, R, U, D
• State + Action → new State
• Decision: based on Search [Informed / Uninformed]
Breadth-first search
• Expand shallowest unexpanded node
• Fringe: FIFO queue; new successors go at the end
O(b^{d+1})
Properties of breadth-first search
• Complete? Yes (if b is finite)
• Time? 1 + b + b^2 + b^3 + … + b^d + b(b^d − 1) = O(b^{d+1})
• Space? O(b^{d+1}) (keeps every node in memory)
• Optimal? Yes (if cost = 1 per step)
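A minimal Python sketch of breadth-first search with a FIFO fringe; the `successors` interface and state encoding here are assumptions for illustration:

```python
from collections import deque

def breadth_first_search(start, goal_test, successors):
    """BFS sketch: expand the shallowest unexpanded node.
    `successors(state)` is assumed to yield (action, next_state) pairs;
    states must be hashable."""
    fringe = deque([(start, [])])       # FIFO queue: new successors go at the end
    visited = {start}
    while fringe:
        state, path = fringe.popleft()  # shallowest node first
        if goal_test(state):
            return path
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                fringe.append((nxt, path + [action]))
    return None                         # no solution found
```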
Iterative-Deepening search
Cost-based search
• edges don't have equal cost
• like breadth-first, but expand the node
with the lowest path cost from START first
• Fringe: priority queue ordered by path cost
O(b^{1 + ⌊C*/ε⌋}), where C* is the optimal cost and ε the minimum step cost
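A corresponding sketch for cost-based (uniform-cost) search; the fringe is a priority queue keyed on path cost, with a counter to break ties, and `successors` is again an assumed interface:

```python
import heapq
import itertools

def uniform_cost_search(start, goal_test, successors):
    """Expand the node with the lowest path cost g(n) first.
    `successors(state)` is assumed to yield (action, next_state, step_cost)."""
    counter = itertools.count()             # tie-breaker: states are never compared
    fringe = [(0, next(counter), start, [])]
    best = {start: 0}                       # cheapest known cost to each state
    while fringe:
        g, _, state, path = heapq.heappop(fringe)
        if goal_test(state):
            return g, path
        for action, nxt, cost in successors(state):
            g2 = g + cost
            if g2 < best.get(nxt, float("inf")):
                best[nxt] = g2
                heapq.heappush(fringe, (g2, next(counter), nxt, path + [action]))
    return None
```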
Soccer
• Percept = goalie, self, ball, angle θ,
+ wind, opponents, teammates…
• Actions = kick (angle, speed, swing)
• Utility : goal probability
Discrete-Deterministic Spaces:
Search
Uninformed search strategies
• Uninformed search strategies use only the
information available in the problem definition
• Breadth-first search
• Uniform-cost search
• Depth-first search
• Depth-limited search
• Iterative deepening search
Representing the state space
1. States: e.g. tile configurations of the 8-puzzle
2. Actions: e.g. moves L, R, U, D of the blank
3. Goal test: is the goal configuration reached? (T/F)
4. Cost: e.g. 1 per move
8-puzzle heuristics
Admissible:
• h1 : number of misplaced tiles = 6 (for the start and goal states shown)
• h2 : sum of Manhattan distances of the tiles
from their goal positions
= 0+0+1+1+2+3+1+3 = 11
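A small Python sketch of both heuristics; the row-major list encoding with 'B' for the blank and the goal ordering below are assumptions for illustration, so the example values need not match the figure:

```python
def h1(state, goal):
    """h1: number of misplaced tiles (the blank 'B' is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 'B' and s != g)

def h2(state, goal):
    """h2: sum of Manhattan distances of the tiles from their goal
    positions, on a 3x3 board stored row-major as a list of 9 entries."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 'B':
            continue
        j = goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

start = [7, 2, 4, 5, 'B', 6, 8, 3, 1]
goal = [1, 2, 3, 4, 5, 6, 7, 8, 'B']   # assumed goal layout
print(h1(start, goal), h2(start, goal))
```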
8-puzzle heuristics
Nilsson’s Sequence Score
Score(n) = P(n) + 3 S(n)
P(n) : sum of Manhattan distances of each tile from
its proper position
S(n), the sequence score : check around the non-central
squares:
+2 for every tile not followed by its successor
0 for every other tile
a piece in the center scores +1
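A sketch of the sequence score, reusing h2 from above; the clockwise perimeter order and the 8-wraps-to-1 successor rule are assumptions, since conventions for S(n) vary between sources:

```python
# Clockwise order of the 8 non-central cells on a 3x3 board (row-major indices)
PERIMETER = [0, 1, 2, 5, 8, 7, 6, 3]

def nilsson_score(state, goal):
    """Nilsson's sequence score sketch: Score(n) = P(n) + 3*S(n).
    P(n): sum of Manhattan distances (h2 above).
    S(n): +2 for every perimeter tile whose clockwise neighbour is not
    its numerical successor; +1 if the centre square is occupied."""
    s = 0
    for k, idx in enumerate(PERIMETER):
        tile = state[idx]
        if tile == 'B':
            continue
        succ = 1 if tile == 8 else tile + 1
        nxt = state[PERIMETER[(k + 1) % 8]]
        if nxt != succ:
            s += 2
    if state[4] != 'B':   # a piece in the centre scores +1
        s += 1
    return h2(state, goal) + 3 * s
```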
Stochastic Spaces
Soccer
Soccer : Shooting at goal
[acharya mukerjee 01]
Soccer : Shoot, Pass, dribble, or … ?
Handwritten digits - MNIST
Confusion matrix
Discovering theories
Continuous Data
Discrete Attribute data
• Examples described by attribute values (Boolean, discrete, continuous)
• E.g., situations where I will/won't wait at a restaurant:
• Classification of examples is positive (T) or negative (F)
Discrete Features
• Parse the sentence: “Time flies like an arrow”
It may have many parses.
How do we rank the choices?
Regression
Modelling as Regression
Given a set of decisions yi based on observations xi,
- derived from an unknown function y = f(x)
- corrupted by noise
Try to find a model or theory:
y = h(x) ≈ f(x)
where h() is drawn from the hypothesis space – e.g. the space of
radial basis functions, or polynomials, etc.
Polynomial Curve Fitting
[Bishop 06] ch.1
Linear Regression
y = f(x) = Σi wi φi(x)
φi(x) : basis functions
wi : weights
Linear : the function is linear in the weights
Quadratic error function → its derivative is linear in w
Sum-of-Squares Error Function
E(w) = ½ Σn { y(xn, w) − tn }²
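A minimal sketch of minimising this error over a polynomial basis; the toy data imitates the noisy sin(2πx) example of [Bishop 06], and all names are illustrative:

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Least-squares fit of an order-M polynomial: minimises
    E(w) = 1/2 * sum_n (y(x_n, w) - t_n)**2 over the weights w."""
    Phi = np.vander(x, M + 1, increasing=True)   # design matrix: Phi[n, i] = x_n**i
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # solves min ||Phi w - t||^2
    return w

# toy data: noisy samples of sin(2*pi*x)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(10)
w3 = fit_polynomial(x, t, M=3)
```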
[Plots: polynomial fits of order M = 0, 1, 3, 9 to the same data]
Over-fitting
Root-Mean-Square (RMS) Error: E_RMS = √( 2 E(w*) / N )
Polynomial Coefficients
[Table: fitted coefficients grow rapidly with polynomial order M]
9th Order Polynomial, varying Data Set Size:
[Plots: with more data, the 9th-order fit over-fits less]
Regularization
Penalize large coefficient values:
E~(w) = ½ Σn { y(xn, w) − tn }² + (λ/2) ‖w‖²
[Plots: 9th-order fits for increasing λ; E_RMS vs. ln λ]
[Table: regularization shrinks the fitted coefficients]
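A sketch of the regularized (ridge) fit via its closed form, continuing the names from the previous sketch; λ = e^(−18) is just an illustrative value taken from the [Bishop 06] plots:

```python
import numpy as np

def fit_polynomial_regularized(x, t, M, lam):
    """Minimise E~(w) = 1/2 sum_n (y(x_n, w) - t_n)**2 + (lam/2)*||w||^2.
    Closed form: w = (Phi^T Phi + lam*I)^(-1) Phi^T t."""
    Phi = np.vander(x, M + 1, increasing=True)
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

# e.g. a heavily regularized 9th-order fit on the toy data above:
# w9 = fit_polynomial_regularized(x, t, M=9, lam=np.exp(-18))
```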
Probability Theory
Learning = discovering regularities
- Regularity : in repeated experiments,
the outcome is not fully predictable
outcome = “possible world”
set of all possible worlds = Ω
Probability Theory
Apples and Oranges
Sample Space
Sample ω = Pick two fruits,
e.g. Apple, then Orange
Sample Space Ω = {(A,A), (A,O),
(O,A),(O,O)}
= all possible worlds
Event e = set of possible worlds, e ⊆ Ω
• e.g. second one picked is an apple
Learning = discovering regularities
- Regularity : in repeated experiments,
the outcome is not fully predictable
- Probability p(e) : "the fraction of possible worlds in
which e is true", i.e. the outcome is in event e
- Frequentist view : p(e) = the limit of this fraction as the
number of trials N → ∞
- Belief view : in a wager, fair odds of
(1-p):p that the outcome is in e, or vice versa
Axioms of Probability
- non-negative : p(e) ≥ 0
- unit sum p(Ω) = 1
i.e. no outcomes outside sample space
- additive : if e1, e2 are disjoint events (no common
outcome):
p(e1) + p(e2) = p(e1 ∪ e2)
ALT (for events that may overlap):
p(e1 ∨ e2) = p(e1) + p(e2) - p(e1 ∧ e2)
Why probability theory?
Different methodologies have been attempted for uncertainty:
– Fuzzy logic
– Multi-valued logic
– Non-monotonic reasoning
But a unique property of probability theory:
if you gamble using probabilities, you have the best
chance in a wager. [de Finetti 1931]
=> if an opponent uses some other system, he is
more likely to lose
Ramsey-de Finetti theorem (1931)
If agent X’s degrees of belief are rational, then X’s
degrees-of-belief function defined by fair betting
rates is (formally) a probability function
Fair betting rates: the opponent decides which side one
bets on
Proof: fair odds result in a function pr() that satisfies
the Kolmogorov axioms:
Normality : pr(S) >= 0
Certainty : pr(T) = 1
Additivity : pr(S1 ∨ S2 ∨ …) = Σi pr(Si), for mutually exclusive Si
Joint vs. conditional probability
Marginal probability : p(X)
Joint probability : p(X, Y)
Conditional probability : p(Y | X)
Rules of Probability
Sum Rule : p(X) = ΣY p(X, Y)
Product Rule : p(X, Y) = p(Y | X) p(X)
Example
A disease d occurs in 0.05% of the population. A test is
99% effective in detecting the disease, but 5% of
the cases test positive in the absence of d.
10000 people are tested. How many are expected to
test positive?
p(d) = 0.0005 ; p(t|d) = 0.99 ; p(t|~d) = 0.05
p(t) = p(t,d) + p(t,~d) [Sum Rule]
= p(t|d)p(d) + p(t|~d)p(~d) [Product Rule]
= 0.99*0.0005 + 0.05*0.9995 ≈ 0.0505, i.e. about 505 test +ve
Bayes’ Theorem
posterior ∝ likelihood × prior
Bayes’ Theorem
Thomas Bayes (c.1750):
how can we infer causes from effects?
How can one learn the probability of a future event if one knew only
how many times it had (or had not) occurred in the past?
As new evidence comes in → probabilistic knowledge improves.
e.g. throw a die: the guess is poor (1/6)
throw the die again: is it > or < the previous throw? The guess improves.
throw the die repeatedly: the guess can improve quite a lot.
Hence: initial estimate (prior belief P(h), not well formulated)
+ new evidence (support) – compute the likelihood P(data|h)
→ improved estimate (the posterior P(h|data))
Example
A disease d occurs in 0.05% of the population. A test is
99% effective in detecting the disease, but 5% of
the cases test positive in the absence of d.
If you test +ve, what is the probability that you have
the disease?
p(d|t) = p(d) p(t|d) / p(t) ; p(t) = 0.0505
p(d|t) = 0.0005 * 0.99 / 0.0505 ≈ 0.0098 (about 1%)
if 10K people take the test, expected number with d = 5
FPs ≈ 0.05 * 9995 ≈ 500
TPs ≈ 0.99 * 5 ≈ 5 ; only about 5 of the 505 positives have d
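A quick numeric check of both computations above:

```python
# Disease-test example: sum rule, product rule, then Bayes' theorem.
p_d = 0.0005            # prevalence p(d)
p_t_d = 0.99            # sensitivity p(t|d)
p_t_nd = 0.05           # false-positive rate p(t|~d)

p_t = p_t_d * p_d + p_t_nd * (1 - p_d)   # p(t) = 0.0505 (approx.)
p_d_t = p_t_d * p_d / p_t                # p(d|t) = 0.0098 (approx.)

print(f"p(t) = {p_t:.4f}; expected positives in 10000: {10000 * p_t:.0f}")
print(f"p(d|t) = {p_d_t:.4f}")
```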
Bayesian Inference
Testing for hypothesis H given evidence E
- Evidence : based on new observation E
- Prior : Earlier evaluation about the probability of H
- Likelihood : probability of evidence given hypothesis
P(E|H)
- Normalization : P(E), the marginal likelihood
Bayesian inference:
P(H|E) = P(E|H) P(H) / P(E)
P(H|E) : the posterior probability
Bayesian Inference
The fruit picked is an orange (o).
What is the probability that it came
from the blue box (B)?
P(B|o) = P(o|B) p(B) / P(o)
Given: the red box is picked 40% of
the time: p(r) = 0.4, p(B) = 0.6 ;
and p(o|B) = ¾, p(o|r) = ¼
P(o) = ¾*0.6 + ¼*0.4 = 11/20
P(B|o) = ¾ * 0.6 * 20/11 = 9/11
Continuous variables:
Probability Densities
p(x ∈ (a, b)) = ∫_a^b p(x) dx ; p(x) ≥ 0 and ∫ p(x) dx = 1
cumulative distribution : P(z) = ∫_{-∞}^z p(x) dx
Expectations
E[f] = Σx p(x) f(x) (discrete x) ; E[f] = ∫ p(x) f(x) dx (continuous x)
Frequentist approximation with an unbiased sample
(both discrete / continuous) : E[f] ≈ (1/N) Σn f(xn)
The Gaussian Distribution
N(x | μ, σ²) = (2πσ²)^(-1/2) exp{ −(x−μ)² / 2σ² }
Gaussian Mean and Variance
E[x] = μ ; var[x] = σ²
Central Limit Theorem
Distribution of sum of N i.i.d. random variables
becomes increasingly Gaussian for larger N.
Example: N uniform [0,1] random variables.
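A short numpy demonstration: the mean of N uniform [0,1] variables has mean 0.5 and variance 1/(12N), and its distribution looks increasingly Gaussian as N grows (sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (1, 2, 10):
    # 100,000 experiments, each averaging N uniform [0,1] draws
    means = rng.uniform(0, 1, size=(100_000, N)).mean(axis=1)
    # compare empirical mean/variance with the theory: 0.5 and 1/(12N)
    print(N, means.mean(), means.var(), 1 / (12 * N))
```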
Gaussian Parameter Estimation
Observations assumed to be independently
drawn from the same distribution (i.i.d.)
Likelihood function :
p(x | μ, σ²) = Πn N(xn | μ, σ²)
Maximum (Log) Likelihood :
μ_ML = (1/N) Σn xn ; σ²_ML = (1/N) Σn (xn − μ_ML)²
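A sketch of the two maximum-likelihood estimates on synthetic data; the true parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=0.5, size=1000)   # i.i.d. Gaussian sample

mu_ml = x.mean()                    # mu_ML = (1/N) sum_n x_n
sigma2_ml = ((x - mu_ml) ** 2).mean()
# sigma2_ML uses 1/N (the biased ML estimate), not 1/(N-1)
print(mu_ml, sigma2_ml)
```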
Distributions over
Multi-dimensional spaces
The Multivariate Gaussian
[Plot: contour lines of equal probability density]
Multivariate distribution
joint distribution P(x,y) varies considerably
though marginals P(x), P(y) are identical
estimating the joint distribution requires a
much larger sample: O(n^k) cells vs. n·k for the marginals
Marginals and Conditionals
marginals P(x), P(y) are Gaussian
conditional P(x|y) is also Gaussian
Non-intuitive in high dimensions
As dimensionality
increases, bulk of
data moves away
from center
Gaussian in polar coordinates;
p(r)δr : prob. mass inside annulus δr at r.
Change of variable x = g(y) :
p_y(y) = p_x(g(y)) |g′(y)|
Bernoulli Process
Successive Trials – e.g. Toss a coin three times:
HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
Probability of k Heads:
k    :  0    1    2    3
P(k) : 1/8  3/8  3/8  1/8
Probability of success: p, of failure: q = 1 − p ; then
P(k successes in n trials) = C(n, k) p^k q^(n−k)
Model Selection
Cross-Validation
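A minimal S-fold cross-validation sketch for choosing the polynomial order M, reusing the design-matrix idea from the fitting sketches above; S = 4 and the random fold split are illustrative choices:

```python
import numpy as np

def s_fold_cv_rms(x, t, M, S=4):
    """Train an order-M polynomial on S-1 folds, measure RMS error on the
    held-out fold, and average over the S folds."""
    folds = np.array_split(np.random.default_rng(0).permutation(len(x)), S)
    errs = []
    for held in folds:
        train = np.setdiff1d(np.arange(len(x)), held)
        Phi = np.vander(x[train], M + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi, t[train], rcond=None)
        pred = np.vander(x[held], M + 1, increasing=True) @ w
        errs.append(np.sqrt(np.mean((pred - t[held]) ** 2)))
    return np.mean(errs)   # pick the M with the lowest held-out error
```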
Quantized-Cell Classification
[Plot: flow data; red: ‘homogeneous’, green: ‘annular’, blue: ‘laminar’]
Curse of Dimensionality
general cubic polynomial for D dimensions : O(D^3) parameters
Curse of Dimensionality
The unit hypercube and its inscribed sphere in high dimensions:
as the dimension D grows, vol(sphere) / vol(hypercube) → 0
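A short check of this claim: the sphere inscribed in the unit cube has radius 1/2, so the volume ratio is π^(D/2) (1/2)^D / Γ(D/2 + 1), which vanishes as D grows:

```python
import math

# Ratio of the inscribed sphere's volume to the unit cube's volume
for D in (1, 2, 3, 10, 20):
    ratio = math.pi ** (D / 2) * 0.5 ** D / math.gamma(D / 2 + 1)
    print(D, ratio)   # tends to 0: e.g. ~0.52 at D=3, ~2.5e-3 at D=10
```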
Curse of Dimensionality
Polynomial curve fitting, M = 3
Gaussian Densities in
higher dimensions
Regression with Polynomials
Curve Fitting Re-visited
Bayesian Inference
Testing for hypothesis H given evidence E
Bayesian inference:
P(H|E) = P(E|H) P(H) / P(E)
posterior ∝ likelihood × prior
Maximum Likelihood
Evidence = t ; Hypothesis = poly(x, w)
Likelihood : p(t | x, w, β) = Πn N(tn | y(xn, w), β⁻¹)
Determine w_ML by minimizing the sum-of-squares error
Predictive Distribution
p(t | x, w_ML, β_ML) = N(t | y(x, w_ML), β_ML⁻¹)
MAP: A Step towards Bayes
Determine w_MAP by minimizing the regularized sum-of-squares error
MAP = Maximum A Posteriori
Bayesian Curve Fitting
Bayesian Predictive Distribution
Information Theory
Twenty Questions
Knower: thinks of an object (a point in a probability space)
Guesser: asks the knower to evaluate random variables
Stupid approach:
Guesser: Is it my left big toe?
Knower: No.
Guesser: Is it Valmiki?
Knower: No.
Guesser: Is it Aunt Lakshmi?
...
Expectations & Surprisal
Turn the key → expectation: the lock will open
Exam results being shown: could be 100, could be zero
random variable: a function from the set of marks
to the real interval [0,1]
Interestingness ∝ unpredictability
surprisal (r.v. = x) = - log2 p(x)
= 0 when p(x) = 1
= 1 when p(x) = ½
= ∞ when p(x) = 0
Expectations in data
A: 00010001000100010001. . . 0001000100010001000100010001
B: 01110100110100100110. . . 1010111010111011000101100010
C: 00011000001010100000. . . 0010001000010000001000110000
Structure in data is easy to remember
Entropy
Used in
• coding theory
• statistical physics
• machine learning
Entropy
In how many ways can N identical objects be allocated to M
bins? W = N! / Πi ni!
Entropy H = (1/N) ln W is maximized when the objects are
spread evenly across the bins: ni = N/M
Entropy in Coding theory
x discrete with 8 possible states; how many bits to
transmit the state of x?
All states equally likely → need log2 8 = 3 bits
Coding theory
Entropy in Twenty Questions
Intuitively : try to ask a question whose answer is 50-50, e.g.:
Is the first letter between A and M?
question entropy = − p(Y) log2 p(Y) − p(N) log2 p(N)
For both answers equiprobable:
entropy = − ½ log2(½) − ½ log2(½) = 1.0 bit
For P(Y) = 1/1024:
entropy ≈ (1/1024) · 10 + (small term for N) ≈ 0.01 bit
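A small helper that reproduces both numbers (the binary entropy of a yes/no question):

```python
import math

def question_entropy(p_yes):
    """Entropy in bits of a yes/no question answered 'yes' with prob p_yes."""
    h = 0.0
    for pr in (p_yes, 1 - p_yes):
        if pr > 0:                 # by convention, 0 * log 0 = 0
            h -= pr * math.log2(pr)
    return h

print(question_entropy(0.5))       # 1.0 bit: the ideal 50-50 question
print(question_entropy(1 / 1024))  # about 0.011 bits: nearly useless
```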