
Fitting probability distributions to data

A: The normal distribution


Distributional modeling
A useful way to understand a data set:
• Fit a probability distribution to it.
• Simple and compact.
• Captures the big picture while smoothing out the wrinkles in the data.
• In subsequent applications, use the distribution as a proxy for the data.

Which distributions to use?


There exist a few distributions of great universality which occur in a surprisingly
large number of problems. The three principal distributions, with ramifications
throughout probability theory, are the binomial distribution, the normal distri-
bution, and the Poisson distribution. – William Feller.

We’ll see others as well. And in higher dimensions, we’ll use various combinations of 1-d models: products and mixtures.

The normal distribution

The normal (or Gaussian) N(µ, σ²) has mean µ, variance σ², and density function

p(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right).

• 68.3% of the distribution lies within one standard deviation of the mean, µ ± σ
• 95.5% lies within µ ± 2σ
• 99.7% lies within µ ± 3σ
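As a quick numerical check of the density formula and the 68–95–99.7 rule, here is a minimal sketch using numpy and scipy.stats (a tooling assumption; the parameter values are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 170.0, 10.0          # illustrative parameters
X = norm(loc=mu, scale=sigma)    # N(mu, sigma^2)

# Density at the mean equals 1 / sqrt(2*pi*sigma^2)
print(X.pdf(mu), 1 / np.sqrt(2 * np.pi * sigma**2))

# Probability mass within 1, 2, 3 standard deviations of the mean
for k in (1, 2, 3):
    mass = X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma)
    print(f"within {k} sigma: {mass:.3f}")   # ~0.683, 0.954, 0.997
```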
Gaussians are everywhere
[Figure: U.S. height distribution in centimeters, shown as histograms for women and men; x-axis: height bins in cm, y-axis: number of people in thousands.]

Central Limit Theorem: Let X1, X2, . . . be independent with EXi = µi and var(Xi) = vi. Then

\frac{(X_1 + \cdots + X_n) - (\mu_1 + \cdots + \mu_n)}{\sqrt{v_1 + \cdots + v_n}} \;\xrightarrow{d}\; N(0, 1)
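A short simulation illustrating the statement (a sketch with numpy; the uniform distribution and sample sizes are my choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 20000

# i.i.d. Uniform(0,1) draws: each has mean 1/2 and variance 1/12
X = rng.uniform(0.0, 1.0, size=(trials, n))
Z = (X.sum(axis=1) - n * 0.5) / np.sqrt(n / 12.0)   # standardized sums

print(Z.mean(), Z.std())          # ~0 and ~1
print(np.mean(np.abs(Z) <= 1))    # ~0.683, as for a standard normal
```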

Fitting a Gaussian to data

Given: Data points x1 , . . . , xn to which we want to fit a distribution.


What Gaussian distribution N(µ, σ²) should we choose?
B: The Poisson distribution

The Poisson distribution


A distribution over the non-negative integers {0, 1, 2, . . .}

Poisson(λ), with λ > 0:

\Pr(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}

• Mean: EX = λ
• Variance: E(X − λ)² = λ
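A quick check of the pmf and the mean/variance identities using scipy.stats (a tooling assumption):

```python
import numpy as np
from scipy.stats import poisson

lam = 3.87                       # e.g. the Rutherford rate below
X = poisson(mu=lam)

k = np.arange(10)
print(X.pmf(k))                  # e^{-lam} * lam^k / k!
print(X.mean(), X.var())         # both equal lambda
```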
How the Poisson arises

Count the number of events (collisions, phone calls, etc) that occur in a certain
interval of time. Call this number X , and say it has expected value λ.

Now suppose we divide the interval into small pieces of equal length.

If the probability of an event occurring in a small interval is:


• independent of what happens in other small intervals, and
• the same across small intervals,
then, in the limit of infinitesimally small pieces, X ∼ Poisson(λ).

Poisson: examples
Rutherford’s experiments with radioactive disintegration (1920)

[Diagram: a radioactive substance emitting particles toward a counter.]

• N = 2608 intervals of 7.5 seconds
• Nk = # intervals with k particles
• Mean: 3.87 particles per interval
• P(3.87): expected counts under Poisson(3.87), i.e., 2608 · Pr(X = k) (and 2608 · Pr(X ≥ 9) for the last column)

k        0     1     2     3     4     5     6     7     8     ≥9
Nk       57    203   383   525   532   408   273   139   45    43
P(3.87)  54.4  211   407   526   508   394   254   140   67.9  46.3
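The expected-count row can be reproduced directly from the Poisson pmf (a sketch assuming scipy; rounding may differ slightly from the slide):

```python
import numpy as np
from scipy.stats import poisson

N, lam = 2608, 3.87
observed = np.array([57, 203, 383, 525, 532, 408, 273, 139, 45, 43])   # the Nk row

# Expected counts: N * Pr(X = k) for k = 0..8, plus N * Pr(X >= 9) for the last bin
pmf = poisson.pmf(np.arange(9), lam)
expected = N * np.append(pmf, 1.0 - pmf.sum())

print(observed)
print(np.round(expected, 1))   # close to the P(3.87) row above
```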
Flying bomb hits on London in WWII

[Photo: Bundesarchiv, Bild 146-1975-117-26 / Lysiak / CC-BY-SA 3.0]

• Area divided into 576 regions, each 0.25 km²
• Nk = # regions with k hits
• Mean: 0.93 hits per region
• P(0.93): expected counts under Poisson(0.93)

k        0      1      2      3      4     ≥5
Nk       229    211    93     35     7     1
P(0.93)  226.8  211.4  98.54  30.62  7.14  1.57

Fitting a Poisson distribution to data

Given samples x1 , . . . , xn , what Poisson(λ) model to choose?


C: Maximum likelihood estimation

Maximum likelihood estimation

Let P = {Pθ : θ ∈ Θ} be a class of probability distributions (Gaussians, Poissons, etc).


Maximum likelihood principle: pick the θ ∈ Θ that makes the data maximally
likely, that is, maximizes Pr(data|θ) = Pθ (data).

Three steps:
1 Write down an expression for the likelihood, Pr(data|θ).
2 Maximizing this is the same as maximizing its log, the log-likelihood.
3 Solve for the maximum-likelihood parameter θ.
Maximum likelihood estimation of the Poisson
P = {Poisson(λ) : λ > 0}. We observe x1 , . . . , xn .
• Write down an expression for the likelihood, Pr(data|λ).

• Maximizing this is the same as maximizing its log, the log-likelihood.

• Solve for the maximum-likelihood parameter λ.
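For reference, a worked version of these three steps (standard algebra, not shown on the slide):

\Pr(\text{data} \mid \lambda) = \prod_{i=1}^{n} e^{-\lambda} \frac{\lambda^{x_i}}{x_i!}

LL(\lambda) = -n\lambda + \Big(\sum_{i=1}^{n} x_i\Big) \ln\lambda - \sum_{i=1}^{n} \ln(x_i!)

\frac{d\,LL}{d\lambda} = -n + \frac{1}{\lambda} \sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad
\hat\lambda = \frac{1}{n} \sum_{i=1}^{n} x_i

So the maximum-likelihood Poisson parameter is simply the empirical mean of the observations.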

Maximum likelihood estimation of the normal


You see n data points x1, . . . , xn ∈ R, and want to fit a Gaussian N(µ, σ²) to them.
• Maximum likelihood: pick µ, σ to maximize

\Pr(\text{data} \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)

• Work with the log, since it makes things easier:

LL(\mu, \sigma^2) = \frac{n}{2} \ln \frac{1}{2\pi\sigma^2} \;-\; \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 .

• Setting the derivatives to zero, we get

\mu = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad\qquad
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2
These are simply the empirical mean and variance.
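A minimal numpy sketch on synthetic data (the data, seed, and true parameters are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)   # synthetic data: true mu=5, sigma=2

mu_hat = x.mean()                     # empirical mean
var_hat = np.mean((x - mu_hat)**2)    # empirical variance (divides by n, not n-1)

print(mu_hat, np.sqrt(var_hat))       # roughly 5 and 2
```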
D: The binomial distribution

The binomial distribution


Binomial(n, p): # of heads from n independent coin tosses of bias (heads prob) p.

For X ∼ binomial(n, p),

EX = np

var(X) = np(1 − p)

\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
Fitting a binomial distribution to data
Example: Survey on food tastes.
• You choose 1000 people at random and ask them whether they like sushi.
• 600 say yes.
What is a good estimate for the fraction of people who like sushi? Clearly, 60%.

More generally, say you observe n tosses of a coin of unknown bias, and k come up
heads. What distribution binomial(n, p) is the best fit to this data?
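The slide’s answer for the survey (60%) is exactly the maximum-likelihood estimate p = k/n. A quick numerical confirmation, assuming scipy (the grid search is only for illustration):

```python
import numpy as np
from scipy.stats import binom

n, k = 1000, 600                 # survey: 1000 people asked, 600 said yes
p_hat = k / n                    # maximum-likelihood estimate: 0.6

# Sanity check: p = 0.6 maximizes the binomial likelihood of observing k = 600
grid = np.linspace(0.01, 0.99, 99)
print(p_hat, grid[np.argmax(binom.pmf(k, n, grid))])   # 0.6 and ~0.6
```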
Maximum likelihood: a small caveat

You have two coins of unknown bias.

• You toss the first coin 10 times, and it comes out heads every time.
You estimate its bias as p1 = 10/10 = 1.
• You toss the second coin 10 times, and it comes out heads once.
You estimate its bias as p2 = 1/10 = 0.1.

Now you are told that one of the coins was tossed 20 times and 19 of them came out
heads. Which coin do you think it is?

• Likelihood under p1: Pr(19 heads out of 20 tosses | bias = 1) = 0, since a coin of bias 1 can never produce a tail.

• Likelihood under p2: Pr(19 heads out of 20 tosses | bias = 0.1) = \binom{20}{19}(0.1)^{19}(0.9) ≈ 1.8 × 10⁻¹⁸.

So maximum likelihood says it must be the second coin, however implausible that seems: the estimate p1 = 1 assigns zero probability to ever seeing a tail.
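The two likelihoods, computed with scipy (a tooling assumption):

```python
from scipy.stats import binom

# Likelihood of "19 heads in 20 tosses" under each estimated bias
print(binom.pmf(19, 20, 1.0))   # 0.0  -- a bias-1 coin can never show a tail
print(binom.pmf(19, 20, 0.1))   # ~1.8e-18
```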

Laplace smoothing
A smoothed version of maximum-likelihood: when you toss a coin n times and observe
k heads, estimate the bias as
p = \frac{k+1}{n+2}.
We will later justify this in a Bayesian setting.

Laplace’s law of succession: What is the probability that the sun won’t rise tomorrow?
• Let p be the probability that the sun won’t rise on a randomly chosen day.
We want to estimate p.
• For the past 5000 years (= 1825000 days), the sun has risen every day.
Using Laplace smoothing, estimate
p = \frac{1}{1825002}.
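A two-line sketch of the rule (the function name is mine):

```python
def laplace_estimate(k, n):
    """Laplace-smoothed estimate of a coin's bias after k heads in n tosses."""
    return (k + 1) / (n + 2)

print(laplace_estimate(10, 10))        # 11/12 instead of 1
print(laplace_estimate(1, 10))         # 2/12 instead of 0.1
print(laplace_estimate(0, 1825000))    # 1/1825002: the law-of-succession example
```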
Normal approximation to the binomial


When a coin of bias p is tossed n times, let Sn be the number of heads.


• We know Sn has mean np and variance np(1 − p).
• By the central limit theorem: as n grows, the distribution of Sn looks increasingly like a Gaussian with this mean and variance, i.e.,

\frac{S_n - np}{\sqrt{np(1-p)}} \;\xrightarrow{d}\; N(0, 1).
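A quick comparison of the exact binomial pmf with the Gaussian density (a sketch assuming scipy; n and p are illustrative):

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.3
mu, sd = n * p, np.sqrt(n * p * (1 - p))

k = np.arange(15, 46)
exact = binom.pmf(k, n, p)
approx = norm.pdf(k, loc=mu, scale=sd)   # Gaussian density evaluated at the integers

print(np.max(np.abs(exact - approx)))    # small, and shrinks as n grows
```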

Poisson approximation to the binomial

Toss n independent coins with biases p1, . . . , pn and let Sn be the number of heads.

Le Cam’s inequality:

\sum_{k=0}^{\infty} \left| \Pr(S_n = k) - e^{-\lambda} \frac{\lambda^k}{k!} \right| \;\le\; 2 \sum_{i=1}^{n} p_i^2,

where λ = p1 + · · · + pn.

Poisson limit theorem: If all pi = λ/n, then

S_n \xrightarrow{d} \text{Poisson}(\lambda).

Also called “the law of rare events”.
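A small numerical illustration of the limit (a sketch assuming scipy; λ and the values of n are my choice):

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 3.0
k = np.arange(15)
for n in (10, 100, 1000):
    # Binomial(n, lam/n) pmf vs Poisson(lam) pmf
    gap = np.max(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)))
    print(n, gap)    # the gap shrinks as n grows (Le Cam bound: 2*lam^2/n)
```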


E: The multinomial distribution

The multinomial distribution


Imagine a k-faced die, with probabilities p1 , . . . , pk .
Toss such a die n times, and count the number of times each of the k faces occurs:

Xj = # of times face j occurs

The distribution of X = (X1 , . . . , Xk ) is called the multinomial.

• Parameters: p1 , . . . , pk ≥ 0, with p1 + · · · + pk = 1.
• EX = (np1 , np2 , . . . , npk ).
• \Pr(n_1, \ldots, n_k) = \binom{n}{n_1, n_2, \ldots, n_k} p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}, where

\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!},

the # of ways to place balls numbered {1, . . . , n} into bins numbered {1, . . . , k} so that bin j receives exactly nj balls.
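A short look at this distribution via scipy.stats (a tooling assumption; the die and counts are illustrative):

```python
import numpy as np
from scipy.stats import multinomial

n, p = 10, [0.2, 0.3, 0.5]                # a 3-faced "die" tossed 10 times
dist = multinomial(n, p)

print(dist.pmf([2, 3, 5]))                # (10! / (2! 3! 5!)) * 0.2^2 * 0.3^3 * 0.5^5
print(n * np.array(p))                    # EX = (np1, np2, np3)
print(dist.rvs(size=3, random_state=0))   # a few sampled count vectors
```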
Example: text documents
Bag-of-words: vectorial representation of text documents.

"It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way – in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."

Selected coordinates of the resulting count vector (from the slide): despair 1, evil 2, happiness 0, foolishness 1, . . .

• Fix V = some vocabulary.


• Treat words in the document as independent draws from a multinomial over V:

p = (p_1, \ldots, p_{|V|}), \quad \text{such that } p_i \ge 0 \text{ and } \sum_i p_i = 1

How would we estimate the parameters of a multinomial?
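The natural answer mirrors the binomial case: the maximum-likelihood estimate of pj is the fraction of words equal to word j, and Laplace (add-one) smoothing avoids zero probabilities for unseen words. A sketch on a hypothetical toy document (the (count + 1)/(n + |V|) form is the standard add-one generalization, not spelled out on the slides):

```python
from collections import Counter

# Hypothetical toy document; in practice V would be a fixed vocabulary
doc = "it was the best of times it was the worst of times".split()
V = sorted(set(doc))

counts = Counter(doc)
n = len(doc)

# Maximum likelihood: p_j = (# occurrences of word j) / (total # of words)
p_ml = {w: counts[w] / n for w in V}

# Add-one (Laplace) smoothing, so unseen words don't get probability zero
p_laplace = {w: (counts[w] + 1) / (n + len(V)) for w in V}

print(p_ml)
print(p_laplace)
```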

F: Alternatives to maximum likelihood?


Alternatives to maximum likelihood
Choosing a model in {Pθ : θ ∈ Θ} given observations x1 , x2 , . . . , xn .
• Maximum likelihood.
The default, most common, choice.

• Method of moments.
Pick the model whose moments E_{X∼Pθ}[f(X)] match their empirical estimates.

• Bayesian estimation.
Return the maximum a-posteriori distribution, or the overall posterior.

• Maximum entropy.
We’ll see this soon.

• Other optimization-based or game-theoretic criteria.


As in generative adversarial nets, for instance.

Desiderata for probability estimators


Overall goal: Given data x1 , . . . , xn , want to choose a model Pθ , θ ∈ Θ.

• Let T (x1 , . . . , xn ) be some estimator of θ.


• Suppose X1 , . . . , Xn are i.i.d. draws from Pθ . Ideally T (X1 , . . . , Xn ) ≈ θ.

Some typical desiderata, if X1 , . . . , Xn ∼ Pθ .

1 Unbiased: ET (X1 , . . . , Xn ) = θ.
2 Asymptotically consistent: T (X1 , . . . , Xn ) → θ as n → ∞.
3 Low variance: var(T (X1 , . . . , Xn )) is small.
4 Computationally feasible: Is T (X1 , . . . , Xn ) easy to compute?

Do maximum-likelihood estimators possess these properties?


Are maximum likelihood estimators unbiased?

In general, no.

Example: Fit a normal distribution to observations X1, . . . , Xn ∼ N(µ, σ²).


• Maximum likelihood estimates:

\hat\mu = \frac{X_1 + \cdots + X_n}{n}
\qquad\qquad
\hat\sigma^2 = \frac{(X_1 - \hat\mu)^2 + \cdots + (X_n - \hat\mu)^2}{n}

• Can check that E[\hat\mu] = \mu but

E[\hat\sigma^2] = \frac{n-1}{n} \, \sigma^2 .
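A quick simulation of this bias (a sketch with numpy; the parameters are my choice):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 0.0, 1.0, 5, 200_000

X = rng.normal(mu, sigma, size=(trials, n))
mu_hat = X.mean(axis=1)
var_hat = ((X - mu_hat[:, None])**2).mean(axis=1)   # ML estimate: divides by n

print(mu_hat.mean())    # ~0: the mean estimate is unbiased
print(var_hat.mean())   # ~0.8 = (n-1)/n * sigma^2, not 1: the variance estimate is biased
```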

Maximum likelihood: asymptotically consistent?


Not always, but under some conditions, yes.
Rough intuition:
• Given data X1 , . . . , Xn ∼ Pθ∗ , want to choose a model Pθ , θ ∈ Θ.
• We pick the θ that maximizes

\frac{1}{n} LL(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ln P_\theta(X_i)
\;\longrightarrow\; E_{X \sim P_{\theta^*}}[\ln P_\theta(X)]
= E_{X \sim P_{\theta^*}}[\ln P_{\theta^*}(X)] - K(P_{\theta^*}, P_\theta)

where K is the KL divergence. So for large n, maximizing the likelihood amounts to minimizing K(Pθ*, Pθ), which is zero exactly when Pθ = Pθ*.
Postscript: some other canonical distributions

We’ve seen the normal, Poisson, binomial, and multinomial.

Some others:
1 Gamma: two-parameter family of distributions over R+
2 Beta: two-parameter family of distributions over [0, 1]
3 Dirichlet: k-parameter family of distributions over the k-probability simplex

All of these are exponential families of distributions.
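For completeness, all three families are available in scipy.stats (a tooling assumption; the parameter values are illustrative):

```python
import numpy as np
from scipy.stats import gamma, beta, dirichlet

rng = np.random.default_rng(0)

print(gamma(a=2.0, scale=1.5).rvs(3, random_state=rng))           # samples in R+
print(beta(a=2.0, b=5.0).rvs(3, random_state=rng))                # samples in [0, 1]
print(dirichlet(alpha=[1.0, 2.0, 3.0]).rvs(2, random_state=rng))  # each row sums to 1
```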
