$$P\{X \ge m+n \mid X \ge n\} = P\{X \ge m\}.$$
The converse is also true:
(iv) Let $X$ be a non-negative integer valued R.V. satisfying
$$P\{X \ge m+n \mid X \ge n\} = P\{X \ge m\} \quad \text{for all } m, n \ge 0.$$
Then $X$ must have a geometric distribution.
(v) Let $X_i$ $(i = 1, \ldots, n)$ be independent geometric R.V.'s with parameters $p_i$; then $\min_i\{X_i\}$ is also geometric, with parameter $p = 1 - \prod_{i=1}^{n}(1 - p_i)$.
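This follows from the tail probabilities; under the convention $P\{X_i \ge k\} = (1-p_i)^k$ (support $0, 1, 2, \ldots$),
$$P\big\{\min_i X_i \ge k\big\} = \prod_{i=1}^{n} P\{X_i \ge k\} = \Big(\prod_{i=1}^{n}(1-p_i)\Big)^{k},$$
which is the tail of a geometric law with success probability $1 - \prod_{i=1}^{n}(1-p_i)$.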
3.1.6 Hyper-Geometric Distribution
A box contains $N$ marbles of which $M$ are marked. Now, $n$ marbles are drawn. Let $X$ denote the number of marked marbles drawn. Then
$$P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \qquad \max(0, M+n-N) \le x \le \min(n, M).$$
Results:
(i) Let $X$ and $Y$ be independent R.V.'s with distributions $B(m, p)$ and $B(n, p)$. Then the conditional distribution of $X$ given $X + Y$ is hypergeometric.
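The common parameter $p$ cancels in the ratio, which is why the conditional law is free of $p$:
$$P(X = x \mid X + Y = t) = \frac{\binom{m}{x} p^x q^{m-x} \cdot \binom{n}{t-x} p^{t-x} q^{n-t+x}}{\binom{m+n}{t} p^t q^{m+n-t}} = \frac{\binom{m}{x}\binom{n}{t-x}}{\binom{m+n}{t}}.$$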
3.1.7 Poisson Distribution
An R.V. $X$ is said to be a Poisson R.V. with parameter $\lambda > 0$ if its PMF is given by
$$P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots$$
Results:
(i) Let $X_1, X_2, \ldots, X_n$ be independent Poisson R.V.'s with $X_i \sim P(\lambda_i)$, $i = 1, \ldots, n$. Then
$$S_n = X_1 + \cdots + X_n \sim P(\lambda_1 + \cdots + \lambda_n).$$
The converse is also true.
(ii) Let $X$ and $Y$ be independent R.V.'s with $P(\lambda_1)$ and $P(\lambda_2)$ distributions respectively; then the conditional distribution of $X$ given $X + Y$ is binomial. (The converse is also true.)
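Explicitly,
$$X \mid (X + Y = n) \sim \mathrm{Bin}\left(n, \frac{\lambda_1}{\lambda_1 + \lambda_2}\right).$$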
Uses of the Poisson Distribution
For large $n$ and small $p$, $X \sim \mathrm{Bin}(n, p)$ is approximately distributed as Poi$(np)$. This is sometimes termed the "law of small numbers".
A Poisson process with rate $\lambda$ per unit time is such that
(i) $X$, the number of occurrences of an event in any given time interval of length $t$, is Poi$(\lambda t)$;
(ii) the numbers of events in non-overlapping time intervals are independent random variables (see later).
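A quick numerical check of the binomial-to-Poisson approximation; a minimal sketch, with illustrative parameter values:

```python
import numpy as np
from scipy import stats

n, p = 1000, 0.003            # large n, small p
k = np.arange(15)

binom_pmf = stats.binom.pmf(k, n, p)
pois_pmf = stats.poisson.pmf(k, n * p)   # Poi(np)

# The two PMFs agree closely when n is large and p is small
print(np.max(np.abs(binom_pmf - pois_pmf)))
```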
"BAIL (Fit For a Srak Hs Khas Near LT New Db 11016 Ph (11) S650, Cal ODIRDENG K DDPLGTIG, SHOEI
om nfstlnacaaeny on Web: wr ne3.1.8 Multi-Nomial Distribution (Generalized Binomial Distribution)
Let $x_1, \ldots, x_{k-1}$ be non-negative integers such that $x_1 + \cdots + x_{k-1} \le n$. Then the probability that exactly $x_i$ trials terminate in $A_i$ $(i = 1, \ldots, k-1)$, and hence that $x_k = n - (x_1 + \cdots + x_{k-1})$ trials terminate in $A_k$, is
$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}.$$
$$M(t_1, \ldots, t_{k-1}) = \big(p_1 e^{t_1} + \cdots + p_{k-1} e^{t_{k-1}} + p_k\big)^n$$
$$E(X_i) = np_i, \qquad \mathrm{Var}(X_i) = np_i(1 - p_i), \qquad \mathrm{Cov}(X_i, X_j) = -np_i p_j \quad (i \ne j)$$
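A small simulation check of these moment formulas; a sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, np.array([0.2, 0.3, 0.5])
X = rng.multinomial(n, p, size=200_000)   # each row is (X_1, X_2, X_3)

print(X.mean(axis=0))             # ~ n p_i = [4, 6, 10]
print(X.var(axis=0))              # ~ n p_i (1 - p_i)
print(np.cov(X[:, 0], X[:, 1]))   # off-diagonal ~ -n p_1 p_2 = -1.2
```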
Summary

| S.No. | Distribution | PMF | $E(X)$ | $\mathrm{Var}(X)$ | $M(t)$ |
|---|---|---|---|---|---|
| 1 | Poisson, $X \sim P(\lambda)$ | $P(X=k) = \dfrac{e^{-\lambda}\lambda^k}{k!}$, $k = 0, 1, 2, \ldots$ | $\lambda$ | $\lambda$ | $e^{\lambda(e^t - 1)}$ |
| 2 | Binomial, $X \sim B(n, p)$ | $P(X=k) = \binom{n}{k} p^k q^{n-k}$, $k = 0, 1, \ldots, n$ | $np$ | $npq$ | $(q + pe^t)^n$ |
| 3 | Uniform (discrete) | $P(X=k) = \dfrac{1}{n}$, $k = 1, \ldots, n$ | $\dfrac{n+1}{2}$ | $\dfrac{n^2-1}{12}$ | $\dfrac{e^t(e^{nt}-1)}{n(e^t-1)}$ |
| 4 | Two Point | $P(X=1) = p$, $P(X=0) = 1-p$ | $p$ | $p(1-p)$ | $q + pe^t$ |
| 5 | Geometric | $P(X=k) = q^k p$, $k = 0, 1, 2, \ldots$ | $\dfrac{q}{p}$ | $\dfrac{q}{p^2}$ | $\dfrac{p}{1 - qe^t}$ |
| 6 | Negative Binomial, $X \sim NB(r, p)$ | $P(X=k) = \binom{k+r-1}{r-1} p^r q^k$, $k = 0, 1, 2, \ldots$ | $\dfrac{rq}{p}$ | $\dfrac{rq}{p^2}$ | $\left(\dfrac{p}{1-qe^t}\right)^r$ |
| 7 | Hypergeometric | $P(X=k) = \dfrac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}$ | $\dfrac{nM}{N}$ | $n\dfrac{M}{N}\left(1-\dfrac{M}{N}\right)\dfrac{N-n}{N-1}$ | — |
3.2. Continuous Distributions
3.2.1. Uniform Distribution
An R.V. $X$ is said to have the uniform distribution on $[a, b]$ if its PDF is given by
$$f(x) = \begin{cases} \dfrac{1}{b-a}; & a \le x \le b \\ 0; & \text{otherwise} \end{cases}$$
$$M(t) = \begin{cases} \dfrac{e^{tb} - e^{ta}}{t(b-a)}; & t \ne 0 \\ 1; & t = 0 \end{cases}$$
Results: Let $X$ be an R.V. with a continuous DF $F$; then $F(X)$ has the uniform distribution on $[0, 1]$.
3.2.2. Gamma Distribution
An R.V. $X$ is said to have the gamma distribution with parameters $\alpha$ and $\beta$ if its PDF is
$$f(x) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}; & x > 0 \\ 0; & \text{otherwise} \end{cases}$$
We write $X \sim G(\alpha, \beta)$.
(b) When $\alpha = n/2$ ($n > 0$ an integer) and $\beta = 2$, then
$$f(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)\, 2^{n/2}}\, x^{n/2-1} e^{-x/2}; & x > 0 \\ 0; & \text{otherwise} \end{cases}$$
"HATE, (Ft Foo a Sara Hae Kas, Near TT, New Dai
Erma ezdanendenyis said chi-square 72 (n) distribution
B(X)=n, Ver(X)=2n
1 1
MO Taya
Results: Let $X_i$ $(i = 1, \ldots, n)$ be independent R.V.'s such that $X_i \sim G(\alpha_i, \beta)$; then
$$S_n = \sum_{i=1}^{n} X_i \sim G\Big(\sum_{i=1}^{n} \alpha_i,\, \beta\Big) \text{ R.V.}$$
Corollary:
(i) Take $\alpha_i = 1$ $\forall i$. Then $S_n \sim G(n, \beta)$, i.e., a sum of independent exponentials is Gamma.
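A simulation sketch of this corollary (illustrative values; scipy's `gamma` uses the same shape and scale convention as $G(\alpha, \beta)$ here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, beta = 5, 2.0
# Sum of n i.i.d. exponential(scale=beta) draws, repeated many times
s = rng.exponential(scale=beta, size=(100_000, n)).sum(axis=1)

# Compare against G(n, beta) via a Kolmogorov-Smirnov test
print(stats.kstest(s, stats.gamma(a=n, scale=beta).cdf))  # p-value should not be small
```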
(ii) Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent R.V.'s; then $X + Y$ and $\dfrac{X}{Y}$ are independent, and $X + Y$ and $\dfrac{X}{X+Y}$ are likewise independent.
The converse is also true.
(iii) Memoryless property of the exponential:
$$P(X > r + s \mid X > s) = P(X > r),$$
where $X \sim \exp(\lambda)$.
(iv) If $X$ and $Y$ are independent exponential R.V.'s with the same parameter, then
$$\frac{X}{X+Y} \text{ has a } U(0,1) \text{ distribution.}$$
3.2.3. Beta Distribution
An R.V. $X$ is said to have the beta distribution with parameters $\alpha$ and $\beta$ $(\alpha > 0, \beta > 0)$ if its PDF is
$$f(x) = \begin{cases} \dfrac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1-x)^{\beta-1}; & 0 < x < 1 \\ 0; & \text{otherwise} \end{cases}$$
We write $X \sim \beta(\alpha, \beta)$.
$$E(X) = \frac{\alpha}{\alpha+\beta}, \qquad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
Note: If $\alpha = \beta = 1$, we have $U(0, 1)$.
Results:
(i) If $X \sim \beta(\alpha, \beta)$ then $1 - X \sim \beta(\beta, \alpha)$.
(ii) Let $X \sim G(\alpha_1, \beta)$ and $Y \sim G(\alpha_2, \beta)$ be independent; then
$$\frac{X}{X+Y} \sim \beta(\alpha_1, \alpha_2) \text{ R.V.}$$
3.2.4. Normal Distribution (Gaussian Law)
(a) An R.V. $X$ is said to have a standard normal distribution if its PDF is
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty;$$
we write $X \sim N(0, 1)$.
(b) An R.V. $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma\,(> 0)$ if
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}.$$
We write $X \sim N(\mu, \sigma^2)$.
$$M(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$$
Central moments:
$$E\big((X-\mu)^{2n-1}\big) = 0, \qquad E\big((X-\mu)^{2n}\big) = \big[(2n-1)(2n-3)\cdots 3 \cdot 1\big]\,\sigma^{2n}.$$
Results:
(i) Let $X_1, \ldots, X_n$ be independent R.V.'s such that $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, n$; then
$$\sum_{i=1}^{n} a_i X_i \sim N\Big(\sum_{i=1}^{n} a_i \mu_i,\, \sum_{i=1}^{n} a_i^2 \sigma_i^2\Big).$$
Corollary:
(a) $\bar{X} \sim N(\mu, \sigma^2/n)$ when the $X_i$ are i.i.d. $N(\mu, \sigma^2)$.
(b) If $X_i \sim N(0, 1)$ $(i = 1, \ldots, n)$ are independent, then
$$\frac{S_n}{\sqrt{n}} \sim N(0, 1).$$
(ii) Let $X$ and $Y$ be independent R.V.'s; then $X + Y$ is normal iff both $X$ and $Y$ are normal.
(iii) Let $X$ and $Y$ be independent $N(0, 1)$ R.V.'s; then $X + Y$ and $X - Y$ are independent.
(iv) Let $X_1$ and $X_2$ be independent $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$ R.V.'s; then $X_1 - X_2$ and $X_1 + X_2$ are independent.
(v) (a) $X \sim N(0, 1) \Rightarrow X^2 \sim \chi^2(1)$
(b) $X \sim N(\mu, \sigma^2) \Rightarrow aX + b \sim N(a\mu + b,\, a^2\sigma^2)$
(vi) $X \sim N(\mu, \sigma^2) \Rightarrow Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$
(vii) If $X$ and $Y$ are i.i.d. $N(0, \sigma^2)$ R.V.'s, then
$$\frac{X}{Y} \sim \text{Cauchy}(1, 0).$$
Moreover, $\dfrac{X^2}{Y^2}$ then has an $F$ distribution with $(1, 1)$ degrees of freedom.
3.2.5. Cauchy Distribution
An R.V. $X$ is said to have the Cauchy distribution with parameters $\mu$ and $\theta$ if its PDF is
$$f(x) = \frac{1}{\pi\theta} \cdot \frac{1}{1 + \left(\frac{x-\mu}{\theta}\right)^2}, \qquad -\infty < x < \infty,\; \theta > 0.$$

3.2.6. Bivariate Normal Distribution
R.V.'s $X_1$ and $X_2$ have a bivariate normal distribution if their joint PDF is
$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\},$$
where $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$.
(a) $\rho$ is the correlation coefficient of $X_1$ and $X_2$.
(b) For $\rho = 0$ the bivariate normal density is the product of two univariate normal densities; this corresponds to $X_1$ and $X_2$ being independent normal random variables.
(c) $X + Y$ and $X - Y$ are independent iff $\sigma_1^2 = \sigma_2^2$.
(d) $X$ and $Y$ are independent iff $\rho = 0$.
Extension to the Multivariate Normal Distribution: In fact, it is straightforward to extend the normal distribution to vectors of arbitrary length; the multivariate normal distribution has density
$$N(x; \mu, \Sigma) = (2\pi)^{-k/2}\, |\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right\}.$$
"BATE Ft For a Sara HK BB43.
44,
Note that x is a vector; it has mean 1 which is itself a vector and = is the
variance-covariance matrix, It x is
k-dimensional then x isa kxk matrix,
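As a concrete check, the density above can be evaluated directly and compared with scipy's implementation; a minimal sketch with illustrative parameters:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])   # symmetric positive definite
x = np.array([0.3, 0.8])

k = len(mu)
quad = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)   # (x-mu)^T Sigma^{-1} (x-mu)
density = (2 * np.pi) ** (-k / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

print(density, multivariate_normal(mu, Sigma).pdf(x))  # the two values agree
```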
4.4. Conditional Distributions and Densities
Given several random variables, how much information does knowing one provide about the others? The notion of conditional probability provides an explicit answer to this question.
Definition 4.7 (Conditional Discrete Density Function): For discrete random variables $X$ and $Y$ with probability mass points $x_1, x_2, \ldots, x_n$ and $y_1, \ldots, y_m$,
$$f_{Y|X}(y \mid x) = P[Y = y \mid X = x] = \frac{P[X = x, Y = y]}{P[X = x]}$$
is called the conditional discrete density function of $Y$ given $X = x$.
Definition 4.8 (Conditional Discrete Distribution): For jointly discrete random variables $X$ and $Y$,
$$F_{Y|X}(y \mid x) = P[Y \le y \mid X = x] = \sum_{\{j\,:\,y_j \le y\}} f_{Y|X}(y_j \mid x)$$
is called the conditional discrete distribution of $Y$ given $X = x$.
Definition 4.9 (Conditional Probability Density Function): For continuous random variables $X$ and $Y$ with joint probability density function $f_{X,Y}(x, y)$,
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}, \qquad \text{if } f_X(x) > 0,$$
where $f_X(x)$ is the marginal density of $X$.
Conditional Distribution: For jointly continuous random variables $X$ and $Y$,
$$F_{Y|X}(y \mid x) = \int_{-\infty}^{y} f_{Y|X}(z \mid x)\, dz.$$
Conditional Expectation
We can also ask what the expected behaviour of one random variable is, given knowledge of the value of a second random variable, and this gives rise to the idea of conditional expectation.
Definition 4.10 (Conditional Expectation): The conditional expectation in the discrete and continuous cases corresponds to an expectation with respect to the appropriate conditional probability distribution:
Discrete:
$$E[Y \mid X = x] = \sum_{y} y\, f_{Y|X}(y \mid x)$$
Continuous:
$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\, dy \qquad \forall x \text{ such that } f_X(x) > 0$$
Note that before $X$ is known to take the value $x$, $E[Y \mid X]$ is itself a random variable, being a function of the random variable $X$. We might be interested in the distribution of the random variable $E[Y \mid X]$.
Theorem 4.1 (Tower Property of Conditional Expectation): For any two random variables $X_1$ and $X_2$,
$$E\big[E[X_1 \mid X_2]\big] = E[X_1].$$
Exercise 4.4.1: Suppose that $\Theta \sim U[0, 1]$ and $(X \mid \Theta = \theta) \sim \mathrm{Bin}(2, \theta)$. Find $E[X \mid \Theta]$ and hence, or otherwise, show that $E[X] = 1$.
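Here $E[X \mid \Theta] = 2\Theta$, so $E[X] = E[2\Theta] = 1$ by the tower property. A quick simulation sketch of this calculation:

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.uniform(0, 1, size=1_000_000)   # Theta ~ U[0, 1]
x = rng.binomial(2, theta)                  # (X | Theta = theta) ~ Bin(2, theta)

print(x.mean())   # ~ 1, matching E[X] = E[E[X | Theta]] = E[2 Theta] = 1
```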
4.5. Conditional Expectations of Functions of Random Variables
By extending the theorem on marginal expectations we can relate the
conditional and marginal expectations of functions of random variables (in
particular, their variances).
Theorem 4.2 (Marginal Expectation of a Transformed Random Variable): For any random variables $X_1$ and $X_2$, and for any function $g(\cdot)$,
$$E\big[E[g(X_1) \mid X_2]\big] = E[g(X_1)].$$
Theorem 4.3 (Marginal Variance): For any random variables $X_1$ and $X_2$,
$$\mathrm{Var}(X_1) = E\big[\mathrm{Var}(X_1 \mid X_2)\big] + \mathrm{Var}\big(E[X_1 \mid X_2]\big).$$
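Continuing Exercise 4.4.1 as an illustration: $\mathrm{Var}(X \mid \Theta) = 2\Theta(1-\Theta)$ and $E[X \mid \Theta] = 2\Theta$, so
$$\mathrm{Var}(X) = E\big[2\Theta(1-\Theta)\big] + \mathrm{Var}(2\Theta) = 2\left(\frac{1}{2} - \frac{1}{3}\right) + \frac{4}{12} = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}.$$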
4.6. Independence of Random Variables
Whilst the previous sections have been concerned with the information that one random variable carries about another, it would seem that there must be pairs of random variables which each provide no information whatsoever about the other. It is, for example, difficult to imagine that the value obtained when a die is rolled in Coventry will tell us much about the outcome of a coin toss taking place at the same time in Lancaster.
There are two equivalent statements of a property termed stochastic independence which capture precisely this idea. The following two definitions are equivalent for both discrete and continuous random variables.
Definition 4.11 (Stochastic Independence): Definition 1: Random variables $X_1, X_2, \ldots, X_n$ are stochastically independent iff
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i).$$
Definition 2: Random variables $X_1, X_2, \ldots, X_n$ are stochastically independent iff
$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i).$$
If $X_1$ and $X_2$ are independent then their conditional densities are equal to their marginal densities.
4.7. Covariance and Correlation
Having established that sometimes one random variable does convey information about another, and that in other cases knowing the value of a random variable tells us nothing useful about another random variable, it is useful to have mechanisms for characterising the relationship between pairs (or larger groups) of random variables.
Definition 4.12 (Covariance and Correlation): Covariance: For random variables $X$ and $Y$ defined on the same probability space,
$$\mathrm{Cov}[X, Y] = E\big[(X - \mu_X)(Y - \mu_Y)\big] = E[XY] - \mu_X \mu_Y.$$
Correlation: For random variables $X$ and $Y$ defined on the same probability space,
$$\rho[X, Y] = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y},$$
provided that $\sigma_X > 0$ and $\sigma_Y > 0$.
Theorem 4.4 (Cauchy-Schwarz Inequality): Let $X$ and $Y$ have finite second moments. Then
$$\big(E[XY]\big)^2 \le E[X^2]\, E[Y^2],$$
with equality if and only if $P[Y = cX] = 1$ for some constant $c$.
4.8. Transformation of Random Variables
Theorem 4.5 (Distribution of a Function of a Random Variable): Let $X$ be a random variable and $Y = g(X)$, where $g$ is injective (i.e. it maps at most one $x$ to any value $y$). Then
$$f_Y(y) = f_X\big(g^{-1}(y)\big)\left|\frac{d g^{-1}(y)}{dy}\right|,$$
given that $\big(g^{-1}(y)\big)'$ exists and either $\big(g^{-1}(y)\big)' > 0\ \forall y$ or $\big(g^{-1}(y)\big)' < 0\ \forall y$. If $g$ is not bijective (one-to-one) there may be values of $y$ for which there exists no $x$ such that $y = g(x)$. Such points clearly have density zero.
When the conditions of this theorem are not satisfied it is necessary to be a little more careful. The most general approach for finding the density of a transformed random variable is to explicitly construct the distribution function of the transformed random variable and then to use the standard approach to turn the distribution function into a density (this approach is discussed in Larry Wasserman's "All of Statistics").
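A numerical sanity check of Theorem 4.5; illustrative case $X \sim N(0, 1)$, $g(x) = e^x$, so $g^{-1}(y) = \ln y$ and $|dg^{-1}/dy| = 1/y$:

```python
import numpy as np
from scipy import stats

y = np.linspace(0.1, 5, 50)
# Change-of-variables formula: f_Y(y) = f_X(ln y) * (1/y)
fy_formula = stats.norm.pdf(np.log(y)) / y
# scipy's lognorm with s=1 is exactly the law of exp(X) for X ~ N(0, 1)
fy_scipy = stats.lognorm.pdf(y, s=1)

print(np.allclose(fy_formula, fy_scipy))   # True
```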
Exercise 4.8.1: Let $X$ be distributed exponentially with parameter $\alpha$, that is,
$$f_X(x) = \begin{cases} \alpha e^{-\alpha x}; & x \ge 0 \\ 0; & x < 0 \end{cases}$$
Find the density function of
(i) $Y = g(X)$ with $g(x) = \ldots$
(ii) $Y = X^{p}$, $p > 0$
(iii) $Y = g(X)$ with $g(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases}$
"BAI (St Fo nS
Khas New LT, New DaRFIIOO6, Ph (12650507, Ca PODTERS4 A 98916173, SRISETODTy Joint and Conditional Distributions
[Rnigo 900": 7008 cortned atte
Theorem 4.6 (Probability Integral Transformation): If $X$ is a random variable with continuous $F_X(x)$, then $U = F_X(X)$ is uniformly distributed over the interval $(0, 1)$.
Conversely, if $U$ is uniform over $(0, 1)$, then $X = F_X^{-1}(U)$ has distribution function $F_X$.
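The converse is the basis of inverse-transform sampling. A minimal sketch for the exponential case, where $F_X^{-1}(u) = -\ln(1-u)/\alpha$ (parameter value illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha = 2.0
u = rng.uniform(0, 1, size=100_000)   # U ~ U(0, 1)
x = -np.log(1 - u) / alpha            # X = F^{-1}(U) for Exp(alpha)

# The transformed sample should match Exp(alpha)
print(stats.kstest(x, stats.expon(scale=1/alpha).cdf))  # p-value should not be small
```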
4.9. Moment-Generating-Function Technique
The following technique is but one example of a situation in which the moment generating function proves invaluable.
Function of a Variable: For $Y = g(X)$ compute
$$m_Y(t) = E\big[e^{t g(X)}\big].$$
If the result is the MGF of a known distribution then it will follow that $Y$ has that distribution.
Sums of Independent Random Variables: For $Y = \sum_i X_i$, where the $X_i$ are independent random variables whose MGFs exist for $-h < t < h$,
$$m_Y(t) = E\big[e^{t \sum_i X_i}\big] = \prod_i m_{X_i}(t) \qquad \text{for } -h < t < h.$$
Thus $\prod_i m_{X_i}(t)$ may be used to identify the distribution of $Y$ as above.
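For instance, with $X_i \sim P(\lambda_i)$ independent, $m_{X_i}(t) = e^{\lambda_i(e^t - 1)}$, so
$$m_Y(t) = \prod_i e^{\lambda_i(e^t - 1)} = e^{\left(\sum_i \lambda_i\right)(e^t - 1)},$$
which is the MGF of $P\left(\sum_i \lambda_i\right)$, recovering the result of Section 3.1.7.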
CHAPTER 5
INFERENCE
5.1. Sample Statistics
Suppose we select a sample of size $n$ from a population of size $N$. For each $i \in \{1, \ldots, n\}$, let $X_i$ be a random variable denoting the outcome of the $i$th observation of a variable of interest. For example, $X_i$ might be the height of the $i$th person sampled. Under the assumptions of simple random sampling, the $X_i$ are independent and identically distributed (iid).
Therefore, if the distribution of a single unit sampled from the population can be characterized by a distribution with density function $f$, the marginal density function of each $X_i$ is also $f$, and their joint density function $g$ is a simple product of their marginal densities:
$$g(x_1, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n).$$
In order to make inferences about a population parameter, we use sample data to form an estimate of the population parameter. We calculate our estimate using an estimator or sample statistic, which is a function of the $X_i$.
We have already seen examples of sample statistics; for example, the sample mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
where $n$ is the size of the sample, is an estimator of the population mean, e.g. for discrete $X$,
$$\mu = \sum_{j=1}^{N} x_j f(x_j),$$
where $N$ is the number of distinct values which it is possible for an $X_i$ to take.
5.2. Sampling Distributions
Since an estimator $\hat{\theta}$ is a function of random variables, it follows that $\hat{\theta}$ is itself a random variable and possesses its own distribution. The probability distribution of an estimator is itself called a sampling distribution.
Proposition 5.1 (Distribution of the Sample Mean): Let $\bar{X}$ denote the sample mean of a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
Theorem 5.1 (Central Limit Theorem): Let $f$ be a density function with mean $\mu$ and finite variance $\sigma^2$. Let $\bar{X}$ be the sample mean of a random sample of size $n$ from $f$ and let
$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
Then the distribution of $Z_n$ approaches the standard normal distribution as $n \to \infty$. This is often written as $Z_n \xrightarrow{d} N(0, 1)$, with $\xrightarrow{d}$ denoting convergence in distribution.
Thus, if the sample size is "large enough", the sample mean can be assumed to follow a normal distribution regardless of the population distribution. In practice, this assumption is often taken to be valid for a sample size $n > 30$.
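A short simulation illustrating the theorem, using a skewed exponential population (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 50, 100_000
# Skewed population: Exp(1) has mean 1 and variance 1
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

print(z.mean(), z.std())               # ~ 0, ~ 1
print(np.quantile(z, [0.025, 0.975]))  # ~ [-1.96, 1.96], close to N(0, 1) quantiles
```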
The Chi-Squared Distribution
The chi-squared distribution is a special case of the gamma distribution. The suitably scaled sample variance of a normal random sample, $(n-1)S^2/\sigma^2$, is $\chi^2$ with $n-1$ degrees of freedom (see Theorem 5.2 below).
Definition 5.1: If $X$ is a random variable with density
$$f(x) = \begin{cases} \dfrac{1}{\Gamma(k/2)\, 2^{k/2}}\, x^{k/2-1} e^{-x/2}; & x > 0 \\ 0; & \text{otherwise} \end{cases}$$
then $X$ is defined to have a $\chi^2$ distribution with $k$ degrees of freedom $(\chi^2_k)$, where $k$ is a positive integer.
Thus the $\chi^2_k$ density is a gamma density with $\alpha = k/2$ and $\beta = 2$.
Result: If the R.V.'s $X_i$, $i = 1, \ldots, k$ are independently normally distributed with means $\mu_i$ and variances $\sigma_i^2$, then
$$U = \sum_{i=1}^{k}\left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$$
has a $\chi^2_k$ distribution.
Theorem 5.2: If $X_1, \ldots, X_n$ is a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, then
(i) $\bar{X}$ and $\sum_{i=1}^{n}\big(X_i - \bar{X}\big)^2$ are independent;
(ii) $\dfrac{\sum_{i=1}^{n}\big(X_i - \bar{X}\big)^2}{\sigma^2}$ has a $\chi^2_{n-1}$ distribution.
Corollary: If $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \bar{X}\big)^2$ is the sample variance of a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$, then
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
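A simulation sketch of this corollary (illustrative values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, mu, sigma = 10, 3.0, 2.0
x = rng.normal(mu, sigma, size=(100_000, n))
s2 = x.var(axis=1, ddof=1)          # sample variance with divisor n - 1

stat = (n - 1) * s2 / sigma**2
print(stats.kstest(stat, stats.chi2(df=n - 1).cdf))  # p-value should not be small
```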
The t Distribution
The $t$ distribution is closely related to the normal distribution and is needed for making inferences about the mean of a normal distribution when the variance is also unknown.
Definition 5.2: If $Z \sim N(0, 1)$, $U \sim \chi^2_k$, and $Z$ and $U$ are independent of one another, then
$$T = \frac{Z}{\sqrt{U/k}} \sim t_k,$$
where $t_k$ denotes a $t$ distribution with $k$ degrees of freedom.
The density of the $t_k$ distribution is:
$$f(x) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\Gamma\left(\frac{k}{2}\right)\sqrt{k\pi}}\left(1 + \frac{x^2}{k}\right)^{-(k+1)/2}, \qquad -\infty < x < \infty.$$
$E[X] = 0$ for $k > 1$ (although an extension known as the Cauchy principal value can be defined more generally), and it can be shown that
$$\mathrm{Var}[X] = \frac{k}{k-2}, \qquad \text{for } k > 2.$$
Theorem 5.3: If $X \sim t_k$, then the density of $X$ approaches the density of a standard normal as $k \to \infty$.
The F Distribution
The $F$ distribution is useful for making inferences about the ratio of two unknown variances.
Definition 5.3: Suppose $U$ and $V$ are independently distributed with $U \sim \chi^2_m$ and $V \sim \chi^2_n$. Then the random variable
$$X = \frac{U/m}{V/n}$$
is distributed according to an $F$ distribution with $m$ and $n$ degrees of freedom, written $X \sim F(m, n)$.
The density of $X$ is given by
$$f(x) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{m/2} x^{m/2-1}\left(1 + \frac{m}{n}x\right)^{-(m+n)/2}, \qquad x > 0.$$
Returning to the case of two independent random samples from normal populations of common variance but differing means,
$$X_1, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2), \qquad Y_1, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2),$$
we have
$$\frac{S_X^2}{S_Y^2} \sim F(n_1 - 1,\, n_2 - 1),$$
where $S_X^2$ and $S_Y^2$ are the two sample variances.
5.3. Point Estimation
The sample mean and variance are examples of point estimators, because the
estimates they produce are single point values, rather than a range of values.
For a given parameter there are an infinite number of possible estimators,
hence the question arises: what makes a “good” estimator?
Definition 5.4 (Unbiasedness): Let $X$ be a random variable with pdf $f(x; \theta)$, where $\theta \in \Omega \subset \mathbb{R}^p$ is some unknown parameter, $p \ge 1$. Let $X_1, \ldots, X_n$ be a random sample from the distribution of $X$ and let $\hat{\theta}$ denote a statistic. $\hat{\theta}$ is an unbiased estimator of $\theta$ if
$$E[\hat{\theta}] = \theta \qquad \forall \theta \in \Omega,$$
where the expectation is with respect to $f(x; \theta)$.
If $\hat{\theta}$ is not unbiased, we say that $\hat{\theta}$ is a biased estimator of $\theta$, with
$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.$$
If $\mathrm{Bias}(\hat{\theta}) \to 0$ as $n \to \infty$ then we say that $\hat{\theta}$ is asymptotically unbiased.
Example 5.1: Consider the following estimator of the population variance:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\big(X_i - \bar{X}\big)^2.$$
Since $E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2$, this estimator is biased, with bias equal to $-\frac{\sigma^2}{n}$. As this decays to zero as $n \to \infty$, it is an asymptotically unbiased estimator. However, we can see that the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \bar{X}\big)^2$ is unbiased.
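A quick numerical illustration of the bias (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 5, 4.0
x = rng.normal(0, np.sqrt(sigma2), size=(200_000, n))

print(x.var(axis=1, ddof=0).mean())   # ~ (n-1)/n * sigma2 = 3.2  (biased)
print(x.var(axis=1, ddof=1).mean())   # ~ sigma2 = 4.0            (unbiased)
```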
Consistency: In order to define a consistent estimator, we first define convergence in probability.
Definition 5.5 (Convergence in Probability): Let $\{X_n\}$ be a sequence of random variables and let $X$ be a random variable. We say that $X_n$ converges in probability to $X$ if $\forall \varepsilon > 0$
$$\lim_{n \to \infty} P\big[|X_n - X| \ge \varepsilon\big] = 0, \quad \text{or equivalently} \quad \lim_{n \to \infty} P\big[|X_n - X| < \varepsilon\big] = 1.$$
We write $X_n \xrightarrow{p} X$.
Definition 5.6 (Consistent Estimator): Let $X_1, \ldots, X_n$ be a sample from the distribution of $X$, where $X$ is a random variable with distribution function $F(x; \theta)$. Let $\hat{\theta}$ denote a statistic. $\hat{\theta}$ is a consistent estimator of $\theta$ if, whatever the value of $\theta$,
$$\hat{\theta} \xrightarrow{p} \theta.$$
A particular case of consistency is often used to justify using sample averages:
Theorem 5.4 (Weak Law of Large Numbers): Let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ with $X_1, \ldots, X_n$ iid with mean $\mu$. Then
$$\bar{X} \xrightarrow{p} \mu,$$
i.e., $\bar{X}$ is a consistent estimator of $\mu$.
"BAL, (Fist Foo) a Sarak Haus King Naw LL, New Dal
mai arta coms Webster dieteaden co
6, Ps (I) 27, Cale SOIR A OTT, TDeam =
5.4.
Ay
Theorem 5.5: Suppose $X_n \xrightarrow{p} a$ and the real function $g$ is continuous at $a$. Then $g(X_n) \xrightarrow{p} g(a)$.
Consistency according to the definition above may be hard to prove, but it turns out that a sufficient (though not necessary) condition for consistency is that $\mathrm{Bias}(\hat{\theta}) \to 0$ and $\mathrm{Var}(\hat{\theta}) \to 0$ as $n \to \infty$.
Definition 5.7 (Consistency in Mean-Squared Error): If $\hat{\theta}$ is an estimator of $\theta$, then the mean squared error of $\hat{\theta}$ is defined as
$$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big],$$
and $\hat{\theta}$ is said to be consistent in MSE if $\mathrm{MSE}(\hat{\theta}) \to 0$ as the size of the sample on which $\hat{\theta}$ is based increases to infinity.
Result: $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2$
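The result follows by adding and subtracting $E[\hat{\theta}]$ inside the square:
$$E\big[(\hat{\theta} - \theta)^2\big] = E\big[(\hat{\theta} - E[\hat{\theta}])^2\big] + \big(E[\hat{\theta}] - \theta\big)^2 = \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2,$$
since the cross term vanishes: $E\big[\hat{\theta} - E[\hat{\theta}]\big] = 0$.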
5.4. Interval Estimation
This section describes confidence intervals, which are intervals constructed such that they contain $\theta$ with some level of confidence.
Definition 5.8 (Confidence Interval): Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf $f(x; \theta)$, where $\theta$ is an unknown parameter in the parameter space $\Theta$. If $L$ and $U$ are statistics such that
$$P[L \le \theta \le U] = 1 - \alpha,$$
then the interval $(L, U)$ is a $100(1-\alpha)\%$ confidence interval for $\theta$.
$1 - \alpha$ is known as the confidence coefficient and $\alpha$ is the level of significance.
There are several ways in which confidence intervals can be constructed. The basic procedure we shall use to construct a $100(1-\alpha)\%$ confidence interval for a parameter is as follows:
(i) Select a sample statistic to estimate the parameter.
(ii) Identify the sampling distribution for the statistic.
(iii) Determine the bounds within which the sample statistic will reside with probability $1 - \alpha$.
(iv) Invert these bounds to obtain an expression in terms of $\theta$.
Confidence Intervals based on the CLT: Suppose that we are interested in the population mean $\theta$ and we wish to use the sample mean $\bar{X}$ as an estimator for $\theta$. Then if the population density has variance $\sigma^2$, the CLT states that
$$\frac{\bar{X} - \theta}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1),$$
where $n$ is the sample size.
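Following the four-step procedure above, this pivot inverts to the familiar interval $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. A minimal sketch (simulated data; $\sigma$ assumed known):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sigma, n, alpha = 2.0, 100, 0.05
x = rng.normal(5.0, sigma, size=n)   # sample whose mean theta = 5 is "unknown"

z = stats.norm.ppf(1 - alpha / 2)    # z_{alpha/2}, about 1.96 for alpha = 0.05
half_width = z * sigma / np.sqrt(n)
print(x.mean() - half_width, x.mean() + half_width)   # 95% CI for theta
```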