Statistical Modeling, Lecture 1

Melike Efe
July 7, 2025
Sabancı University
Table of contents

1. Introduction

2. Sampling Distributions

Introduction
Course Rules and Expectations

Attendance and Participation:


▶ Participation is not mandatory.
Exams:
▶ Midterm: 45%.
▶ Final Exam: 55% (date to be announced by Student Resources).
▶ Make-up exams are available with a valid medical report. There will be
a single make-up exam after the final exam period, which will
replace all of your missed exam grades. The make-up exam will be
cumulative, covering all topics.
Note: The make-up exam is expected to be more challenging than
the regular exams.
Study Advice:
▶ Review material weekly.
▶ Attend lectures and recitations regularly.
Why Statistics?

Statistics is the science of collecting, analyzing, interpreting, and
presenting data.
Why Is It Important?
▶ It helps us make informed decisions based on data.
▶ Used in nearly every field, including science, business, and social
sciences.
Examples:
▶ Business: Optimize marketing strategies using consumer data.
▶ Environmental Science: Analyze climate change trends and predict
future changes.
▶ Insurance: Given your driving record, car information, and coverage,
what is a fair premium?
▶ Clinical Trials: A drug is tested on 100 patients; 56 were cured and
44 showed no improvement. Is the drug effective?
Relationship Between Statistics and Probability

▶ Probability: Starts with a known model or distribution and predicts
outcomes.
▶ Probability Question: Suppose we have a fair coin (we know it’s
fair). What is the probability of getting 7 heads in 10 flips?
  ◦ This is a probability problem because we know the model (fair coin =
  50% heads).
▶ Statistics: Starts with data and infers the underlying model or
parameters.
▶ Statistics Question: Suppose we flip a mystery coin 10 times and
observe 7 heads. Can we conclude the coin is fair?
  ◦ This is a statistics problem because we start with data (7 heads).
  ◦ We try to infer something about the model (is it fair? biased?).
  ◦ We might do a hypothesis test or calculate a confidence interval.
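A minimal Python sketch of the two questions above, using only the standard library (the helper `binom_pmf` is a hypothetical name, not from the lecture). The first number answers the probability question directly, because the model is known. The second is only a first step toward the statistics question: assuming the coin is fair, it measures how likely a result at least as extreme as 7 heads would be. A full hypothesis test would build on this.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability question: the model (fair coin, p = 0.5) is known.
p_seven = binom_pmf(7, 10, 0.5)
print(f"P(exactly 7 heads in 10 fair flips) = {p_seven:.4f}")  # equals 120/1024

# Statistics question: the data (7 heads) are known. If the coin were fair,
# how likely is a result at least as extreme as 7 heads?
p_at_least_seven = sum(binom_pmf(k, 10, 0.5) for k in range(7, 11))
print(f"P(7 or more heads | fair coin) = {p_at_least_seven:.4f}")  # equals 176/1024
```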

Sampling Distributions
Introduction

Definition (Random Sample): If $X_1, X_2, \ldots, X_n$ are independent and
identically distributed random variables, we say that they constitute a
random sample from the infinite population given by their common
distribution.
Remark: If $\mu$ and $\sigma^2$ denote the mean and variance of the population
(distribution), it follows from the above definition that $E(X_i) = \mu$ and
$\operatorname{var}(X_i) = \sigma^2$ for all $i = 1, \ldots, n$.

Definition: If $X_1, X_2, \ldots, X_n$ constitute a random sample, then the
sample mean is given by
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
and the sample variance is given by
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}.$$
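The two formulas translate directly into code. The sketch below uses made-up data and hypothetical helper names; it also shows that Python's `statistics.variance` uses the same $n-1$ divisor as $S^2$, while `statistics.pvariance` divides by $n$.

```python
import statistics

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance S^2, using the n - 1 divisor."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative values, not from the lecture
print(sample_mean(data))             # 5.0
print(sample_variance(data))         # 32/7, about 4.5714
# statistics.variance also divides by n - 1; statistics.pvariance divides by n.
print(statistics.variance(data), statistics.pvariance(data))
```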

The Sampling Distribution of the Mean

Theorem: If $X_1, X_2, \ldots, X_n$ constitute a random sample from an infinite
population with mean $\mu$ and variance $\sigma^2$, then
$$E(\bar{X}) = \mu \quad \text{and} \quad \operatorname{var}(\bar{X}) = \frac{\sigma^2}{n}.$$
Moreover,
$$E(S^2) = \sigma^2.$$
Definition (Standard error of the mean): The standard deviation of the
sample mean $\bar{X}$ is called the standard error of the mean and is given by
$$\frac{\sigma}{\sqrt{n}}.$$
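A quick Monte Carlo check of the theorem, as a sketch: it assumes a Uniform(0, 1) population (mean 1/2, variance 1/12) and a sample size of $n = 10$, both chosen only for illustration, and the function name is hypothetical. The printed averages should be close to, but not exactly equal to, $\mu$, $\sigma^2/n$ and $\sigma^2$.

```python
import math
import random

def simulate(n, reps=10_000, seed=1):
    """Draw `reps` samples of size n from Uniform(0, 1).
    Return (average of the sample means, variance of the sample means, average of S^2)."""
    rng = random.Random(seed)
    means, s2s = [], []
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        xbar = sum(xs) / n
        means.append(xbar)
        s2s.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))   # S^2 with n - 1
    grand_mean = sum(means) / reps
    var_of_means = sum((m - grand_mean) ** 2 for m in means) / reps
    return grand_mean, var_of_means, sum(s2s) / reps

mu, sigma2, n = 0.5, 1 / 12, 10          # Uniform(0, 1): mean 1/2, variance 1/12
mean_xbar, var_xbar, mean_s2 = simulate(n)
print(f"E(Xbar)   ~ {mean_xbar:.4f}   theory {mu:.4f}")
print(f"var(Xbar) ~ {var_xbar:.5f}  theory {sigma2 / n:.5f}")
print(f"E(S^2)    ~ {mean_s2:.4f}   theory {sigma2:.4f}")
print(f"standard error sigma/sqrt(n) = {math.sqrt(sigma2 / n):.4f}")
```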

Chebyshev’s Inequality

Theorem (Chebyshev’s Inequality): If $\mu$ and $\sigma$ are the mean and the
standard deviation, respectively, of a random variable $X$, then for any
positive constant $c$,
$$P(\mu - c < X < \mu + c) \ge 1 - \frac{\sigma^2}{c^2}.$$
Theorem (Law of Large Numbers): If $X_1, X_2, \ldots, X_n$ constitute a
random sample from an infinite population with mean $\mu$ and finite
variance $\sigma^2$, then for any $c > 0$,
$$P(\mu - c < \bar{X} < \mu + c) \ge 1 - \frac{\sigma^2}{n c^2}.$$
This implies
$$\lim_{n \to \infty} P(\mu - c < \bar{X} < \mu + c) = 1.$$
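The Law of Large Numbers bound can be read as a sample-size rule: $1 - \sigma^2/(nc^2) \ge p$ holds exactly when $n \ge \sigma^2/(c^2(1-p))$. The sketch below assumes illustrative values $\sigma^2 = 1$ and $c = 0.1$; the function names are hypothetical.

```python
import math

def lln_bound(sigma2, c, n):
    """Chebyshev lower bound on P(mu - c < Xbar < mu + c) for a sample of size n.
    Clipped at 0, since the raw bound is negative (and useless) for small n."""
    return max(0.0, 1 - sigma2 / (n * c**2))

def min_n_for(sigma2, c, prob):
    """Smallest n with 1 - sigma2 / (n c^2) >= prob, i.e. n >= sigma2 / (c^2 (1 - prob))."""
    return math.ceil(sigma2 / (c**2 * (1 - prob)))

# Assumed illustrative values: population variance 1, tolerance c = 0.1.
for n in (10, 100, 1000, 10_000):
    print(n, round(lln_bound(1.0, 0.1, n), 4))
print("n needed for bound >= 0.95:", min_n_for(1.0, 0.1, 0.95))
```

With these values the bound needs $n = 2000$ to guarantee probability 0.95. Because Chebyshev's inequality holds for every population with finite variance, the resulting sample sizes are usually conservative.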

The Sampling Distribution of the Mean: normal case

Definition: A random variable $X$ has a normal distribution, and is
referred to as a normal random variable, if and only if its probability
density is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$$
where $\sigma > 0$.
Definition: The normal distribution with µ = 0 and σ = 1 is referred to
as the standard normal distribution.
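A direct transcription of the density, as a sketch (the function names are hypothetical, and the parameters $\mu = 24$, $\sigma = 10$ in the check are arbitrary). The crude midpoint-rule integral is only a sanity check that the density integrates to 1.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x; sigma must be positive."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

print(normal_pdf(0.0))        # standard normal peak, 1/sqrt(2*pi)
# Integrate over mu +/- 10 sigma; the mass outside is negligible.
print(integrate(lambda x: normal_pdf(x, 24, 10), -76, 124))
```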

Theorem: If $X$ has a normal distribution with mean $\mu$ and
standard deviation $\sigma$, then
$$Z = \frac{X - \mu}{\sigma}$$
has the standard normal distribution.
Theorem: If $X_1, X_2, \ldots, X_n$ constitute a random sample from an infinite
normal population with mean $\mu$ and standard deviation $\sigma$, then
$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
has the standard normal distribution.
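The second theorem can be checked by simulation. This sketch assumes a normal population with $\mu = 24$, $\sigma = 10$ and samples of size $n = 25$ (illustrative values only; `simulate_zn` is a hypothetical name). The standardized means $Z_n$ should have mean close to 0 and standard deviation close to 1.

```python
import math
import random
import statistics

def simulate_zn(mu, sigma, n, reps=5_000, seed=2):
    """Return `reps` draws of Z_n = (Xbar - mu) / (sigma / sqrt(n)),
    each Xbar being the mean of n draws from a N(mu, sigma^2) population."""
    rng = random.Random(seed)
    zs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

zs = simulate_zn(mu=24, sigma=10, n=25)
print(round(statistics.fmean(zs), 3), round(statistics.pstdev(zs), 3))  # near 0 and 1
```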

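Table of Standard Normal Probabilities P(0 ≤ Z ≤ z)
(Each row gives z to one decimal place; each column adds the second decimal place. For example, row 1.7, column 0.03 gives P(0 ≤ Z ≤ 1.73) = 0.4582.)

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09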
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
.. .. .. .. .. .. .. .. .. .. ..
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
.. .. .. .. .. .. .. .. .. .. ..
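Each entry equals $\Phi(z) - 0.5$, where $\Phi$ is the standard normal CDF. Python's `math.erf` gives $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, so the following sketch (helper names hypothetical) reproduces table entries, including rows the excerpt omits.

```python
import math

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_entry(z):
    """P(0 <= Z <= z), the tabulated quantity, rounded to 4 decimals."""
    return round(std_normal_cdf(z) - 0.5, 4)

for z in (0.05, 1.00, 1.73, 1.96, 2.00, 2.79):
    print(f"{z:.2f}  {table_entry(z):.4f}")
```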

Example: If Z is a random variable having the standard normal
distribution, find P(Z < 1.73) and P(−1.5 ≤ Z ≤ 0.24).
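Using the table and the symmetry of $Z$ about 0:
$$P(Z < 1.73) = 0.5 + P(0 \le Z \le 1.73) = 0.5 + 0.4582 = 0.9582,$$
$$P(-1.5 \le Z \le 0.24) = P(0 \le Z \le 1.5) + P(0 \le Z \le 0.24) = 0.4332 + 0.0948 = 0.5280.$$
A short Python check with the error-function form of the CDF (the name `phi` is a hypothetical helper):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p1 = phi(1.73)                 # P(Z < 1.73)
p2 = phi(0.24) - phi(-1.5)     # P(-1.5 <= Z <= 0.24)
print(f"{p1:.4f} {p2:.4f}")    # 0.9582 0.5280
```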

Example: Assume that exam scores $X_j$ of students in a certain
population are independent and normally distributed with $\mu = 24$ and
$\sigma = 10$. Assume that 400 randomly chosen students are enrolled in this
course and the passing grade is 21. Find the mean and variance of the
class average $\bar{X}$. Estimate the probability that $\bar{X}$ exceeds the passing
grade.
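One way to work this, sketched under the example's assumptions: $E(\bar{X}) = \mu = 24$ and $\operatorname{var}(\bar{X}) = \sigma^2/n = 100/400 = 0.25$, so the standard error is $0.5$. Since the population is normal, $\bar{X}$ is exactly normal and
$$P(\bar{X} > 21) = P\left(Z > \frac{21 - 24}{0.5}\right) = P(Z > -6) \approx 1.$$
The same arithmetic in Python (the helper `phi` is a hypothetical name):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n, passing = 24, 10, 400, 21
mean_xbar = mu                     # E(Xbar) = mu
var_xbar = sigma**2 / n            # var(Xbar) = sigma^2 / n
se = math.sqrt(var_xbar)           # standard error sigma / sqrt(n)
z = (passing - mean_xbar) / se     # standardized passing grade
p_pass = 1 - phi(z)                # P(Xbar > 21) = P(Z > z)
print(mean_xbar, var_xbar, z, p_pass)
```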

The Central Limit Theorem

Theorem: If $X_1, X_2, \ldots, X_n$ constitute a random sample from an infinite
population with mean $\mu$ and variance $\sigma^2$, then the limiting
distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$ is the standard normal distribution.
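A simulation sketch of the theorem for a skewed population. It assumes an exponential population with rate 1 (so $\mu = \sigma = 1$) and compares the empirical value of $P(Z_n \le 0)$ with $\Phi(0) = 0.5$; the cut-off 0, the sample sizes and the name `clt_check` are illustrative choices. For $n = 1$ the exact value is $1 - e^{-1} \approx 0.632$, and the estimates should drift toward 0.5 as $n$ grows.

```python
import math
import random

def clt_check(n, z0=0.0, reps=10_000, seed=3):
    """Estimate P(Z_n <= z0), where Z_n = (Xbar - mu) / (sigma / sqrt(n))
    and Xbar is the mean of n draws from an Exponential(rate 1) population."""
    rng = random.Random(seed)
    mu = sigma = 1.0                   # Exponential(rate 1) has mean 1 and sd 1
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = (xbar - mu) / (sigma / math.sqrt(n))
        if z <= z0:
            hits += 1
    return hits / reps

for n in (1, 5, 30, 100):
    print(n, clt_check(n))             # compare with Phi(0) = 0.5
```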

Example: A university is planning to accept new students for the next
academic year. The university can accommodate at most 3695 students.
Each applicant is accepted with probability 0.36, independently of the
others (i.e., acceptances are modeled as independent Bernoulli trials). If
the university receives 10,000 applications, use the Central Limit Theorem
(CLT) to estimate the probability that the university cannot accommodate
all accepted students.
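A sketch of one standard way to set this up (the variable names are hypothetical). Let $S$ be the number of accepted applicants, so $S$ is a sum of 10,000 independent Bernoulli(0.36) variables with $E(S) = np = 3600$ and $\operatorname{sd}(S) = \sqrt{np(1-p)} = 48$. The university cannot accommodate everyone when $S > 3695$, and the CLT gives
$$P(S > 3695) \approx P\left(Z > \frac{3695 - 3600}{48}\right) \approx P(Z > 1.98) = 0.5 - 0.4761 = 0.0239.$$
The code also shows the continuity-corrected value, a common refinement that the lecture does not mention.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, capacity = 10_000, 0.36, 3695
mean = n * p                           # E(S) = np = 3600
sd = math.sqrt(n * p * (1 - p))        # sqrt(np(1 - p)) = 48
z = (capacity - mean) / sd             # about 1.98
p_over = 1 - phi(z)                    # CLT estimate of P(S > 3695)
z_cc = (capacity + 0.5 - mean) / sd    # continuity correction for P(S >= 3696)
p_over_cc = 1 - phi(z_cc)
print(f"z = {z:.3f}, P = {p_over:.4f}; with continuity correction P = {p_over_cc:.4f}")
```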

Example: Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. exponentially
distributed random variables with $E[X_j] = 1$.
Find the smallest value of the sample size $n$ so that, for the sample mean
$\bar{X}$ of size $n$, $P(0.9 \le \bar{X} \le 1.1) \ge 0.95$ holds.
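Using the CLT approximation (assumed here to be the intended method): an exponential variable with mean 1 has variance 1, so $\bar{X}$ is approximately normal with mean 1 and variance $1/n$, and
$$P(0.9 \le \bar{X} \le 1.1) \approx 2\Phi(0.1\sqrt{n}) - 1 \ge 0.95 \iff 0.1\sqrt{n} \ge 1.96 \iff n \ge 384.16,$$
so $n = 385$. The brute-force search below (helper names hypothetical) confirms this. For comparison, the Chebyshev bound $1 - 1/(n \cdot 0.1^2) \ge 0.95$ would require $n \ge 2000$.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clt_prob(n, c=0.1, sigma=1.0):
    """CLT approximation to P(mu - c <= Xbar <= mu + c) = 2 * Phi(c * sqrt(n) / sigma) - 1."""
    return 2 * phi(c * math.sqrt(n) / sigma) - 1

n = 1
while clt_prob(n) < 0.95:              # increase n until the approximation reaches 0.95
    n += 1
print(n)                               # 385
```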
