statistics I
mm1: introduction and
distributions of sampling statistics
petar popovski
assistant professor
antennas, propagation and radio networking (APNET)
department of electronic systems
aalborg university
e-mail: [email protected]
lecture outline
introduction
descriptive statistics
– description and summarization of data sets
– chebyshev’s inequality and the weak law of large numbers
– normal data sets
– sample correlation coefficient
distributions of sampling statistics
– sample mean and variance
– central limit theorem
– sampling distribution from a normal population
introduction
statistics: the collection, description, and analysis of data, and
the drawing of inferences from it
– the term first appeared in 1770, in relation to the collection of
facts of interest to the state
– homer simpson on statistics: “oh, people can come up with
statistics to prove anything, Kent. 14% of people know that.”
descriptive and inferential statistics
– reasonable conclusions can be obtained by assuming a certain
probability model for the data
populations and samples
– a sample should be representative of the population
– why is a random sample good?
description of data sets (1)
frequency tables and graphs
relative frequency tables and graphs
description of data sets (2)
histograms
– bins, class intervals, left-end inclusion convention
– the histogram can be used to approximate a continuous
probability density function (pdf)
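The binning rule behind a histogram, including the left-end inclusion convention, can be sketched in a few lines of plain Python (the function name and data values are illustrative, not from the lecture):

```python
def histogram(data, left, width, nbins):
    """Counts per bin under the left-end inclusion convention:
    each bin [a, b) includes its left endpoint and excludes its right one."""
    counts = [0] * nbins
    for x in data:
        i = int((x - left) // width)   # index of the bin [left + i*width, left + (i+1)*width)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts

data = [1.0, 1.5, 2.0, 2.0, 3.5, 4.9]              # illustrative values
print(histogram(data, left=1.0, width=1.0, nbins=4))  # [2, 2, 1, 1]
```

Note that 2.0 falls in the bin [2, 3), not [1, 2): that is exactly what the left-end inclusion convention decides.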
description of data sets (3)
ogive = cumulative frequency plot
– used to approximate the cumulative distribution function of the
underlying pdf
stem-and-leaf plot
– example: daily minimum temperatures (in °F)
sample mean, median and mode
sample mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
linear property
$\forall i,\; y_i = a x_i + b \;\Rightarrow\; \bar{y} = a\bar{x} + b$
calculation with frequencies (distinct values $v_i$ occurring $f_i$ times)
$\bar{x} = \sum_{i=1}^{k} \frac{v_i f_i}{n}$
– relation to the mean value of a random variable
sample median
– if n is odd, it is the (n+1)/2-th smallest value
– if n is even, it is the average of the values in positions n/2 and n/2+1
mean vs. median
– when are they expected to be the same?
sample mode = the most frequent value in the set
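The three location summaries can be sketched directly from their definitions (the data values below are illustrative, not from the lecture):

```python
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_median(xs):
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:                           # odd n: the (n+1)/2-th smallest value
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2   # even n: average of positions n/2 and n/2+1

def sample_mode(xs):
    return max(set(xs), key=xs.count)        # most frequent value

data = [2, 3, 3, 5, 7]                       # illustrative data set
print(sample_mean(data))     # 4.0
print(sample_median(data))   # 3
print(sample_mode(data))     # 3
```

For this right-skewed data set the mean (4.0) exceeds the median (3); for symmetric data the two are expected to coincide.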
sample variance and standard deviation
variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
n-1 instead of n due to unbiased estimation
algebraic identity: $\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
linear property: $\forall i,\; y_i = a + b x_i \;\Rightarrow\; s_y^2 = b^2 s_x^2$
example (with $n = 9$, $\sum_i x_i^2 = 203$, $\bar{x} = 35/9$):
$s^2 = \frac{203 - 9\,(35/9)^2}{8} = 8.361$
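The algebraic identity and the linear property can be checked numerically on any data set (the values and constants below are illustrative):

```python
import math

def sample_variance(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)   # n-1 denominator

xs = [3, 4, 5, 6, 7, 1, 2, 4, 3]     # illustrative data
n = len(xs)
xbar = sum(xs) / n

# algebraic identity: sum (xi - xbar)^2 == sum xi^2 - n * xbar^2
lhs = sum((x - xbar) ** 2 for x in xs)
rhs = sum(x * x for x in xs) - n * xbar ** 2
assert math.isclose(lhs, rhs)

# linear property: yi = a + b*xi  =>  s_y^2 == b^2 * s_x^2
a, b = 10.0, -2.0
ys = [a + b * x for x in xs]
assert math.isclose(sample_variance(ys), b ** 2 * sample_variance(xs))
print(round(sample_variance(xs), 3))
```

The identity is what makes the one-pass computation in the slide's example possible: only $\sum x_i^2$, $n$, and $\bar{x}$ are needed.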
sample percentiles
to determine the sample 100p percentile, where 0 ≤ p ≤ 1,
of a data set of size n, we need to find the value such that:
– at least np of the values are less than or equal to it
– at least n(1-p) of the values are greater than or equal to it
example
let the sample size be n=33. the sample 10th percentile is the 4th
smallest value, since ⌈33·0.1⌉ = 4
quartiles
– first (25%), second (50%), third (75%)
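The percentile rule can be sketched as follows, using the common convention of averaging the two neighbouring values when np is an integer (an assumption, since the two defining conditions alone admit a range of answers; the data are illustrative):

```python
import math

def sample_percentile(xs, p):
    """Sample 100p percentile: the ceil(n*p)-th smallest value, or the average
    of the (n*p)-th and (n*p + 1)-th smallest when n*p is an integer."""
    s = sorted(xs)
    n = len(s)
    k = max(math.ceil(n * p), 1)
    if k == n * p and k < n:           # n*p is an integer: average the neighbours
        return (s[k - 1] + s[k]) / 2
    return s[k - 1]

data = list(range(1, 34))              # n = 33 illustrative values
print(sample_percentile(data, 0.1))    # 4, the 4th smallest, since ceil(33*0.1) = 4
```

With this convention the first, second, and third quartiles are simply `sample_percentile(data, 0.25)`, `…(data, 0.5)`, and `…(data, 0.75)`.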
chebyshev’s inequality
for any value of $k \ge 1$, more than $100\left(1 - \frac{1}{k^2}\right)$ percent of
the data lie within the interval $(\bar{x} - ks,\ \bar{x} + ks)$
– it is universal, but the bound can therefore be loose
probability version of chebyshev's inequality
weak law of large numbers
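A quick empirical check of the data version of Chebyshev's inequality on a deliberately skewed sample (an exponential sample, chosen for illustration) shows how loose the universal bound is:

```python
import random
import statistics

random.seed(0)
data = [random.expovariate(1.0) for _ in range(1000)]   # skewed, illustrative sample
xbar = statistics.mean(data)
s = statistics.stdev(data)

for k in (1.5, 2, 3):
    frac = sum(1 for x in data if xbar - k * s < x < xbar + k * s) / len(data)
    bound = 1 - 1 / k ** 2
    print(f"k={k}: fraction inside {frac:.3f}, Chebyshev bound {bound:.3f}")
    assert frac > bound            # the inequality guarantees more than this fraction
```

For k = 2 the bound only promises 75% inside the interval, while the observed fraction is far higher; universality is paid for with looseness.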
normal and skewed data sets
[histograms: normal, approximately normal, and skewed-to-the-left data sets]
the empirical rule
for approximately normal data sets:
1. Approx. 68% of observations lie within $\bar{x} \pm s$
2. Approx. 95% of observations lie within $\bar{x} \pm 2s$
3. Approx. 99.7% of observations lie within $\bar{x} \pm 3s$
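The empirical rule can be verified on a simulated approximately normal data set (sample size and seed are illustrative):

```python
import random
import statistics

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]   # approximately normal sample
xbar = statistics.mean(data)
s = statistics.stdev(data)

for k, target in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    frac = sum(1 for x in data if abs(x - xbar) <= k * s) / len(data)
    print(f"within {k} sd: {frac:.3f} (empirical rule: {target})")
    assert abs(frac - target) < 0.02
```

Unlike Chebyshev's bound, these fractions are approximations specific to normal-looking data, not guarantees for arbitrary data sets.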
sample correlation coefficient
the statistical data can be given as pairs of values, and we want to
find whether there is a relation between those values
sample correlation coefficient
$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}
   = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{j=1}^{n}(y_j - \bar{y})^2}}$
measures association, not causation
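The coefficient in its sum form can be sketched directly (the paired data are illustrative):

```python
import math

def corr(xs, ys):
    """Sample correlation coefficient, written as the ratio of sums."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xbar) ** 2 for x in xs)) * \
          math.sqrt(sum((y - ybar) ** 2 for y in ys))
    return num / den

xs = [1, 2, 3, 4, 5]                   # illustrative paired data
ys = [2, 4, 5, 4, 5]
print(round(corr(xs, ys), 3))          # 0.775

# r = ±1 exactly when the points lie on a straight line
assert math.isclose(corr(xs, [3 * x - 1 for x in xs]), 1.0)
assert math.isclose(corr(xs, [-2 * x for x in xs]), -1.0)
```

A value of r near ±1 indicates a strong linear association, but, as the slide stresses, it says nothing about causation.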
distribution of sampling statistics
sampling
if $X_1, X_2, \ldots, X_n$ are independent random variables having
a common distribution $F$, then they constitute a
sample (or random sample) from the distribution $F$.
types of inference problems
– parametric = F is known up to the values of some parameters
– non-parametric = nothing is assumed about the form of F
we now define a statistic as a random variable whose
value is determined by the sample data
– our goal is to examine the properties of this random variable
$Y = f(X_1, X_2, \ldots, X_n)$
sample mean
we suppose that the value of any population member
can be regarded as a random variable with
expectation (population mean) $\mu$ and variance (population variance) $\sigma^2$
sample mean for the sample of values $X_1, X_2, \ldots, X_n$:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
$E[\bar{X}] = \mu \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
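A simulation makes the two properties of the sample mean concrete: the average of many sample means is close to μ, and their variance is close to σ²/n (all parameters below are illustrative):

```python
import random
import statistics

random.seed(2)
mu, sigma, n, trials = 5.0, 2.0, 16, 20_000   # illustrative parameters

# draw many samples of size n and record each sample mean
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(trials)]

print(round(statistics.fmean(means), 3))       # close to mu = 5
print(round(statistics.variance(means), 3))    # close to sigma^2 / n = 0.25
assert abs(statistics.fmean(means) - mu) < 0.05
assert abs(statistics.variance(means) - sigma ** 2 / n) < 0.02
```

Quadrupling n would cut the variance of the sample mean by a factor of four, which is the practical payoff of averaging.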
central limit theorem (1)
a fundamental result in probability theory
– the theorem is very powerful, as the distribution of the variables
$X_i$ can have a general form; it is only required to have a finite
mean and variance
from this theorem it follows that, for large n, the variable
$Z = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$
is approximately a standard normal random variable, with density
$p_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
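A simulation sketch of the theorem: standardized sums of uniform variables (which look nothing like a normal) behave like a standard normal (n, the seed, and the trial count are illustrative):

```python
import math
import random
import statistics

random.seed(3)
n, trials = 48, 20_000
mu, var = 0.5, 1 / 12                 # mean and variance of Uniform(0, 1)

# standardize each sum of n uniforms as in the theorem
zs = [(sum(random.random() for _ in range(n)) - n * mu) / math.sqrt(n * var)
      for _ in range(trials)]

# Z should behave like a standard normal: mean ~ 0, variance ~ 1,
# and about 95% of the mass within |z| < 1.96
frac = sum(1 for z in zs if abs(z) < 1.96) / trials
print(round(statistics.fmean(zs), 3), round(statistics.variance(zs), 3), round(frac, 3))
assert abs(statistics.fmean(zs)) < 0.03
assert abs(statistics.variance(zs) - 1) < 0.05
assert abs(frac - 0.95) < 0.01
```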
central limit theorem (2)
example
$E\!\left[\sum_{i=1}^{n} X_i - W\right] = 3n - 400$
$\mathrm{Var}\!\left(\sum_{i=1}^{n} X_i - W\right) = 0.09n + 1600$
$Z = \frac{\sum_{i=1}^{n} X_i - W - (3n - 400)}{\sqrt{0.09n + 1600}}$
$\frac{400 - 3n}{\sqrt{0.09n + 1600}} \le 1.28 \;\Rightarrow\; n \ge 117$
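The final step of the example can be checked numerically: the ratio $(400 - 3n)/\sqrt{0.09n + 1600}$ decreases in n, so a simple scan finds the smallest n satisfying the inequality:

```python
import math

def z_ratio(n):
    # left-hand side of the final inequality in the example
    return (400 - 3 * n) / math.sqrt(0.09 * n + 1600)

# the ratio is decreasing in n; find the smallest n with z_ratio(n) <= 1.28
n_min = next(n for n in range(1, 1000) if z_ratio(n) <= 1.28)
print(n_min)   # 117
assert n_min == 117
```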
central limit theorem (3)
an important application of the central limit theorem is
to binomial random variables
$X = X_1 + X_2 + \cdots + X_n, \qquad
X_i = \begin{cases} 1 & \text{with prob. } p \\ 0 & \text{with prob. } 1-p \end{cases}$
X is a random variable that represents the number of
successes in n trials, where the probability of success
in each trial is p
$E[X_i] = p; \quad \mathrm{Var}(X_i) = p(1-p)$
the central limit theorem states that
$Z = \frac{X - np}{\sqrt{np(1-p)}}$
is approximately a standard normal random variable
see problem 15 of chapter 6
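A sketch comparing an exact binomial probability with its normal approximation; the half-unit continuity correction is a standard refinement, and the parameters are illustrative:

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p, k = 100, 0.3, 35                 # illustrative numbers
z = (k + 0.5 - n * p) / math.sqrt(n * p * (1 - p))   # continuity correction
exact, approx = binom_cdf(k, n, p), std_normal_cdf(z)
print(round(exact, 4), round(approx, 4))
assert abs(exact - approx) < 0.01
```

For moderate n the approximation already agrees with the exact sum to within about a percentage point, while avoiding the large binomial coefficients.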
sample variance
the sample variance is a statistic defined as
$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$
by using (n-1) in the denominator we obtain
$E[S^2] = \sigma^2$
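The unbiasedness of S² can be illustrated by averaging it over many small samples (parameters and seed are illustrative; `statistics.variance` uses the n-1 denominator):

```python
import random
import statistics

random.seed(4)
sigma, n, trials = 2.0, 5, 50_000      # illustrative parameters

# sample variance (n-1 denominator) over many samples of size n
s2_values = [statistics.variance([random.gauss(0.0, sigma) for _ in range(n)])
             for _ in range(trials)]

print(round(statistics.fmean(s2_values), 3))   # close to sigma^2 = 4
assert abs(statistics.fmean(s2_values) - sigma ** 2) < 0.1
```

Dividing by n instead (as `statistics.pvariance` does) would systematically underestimate σ² by the factor (n-1)/n, which is noticeable for a sample as small as n = 5.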
sampling from normal population (1)
let $X_1, X_2, \ldots, X_n$ be a sample from a normal population,
$X_i \sim N(\mu, \sigma^2)$
then the sample mean is a normal random variable with
$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$
to find the distribution of the sample variance
$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$
recall the chi-square distribution
– $Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2$ has a chi-square distribution with n degrees
of freedom if each $Z_i$ is standard normal
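The chi-square construction can be simulated directly from its definition as a sum of squared standard normals; a chi-square variable with n degrees of freedom has mean n and variance 2n (degrees of freedom and trial count below are illustrative):

```python
import random
import statistics

random.seed(5)
n, trials = 6, 40_000                  # illustrative degrees of freedom
ys = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]

# check the known moments: mean n, variance 2n
print(round(statistics.fmean(ys), 2), round(statistics.variance(ys), 2))
assert abs(statistics.fmean(ys) - n) < 0.1
assert abs(statistics.variance(ys) - 2 * n) < 0.5
```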
sampling from normal population (2)
the variable $\frac{(n-1)S^2}{\sigma^2}$ has a chi-square distribution
with n-1 degrees of freedom
recall the t-distribution with n degrees of freedom as the
distribution of
$T = \frac{Z}{\sqrt{\chi_n^2 / n}}$
then it follows that
$\sqrt{n}\,\frac{\bar{X} - \mu}{S}$
has a t-distribution with n-1 degrees of freedom
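A simulation of the statistic $\sqrt{n}(\bar{X}-\mu)/S$ for small n shows the hallmark of the t-distribution: symmetric around zero but with heavier tails than the standard normal, for which only 5% of the mass lies beyond |1.96| (sample size, seed, and trial count are illustrative):

```python
import math
import random
import statistics

random.seed(6)
mu, n, trials = 0.0, 5, 30_000         # small n makes the heavy tails visible

ts = []
for _ in range(trials):
    xs = [random.gauss(mu, 1.0) for _ in range(n)]
    # the statistic sqrt(n) * (sample mean - mu) / sample standard deviation
    ts.append(math.sqrt(n) * (statistics.fmean(xs) - mu) / statistics.stdev(xs))

frac = sum(1 for t in ts if abs(t) > 1.96) / trials
print(round(statistics.fmean(ts), 3), round(frac, 3))
assert abs(statistics.fmean(ts)) < 0.05   # symmetric around 0
assert frac > 0.07                        # clearly heavier tails than N(0,1)'s 5%
```

This excess tail mass is why small-sample confidence intervals use t-quantiles rather than normal quantiles.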
sampling from a finite population
random sample from a population of N elements
– each of the $\binom{N}{n}$ subsets is equally likely to be the sample
consider the case where a fraction p of the
population has some feature, i.e. Np elements in total
– let $X_i$ be the indicator variable, $X = X_1 + X_2 + \cdots + X_n$
note that now $X_1, X_2, \ldots, X_n$ are not independent
however, if $N \gg n$, then the distribution of X is
approximately that of a binomial r.v. with parameters n and p
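The exact distribution of X here is hypergeometric (sampling without replacement); a small numerical comparison, with illustrative numbers, shows how close it is to the binomial when N is much larger than n:

```python
import math

def hypergeom_pmf(k, N, K, n):
    """P(k feature elements in a sample of n drawn without replacement
    from N elements, K of which have the feature)."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

N, p, n = 100_000, 0.2, 10             # N >> n, illustrative numbers
K = int(N * p)                         # Np elements with the feature
max_diff = max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
               for k in range(n + 1))
print(max_diff)                        # tiny: the two pmfs nearly coincide
assert max_diff < 1e-3
```

Intuitively, when N ≫ n the composition of the population barely changes as elements are removed, so sampling without replacement behaves like n independent trials with success probability p.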