
Advanced Statistics for Research

The document discusses key concepts in statistics including: 1) Defining levels of uncertainty based on measurement error and probabilities related to randomly distributed values. 2) The main difference between a population and sample is how observations are assigned - a population includes all elements from a data set while a sample consists of one or more observations drawn from the population. 3) Reporting a single measurement contains inherent information about the accuracy of both the measurement and measurement system.



Advanced Diploma

Research Methodology and Project (RMP470S)

One-dimensional statistics

 Dr N. Dlangamandla
 Email: [email protected]
 30 August 2023
The challenge of statistics is to…
Define the levels of uncertainty
• Based on measurement error and probabilities
• Related to randomly distributed values.
• These are called random errors and are different from systematic errors, which result from some bias in the measurement technique (e.g. calibration error).
• The main difference between a population and a sample has to do with how observations are assigned to the data set.
• A population includes all of the elements from a set of data.
• A sample consists of one or more observations drawn from the population.
Introduction
• A research paper reports a distance measurement of 10.5 m.
• The implication is that:
◦ The measurement accuracy is 10.5 ± 0.05 m;
◦ The measurement instrument has been calibrated;
◦ The measurement instrument is capable of resolving measurements to this accuracy;
◦ A measurement of 10.7 m is significantly different from the result stated.
• Plotting a histogram of the same result measured several times can show the slightly different results being recorded.
• Reporting a single measurement contains inherent information about the accuracy of both the measurement and the measurement system.
5% Probability estimate
• This is a common measure in statistics, a ‘rule of thumb’.
• 5% of all the measured values will lie outside this range of values centred on the mean value.
• 95% of the measurements will lie within this range.
• This probability value is a measure of the random, symmetrical distribution of measured values about the mean value.
• Assuming a normal (random) distribution about the mean value (μ), less than 5% of the measurements will lie outside the range of ± two standard deviations (σ) away from the mean.
• On average 2.5% will have values greater than μ + 2σ and 2.5% will have values smaller than μ − 2σ.
• There might be situations where the 5% probability of error is unacceptably large; a smaller probability might then be mandated.
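The 5% rule of thumb can be checked numerically. The sketch below assumes an exactly normal distribution and uses the complementary error function from Python's standard library; the function name prob_outside is only an illustrative choice.

```python
import math

def prob_outside(k: float) -> float:
    """Probability that a normally distributed value lies more than
    k standard deviations away from the mean (both tails together)."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2))

outside_2sigma = prob_outside(2.0)   # both tails beyond mu ± 2 sigma
upper_tail = outside_2sigma / 2      # values greater than mu + 2 sigma

print(f"fraction outside mu ± 2 sigma: {outside_2sigma:.4f}")
print(f"fraction above mu + 2 sigma:   {upper_tail:.4f}")
```

The fraction outside ± 2σ comes out slightly below 5%, which is why the slide says "less than 5%"; a range of about ± 1.96σ gives 5% almost exactly.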
Normal distribution curve
Descriptive and inferential statistics
• Descriptive statistics summarize information already present in data
◦ Visualizations like boxplots, histograms, etc.
◦ Summary measures like averages, standard deviation, median, etc.
• Inferential statistics use a sample of data to make predictions about larger populations or about unobserved/future trends
◦ Any measurements made in the presence of noise or variation
◦ Generalizations from a sample to a population
▪ Confidence intervals, hypothesis tests, etc.
◦ Comparisons made between datasets
▪ Comparisons, correlations, regression, etc.
Statistics describe different types of data
• Categorical values take one of a discrete set of unordered values
◦ A tissue type: blood/skin/lung/GI/etc.

• Ordinal values take one of a discrete set of ordered values

◦ Counts or rank orders
◦ Often (but not always) analyzed in the same way as continuous
values

• Continuous values take one value from an ordered numerical scale

◦ Times, frequencies, ratios, percentages, abundances, etc.

Simple descriptive statistics
• A statistic is any single value that summarizes an entire dataset

• Parametric summary statistics

◦ Typically used to describe "well-behaved" data that are approximately normally distributed
▪ i.e. continuous, symmetric, thin-tailed, no outliers
▪ Closeness needed for "approximately" depends on application
• Average = Mean = $\mu = \frac{1}{n}\sum x$
Simple descriptive statistics
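• Standard deviation = √Variance = $\sigma = \sqrt{\frac{\sum (x-\mu)^2}{n}}$ (population)
◦ Beware the difference between population and sample standard deviation
◦ $\sigma = \sqrt{\frac{\sum (x-\mu)^2}{n}}$ (population) or $s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}$ (sample), where $\bar{x}$ is the sample mean. Why?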
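The difference between the two divisors is easy to see on a small data set. This is a minimal sketch using Python's standard statistics module; the five height values are illustrative only.

```python
import statistics

heights = [150, 155, 160, 165, 175]  # illustrative values only

mean = statistics.mean(heights)
pop_sd = statistics.pstdev(heights)     # divides the squared deviations by n
sample_sd = statistics.stdev(heights)   # divides the squared deviations by n - 1

print(f"mean = {mean}")
print(f"population standard deviation = {pop_sd:.3f}")
print(f"sample standard deviation     = {sample_sd:.3f}")
```

The sample value is always the larger of the two, and the gap shrinks as n grows.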
Nonparametric statistics
• Can be used to describe any data regardless of distribution
◦ No free lunch: they're less sensitive to both false and real signals
◦ Fewer false positives, but potentially fewer true positives, too
• Median = m = x[|x|/2] = midpoint of the sorted dataset
• Percentile = p(y) = x[y|x|] = data point a fraction y of the way "through" the sorted dataset
• Quartiles = 25th, 50th, and 75th percentiles = {p(0.25), p(0.5), p(0.75)}
◦ Also quintiles, deciles, etc.
• Inter-quartile range = IQR = p(0.75) - p(0.25)
◦ Difference between upper and lower quartiles
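The index-based definitions above can be written out directly. The sketch below follows the p(y) = x[y|x|] notation, assuming y is a fraction between 0 and 1 and that the index is rounded down; the data values are illustrative. Statistical libraries usually interpolate between neighbouring points, so their quartiles can differ slightly.

```python
import math

def percentile(data, y):
    """p(y) = x[y|x|]: the data point a fraction y of the way through
    the sorted data set (index rounded down, no interpolation)."""
    x = sorted(data)
    index = min(math.floor(y * len(x)), len(x) - 1)  # clamp so that y = 1 is valid
    return x[index]

heights = [175, 150, 165, 155, 160]  # illustrative values, deliberately unsorted

median = percentile(heights, 0.5)
q1 = percentile(heights, 0.25)
q3 = percentile(heights, 0.75)
iqr = q3 - q1

print(f"median = {median}, quartiles = ({q1}, {median}, {q3}), IQR = {iqr}")
```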
Statistics for paired data: comparisons
• What about experiments that result in more than one measurement?
◦ Paired? Multidimensional?

Subject          S1   S2   S3   S4   S5   …
Height           150  155  160  165  175  …
Height (father)  153  154  162  163  191  …

Adding the mean of the five values shown:

Subject          S1   S2   S3   S4   S5   Mean
Height           150  155  160  165  175  161
Height (father)  153  154  162  163  191  164.6

Can we generate a "joint" statistic that summarizes something about the "similarity" of two sets of measurements?
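One candidate for such a joint statistic is the Pearson correlation coefficient. The slide does not name a specific statistic, so treating correlation as the answer is an assumption here; the sketch computes it from first principles for the five pairs in the table.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

height = [150, 155, 160, 165, 175]   # subjects S1..S5 from the table
father = [153, 154, 162, 163, 191]   # their fathers

r = pearson_r(height, father)
print(f"r = {r:.3f}")
```

A value of r close to +1 indicates that taller subjects tend to have taller fathers in this small sample.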
Simple descriptive statistics
• Simply put, a z-score (also called a standard score) gives you an idea of how far from the mean a data point is. More technically, it is a measure of how many standard deviations below or above the population mean a raw score is.
• Data expressed as z-scores are relative to a dataset's mean (μ) and standard deviation (σ)
• The z-score formula: one sample
• The basic z-score formula for a sample is:
◦ z = (x - μ)/σ
[Figure: normal distribution curve marking the central ranges that contain about 2/3, 95% and 99% of the values]
Simple descriptive statistics
• For example, let’s say you have a test score of 190. The test has a mean (μ) of 150 and a standard deviation (σ) of 25. Assuming a normal distribution, your z-score would be:
z = (x - μ) / σ
= (190 - 150) / 25 = 1.6.
• The z-score tells you how many standard deviations from the mean your score is. In this example, your score is 1.6 standard deviations above the mean.
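The worked example can be reproduced in a few lines. The sketch below also converts the z-score to a cumulative probability, which assumes the scores are exactly normally distributed; the function names are illustrative.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def fraction_below(z):
    """Fraction of a normal population with a z-score below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(190, mu=150, sigma=25)
print(f"z = {z:.1f}")
print(f"fraction of scores below this one: {fraction_below(z):.3f}")
```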
Simple descriptive statistics
• Technically, a z-score is the number of standard deviations from the mean value of the reference population (a population whose known values have been recorded, as in the charts the CDC compiles about people’s weights). For example:
• A z-score of 1 is 1 standard deviation above the mean.
• A z-score of 2 is 2 standard deviations above the mean.
• A z-score of -1.8 is 1.8 standard deviations below the mean.
• A z-score tells you where the score lies on a normal distribution curve. A z-score of zero tells you the value is exactly average, while a z-score of +3 tells you that the value is much higher than average.
Z Score Formula: Standard Error of the Mean
• When you have multiple samples and want to describe the standard deviation of those sample means (the standard error), you would use this z-score formula:
z = (x̄ - μ) / (σ / √n)
where x̄ is the sample mean. This z-score will tell you how many standard errors there are between the sample mean and the population mean.
Example problem:
• In general, the mean weight of women is 65 kg with a standard deviation of 3.5 kg. What is the probability of finding a random sample of 50 women with a mean weight of 70 kg, assuming the weights are normally distributed?
z = (x̄ - μ) / (σ / √n)
= (70 - 65) / (3.5/√50) = 5 / 0.495 = 10.1
A z-score this large means the probability of drawing such a sample is effectively zero.
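The example problem can be checked numerically. This sketch assumes normally distributed weights and reads the question as the probability of a sample mean of 70 kg or more (an upper-tail probability); the function names are illustrative.

```python
import math

def z_for_sample_mean(sample_mean, mu, sigma, n):
    """Number of standard errors between a sample mean and the population mean."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = z_for_sample_mean(70, mu=65, sigma=3.5, n=50)
p = upper_tail_probability(z)

print(f"z = {z:.1f}")
print(f"upper-tail probability = {p:.2e}")
```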
Z Score Formula: Standard Error of the Mean
The standard error

$\sigma_\mu = \frac{\sigma_s}{\sqrt{n}} = \frac{1}{\sqrt{n}} \sqrt{\frac{\sum (x-\mu)^2}{n-1}}$

The degrees of freedom = (n – 1).
Probability: Basic definitions
• Experiment: anything that produces a non-deterministic result
◦ Coin flip, die roll, item count, concentration measurement, distance measurement...
• Sample space: the set of all possible outcomes for a particular experiment, finite or infinite, discrete or continuous
◦ {H, T}, {1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, ...}, {0, 0.1, 0.001, 0.02, 3.14159, ...}
• Event: any subset of a sample space
◦ {}, {H}, {1, 3, 5}, {0, 1, 2}, [0, 3)
• Probability: for an event E, the limit of n(E)/n as n grows large
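The limiting-frequency definition can be illustrated with a simulated die roll. This is a sketch only: the event {1, 3, 5} is taken from the examples above, while the seed and trial counts are arbitrary choices.

```python
import random

def relative_frequency(event, n_trials, rng):
    """Estimate P(event) as n(E)/n over n_trials simulated rolls of a fair die."""
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return hits / n_trials

rng = random.Random(0)   # fixed seed so that repeated runs give the same numbers
odd = {1, 3, 5}          # the event "an odd number is rolled"

for n in (100, 10_000, 100_000):
    print(n, relative_frequency(odd, n, rng))
```

As n grows, the printed ratio settles towards 0.5, the probability of the event.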
Where the normal distribution cannot be used for probability calculations
• The mean value is close to zero and negative values are not possible in the data set.
• The distribution is skewed about the mean. This is defined numerically as the skewness of the population.
• The μ ± 2σ range of values does not contain 95% of the probability. This is quantified numerically by the kurtosis of the population.
Combining errors and uncertainties

• Once several parameters have been determined experimentally, and their associated errors determined using the 5% probability concept, some additional mathematical processing might be required, in which the different parameters and their associated errors are combined to calculate the final value of interest and its associated error.
• There are some simple rules for combining errors, which are based on the least-squares error analysis used to calculate the mean value.
Combining errors and uncertainties

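• If two values $x_i \pm 2\sigma_i$ and $x_j \pm 2\sigma_j$ are to be added or subtracted:

$y = (x_i \pm x_j) \pm 2\left(\sigma_i^2 + \sigma_j^2\right)^{0.5}$

• All units must be the same
• If two values $x_i \pm 2\sigma_i$ and $x_j \pm 2\sigma_j$ are divided or multiplied (shown here for division):

$y = \frac{x_i}{x_j} \pm 2\,\frac{x_i}{x_j}\left[\left(\frac{\sigma_i}{x_i}\right)^2 + \left(\frac{\sigma_j}{x_j}\right)^2\right]^{0.5}$

• The units of y, x_i, x_j do not have to be identical.
• Expressions are based on the RMS analysis, so for independent errors they are statistically rigorous and should be used in combining data with their associated errors.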
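The two rules translate directly into code. This is a minimal sketch assuming independent errors, with each value carrying its own σ and the combined error reported at the 2σ level; the function names and the numbers are illustrative.

```python
import math

def add_or_subtract(xi, sigma_i, xj, sigma_j, subtract=False):
    """Combine xi ± 2*sigma_i and xj ± 2*sigma_j by addition or subtraction.
    The absolute errors add in quadrature in both cases."""
    value = xi - xj if subtract else xi + xj
    error_2sigma = 2 * math.sqrt(sigma_i ** 2 + sigma_j ** 2)
    return value, error_2sigma

def divide(xi, sigma_i, xj, sigma_j):
    """Combine xi ± 2*sigma_i and xj ± 2*sigma_j by division.
    The relative errors add in quadrature."""
    value = xi / xj
    relative = math.sqrt((sigma_i / xi) ** 2 + (sigma_j / xj) ** 2)
    return value, 2 * abs(value) * relative

total, total_err = add_or_subtract(10.0, 0.3, 5.0, 0.4)
ratio, ratio_err = divide(10.0, 0.3, 5.0, 0.4)

print(f"sum   = {total:.1f} ± {total_err:.2f}")
print(f"ratio = {ratio:.2f} ± {ratio_err:.2f}")
```

For multiplication the same relative-error term applies, with the product x_i·x_j in place of the ratio.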
Student’s t-test
• In some experimental investigations it is important to know whether two sample populations are likely to have been selected from the same global population.
• This type of one-dimensional question can be addressed using a t-test (also known as Student’s t-test).
• The test is most applicable when the standard deviations are very large in comparison to the likely changes or differences between the two mean values.
Two types of t-test
Paired t-test

• Is one in which the same population is tested twice to determine if there has been a change in the overall population.
• It is a method of determining if there is a statistically significant change in the population after an intervention.
• A simple mean and standard deviation calculation will not show a significant change if the change is likely to be significantly smaller than the standard deviation.
Two types of t-test
Unpaired t-test

◦ Is one in which two different populations are measured to determine if there is a difference between the two populations.
◦ In this case the two populations are unrelated and the number of samples can be different in the two sample sets.
• In MS Excel the t-test can be evaluated with the T.TEST function, whose type argument selects a paired or an unpaired test.
• In MATLAB the functions are ttest and ttest2 for the paired and unpaired data sets respectively.
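The two test statistics can be computed by hand to see how they differ. The sketch below uses only the Python standard library and the five height pairs from the earlier slide as illustrative data; the unpaired version shown is Welch's form (unequal variances), which is an assumption because the slides do not say which variant is intended. Turning t into a p-value needs the t-distribution, for example via scipy.stats.ttest_rel and scipy.stats.ttest_ind.

```python
import math
import statistics

def paired_t(first, second):
    """t statistic and degrees of freedom for paired measurements."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

def unpaired_t(first, second):
    """Welch's t statistic for two independent samples (sizes may differ)."""
    standard_error = math.sqrt(
        statistics.variance(first) / len(first)
        + statistics.variance(second) / len(second)
    )
    return (statistics.mean(second) - statistics.mean(first)) / standard_error

height = [150, 155, 160, 165, 175]   # illustrative data from the earlier table
father = [153, 154, 162, 163, 191]

t_paired, dof = paired_t(height, father)
t_unpaired = unpaired_t(height, father)

print(f"paired:   t = {t_paired:.3f} with {dof} degrees of freedom")
print(f"unpaired: t = {t_unpaired:.3f}")
```

The paired statistic is larger because pairing removes the subject-to-subject spread, leaving only the spread of the differences.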
ANOVA statistics
• We looked at one-dimensional statistical methods first.
• Then two different populations were compared using the t-test.
• If there are more than two dependent data sets, these techniques are inadequate.
• If many repeat measurements are made of a number of members of the population, the ANOVA statistical methods allow the calculation of probability estimates for three or more datasets.
• As for the t-test, this method can determine statistically significant differences when the standard deviations in the parameters are much larger than the difference between the populations.
◦ The ANOVA test can be evaluated using the MS Excel ANOVA tools:
▪ Two-factor with replication
▪ Two-factor without replication
◦ In MATLAB the functions are anova1 and anova2.
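To show what an ANOVA calculation involves, the sketch below computes the F statistic of a one-way (single-factor) ANOVA from first principles. The three groups are made-up numbers, and the one-way case is chosen for brevity; the two-factor variants listed above need a more elaborate partition of the variance. A p-value would come from the F-distribution, for example via scipy.stats.f_oneway.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values about their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # hypothetical measurements
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ by more than the scatter within the groups would explain.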
Exercise 1: Instrumentation & Calibration
Review specification sheets for 3 experimental instruments that you will use in your research project.
Briefly summarise the following user requirements:
◦ Dynamic range
◦ Sensitivity
◦ Linearity
◦ Calibration requirements
◦ Calibration procedure
Exercise 2: Review of statistical analysis in journal article
Review a published journal article in your engineering discipline which includes a statistical analysis.
• Write a brief report on the statistical analysis.
• Can you suggest an improved statistical analysis?
• Suggest some additional parameters that might have been measured during the data acquisition stage.
• Explain how you would analyze the total data set of the additional measurements.
END
