
STATISTICAL

REASONING:
From Correlation to
Causation to Law-like
Phenomenon of Nature
Ethelbert P. Dapiton, Ph.D.

Lecture for the Ed. D. class of DHVTSU, 1st Sem, Jan 2023
‘Science’ and Man’s Quest for Answers to the
Phenomena of the Natural World
•Humans have always sought
to explain the events of the
world
•Science explains the world
we see but it is based on
what we can observe with
our five senses
A Review of “How Science Works”
The goal of science is
to investigate and
understand nature, to
explain events, and to
use those
explanations to make
useful predictions
A Review of “How Science Works”
• Science only deals with the natural world
• Collects and organizes information in a specific
manner
• Proposes explanations that can be tested
using evidence collected in a scientifically
approved manner
The Scientific Method
Science was therefore
developed as
measurement and
quantification, as the
perception of the
numbers ruling the
universe.
“Number rules the universe.”
— Pythagoras
Thinking Like a Scientist

•Scientific thinking usually begins with observations.
•The information gathered is called data.
Scientific Ideas
• Based on evidence you gather from your observations
• You can create models, theories, laws, and predictions based on your evidence
• It is important to know that scientific ideas are never 100% proven
• They can only be supported or unsupported by observations

• Theory: A set of ideas that tie together your observations
• Law: Based on ideas that have been tested by observations and experiments
• Model: Representation of an idea; what it may look like or how it may work
• Prediction: What you think may happen based on your observations
There is no single way to “do”
science. The methods you use to
answer a question depend on
what the question is.
Approaches to Learn
About Nature
•Discovery
Science
•Hypothesis
Driven Science
Hypothesis Driven Science
The current method of scientific practice is based on
the so-called hypothetico-deductive system, the
essence of which is the formulation of a hypothesis
derived from a collection of facts, testing the
hypothesis by trying to ‘falsify’ it, collecting more facts
if ‘falsification’ fails, and repeating the falsification tests
until either you and the hypothesis agree on a draw or
one of you admits defeat.
Approaches to Knowledge
Descriptive Approach
Conveying of knowledge through the verbal and pictorial description
of events or circumstances
Rationalistic Approach
Logical organizing and analysis of existing information
Scientific Approach
Discovery – generation of new knowledge
Restrictions and Assumptions of Science
Science only deals with the Empirical
Information which arrives in the brain via one of the sensory channels

Science Assumes Orderliness


Scientists believe there is an underlying pattern or order to all events
(behavioral events, ordinal events in nature)
Makes no sense to seek the “Laws of Behavior” unless we assume that behavior is
lawful

Science Assumes Determinism


All events are caused by earlier events. Earlier events determine which events will
follow.

Prediction of future events is only possible if their causes lie in the past. Our ability
to predict the future demonstrates our understanding of what causes an event to occur
in the first place

Example: The psychologist assumes that all behavior is determined by prior events.
Hence, all behavior is predictable.
The Doctrine of Determinism

Y = f (X1, X2, X3, …Xn)

Notice in this equation the value of Y is determined by the


values we select for the X variables
“Y” therefore may correctly be called a “dependent variable”
We have freedom to select values for the “X’s” on the right
side of the equation
A synonym for freedom is the word “independent”, hence we
refer to the variables on the right side of the equation as
“independent variables”
Classification of Variables
Independent Variable is the variable the investigator
wishes to manipulate in order to determine its effects
upon behavior.
• Manipulate means to cause to change in value
• Called an Independent Variable because the
investigator is free (or at liberty) to change its
value
Dependent Variable is the measurable aspect of
behavior whose response to the manipulation the
investigator wishes to determine.
Called a Dependent Variable because we
believe its value will depend upon which
values we select for our Independent
Variable
Example of Determinism

Behavior is Deterministic
Y = f (X1, X2, X3, …Xn)
Behavior = f (event1, event2, … eventn)
The Dependent Variable in psychology is always some
measurable aspect of behavior (Response)
The events that determine behavior (IVs) come from
three major categories: Environmental, Organismic, and
Behavioral.
Environmental Variables
Environmental variables refer to how the surroundings or
situation impact upon behavioral determination
These are also known as “Stimulus” variables because a
stimulus is defined as a change in the environment
Most relationships studied in psychology are R=f(S)
relationships. In other words, psychology looks to see how
the environment affects what we do or how well we perform
on some task.
Example:
Reading Speed = f (level of illumination)
Organismic Variables

Organismic variables refer to how inherent properties of
the organism under study affect behavior
These are also known as “Subject” variables because they
refer to characteristics of the subject
A second type of relationship studied in psychology is
called the R=f(O) relationship.
Example:
Reading Speed = f (Age of the Subject)
Behavioral Variables

Behavioral variables refer to the fact that one behavior can


affect other behaviors that are simultaneously occurring
These are also known as “Response” variables
A third type of relationship studied in psychology is called
the R=f(R) relationship.
Example:
Reading Speed = f (Simultaneous Music Listening)
Operationalism
• One of the principal differences between science and
common sense is that science is grounded in
operationalism.
• Casual use of words can often lead to ambiguity or
misunderstanding.
• To avoid this, scientists operationally define each term
and concept they employ.
• When we operationally define a variable, this means we
describe the techniques of quantifying or measuring it.
• Operational definitions provide precise, unambiguous
meaning for concepts and variables.
From Theoretical to Empirical
Theoretical Relationship: Performance = f (Practice)
↓ Operational Definition
Empirical Relationship: # Maze Errors = f (# Trials)
Scientific Theory
Scientific Theory is
an explanation of how
or why something happens based on scientific
knowledge resulting from repeated observations
and experiments
• A theory is based on thousands of experiments carried
out by many different scientists
• It can be proven to be incorrect as new knowledge is
gained from further experiments
Scientific Theory
The Layperson misunderstands the
relative importance of theories and facts
A Scientific Theory is not a guess arrived
at in the absence of facts
A Theory is a very high-order explanation
that integrates all known pertinent facts
Theories serve as a basis for the prediction
of “Future Facts”
Theory is the ultimate product of Science
Scientific Law
A Scientific Law is a statement about
how something works that seems to be
true all of the time
• Tells what will happen, but does not necessarily
explain why
• Is less likely to change
than a Scientific Theory
Universal Laws
• Science is governed by truths that are
valid everywhere in the universe.

• These truths are called Universal Laws


Relationship of Scientific Theory &
Scientific Law
All scientific evidence is based
on observation
•Quantitative: uses numbers to describe
the evidence
•Qualitative: more descriptive, cannot be
easily measured
Observations are followed
by inferences

• A logical interpretation based on prior knowledge


and experience
• use of prior experience to infer (a priori)
• Inferences can also be based on hypotheses
• can be tested and proven wrong (a posteriori)
Scientific Method
• A series of steps used to work on problems or answer questions
• By following the steps you can eliminate errors that could give
false results
• When the same steps are used by someone else they should be
able to come up with the same results
• Can be used in every aspect of your life, and most of you do it
without realizing what you are doing
Scientific Method
1. Identify a problem or question you want to answer
2. Gather background information on the subject
• It is possible someone has already done the research for you
3. Form a hypothesis
• A hypothesis is an educated guess based on prior knowledge and the research
done in step 2
4. Test your hypothesis by designing an experiment
• Test only one variable (or thing) per experiment
• The thing you are controlling or adjusting is the independent variable
• The thing that is responding to the independent variable is the dependent variable
• If possible set up a second experiment where everything remains the same –
this is called a control
Scientific Method
5. Record your data and graph your results
• Tables and graphs allow you to see patterns and draw conclusions
• Sort data: graphs, concept maps, pictures, models
• Make sense of information
• When you graph:
• The dependent variable is graphed on the y-axis
• The independent variable is graphed on the x-axis
6. Draw a conclusion based on the results of your experiment
• Did the results of the experiment agree or disagree with your hypothesis?
• What did you learn by doing your experiment?
• Did the experiment raise any new questions?
Publishing and Repeating
• All scientific experiments are repeated to ensure that the
results were not a random thing
• Often experiments are repeated by other scientists
• A theory is a group of hypotheses that have been repeatedly
tested and so far have not been proven wrong
• Allow scientists to make predictions about new situations
Doing it over and over and over…
Multiple Trials
• Repeating an experiment proves its reliability and validity.
• Reliability: Answers are consistent.
• Validity: Does your experiment show what it should?
Replication
• Able to be done by others
• Researchers must be able to do the exact same procedure and get the exact same results.
• Shows confirmation of ideas and theories
An important thing to remember…
Within the framework of ‘science’, you
do not ask questions that you cannot
solve.
THE MEANING OF
RELATIONSHIPS
THREE possible forms a relationship
can take
1. None of the variables influences the other (Symmetrical Relationship)
2. Both variables influence each other (Reciprocal Relationship)
3. Only one of the variables influences the other (Asymmetrical Relationship)
Symmetrical Relationships
- This means that neither variable “causes” the other, and neither
variable can be considered “prior” in time to the other.
Five types of symmetrical relationships
1. Alternative indicators of the same concept (e.g. signs of anxiety: palm perspiration and heart pounding) → factor analysis.
2. Effects of a common cause (e.g. storks and babies; ice cream sales and cases of drowning) → often referred to as ‘spurious’ relationships.
3. Functional interdependence as elements of a unit (e.g. the presence of lungs correlates with the presence of the heart; it is not that one “causes” the other, but both are indispensable in the functioning of the unit, the organism) → structural analysis
4. Parts of a “complex” (e.g. rich people are often members of country clubs, drive particular brands of cars, stay at particular types of hotels when on trips, attend the opera, etc.; there is no functional interdependence between these different parts, but the lifestyle “complex” ensures that they are often found together) → descriptive analysis
5. Accidental or fortuitous (e.g. the association between the emergence of the space age and rock and roll music) → coincidence
Reciprocal Relationships (both influence each other)

Alternating Asymmetry: one acts on the other, the other acts on the
first; mutually reinforcing the relationship
Asymmetrical Relationships (one influences the other)
- changes in independent variable (IV) are responsible for changes in
dependent variable (DV)
- The identification of the IV and DV is often obvious, but sometimes the
choice is not clear. The independence and dependence may be
evaluated on the basis of:
- The degree to which each variable may be altered. The relatively
unalterable variable is the independent variable (IV) for instance,
age, social status, present manufacturing technology, etc.
- The time order between the variables. The independent variable
(IV) precedes the dependent variable (DV).
TYPES OF ASYMMETRICAL RELATIONSHIPS

1. Stimulus-Response: e.g. war and civilian morale; those exposed to the stimulus and those not exposed must be similar on all other factors. This makes the ‘selection’ of the sample of great importance in inferring a stimulus-response relationship.

2. Disposition-Response: e.g. liberal disposition and liberal action; the tendency, given circumstances, to respond in specific ways (e.g. in attitudinal research, attitude is taken as the independent variable and action as the dependent variable, e.g. prejudice and discrimination). Beware of redundancy, where a larger concept embraces a smaller one.
TYPES OF ASYMMETRICAL RELATIONSHIPS

3. Property-Disposition: e.g. sugar’s properties (shape, size, weight, etc.) and solubility. This is the central type of relationship in social research: the property of an individual and a disposition to act, e.g. race and alienation, region of country and voting, class and voting. Since properties are resistant to change, they are often taken as independent variables.

4. Necessary precondition for a given effect: e.g. technological advancement and nuclear weaponry; technological advancement does not cause nuclear weapons but “makes them possible.” Free labor is a necessary precondition for the development of capitalism.
TYPES OF ASYMMETRICAL RELATIONSHIPS

5. Immanent relationship: a property inherent in the nature of a setup produces a relationship between two variables, e.g. bureaucracy and red tape; the dependent variable arises out of the independent variable: bureaucracy leads to adherence to rules, which leads to red tape.

6. Means-ends (purposive relationships): e.g. standardization of procedure and lower costs; nest building and survival of the young.
Do the ends determine the means, or do the means determine the ends? In whose mind does the purpose reside? If it resides in the mind of the actor, then the end determines the means; if it resides in the mind of the investigator, then the means (cause) determines the end (effect).
CAUSATION
Why do things happen?
Either cause and effect are the very glue of
the cosmos, or they are a naive illusion due to
insufficient math.
But which?
Scientists are interested in:
i) causation
ii) understanding
iii) prediction
iv) control
SCIENCE IMPLIES EXPLANATION THROUGH
‘INVARIABLE LAWS’
In search of the dominant direction of influence of variables
• The variable that is not subject to change has causal priority: it comes
before the other variable in the relationship.
• Fixed variables, also known as “status variables”: sex, race, birth
order, national origin
• Relatively but not absolutely fixed variables: social class, religion,
rural/urban residence
Correlation
• A correlation is a (statistical) measurement of the association of
two variables.
• Positive Correlation: As one variable increases, the other
increases. (Examples: cigarette smoking and lung cancer;
education and income; unemployment and homelessness)
• Negative Correlation: As one variable increases, the other
decreases. (Examples: caffeine intake and sleep; age and
working memory capacity; stress and life expectancy)
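A minimal Pearson correlation in pure Python makes "positive" and "negative" correlation concrete. The data points below are invented to mirror the slide's examples (education/wages, caffeine/sleep); they are not real measurements.

```python
# Pearson correlation coefficient, computed from scratch with the
# standard library. All data values are invented for illustration.
from statistics import mean, pstdev

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

education = [10, 12, 14, 16, 18]       # years of schooling
wages     = [15, 18, 24, 30, 33]       # rises with education: r near +1
caffeine  = [0, 1, 2, 3, 4]            # cups per day
sleep     = [8.0, 7.5, 6.8, 6.0, 5.5]  # falls with caffeine: r near -1
```

`pearson_r` returns a value between -1 and +1: positive when both variables rise together, negative when one falls as the other rises.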
Correlation vs Causation
• Correlation tells us two variables are related
• Types of relationship reflected in correlation:
• X causes Y or Y causes X (causal relationship)
• X and Y are caused by a third variable Z
(spurious relationship)

Correlation vs Causation Example
• ‘‘The correlation between workers’
education levels and wages is strongly
positive”
• Does this mean education “causes”
higher wages?
• We don’t know for sure !
• Correlation tells us two variables are
related BUT does not tell us why

Correlation vs Causation
❑Possibility 1
• Education improves skills and skilled workers
get better paying jobs
• Education causes wages to rise
❑Possibility 2
• Individuals are born with quality A which is
relevant for success in education and on the
job
• Quality (NOT education) causes wages to rise

Without proper
interpretation, causation
should not be assumed,
or even implied.
Third or Missing Variable Problem

• A relationship other than causal might exist between the two variables.
• It is possible that there is some other variable or factor that is causing the outcome.
• A strong relationship between two variables does not always mean that changes in one variable cause changes in the other.
• The relationship between two variables is often influenced by other variables which are lurking in the background.
• There are two relationships which can be mistaken for causation:

1. Common response
2. Confounding
• Common response refers to the possibility
that a change in a lurking variable is causing
changes in both our explanatory variable and
our response variable

• Confounding refers to the possibility that


either the change in our explanatory variable is
causing changes in the response variable OR
that a change in a lurking variable is causing
changes in the response variable.
1. Common Response:
• Both X and Y respond to changes in some unobserved variable, Z.
2. Confounding
• The effect of X on Y is indistinguishable
from the effects of other explanatory
variables on Y.
• When studying medications, the “placebo
effect” is an example of confounding.
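The "common response" pattern can be sketched with a small invented simulation: X and Y are each driven by a lurking variable Z (all names, sample sizes, and noise levels below are illustrative), so they correlate strongly even though neither causes the other.

```python
# Invented simulation of "common response": X and Y both respond to a
# lurking variable Z, so they correlate with no X->Y or Y->X causation.
import random
from statistics import mean, pstdev

random.seed(0)
z = [random.gauss(0, 1) for _ in range(1000)]   # unobserved lurking Z
x = [zi + random.gauss(0, 0.3) for zi in z]     # X responds to Z
y = [zi + random.gauss(0, 0.3) for zi in z]     # Y responds to Z

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = mean((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

r = pearson_r(x, y)   # strong correlation produced entirely by Z
```

If Z were held fixed, the correlation between X and Y would vanish, which is exactly what makes the unadjusted correlation misleading.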
“Correlation is not causation”-Ronald A. Fisher (1958)
When can we imply causation?
When controlled experiments are
performed.
Unless data have been gathered by experimental means
and confounding variables have been eliminated,
correlation never implies causation.
The goal of scientific research is to find
causal relationships
• Causal relationships have three characteristics:
• Covariation: the alleged cause varies with the supposed
effect
• Time order: the cause precedes the effect in time
• Elimination of alternative explanations to isolate
causation to one factor
• Need to avoid spurious relationships
Causal Research Designs
• Causal research: Studies that enable researchers
to assess “cause-effect” relationships between
two or more variables
• Independent variables: Variables whose values are
directly manipulated by the researcher
• Dependent variables: Measures of effects or outcomes
that occur as a result of changes in levels of the
independent or causing variable(s)
Causal Research Designs
• Research requires researchers to
collect data using experimental
designs
• Experiment: An empirical investigation
that tests for hypothesized
relationships between dependent
variables and manipulated
independent variables
The Nature of Experimentation
• Experiments can explain cause-and-effect
relationships between
variables/constructs and determine why
events occur
• Variable: A concept or construct that can
vary or have more than one value
The Nature of Experimentation
• Control variables: Do not vary freely or
systematically with independent variables
• Should not change as the independent variable is
manipulated
• Extraneous variables: Any variables that
experimental researchers do not measure or
control that may affect the dependent variable
Research Design and Validity is a Very
Important Consideration to Establish Causality
Validity: The extent to which the
conclusions drawn from an
experiment are true
Research Design and Validity is a Very
Important Consideration to Establish Causality
• When choosing a research design, it is also important to consider
• Internal validity:
• Refers to a causal relationship that was not created by a spurious relationship (a
relationship in which a second independent variable influenced the dependent
variable)
• Extent to which the research design accurately identifies causal relationships
• Effects: history, maturation, testing, selection biases, experimental mortality,
instrument decay, demand characteristics
• External validity:
• Refers to the extent to which the results of an experiment can be generalized
across populations, time, and settings
• Extent to which a causal relationship found in a study can be expected to be true
for the entire target population
Research Design and Validity is a Very
Important Consideration to Establish Causality
• Experimental research designs are especially good for
isolating causal factors.
• Experimentation allows a researcher to make causal
inferences with great confidence in the design through
control over exposure to an experimental treatment.
• But, although experiments have great internal validity, they
suffer from weaker external validity.
The Role of Experimental Research Design in
Establishing Causality
 The classical randomized experiment has five basic characteristics:
1. At least one experimental group that will have exposure to the
treatment and one control group that will not
2. Random assignment of individuals to each group, avoiding self-
selection
3. Controlled administration of the treatment, including the
circumstances under which the experimental group is exposed
4. Measurement of a dependent variable before and after the
treatment with a pre-test and a post-test; any difference
between the tests can be attributed to the experimental effect
of exposure to the treatment
5. Controlled environment of the experiment (time, location, and
other physical aspects)
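Characteristic 2, random assignment, can be sketched in a few lines; the subject IDs and group sizes below are invented for illustration.

```python
# Sketch of random assignment to treatment and control groups
# (characteristic 2 above). Subject IDs and sizes are invented.
import random

random.seed(42)
subjects = list(range(20))     # hypothetical pool of 20 subjects
random.shuffle(subjects)       # randomization avoids self-selection
treatment, control = subjects[:10], subjects[10:]
```

Because chance alone decides group membership, any systematic pre-existing difference between the groups is ruled out in expectation, which is what licenses attributing a post-test difference to the treatment.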
The Role of Experimental Research Design in
Establishing Causality
• Post-test design:
• Shares the characteristics of the classical randomized
experiment, except that no pre-test is used because the
sample is truly random and sufficiently large that one can
assume that the control and experimental group(s) are
equivalent
The Role of Experimental Research Design in
Establishing Causality
•Repeated-measurement design:
• Adds to the classic example additional pre-
tests, post-tests, or both in an effort to
measure longer-term effects of experimental
treatments
The Role of Experimental Research Design in
Establishing Causality
•Multigroup design:
• A modification of the classic example in which
more than one experimental group is created
to compare the effects of different treatments
The Role of Experimental Research Design in
Establishing Causality
• Field experiment:
• An experiment in a natural setting in which the
investigator does not have control over group
membership but does have control over one or
more independent variables
• Causal inferences made using this design are not as
strong—but may be more practical for some
situations
Comparing Laboratory and Field Experiments
• Laboratory (lab) experiments: Causal
research designs that are conducted in an
artificial setting
• Field experiments: Causal research designs
that manipulate the independent variables in
order to measure the dependent variable in
a natural setting
• Performed in natural or “real” settings
Are Associations Always
Connected to the Disease?
NO, BUT …
• Cigarette smoking and lung cancer
• Age and prostate cancer
• Car accidents and alcohol
• Agriculture and antibiotic resistance
“Correlation is not causation”-Ronald A. Fisher (1958)1
The Ironic Tale of One of the Founding Fathers of Modern Statistics

• Fisher was a paid tobacco industry consultant


and a devoted pipe smoker.
• He did not think the statistical evidence for a
link was convincing.
• He accepted that smoking seemed to be
correlated with lung cancer, but declared that
‘correlation is not causation.’
• He said a good case had been made for further
research, but not for suggesting to people that
they should stop smoking.
1Fisher, R. A., Letter of R.A. Fisher to Nature, 108 (1958).
“Correlation is not causation”-Ronald A. Fisher (1958)
The Ironic Tale of One of the Founding Fathers of Modern Statistics
Reviewing Fisher’s arguments today is interesting.
He made many valid scientific points against the
research linking lung cancer to smoking.

In 1954 Richard Doll and Bradford Hill published


evidence in the British Medical Journal showing a
strong link between smoking and lung cancer. They
published further evidence in 1956.

History and further research proved Fisher’s assertion wrong.
“Correlation is not causation”-Ronald A. Fisher (1958)
The Ironic Tale of One of the Founding Fathers of Modern Statistics
Karl Pearson used large samples, which he
measured and from which he tried to deduce
correlations in the data, an idea he
inherited from Francis Galton, the father
of regression.

Ronald Fisher, on the other hand, followed
Gosset in using small samples and, rather
than deducing correlations, in trying to
find causes.
From regularity to correlation to causation to law
Physics became flooded with empirical laws that were
extremely useful.
Snell's law, Hooke's law, Ohm's law, and Joule's law are examples of
purely empirical generalizations that were discovered and used
long before they were explained by more fundamental principles.
Galileo demonstrating The Law of Gravity
(The Inclined Plane Experiment)

"Galileo dimostra l'esperienza della caduta dei gravi a Don Giovanni de'
Medici" ("Galileo demonstrates the experiment on falling bodies to Don
Giovanni de' Medici"), Giuseppe Bezzuoli (1839)
Nonexperimental Design
• Nonexperimental designs are characterized by at least one of the
following:
– Presence of a single group
– Lack of control over the assignment of subjects to groups
– Lack of control over the application of the independent variable
– Inability to measure the dependent variable before and after exposure to the
independent variable occurs
Nonexperimental Design
• Small-N designs:
• Also called case studies or comparative cases
studies
• Involve rich, deep understanding of a small number
of cases
• May be used for exploratory, descriptive, or
explanatory purposes
Nonexperimental Design

•Focus groups:
•Can be used to create hypotheses for
testing through other research designs
•Generally not used to establish causal
relationships
Nonexperimental Design
• Cross-sectional designs (survey, aggregate
analysis):
• Characterized by measurements of the independent
and dependent variables at approximately the same
time
• Data analysis, rather than a treatment, is necessary
for making causal inferences
Nonexperimental Design

•Longitudinal designs:
• Allow for the measurement of variables at
different points in time
• Can model change across time; examine the
time order of a causal relationship; and
estimate age, cohort, and period effects
Nonexperimental Design

•Trend analysis:
•Analysis of variables measured across
periods of generally 20 years or more
with a focus on explaining change over
time
Nonexperimental Design
• Panel studies:
• Cross-sectional designs that include a time element
• Rely on measurement of the same units of analysis
at different points in time—creating waves of data
for analysis over time
• Panel mortality (loss of subjects across waves) is a key concern
Nonexperimental Design

•Intervention analysis:
•Measurements of a dependent variable
before and after the introduction of an
independent variable that is observed but
not controlled by the researcher
Nonexperimental Design
• Nonexperimental designs are generally
characterized as having less internal validity
but better external validity than experimental
designs.
• There is always a tradeoff when moving from
one design to another.
RANDOM ERROR, BIAS,
MISCLASSIFICATION AND
CONFOUNDING
BIAS
Systematic, non-random deviation of results and
inferences from the truth, or processes leading to
such deviation.
Any trend in the collection, analysis, interpretation,
publication or review of data that can lead to
conclusions which are systematically different from
the truth.
Bias can be either conscious or unconscious
CONFOUNDING
A problem resulting from the fact that one feature
of study subjects has not been separated from a
second feature, and has thus been confounded
with it, producing a spurious result.

The spuriousness arises from the effect of the first


feature being mistakenly attributed to the second
feature.
THE DIFFERENCE BETWEEN
BIAS AND CONFOUNDING

• Bias creates an association that


is not true.
• Confounding describes an
association that is true, but
potentially misleading.
EXAMPLE OF RANDOM ERROR, BIAS,
MISCLASSIFICATION AND CONFOUNDING
IN THE SAME STUDY:
STUDY: In a study, babies of women who bottle
feed and women who breast feed are compared,
and it is found that the incidence of gastroenteritis,
as recorded in medical records, is lower in the
babies who are breast-fed.
EXAMPLE OF RANDOM ERROR
By chance, there are more episodes of
gastroenteritis in the bottle-fed group in the study
sample. (When in truth breast feeding is not
protective against gastroenteritis).

Or, also by chance, no difference in risk was found.


(When in truth breast feeding is protective against
gastroenteritis).
EXAMPLE OF BIAS
The medical records of the bottle-fed babies are less
complete (perhaps bottle-fed babies go to the doctor
less) than those of breast-fed babies, and thus fewer
episodes of gastroenteritis are recorded for that group alone.

This is called bias because the observation itself is in


error.

In this case the error was not conscious.


EXAMPLE OF CONFOUNDING
The mothers of breast-fed babies are of higher social
class, and the babies thus have better hygiene, less
crowding and perhaps other factors that protect against
gastroenteritis.

Less crowding and better hygiene are truly protective


against gastroenteritis, but we mistakenly attribute their
effects to breast feeding.

This is called confounding, because the observation is


correct (breast-fed babies have less gastroenteritis), but
its explanation is wrong.
EXAMPLE OF
MISCLASSIFICATION
Lack of good information on feeding history results
in some breast-feeding mothers being randomly
classified as bottle-feeding, and vice-versa.
If this happens, the study underestimates the true
difference between the two groups.
Data Analysis Using
Inferential Statistics
Inferential Statistics
• Enables you to estimate the population mean based on
sample results
• Enables you to estimate sampling error
• Enables you to test for statistical significance
Confidence Intervals
• The range likely to contain the true population mean
• An estimate of the population mean based on the sample survey
• The social science standard is 95% confidence: we want to be 95%
certain that our population estimate is correct within a narrow range
(plus or minus 5% for proportions).
• Sampling error is the analogous term when working with proportions,
as with survey data
• Sometimes called the margin of error
Confidence Intervals
• The specific estimate would be the “Point estimate”
• Example: Based on an excellent random sample data,
we estimated that the average salary of the Senior
High School teachers in private HEIs in Pampanga is
P15,715.
Confidence Intervals
• We would also construct confidence intervals that would
calculate where the true mean of the population resides.
• We would then provide the lower and upper estimates
(confidence intervals), saying I am 95% certain that the
true average salary of the Senior High School teachers in
private HEIs in Pampanga is between P14,357 and P17,072.
• Sampling error provides a likely range for the true
proportion in the population
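The interval construction described above can be sketched in a few lines of Python. The salary figures below are hypothetical (not the actual survey data), and the z = 1.96 normal approximation stands in for an exact t-based interval:

```python
import math
import statistics

def mean_confidence_interval(data, z=1.96):
    """Return (point_estimate, lower, upper) for an approximate
    95% confidence interval around a sample mean."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    return mean, mean - z * se, mean + z * se

# Hypothetical salary sample (in pesos), for illustration only
salaries = [14500, 15200, 16800, 15900, 14900, 16300, 15700, 16100]
point, lower, upper = mean_confidence_interval(salaries)
```

The point estimate is reported together with the lower and upper bounds, exactly as in the teacher-salary example above.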
Sampling Error
• Typically results are within +/- 5 %
• That means that if we had surveyed everyone,
the results would be within +/-5% of the results
from the survey.
Sampling Error
• Example: Most recent senatorial survey shows the following
result for the top 1 slot:
• Imee has 43%
• Bato has 46%
They report a +/-5% sampling error.
So if every voter were surveyed, we might find that 48% of the
voting population favors Imee and 41% favors Bato.
Conclusion: the race is too close to call.
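The margin of error for a survey proportion follows the familiar formula z·√(p(1−p)/n). A minimal sketch, assuming a simple random sample (the sample size of 400 is a hypothetical figure chosen to reproduce the ±5% margin):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A 46% result from roughly 400 respondents carries about a +/-5% margin
moe = margin_of_error(0.46, 400)
```

This shows why national polls with a few hundred respondents routinely report a roughly five-point margin of error.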
Statistical Significance
Statisticians have provided us with the tools to
estimate how likely it is that sample results arose by
chance, or how likely the results are to be in error.
These are called
tests of statistical significance.
Common Tests for Statistical Significance

• Chi-Square: nominal and ordinal data
• t-tests: dependent variable is ratio/interval data
• ANOVA: dependent variable is ratio/interval, and
independent variable is nominal or ordinal with 3 or
more categories
Chi-Square
Based on what you would expect if there were
no difference in the frequency distribution.
Use with nominal or ordinal data.
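The statistic itself is simple to compute by hand. A minimal sketch using a hypothetical coin-flip example (the observed counts are invented for illustration):

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 100 coin flips: 60 heads / 40 tails vs. the fair-coin expectation of 50/50
stat = chi_square_statistic([60, 40], [50, 50])
# stat = 4.0, which exceeds the 3.841 critical value (df = 1, alpha = .05)
```

Because 4.0 exceeds the critical value, we would reject the hypothesis of no difference at the .05 level.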
t-Tests
• Single Mean= one-sample t:
• Interval/ratio data where you are comparing to a known
population mean
• Paired Means=paired sample t:
• before and after design
• Independent Means= independent sample t:
• comparing 2 means
• For t-tests: you must have interval or ratio data.
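The single-mean case can be sketched directly from its definition; the test scores below are hypothetical, and the known population mean of 50 is an assumed value for illustration:

```python
import math
import statistics

def one_sample_t(data, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return (statistics.fmean(data) - mu0) / (s / math.sqrt(n))

# Hypothetical interval-level scores compared to a known population mean
t = one_sample_t([52, 48, 55, 50, 53, 49], 50)
```

The resulting t is then compared against the t distribution with n − 1 degrees of freedom to obtain a p-value.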
Statistical Significance &
Confidence Level
Allows the researcher to estimate how
likely it is that the results seen in an
analysis of sample data arose
as a result of chance.
Statistical Significance
• Typically, we use a standard for determining how
likely we would be to get these results by chance
alone.

• The convention is to set an alpha level or p value of
0.05 or less (e.g., 0.01).
Statistical Significance
• This means there is only a 5% chance or less that
you would have obtained these results if there really
was no difference in the larger population.

• Another way to say it: your results are statistically
significant at the .05 level.
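For a normal-theory test, the p-value behind the .05 convention can be computed from the test statistic with the standard library alone. A minimal sketch, assuming a standard-normal (z) statistic:

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic,
    via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

p = two_sided_p_from_z(1.96)  # the familiar ~0.05 threshold
```

This makes the convention concrete: a z of about 1.96 or larger in absolute value yields p ≤ .05 in a two-sided test.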
Statistical Significance
• With only a 5% chance of being incorrect,
I am willing to take the risk that my sample
results fairly accurately capture the
true population.
Remember:
A significance test is nothing more than a determination of the
probability of getting the results you got by chance.

While the formulas differ, they all get interpreted the same
way.

The social science standard is a p-value or an alpha value of 0.05
or less.
Statistical Significance = Meaningful Significance?
Statistical Significance Does Not Imply:
• Your results are meaningful or important.
• The relationship is strong or weak.
• That design errors have been eliminated.
• Your study has no value if your results are not statistically
significant.
• A p-value of .001 is not more important than a p-value of .01
or .05
Statistical Significance = Meaningful Significance?

“Unfortunately, researchers often place undue
emphasis on significance tests….Perhaps it is
because they have spent so much time in courses
learning to use significance tests, that many
researchers give the tests an undue emphasis in
their research.”
{W. Phillips Shively (2009). The Craft of Political Research. 7th ed. Pearson Education, Inc.: Upper
Saddle River, New Jersey. p. 172.}
Key Points to Remember for Significance Testing
• P values, or significance levels, measure the strength of the evidence
against the null hypothesis; the smaller the P value, the stronger the
evidence against the null hypothesis
• An arbitrary division of results, into “significant” or “non-significant”
according to the P value, was not the intention of the founders of
statistical inference
• A P value of 0.05 need not provide strong evidence against the null
hypothesis, but it is reasonable to say that P<0.001 does. In the
results sections of papers the precise P value should be presented,
without reference to arbitrary thresholds
Key Points to Remember for Significance Testing

• Results of research should not be reported as “significant”
or “non-significant” but should be interpreted in the context
of the type of study and other available evidence. Bias or
confounding should always be considered for findings with
low P values
• To stop the discrediting of research by chance findings, we
need more powerful studies (i.e., longitudinal experimental
and longitudinal field studies)
A statistically significant result is one unlikely to be due to
chance; detecting one depends on two key variables:
sample size
and
effect size
Sample size
• refers to how large the sample for your
experiment is. The larger your sample size, the
more confident you can be in the result of the
experiment (assuming that it is a randomized
sample).
Effect size
• refers to the size of the difference in results between the two sample
sets and indicates practical significance.
• If there is a small effect size (say 0.1%) you will need a very large
sample size to determine whether that difference is significant or just
due to chance.
• However, if you observe a very large effect on your numbers, you will
be able to validate it with a smaller sample size to a higher degree of
confidence.
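The trade-off between effect size and sample size can be made concrete with a standard rough power calculation for two proportions. A sketch under assumed conventions (two-sided alpha of .05, 80% power, proportions near 0.5); the specific effect sizes are illustrative:

```python
import math

def n_per_group(p, effect, z_alpha=1.96, z_beta=0.84):
    """Rough per-group sample size to detect a difference `effect`
    between two proportions near p, at ~5% alpha and ~80% power."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / effect ** 2)

tiny_effect = n_per_group(0.5, 0.001)  # a 0.1% difference
large_effect = n_per_group(0.5, 0.10)  # a 10-point difference
```

A 0.1% effect demands millions of observations per group, while a 10-point effect can be detected with a few hundred, which is exactly the point made above.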
TRIVIA!!!
• An important precursor of statistical significance
testing was the discovery of the normal curve by
Abraham De Moivre in 1733 as a by-product of his
method of approximating the sum of a large number
of binomial terms.
• Laplace and Carl Friedrich Gauss further developed
applications of the normal distribution in the 1820s
TRIVIA!!!
• The first formal significance test was performed by a Scottish
physician and mathematician, John Arbuthnott (1710), who correctly
demonstrated that the excess of male births is statistically
significant, but erroneously concluded that this was due to Divine
Providence (intelligent design, rather than chance).

• In 1815, Friedrich Wilhelm Bessel first used the term
"probable error measurement" for his statistical significance
test. Probable error refers to "the deviation from a central
measure between whose positive and negative values one
half the cases may be expected to fall by chance”.
TRIVIA!!!
• In 1827, French mathematician Pierre-Simon Laplace, used a p-value-
like statistic and a more formal hypothesis framework to analyze
seasonal barometric pressure measurements.
• Laplace wrote that a very small value of what would today be the p-
value “would indicate with a great likelihood that the value of x (the
discrepancy between seasons) is not due solely to the anomalies of
chance.”
• Finding that very small probability (0.0000015815), Laplace
concluded that “the observed discrepancy thus indicates, with an
extreme likelihood, a constant cause.”
• It appears that Laplace implicitly used a 0.01 alpha level in his
hypothesis testing.
Politics, Intrigues and
Dissenting Paradigms
in 19th & 20th century
Statistical Thinking
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• In 1892*, Francis Edgeworth, a lawyer and economist who
was self-educated in mathematics and statistics, developed a
test of significance in which he compared the difference of
the means with the "modulus" (√2 times the standard
deviation).
• A difference of twice the modulus was considered
significant, and differences of 1.5 times the modulus were
noteworthy.
*"The Law of Error and Correlated Averages", 1892, Phil Mag.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• In 1900 Karl Pearson published his chi-square "goodness of fit" test,
comparing data to a theoretically expected curve to assess statistical
significance.
• His research and theorizing led to the development of the chi-square
goodness of fit test and the birth of modern statistical significance
testing. This test was the first to allow for determination of the
probability of occurrence of discrepancies between observed and
expected frequencies
• Pearson was clearly influenced by Francis Galton's ideas, but he
apparently came to fully appreciate those ideas only after his
association with Edgeworth in the early 1890s.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• Pearson gave a series of lectures in 1893 that emphasized
Edgeworth's significance testing methods, but, perhaps as a
result of their competitive relationship, Pearson decided to
measure differences not in terms of the modulus, but rather
in terms of a new measure of variation which he called the
standard deviation.
• It became evident that Pearson was rather highly motivated
by his desire to outshine Edgeworth.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• The p value was meant to be a flexible inferential measure, whereas
the hypothesis test was a rule for behavior, not inference.
• P value, as conceived by Ronald Fisher (1922, 1925) was not
compatible with the Neyman-Pearson hypothesis test [Jerzy Neyman
and Egon Pearson (1933)] in which it has become embedded on the
nature of the scientific method we are using today.
• Their combination has obscured the important differences between
Neyman and Fisher on the nature of the scientific method and
repressed our understanding of the philosophic implications of the
basic methods in use today.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• What has become institutionalized as inferential statistics in
psychology is not Fisherian statistics.
• Modern hypothesis testing is an anonymous hybrid of the tests
proposed by Ronald Fisher (1922, 1925) on the one hand, and Jerzy
Neyman and Egon Pearson (1933) on the other.
• It is an incoherent mishmash of some of Fisher's ideas combined with
some of the ideas of Neyman and E. S. Pearson
(Gigerenzer et al., 1989).
• This blend was called "hybrid logic" of statistical inference.
• Fisher, Neyman, and Pearson would all have rejected it, although for
different reasons.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• The institutionalized hybrid carries the message that statistics is a
single integrated structure that speaks with a single authoritative
voice.
• This entails the claim that the problem of inductive inference in
fact has an algorithmic answer (i.e., the hybrid logic) that works for all
contents and contexts.
• Statistical tools tend to turn into theories of mind.
• It became the dogma -"statistics is statistics is statistics"
• The result bears fruit in our modern way of doing research- Statistical
theories mixed with confusion of rational inductive inference
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• The field of psychology was the first to be converted to the silent
reconciliation dogma of the inference revolution (Gigerenzer &
Murray, 1987).
• It happened between approximately 1940 and 1955 in the United
States, and led to the institutionalization of one brand of inferential
statistics as the method of scientific inference in university curricula,
textbooks, and the editorials of major journals.
• Before 1940, null hypothesis testing using analysis of variance or t test
was practically nonexistent.
• Only 17 articles in all from 1934 through 1940 (Rucci & Tweney,
1980).
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• By the early 1950s, half of the psychology departments in
leading U.S. universities had made inferential statistics a
graduate program requirement (Rucci & Tweney, 1980).
• By 1955, more than 80% of the empirical articles in four
leading psychology journals used null hypothesis testing
(Sterling, 1959).
• Editors and experimenters began to measure the quality of
research by the level of significance obtained.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• For instance, in 1962, the editor of the Journal of Experimental
Psychology, A. W. Melton (1962, pp. 553-554), stated his criteria for
accepting articles.
-If the null hypothesis was rejected at the 0.05 level but not at the
0.01 level, there was a "strong reluctance" to publish the results,
whereas findings significant at the 0.01 level deserved a place in the
journal.
• The Publication Manual of the American Psychological Association(1974)
prescribed how to report the results of significance tests (but did not
mention other statistical methods), and used, as Melton did, the
label negative results synonymously with "not having rejected the null" and
the label positive results with "having rejected the null."
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Piaget's, Köhler's, Bartlett's, Pavlov's, and Skinner's experimental
work would have been rejected under such editorial policies - these
men did not set up null hypotheses and try to refute them.
• Some of them were actively hostile toward institutionalized statistics.
• For his part, Skinner (1972) disliked the intimate link that Fisher
established between statistics and the design of experiments:
• "What the statistician means by the design of experiments is
design which yields the kind of data to which his techniques are
applicable“… "They have taught statistics in lieu of scientific
method“.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• In fact, the Skinnerians were forced to found a new journal,
the Journal of the Experimental Analysis of Behavior, in order
to publish their kind of experiments.
• Their focus was on experimental control, that is, on
minimizing error beforehand, rather than on large samples,
that is, on measuring error after the fact.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• Before World War II, the fields of social science (e.g.,
psychology, education) drew their inferences about the
validity of hypotheses by many means - ranging from
eyeballing to critical ratios.
• The issue of statistical inference was not of primary
importance.
• Techniques of statistical inference were known and
sometimes used, but experimental method was not yet
dominated by and almost equated with statistical inference.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• Fisher's first book, Statistical Methods for Research
Workers, published in 1925, was successful in
introducing biologists and agronomists to the new
techniques.
• It had the agricultural smell of issues like the weight of
pigs and the effect of manure, and, such alien topics
aside, it was technically far too difficult to be
understood by most psychologists and social
scientists.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• Fisher's second statistical book, The Design of
Experiments, first published in 1935, was most influential on
psychology and other social sciences.
• At the very beginning of his book, Fisher rejected the theory
of inverse probability (Bayesian theory) and congratulated
the Reverend Bayes for having been so critical of his own
theory as to withhold it from publication (Bayes' treatise was
published posthumously in 1763).
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Bayes' theorem is attractive for researchers because it allows one to
calculate the probability P(H| D) of a hypothesis H given some data D, also
known as inverse probability.
• A frequentist theory, such as Fisher's null hypothesis testing or Neyman-
Pearson theory, however, does not.
• It deals with the probabilities P(D|H) of some data D given a hypothesis H,
such as the level of significance.
• Fisher was not satisfied with an approach to inductive inference based on
Bayes' theorem.
• The use of Bayes' theorem presupposes that a prior probability distribution
over the set of possible hypotheses is available.
• For a frequentist, such as Fisher, this prior distribution must theoretically
be verifiable by actual frequencies, that is, by sampling from its reference
set.
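The P(H|D) versus P(D|H) distinction above is just the definition of conditional probability. A minimal sketch with hypothetical numbers (the prior and likelihoods are invented for illustration):

```python
def posterior(prior_h, p_d_given_h, p_d_given_not_h):
    """P(H|D) by Bayes' theorem -- the 'inverse probability'
    that Fisher's frequentist approach deliberately avoids."""
    # Total probability of the data under both hypotheses
    p_d = p_d_given_h * prior_h + p_d_given_not_h * (1 - prior_h)
    return p_d_given_h * prior_h / p_d

# Hypothetical numbers: prior P(H) = 0.5, P(D|H) = 0.8, P(D|not H) = 0.2
post = posterior(0.5, 0.8, 0.2)
```

A frequentist test reports only a quantity like P(D|H) (the significance level); the posterior P(H|D) requires the prior that Fisher was unwilling to assume.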
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• In The Design of Experiments, Fisher started with null hypothesis
testing, also known as significance testing.
• It eventually became the backbone of institutionalized statistics in
psychology and other social sciences today.
• In a test of significance, one confronts a null hypothesis with observations,
to find out whether the observations deviate far enough from the null
hypothesis to conclude that the null is implausible.
• The specific techniques of null hypothesis testing, such as the t test
(devised by Gossett, using the pseudonym "Student", in 1908) or the F test
(F for Fisher, e.g., in analysis of variance) are so widely used that they may
be the lowest common denominator of what psychologists and other social
scientists today do and know.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Just as with Bayes' theorem, the problems we encounter do not
concern the formula - the theorem is a simple consequence of the
definition of conditional probability.
• The problems arise with its application to inductive inference in
science.
• This is called the logic of inference.
• During Fisher's long and hostile controversy with Neyman and
Pearson, which lasted from the 1930s to his death in 1962, he
changed, and sometimes even reversed, parts of his logic of
inference.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• In the Design of Experiment, Fisher suggested that we think of the
level of significance as a convention: "It is usual and convenient for
experimenters to take 5 per cent as a standard level of significance, in
the sense that they are prepared to ignore all results which fail to
reach this standard“
• Fisher's assertion that 5 % (in some cases, 1 %) is a convention that is
adopted by all experimenters and in all experiments, and
nonsignificant results are to be ignored, became part of the
institutionalized hybrid logic.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• But Fisher had second thoughts, which he stated most
clearly in the mid-1950s.
• These did not become part of the hybrid logic.
• One of the reasons for that revision was his controversy with
Neyman and Neyman- Pearson's (1950) insistence that one
has to specify the level of significance (which is denoted as a
in Neyman-Pearson theory) before the experiment, in order
to be able to interpret it as a long-run frequency of error.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Neyman and Pearson took the frequentist position more seriously
than Fisher. They argued that the meaning of a level of significance
such as 5% is the following:
• If the null hypothesis is correct, and the experiment is
repeated many times, then the experimenter will wrongly
reject the null in 5% of the cases. To reject the null if it is
correct is called an error of the first kind (Type I error) in
Neyman-Pearson theory, and its probability is called alpha
(a).
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• In his last book, Statistical Methods and Scientific Inference (1956),
Fisher ridiculed this definition as "absurdly academic, for in fact no
scientific worker has a fixed level of significance at which from year to
year, and in all circumstances, he rejects hypotheses; he rather gives
his mind to each particular case in the light of his evidence and his
ideas".
• Fisher rejected the Neyman-Pearson logic of repeated experiments
(repeated random sampling from the same population), and thereby
rejected his earlier proposal to have a conventional standard level of
significance, such as .05 or .01.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• What researchers should do, according to Fisher's
second thoughts, is to publish the exact level of
significance, say, p = 0.03 (not p < 0.05), and
communicate this result to their fellow research
workers.
• This means that the level of significance is
determined after the experiment, not, as Neyman and
Pearson proposed, before the experiment.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• Thus the phrase "level of significance" has three meanings:
• (a) the standard level of significance, a conventional standard
for all researchers (early Fisher),
• (b) the exact level of significance, a communication to
research fellows, determined after the experiment (late
Fisher), and
• (c) the alpha level, the relative frequency of Type I errors in the
long run, to be decided on using cost-benefit
considerations before the experiment (Neyman & Pearson).
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• The basic difference is this:
• For Fisher, the exact level of significance is a
property of the data (i.e., a relation between a body
of data and a theory);
• For Neyman and Pearson, alpha is a property of the
test, not of the data. Level of significance and alpha
are not the same thing.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Neyman and Pearson thought their straightforward long-run
frequentist interpretation of the significance test - and the associated
concepts of power and of stating two statistical hypotheses (rather
than only one, the null) - would be an improvement on Fisher's
theory and make it more consistent. Fisher disagreed.
• Whereas Neyman and Pearson thought of mathematical and
conceptual consistency, Fisher thought of ideological differences.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• Fisher accused Neyman & Pearson and their followers of
confusing technology with knowledge:
• Their focus on Type I and Type II errors, on cost-benefit
considerations that determine the balance between the
two, and on repeated sampling from the same population
has little to do with scientific practice, but it is
characteristic for quality control and acceptance
procedures in manufacturing.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Fisher (1955) compared the Neyman-Pearsonians to the Soviets, their
5-year plans, and their ideal that "pure science can and should be
geared to technological performance."
• He also compared them to Americans, who confuse the process of
gaining knowledge with speeding up production or saving money.
• Incidentally, Neyman was Polish born in Russia, and eventually went
to Berkeley, California, after Fisher made it difficult for him to stay on
at University College in London
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Fisher attached an epistemic interpretation to a
significant result, which referred to a particular
experiment.
• Neyman rejected this view as inconsistent and
attached a behavioral meaning to a significant result
that did not refer to a particular experiment, but to
repeated experiments. (Pearson found himself
somewhere in between.)
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• In the Design of Experiment, Fisher talked about how "to disprove" a
null hypothesis.
• Whatever the words he used, he always held that a significant result
affects our confidence or degree of belief that the null hypothesis is
false.
• This is referred to as an epistemic interpretation:
• Significance tells us about the truth or falsehood of a particular hypothesis in
a particular experiment.
• Here we see very clearly Fisher's quasi-Bayesian view that the exact
level of significance somehow measures the confidence we should
have that the null hypothesis is false.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• But from a more consistent frequentist viewpoint, as
expressed by Neyman, a level of significance does not
tell us anything about the truth of a particular
hypothesis; it states the relative frequency of Type I
errors in the long run.
• Neyman (1957) called his frequentist
interpretation behavioristic:
• To accept or reject a hypothesis is a decision to take a
particular action.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Imagine a typical application of Neyman-Pearson theory: quality
control.
• Imagine you have chosen the probability of Type I errors (false
alarms) to be .10 and that of Type II errors (misses) to be .01,
because misses are much more costly to your firm than false alarms.
• Every day you take a random sample from the firm's production.
• Even if the production is normal, you will expect a significant result
(false alarm) in 10% of all days.
• Therefore, if a significant result occurs, you will act as if the null
hypothesis were false, that is, stop the production and check for a
malfunction; but you will not necessarily believe that it is false -
because you expect a lot of false alarms in the long run.
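Neyman's long-run frequency interpretation of alpha can be sketched with a small simulation. Under the null, each day's test rejects with probability alpha, so the false-alarm rate converges to alpha over many days (the seed and day count are arbitrary choices for the sketch):

```python
import random

random.seed(42)   # fixed seed so the run is reproducible
ALPHA = 0.10      # chosen Type I (false alarm) probability
DAYS = 10_000

# Under the null (production is normal), each day's significance test
# rejects with probability ALPHA; count how often the line would be
# stopped even though nothing is wrong.
false_alarms = sum(random.random() < ALPHA for _ in range(DAYS))
rate = false_alarms / DAYS  # converges toward ALPHA in the long run
```

This is precisely the behavioristic reading: alpha describes how often you will act as if the null were false over repeated experiments, not your belief about any single day.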
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Fisher rejected Neyman's arguments for "inductive behavior"
as "childish" (1955, p. 75), stemming from "mathematicians
without personal contact with the Natural Sciences" (p. 69).
• And he maintained his epistemic view: "From a test of
significance ... we have a genuine measure of the confidence
with which any particular opinion may be held, in view of our
particular data" (p. 74). For all his anti-Bayesian talk, Fisher
adopted a very similar-sounding line of argument
(Johnstone, 1987).
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• As we have learned, Fisher wanted
to both reject the Bayesian cake and eat it
too.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• Fisher's writings and arguments had a remarkably elusive
quality, and people have read his work quite differently.
• Fisher’s logic of inference was misinterpreted, or
deliberately fashioned into a package palatable to
non-statisticians, thus giving birth to what
is now known as the hybrid logic of inference combined with
that of Neyman-Pearson’s ideas.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• The hybrid logic, first taught in the social sciences almost 80
years ago, was built by combining the ideas
of Fisher, on the one hand, and Neyman and Pearson on the
other.
• Through the work of the statisticians George W. Snedecor (1934,
1937) at Iowa State College, Harold Hotelling (1931) at Columbia
University and Palmer Johnson (1949) at the University of
Minnesota, Fisher's ideas spread in the United States.
• Psychologists began to cleanse the Fisherian message of its
agricultural smell and its mathematical complexity, and to write a
new genre of textbooks featuring null hypothesis testing.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• The denial of the existing conflicts and the pretense that there is only
one statistical solution to inductive inference were carried to an
extreme in psychology and several neighboring sciences.
• This one solution was the hybrid logic of scientific inference, the
offspring of the shotgun marriage between Fisher and Neyman and
Pearson.
• The hybrid logic became institutionalized in experimental psychology
(Gigerenzer, 1987), personality research (Schwartz & Dangleish,
1982), clinical psychology and psychiatry (Meehl, 1978), education
(Carver, 1978), quantitative sociology (Morrison & Henkel, 1970), and
archaeology (Cowgill, 1977; Thomas, 1978).
• Nothing like this happened in physics, chemistry, or molecular biology
( Gigerenzer et al., 1989).
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• The debate between Fisher’s and Neyman-Pearson’s ideas
was silently resolved through the hybridization process in
the ‘cookbooks’ written in the 1940s to 1960s, largely by
non-statisticians to teach students in the social sciences the
‘rules of statistics’.
• Fisher’s theory of significance testing, which was historically
first, was merged with concepts from the Neyman-Pearson
theory of hypothesis testing and eventually taught until
today as a subject of “Statistics” per se.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• We call this compromise the ‘hybrid theory’ of statistical inference,
and it goes without saying that neither Fisher nor Neyman-Pearson
would have looked with favor on this offspring of their forced
marriage.
• To users of statistics, this seemed perfectly acceptable, since often
the same formulae were used and the same numerical results
obtained.
• The great differences in conceptual interpretation were overlooked in
the plug-in-and-crank through use of statistical rules.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• The Fisherian and Neyman-Pearson theories were presented
anonymously as a single statistical method, while unresolved
controversial issues and alternative approaches to scientific
inference were completely ignored.
• Key concepts from Neyman-Pearson theory such as power
are introduced along with Fisher’s significance testing,
without mentioning that both parties viewed these as
irreconcilable.
• The hybrid theory comes with a list of prescriptions that are
held to constitute what is “scientific” and “objective”.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• The researcher must specify the level of significance before
conducting the experiment (following Neyman-Pearson
rather than Fisher); he must not draw conclusions from a
non-significant result (following Fisher’s writings, but not
Neyman-Pearson).
• Neyman’s behavioristic interpretation did not even become
part of the hybrid logic of inference; and the Type 1 and
Type 2 errors are given an epistemic interpretation.
• This has led to an enormous confusion about the meaning of
a significance level.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• For instance, in practice, experimenters often will
note, when inspecting the data, at what most
stringent conventional level the data are significant
with respect to the null hypothesis.
• They then report that the null hypothesis is, for
example: “rejected at the 0.01 level”, an expression
that occurs neither in Fisher nor in the writings of
Neyman and Pearson.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• The hybridization of the Fisher and Neyman–Pearson approaches
started within the discipline of psychology and spread
throughout the entire field of social sciences.
• Textbooks at that time (1940–1960) were written primarily for
psychological researchers.
• However, many statistical textbooks of that era are addressed to both
psychological and educational researchers
• Early authors promoting the error that the level of significance
specified the probability of hypothesis include Lindquist (1940, 1953),
Guilford (1942), Edwards (1950, 1954), McNemar (1949, 1955),
Anastasi (1958) and Ferguson (1959)
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking

• One of the first textbooks introducing psychologists to null
hypothesis testing is J.P. Guilford’s Fundamental Statistics in
Psychology and Education, first published in 1942.
• It was one of the most widely read textbook in the 1940s and
1950s
• Guilford suggested that hypothesis testing would reveal the
probability that the null hypothesis is true. “If the result
comes out one way, the hypothesis is probably correct, if it
comes out another way, the hypothesis is probably wrong”.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Guilford’s logic wavered back and forth between correct and incorrect
statements, and ambiguous ones that can be read like Rorschach
inkblots.
• He used phrases such as “we obtained directly the probabilities that
the null hypothesis was plausible” and “the probability of extreme
deviations from chance” interchangeably for the level of significance.
• Guilford marked the beginning of a genre of statistical texts that
waver between the researchers’ desire for probabilities of hypotheses
and what significance testing can actually provide.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Nunally (1975), used all of the following statements to explain what a
significant result such as 5% actually means:
• “the probability that an observed difference is real”
• “the improbability of observed results being due to error”
• “the statistical confidence (…) with odds of 95 out of 100 that the
observed difference will hold up in investigations”
• “the danger of accepting a statistical result as real when it is
actually due only to error”
• the degree to which experimental results are taken “seriously”
Nunally, J. C. (1975). Introduction to statistics for psychology and education. New York: McGraw-Hill
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Nunally (1975), used all of the following statements to explain what a
significant result such as 5% actually means:
• the degree of “faith [that] can be placed in the reality of
the finding”
• “the investigator can have 95% confidence that the
sample mean actually differs from the population mean”
• “if the probability is low, the null hypothesis is
improbable”
• “all of these are different ways to say the same thing”
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Poor students who read those explanations will likely
misattribute the authors' confusion to their own lack of statistical
intelligence.
• This state of bewilderment will last as long as the ritual continues to
exist.
• Today’s students still encounter obscure statements in the most-
widely read texts:
• “Inferential statistics indicate the probability that the particular
sample of scores obtained are actually related to whatever you are
attempting to measure or whether they could have occurred by
chance” (Gerrig & Zimbardo, 2002, p. 44).
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• In the Philippines, the hybrid logic of inference has dominated
almost all sciences, inherited from the 1940s-1960s cookbooks
replicated by local authors of statistics textbooks.
• Spared are those in pure mathematics and in several HEIs that offer
statistics programs from the bachelor's level up to the post-graduate
level, where 'real statistical theories' are taught.
• These people are perhaps silent, although deep in their hearts they are
discontented with the way statistical thinking is taught through
the 'cookbook' approach.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• In 2005, the Philippine Statistical Association reviewed
locally-authored elementary statistics textbooks that are
most commonly used in the tertiary level.1
• This review was undertaken by professional statisticians
with extensive teaching experience and professional
practice in the field
1Isidoro P. David and Dalisay S. Maligalig (2006). Are We Teaching Statistics
Correctly to our Youth? The Philippine Statistician. Vol. 55, Nos. 3-4, pp. 1-28
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• The findings of the review are:
• Important topics are missing from the books.
• Some statistical concepts were not presented correctly.
• There is no bridge between descriptive and inferential statistics.
Some fundamental concepts, such as the law of large numbers, the
central limit theorem and sampling distributions, that are
necessary for building the concepts of inferential statistics
were not discussed well.
• Most of the books used the cookbook approach - full of recipes,
with emphasis on computational details.
• The books did not consider the prevalence of high-speed
computers - quite outdated.
• The examples are not very practical.
• Some books are full of typographical errors.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• An analysis using another method promoted by Fisher, mathematical
likelihood, shows that the p value substantially overstates the
evidence against the null hypothesis.
• Likelihood makes clearer the distinction between error rates and
inferential evidence and is a quantitative tool for expressing evidential
strength that is more appropriate for the purposes of epidemiology
than the p value.
• Since Joseph Berkson (1938), successive generations of statisticians
have questioned the use of hypothesis testing in the sciences.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Assumptions are swept under the carpet
• The subjective elements of classical statistics, such as the
choice of null hypothesis, determining the outcome
space, the appropriate significance level and the
dependence of significant tests on the stopping rule are
all swept under the carpet.
• Bayesian methods put them where we can see them - in
the prior.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• p values are irrelevant (which leads to incoherence) and misleading
• The frequentist theory of probability is only capable of
dealing with random variables which generate a
frequency distribution ‘in the long run’.
• P values are often misunderstood to be probabilities
about the hypothesis, given the data taken from
random variables
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• Both Fisher and Neyman felt the vocation to supersede Karl
Pearson's heritage in statistics
• Fisher was a personal adversary of Karl Pearson, and Neyman
underrated his mathematical capacities; although less outspoken
given his cooperation with Karl's son, Egon, Neyman never hid the
feeling that a new start was required (Louçã, 2007).
• For a time, this provided motivation for convergence
between the two men, but they eventually took separate paths.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• When trying to move to Western Europe to enhance his
academic opportunities and to avoid danger in those
frightful 1930s, the Polish Jerzy Neyman corresponded with
Fisher, only four years his elder, believing he was the man to
help him.
• They were indeed quite close in the vision of statistics as the
language for the new science to rule all other sciences.
Politics, Intrigues and Dissenting Paradigms
in 19th & 20th century Statistical Thinking
• William Gosset, a friend of both, emphasized their motivational
vicinity when he wrote to Fisher in order to arrange for Neyman's
visit to Rothamsted Experimental Station (West Common,
Harpenden, United Kingdom):
• "He is fonder of algebra than correlation tables [meaning, against
Karl Pearson] and is the only person except yourself I have heard
talk about maximum likelyhood (sic) as if he enjoyed it" (quoted in
Joan Fisher Box, second of Ronald Fisher's five daughters, 1978: 451).
Considering Bayesian probability
Bayesian probability is an interpretation of the
concept of probability, in which, instead
of frequency or propensity of some phenomenon,
probability is interpreted as reasonable
expectation representing a state of knowledge or
as quantification of a personal belief.
Considering Bayesian probability
The Bayesian interpretation of probability can be seen
as an extension of propositional logic that enables
reasoning with hypotheses, i.e., the propositions
whose truth or falsity is uncertain.
In the Bayesian view, a probability is assigned to a
hypothesis, whereas under frequentist inference, a
hypothesis is typically tested without being assigned a
probability.
Considering Bayesian probability
• The Bayesian estimate ignores the outlier
• The distribution is narrower
• Confidence is greater
• The answer is probably much closer to correct
Considering Bayesian probability
In day-to-day reasoning, we take our prior beliefs into account
and use them alongside the evidence from the observations
we have made.
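A minimal sketch of that update in Python, using Bayes' theorem. The numbers (a 1% prior, a 90% likelihood of the evidence under the hypothesis, a 5% likelihood otherwise) are hypothetical illustration values, not taken from the lecture:

```python
def bayes_update(prior, likelihood_h, likelihood_not_h):
    """Posterior P(H | E) = P(E | H) P(H) / P(E), where
    P(E) = P(E | H) P(H) + P(E | not H) P(not H)."""
    evidence = likelihood_h * prior + likelihood_not_h * (1 - prior)
    return likelihood_h * prior / evidence

# A 1% prior belief, updated on evidence that is 90% likely if H is
# true and 5% likely if H is false: the belief rises, but only to ~15%.
posterior = bayes_update(prior=0.01, likelihood_h=0.90, likelihood_not_h=0.05)
print(round(posterior, 3))
```

The posterior is driven jointly by the prior and the evidence, which is exactly the contrast with frequentist inference drawn above.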
On 7 March 2016, the American Statistical
Association (ASA) released a statement to
improve the interpretation of statistical
significance and p-values and their role in
scientific research.
The ASA statement provided 6 guidelines for the use of
p-values as part of good statistical practice:
• P-values can indicate how incompatible the data are with a specified
statistical model.
• P-values do not measure the probability that the studied hypothesis is true,
or the probability that the data were produced by random chance alone.
• Scientific conclusions and business or policy decisions should not be based
only on whether a p-value passes a specific threshold.
• Proper inference requires full reporting and transparency.
• A p-value, or statistical significance, does not measure the size of an effect
or the importance of a result.
• By itself, a p-value does not provide a good measure of evidence regarding
a model or hypothesis.
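The second guideline can be checked by simulation: when the null model is true, p-values below 0.05 still occur in roughly 5% of experiments, so a small p-value is a statement about the data under the model, not about the truth of the hypothesis. A sketch in Python (standard library only; the sample size, trial count and seed are arbitrary choices):

```python
import math
import random

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal statistic:
    P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(42)
n, trials = 30, 2000
rejections = 0
for _ in range(trials):
    # The null hypothesis (mu = 0, sigma = 1) is TRUE by construction.
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / math.sqrt(n))
    if two_sided_p_from_z(z) < 0.05:
        rejections += 1

# Under a true null, "significant" results still appear ~5% of the time.
print(round(rejections / trials, 3))
```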
How do you like having an inference like this?
The Logic of
Hypothesis Testing
Hypothesis
• The word 'hypothesis' is derived from the Greek 'hypotithenai',
which means 'to put under' or 'to suppose'
• It consists of two parts:
• Hypo - tentative, or subject to verification
• Thesis - a statement about the solution to a problem
• Thus, 'hypothesis' means 'a tentative statement about the solution
to a problem' or 'a guess to solve the research problem'
• It is a presumptive statement of a proposition or a
reasonable guess, based upon the available evidence, which
the researcher seeks to prove through his study.
Hypotheses
• The research hypothesis is your best guess as to the
relationship between variables, or what you predict the
impact of one variable on another will be.
• The null hypothesis is always a statement that
"there is no difference" or "no impact" between our
independent variable and the dependent variable.
"Significant" Does Not Imply that There Is a Causal Effect
• It is useful to distinguish between the statistical null
hypothesis and the substantive null hypothesis.
• Statistical null hypothesis is a general statement or default
position that there is no relationship or difference between
two measured phenomena, or no association or difference
among groups
• Substantive null hypothesis refers to the absence of a
particular cause.
• What is rejected in significance testing is the statistical
hypothesis, not the existence or absence of a cause.
"Significant" Does Not Imply that There Is a Causal Effect
• The famous lady tasting tea experiment in Fisher's The Design of
Experiments was designed to test a lady's claim that she could tell
whether the milk or the tea infusion was first added to a cup.
• Fisher stated clearly that we cannot conclude from a significant result
(disproving the null) that the opposite hypothesis (which is not
formulated as an exact statistical hypothesis in null hypothesis
testing) is proven.
• This implies that we can Reject the Null BUT we cannot Accept the
Alternative
• That is, we cannot infer the existence of a causal process from a
significant result.
• Even granting that the null hypothesis is rejected, there exist other
causal mechanisms (someone told the lady in which cups the tea infusion
had been poured first) that are consistent with that rejection, besides
the lady's genuine ability to discriminate.
Stating the Research Hypotheses
• Research Hypothesis
• Women and men earn different salaries.
• Null Hypothesis:
• There is no difference between women and men’s
salaries.
Testing a Hypothesis about a
Single Mean
• Research hypothesis: There is a difference in
average hours worked as compared to “40.”
• Null: not different from 40
Testing a Hypothesis about a
Single Mean
• Results: Average number of hours = 42
• P-value = 0.000
• Interpretation?
Interpretation
• You reject the null hypothesis
• You are 95% confident that the average number of
hours worked by respondents is slightly more than the
assumed 40 hours.
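The single-mean test can be sketched in a few lines of Python. The hours data are hypothetical, and the critical value 2.201 (two-tailed, alpha = 0.05, df = 11) is read from a standard t table:

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

# Hypothetical weekly hours reported by 12 respondents; H0: mu = 40.
hours = [44, 41, 38, 45, 43, 40, 42, 46, 39, 44, 41, 43]
t, df = one_sample_t(hours, 40)
print(round(t, 2), df, abs(t) > 2.201)  # compare |t| with the critical value
```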
Comparing 2 Means: Gender and Income
• The research hypothesis is that there is a difference
between men's and women's income.
•The null hypothesis:
• There is no difference between men’s
and women’s income.
Respondents’ Gender N Mean
Male 609 Php18,965.11
Female 756 Php13,096.23
p-value=0.001
Inference?
Analysis of Variance (ANOVA)
• What happens when you have 3 or more groups (or
categories) you want to compare?
• Religion
• Economic status (lower income, middle income,
high income)
• Education (HS, College, Graduate Degree)
Statistical Significance for
3 Groups of Subjects
• Is there a difference in income based on whether
one has a High School degree or less, some college or
completed a bachelor’s degree, or has a graduate
degree?
• Your Null Hypothesis is?
Statistical Significance for
3 Groups of Subjects
Education and Income
• HS or less: P9,225
• College: P16,764
• Graduate Degree: P32,275
• Are these results statistically significant?
• P-value = 0.001
• Inference?
Potential Errors
• Type I Error:
• This occurs when you reject the null hypothesis even
though, in reality, the null hypothesis is true.
• Type II Error:
• This occurs when you fail to reject the null hypothesis,
even though, in reality, it is false.
Type I and Type II
• Generally, social scientists feel that it is worse to make a
Type I error than a Type II error.
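The trade-off can be made concrete with a simulation: draw many samples from a population where the null (mu = 0) is actually false, and count how often a two-tailed z test at the 5% level fails to reject. The true mean, sample size and seed below are arbitrary illustration values:

```python
import math
import random

def z_test_rejects(sample, mu0=0.0, sigma=1.0):
    """Two-tailed z test at the 5% level: reject when |z| > 1.96."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > 1.96

random.seed(1)
trials, n, true_mean = 2000, 25, 0.5
# H0 (mu = 0) is false here, so every non-rejection is a Type II error.
type2 = sum(not z_test_rejects([random.gauss(true_mean, 1) for _ in range(n)])
            for _ in range(trials)) / trials
power = 1 - type2
print(round(type2, 2), round(power, 2))
```

With this effect size and sample size the test misses the real effect in roughly a quarter to a third of experiments, which shows why controlling only the Type I rate is not the whole story.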
One and Two-Tailed Tests
• ONE-Tailed Test: is used whenever the hypothesis
specifies a direction. We are concerned with only one
tail of the normal curve.
• TWO-tailed test: when the research question does
not specify a direction.
HYPOTHESIS TESTING
IN DETAIL
Hypothesis Testing
Steps in Hypothesis Testing:
1. State the hypotheses
2. Identify the test statistic and its probability distribution
3. Specify the significance level
4. State the decision rule
5. Collect the data and perform the calculations
6. Make the statistical decision
7. Make the economic or investment decision

Two-Tailed Test (Z-test @ 5%)
Null hypothesis: μ = μ0
Alternative hypothesis: μ ≠ μ0, where μ0 is the hypothesised mean
The rejection areas lie in both tails of the distribution, beyond
μ0 ± 1.96 standard errors.

One-Tailed Test (Z-test @ 5%)
Null hypothesis: μ ≤ μ0
Alternative hypothesis: μ > μ0
The rejection area lies in the right tail, beyond μ0 + 1.645
standard errors.
Hypothesis Testing – Test Statistic & Errors
Test Statistic:
Test statistic = (sample statistic − hypothesised value) /
(standard error of the sample statistic)

Test Concerning a Single Mean
Test statistic: Z or t = (X̄ − μ0) / s_X̄, where s_X̄ = s / √n
(use σ_X̄ if the population σ is available)

Type I and Type II Errors
• Type I error is rejecting the null when it is true.
Probability = significance level.
• Type II error is failing to reject the null when it is false.
• The power of a test is the probability of correctly rejecting
the null (i.e. rejecting the null when it is false)

Decision           | H0 true      | H0 false
Do not reject null | Correct      | Type II error
Reject null        | Type I error | Correct
Hypothesis about Two Population Means
Normally distributed populations and independent samples
Examples of hypotheses:
H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 ≠ 0
H0: μ1 − μ2 = 5 versus Ha: μ1 − μ2 ≠ 5
H0: μ1 − μ2 ≤ 0 versus Ha: μ1 − μ2 > 0
H0: μ1 − μ2 ≥ 3 versus Ha: μ1 − μ2 < 3
etc.

Test statistic: t = [(x̄1 − x̄2) − (μ1 − μ2)] / standard error

Population variances unknown but assumed to be equal:
s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2),
a pooled estimator of the common variance
Standard error = √(s_p²/n1 + s_p²/n2)
Degrees of freedom = n1 + n2 − 2

Population variances unknown and cannot be assumed equal:
Standard error = √(s1²/n1 + s2²/n2)
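A sketch of the equal-variance case in Python; the income figures for the two independent groups are hypothetical, invented only to exercise the computation:

```python
import math
import statistics

def pooled_two_sample_t(x, y):
    """t for H0: mu1 - mu2 = 0 with a pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) +
           (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2

group1 = [19, 21, 17, 23, 20, 18, 22]   # hypothetical incomes, thousands
group2 = [14, 12, 15, 13, 11, 16]
t, df = pooled_two_sample_t(group1, group2)
print(round(t, 2), df)
```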
Hypothesis about Two Population Means
Normally distributed populations and samples that are not independent:
the "paired comparisons test"
Possible hypotheses:
H0: μd = μd0 versus Ha: μd ≠ μd0
H0: μd ≤ μd0 versus Ha: μd > μd0
H0: μd ≥ μd0 versus Ha: μd < μd0

Test statistic = (d̄ − μd0) / s_d̄

Symbols and other formulas:
d̄ = sample mean difference = (1/n) Σ di
μd0 = hypothesised value of the difference
s_d² = sample variance of the sample differences di
s_d̄ = standard error of the mean difference = s_d / √n
Degrees of freedom = n − 1

Application:
• The data are arranged in paired observations
• Paired observations are dependent because they have something
in common
• E.g. the dividend payout of companies before and after a
change in tax law
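A sketch of the paired-comparisons statistic in Python; the before/after payout ratios are hypothetical illustration values:

```python
import math
import statistics

def paired_t(before, after, mu_d0=0.0):
    """t on the differences d_i = after_i - before_i; df = n - 1."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)
    return (statistics.mean(d) - mu_d0) / se, n - 1

before = [0.30, 0.45, 0.25, 0.50, 0.40, 0.35]  # payout before tax change
after  = [0.34, 0.47, 0.30, 0.52, 0.45, 0.38]  # same companies, after
t, df = paired_t(before, after)
print(round(t, 2), df)
```

Because the differences vary much less than the raw observations, the paired design gains power over treating the two columns as independent samples.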
Hypothesis about a Single Population Variance
Possible hypotheses (assuming a normal population):
H0: σ² = σ0² versus Ha: σ² ≠ σ0²
H0: σ² ≤ σ0² versus Ha: σ² > σ0²
H0: σ² ≥ σ0² versus Ha: σ² < σ0²

Test statistic: χ² = (n − 1)s² / σ0²

Symbols:
s² = variance of the sample data
σ0² = hypothesised value of the population variance
n = sample size
Degrees of freedom = n − 1

The chi-square distribution is asymmetrical and bounded below by 0.
For a two-tailed test the lower and upper critical values are obtained
from the chi-square tables at (df, 1 − α/2) and (df, α/2); reject H0
if the statistic falls below the lower critical value or above the
higher critical value, and fail to reject H0 in between.
NB: for a one-tailed test use α or (1 − α) depending on whether it is
a right-tail or left-tail test.
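A sketch of the statistic in Python; the data and the hypothesised variance of 4.0 are made up for illustration:

```python
import statistics

def chi_square_variance_stat(sample, sigma0_sq):
    """Chi-square statistic (n - 1) s^2 / sigma0^2; df = n - 1."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma0_sq, n - 1

# Hypothetical measurements; H0: population variance = 4.0.
data = [98, 102, 101, 97, 103, 99, 100, 104, 96, 100]
chi2, df = chi_square_variance_stat(data, 4.0)
print(round(chi2, 2), df)  # compare against chi-square critical values
```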
Hypothesis about Variances of Two Populations
Possible hypotheses (assuming normal populations):
H0: σ1² = σ2² versus Ha: σ1² ≠ σ2²
H0: σ1² ≤ σ2² versus Ha: σ1² > σ2²
H0: σ1² ≥ σ2² versus Ha: σ1² < σ2²

Test statistic: F = s1² / s2²
(the convention is to always put the larger variance on top)

Degrees of freedom: numerator = n1 − 1, denominator = n2 − 1
F distributions are asymmetrical and bounded below by 0. The critical
value is obtained from the F-distribution table at α for a one-tailed
test or α/2 for a two-tailed test; reject H0 when the statistic exceeds
the critical value, otherwise fail to reject H0.
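A sketch of the variance-ratio statistic in Python, following the convention of putting the larger sample variance on top; the two samples are hypothetical:

```python
import statistics

def f_stat(x, y):
    """F = larger sample variance / smaller; returns (F, (df_num, df_den))."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    if vx >= vy:
        return vx / vy, (len(x) - 1, len(y) - 1)
    return vy / vx, (len(y) - 1, len(x) - 1)

a = [12, 15, 11, 18, 14, 16, 13]  # hypothetical sample 1
b = [14, 15, 14, 16, 15, 14]      # hypothetical sample 2
f, (df_num, df_den) = f_stat(a, b)
print(round(f, 2), df_num, df_den)
```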
Origin of the t test
• As a chemist at the Guinness Brewery in Dublin (Ireland), William
S. Gosset was in charge of quality control.
• His job was to make sure that the stout (a thick, dark beer) leaving
the brewery was of high enough quality to meet the standards of
the brewery’s many discerning customers.
• It’s easy to imagine, when testing stout, why testing a large amount of stout might be
undesirable, not to mention dangerous to one’s health.
• So to test for quality Gosset often used a sample of only 3 or 4 observations per batch.
• But he noticed that with samples of this size, his tests for quality weren’t quite right.
• He knew this because when the batches that he rejected were sent back to the
laboratory for more extensive testing, too often the test results turned out to be wrong.
• As a practicing statistician, Gosset knew he had to be wrong some of the time, but he
hated being wrong more often than the theory predicted.
• One result of Gosset’s frustrations was the development of a test to handle small
samples.
Origin of the t test
• Gosset earned a degree in chemistry at Oxford, and joined the Guinness brewery
firm in 1899.
• His work for Guinness led him to investigate the statistical validity of results
obtained from small samples (previous statistical theory had concentrated
instead on large samples).
• He took a leave of absence to spend 1906/1907 studying under Karl Pearson at
University College, London.
• His publications in Pearson's journal Biometrika were signed "Student," not
because of a Guinness company policy forbidding publication, as is often said, but
more precisely because of a company wish to keep secret the fact that they were
gaining an industrial advantage from employing statisticians- a trade secret at
that time.
• Gosset's most important result is known as the "Student's t" test or distribution,
published in 1908 (The probable error of a mean. Biometrika. 6 (1): 1–25. March
1908).
Origin of the t test
• The pseudonym “Student” was selected by
Christopher Digges La Touche, the Managing
Director of Guinness: “It was decided by La
Touche that such publication might be made
without the brewers’ names appearing. They
would be merely designated ‘Pupil’ or ‘Student’
” (Box 1987, p. 46).1
1Box, J. F. (1987), “Guinness, Gosset, Fisher, and Small Samples,” Statistical Science, 2, 45–52.
Origin of the t test
• “Student’s”real identity was known only to colleagues of his
immediate acquaintance.
• Although Student was by the 1930s world famous in agronomy, the
design of experiments, and mathematical and applied sciences, the
world did not know who stood behind the pseudonym.
• Gosset did not openly reveal his identity until 1936, when he tried at
a meeting of the Royal Statistical Society to check a blustering Fisher,
who was making again a pitch for his “randomized”, antieconomic
design of experiments (Gosset, 1936:115; Jeffreys, 1939b).
Origin of the t test
• “Student” was not the only person to benefit from Guinness’s
enlightened attitude toward statistical education.
• Guinness sent Edward Somerfield (Gosset’s assistant) to work with
Fisher at Rothamsted Experimental Station in 1922 and George Story
(Gosset’s protégé) to work with Pearson at University College London
in 1928.
• Both were permitted to publish papers in the outside literature, but
once again only under pseudonyms: “Mathetes” (Somerfield) and
“Sophister” (Story).
• This subsequent use of pseudonyms indeed may have been
motivated by a desire to maintain a competitive edge.
Origin of the t test
• ‘Mathetes’ (1924). Statistical study on the effect of manuring
on infestation of barley by gout fly. Annals of Applied Biology
.xi, 220-235.
• ‘Sophister’ (1928). Discussion of Small Samples Drawn from
an Infinite Skew, Biometrika, Volume 20A, Issue 3-4, 1
December 1928, Pages 389–423.
Correlation Analysis
Sample Covariance and Correlation Coefficient
A scatter plot displays the paired observations (x, y).

Sample covariance = Σ (Xi − X̄)(Yi − Ȳ) / (n − 1)

The correlation coefficient measures the direction and extent of
linear association between two variables:
Sample correlation coefficient: r(x, y) = covariance(x, y) / (s_x · s_y)
where s = sample standard deviation and −1.0 ≤ r(x, y) ≤ +1.0

Testing the Significance of the Correlation Coefficient
Set H0: ρ = 0 and Ha: ρ ≠ 0
Test statistic: t = r √(n − 2) / √(1 − r²)
Degrees of freedom = n − 2
Reject the null if |test statistic| > critical t
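The covariance, correlation and t formulas can be sketched directly in Python; the paired data below are hypothetical:

```python
import math

def pearson_r(x, y):
    """Sample correlation: sample covariance divided by the product
    of the two sample standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

def r_t_stat(r, n):
    """t = r sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

x = [1, 2, 3, 4, 5, 6, 7, 8]          # e.g. hours of study
y = [52, 55, 61, 60, 68, 70, 75, 79]  # e.g. exam scores
r = pearson_r(x, y)
t = r_t_stat(r, len(x))
print(round(r, 3), round(t, 2))
```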
Linear Regression
Basic idea: a linear relationship between two variables, X and Y:
Yi = b0 + b1·Xi + εi
where Y is the dependent variable, X is the independent variable, and
εi is the error term (or residual); the mean of the εi values is 0.

The fitted line is Ŷi = b̂0 + b̂1·Xi.
Least squares regression finds the straight line that minimises
Σ ε̂i² (the sum of the squared errors, SSE).

Note that the standard error of estimate (SEE) is in the same units
as Y and hence should be viewed relative to Y.
The Components of Total Variation
Total variation: SST = Σ (Yi − Ȳ)²
Unexplained variation: SSE = Σ (Yi − Ŷi)² = Σ ε̂i²
Explained variation: SSR = Σ (Ŷi − Ȳ)²

ANOVA, Standard Error of Estimate & R²
Standard error of estimate: SEE = √(SSE / (n − 2)) = √(Σ ε̂i² / (n − 2))

Coefficient of determination: R² is the proportion of the total
variation in Y that is explained by the variation in X:
R² = SSR / SST = (SST − SSE) / SST

Interpretation: when correlation is strong, R² is high and the
standard error of the estimate is low.
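A least-squares fit and its R² can be sketched from these formulas; the five (x, y) points below are hypothetical:

```python
def ols_fit(x, y):
    """Least-squares slope b1 and intercept b0 for y = b0 + b1 x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum(a * b for a, b in zip(x, y)) - n * mx * my) / \
         (sum(a * a for a in x) - n * mx * mx)
    return my - b1 * mx, b1

def r_squared(x, y, b0, b1):
    """R^2 = 1 - SSE / SST (equivalently SSR / SST)."""
    my = sum(y) / len(y)
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
b0, b1 = ols_fit(x, y)
r2 = r_squared(x, y, b0, b1)
print(round(b0, 3), round(b1, 3), round(r2, 4))
```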
Assumptions & Limitations of Regression Analysis
Assumptions:
1. The relationship between the dependent variable, Y, and the
independent variable, X, is linear
2. The independent variable, X, is not random
3. The expected value of the error term is 0
4. The variance of the error term is the same for all observations
(homoskedasticity)
5. The error term is uncorrelated across observations (i.e. no
autocorrelation)
6. The error term is normally distributed

Limitations:
1. Regression relations change over time (non-stationarity)
2. If the assumptions are not valid, the interpretation and tests of
hypotheses are not valid
3. When any of the assumptions underlying linear regression are
violated, we cannot rely on the parameter estimates, test statistics,
or point and interval forecasts from the regression
Parametric and nonparametric tests
Parametric tests:
• rely on assumptions regarding the distribution of the population, and
• are specific to population parameters.
All tests covered on the previous slides are examples of parametric
tests.

Nonparametric tests:
• either do not consider a particular population parameter, or
• make few assumptions about the population that is sampled.
Used primarily in three situations:
• when the data do not meet distributional assumptions
• when the data are given in ranks
• when the hypothesis being addressed does not concern a parameter
(e.g. is a sample random or not?)
Degrees of Freedom
• The number of "observations" (pieces of information) in the
data that are free to vary when estimating statistical
parameters.
• Refers to the number of scores that are free to vary.
• df is a function of both the number of observations and the
number of variables in one's model
• Typically, the degrees of freedom equal your sample size
minus the number of parameters you need to calculate
during an analysis.
Degrees of Freedom
• Degrees of freedom is a combination of how much data you
have and how many parameters you need to estimate.
• It indicates how much independent information goes into
a parameter estimate.
• If you want a lot of information to go into parameter estimates,
so as to obtain more precise estimates and more powerful hypothesis
tests, you need many degrees of freedom
Degrees of Freedom
• Degrees of freedom also define the probability distributions for
the test statistics of various hypothesis tests.
• For example, hypothesis tests use the t-distribution, F-
distribution, and the chi-square distribution to determine
statistical significance.
• Each of these probability distributions is a family of distributions
where the degrees of freedom define the shape.
• Hypothesis tests use these distributions to calculate p-values.
• The DF are directly linked to p-values through these distributions
Degrees of Freedom
• t-tests are hypothesis tests for the mean and use the t-
distribution to determine statistical significance.
• A 1-sample t-test determines whether the difference between
the sample mean and the null hypothesis value is statistically
significant.
• When you have a sample and you are going to estimate the
mean, you have n – 1 degrees of freedom, where n is the sample
size.
• The -1 is called the restriction
• Consequently, for a 1-sample t-test, the degrees of freedom is
n – 1.
Degrees of Freedom
• The df define the shape of the t-distribution that your t-test uses to
calculate the p-value.
• Because the degrees of freedom are so closely related to sample size,
you can see the effect of sample size.
• As the degrees of freedom decreases, the t-distribution has thicker
tails.
• This property allows for the greater uncertainty associated with small
sample sizes.
Degrees of Freedom
• The F-test in ANOVA also tests group means. It uses the F-
distribution, which is defined by the degrees of freedom.
Degrees of Freedom
• The chi-square test of independence determines whether
there is a statistically significant relationship
between categorical variables.
• Just like other hypothesis tests, this test incorporates
degrees of freedom.
• For a table with r rows and c columns, the general rule for
calculating degrees of freedom for a chi-square test is (r-1)
(c-1).
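The (r-1)(c-1) rule can be sketched alongside the statistic itself; the 2 x 3 contingency table below is hypothetical:

```python
def chi_square_independence(table):
    """Chi-square statistic and df = (r - 1)(c - 1) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for row, rt in zip(table, row_totals):
        for obs, ct in zip(row, col_totals):
            expected = rt * ct / total  # expected count under independence
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

table = [[30, 20, 10],   # hypothetical counts: group A by category
         [20, 30, 10]]   # group B by category
chi2, df = chi_square_independence(table)
print(round(chi2, 2), df)
```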
Degrees of Freedom
• In a regression model, each term is an estimated parameter that uses one
degree of freedom.
• These are the degrees of freedom associated with the sources of variance.
• The total variance has n-1 degrees of freedom.
• The regression degrees of freedom is the number of coefficients
estimated minus 1.
• For example, if a model has 5 coefficients including the intercept,
it has 5 - 1 = 4 regression degrees of freedom.
• The error degrees of freedom is the total DF minus the model DF;
with 200 observations (total DF = 199), that is 199 - 4 = 195.
Degrees of Freedom in Regression Is Found on the ANOVA Table
Central Limit Theorem
Given a sufficiently large sample size from a
population with a finite level of variance, the mean
of all samples from the same population will be
approximately equal to the mean of the population.
Furthermore, all the samples will follow an
approximate normal distribution pattern, with all
variances being approximately equal to the variance
of the population divided by each sample's size.
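A quick simulation illustrates the theorem. A uniform [0, 1] population (mean 0.5, variance 1/12) is deliberately non-normal, yet the means of repeated samples cluster around 0.5 with variance close to (1/12)/n. The sample size, number of replications and seed are arbitrary:

```python
import random
import statistics

random.seed(7)
n, draws = 40, 3000  # sample size and number of repeated samples

# Population: uniform on [0, 1]; mean = 0.5, variance = 1/12.
sample_means = [statistics.mean(random.random() for _ in range(n))
                for _ in range(draws)]

# The mean of the sample means tracks the population mean, and their
# variance tracks (1/12)/n, as the theorem predicts.
print(round(statistics.mean(sample_means), 3))
print(round(statistics.variance(sample_means), 4))
```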
Test of Normality
• The Kolmogorov-Smirnov test and the Shapiro-Wilk test examine
whether scores are likely to follow a given distribution (typically
the normal) in the population.
• If the Sig. value of the K-S or Shapiro-Wilk test is greater than 0.05,
the data can be treated as normal. If it is below 0.05, the data deviate
significantly from a normal distribution.
The Church of Statistical Significance
The Twelve Commandments of Statistical Inference
1. Thou shalt not draw inferences from a nonsignificant result.
Remember the type II error, for therein is reflected the power if not
the glory.
2. Thou shalt not pseudo replicate or otherwise worship false degrees of
freedom.
3. Respect the one-tailed test, for it can make thine inferences strong.
4. Forget not the difference between fixed treatments and random
effects.
5. Thou shalt always specify the level of significance before the
experiment; those who specify it afterward by rounding up
obtained p values are cheating.
6. Thou shalt always design thy experiments so that thou canst perform
significance testing.
The Twelve Commandments of Statistical Inference
7. Thou shalt not commit unplanned comparisons without adjusting the rate of
Type I error for thy transgressions.
8. Honor both thy parametric and thy nonparametric methods.
9. Consider not the probability of a particular set of data, but rather the
probability of all those sets as or more extreme than thine own.
10. Thou shalt not confuse neither manipulation and observation, nor
causation and correlation.
11. Thou shalt not presume statistical significance to be of scientific
importance.
12. Thou shalt not be fearful of paying homage to a Statistician or His Holy
Book, especially before planning an experiment; neither shalt thou
be fearful of ignoring the Word of a Statistician when it is damnable; for
thou art alone responsible for thine acceptance or rejection of the
hypothesis, be it ever so false or true.