Chapter 10 Analysis of Quantitative Data
Hernández Sampieri
Once the data have been coded, transferred to a matrix, saved in a file, and cleaned of
errors, the researcher proceeds to analyze them.
In most cases the analysis is carried out on a computer, using software, on a data matrix.
Phases of Data Analysis
Phase 1
Select the program
Phase 2
Execute it
Phase 3
Explore the data
Phase 4
Assess reliability and validity
Phase 5
Analyze the Hypotheses
Phase 6
Additional analysis
Phase 7
Present the results
Step 1 Select an analysis program
There are various programs, and they all work in a very similar way.
They consist of two parts:
The definition of the variables
The data matrix
Example of a data matrix
Case   Gender   Hair color   Age
1      1        1            35
2      1        1            29
3      2        1            28
4      2        4            33
Gender: 1 = male, 2 = female
Hair color: numeric code for each color (for example, black)
Age: raw value in years
SPSS or PASW
Statistical Package for the Social Sciences, developed at the University of Chicago;
it is one of the most widely used.
MINITAB
It is low cost and offers a demo version at http://minitab.com
Step 2 Run the program
Before installing the program, check that the computer meets all the requirements for
running the package, so that no conflicts arise during installation or execution.
The hardware and software requirements to run SPSS are minimal.
Step 3 Explore the data
This stage immediately follows running the program; it is simple if the sequence of
previous stages was carried out.
Stage 1
Menu Analyze / Reports / Descriptive statistics / Frequencies
The following are requested for all items (matrix variable by matrix variable):
Reports of the matrix, to see the results item by item or row by row
Descriptive statistics
a) Descriptives (table with the fundamental statistics of all the variables in the matrix,
columns or items)
b) Frequencies (frequency tables of the variables in the matrix)
c) Explore (relationships between the variables in the matrix)
d) Generate contingency tables
e) Generate ratios
Stage 2
The researcher evaluates the distributions and statistics of the items or columns, notes
which items have a logical or an illogical distribution, and groups the items or indicators
into the research variables (composite variables) according to their operational
definitions and the way the measurement instruments were developed.
Stage 3
Menu Transform / Compute
The program is told how to group the items into the variables of the study.
Stage 4
Analyze Menu
All variables of the study are requested:
a) Descriptive statistics (tables with the fundamental statistics of all the variables)
b) A frequency analysis with statistics, tables, and graphs
STATISTICAL DATA
Variable of the data matrix
It is a column or an item.
Research variable
The measured properties that are part of the hypotheses or that the study intends to
describe.
Composite variable
It is a research variable made up of several matrix variables or items.
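The grouping of items into a composite variable can be sketched as follows. This is only an illustration: the item names, the scores, and the choice of averaging the items (rather than summing them) are assumptions, not taken from the chapter.

```python
# Hypothetical data matrix: each row is a case, each key an item (matrix variable).
rows = [
    {"item1": 4, "item2": 5, "item3": 3},
    {"item1": 2, "item2": 1, "item3": 2},
    {"item1": 5, "item2": 4, "item3": 5},
]

def composite_score(row, items):
    """Average the listed items into one composite research variable."""
    return sum(row[name] for name in items) / len(items)

# Assumed grouping: the three items measure the same research variable.
items = ["item1", "item2", "item3"]
composite = [composite_score(row, items) for row in rows]
print(composite)
```

In a statistical package the same grouping is done once through the compute menu and stored as a new column of the matrix.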
The analysis of the data depends on three factors
a) Measurement level of the variable
b) How the hypotheses or objectives were formulated
c) Researcher's interest
The final descriptive analysis is carried out on the study variables.
Statistics is not an end in itself but a tool for evaluating data and testing hypotheses.
Descriptive statistics
Frequency distribution
It is the set of scores arranged in their respective categories.
Frequency distribution
(How would you prefer to be identified ethnically?)
Category          Code   Frequency
Hispanic          1      52
Latino            2      88
Latin American    3      6
American          4      22
Others            5      20
Did not respond   6      12
Total                    200
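A frequency table like this one can be rebuilt from the coded responses. The sketch below recreates the 200 coded answers from the counts in the table and tallies them; code 4 for "American" is inferred from the sequence of codes, and the percentage column is an addition for illustration.

```python
from collections import Counter

# Category labels by code (code 4 is inferred, see the note above).
labels = {1: "Hispanic", 2: "Latino", 3: "Latin American",
          4: "American", 5: "Others", 6: "Did not respond"}

# One coded answer per respondent, rebuilt from the counts in the table.
responses = [1] * 52 + [2] * 88 + [3] * 6 + [4] * 22 + [5] * 20 + [6] * 12

frequencies = Counter(responses)
total = sum(frequencies.values())

for code in sorted(frequencies):
    count = frequencies[code]
    percent = 100 * count / total
    print(f"{labels[code]:<16} {code}  {count:>4}  {percent:5.1f}%")
print(f"{'Total':<16}    {total:>4}")
```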
Frequency distribution
When the list of frequencies is too long, it should be summarized into ranges.
Example:
1-10
11-20
21-30
Etc.
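Grouping scores into ranges such as these can be sketched as follows. The scores are hypothetical, and the helper `to_range` is a name invented for this illustration; it assumes positive whole-number scores and ranges of width 10 starting at 1.

```python
from collections import Counter

def to_range(score, width=10):
    """Return the label of the range (1-10, 11-20, ...) that contains score."""
    lower = ((score - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"

scores = [3, 7, 10, 11, 15, 20, 21, 28, 30]  # hypothetical scores

grouped = Counter(to_range(s) for s in scores)
for label, count in grouped.items():
    print(label, count)
```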
Frequency distribution
(Staff cooperation in the company's quality project)
Category                             Code   Frequency   Valid percentage   Cumulative percentage
Collaboration has been obtained      1      91          74.6               74.6
No collaboration has been obtained   2      5           4.1                78.7
Did not respond                      3      26          21.3               100
Total                                       122         100
Frequency distribution
(Reasons for the preference for their favorite character)
                             Frequency   Percentage   Valid percentage   Cumulative percentage
Valid     Fun                -           -            -                  -
          Good               10          5.1          -                  -
          They have powers   23          11.7         11.9               90.2
          They are strong    -           9.6          -                  -
          Total              194         98.5         100
Missing   Did not answer     3           1.5
Total                        197         100
Other ways to present the frequency distribution
Histogram
Bar chart
Pie chart
Frequency polygon
It relates the scores to their respective frequencies through graphs that are useful
for describing the data.
Measures of central tendency
Midpoints or central values of a distribution that serve to locate it on the measurement
scale.
Mode
It is the category or score that occurs most frequently.
Median
The value that divides the distribution in half.
Calculation of the median
Position of the median = (N + 1) / 2 = (9 + 1) / 2 = 5
With 9 ordered cases, the median is the score in 5th position.
Mean
It is the arithmetic average of a distribution and the most widely used measure of
central tendency.
It is the sum of all values divided by the number of cases.
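The three measures can be checked with Python's standard `statistics` module. The nine scores below are hypothetical, chosen so that N = 9 as in the median calculation.

```python
import statistics

scores = [2, 3, 3, 3, 5, 6, 7, 7, 9]  # hypothetical scores, N = 9

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle score of the ordered data
mean = statistics.mean(scores)      # sum of the scores divided by N

# Position of the median in the ordered data: (N + 1) / 2
position = (len(scores) + 1) / 2
median_by_position = sorted(scores)[int(position) - 1]

print(mode, median, mean, position)
```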
Measures of variability
They are intervals that indicate the dispersion of the data on the measurement scale.
Range
Indicates the total extent of the data on the scale.
Range = XM - Xm (highest score minus lowest score)
Standard deviation
Average deviation of the scores from the mean, expressed in the original measurement
units of the distribution.
Variance
It is the standard deviation squared. It is used in inferential analyses.
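A minimal sketch of the three measures, using hypothetical scores. It uses the population formulas (dividing by N); statistical packages usually report the sample versions (dividing by N - 1), which the same module provides as `statistics.stdev` and `statistics.variance`.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

# Range: highest score minus lowest score (XM - Xm)
data_range = max(scores) - min(scores)

# Population standard deviation and variance (divide by N)
sd = statistics.pstdev(scores)
variance = statistics.pvariance(scores)

print(data_range, sd, variance)
```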
Other descriptive statistics
Skewness and kurtosis
Statistics used to determine how closely a distribution resembles the theoretical
distribution called the normal curve or Gaussian bell curve.
Translation of statistics into English (Spanish term - English term)
Moda - Mode
Mediana - Median
Media - Mean
Desviación estándar - Standard deviation
Varianza - Variance
Máximo - Maximum
Mínimo - Minimum
Rango - Range
Asimetría - Skewness
Curtosis - Kurtosis
Z scores
They are measures that indicate the direction and the degree to which an individual value
deviates from the mean, on a scale of standard deviation units.
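As a sketch, a z score is the distance of a value from the mean divided by the standard deviation. The scores are hypothetical, and the population standard deviation is used here as an assumption.

```python
import statistics

def z_score(value, mean, sd):
    """Distance of value from the mean, in standard deviation units."""
    return (value - mean) / sd

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)      # population standard deviation

z_scores = [z_score(x, mean, sd) for x in scores]
print(z_scores)
```

A positive z score means the value lies above the mean and a negative one means it lies below.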
Ratio
It is the relationship between two categories.
Category   Frequency
Male       60
Female     30
The ratio of men to women is 60 / 30 = 2
Rate
It is the relationship between the number of cases in a category and the total number of
observations, multiplied by a multiple of 10 (here, 1000).
Rate = (Number of events / Total number of possible events) x 1000
Rate = (Number of live births in the city / Number of inhabitants) x 1000
Rate = (10,000 / 300,000) x 1000 = 33.33
That is, there are 33.33 live births for every 1000 inhabitants.
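Both calculations can be reproduced in a few lines. The function names are invented for this illustration; the figures are the ones used in the examples.

```python
def ratio(count_a, count_b):
    """Relationship between two categories."""
    return count_a / count_b

def rate(events, possible_events, per=1000):
    """Events per `per` possible events."""
    return events / possible_events * per

men_to_women = ratio(60, 30)
birth_rate = rate(10_000, 300_000)

print(men_to_women)
print(round(birth_rate, 2))
```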
Step 4 Evaluate the reliability and validity achieved by the measurement instrument
Reliability can range from 0 (no reliability) to 1 (maximum reliability).
The reliability of the scales is calculated using various methods:
Stability measure (test-retest)
It is calculated by applying the same test to the participants twice and then computing a
correlation coefficient between the scores of both applications.
Method of alternative or parallel forms
It is calculated through a correlation coefficient between the results of two supposedly
equivalent tests.
It is applied in pretest-posttest designs.
Method of split halves
It is calculated by means of a correlation coefficient between the scores of the two
halves of the instrument.
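All three methods rest on a correlation coefficient between two sets of scores. A minimal sketch for the test-retest case follows, using the Pearson coefficient as one possible choice; the scores are hypothetical and `pearson_r` is a helper written for this illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores of five participants on two applications of the same test.
test = [10, 12, 15, 18, 20]
retest = [11, 12, 14, 19, 21]

r = pearson_r(test, retest)
print(round(r, 2))
```

For split halves, the two lists would instead hold each participant's score on each half of the instrument.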
Validity
Content validity is obtained by ensuring that the dimensions measured by the instrument
are representative of the universe or domain of dimensions of the variables of
interest.
Criterion validity evidence is produced by correlating the participants' scores with
their values on the criterion.
Correlation involves associating the scores obtained by the sample on two or more variables.
Step 5 Analyze the proposed hypotheses through statistical tests (inferential statistical
analysis)
It is used to test hypotheses and estimate parameters, based on the sampling distribution.
Hypothesis testing
It consists of determining whether the hypothesis is consistent with the sample data.
The possible outcomes are:
1. Accepting a true hypothesis (correct decision)
2. Rejecting a false hypothesis (correct decision)
3. Accepting a false hypothesis (Type II or beta error)
4. Rejecting a true hypothesis (Type I or alpha error)
Sampling distribution
A sampling distribution is the set of values of a statistic calculated from all possible
samples of a given size from a population.
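For a very small population, the sampling distribution of the mean can be listed in full. The sketch below assumes a hypothetical population of five values and samples of size 2 drawn without replacement.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]  # hypothetical population
sample_size = 2                # assumed sample size

# The statistic (here, the mean) computed on every possible sample.
sample_means = [mean(sample) for sample in combinations(population, sample_size)]

print(sorted(sample_means))
print(mean(sample_means), mean(population))
```

The mean of the sample means coincides with the population mean, which is what makes the sampling distribution useful for estimating parameters.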
Significance level
It is the probability of being wrong, set a priori by the researcher.
Parametric analysis
It starts from the following assumptions:
1.- The population distribution of the dependent variable is normal.
2.- The measurement level of the dependent variable is interval or ratio.
3.- The populations in question have similar dispersion in their distributions.
Pearson correlation coefficient
It is a statistical test to analyze the relationship between 2 variables measured at the
interval or ratio level.
t test
It is a statistical test to evaluate whether 2 groups differ significantly from each other
regarding their means. It is used for 2 groups.
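As an illustration, the t value for two independent groups can be computed by hand. This sketch uses the pooled-variance form, which assumes similar dispersion in both groups; the scores are hypothetical and `t_independent` is a name invented here.

```python
from math import sqrt
from statistics import mean, variance

def t_independent(group_1, group_2):
    """Return (t, degrees of freedom) for two independent groups, pooled variance."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances, N - 1).
    pooled = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group_1) - mean(group_2)) / standard_error, df

group_a = [5, 6, 7, 8, 9]  # hypothetical scores
group_b = [2, 3, 4, 5, 6]

t, df = t_independent(group_a, group_b)
print(t, df)
```

The obtained t is then compared with the critical value for the chosen significance level and degrees of freedom, taken from a t table or from the statistical package.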
Proportion difference test
It is a statistical test to analyze whether 2 proportions differ significantly from each
other.
Analysis of variance
It is a statistical test to analyze whether more than 2 groups differ significantly from
each other regarding their means and variances. It is used for 3, 4 or more groups.
Non-parametric analysis
It starts from the following considerations:
1.- It does not require assumptions about the shape of the population distribution.
2.- The variables do not necessarily have to be measured at the interval or ratio level;
nominal or ordinal data can be analyzed.
Chi squared
It is a statistical test to evaluate hypotheses about the relationship between two
categorical variables. It is used to test correlational hypotheses.
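A sketch of the calculation on a contingency table: the expected frequency of each cell is (row total x column total) / N, and chi squared sums (observed - expected)^2 / expected over all cells. The 2 x 2 table is hypothetical.

```python
def chi_squared(table):
    """Return (chi squared, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical cross-tabulation of two categorical variables.
observed = [[30, 10],
            [20, 40]]

chi2, df = chi_squared(observed)
print(round(chi2, 2), df)
```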
Spearman and Kendall coefficients
They are correlation measures for variables at an ordinal level of measurement; the
individuals or objects in the sample can be ordered by ranks.
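The Spearman coefficient can be sketched with the rank-difference formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is valid only when there are no tied scores. The data are hypothetical and the helpers are written for this illustration.

```python
def ranks(values):
    """Rank of each value (1 = lowest). Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation for paired scores without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [10, 20, 30, 40, 50]  # hypothetical scores on the first variable
y = [12, 25, 20, 48, 60]  # hypothetical scores on the second variable

rho = spearman_rho(x, y)
print(rho)
```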
Coefficients for cross-tabulations
In addition to chi squared, there are other coefficients for evaluating whether the
variables included in a cross-tabulation are correlated.
Step 6 Conduct additional analysis
After conducting our analysis, we may decide to add other analyses or
extra tests to confirm trends and evaluate the data from different angles.
Step 7 Prepare the results for presentation
Once the results of the statistical analyses are obtained, the following activities are
recommended:
1.- Review each result
General and specific analysis of the resulting values, tables, diagrams, charts, and
graphs.
2.- Organize the results
First the descriptive results, by variable, then the results related to reliability and
validity, and finally the inferential ones.
3.- Compare the different results
Check their congruence and, in case of logical inconsistency, review them again.
4.- Prioritize the most valuable information
5.- Copy the tables into the program with which the report will be prepared
Transfer the tables prepared by programs such as SPSS or Minitab into a word processor
or presentation program such as Word or PowerPoint.
6.- Comment on or briefly describe
The essence of the analyses, values, tables, diagrams, and graphs.
7.- Review the results again.
8.- Finally, prepare the research report.
Conclusions
We have seen the different phases of quantitative data analysis.
Some programs for data analysis used in research were reviewed briefly.
We have worked with the statistics needed to carry out the data analysis.
For hypothesis testing, there are parametric and non-parametric methods.
Chapter 7 Data Collection and Descriptive Statistics. Salkind
The data collection process involves four steps:
Constructing the forms used to collect the information.
Designing the coding used to represent the data.
Collecting the data itself.
Entering the data on the data collection form.
Coding: data are coded when they are transferred from the original collection form into
a format suitable for data analysis.
The only rule for coding data is to use the simplest codes possible.
The ten commandments of data collection:
1.- When you begin to think about a research question, start thinking about the type of
data that will need to be collected to answer it.
2.- Think also about where the data will be obtained.
3.- Ensure that the data collection form is easy to use.
4.- Make a backup copy of the data file.
5.- Do not depend on other people to collect the data.
6.- Create a detailed schedule of when and where you will collect your data.
7.- Cultivate possible sources for your group of subjects.
8.- Try to contact the subjects who missed the interview.
9.- Never discard the original data.
10.- Obey the other nine.
Data analysis can be performed through descriptive and inferential statistics.
The first step in data analysis is to describe the data, that is, the distribution of scores.
Score distributions can be compared through measures of central tendency.
There are three measures of central tendency: the mean, the median, and the mode.
The mean is the sum of a set of scores divided by the number of scores.
The median is the score in a distribution above which half of the scores lie.
The mode is the score that occurs most frequently.
Measures of variability.
Variability is the degree of dispersion that characterizes a group of scores; it is the
degree to which a set of scores differs from some measure of central tendency,
generally the mean.
The measures of variability are:
The range, which is the difference between the highest score and the lowest.
The standard deviation, which is the average amount by which each individual score
varies from the mean of the set of scores.
Conclusions
The data collection process is performed in four steps.
The ten commandments of data collection were mentioned.
We have learned about the three measures of central tendency used in data analysis:
mean, median, and mode.
The measures of variability used in data analysis were presented: range and standard
deviation.
Chapter 8 Research Methods Salkind
Descriptive statistics is used to describe the characteristics of a sample; inferential
statistics is used to infer something about the population from which the sample was
drawn.
Statistical significance is the degree of risk we are willing to take that we will reject
a null hypothesis when it is actually true.
This kind of mistake is known as a Type I error; the conventional levels assigned to it
are 0.01 and 0.05.
A Type II error is accepting a false null hypothesis.
Significance Tests
They help us make decisions about populations.
They are based on the fact that each type of null hypothesis is associated with a
specific type of statistic.
Steps to follow for a statistical test
1. Statement of the null hypothesis.
2. Setting the level of risk (significance level) associated with the null hypothesis.
3. Selection of the appropriate statistical test.
4. Calculation of the value of the test statistic (the obtained value).
5. Determination of the value required to reject the null hypothesis (the critical value).
6. Comparison of the obtained value with the critical value.
7. If the obtained value is more extreme than the critical value, the null hypothesis
cannot be accepted.
8. If the obtained value does not exceed the critical value, the null hypothesis is the
most attractive explanation.
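Steps 6 to 8 amount to a simple comparison, sketched below. The function name and the example values are hypothetical; the critical value would come from the table for the chosen test and significance level, and "more extreme" is read here as larger in absolute value.

```python
def decide(obtained_value, critical_value):
    """Compare the obtained value with the critical value (steps 6 to 8)."""
    if abs(obtained_value) > abs(critical_value):
        # Step 7: the obtained value is more extreme than the critical value.
        return "reject the null hypothesis"
    # Step 8: the null hypothesis remains the most attractive explanation.
    return "retain the null hypothesis"

print(decide(2.50, 1.96))  # hypothetical obtained and critical values
print(decide(1.20, 1.96))
```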
t test for independent means
It is an inferential test of the significance of the difference between two means based on
two independent groups.
Some significance tests are:
Independent samples t-test.
Dependent samples t-test.
Analysis of variance
Techniques to evaluate more than one dependent variable
Multivariate analysis of variance: an advanced technique that determines whether there
are group differences on more than one dependent variable.
Factor analysis: it allows the researcher to reduce the number of variables that
represent a particular construct and then use the resulting factor scores as
dependent variables.
Conclusions
We have learned methods for data analysis.
These methods include significance tests, the t test for independent samples, and the
t test for dependent samples.
The steps for a statistical test were outlined.