SPSS Data Analysis
Drowning in Data
22nd April 2014
ROSLE MOHIDIN
Senior Lecturer
School Of Business & Economics, UMS
Presentation Outline
SPSS Environment: Review of SPSS Basics
– SPSS interface: data view and variable view
– How to enter data in SPSS
– How to clean and edit data
– How to transform variables
– How to sort and select cases
– How to get descriptive statistics
Inferential Statistics in SPSS
– Independent t-test
– Regression
Features of SPSS
Originally developed for social science researchers, so no heavy programming background is required
Designed to be user friendly, with pull-down menus to execute statistical commands
Supports data management and manipulation
Can store programs and produce reports and graphs
SPSS Program Flow
Outside data source → Importing → SPSS data file
Raw data → Direct entry → SPSS data file
SPSS data file → Data modification/file transformation (data steps) and data analysis (analysis steps)
Both kinds of steps can be run from the pull-down menus OR from syntax.
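As a rough analogy only, the flow on this slide (import or enter data, run data steps, then run analysis steps) can be sketched in Python. The CSV text, the column names, and the functions `import_data`, `transform`, and `analyze` are hypothetical; the values are the three school-data subjects used later in this presentation, and nothing here calls SPSS itself.

```python
import csv
import io
from statistics import mean

# Hypothetical "outside data source": CSV text exported from, e.g., Excel.
RAW_CSV = """id,gender,reading,math
1,F,90,67
2,F,72,46
3,M,41,73
"""

def import_data(text):
    """Importing: read CSV text into a list of cases (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(cases):
    """Data step: recode gender to numeric (Female = 1, Male = 0) and convert scores to numbers."""
    return [
        {
            "id": int(c["id"]),
            "female": 1 if c["gender"] == "F" else 0,
            "reading": float(c["reading"]),
            "math": float(c["math"]),
        }
        for c in cases
    ]

def analyze(cases):
    """Analysis step: mean reading and math score."""
    return {var: mean(c[var] for c in cases) for var in ("reading", "math")}

print(analyze(transform(import_data(RAW_CSV))))
```

Keeping the data step separate from the analysis step mirrors the slide's split: recoding happens once, and any number of analyses can then run on the cleaned cases.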
An Example of Research Using SPSS as a Tool for Data Analysis
Youth Risk Behavior Surveillance System
(YRBSS, CDC)
– YRBSS monitors priority health-risk behaviors
and the prevalence of obesity and asthma
among youth and young adults.
– The target population is high school students
– The health behaviors monitored include drinking, smoking, exercise, eating habits, etc.
Data view
– The place to enter data
– Columns: variables
– Rows: records
Variable view
– The place to define variables
– List of all variables
– Characteristics of all variables
You need a questionnaire, code book, or scoring guide.
If you use a paper survey, give each case an ID number (NOT the real identification numbers of your subjects).
If you use an online survey, you still need a variable that identifies each case.
You can also use Excel for data entry.
Data View Window: Data Entry Site
(Columns = Variables, Rows = Cases)
[Screenshot: Data View window with labeled title bar, pull-down menu bar, Help menu, toolbar, information bar, variable names, active cell, and action bar]
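Variable View Window
Data Definition Site
[Screenshot: Variable View window with each column annotated]
– Name: up to 64 bytes, no spaces, must begin with a letter or @, #, or $
– Type: numeric, string, and others
– Width: length
– Decimals: number of decimal places
– Label: variable description
– Values: value code description
– Missing: missing value description
Click the Variable View tab to see this view.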
1. Click Variable View.
2. Type the variable name under the Name column (e.g. Q01).
NOTE: A variable name can be up to 64 bytes long, and the first character must be a letter or one of the characters @, #, or $.
3. Type: numeric, string, etc.
4. Label: a description of the variable.
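The naming rules stated on this slide can be expressed as a small checker. This is a simplified sketch covering only the three rules given here (at most 64 bytes, no spaces, first character a letter or @, #, $); it assumes the byte limit is measured in UTF-8, and SPSS applies further rules (such as reserved keywords) that it does not check. `check_variable_name` is a hypothetical helper, not part of SPSS.

```python
ALLOWED_FIRST_SYMBOLS = "@#$"  # symbols the slide allows as a first character

def check_variable_name(name):
    """Return a list of rule violations for a proposed variable name (empty list = OK).

    Hypothetical helper covering only the rules on this slide; it assumes the
    64-byte limit is measured in UTF-8 and does not check SPSS reserved keywords.
    """
    if not name:
        return ["name is empty"]
    problems = []
    if len(name.encode("utf-8")) > 64:
        problems.append("longer than 64 bytes")
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    if not (name[0].isalpha() or name[0] in ALLOWED_FIRST_SYMBOLS):
        problems.append("must start with a letter or @, #, or $")
    return problems

for candidate in ["Q01", "Q 01", "1Q", "Q" * 65]:
    print(candidate[:10], check_variable_name(candidate))
```

`Q01` passes, while `Q 01`, `1Q`, and a 65-character name each report one problem.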
Based on your
code book!
Under Data View:
1. There are two variables in the data set.
2. They are Code and Q01.
3. Code is an ID variable used to identify individual cases (NOT people's real IDs).
4. Q01 records participants' ages: 1 = 12 years or younger, 2 = 13 years, 3 = 14 years…
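What the code book does for Q01 can be sketched as a lookup from numeric codes to value labels, which is the role the Values column plays in Variable View. Only codes 1 to 3 appear on the slide (the rest are elided), so the mapping below stops there; the case records and the `label_values` helper are hypothetical.

```python
# Value labels for Q01 from the slide; later codes are elided there ("…"),
# so only codes 1-3 are included.
Q01_LABELS = {1: "12 years or younger", 2: "13 years", 3: "14 years"}

def label_values(codes, labels):
    """Replace each numeric code with its value label; flag codes without a label."""
    return [labels.get(code, f"<no label for {code}>") for code in codes]

# Hypothetical cases: Code is the anonymous ID, Q01 the coded age.
cases = [{"Code": 1, "Q01": 2}, {"Code": 2, "Q01": 1}, {"Code": 3, "Q01": 3}]
print(label_values([case["Q01"] for case in cases], Q01_LABELS))
# → ['13 years', '12 years or younger', '14 years']
```

Storing the number and looking up the label only for display is why data entry is faster and less error-prone with a code book.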
Save this file as an SPSS data file (.sav)
Cleaning the Data
Key in values and labels for each variable
Run frequencies for each variable
Check the output for variables with wrong (out-of-range) values
Check missing values against the questionnaires (if you use surveys) to make sure they are truly missing
Sometimes you need to recode string variables into numeric variables
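The cleaning checks above (run frequencies, look for out-of-range codes, confirm missing values, recode strings) can be sketched as follows. The gender column, its valid codes {1, 2}, the missing-value code 9, and the string mapping are all hypothetical examples, not data from the slides.

```python
from collections import Counter

def frequency_table(values):
    """Count each distinct value, similar in spirit to running Frequencies on one variable."""
    return Counter(values)

def wrong_entries(values, valid_codes, missing_codes=()):
    """Distinct values that are neither valid codes nor declared missing-value codes."""
    allowed = set(valid_codes) | set(missing_codes)
    return sorted({v for v in values if v not in allowed})

def recode_string(values, mapping):
    """Recode a string variable to numeric codes; unmapped text becomes None (missing)."""
    return [mapping.get(v.strip().lower()) for v in values]

# Hypothetical gender column: valid codes 1 and 2, 9 declared as missing, 3 is a typo.
gender = [1, 2, 2, 1, 3, 9, 1]
print(frequency_table(gender))
print(wrong_entries(gender, valid_codes={1, 2}, missing_codes={9}))  # → [3]
print(recode_string(["Male", "female ", "F"], {"male": 1, "female": 2}))  # → [1, 2, None]
```

Unmapped strings such as "F" come back as missing rather than being guessed, so they can be checked against the original questionnaire.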
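Before we see examples: OK vs. Paste buttons
1. OK: the results/action will be executed, and the results appear in the Output file
2. Paste: the command syntax is pasted into a Syntax window, where it can be saved and run later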
Wrong entries
Descriptive statistics
– Purposes:
1. Find wrong entries
2. Get basic knowledge about the sample and the target variables in a study
3. Summarize the data
Analyze > Descriptive Statistics > Frequencies
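A minimal sketch of the summary that Frequencies or Descriptives reports for a continuous variable, assuming missing values are stored as `None`. Inspecting Minimum and Maximum is one quick way to spot wrong entries: for a score defined on 0-100, a Maximum above 100 points to a data entry error. The sample values are the three math scores from the school example; `describe` is a hypothetical helper.

```python
from statistics import mean, stdev

def describe(values):
    """N, missing count, minimum, maximum, mean and sample SD of one variable.

    Missing values are assumed to be stored as None and are excluded.
    """
    data = [v for v in values if v is not None]
    return {
        "N": len(data),
        "Missing": len(values) - len(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Mean": mean(data),
        "Std. Deviation": stdev(data),  # sample SD (divides by N - 1)
    }

# Math scores of the three school-data subjects, plus one hypothetical missing value.
print(describe([67, 46, 73, None]))
```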
1. Skewness: a measure of the asymmetry of a distribution. The normal distribution is symmetric and has a skewness value of zero.
Positive skewness: a long right tail.
Negative skewness: a long left tail.
Departure from symmetry: a skewness value (in absolute terms) more than twice its standard error.
2. Kurtosis: a measure of the extent to which observations cluster around a central point. For a normal distribution, the value of the kurtosis statistic is zero. Leptokurtic data values are more peaked, whereas platykurtic data values are flatter and more dispersed along the X axis.
[Figure: normal curve]
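The skewness and kurtosis statistics, their standard errors, and the "twice its standard error" rule can be sketched as below. It assumes the small-sample adjusted estimators (often written G1 and G2) and the standard-error formulas that, to our understanding, SPSS reports; treat it as an illustration and compare with your own SPSS output. Kurtosis here is excess kurtosis, so normal data give about zero, and it needs at least 4 cases. The skewed sample is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def skewness(x):
    """Adjusted sample skewness (G1); 0 for a perfectly symmetric sample."""
    n, m, s = len(x), mean(x), stdev(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)

def kurtosis(x):
    """Adjusted sample excess kurtosis (G2); about 0 for normal data. Needs n >= 4."""
    n, m, s = len(x), mean(x), stdev(x)
    fourth = sum(((v - m) / s) ** 4 for v in x)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def departs_from_symmetry(x):
    """Rule of thumb from the slide: |skewness| more than twice its standard error."""
    return abs(skewness(x)) > 2 * se_skewness(len(x))

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1] * 9 + [20]   # hypothetical data with one large value (long right tail)
print(round(skewness(symmetric), 3), round(kurtosis(symmetric), 3))
print(round(skewness(right_tailed), 3), departs_from_symmetry(right_tailed))
```

For the symmetric sample the skewness is zero and the kurtosis is negative (flatter than normal); the single large value in the second sample produces a positive skewness well beyond twice its standard error.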
Example - School Data
Raw Data
Subject 1
– Subject # (1)
– Female (1)
– Intensive (1)
– Reading (90)
– Math (67)
Subject 2
– Subject # (2)
– Female (1)
– Moderate (2)
– Reading (72)
– Math (46)
Subject 3
– Subject # (3)
– Male (0)
– Basic (3)
– Reading (41)
– Math (73)
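The three raw records above can be stored the way SPSS separates Variable View (definitions and value labels) from Data View (coded values). The codes come from the slide; the variable names, including `program` for the Intensive/Moderate/Basic variable, and the `display` helper are hypothetical.

```python
# Variable definitions (the Variable View side); names are illustrative.
VARIABLES = {
    "subject": {"label": "Subject number", "type": "numeric"},
    "gender": {"label": "Gender", "type": "numeric", "values": {0: "Male", 1: "Female"}},
    "program": {"label": "Program", "type": "numeric",
                "values": {1: "Intensive", 2: "Moderate", 3: "Basic"}},
    "reading": {"label": "Reading score", "type": "numeric"},
    "math": {"label": "Math score", "type": "numeric"},
}

# The three raw records from the slide, entered as codes (the Data View side).
DATA = [
    {"subject": 1, "gender": 1, "program": 1, "reading": 90, "math": 67},
    {"subject": 2, "gender": 1, "program": 2, "reading": 72, "math": 46},
    {"subject": 3, "gender": 0, "program": 3, "reading": 41, "math": 73},
]

def display(case, variables=VARIABLES):
    """Show a case with value labels applied where a variable has them."""
    shown = {}
    for name, value in case.items():
        labels = variables[name].get("values")
        shown[name] = labels[value] if labels else value
    return shown

print(display(DATA[2]))
# → {'subject': 3, 'gender': 'Male', 'program': 'Basic', 'reading': 41, 'math': 73}
```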
School Data
Variable View
Variable View Activated
School Data
Completed Dataset – Data View
School Data
Completed Dataset – Variable View
Click to Obtain
Data File Information
Variable Information
Value Code Information
Basic Statistical Methods
Independent t-test
Multiple Regression
Independent t-test
– Is there a significant difference between 2 groups?
Assumptions: 1. Normality 2. Equality of variance 3. Independence
# of Variables | Characteristics | School Data (N=100)
Dependent = 1 | Continuous | Math score (range 0-100)
Independent = 1 | Categorical (2 levels) | Gender
How to calculate the t-value?
$$ t = \frac{\text{Mean Difference}}{\text{Group Variability}} $$
[Figure: t-test, three panels labeled Medium Variability, High Variability, and Low Variability]
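The t-value formula on the previous slide implies that, for the same mean difference, more within-group variability produces a smaller t. A small sketch with hypothetical groups shows this; it assumes the unpooled standard error sqrt(SD1²/N1 + SD2²/N2) as the "group variability" term, and `group` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance

def t_value(a, b):
    """t = mean difference / standard error of the difference (unpooled)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def group(center, offsets):
    """Hypothetical helper: build a group of scores around a center."""
    return [center + d for d in offsets]

narrow = [-2, -1, 0, 1, 2]
wide = [3 * d for d in narrow]   # same shape, three times the spread

# Both pairs have group means 60 and 50, i.e. a mean difference of 10.
print(t_value(group(60, narrow), group(50, narrow)))  # low variability
print(t_value(group(60, wide), group(50, wide)))      # high variability
```

With the narrow spread the standard error is 1 and t is 10; tripling the spread triples the standard error and cuts t to about 3.33, even though the means are unchanged.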
Independent t-test
1. Go to Analyze.
2. Choose Compare Means.
3. Choose Independent-Samples T Test.
t-test
1. Choose the dependent (Test Variable) and independent (Grouping Variable) variables.
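After the variables are chosen, SPSS reports Levene's test for equality of variances and two t rows (equal variances assumed and not assumed). A stdlib-only sketch of those statistics follows, assuming the mean-centred form of Levene's test; it returns test statistics and degrees of freedom but no p-values, because Python's standard library has no t or F distribution. The group scores are hypothetical. If SciPy is available, `scipy.stats.levene(a, b, center='mean')` and `scipy.stats.ttest_ind(a, b, equal_var=False)` can serve as a cross-check.

```python
from math import sqrt
from statistics import mean, variance

def levene_f(a, b):
    """Levene's F for two groups (mean-centred): one-way ANOVA F on |score - group mean|."""
    ma, mb = mean(a), mean(b)
    za = [abs(x - ma) for x in a]
    zb = [abs(x - mb) for x in b]
    mza, mzb = mean(za), mean(zb)
    grand = mean(za + zb)
    between = len(za) * (mza - grand) ** 2 + len(zb) * (mzb - grand) ** 2
    within = sum((z - mza) ** 2 for z in za) + sum((z - mzb) ** 2 for z in zb)
    n = len(za) + len(zb)
    return (n - 2) * between / within  # (N - k) / (k - 1) with k = 2 groups

def independent_t(a, b):
    """t statistic and df, with and without assuming equal variances."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)   # sample variances (divide by n - 1)
    diff = mean(a) - mean(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_equal = diff / sqrt(pooled * (1 / n1 + 1 / n2))
    se2 = v1 / n1 + v2 / n2
    df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return {
        "equal variances assumed": (t_equal, n1 + n2 - 2),
        "equal variances not assumed": (diff / sqrt(se2), df_welch),
    }

# Hypothetical scores for two groups with clearly different spreads.
group1 = [1, 2, 3, 4, 5]
group2 = [0, 5, 10, 15, 20]
print(levene_f(group1, group2))
print(independent_t(group1, group2))
```

A large Levene F suggests reading the "equal variances not assumed" row; with equal group sizes and equal variances the two rows give the same t.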
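[Output annotated with: independent variable, dependent variable, descriptives and analysis, variance equality test (Levene's test), and t-statistics]
$$ t = \frac{\text{Mean Diff.}}{\text{Std. Error Diff.}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{SD_1^2}{N_1} + \frac{SD_2^2}{N_2}}} = \frac{63.20 - 54.10}{\sqrt{\frac{(13.914)^2}{41} + \frac{(13.064)^2}{59}}} = \frac{9.093}{2.760} = 3.295 $$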
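The arithmetic on this slide can be checked from the summary statistics alone. The sketch takes the mean difference of 9.093 shown on the slide (the two-decimal means 63.20 and 54.10 differ by 9.10, so 9.093 is presumably the unrounded difference). Because the denominator is the unpooled standard error, this corresponds to the "equal variances not assumed" form of the test; `t_from_summary` is a hypothetical helper.

```python
from math import sqrt

def t_from_summary(mean_diff, sd1, n1, sd2, n2):
    """Return (standard error of the difference, t) using sqrt(SD1^2/N1 + SD2^2/N2)."""
    se_diff = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return se_diff, mean_diff / se_diff

se, t = t_from_summary(9.093, 13.914, 41, 13.064, 59)
print(round(se, 3), round(t, 3))
```

This gives a standard error of about 2.759 (the slide's 2.760, up to rounding of the SDs) and t of about 3.295.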
Conclusion &
Chart
There is a significant difference in math ability between males and females.
Multiple Regression
– Which IVs can predict the DV, and how large are the effects of these variables on the DV?
Assumptions: 1. Normality 2. Equality of variance 3. Independence 4. Linear relationship
# of Variables | Characteristics | Health Survey Data (N=100)
Dependent = 1 | Continuous | LDL value (0-200)
Independent > 1 | Continuous or dichotomous (0 or 1) | HT, WT, BMI, & Exercise
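Linear regression estimates the B coefficients by least squares. A minimal sketch via the normal equations (X'X)b = X'y follows, with R² computed as regression sum of squares over total sum of squares. The health survey data are not included in these slides, so the example uses small synthetic data generated exactly from y = 1 + 2·x1 + 3·x2; solving the normal equations directly is fine for illustration but less numerically stable than QR-based approaches. `solve` and `ols` are hypothetical helpers.

```python
def solve(a, b):
    """Solve the square linear system a·x = b (Gauss-Jordan elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp via the normal equations.

    Returns (coefficients [b0, b1, ..., bp], R squared = SS_regression / SS_total).
    """
    rows = [[1.0] + list(x) for x in X]          # prepend a column of 1s for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    return b, ss_reg / ss_total

# Synthetic data built exactly from y = 1 + 2*x1 + 3*x2 (not the health survey data).
X = [(1, 0), (2, 1), (3, 5), (4, 2), (5, 7)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coefficients, r_squared = ols(X, y)
print([round(c, 6) for c in coefficients], round(r_squared, 6))
```

Because the data contain no noise, the fit recovers the coefficients 1, 2, 3 and an R² of 1; real data would leave residual variability and a smaller R².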
Multiple Regression Diagram
[Diagram: IVs HT, WT, BMI, and Exercise, each with an arrow to the DV, LDL]
All 4 IVs are used to predict LDL.
Health Survey Data of N=100
Multiple Regression
1. Choose Regression.
2. Choose Linear Regression.
1. Choose the DV, IVs, and Method.
2. Choose the Statistics you need.
3. Choose Residual Plots.
Descriptives & Correlation Tables
[Output: descriptive statistics; correlation coefficients and corresponding p-values]
Main Analysis
R = the correlation between the predicted and observed values of the DV
R² = how much of the variability in the outcome is accounted for by the predictors (regression sum of squares / total sum of squares)
Adj. R² = R² adjusted for the number of parameters in the model
ANOVA F test = a global test to see if any coefficient is different from 0
B = regression coefficient
Beta = standardized regression coefficient; be suspicious if |Beta| > 1, which often signals multicollinearity
t & Sig = the predictability of each IV
Partial/Part correlations
Tolerance & VIF = collinearity statistics
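Several quantities in this output can be recomputed from simpler pieces: Adj. R² from R², the sample size n and the number of predictors p; Beta from B and the standard deviations of the IV and DV; and Tolerance and VIF from R²_j, the R² obtained by regressing one predictor on the other predictors. The helpers below are a hypothetical sketch of these standard formulas. For example, if R² were exactly 0.40 with n = 100 and 4 predictors, the adjusted R² would be about 0.375.

```python
from statistics import stdev

def adjusted_r2(r2, n, p):
    """Adj. R² = 1 - (1 - R²)(n - 1) / (n - p - 1), with p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def standardized_beta(b, x, y):
    """Beta = B * SD(x) / SD(y): the coefficient expressed in standard-deviation units."""
    return b * stdev(x) / stdev(y)

def tolerance_and_vif(r2_j):
    """Tolerance = 1 - R²_j (predictor j regressed on the other IVs); VIF = 1 / Tolerance."""
    tol = 1 - r2_j
    return tol, 1 / tol

print(round(adjusted_r2(0.40, 100, 4), 3))  # → 0.375
print(tolerance_and_vif(0.75))              # → (0.25, 4.0)
```

A low tolerance (high VIF) means a predictor is largely explained by the other IVs, which is the same situation in which |Beta| can exceed 1.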
Residual Analysis
[Plots checking residual normality, linearity and equal variance, and residual independence]
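Residual checks start from the residuals (observed minus predicted values). A sketch of two numeric companions to the residual plots follows, assuming residuals from a fitted model: standardized residuals (residual divided by the standard error of the estimate, which is how we understand SPSS's ZRESID to be defined) for judging normality and outliers, and the Durbin-Watson statistic for residual independence, where values near 2 suggest no first-order autocorrelation. The observed and predicted values are hypothetical.

```python
from math import sqrt

def residuals(observed, predicted):
    """Residual = observed DV value minus predicted value."""
    return [o - p for o, p in zip(observed, predicted)]

def standardized_residuals(e, n_predictors):
    """Residuals divided by the standard error of the estimate, sqrt(SSE / (n - p - 1))."""
    n = len(e)
    see = sqrt(sum(r * r for r in e) / (n - n_predictors - 1))
    return [r / see for r in e]

def durbin_watson(e):
    """Durbin-Watson statistic; near 2 = no first-order autocorrelation in the residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(r * r for r in e)

# Hypothetical observed and predicted values, listed in data order.
e = residuals([10, 12, 9, 14, 11, 13], [11, 11, 10, 13, 12, 12])
print(e)                                   # → [-1, 1, -1, 1, -1, 1]
print(durbin_watson(e))
print(standardized_residuals(e, n_predictors=1))
```

Strictly alternating residuals give a Durbin-Watson value of about 3.33, well above 2, which signals negative autocorrelation rather than independent residuals.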
Conclusion: Multiple Regression
The IVs explain about 40% of the variability in LDL level.
The significant predictors of LDL were BMI and hours of exercise.
The collinearity statistics did not show exceptionally large multicollinearity among the predictors.
The assumptions of residual normality and equal variance were met.
Key Concepts
Statistical models depend on theory and data. Choose your model carefully so that it can answer your research questions.
Check assumptions. Model conclusions may not be valid unless the assumptions are met. If they are not, apply appropriate corrections, transform the data, or use other statistical methods.
Conclusions
Statistical judgments enter our daily lives. Statistics are more than mathematical calculations or scientific research; they are a way of thinking logically…
Thank you