TRAINING ON DATA ANALYSIS AND VISUALIZATION
22 April 2024, 8AM-5PM
IPB Seminar Room
Lecture 1: Quantitative Research Designs and Basic Concepts
Asst. Prof. Roselle V. Collado (INSTAT)
Topic Outline
Research Design Options
Quantitative Research Designs
Basic Concepts
Descriptive Statistics
Inferential Statistics
What is Research?
Traits
o systematic
o objective
o uses information
Purposes
o improve decision-making
o present solution to problems
o open opportunities for further research
The Research Process
Problem Identification
Development of Approach to the Problem
Research Design Formulation
Data Collection
Data Preparation and Analysis
Report Preparation and Presentation
Research Design Considerations
Nature of the Problem
Research Objectives
Intended Analysis
Time Allotment
Resource Availability/Budgetary Constraints
Needs of the End-User
Research Designs
o Qualitative Research Design
starts with observations collected
observations are used to create theories and hypotheses
o Quantitative Research Design
starts from theory from which hypotheses are formulated
hypotheses are operationalized and empirical evidence is collected
involves obtaining and interpreting data in order to develop a breadth of
knowledge
o Mixed Methods Design
utilizes both quantitative and qualitative approaches in collecting data to
enrich the findings of the research
Quantitative Research Designs
Experimental Research Design
o Example: Classic monggo germination experiment
Objective: Determine what is needed for monggo seeds to germinate
Treatments: No Water, No Air, No Light, Has All
Experimental Units: Monggo seeds in plastic cups
o Concepts:
randomization – preserves objectivity
replication – at least 2 experimental units treated alike
local control – covers all other practices that keep conditions uniform;
everything is homogeneous except for the treatment being administered
Survey Research Design
o Example: Current State of Agrarian Reform Beneficiaries: Its Implication to the
Comprehensive Agrarian Reform Program
Pseudo-Experimental Research Design
o Example: Large Class vs. Very Large Class Study for STAT1
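The randomization and replication concepts from the monggo germination example can be sketched in R. This is only an illustration: the number of cups per treatment (3), the seed, and the names treatments, n_reps, and layout are assumptions made for the example, not part of the lecture.

```r
# Hypothetical layout: 4 treatments x 3 replicate cups (replication count assumed)
treatments <- c("No Water", "No Air", "No Light", "Has All")
n_reps <- 3

set.seed(2024)  # arbitrary seed so the shuffle is reproducible

# Randomization: shuffle the treatment labels across the 12 cups
layout <- data.frame(
  cup = 1:(length(treatments) * n_reps),
  treatment = sample(rep(treatments, times = n_reps))
)

# Replication check: every treatment should appear n_reps times
table(layout$treatment)
```

Shuffling the labels with sample() means no one chooses which cup gets which treatment, which is how randomization preserves objectivity.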
Mixed Methods Design
Triangulation or Parallel Design
o researchers implement the qualitative and quantitative data collection
simultaneously and the results are compared
Embedded Design
o uses one method of collecting data to support the data collected using the
other
Explanatory Sequential
o first collect and analyze quantitative data, then collect and analyze qualitative
data to explain the quantitative findings
Exploratory Sequential
o first collect and analyze qualitative data, then collect and analyze quantitative
data to test and analyze the qualitative findings
Measure of Skewness: characterizes the symmetry of the distribution
possible values of SK:
o SK = 0 (symmetric distribution)
o SK < 0 (negatively skewed; skewed to the left)
o SK > 0 (positively skewed; skewed to the right)
Measure of Kurtosis: characterizes the flatness or peakedness of the distribution as
well as the lightness or heaviness of its tails
possible values of K:
o K = 0 (mesokurtic)
o K < 0 (platykurtic)
o K > 0 (leptokurtic)
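As a rough sketch, SK and K can be computed in base R from the central moments. The functions skewness() and excess_kurtosis() below are hypothetical helpers written for this example (base R has no built-in for either), they use one common moment-based definition among several, and the data are made up. K is taken as excess kurtosis so that 0 corresponds to mesokurtic, matching the values listed above.

```r
# Moment-based skewness and excess kurtosis (one common definition; others exist)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
excess_kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

symmetric  <- c(1, 2, 3, 4, 5)       # made-up values, symmetric about 3
right_tail <- c(1, 1, 2, 2, 3, 10)   # made-up values with one large observation

skewness(symmetric)          # 0: symmetric distribution
skewness(right_tail)         # positive: skewed to the right
excess_kurtosis(symmetric)   # negative: platykurtic
```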
Test of Hypothesis
Null Hypothesis (Ho)
Alternative Hypothesis (Ha)
Decision and Errors in Hypothesis Testing
Errors in Hypothesis Testing:
o Type I Error: rejecting a true null hypothesis
o Type II Error: failing to reject a false null hypothesis
Level of Significance (α)
Test Statistic: a statistic or its transformation which provides a basis for
determining when to reject Ho
Level of Significance of a test: a measure of risk of rejecting Ho when it is actually
true
Decision Rule
o a specification of that region for which the test statistic leads to the rejection
of Ho
o the region specified is referred to as the critical region of the test
p-value of a test
o measures the weight of the evidence for rejecting (or failing to reject) Ho
o a p-value smaller than the set level of significance supports rejection of Ho
o a smaller p-value is stronger evidence against Ho: if Ho were true, there would be
a smaller chance of generating data at least as extreme as the current data
Steps in testing a statistical hypothesis
state the null and alternative hypotheses
identify the test statistic and its distribution
specify the level of significance of the test
state the decision rule
collect the data and perform the computations
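The steps above can be walked through with a one-sample t-test in base R. This is a minimal sketch under stated assumptions: the heights are made-up numbers, and the hypothesized mean of 12 and the 0.05 level of significance are chosen only for illustration.

```r
# Made-up sample of plant heights in cm (illustrative only)
heights <- c(12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.8, 12.2)

mu0 <- 12      # Ho: mean = 12   vs   Ha: mean != 12 (assumed hypotheses)
alpha <- 0.05  # level of significance, fixed before looking at the data

# Test statistic: one-sample t, which follows a t distribution
# with n - 1 degrees of freedom when Ho is true
result <- t.test(heights, mu = mu0, alternative = "two.sided")

# Decision rule: reject Ho when the p-value is smaller than alpha
reject_ho <- result$p.value < alpha

result$statistic   # computed t value
result$p.value     # p-value of the test
reject_ho          # TRUE means Ho is rejected at the chosen alpha
```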
Statistical Methods
Parametric Tests
o impose strict assumptions on the distribution of the parent population
o applicable to data measured at least on an interval scale
Non-parametric Tests
o do not impose assumptions on the distribution of the parent population
(distribution-free methods)
o applicable to qualitative data, or data not conforming to normality
Test on Two Populations
Types of Samples:
1. Independent – selection of samples was done separately from the two
populations
2. Related – observations come in pairs that were collected either as matched-pairs
or self-paired observations
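The difference between independent and related samples shows up in how the test is called in R. The sketch below uses made-up scores; t.test() and wilcox.test() are base R functions, and the Wilcoxon rank-sum test is included only as one example of a non-parametric counterpart.

```r
# Made-up scores (illustrative only)
group_a <- c(78, 85, 69, 90, 74, 88)
group_b <- c(72, 80, 65, 79, 70, 77)

# Independent samples: the two groups were drawn separately
indep    <- t.test(group_a, group_b)        # parametric (Welch t-test by default)
rank_sum <- wilcox.test(group_a, group_b)   # non-parametric (rank-sum test)

# Related samples: the same 6 subjects measured before and after
before <- c(78, 85, 69, 90, 74, 88)
after  <- c(80, 89, 70, 95, 73, 94)
paired <- t.test(before, after, paired = TRUE)   # tests the mean of the differences

paired$parameter   # degrees of freedom: number of pairs minus 1
```

Treating paired data as independent ignores the pairing and usually loses precision, so the type of sample should be settled before choosing the test.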
Common Measures of Association
1. Pearson
2. Spearman
3. Kendall’s Coefficient
4. Cramer’s V
5. Phi-Coefficient
6. Point-Biserial/Biserial Correlation
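Several of these measures are available in base R. The sketch below uses made-up data: cor() covers Pearson, Spearman, and Kendall through its method argument, while Cramer's V is computed by hand from the chi-square statistic because base R has no dedicated function for it. The names tab, chi2, and cramers_v are introduced only for this example.

```r
# Made-up paired measurements (illustrative only)
x <- c(2, 4, 5, 7, 9, 12)
y <- c(1, 3, 4, 8, 9, 15)

cor(x, y, method = "pearson")    # linear association
cor(x, y, method = "spearman")   # based on ranks
cor(x, y, method = "kendall")    # based on concordant and discordant pairs

# Cramer's V for two categorical variables, from the chi-square statistic
tab <- matrix(c(20, 10, 5, 25), nrow = 2)   # made-up 2 x 2 table of counts
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
cramers_v <- sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))
cramers_v   # for a 2 x 2 table this equals the absolute value of the phi coefficient
```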
Lecture 2: R Programming
Jeremiah
R Programming
data analysis software and a complete, interactive programming language
open-source software project
contains, out of the box, advanced statistical routines not available in other packages
Objects
Numeric (1,2,3…)
Logical (TRUE, FALSE)
Character ("Text", "123")
Operators
Arithmetic Operators: +, -, *, /, ^, %%
Assignment Operators: <-, =
Comparison Operators: == (equal), != (not equal), >, >=, <, <=
Logical Operators: & "and" (returns TRUE if both are TRUE), | "or" (returns TRUE if at
least one is TRUE), ! "not" (returns TRUE if FALSE, and vice versa)
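A few of these operators in action, as a short sketch. The variable a and the values used are arbitrary choices for illustration.

```r
7 %% 3              # modulo: remainder of 7 divided by 3, which is 1
2^3                 # exponentiation: 8
a <- 5              # assignment with <-
a == 5              # TRUE
a != 5              # FALSE
(a > 0) & (a < 3)   # FALSE: only the first condition holds
(a > 0) | (a < 3)   # TRUE: at least one condition holds
!(a > 0)            # FALSE: negation of TRUE
```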
Functions
Example: square function
> square <- function(x){
+ return(x^2)
+ }
> square(5)
[1] 25
Vectors
a collection containing multiple values of the same type
created using c() function
limitation: same type of values
example:
> x <- c(1,2,3,4,5)
> x
[1] 1 2 3 4 5
Colon Operator
> 1:5
[1] 1 2 3 4 5
Vector Operations
applies the operation on every element of the vector
Example:
> x <- c(1,2,3,4,5)
> x+1
[1] 2 3 4 5 6
> x <- c(1,2,3)
> y <- c(4,5,6)
> c(x,y)
[1] 1 2 3 4 5 6
Vector Indexing
> x <- 10:20
> x
[1] 10 11 12 13 14 … 20
> x[2]
[1] 11
> x[8]
[1] 17
> x[c(1,8,2)]
[1] 10 17 11
> x[2:4]
[1] 11 12 13
excluding
> x[-10]
[1] 10 11 12 13 14 15 16 17 18 20
logical values
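Indexing with logical values keeps the elements where the index is TRUE. A minimal sketch, reusing the same vector 10:20 from the indexing examples; the conditions are arbitrary choices for illustration.

```r
x <- 10:20
x > 15           # logical vector: TRUE where the element exceeds 15
x[x > 15]        # keeps 16 17 18 19 20
x[x %% 2 == 0]   # keeps the even values: 10 12 14 16 18 20
```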
Conditionals
creates branches to where code will be routed
> x <- 6
> if (x > 0){
+ print("Positive Number")
+ }
[1] "Positive Number"
if (x > 0) {
  print("Positive Number")
} else if (x < 0) {
  print("Negative Number")
} else {
  print("Zero")
}
While Loop
iterates the loop while condition is true
i <- 1
while (i <= 10) {
print(i)
i <- i+1
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
…
[1] 10
For Loop
iterates the loop for every element in the given vector
known number of steps
for(i in 1:10){
print (i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
…
[1] 10
List
a collection containing multiple elements, such as vectors, that can be of different types and lengths
created using the list() function; elements can be accessed through their names with the $ operator
> x <- list(A=1, B=c(2,2,2,6,8), C=3)
> x
$A
[1] 1
$B
[1] 2 2 2 6 8
$C
[1] 3
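Elements of a list can be reached in more than one way. The sketch below rebuilds the same list and shows the $ operator next to double-bracket access by name and by position.

```r
x <- list(A = 1, B = c(2, 2, 2, 6, 8), C = 3)
x$B        # element named B, via the $ operator
x[["B"]]   # same element, via double brackets and the name
x[[2]]     # same element, by position
x$B[4]     # fourth value inside B: 6
```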
is.odd <- function(x){
  if (x %% 2 == 0) {   # remainder after dividing by 2
    return("Even")
  } else {
    return("Odd")
  }
}
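get.factors <- function(x){
  x %% 1:x == 0   # TRUE where the divisor leaves no remainder
}
> get.factors(10)
[1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
get.factors <- function(x){
  return((1:x)[x %% 1:x == 0])   # keep only the divisors flagged TRUE
}
> get.factors(10)
[1]  1  2  5 10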
number.info <- function(x){
  info <- list(square.root = sqrt(x), square = x^2, odd = is.odd(x),
               factors = get.factors(x), prime = length(get.factors(x)) == 2)
  return(info)
}
> number.info(25)
$square.root
[1] 5
$square
[1] 625
$odd
[1] "Odd"
$factors
[1]  1  5 25
$prime
[1] FALSE
Data Frame
a two-dimensional table; a list of vectors that all have the same length
> df <- data.frame(A=1:3, B=4:6, C=7:9)
> df
  A B C
1 1 4 7
2 2 5 8
3 3 6 9
Functions
dim(x) – number of rows and columns
t(x) – transpose
nrow(x) – number of rows
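Applied to a small data frame, these functions behave as follows. The sketch rebuilds the 3 x 3 data frame from the Data Frame example so that it runs on its own.

```r
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)
dim(df)    # 3 3: number of rows, then number of columns
nrow(df)   # 3
t(df)      # transpose: rows become A, B, C (the result is a matrix, not a data frame)
```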
Packages
collections of functions and datasets developed by the community
to install a package: install.packages("<package>")
to load a package: library("<package>")
Exercise:
import data: File > Import Dataset > From Excel
Lecture 3: Data Analysis and Visualization
Christopher Dela Cruz
Quantitative Research
exploration of numeric patterns
relies on data that can be measured
Quantitative Variables
numerical in nature
can be easily quantified
defined units of measurement
types:
o discrete: counted in whole numbers (e.g. class size, number of clients)
o continuous: infinite number of possible values (e.g. plant height, yield)
levels of measurement
o nominal – categories only; counting is the highest level of analysis possible
o ordinal – ranking; intervals may not be equal
o interval – intervals between ranks or values are equal; no true zero values
(e.g. temperature)
o ratio – with equal intervals and true zero values (e.g. height, weight)
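In R, the level of measurement affects how a variable is stored. The sketch below uses made-up values: factor() for nominal data, an ordered factor for ordinal data, and a plain numeric vector for ratio data. The variable names are hypothetical.

```r
# Made-up survey responses (illustrative only)
crop <- factor(c("rice", "corn", "rice", "monggo"))   # nominal: categories only
table(crop)                                           # counting is the available summary

rating <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)                      # ordinal: categories with a rank order
rating[2] > rating[1]                                 # TRUE: "high" ranks above "low"

height_cm <- c(150.5, 162.0, 171.3)                   # ratio: true zero, so ratios are meaningful
height_cm[2] / height_cm[1]
```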
Data Analysis Process
1. Ask (research objectives)
2. Prepare (data collection)
3. Process (data cleaning, adjustments)
4. Analyze (statistical tests)
5. Share (visualization, interpretation)
6. Act (storytelling)
Correlation Coefficient
between -1 and 1
strength and direction of a linear relationship
unit-free
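These properties can be checked with cor() in base R. The heights and weights below are made up for illustration; the conversion factor of 2.54 cm per inch is the only real-world constant used.

```r
# Made-up data (illustrative only)
height_cm <- c(150, 160, 165, 172, 180)
weight_kg <- c(50, 56, 61, 68, 77)

r <- cor(height_cm, weight_kg)   # Pearson correlation by default
r                                # positive and close to 1: strong, increasing linear relationship

# Unit-free: converting centimeters to inches leaves r unchanged
cor(height_cm / 2.54, weight_kg)

# Direction: reversing one variable flips the sign
cor(-height_cm, weight_kg)
```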