
TRAINING ON DATA ANALYSIS AND VISUALIZATION

22 April 2024, 8AM-5PM


IPB Seminar Room

Lecture 1: Quantitative Research Designs and Basic Concepts


Asst. Prof. Roselle V. Collado (INSTAT)

Topic Outline
 Research Design Options
 Quantitative Research Designs
 Basic Concepts
 Descriptive Statistics
 Inferential Statistics

What is Research?
 Traits
o systematic
o objective
o uses information
 Purposes
o improve decision-making
o present solution to problems
o open opportunities for further research

The Research Process


 Problem Identification
 Development of Approach to the Problem
 Research Design Formulation
 Data Collection
 Data Preparation and Analysis
 Report Preparation and Presentation

Research Design Considerations


 Nature of the Problem
 Research Objectives
 Intended Analysis
 Time Allotment
 Resource Availability/Budgetary Constraints
 Needs of the End-User

Research Designs
o Qualitative Research Design
 starts with observations collected
 observations are used to create theories and hypotheses
o Quantitative Research Design
 starts from theory from which hypotheses are formulated
 hypotheses are operationalized and empirical evidence is collected
 involves obtaining and interpreting data in order to develop a breadth of
knowledge
o Mixed Methods Design
 utilizes both quantitative and qualitative approaches in collecting data to
enrich the findings of the research

Quantitative Research Designs


 Experimental Research Design
o Example: Classic monggo germination experiment
 Objective: Determine what is needed for monggo seeds to germinate
 Treatments: No Water, No Air, No Light, Has All
 Experimental Units: Monggo seeds in plastic cups
o Concepts:
 randomization – preserves objectivity
 replication – at least 2 experimental units treated alike
 local control – all other practices that keep conditions uniform; everything else is
homogeneous except for the treatment being administered
 Survey Research Design
o Example: Current State of Agrarian Reform Beneficiaries: Its Implication to the
Comprehensive Agrarian Reform Program
 Pseudo-Experimental Research Design
o Example: Large Class vs. Very Large Class Study for STAT1
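
As a minimal sketch of randomization and replication for the monggo example, the treatments can be assigned to cups at random in R. The number of replicates, the seed, and the name cup_plan are assumptions for illustration; they are not from the lecture.

```r
# Treatments from the monggo germination example
treatments <- c("No Water", "No Air", "No Light", "Has All")
n_reps <- 3  # assumed number of replicates per treatment

set.seed(2024)  # assumed seed, only to make the assignment reproducible

# Replication: each treatment appears n_reps times.
# Randomization: sample() shuffles which cup receives which treatment.
cup_plan <- data.frame(
  cup = 1:(length(treatments) * n_reps),
  treatment = sample(rep(treatments, times = n_reps))
)

print(cup_plan)
table(cup_plan$treatment)  # each treatment is assigned to n_reps cups
```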

Mixed Methods Design


 Triangulation or Parallel Design
o researchers implement the qualitative and quantitative data collection
simultaneously and the results are compared
 Embedded Design
o uses one method of collecting data to support the data collected using the
other
 Explanatory Sequential
o first collect and analyze quantitative data, then collect and analyze qualitative
data to explain the quantitative findings
 Exploratory Sequential
o first collect and analyze qualitative data, then collect and analyze quantitative
data to test and analyze the qualitative findings

Measure of Skewness: characterizes the symmetry of the distribution


 possible values of SK:
o SK = 0 (symmetric distribution)
o SK < 0 (negatively skewed; skewed to the left)
o SK > 0 (positively skewed; skewed to the right)

Measure of Kurtosis: characterizes the flatness or peakedness of the distribution as
well as the lightness or heaviness of its tails
 possible values of K:
o K = 0 (mesokurtic)
o K < 0 (platykurtic)
o K > 0 (leptokurtic)
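
The lecture does not give formulas for SK and K. The sketch below assumes one common moment-based definition, with K computed as excess kurtosis so that K = 0 corresponds to the normal curve. The function names and sample values are made up for illustration.

```r
# Moment-based skewness: third central moment over variance^(3/2)
skewness <- function(x) {
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^(3/2)
}

# Excess kurtosis: fourth central moment over variance^2, minus 3
kurtosis <- function(x) {
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m4 <- mean((x - m)^4)
  m4 / m2^2 - 3
}

symmetric <- c(1, 2, 3, 4, 5)
right_tail <- c(1, 1, 2, 2, 3, 10)

skewness(symmetric)   # 0 for a symmetric sample
skewness(right_tail)  # positive: skewed to the right
kurtosis(symmetric)   # negative: flatter than the normal curve (platykurtic)
```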

Test of Hypothesis
 Null Hypothesis (Ho)
 Alternative Hypothesis (Ha)

Decision and Errors in Hypothesis Testing


 Errors in Hypothesis Testing:
o Type I Error: rejecting a true null hypothesis
o Type II Error: failing to reject a false null hypothesis
 Level of Significance (α)
 Test Statistic: a statistic or its transformation which provides a basis for
determining when to reject Ho

 Level of Significance of a test: a measure of the risk of rejecting Ho when it is actually
true

 Decision Rule
o a specification of that region for which the test statistic leads to the rejection
of Ho
o the region specified is referred to as the critical region of the test

 p-value of a test
o measures the weight of the evidence for rejecting (or failing to reject) Ho
o a p-value smaller than the set level of significance supports rejection of Ho
o a smaller p-value is stronger evidence against Ho: if Ho were true, there would be a
smaller chance of obtaining data at least as extreme as the current data

Steps in testing a statistical hypothesis


 state the null and alternative hypotheses
 identify the test statistic and its distribution
 specify the level of significance of the test
 state the decision rule
 collect the data and perform the computations
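
The steps above can be sketched in R with a one-sample t-test. The data, the hypothesized mean of 12, and α = 0.05 are assumptions chosen for illustration only.

```r
# Hypothetical sample of plant heights (cm); values are made up for illustration
heights <- c(12.1, 11.8, 12.6, 13.0, 12.4, 12.9, 11.9, 12.7)

# Step 1: Ho: mean = 12 versus Ha: mean != 12
mu0 <- 12

# Step 2: the test statistic is t with n - 1 degrees of freedom
# Step 3: level of significance
alpha <- 0.05

# Steps 4-5: compute the test, then apply the decision rule
# (reject Ho if the p-value is smaller than alpha)
result <- t.test(heights, mu = mu0, alternative = "two.sided")
result$statistic
result$p.value

if (result$p.value < alpha) {
  print("Reject Ho")
} else {
  print("Fail to reject Ho")
}
```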

Statistical Methods
 Parametric Tests
o impose strict assumptions on the distribution of the parent population
o applicable to data measured in interval scale
 Non-parametric Tests
o do not impose assumptions on the distribution of the parent population
(distribution-free methods)
o applicable to qualitative data, or data not conforming to normality
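
As a rough illustration of the two families, the same two-group comparison can be run with a parametric test (t-test) and a non-parametric counterpart (Wilcoxon rank-sum test). The data are made up, and pairing these two particular tests is an assumption for the sketch; the lecture does not name specific tests here.

```r
# Hypothetical measurements from two groups; values are made up for illustration
group_a <- c(5.1, 4.8, 5.6, 5.0, 4.9, 5.3)
group_b <- c(5.4, 5.7, 5.9, 5.2, 6.0, 5.5)

# Parametric: assumes the parent populations are normally distributed
param <- t.test(group_a, group_b)

# Non-parametric: works on ranks, with no normality assumption
nonparam <- wilcox.test(group_a, group_b)

param$p.value
nonparam$p.value
```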

Test on Two Populations


Types of Samples:
1. Independent – selection of samples was done separately from the two
populations
2. Related – observations come in pairs that were collected either as matched-pairs
or self-paired observations
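
The distinction matters when running the test. The sketch below, using made-up before/after values, shows how the same numbers are analyzed as related samples (paired = TRUE) or as independent samples in R's t.test().

```r
# Hypothetical yields for the same six plots; values are made up for illustration
before <- c(5.1, 4.8, 5.6, 5.0, 4.9, 5.3)
after  <- c(5.4, 5.0, 5.9, 5.1, 5.3, 5.5)

# Related samples: each 'after' value is paired with the same unit's 'before' value
paired_test <- t.test(after, before, paired = TRUE)

# Independent samples: the two vectors are treated as separate groups
# (Welch's t-test is the default in R)
indep_test <- t.test(after, before)

paired_test$p.value
indep_test$p.value
```

With these values the paired analysis gives the smaller p-value, because pairing removes the plot-to-plot variation from the comparison.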

Common Measures of Association


1. Pearson
2. Spearman
3. Kendall’s Coefficient
4. Cramer’s V
5. Phi-Coefficient
6. Point-Biserial/Biserial Correlation
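
The first three measures are available in base R through cor() and its method argument. The data below are made up for illustration; Cramer's V and the phi-coefficient are not part of base R and are not shown.

```r
# Hypothetical paired measurements; values are made up for illustration
height <- c(150, 155, 160, 165, 170, 175)
weight <- c(48, 54, 53, 60, 66, 72)

cor(height, weight, method = "pearson")   # linear association
cor(height, weight, method = "spearman")  # association between ranks
cor(height, weight, method = "kendall")   # concordant vs. discordant pairs
```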

Lecture 2: R Programming
Jeremiah

R Programming
 data analysis software and a complete, interactive programming language
 open-source software project
 includes advanced statistical routines out of the box, some of which are not available
in other packages

Objects
 Numeric (1,2,3…)
 Logical (TRUE, FALSE)
 Character ("Text", "123")

Operators
 Arithmetic Operators: +, -, *, /, ^, %%
 Assignment Operators: <-, =
 Comparison Operators: == (equality), != (not equal), >, >=, <, <=
 Logical Operators: & “and” (returns TRUE if both are true), | “or” (returns TRUE if either
one is true), ! “not” (returns TRUE if false, and vice versa)
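
A few of these operators in action, as a short sketch (the value assigned to x is arbitrary):

```r
7 %% 2             # remainder of 7 divided by 2: 1
2 ^ 3              # exponentiation: 8
x <- 5             # assignment
x == 5             # TRUE
x != 5             # FALSE
(x > 0) & (x < 3)  # FALSE: only the first condition is true
(x > 0) | (x < 3)  # TRUE: at least one condition is true
!(x > 0)           # FALSE: negation of TRUE
```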

Functions
 Example: square function
> square <- function(x){
+ return(x^2)
+ }
> square(5)
[1] 25

Vectors
 a collection containing multiple values of the same type
 created using c() function
 limitation: same type of values
 example:
> x <- c(1,2,3,4,5)
> x
[1] 1 2 3 4 5

Colon Operator
> 1:5
[1] 1 2 3 4 5

Vector Operations
 applies the operation on every element of the vector
 Example:
> x <- c(1,2,3,4,5)
> x+1
[1] 2 3 4 5 6

> x <- c(1,2,3)
> y <- c(4,5,6)
> c(x,y)
[1] 1 2 3 4 5 6

Vector Indexing

> x <- 10:20
> x
[1] 10 11 12 13 14 … 20
> x[2]
[1] 11
> x[8]
[1] 17
> x[c(1,8,2)]
[1] 10 17 11
> x[2:4]
[1] 11 12 13

 excluding
> x[-10]
[1] 10 11 12 13 14 15 16 17 18 20

 logical values: a TRUE/FALSE vector selects the elements at the TRUE positions
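
A short sketch of indexing with logical values, reusing the vector 10:20 from the examples above (the conditions are arbitrary):

```r
x <- 10:20
x > 15          # logical vector, one TRUE/FALSE per element
x[x > 15]       # keeps elements where the condition is TRUE: 16 17 18 19 20
x[x %% 2 == 0]  # even values only: 10 12 14 16 18 20
```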

Conditionals
 creates branches to where code will be routed

> x <- 6
> if (x > 0){
+ print("Positive Number")}
[1] "Positive Number"

if (x > 0) {
  print("Positive Number")
} else if (x < 0) {
  print("Negative Number")
} else {
  print("Zero")
}

While Loop
 iterates the loop while condition is true

i <- 1
while(i <= 10) {
print(i)
i <- i+1
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

[1] 10

For Loop
 iterates the loop for every element in the given vector
 known number of steps

for(i in 1:10){
print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

[1] 10

List
 collection containing multiple elements (such as vectors) that can be of different types and lengths
 created using list() function; elements can be accessed through their names and the $ operator

> x <- list(A=1, B=c(2,2,2,6,8), C=3)
> x
$A
[1] 1
$B
[1] 2 2 2 6 8
$C
[1] 3
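
A brief sketch of accessing list elements by name, using the same list as above:

```r
x <- list(A = 1, B = c(2, 2, 2, 6, 8), C = 3)

x$B        # element named B: 2 2 2 6 8
x[["B"]]   # same element, accessed by name with [[ ]]
x$B[4]     # fourth value inside B: 6
names(x)   # "A" "B" "C"
```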

is.odd <- function(x){
  if (x %% 2 == 0) {
    return("Even")  # remainder after dividing by 2 is 0
  } else {
    return("Odd")
  }
}

get.factors <- function(x){
  x %% 1:x == 0
}
> get.factors(10)
[1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE

get.factors <- function(x){
  return((1:x)[x %% 1:x == 0])
}
> get.factors(10)
[1]  1  2  5 10

number.info <- function(x){
  info <- list(square.root = sqrt(x), square = x^2, odd = is.odd(x),
               factors = get.factors(x), prime = length(get.factors(x)) == 2)
  return(info)
}
> number.info(25)
$square.root
[1] 5

$square
[1] 625

$odd
[1] "Odd"

$factors
[1]  1  5 25

$prime
[1] FALSE

Data Frame
 two-dimensional structure: a list of vectors that all have the same length

> df <- data.frame(A=1:3, B=4:6, C=7:9)
> df
  A B C
1 1 4 7
2 2 5 8
3 3 6 9

Functions
dim(x) – dimensions (number of rows and columns)
t(x) – transpose
nrow(x) – number of rows
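
These functions can be tried on the data frame defined above, as a short sketch:

```r
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)

dim(df)    # 3 3 (rows, columns)
nrow(df)   # 3
t(df)      # transpose: the rows are now A, B, C
df$B       # column B as a vector: 4 5 6
```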

Packages
 collections of functions and datasets developed by the community
 to install a package: install.packages("<package>")
 to load a package: library("<package>")

Exercise:
 import data: File > Import Dataset > From Excel

Lecture 3: Data Analysis and Visualization


Christopher Dela Cruz

Quantitative Research
 exploration of numeric patterns
 relies on data that can be measured

Quantitative Variables
 numerical in nature
 can be easily quantified
 defined units of measurement
 types:
o discrete: counted in whole numbers (e.g. class size, number of clients)
o continuous: infinite number of possible values (e.g. plant height, yield)
 levels of measurement
o nominal – categories only; the highest analysis that can be done is counting
o ordinal – ranking; intervals may not be equal
o interval – intervals between ranks or values are equal; no true zero values
(e.g. temperature)
o ratio – with equal intervals and true zero values (e.g. height, weight)
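
As a sketch of how the first two levels can be represented in R, nominal data fit an unordered factor and ordinal data fit an ordered factor. The variable names and responses are made up for illustration.

```r
# Hypothetical survey responses; values are made up for illustration
crop <- factor(c("rice", "corn", "rice", "monggo"))   # nominal
rating <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)                      # ordinal

table(crop)       # nominal: counting is the highest analysis available
rating < "high"   # ordinal: ranking comparisons are meaningful
max(rating)       # highest rank observed
```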

Data Analysis Process


1. Ask (research objectives)
2. Prepare (data collection)
3. Process (data cleaning, adjustments)
4. Analyze (statistical tests)
5. Share (visualization, interpretation)
6. Act (storytelling)

Correlation Coefficient
 between -1 and 1
 measures the strength and direction of a linear relationship
 unit-free
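
The unit-free property can be checked with a small sketch: converting the units of both variables leaves the coefficient unchanged. The data are made up for illustration.

```r
# Hypothetical paired measurements; values are made up for illustration
height_cm <- c(150, 155, 160, 165, 170, 175)
weight_kg <- c(48, 54, 53, 60, 66, 72)

r_metric <- cor(height_cm, weight_kg)

# Convert units: centimeters to inches, kilograms to pounds
r_imperial <- cor(height_cm / 2.54, weight_kg * 2.20462)

all.equal(r_metric, r_imperial)  # TRUE: the coefficient does not depend on units
cor(height_cm, -weight_kg)       # same strength, opposite direction
```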
