
R Studio Notes

The document outlines various types of data classifications such as categorical nominal, categorical ordinal, numerical discrete, and numerical continuous. It includes survey questions related to sleep and school, along with sample data for analysis. Additionally, it discusses statistical concepts like regression coefficients, standard deviations, Z-scores, and Pearson correlation coefficients, providing examples and calculations.

Uploaded by ericaschwork

Lab 1A

● Categorical nominal: has two or more categories with no intrinsic order (ex. gender)
● Categorical ordinal: has two or more categories with a meaningful order (ex. income brackets such as $0-15000)
● Numerical discrete: countable whole-number values with no middle ground (ex. age in whole years)
● Numerical continuous: any real number within a range (ex. hours of sleep)

Survey Questions

Sleep and school

1. How much sleep do you get? (numerical continuous)
a. …hours
2. Does school affect your sleep? (categorical ordinal)
a. Strongly Agree
b. Agree
c. Neutral
d. Disagree
e. Strongly Disagree
3. How well rested do you feel on average? (categorical ordinal)
a. Very well
b. Well
c. Not well
d. Poor
4.​ What is your gender? (categorical nominal)
a.​ Male
b.​ Female
c.​ Other
d.​ Rather not say
5. How much time do you spend on school/homework per day? (categorical ordinal)
a.​ 0-1 Hours
b.​ 2-3
c.​ 3-4
d.​ 5-6
e.​ 6+
6. How many times do you wake up during the night? (numerical discrete)
a.​ …times
7.​ What is your current GPA? (categorical ordinal)
a.​ 0-1.5
b.​ 1.5-2.5
c.​ 2.5-3.75
d.​ 3.75-4
e.​ 4+
8.​ How many classes are you currently taking? (numerical discrete)
a.​ …classes

# Hours of sleep (numerical continuous), stored as numbers rather than strings
Sleephours <- c(7, 8, 8, 7, 6, 8, 7, 5, 6, 10)

Affects <- c("agree", "agree", "disagree", "agree", "agree",
             "stronglyagree", "neutral", "stronglyagree", "stronglyagree",
             "stronglydisagree")

Rest <- c("well", "well", "well", "well", "poor",
          "well", "poor", "poor", "poor", "verywell")

Gender <- c("female", "female", "male", "male", "female",
            "male", "male", "female", "male", "female")

Schoolworkhours <- c("2-3", "2-3", "2-3", "2-3", "2-3",
                     "2-3", "6+", "2-3", "2-3", "0-1")

# Times waking up per night (numerical discrete), stored as numbers
Wakeup <- c(2, 1, 0, 0, 0, 1, 0, 0, 1, 0)

GPA <- c("1.5-2.5", "2.5-3.75", "3.75-4", "3.75-4", "3.75-4",
         "1.5-2.5", "2.5-3.75", "1.5-2.5", "2.5-3.75", "4+")

# Number of classes (numerical discrete), stored as numbers
Classes <- c(3, 4, 4, 3, 4, 3, 4, 4, 4, 3)

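# Combine the survey answers into one data frame (one row per respondent)
df_lab1 <- data.frame(Sleephours, Affects, Rest, Gender,
                      Schoolworkhours, Wakeup, GPA, Classes)

# List the objects in the workspace
ls()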
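
Ordinal answers such as Rest have an order that a plain character vector does not record. The sketch below shows one way to store that order with an ordered factor. The values are the first five Rest answers from above. The level order and the spelling "notwell" are assumptions based on the survey options, since "Not well" does not appear in the sample data.

```r
# Values copied from the first five Rest answers above
rest_raw <- c("well", "well", "well", "well", "poor")

# Assumed level order, worst to best; "notwell" is a hypothetical spelling
rest_levels <- c("poor", "notwell", "well", "verywell")
rest <- factor(rest_raw, levels = rest_levels, ordered = TRUE)

# Counts per level, including levels nobody chose
rest_counts <- table(rest)
rest_counts

# Ordered factors can be compared against a level
well_or_better <- rest >= "well"
well_or_better
```

The same pattern should apply to Affects, Schoolworkhours, and GPA, each with its own level order.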

Regression Coefficient
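with(Countries, summary(welfare))

From the summary, take the 1st quartile, median, and 3rd quartile of the independent variable.

To interpret the regression coefficient, plug each of these values into:
predicted value = regression coefficient * quartile value + intercept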
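
As a rough illustration of this interpretation step, the sketch below fits a regression on a small made-up data frame and computes the predicted value at each quartile of the independent variable. The data frame `toy` and its values are hypothetical, chosen so that lifexp = 2 * welfare + 1 exactly. The Countries data is not used here.

```r
# Hypothetical data, built so that lifexp = 2 * welfare + 1 exactly
toy <- data.frame(welfare = c(1, 2, 3, 4, 5),
                  lifexp  = c(3, 5, 7, 9, 11))

mod <- lm(lifexp ~ welfare, data = toy)
intercept <- unname(coef(mod)["(Intercept)"])
slope     <- unname(coef(mod)["welfare"])

# 1st quartile, median, and 3rd quartile of the independent variable
quartiles <- unname(quantile(toy$welfare, probs = c(0.25, 0.5, 0.75)))

# Predicted value = regression coefficient * quartile value + intercept
predicted <- slope * quartiles + intercept
predicted
```

Comparing the predictions at the 1st and 3rd quartiles shows how much the dependent variable is expected to change across the middle half of the data.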

library(ggplot2)

# Scatterplot with country labels and a fitted regression line
ggplot(data = Countries, aes(x = welfare, y = lifexp)) +
  geom_point() +
  geom_text(aes(label = label), hjust = -.4, size = 2) +
  stat_smooth(method = lm) +
  ggtitle("Relationship between Social Welfare Spending and \n Life Expectancy by Country")

# Label every country
ggplot(Parties, aes(x = pop.mil, y = per_party_members)) +
  geom_point() +
  geom_text(aes(pop.mil, per_party_members, label = country), hjust = -.4, size = 4) +
  stat_smooth(method = lm) +
  ggtitle("Relationship between Populations and % of People in a Political Party")

# Label only the countries where more than 10% of people are party members
ggplot(Parties, aes(x = pop.mil, y = per_party_members)) +
  geom_point() +
  geom_text(aes(pop.mil, per_party_members, label = country), hjust = -.4, size = 4,
            data = Parties[Parties$per_party_members > 10, ]) +
  stat_smooth(method = lm) +
  ggtitle("Relationship between Populations and % of People in a Political Party")

ggplot(Parties, aes(x = pop.mil, y = per_party_members)) +
  geom_point() +
  geom_text(data = Parties[Parties$per_party_members > 10, ],
            aes(label = country),
            hjust = -0.4,
            size = 4) +
  stat_smooth(method = lm) +
  ggtitle("Relationship between Populations and % of People in a Political Party")

Standard Deviations “by hand”

1. Calculate mean (excluding missing values)
mean(Countries$lifexp, na.rm=TRUE)

2. Deviations = lifexp values - mean
deviations <- Countries$lifexp - mean(Countries$lifexp, na.rm=TRUE)

3. Remove missing values from deviations
deviations <- na.omit(deviations)

4. Deviations squared
squared_deviations <- deviations^2

5. Sum squared deviations
sum_squared_deviations <- sum(squared_deviations)

6. Calculate sample size (n) (excluding missing values)
n <- length(na.omit(Countries$lifexp))

7. Variance = sum of squared deviations / (n - 1)
variance <- sum_squared_deviations / (n - 1)

8. Square root
# Stored as sd_lifexp to avoid reusing the name of the built-in sd() function
sd_lifexp <- sqrt(variance)

9. The answer/result
sd_lifexp
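
The steps above can be checked against R's built-in sd() function. The sketch below runs them on a short made-up vector (the values are hypothetical, not taken from Countries) and compares the two results.

```r
# Hypothetical life expectancy values with one missing entry
lifexp <- c(70, 75, NA, 80, 65)

# Steps 1 to 3: mean, deviations, drop missing values
mean_lifexp <- mean(lifexp, na.rm = TRUE)
deviations <- na.omit(lifexp - mean_lifexp)

# Steps 4 to 8: square, sum, divide by n - 1, take the square root
n <- length(deviations)
sd_by_hand <- sqrt(sum(deviations^2) / (n - 1))

# The built-in function should agree with the manual result
sd_builtin <- sd(lifexp, na.rm = TRUE)
all.equal(sd_by_hand, sd_builtin)
```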

How to Calculate Z-Score

Z = (given/known value - mean) / sd
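
A minimal sketch of the formula in R, using a made-up vector of scores (hypothetical values) and the known value 9. The scale() call is included only as a cross-check, since it standardizes every value with the same mean and sample standard deviation.

```r
# Hypothetical scores
scores <- c(2, 4, 4, 4, 5, 5, 7, 9)

score_mean <- mean(scores)
score_sd <- sd(scores)

# Z = (known value - mean) / sd, here for the known value 9
z <- (9 - score_mean) / score_sd
z

# scale() standardizes every value the same way; 9 is the 8th score
z_all <- as.vector(scale(scores))
z_all[8]
```

A positive Z-score means the value lies above the mean, and a negative one means it lies below.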

HW #2

Pearson correlation coefficient = -0.4867

This is a negative association: as one variable increases, the other tends to decrease. A correlation
coefficient ranges from -1 to +1. A value of -0.4867, which is closer to -0.5 than to 0, suggests
a relationship of moderate strength.
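
For reference, a short sketch of how a Pearson coefficient can be computed in R. The paired values below are made up for illustration and are not the homework data, so the result will not be -0.4867.

```r
# Hypothetical paired observations
x <- c(1, 2, 3, 4, 5, 6)
y <- c(9, 7, 8, 4, 5, 2)

# Pearson correlation from the built-in function
r <- cor(x, y, method = "pearson")

# Same value from the definition: covariance over the product of the SDs
r_manual <- cov(x, y) / (sd(x) * sd(y))

r
all.equal(r, r_manual)
```

The sign of r gives the direction of the association, and its distance from 0 gives the strength.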

Weight on height

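# Bivariate regression of life expectancy on welfare spending
mod.1 <- lm(lifexp ~ welfare, data = Countries)

# Print regression output
mod.1

# Multiple regression with two independent variables
mod.2 <- lm(Protest ~ Democ + Corrupt, data = Protest)
mod.2

# HTML regression table (requires the stargazer package)
stargazer::stargazer(mod.2, type = "html")

# Coefficients, standard errors, and R-squared
summary(mod.2)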

Confidence Interval
#Sample
sample1 <- sample(Countries$wardeaths, 40)
sample1

#Mean
mean1 <- mean(sample1, na.rm = TRUE)
mean1

#SD
sd1 <- sd(sample1, na.rm = TRUE)
sd1

#Confidence Interval (99%, z = 2.58; sqrt(40) is about 6.3245)
#Upper bound
CI1 <- mean1 + 2.58 * (sd1 / sqrt(40))
CI1

#Lower bound
CI2 <- mean1 - 2.58 * (sd1 / sqrt(40))
CI2
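
The same calculation can be written so that the sample size comes from the data instead of being typed in. This is a sketch on a made-up sample (hypothetical values, not Countries$wardeaths); `n1`, `margin`, `ci_lower`, and `ci_upper` are names introduced here for illustration.

```r
# Hypothetical sample with one missing value
sample1 <- c(12, 15, 9, NA, 20, 14, 11, 18, 16)

mean1 <- mean(sample1, na.rm = TRUE)
sd1 <- sd(sample1, na.rm = TRUE)
n1 <- sum(!is.na(sample1))   # count only the non-missing values

# 99% confidence interval using z = 2.58, as in the notes
margin <- 2.58 * (sd1 / sqrt(n1))
ci_lower <- mean1 - margin
ci_upper <- mean1 + margin
c(lower = ci_lower, upper = ci_upper)
```

With a sample this small, a t critical value would normally replace 2.58. The z value is kept here only to mirror the notes.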

#Subset
femalesdata <- subset(Lab11, Sex == "f")
malesdata <- subset(Lab11, Sex == "m")

#R Squared for females
model <- lm(Weight ~ Height, data=femalesdata)
summary(model)$r.squared

#R Squared for males
model2 <- lm(Weight ~ Height, data=malesdata)
summary(model2)$r.squared
