Lab 1A
● Categorical nominal: has two or more categories, no intrinsic order (ex. gender)
● Categorical ordinal: has more than 2 groups/categories with a meaningful order (ex. income
brackets such as $0-15000)
● Numerical discrete: countable whole-number values, no middle ground (ex. age in years)
● Numerical continuous: any real number within a range (ex. hours of sleep)
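The four variable types above map onto R data types roughly as sketched below. This is only an illustration: the vector names and values are hypothetical and are not taken from the lab data.

```r
# Nominal: an unordered factor (categories have no ranking)
gender <- factor(c("female", "male", "other", "female"))

# Ordinal: an ordered factor, levels listed from lowest to highest
rest <- factor(c("well", "poor", "verywell", "well"),
               levels = c("poor", "notwell", "well", "verywell"),
               ordered = TRUE)

# Discrete: whole-number counts stored as integers
classes <- c(3L, 4L, 4L, 3L)

# Continuous: real-valued measurements stored as doubles
sleep_hours <- c(7.5, 8, 6.25, 7)

is.ordered(gender)  # nominal factors carry no order
is.ordered(rest)    # ordinal factors do
rest < "well"       # order comparisons are only meaningful for ordered factors
```

Storing ordinal answers as ordered factors keeps the ranking available for sorting and comparisons, which plain character vectors would lose.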
Survey Questions
Sleep and school
1. How much sleep do you get? (Numerical continuous)
a. …hours
2. Does school affect your sleep? (Categorical ordinal)
a. Strongly Agree
b. Agree
c. Neutral
d. Disagree
e. Strongly Disagree
3. How well rested do you feel on average? (categorical ordinal)
a. Very well
b. Well
c. Not well
d. Poor
4. What is your gender? (categorical nominal)
a. Male
b. Female
c. Other
d. Rather not say
5. How much time do you spend on school/homework per day? (categorical ordinal)
a. 0-1 Hours
b. 2-3
c. 3-4
d. 5-6
e. 6+
6. How many times do you wake up during the night? (numerical discrete)
a. …times
7. What is your current GPA? (categorical ordinal)
a. 0-1.5
b. 1.5-2.5
c. 2.5-3.75
d. 3.75-4
e. 4+
8. How many classes are you currently taking? (numerical discrete)
a. …classes
# Numeric values are left unquoted so that mean(), sd(), etc. work on them
Sleephours <- c(7, 8, 8, 7, 6, 8, 7, 5, 6, 10)
Affects <- c("agree", "agree", "disagree", "agree", "agree",
             "stronglyagree", "neutral", "stronglyagree", "stronglyagree", "stronglydisagree")
Rest <- c("well", "well", "well", "well", "poor",
          "well", "poor", "poor", "poor", "verywell")
Gender <- c("female", "female", "male", "male", "female",
            "male", "male", "female", "male", "female")
Schoolworkhours <- c("2-3", "2-3", "2-3", "2-3", "2-3",
                     "2-3", "6+", "2-3", "2-3", "0-1")
# Counts are numeric (discrete), so they are left unquoted
Wakeup <- c(2, 1, 0, 0, 0, 1, 0, 0, 1, 0)
GPA <- c("1.5-2.5", "2.5-3.75", "3.75-4", "3.75-4", "3.75-4",
         "1.5-2.5", "2.5-3.75", "1.5-2.5", "2.5-3.75", "4+")
# Counts are numeric (discrete), so they are left unquoted
Classes <- c(3, 4, 4, 3, 4, 3, 4, 4, 4, 3)
df_lab1 <- data.frame(Sleephours, Affects, Rest, Gender,
                      Schoolworkhours, Wakeup, GPA, Classes)
ls()
Regression Coefficient
with(Countries, summary(welfare))
1st quartile
Median
3rd quartile
Interpretation: predicted value = regression coefficient * (quartile value of the independent
variable) + intercept
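The quartile-based interpretation above can be sketched as follows. Since the Countries data is not included in these notes, this uses R's built-in cars dataset as a stand-in (speed as the independent variable, dist as the dependent variable); the variable names mod, q, and predicted are only illustrative.

```r
# Stand-in data: built-in `cars` (speed = independent, dist = dependent)
mod <- lm(dist ~ speed, data = cars)

intercept <- coef(mod)[["(Intercept)"]]
slope     <- coef(mod)[["speed"]]

# 1st quartile, median, and 3rd quartile of the independent variable
q <- quantile(cars$speed, probs = c(0.25, 0.5, 0.75))

# Predicted value at each quartile: coefficient * quartile + intercept
predicted <- slope * q + intercept
predicted

# The same predictions using predict(), as a cross-check
check <- predict(mod, newdata = data.frame(speed = unname(q)))
check
```

Comparing the predicted values at the 1st and 3rd quartiles gives a concrete sense of how much the dependent variable changes across the middle half of the independent variable.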
library(ggplot2)
ggplot(data = Countries, aes(x = welfare, y = lifexp)) +
  geom_point() +
  geom_text(aes(label = label), hjust = -.4, size = 2) +
  stat_smooth(method = lm) +
  ggtitle("Relationship between Social Welfare Spending and\nLife Expectancy by Country")
ggplot(Parties, aes(x = pop.mil, y = per_party_members)) +
  geom_point() +
  geom_text(aes(pop.mil, per_party_members, label = country), hjust = -.4, size = 4) +
  stat_smooth(method = lm) +
  ggtitle("Relationship between Populations and % of People in a Political Party")
# Label only the countries where more than 10% are party members
ggplot(Parties, aes(x = pop.mil, y = per_party_members)) +
  geom_point() +
  geom_text(aes(pop.mil, per_party_members, label = country), hjust = -.4, size = 4,
            data = Parties[Parties$per_party_members > 10, ]) +
  stat_smooth(method = lm) +
  ggtitle("Relationship between Populations and % of People in a Political Party")
ggplot(Parties, aes(x = pop.mil, y = per_party_members)) +
  geom_point() +
  geom_text(data = Parties[Parties$per_party_members > 10, ],
            aes(label = country),
            hjust = -0.4,
            size = 4) +
  stat_smooth(method = lm) +
  ggtitle("Relationship between Populations and % of People in a Political Party")
Standard Deviations “by hand”
1. Calculate mean (excluding missing values)
mean(Countries$lifexp, na.rm=TRUE)
2. Deviations = lifexp values - mean
deviations <- Countries$lifexp-mean(Countries$lifexp, na.rm=TRUE)
3. Remove missing values from deviations
deviations <- na.omit(deviations)
4. Deviations squared
squared_deviations <- deviations^2
5. Sum squared deviations
sum_squared_deviations <- sum(squared_deviations)
6. Calculate sample size (n), excluding missing values
n <- length(na.omit(Countries$lifexp))
7. Variance = sum of squared deviations / (n - 1)
variance <- sum_squared_deviations / (n - 1)
8. Square root of the variance
sd <- sqrt(variance)
9. The answer/result
sd
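As a sanity check, the by-hand steps above can be compared against R's built-in sd() function. This sketch uses the built-in airquality dataset (its Ozone column contains missing values) as a stand-in for Countries$lifexp; the name sd_by_hand is only illustrative.

```r
# Stand-in data with missing values: built-in airquality$Ozone
x <- airquality$Ozone

# Steps 1-3: deviations from the mean, with missing values removed
deviations <- na.omit(x - mean(x, na.rm = TRUE))

# Steps 4-6: sum of squared deviations and sample size
sum_squared_deviations <- sum(deviations^2)
n <- length(deviations)

# Steps 7-8: sample variance, then its square root
sd_by_hand <- sqrt(sum_squared_deviations / (n - 1))

# Compare with the built-in function
sd_builtin <- sd(x, na.rm = TRUE)
all.equal(sd_by_hand, sd_builtin)
```

Both values should agree, since sd() also uses the n - 1 denominator.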
How to Calculate Z-Score
Z = (given/known value - mean) / sd
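A minimal sketch of the z-score formula in R, using the built-in airquality$Temp column as stand-in data. The given value of 90 is hypothetical and chosen only for illustration.

```r
# Stand-in data: daily temperatures from the built-in airquality dataset
x <- airquality$Temp

# Hypothetical given/known value
value <- 90

# Z = (given value - mean) / sd
z <- (value - mean(x)) / sd(x)
z

# Z-scores for every observation at once
z_all <- (x - mean(x)) / sd(x)

# scale() performs the same centering and scaling
z_scaled <- as.numeric(scale(x))
all.equal(z_all, z_scaled)
```

A positive z means the value lies above the mean, a negative z below it, measured in standard deviations.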
HW #2
Pearson correlation coefficient = -0.4867
This is a negative association: as one variable increases, the other tends to decrease. A
correlation coefficient ranges from -1 to +1. A value of -0.4867 is close to -0.5, about halfway
between no correlation (0) and a perfect negative correlation (-1), which suggests a relationship
of moderate strength.
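For reference, a Pearson correlation like the one above can be computed with cor(). The homework data is not included in these notes, so this sketch uses the built-in mtcars dataset (car weight and miles per gallon) as a stand-in; it will not reproduce the -0.4867 value.

```r
# Stand-in data: car weight vs. fuel efficiency from built-in mtcars
x <- mtcars$wt
y <- mtcars$mpg

# Pearson correlation coefficient
r <- cor(x, y, method = "pearson")
r

# The same value by hand: covariance divided by the product of the SDs
r_by_hand <- sum((x - mean(x)) * (y - mean(y))) /
  ((length(x) - 1) * sd(x) * sd(y))
all.equal(r, r_by_hand)
```

If the data contains missing values, cor() needs use = "complete.obs" to drop incomplete pairs.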
Weight on height
mod.1 <- lm(lifexp ~ welfare, data=Countries)
# Print regression output
mod.1
mod.2 <- lm(Protest ~ Democ + Corrupt, data = Protest)
mod.2
stargazer::stargazer(mod.2, type="html")
summary(mod.2)
Confidence Interval
#Sample
sample1 <- sample(Countries$wardeaths, 40)
sample1
#Mean
mean1 <- mean(sample1, na.rm = TRUE)
mean1
#SD
sd1 <- sd(sample1, na.rm = TRUE)
sd1
#Confidence Interval (99%: z = 2.58; n = 40, so sqrt(n) = 6.3245)
#Upper bound
CI1 <- mean1 + 2.58 * (sd1 / sqrt(40))
CI1
#Lower bound
CI2 <- mean1 - 2.58 * (sd1 / sqrt(40))
CI2
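The confidence interval steps above can be wrapped into a small helper that works for any sample size and confidence level. This is a sketch under stated assumptions: ci_mean is a hypothetical function name, the normal (z) approximation is used as in the notes, and the built-in airquality$Ozone column stands in for Countries$wardeaths.

```r
# Hypothetical helper: z-based confidence interval for a mean
ci_mean <- function(x, level = 0.99) {
  x <- x[!is.na(x)]                 # drop missing values first
  n <- length(x)                    # sample size after dropping NAs
  z <- qnorm(1 - (1 - level) / 2)   # about 2.58 for a 99% interval
  se <- sd(x) / sqrt(n)             # standard error of the mean
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Stand-in data: a random sample of 40 from built-in airquality$Ozone
set.seed(1)
sample_ozone <- sample(airquality$Ozone, 40)

ci <- ci_mean(sample_ozone)
ci
```

Computing n inside the function after removing missing values avoids dividing by sqrt(40) when some of the 40 sampled values are NA.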
#Subset
femalesdata <- subset(Lab11, Sex == "f")
malesdata <- subset(Lab11, Sex == "m")
#R Squared for females
model <- lm(Weight ~ Height, data=femalesdata)
summary(model)$r.squared
#R Squared for males
model2 <- lm(Weight ~ Height, data=malesdata)
summary(model2)$r.squared