Built-in Statistical Functions and Regression Analysis in R
Built-in Statistical Functions in R
1. mean() function:
The mean (arithmetic average) is calculated by summing all the numbers in a set and dividing by the
number of values.
Example:
# R Code Example
data <- c(4, 7, 10, 12, 15)
mean_value <- mean(data)
print(mean_value)
Output:
[1] 9.6
Manual Calculation:
Mean = (4 + 7 + 10 + 12 + 15) / 5 = 48 / 5 = 9.6
In this example, we sum the values (4 + 7 + 10 + 12 + 15 = 48), and since there are 5 numbers,
we divide by 5, giving us a mean of 9.6.
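The manual calculation above can be checked in R with sum() and length(). This is a minimal sketch; the variable names total, n, and manual_mean are illustrative and not part of the original example.

```r
# Reproduce the manual mean calculation step by step
data <- c(4, 7, 10, 12, 15)

total <- sum(data)        # 4 + 7 + 10 + 12 + 15 = 48
n <- length(data)         # 5 values
manual_mean <- total / n  # 48 / 5

print(manual_mean)
# Compare with the built-in function (all.equal tolerates rounding error)
print(isTRUE(all.equal(manual_mean, mean(data))))
```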
2. median() function:
The median is the middle value in a sorted list of numbers. If the list has an odd number of values,
the median is the middle number. If the list has an even number of values, the median is the
average of the two middle numbers.
Example:
# R Code Example
data <- c(4, 7, 10, 12, 15)
median_value <- median(data)
print(median_value)
Output:
[1] 10
Manual Calculation:
In the sorted list 4, 7, 10, 12, 15, the middle value is 10. Since there are 5 values (an odd number),
the median is the middle one directly.
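The odd and even cases described above can be sketched as a small helper. The function manual_median() and the even-length vector even_data are hypothetical, introduced here only for illustration; the sketch assumes a non-empty numeric vector with no missing values.

```r
# Median by hand: sort, then take the middle value (odd n)
# or the average of the two middle values (even n).
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2
  }
}

odd_data <- c(4, 7, 10, 12, 15)
even_data <- c(4, 7, 10, 12)     # hypothetical even-length example

print(manual_median(odd_data))   # middle value: 10
print(manual_median(even_data))  # (7 + 10) / 2 = 8.5
```

In practice the built-in median() should be preferred; the helper only makes the rule explicit.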
3. sd() function (Standard Deviation):
The standard deviation measures the amount of variation or dispersion in a set of values.
R's sd() computes the sample standard deviation, which divides by n - 1 rather than n.
Example:
# R Code Example
data <- c(4, 7, 10, 12, 15)
sd_value <- sd(data)
print(sd_value)
Output:
[1] 4.27785
Manual Calculation:
1. Find the mean: Mean = 9.6
2. Subtract the mean from each number and square the result:
(4 - 9.6)^2 = 31.36, (7 - 9.6)^2 = 6.76, (10 - 9.6)^2 = 0.16, (12 - 9.6)^2 = 5.76, (15 - 9.6)^2 = 29.16
3. Add the squared differences and divide by n - 1 = 4 (sample variance): Variance = 73.2 / 4 = 18.3
4. The standard deviation is the square root of the variance: Standard Deviation = sqrt(18.3) = 4.28
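The four steps above can be sketched in R as follows. This is a minimal sketch with illustrative variable names; it assumes the sample (n - 1) denominator, which is what sd() uses.

```r
data <- c(4, 7, 10, 12, 15)

m <- mean(data)                                  # step 1: mean
sq_diff <- (data - m)^2                          # step 2: squared deviations
sample_var <- sum(sq_diff) / (length(data) - 1)  # step 3: divide by n - 1
manual_sd <- sqrt(sample_var)                    # step 4: square root

print(sq_diff)
print(sample_var)
print(manual_sd)
# Compare with the built-in function (all.equal tolerates rounding error)
print(isTRUE(all.equal(manual_sd, sd(data))))
```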
4. var() function (Variance):
The variance is the square of the standard deviation and measures how far the data points spread
around the mean. Like sd(), var() computes the sample variance, dividing by n - 1.
Example:
# R Code Example
data <- c(4, 7, 10, 12, 15)
var_value <- var(data)
print(var_value)
Output:
[1] 18.3
Manual Calculation:
Variance is the sum of the squared differences from the mean divided by n - 1:
Variance = (31.36 + 6.76 + 0.16 + 5.76 + 29.16) / 4 = 73.2 / 4 = 18.3
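To make the choice of denominator concrete, the sketch below compares var() with a hand-computed population variance (dividing by n). The names sample_var and pop_var are illustrative; R has no built-in population variance function, so that value is computed directly.

```r
data <- c(4, 7, 10, 12, 15)
n <- length(data)

sample_var <- var(data)                    # divides by n - 1
pop_var <- sum((data - mean(data))^2) / n  # divides by n (not what var() returns)

print(sample_var)
print(pop_var)
# var() is the square of sd()
print(isTRUE(all.equal(sample_var, sd(data)^2)))
# The two variances differ only by the factor (n - 1) / n
print(isTRUE(all.equal(pop_var, sample_var * (n - 1) / n)))
```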
5. Regression Analysis: Simple Linear Regression
Linear regression models the relationship between two variables by fitting a line to the observed
data. In R, the function lm() is used for linear modeling.
Example:
# R Code Example
hours <- c(2, 3, 4, 5, 6)
scores <- c(50, 60, 70, 80, 90)
# Fit a simple linear regression model
model <- lm(scores ~ hours)
print(model)
summary(model)  # detailed fit statistics; R may warn that the fit is essentially perfect
Output of print(model):
Call:
lm(formula = scores ~ hours)

Coefficients:
(Intercept)        hours
         30           10
Manual Calculation:
The equation of the regression line is score = 30 + 10 * hours. The five data points lie exactly
on this line, so the fit is perfect.
- Intercept (30): When hours = 0, the predicted score is 30.
- Slope (10): For every additional hour studied, the predicted score increases by 10 points.
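The intercept and slope can also be derived by hand from the usual least-squares formulas: slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and intercept = mean(y) - slope * mean(x). The sketch below applies them to the same data; the names x_bar, y_bar, slope, and intercept are illustrative.

```r
hours <- c(2, 3, 4, 5, 6)
scores <- c(50, 60, 70, 80, 90)

x_bar <- mean(hours)   # 4
y_bar <- mean(scores)  # 70

# Least-squares estimates for a single predictor
slope <- sum((hours - x_bar) * (scores - y_bar)) / sum((hours - x_bar)^2)  # 100 / 10
intercept <- y_bar - slope * x_bar                                         # 70 - 10 * 4

print(c(intercept = intercept, slope = slope))

# Compare with the coefficients estimated by lm()
model <- lm(scores ~ hours)
print(isTRUE(all.equal(unname(coef(model)), c(intercept, slope))))
```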
The model predicts that if a student studies for 5 hours, their score will be:
Predicted Score = 30 + 10 * 5 = 80
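The same prediction can be obtained from the fitted model with predict(). This is a minimal sketch: the data frame new_hours is illustrative, and the value 7 is a hypothetical input outside the observed range of 2 to 6 hours, so that prediction is an extrapolation and should be read with caution.

```r
hours <- c(2, 3, 4, 5, 6)
scores <- c(50, 60, 70, 80, 90)
model <- lm(scores ~ hours)

# newdata must use the same column name as the predictor in the formula
new_hours <- data.frame(hours = c(5, 7))  # 7 is hypothetical, beyond the observed data
predicted <- predict(model, newdata = new_hours)

print(predicted)  # 30 + 10 * 5 = 80 and 30 + 10 * 7 = 100
```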