EXPERIMENT 1
AIM: Measures of central tendency
a) Mean b) Median c) Mode d) Geometric Mean e) Harmonic Mean
A measure of central tendency represents a whole set of data by a single value and gives the
location of its central point. The three main measures of central tendency are the mean, the
median, and the mode; the geometric and harmonic means are related averages.
a) Mean: The mean is the sum of the observations divided by the total number of observations;
it is the average, i.e. the sum divided by the count.
Mean = (x1 + x2 + … + xn) / n, where n = number of terms
Source code:
marks <- c (97, 78, 57, 64, 87)
result <- mean(marks)
print(result)
Output: [1] 76.6
b) Median: The median is the middle value of the data set; it splits the ordered data into two
halves. If the number of elements is odd, the centre element is the median; if it is even, the
median is the average of the two central elements.
• For odd n the median is the ((n + 1)/2)th ordered value; for even n it is the average of the
(n/2)th and (n/2 + 1)th ordered values, where n = number of terms
• Syntax: median(x, na.rm = FALSE)
• where x is a vector and na.rm removes missing values before computing
• The mean is the average of a data set.
• The mode is the most common number in a data set.
• The median is the middle of the set of numbers.
Source code:
marks <- c (97, 78, 57, 64, 87)
result <- median(marks)
print(result)
Output: [1] 78
c) Mode: The mode is the most common number in a data set.
Source code:
marks <- c(97, 78, 57, 78, 97, 66, 87, 64, 87, 78)
mode <- function(v)
{
  # sort the frequency table in decreasing order and take the most frequent value
  return(names(sort(-table(v)))[1])
}
mode(marks)
Output: [1] "78"
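Because table() stores the observed values as names, the function above returns the mode as a
character string. A minimal sketch for inspecting the frequency table and converting the result
back to a number:
print(table(marks))      # frequency of each mark
as.numeric(mode(marks))  # the same mode, [1] 78, as a numeric value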
d) Geometric Mean: The geometric mean is the nth root of the product of n numbers.
• prod (): This function calculates the product of all elements in a vector.
• ^ (power): This operator raises a number to a power.
• length(x): This function returns the number of elements in the vector x.
• exp (): This function calculates the exponential of a number.
• mean (): This function calculates the average of a vector.
• log (): This function calculates the natural logarithm of a vector.
exp(mean(log(x)))
Source code:
x <- c (4, 8, 9, 9, 12, 14, 17)
exp(mean(log(x)))
Output: [1] 9.579479
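The same value follows from the nth-root form of the definition, using the prod(), ^ and
length() functions listed above; a minimal sketch:
prod(x)^(1/length(x))    # nth root of the product; should also print 9.579479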
Geometric Mean of Columns in Data Frame
Source code:
df <- data.frame(a = c(1, 3, 4, 6, 8, 8, 9),
                 b = c(7, 8, 8, 7, 13, 14, 16),
                 c = c(11, 13, 13, 18, 19, 19, 22),
                 d = c(4, 8, 9, 9, 12, 14, 17))
exp(mean(log(df$a)))
Output:[1] 4.567508
e) Harmonic Mean: The harmonic mean (HM) is the reciprocal of the arithmetic mean of the
reciprocals of the given numbers. The dataset should contain no zeros, otherwise the harmonic
mean collapses to zero. The harmonic mean is commonly used for averaging rates or ratios
(e.g. speeds), where it gives a more appropriate answer than the arithmetic mean.
Source code:
library("psych")
x <- c(10, 20, 60)
harmonic.mean(x)
Output: [1] 18
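Without the psych package, the same value can be obtained directly from the definition
(the reciprocal of the mean of the reciprocals); a minimal base-R sketch:
x <- c(10, 20, 60)
1 / mean(1 / x)          # harmonic mean from the definition; gives 18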
Harmonic Mean from Data Frame Columns
Source code:
df <- data.frame(bike = c("A", "B", "C"), speed = c(10, 20, 60))
print(df)
library("psych")
harmonic.mean(df$speed)
Output:
  bike speed
1    A    10
2    B    20
3    C    60
[1] 18
EXPERIMENT 2
AIM: Measures of dispersion
a) Range b) Quartile deviation c) Mean deviation d) Standard deviation
e) Coeff. of Variation.
Dispersion refers to the degree of spread or variability in a dataset. While measures of central
tendency (like mean, median) describe the center of the data, measures of dispersion describe
the extent to which data values deviate from the center.
Dispersion helps in understanding the reliability, consistency, and variability of the dataset. A
smaller dispersion implies that values are closely clustered, while a larger dispersion indicates
greater variability.
Types of Measures of Dispersion:
a) Range: The difference between the maximum and minimum values in a dataset.
Range = Max − Min
b) Quartile Deviation (Semi-Interquartile Range): Measures the spread of the middle
50% of the data.
QD = (Q3 − Q1) / 2
where Q1 = First Quartile, Q3 = Third Quartile
c) Mean Deviation: Average of absolute deviations from the mean.
MD = (1/n) Σ |xi − x̄|
d) Standard Deviation: The square root of the average of the squared deviations from the
mean.
SD = √( Σ (xi − x̄)² / (n − 1) )   (R's sd() uses the sample form with n − 1)
e) Coefficient of Variation: A relative measure of dispersion, expressed as a
percentage.
CV = (σ / μ) × 100
where:
σ: The standard deviation of dataset
μ: The mean of dataset
Source code:
data <- c(10, 12, 14, 18, 20, 22, 25)
range_val <- max(data) - min(data)
Q1 <- quantile(data, 0.25)
Q3 <- quantile(data, 0.75)
quartile_deviation <- (Q3 - Q1) / 2
mean_val <- mean(data)
mean_deviation <- mean (abs (data - mean_val))
std_deviation <- sd(data)
cv <- (std_deviation / mean_val) * 100
cat ("Data: ", data, "\n")
cat ("Range: ", range_val, "\n")
cat("Quartile Deviation: ", quartile_deviation, "\n")
cat ("Mean Deviation: ", mean_deviation, "\n")
cat ("Standard Deviation: ", std_deviation, "\n")
cat ("Coefficient of Variation (%): ", cv, "\n")
output:
Data: 10 12 14 18 20 22 25
Range: 15
Quartile Deviation: 4
Mean Deviation: 4.163265
Standard Deviation: 5.160563
Coefficient of Variation (%): 30.35627
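As a cross-check, base R's IQR() returns Q3 − Q1 directly, so the quartile deviation can also
be written as below; a minimal sketch using the same data vector:
IQR(data) / 2            # same as (Q3 - Q1) / 2; gives 4 for this data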
EXPERIMENT 3
AIM: Correlation & Regression
a) Correlation coefficient b) Regression lines c) Rank Correlation d) Multiple
correlation coefficient e) Multiple linear regression
Correlation and Regression: Correlation measures the strength and direction of the
linear relationship between two variables.
a) Correlation Coefficient: The correlation coefficient (usually Pearson's r) measures
the strength and direction of a linear relationship between two variables.
In its most basic usage, cor(data) calculates the Pearson correlation coefficient for all
pairs of variables in the input data.
method parameter: other correlation methods such as Spearman or Kendall can be selected;
for example, cor(data, method = "spearman") calculates the Spearman rank correlation
coefficient.
• Correlation coefficients range from -1 to 1
• 1 indicates a perfect positive correlation.
• -1 indicates a perfect negative correlation.
• 0 indicates no correlation.
Source Code:
x<-c(7,9,4,10,6,7,8,8,5,6)
y<-c(6,8,6,10,8,5,10,7,7,8)
n<-length(x)
xy<-x*y
xx<-x*x
yy<-y*y
mydata <- data.frame(x,y,xy,xx,yy)
print(mydata)
sums<-list(sum(x),sum(y),sum(x*y),sum(x*x),sum(y*y))
mydata<-rbind(mydata,sums)
print(mydata,row.names=FALSE)
meanx<-sum(x)/n
print(meanx)
meany<-sum(y)/n
print(meany)
cov<-(sum(x*y)/n)-(meanx*meany)
print(cov)
sdx<-sqrt((sum(x*x)/n)-(meanx^2))
print(sdx)
sdy<-sqrt((sum(y*y)/n)-(meany^2))
print(sdy)
corcof<-cov/sdx/sdy
print(" Correlation coefficient using program")
print(round(corcof,digits=4))
print(" Correlation coefficient using built in")
print(cor(x,y))
plot(x, y)
output:
[1] 7.0
[1] 7.5
[1] 1.5
[1] 1.732
[1] 1.565
[1] " Correlation coefficient using program"
[1] 0.5536
[1] " Correlation coefficient using built in"
[1] 0.5536079
b) Regression lines: A regression line is the best-fit straight line through the data points.
Simple linear regression equation: Y = a + bX
where:
a: Intercept (value of Y when X = 0)
b: Slope (rate of change of Y with respect to X)
Source Code:
x<-c(7,9,4,10,6,7,8,8,5,6)
y<-c(6,8,6,10,8,5,10,7,7,8)
n<-length(x)
xy<-x*y
xx<-x*x
yy<-y*y
mydata <- data.frame(x,y,xy,xx,yy)
print(mydata)
sums<-list(sum(x),sum(y),sum(x*y),sum(x*x),sum(y*y))
mydata<-rbind(mydata,sums)
print(mydata,row.names=FALSE)
print("regression line x on y")
result1<-lm(x~y)
print(result1)
print("regression line y on x")
result2<-lm(y~x)
print(result2)
xpred <- coef(result1)[1] + coef(result1)[2]*23   # predicted x when y = 23
print(xpred)
ypred <- coef(result2)[1] + coef(result2)[2]*45   # predicted y when x = 45
print(ypred)
output:
[1] "regression line x on y"
Call:
lm(formula = x ~ y)
Coefficients:
(Intercept) y
1.5333 0.6667
[1] "regression line y on x"
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
2.2333 0.7500
[1] 16.8667 # Predicted x when y = 23
[1] 35.9833 # Predicted y when x = 45
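The same predictions can also be obtained with predict(), using the illustrative inputs
y = 23 and x = 45 from above; a minimal sketch:
predict(result1, newdata = data.frame(y = 23))   # predicted x when y = 23
predict(result2, newdata = data.frame(x = 45))   # predicted y when x = 45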
c) Rank Correlation: Measures the degree of association between the ranks of two
variables.
Source Code:
x <- c (10, 20, 30, 40, 50)
y <- c(3, 4, 5, 6, 7)
a <- rank(-x)
cat ("Ranks for X (descending):\n")
print(a)
b <- rank(-y)
cat ("Ranks for Y (descending): \n")
print(b)
d <- a - b
cat ("Difference in ranks:\n")
print(d)
ssqd <- sum(d^2)
cat ("Sum of squared differences:\n")
print(ssqd)
n <- length(x)
rankcor <- 1 - ((6 * ssqd) / (n * (n^2 - 1)))
cat("Manual Spearman Rank Correlation Coefficient:\n")
print(rankcor)
corr <- cor.test(x, y, method = "spearman")
cat ("Spearman Rank Correlation using built-in function:\n")
print(corr$estimate)
output:
Ranks for X (descending):
[1] 5 4 3 2 1
Ranks for Y (descending):
[1] 5 4 3 2 1
Difference in ranks:
[1] 0 0 0 0 0
Sum of squared differences:
[1] 0
Manual Spearman Rank Correlation Coefficient:
[1] 1
Spearman Rank Correlation using built-in function:
spearman
1
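The data above are perfectly monotone, so the coefficient is 1. A short sketch with
hypothetical, non-monotone values (introduced here only for illustration) gives an
intermediate value:
x2 <- c(10, 20, 30, 40, 50)        # hypothetical data for illustration
y2 <- c(3, 7, 5, 6, 4)
cor(x2, y2, method = "spearman")   # expected to be about 0.1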
d) Multiple correlation coefficient: Measures the strength of the relationship between one
dependent variable and multiple independent variables.
Source Code:
x<-c(7,9,4,10,6,7,8,8,5,6)
y<-c(6,8,6,10,8,5,10,7,7,8)
z<-c(1,2,3,4,5,6,9,7,8,9)
n<-length(x)
corcof<- function(x,y)
{
xy<-x*y
xx<-x*x
yy<-y*y
mydata <- data.frame(x,y,xy,xx,yy)
sums<-list(sum(x),sum(y),sum(x*y),sum(x*x),sum(y*y))
mydata<-rbind(mydata,sums)
cat("\n")
print(mydata,row.names=FALSE)
meanx<-sum(x)/n
meany<-sum(y)/n
cov<-(sum(x*y)/n)-(meanx*meany)
sdx<-sqrt((sum(x*x)/n)-(meanx^2))
sdy<-sqrt((sum(y*y)/n)-(meany^2))
corcof<-cov/sdx/sdy
print(" Correlation coefficient using program")
print(round(corcof,digits=4))
}
r12<-corcof(x,y)
r23<-corcof(y,z)
r13<-corcof(x,z)
cat("\n\n\n")
print("partial correlation coefficient")
pcof<-(r12-(r13*r23))/(sqrt(1-(r13^2))*sqrt(1-(r23^2)))
print(pcof)
print("multiple correlation coefficient")
mcof<-sqrt((r12^2+r13^2-2*r12*r13*r23)/(1-r23^2))
print(mcof)
output:
[1] " Correlation coefficient using program"
[1] 0.5536
[1] " Correlation coefficient using program"
[1] 0.5530
[1] " Correlation coefficient using program"
[1] 0.7716
[1] "partial correlation coefficient"
[1] 0.2394
[1] "multiple correlation coefficient"
[1] 0.7857
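As a cross-check, the multiple correlation of x on y and z is the square root of R-squared
from the corresponding linear model; a minimal sketch:
fit <- lm(x ~ y + z)
sqrt(summary(fit)$r.squared)       # should be close to mcof (about 0.79)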
e) Multiple linear regression: Predicts a dependent variable using multiple independent
variables. Y = a + b1X1 + b2X2 + … + bnXn
Source Code:
x1 <- c(3, 5, 6, 8, 12, 14)
x2 <- c(16, 10, 7, 4, 3, 2)
x3 <- c(90, 72, 54, 42, 30, 12)
n <- length(x1)
x1x2<-x1*x2
x1x3<-x1*x3
x2x3<-x2*x3
x1x1<-x1*x1
x2x2<-x2*x2
mydata <- data.frame(x1,x2,x3,x1x2,x1x3,x2x3,x1x1,x2x2)
print(mydata)
sums<-list(sum(x1),sum(x2),sum(x3),sum(x1*x2),sum(x1*x3),sum(x2*x3),sum(x1*x1),
sum(x2*x2))
mydata<-rbind(mydata,sums)
print(mydata,row.names=FALSE)
result1<-lm(x3~x1+x2)
print(result1)
output:
x1 x2  x3 x1x2 x1x3 x2x3 x1x1 x2x2
 3 16  90   48  270 1440    9  256
 5 10  72   50  360  720   25  100
 6  7  54   42  324  378   36   49
 8  4  42   32  336  168   64   16
12  3  30   36  360   90  144    9
14  2  12   28  168   24  196    4
48 42 300  236 1818 2820  474  434
Call:
lm(formula = x3 ~ x1 + x2)
Coefficients:
(Intercept) x1 x2
61.400 -3.646 2.538
EXPERIMENT 4
AIM: Curve fitting
a) Straight line b) Parabola c) Y = aX^b d) Y = ab^X e) Y = ae^(bX)
a) Straight line:
x<-c(1,2,3,4,6,8)
y<-c(2.4,3,3.6,4,5,6)
n<-length(x)
xy<-x*y
xx<-x*x
mydata <- data.frame(x,y,xy,xx)
print(mydata)
sums<-list(sum(x),sum(y),sum(x*y),sum(x*x))
mydata<-rbind(mydata,sums)
print (mydata, row.names=FALSE)
stline<-lm(y~x)
print(stline)
summary(stline)
plot(x,y)
abline(stline,col="BLUE")
output:
b) Parabola:
x<-c(0,1,2,3,4)
y<-c(1,1.8,1.3,2.5,6.3)
n<-length(x)
xy<-x*y
xx<-x*x
xxx<-x^3
xxxx<-x^4
xxy<-x^2*y
mydata <- data.frame(x,y,xy,xx,xxx,xxxx,xxy)
print(mydata)
sums<-list(sum(x),sum(y),sum(x*y),sum(x*x),sum(x^3),sum(x^4),sum(x^2*y))
mydata<-rbind(mydata,sums)
print(mydata,row.names=FALSE)
parabola <- lm(y ~ x+I(x^2))
print(parabola)
f<-coef(parabola)[1]+((coef(parabola)[2])*x)+((coef(parabola)[3])*x*x)
print(f)
plot(x,y)
curve(coef(parabola)[1] + coef(parabola)[2]*x + coef(parabola)[3]*x*x,
      from = x[1], to = x[n], add = TRUE)
curve(predict(parabola, newdata = data.frame(x)), add = TRUE)
output:
c) Y = aX^b
x<-c(1,2,3,4,6,8)
y<-c(2.4,3,3.6,4,5,6)
n<-length(x)
logx<-round(log10(x),digits=4)
logy<-round(log10(y),digits=4)
logxlogy<-round(logx*logy,digits=4)
logxlogx<-round(logx*logx,digits=4)
mydata <-data.frame(logx,logy,logxlogy,logxlogx)
colnames(mydata)=c("X=logx","Y=logy","XY","XX")
print(mydata)
sums <- list(sum(logx), sum(logy), round(sum(logx*logy), digits = 4), round(sum(logx*logx), digits = 4))
mydata<-rbind(mydata,sums)
print(mydata,row.names=FALSE)
power<-lm(log10(y)~log10(x))
print(power)
alpha<-10^(coef(power)[1])
beta<-coef(power)[2]
print(round(alpha,digits=4))
print(round(beta,digits=4))
f<-alpha*(x^beta)
print(f)
plot(x,y)
curve(alpha*(x^beta), from = x[1], to = x[n], add = TRUE)
output:
(Intercept)
2.2858
log10(x)
0.4372
[1] 2.285842 3.095021 3.695344 4.190646 5.003481
[6] 5.674118
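An alternative to the log-transform is to fit the power curve directly by nonlinear least
squares, which minimises the error on the original scale; a minimal sketch (the starting
values below are rough guesses, not taken from the text):
fit_nls <- nls(y ~ a * x^b, start = list(a = 2, b = 0.5))
print(coef(fit_nls))               # a and b estimated without the log transform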
d) Y = ab^X
x<-c(1,1.5,2,2.5,3,3.5,4)
y<-c(1,1.3,1.6,2,2.7,3.4,4.1)
n<-length(x)
logy<-round(log10(y),digits=4)
xlogy<-round(x*logy,digits=4)
xx<-x*x
mydata <-data.frame(x,logy,xlogy,xx)
colnames(mydata)=c("X=x","Y=logy","XY","XX")
print(mydata)
sums<-list(sum(x),sum(logy),sum(x*logy),sum(x*x))
mydata<-rbind(mydata,sums)
print(mydata,row.names=FALSE)
power<-lm(log10(y)~x)
print(power)
alpha<-10^(coef(power)[1])
beta<-10^(coef(power)[2])
print(alpha)
print(beta)
f<-alpha*(beta^x)
print(f)
plot(x,y)
curve(alpha*(beta^x), from = x[1], to = x[n], add = TRUE)
output:
(Intercept)
0.6245328
1.611352
[1] 1.006342 1.277441 1.621572 2.058408 2.612923
[6] 3.316820 4.210339
e) Y = ae^(bX)
x<-c(1,2,3,4,5)
y<-c(1.8,5.1,8.9,14.1,19.8)
n<-length(x)
logy<-round(log10(y),digits=4)
xlogy<-round(x*logy,digits=4)
xx<-x*x
mydata <-data.frame(x,logy,xlogy,xx)
colnames(mydata)=c("X=x","Y=logy","XY","XX")
print(mydata)
sums<-list(sum(x),sum(logy),sum(x*logy),sum(x*x))
mydata<-rbind(mydata,sums)
print(mydata,row.names=FALSE)
power<-lm(log10(y)~x)
print(power)
alpha<-10^(coef(power)[1])
beta<-coef(power)[2]/0.4343
print(alpha)
print(beta)
f <- alpha*exp(beta*x)
print(f)
plot(x, y)
curve(alpha*exp(beta*x), from = x[1], to = x[n], add = TRUE)
output:
X=x Y=logy XY XX
1 0.2553 0.2553 1
2 0.7076 1.4152 4
3 0.9494 2.8482 9
4 1.1492 4.5968 16
5 1.2967 6.4835 25
15 4.3582 15.5990 55
Call:
lm(formula = log10(y) ~ x)
Coefficients:
(Intercept) x
0.1143 0.2524
(Intercept)
1.301047
0.5812651
[1]  2.326662  4.160770  7.440706 13.306216 23.795509
EXPERIMENT 5
AIM: ANOVA
a) one-way classification b) two-way classification
a) one-way classification:
Source code:
# Data
data <- c(25, 30, 28, 35, 40, 42, 45, 48, 46, 10, 15, 18)
group <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"))
# One-Way ANOVA
oneway_model <- aov(data ~ group)
print(summary(oneway_model))
output:
Df Sum Sq Mean Sq F value Pr(>F)
group 3 2188.2 729.4 52.09 4.71e-06 ***
Residuals 8 111.9 14.0
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
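Because the one-way ANOVA rejects the null hypothesis, a post-hoc comparison can show which
group means differ; a minimal sketch using the fitted model:
TukeyHSD(oneway_model)             # pairwise comparisons of group means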
b) two-way classification
Source code:
# Data
output <- c (10,12,11,13,14,15,13,16,12,15,14,16)
machine <- factor(rep(c("M1","M2","M3"), each = 4))
operator <- factor(rep(c("O1","O2"), times = 6))
# Two-Way ANOVA
twoway_model <- aov(output ~ machine + operator)
print(summary(twoway_model))
output:
Df Sum Sq Mean Sq F value Pr(>F)
machine 2 16.0 8.0 4.000 0.0574 .
operator 1 2.0 2.0 1.000 0.3390
Residuals 8 16.0 2.0
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
EXPERIMENT 6
AIM: Time series
a) Moving averages b) ARIMA
a) Moving averages
Source code:
# Sample monthly data (e.g., monthly demand or sales)
data <- c(25, 30, 28, 35, 40, 42, 45, 48, 46, 50, 53, 55)
months <- month.abb[1:12]
# Create a time series object
ts_data <- ts(data, start = c(2024, 1), frequency = 12)
# Display original data
print("Original Monthly Data:")
print(ts_data)
# 3-month Moving Average
ma3 <- filter(ts_data, rep(1/3, 3), sides = 2)
# 4-month Moving Average
ma4 <- filter (ts_data, rep(1/4, 4), sides = 2)
# 5-month Moving Average
ma5 <- filter(ts_data, rep(1/5, 5), sides = 2)
# Combine all into a data frame
result <- data.frame(
  Month = months,
  Sales = ts_data,
  MA_3 = round(ma3, 2),
  MA_4 = round(ma4, 2),
  MA_5 = round(ma5, 2)
)
print("Moving Averages Table:")
print(result)
# Plot
plot(ts_data, type = "o", col = "black", ylim=c(20,60), main = "Moving Averages",
ylab = "Sales", xlab = "Month")
lines(ma3, type="o", col="blue")
lines(ma4, type="o", col="red")
lines(ma5, type="o", col="green")
legend("topleft", legend=c("Original", "MA(3)", "MA(4)", "MA(5)"),
col=c("black", "blue", "red", "green"), lty=1, pch=1)
output:
[1] "Moving Averages Table:"
Month Sales MA_3 MA_4 MA_5
1 Jan 25 NA NA NA
2 Feb 30 27.67 29.50 NA
3 Mar 28 31.00 33.25 31.6
4 Apr 35 34.33 36.25 35.0
5 May 40 39.00 40.50 38.0
6 Jun 42 42.33 43.75 42.0
7 Jul 45 45.00 45.25 44.2
8 Aug 48 46.33 47.25 46.2
9 Sep 46 48.00 49.25 48.4
10 Oct 50 49.67 51.00 50.4
11 Nov 53 52.67 NA NA
12 Dec 55 NA NA NA
b) ARIMA
Source code:
# Load necessary libraries
install.packages("forecast")   # Only once
library(forecast)
# Create sample time series data
data <- c(112,118,132,129,121,135,148,148,136,119,104,118,
115,126,141,135,125,149,170,170,158,133,114,140)
# Convert to time series object (monthly data starting from Jan 1949)
ts_data <- ts(data, start = c(1949,1), frequency = 12)
# Plot the original time series
plot(ts_data, main = "Monthly Data", ylab = "Value", col = "blue")
# Check if the data is stationary using Augmented Dickey-Fuller Test
# install.packages("tseries") # Uncomment if not installed
library(tseries)
adf.test(ts_data)
# Apply differencing if not stationary (auto.arima handles it automatically)
# Fit ARIMA model
model <- auto.arima(ts_data)
# Display the ARIMA model summary
summary(model)
# Forecast next 12 periods
forecast_values <- forecast(model, h = 12)
# Print forecast results
print(forecast_values)
# Plot the forecast
plot(forecast_values, main = "ARIMA Forecast")
Output:
Series: ts_data
ARIMA(0,1,1)(0,1,1)[12]
Coefficients:
ma1 sma1
-0.3775 -0.5964
s.e. 0.2314 0.1534
sigma^2 estimated as 120.7: log likelihood=-140.42
AIC=286.83 AICc=287.89 BIC=291.79
Forecast Output (first few months)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 1951 147.6 132.2 163.0 124.2 171.0
Feb 1951 137.8 122.3 153.3 114.2 161.4
Mar 1951 153.1 136.2 170.1 127.3 178.9
...
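As a follow-up, the forecast package provides residual diagnostics for the fitted model;
a minimal sketch:
checkresiduals(model)              # Ljung-Box test and residual plots for the ARIMA fit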
EXPERIMENT 7
AIM: Goodness of fit
a) Binomial b) Poisson
a) Binomial: The binomial distribution is a discrete probability distribution used when:
• A fixed number of independent trials (n) is conducted
• Each trial has two possible outcomes: success or failure
• The probability of success (p) is constant for each trial
P(X = k) = C(n, k) p^k (1 − p)^(n − k), where
P(X = k): probability of exactly k successes
n: total number of trials
p: probability of success in a single trial
k: number of successes (0 ≤ k ≤ n)
Source code:
# Observed frequencies
obs <- c(7, 6, 19, 35, 30, 23, 7, 1)
x <- 0:7
n <- 7
total <- sum(obs)
mean_val <- sum(obs * x) / total
p <- mean_val / n
# Expected frequencies
expected_probs <- dbinom(x, size = n, prob = p)
expected_freq <- round(expected_probs * total)
# Chi-square test
chisq.test(obs, p = expected_probs, rescale.p = TRUE)
Output:
Chi-squared test for given probabilities
data: obs
X-squared = 29.102, df = 7, p-value =
0.0001386
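Because p was estimated from the observed data, one further degree of freedom is usually
subtracted when comparing against the chi-square table; a minimal sketch of that adjustment:
chi_stat <- chisq.test(obs, p = expected_probs, rescale.p = TRUE)$statistic
pchisq(chi_stat, df = length(x) - 2, lower.tail = FALSE)   # df = 8 - 1 - 1 = 6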
b) Poisson : The Poisson distribution is a discrete probability distribution used to
model the number of events occurring in a fixed interval of time or space,
when the events happen independently and at a constant average rate (λ =
lambda)
f(x) = P(X = x) = (e^(−λ) λ^x) / x!
X: number of events
λ: average rate (mean)
e ≈ 2.718
Source code:
# Observed frequencies
x <- 0:8
f <- c(103, 143, 98, 42, 8, 4, 2, 0, 0)
# Total frequency
total <- sum(f)
# Calculate mean (lambda)
lambda <- sum(x * f) / total
# Expected probabilities
pr <- dpois(x, lambda = lambda)
# Expected frequencies
fe <- round(pr * total)
# Show data
mydata <- data.frame(x, f, pr = round(pr, 5), Expected = fe)
print(mydata)
# Chi-square test for goodness of fit
result <- chisq.test(f, p = pr, rescale.p = TRUE)
print(result)
# Critical value at 5% significance level
s <- length(x) - 1
cat("Chi-square table value (df =", s, "):", qchisq(0.95, df = s), "\n")
Output:
x f pr Expected
0 103 0.26647 107
1 143 0.35240 141
2 98 0.23303 93
3 42 0.10273 41
4 8 0.03396 14
5 4 0.00898 4
6 2 0.00198 1
7 0 0.00037 0
8 0 0.00006 0
Chi-squared test for given probabilities
data: f
X-squared = 4.7755, df = 8, p-value = 0.7813
Chi-square table value (df = 8 ): 15.50731
EXPERIMENT 8
AIM: Parametric tests
a) t-test for one-mean b) t-test for two means c) paired t-test d) F-test
a) t-test for one-mean: Tests whether the mean of a single group differs from
a known or hypothesized value.
Null Hypothesis (H₀): μ = μ₀
Alternative Hypothesis (H₁): μ ≠ μ₀ (or > or <)
Source code:
# Sample data
x <- c (25, 27, 29, 28, 26, 30, 31)
# Test if mean is 28
result <- t.test(x, mu = 28)
print(result)
Output:
data: x
t = 0.4472, df = 6, p-value = 0.6715
alternative hypothesis: true mean is not equal to 28
95 percent confidence interval: 26.72526 29.56045
sample estimates: mean of x = 28.14286
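For the one-sided alternatives mentioned above, the alternative argument can be set
explicitly; a minimal sketch:
t.test(x, mu = 28, alternative = "greater")   # H1: true mean is greater than 28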
b) t-test for two means: Compares the means of two independent groups.
Assumes equal or unequal variance (use var.equal).
H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂
Source code:
# Two independent samples
group1 <- c (22, 25, 27, 30, 32)
group2 <- c (20, 23, 26, 28, 29)
# Test for equal means
result <- t.test(group1, group2, var.equal = TRUE)
print(result)
Output:
t = 0.8944, df = 8, p-value = 0.3967
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -3.20719 7.20719
mean of x = 27.2, mean of y = 25.2
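When equal variances cannot be assumed, omitting var.equal = TRUE (or setting it to FALSE)
gives Welch's t-test, which is R's default; a sketch:
t.test(group1, group2, var.equal = FALSE)     # Welch two-sample t-test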
c) paired t-test: Used when same subjects are measured before and after a
treatment. H₀: mean difference = 0
Source code:
# Before and after values
before <- c (200, 195, 210, 190, 205)
after <- c (198, 193, 208, 192, 204)
# Paired t-test
result <- t.test(before, after, paired = TRUE)
print(result)
Output:
t = 2.8284, df = 4, p-value = 0.0473
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: 0.2 3.8
mean of the differences: 2
d) F-test: Tests equality of variances of two populations.
H₀: σ₁² = σ₂²
H₁: σ₁² ≠ σ₂²
Source code:
# Two samples
x <- c (15, 16, 14, 15, 17, 18)
y <- c (10, 12, 9, 11, 13, 14)
# F-test for equality of variances
result <- var.test(x, y)
print(result)
Output:
F = 2.25, num df = 5, denom df = 5, p-value = 0.3012
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval: 0.4350 11.6426
EXPERIMENT 9
AIM: Non-parametric tests
a) Sign test b) Mann-Whitney test c) Run test d) Kolmogorov-Smirnov test
a) Sign test: The Sign Test is used to test the median of a population or for
comparing paired samples. It only considers the signs (+/-) of differences,
ignoring their magnitude.
Hypotheses:
H₀: The median difference is zero.
H₁: The median difference is not zero.
Source code:
# Install BSDA if not installed
# install.packages("BSDA")
library(BSDA)
# Example: Paired data (Before and After)
before <- c(55, 60, 52, 63, 70, 65, 62)
after <- c (58, 62, 54, 60, 68, 68, 63)
# Perform Sign Test
SIGN.test(x = before, y = after, alternative = "two.sided")
Output:
Dependent-samples Sign-Test
data: before and after
S = 1, p-value = 0.07031
alternative hypothesis: true median difference is not equal to 0
b) Mann-Whitney test (Wilcoxon Rank Sum Test): Used to compare two
independent samples. It’s a non-parametric alternative to the t-test.
Hypotheses:
H₀: The two populations are equal.
H₁: The two populations are not equal
Source code:
# Group 1 and Group 2 data
group1 <- c(85, 90, 88, 75, 95)
group2 <- c(80, 70, 78, 85, 68)
# Mann-Whitney U Test
wilcox.test(group1, group2, alternative = "two.sided")
Output:
Wilcoxon rank sum test with continuity correction
data: group1 and group2
W = 21.5, p-value = 0.2893
alternative hypothesis: true location shift is not equal to 0
c) Run test (Wald-Wolfowitz Runs Test): Checks the randomness of a sequence.
It counts the number of runs (uninterrupted sequences of similar items).
Source code:
# Install and load tseries
install.packages("tseries")
library(tseries)
# Sample sequence
sequence <- c(1.2, 1.5, 1.3, 1.7, 2.1, 1.9, 1.8, 2.2, 2.4)
# Run test
runs.test(as.factor(sequence > median(sequence)))
Output:
Runs Test
data: as.factor(sequence > median(sequence))
Standard Normal = 0.169, p-value = 0.866
alternative hypothesis: two.sided
d) Kolmogorov-Smirnov test: Used to test if a sample comes from a specific
distribution (e.g., normal), or to compare two samples. It’s based on the
maximum distance between their empirical distribution functions.
Source code:
x <- rnorm(30, mean=5)
y <- rnorm(30, mean=6)
ks.test(x, y)
Output:
Two-sample Kolmogorov-Smirnov test
data: x and y
D = 0.5, p-value = 0.0029
alternative hypothesis: two-sided
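For the one-sample form mentioned above, the sample can be compared against a named
theoretical distribution; a minimal sketch using the sample's own mean and standard deviation:
ks.test(x, "pnorm", mean = mean(x), sd = sd(x))   # one-sample test of x against a fitted normal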
EXPERIMENT 10
AIM: Graphical representation of data
a) Bar plot b) Frequency polygon c) Histogram d) Pie chart e) scatter plot
Graphical representation helps to visualize and interpret data effectively. Different types of
graphs are used depending on the nature of data and what insights are to be derived.
a) Bar plot:
Source Code:
H <- c(5,15,17,18,16,15)
M <- c(1980,1981,1982,1983,1984,1985)
barplot(H, xlab = "Year", ylab = "Profit", ylim = c(0, 20),
        col = rainbow(6), names.arg = M,
        main = "RVRJC PHARMACEUTICAL FIRM", border = "red")
Output:
b) Frequency polygon: A frequency polygon is a line graph that represents the
distribution of data. It is drawn by connecting the midpoints of the tops of the bars of a
histogram.
Source Code:
data <- c(10, 20, 20, 30, 30, 30, 40, 40, 50, 50, 50, 50, 60)
hist_data <- hist(data, plot = FALSE)
plot(hist_data$mids, hist_data$counts, type = "o", col = "red",
     xlab = "Class Intervals", ylab = "Frequency", main = "Frequency Polygon")
Output:
c) Histogram: A histogram is used for continuous data and shows the frequency
distribution of a dataset using adjacent rectangles.
Source Code:
v <- c(3, 5, 6, 19, 9, 18, 23, 67, 11, 10, 44, 45, 54, 37, 26, 8, 5, 1)
hist(v, main = "STUDENTS MARKS", xlab = "Marks", xlim = c(0, 70),
     ylab = "No. of students", ylim = c(0, 5), col = rainbow(10))
Output:
d) Pie chart: A pie chart is a circular graph divided into sectors representing
proportions of a whole.
Source Code:
# Pie Chart
slices <- c(10, 20, 30, 40)
labels <- c("Q1", "Q2", "Q3", "Q4")
pie(slices, labels = labels, main = "Pie Chart", col = rainbow(length(slices)))
Output:
e) scatter plot: A scatter plot shows the relationship between two continuous variables.
Points are plotted for each pair (x, y).
Source Code:
# Scatter Plot
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 4, 5, 7, 10, 12)
plot(x, y, main = "Scatter Plot", xlab = "X values", ylab = "Y values", col = "blue",
pch = 16)
Output: