R - Lab Experiments - Manual
Week 1:
Installing R and RStudio
Basic functionality of R, variable, data types in R
Installing R on Windows OS
To install R on Windows OS:
Click on "install R for the first time" link to download the R executable (.exe) file.
Run the R executable file to start installation, and allow the app to make changes to your
device.
R has now been successfully installed on your Windows OS. Open the R GUI to start
writing R code.
Basic functionality of R
Features of R Programming Language
R Packages: One of the major features of R is its wide range of libraries. CRAN
(Comprehensive R Archive Network) is a repository holding more than 10,000
packages.
Distributed Computing: Distributed computing is a model in which components of a
software system are shared among multiple computers to improve efficiency and
performance. Two packages for distributed programming in R, ddR and multidplyr,
were released in November 2015.
Statistical Features of R
Basic Statistics: The most common basic statistics terms are the mean, mode, and median.
These are all known as “Measures of Central Tendency.” So using the R language we can
measure central tendency very easily.
Static graphics: R is rich with facilities for creating and developing interesting static graphics.
R contains functionality for many plot types including graphic maps, mosaic plots, biplots,
and the list goes on.
Probability distributions: Probability distributions play a vital role in statistics and by using R
we can easily handle various types of probability distributions such as Binomial Distribution,
Normal Distribution, Chi-squared Distribution, and many more.
Data analysis: It provides a large, coherent, and integrated collection of tools for data
analysis.
Data types in R
# numeric
x <- 10.5
class(x)
# integer
x <- 1000L
class(x)
# complex
x <- 9i + 3
class(x)
# character/string
x <- "R is exciting"
class(x)
# logical
x <- TRUE
class(x)
Output
[1] "numeric"
[1] "integer"
[1] "complex"
[1] "character"
[1] "logical"
Variables in R
Creating Variables in R
============================
Variables are containers for storing data values.
R does not have a command for declaring a variable.
A variable is created the moment you first assign a value to it.
To assign a value to a variable, use the <- sign. To output (or print) the variable value, just
type the variable name:
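The assignment code for this example is not shown in the source. A minimal sketch that would produce the output below, assuming two variables called name and age (both names are assumptions):

```r
# Create two variables by assigning values to them
name <- "John"
age <- 40

# Typing a variable name on its own prints its value
name
age
```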
Output:
[1] "John"
[1] 40
Multiple Variables
=======================
R allows you to assign the same value to multiple variables in one line:
Example
# Assign the same value to multiple variables in one line
var1 <- var2 <- var3 <- "Orange"
Variable Names:
===============
A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume). Rules for R variables are:
A variable name must start with a letter and can be a combination of letters, digits, period(.)
and underscore(_). If it starts with period(.), it cannot be followed by a digit.
A variable name cannot start with a number or underscore (_)
Variable names are case-sensitive (age, Age and AGE are three different variables)
Reserved words cannot be used as variables (TRUE, FALSE, NULL, if...)
# Legal variable names:
myvar <- "John"
my_var <- "John"
myVar <- "John"
MYVAR <- "John"
myvar2 <- "John"
.myvar <- "John"
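The naming rules above can be checked programmatically. The sketch below relies on base R's make.names(), which leaves a syntactically valid name unchanged and rewrites an invalid one; the helper is_valid_name() and the candidate names are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: a name is valid if make.names() does not need to change it
is_valid_name <- function(x) make.names(x) == x

candidates <- c("myvar", "my_var", ".myvar",      # legal names
                "2myvar", "_my_var", ".2var",     # illegal first characters
                "my-var", "my var",               # illegal characters inside the name
                "TRUE", "if")                     # reserved words

result <- data.frame(name = candidates, valid = is_valid_name(candidates))
print(result)
```

Only the first three candidates are reported as valid; the rest break one of the rules listed above.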
Week 2:
a) Implement R script to show the usage of various operators available in R
language.
Source Code:
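# R Arithmetic Operators Example
a <- 7.5
b <- 2
print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(a / b)    # division
print(a %% b)   # modulus (remainder)
print(a %/% b)  # integer division
print(a ^ b)    # exponentiation
# Arithmetic operators are applied element-wise to vectors
a <- c(8, 9, 6)
b <- c(2, 4, 5)
print(a + b)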
# R Assignment Operators Example
a = 2
print ( a )
a <- TRUE
print ( a )
454 -> a
print ( a )
a <<- 2.9
print ( a )
c(6, 8, 9) -> a
print ( a )
Output
[1] 2
[1] TRUE
[1] 454
[1] 2.9
[1] 6 8 9
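The age check that produces the output below is not shown in the source. A minimal sketch of such a script follows; the helper name voting_message(), the minimum age of 18 and the wording of the "not valid" message are all assumptions.

```r
# Hypothetical helper: builds the message for a given age.
# The minimum voting age of 18 is an assumption, not stated in the source.
voting_message <- function(age, min_age = 18L) {
  if (age >= min_age) {
    paste("You are valid for voting :", age)
  } else {
    paste("You are not valid for voting :", age)
  }
}

# In an interactive session the age would be read from the keyboard:
# age <- as.integer(readline(prompt = "Enter your age :"))
age <- 48L
print(voting_message(age))
```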
Output:
Enter your age :48
[1] "You are valid for voting : 48"
c) Implement R script to find biggest number among three numbers.
Source Code:
{
x <- as.integer(readline(prompt = "Enter first number :"))
y <- as.integer(readline(prompt = "Enter second number :"))
z <- as.integer(readline(prompt = "Enter third number :"))
# Compare the three inputs and report the largest one
greatest <- max(x, y, z)
print(paste("Greatest is :", greatest))
}
Output:
Enter first number :2
Enter second number :22
Enter third number :4
[1] "Greatest is : 22"
# Input year
input_year <- 2024
(or)
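The fragment above shows only an input year. Assuming it is the start of a leap-year check (the source does not say so), the test could be sketched as follows; is_leap_year() is a hypothetical helper name.

```r
# Hypothetical helper: a year is a leap year if it is divisible by 4
# but not by 100, or if it is divisible by 400
is_leap_year <- function(year) {
  (year %% 4 == 0 && year %% 100 != 0) || year %% 400 == 0
}

input_year <- 2024
if (is_leap_year(input_year)) {
  print(paste(input_year, "is a leap year"))
} else {
  print(paste(input_year, "is not a leap year"))
}
```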
Week 3:
a) Implement R Script to create a list.
Source Code:
We can create a list using the list() function.
x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
x
Output
$a
[1] 2.5
$b
[1] TRUE
$c
[1] 1 2 3
b) Implement R Script to access elements in the list.
Source Code:
Lists can be accessed in similar fashion to vectors. Integer, logical or character
vectors can be used for indexing.
x <- list(name = "John", age = 19, speaks = c("English", "French"))
Output
======
[1] "John"
[1] 19
[1] "English" "French"
$name
[1] "John"
$age
[1] 19
$name
[1] "John"
$speaks
[1] "English" "French"
$name
[1] "John"
$age
[1] 19
$speaks
[1] "English" "French"
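The indexing statements for this exercise are missing from the source, so the sketch below only illustrates the indexing styles named above (by name, and by integer, logical or character vector). It is not a reconstruction of the exact calls behind the printed output.

```r
x <- list(name = "John", age = 19, speaks = c("English", "French"))

x$name                    # one component by name, returns the value itself
x[["age"]]                # same idea with double brackets
x[c(1, 2)]                # integer vector: sub-list with the first two components
x[-2]                     # negative integer: sub-list without the second component
x[c(TRUE, FALSE, TRUE)]   # logical vector: keep the components marked TRUE
x[c("age", "speaks")]     # character vector: select components by name
```

Single brackets always return a list, while $ and double brackets return the component itself.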
c) Implement R Script to merge two or more lists. Implement R Script to
perform matrix operation
Source Code:
n1 = list(1,2,3)
c1 = list("Red", "Green", "Black")
print("Original lists:")
print(n1)
print(c1)
print("Merge the said lists:")
mlist = c(n1, c1)
print("New merged list:")
print(mlist)
Output:
==========
[1] "Original lists:"
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[1]]
[1] "Red"
[[2]]
[1] "Green"
[[3]]
[1] "Black"
[1] "Merge the said lists:"
[1] "New merged list:"
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] "Red"
[[5]]
[1] "Green"
[[6]]
[1] "Black"
Week 4:
Implement R script to perform following operations:
a) various operations on vectors.
Source Code:
The c() function is used for creating a vector in R. This function returns a
one-dimensional array, also known as a vector.
1) Using the colon (:) operator
z<-x:y
This operator creates a vector with elements from x to y and assigns it to z.
a<-4:-10
a
Output
=======
[1] 4 3 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
2) Using the seq() function
seq_vec<-seq(1,4,by=0.5)
seq_vec
class(seq_vec)
Output
======
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
[1] "numeric"
seq_vec<-seq(1,4,length.out=6)
seq_vec
class(seq_vec)
Output
======
[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] "numeric"
Numeric vector
The decimal values are known as numeric data types in R.
d<-45.5
num_vec<-c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)
Output
[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"
Integer vector
A numeric value without a fractional part can be stored as integer data.
In R, integers are 32-bit (4-byte) values. A literal is made an integer with the L suffix
(for example 5L), and other values can be converted with as.integer().
d<-as.integer(5)
e<-5L
int_vec<-c(1,2,3,4,5)
int_vec<-as.integer(int_vec)
int_vec1<-c(1L,2L,3L,4L,5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)
Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
Character vector
A character vector stores text values (strings). In R, there are two different ways to create a
character data type value, i.e., using the as.character() function or by typing a string between
double quotes ("") or single quotes ('').
d<-'shubham'
e<-"Arpita"
f<-65
f<-as.character(f)
d
e
f
char_vec<-c(1,2,3,4,5)
char_vec<-as.character(char_vec)
char_vec1<-c("shubham","arpita","nishka","vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)
Output
=======
[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
Logical vector
The logical data type has only two values, i.e., TRUE or FALSE.
These values are based on whether a condition is satisfied.
A vector which contains Boolean values is known as a logical vector.
d<-as.integer(5)
e<-as.integer(6)
f<-as.integer(7)
g<-d>e
h<-e<f
g
h
log_vec<-c(d<e, d<f, e<d,e<f,f<d,f<e)
log_vec
class(g)
class(h)
class(log_vec)
Output
=====
[1] FALSE
[1] TRUE
[1] TRUE TRUE FALSE TRUE FALSE FALSE
[1] "logical"
[1] "logical"
[1] "logical"
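Beyond creating vectors of each type, the usual operations are element-wise arithmetic, indexing and sorting. A short sketch, using two hypothetical vectors v and w:

```r
v <- c(4, 9, 1, 7)
w <- c(1, 2, 3, 4)

v + w          # element-wise addition
v * 2          # every element multiplied by a scalar
v[2]           # second element (R indexes from 1)
v[v > 3]       # elements that satisfy a condition
sort(v)        # ascending order
rev(v)         # reversed order
length(v)      # number of elements
```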
b) Finding the sum and average of given numbers using arrays.
Source Code:
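vec = c(1, 2, 3, 4)
print("Sum of the vector:")
print(sum(vec))    # sum of the elements
print(mean(vec))   # average of the elements
print(prod(vec))   # product of the elements
Output:
========
[1] "Sum of the vector:"
[1] 10
[1] 2.5
[1] 24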
[[1]]
[1] "c"
[[2]]
[1] "b"
[[3]]
[1] "a"
Output
=======
[1] 2
[1] "a"
Output
======
[1] 10
[1] "s"
Week 5:
a) Implement R Script to perform various operations on matrices
Source Code:
Sample matrices
A <- matrix(c(10, 8,
5, 12), ncol = 2, byrow = TRUE)
A
B <- matrix(c(5, 3,
15, 6), ncol = 2, byrow = TRUE)
B
Output
#A #B
[, 1] [, 2] [, 1] [, 2]
[1, ] 10 8 [1, ] 5 3
[2, ] 5 12 [2, ] 15 6
A+B
#Addition of A and B
[, 1] [, 2]
[1, ] 15 11
[2, ] 20 18
A-B
#Subtraction of A and B
[, 1] [, 2]
[1, ] 5 5
[2, ] -10 6
Transpose a matrix in R
============================
To find the transpose of a matrix in R you just need to use the t function as
follows:
t(A)
#Transpose of A
[, 1] [, 2]
[1, ] 10 5
[2, ] 8 12
t(B)
#Transpose of B
[, 1] [, 2]
[1, ] 5 15
[2, ] 3 6
Matrix multiplication in R
============================
There are different types of matrix multiplication:
by a scalar, element-wise multiplication, matrix multiplication, outer product and
Kronecker product.
Multiplication by a scalar
--------------------------------
In order to multiply or divide a matrix by a scalar you can make use of the * or /
operators, respectively:
2*A
Output
[, 1] [, 2]
[1, ] 20 16
[2, ] 10 24
A/2
Output
[, 1] [, 2]
[1, ] 5.0 4
[2, ] 2.5 6
Element-wise multiplication
----------------------------
The element-wise multiplication of two matrices of the same dimensions can also
be computed with the * operator.
The output will be a matrix of the same dimensions of the original matrices.
A*B
Element-wise multiplication of A and B
[, 1] [, 2]
[1, ] 50 24
[2, ] 75 72
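The remaining product types listed at the start of this section are not demonstrated in the source. A minimal sketch using the same sample matrices A and B and the base R operators %*% (matrix product), %o% (outer product) and %x% (Kronecker product):

```r
A <- matrix(c(10, 8,
              5, 12), ncol = 2, byrow = TRUE)
B <- matrix(c(5, 3,
              15, 6), ncol = 2, byrow = TRUE)

prod_AB  <- A %*% B   # matrix product: rows of A times columns of B
outer_AB <- A %o% B   # outer product: a 2 x 2 x 2 x 2 array
kron_AB  <- A %x% B   # Kronecker product: a 4 x 4 block matrix

print(prod_AB)
dim(outer_AB)
dim(kron_AB)
```

Note that * multiplies matching elements, while %*% performs the row-by-column product from linear algebra, so the two generally give different results.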
Power of a matrix in R
There is no built-in function in base R to calculate the power of a matrix, so we
will provide two different alternatives.
On the one hand, you can make use of the %^% operator of the expm package as
follows:
# install.packages("expm")
library(expm)
A %^% 2
Power of A
[, 1] [, 2]
[1, ] 140 176
[2, ] 110 184
On the other hand the matrixcalc package provides the matrix.power function:
# install.packages("matrixcalc")
library(matrixcalc)
matrix.power(A, 2)
Power of A
[, 1] [, 2]
[1, ] 140 176
[2, ] 110 184
Determinant of a matrix in R
The determinant of a matrix A, generally denoted by |A|, is a scalar value that encodes
some properties of the matrix.
In R you can make use of the det function to calculate it.
det(A) # 80
det(B) # -15
Inverse of a matrix in R
In order to calculate the inverse of a matrix in R you can make use of the solve
function.
M <- solve(A)
M
Inverse of A
[, 1] [, 2]
[1, ] 0.1500 -0.100
[2, ] -0.0625 0.125
Rank of a matrix in R
The rank of a matrix is the maximum number of columns (rows) that are linearly
independent.
qr(A)$rank # 2
qr(B)$rank # 2
# Equivalent to:
library(Matrix)
rankMatrix(A)[1] # 2
print(df)
Output:
=========
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before adding row\n")
print(df)
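The script above stops before a row is actually added. A minimal sketch of that step, assuming rbind() is the intended approach; the values in the new row are hypothetical.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# The new row must have the same column names as the data frame
new_row <- data.frame(Name = "Sandeep", Language = "C", Age = 23)  # hypothetical values
df <- rbind(df, new_row)

cat("After adding row\n")
print(df)
```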
V1 V2 V3
1 100 AB ab
2 200 CD cd
3 300 EF ef
4 400 GH gh
5 500 IJ ij
Week 6:
a) Write an R script to find basic descriptive statistics using summary,
str, quantile functions on mtcars & cars datasets.
Source Code:
data() ##List of pre-loaded data
data(mtcars) ##Loading a built-in R data
head(mtcars, 6) ##Print the first 6 rows
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
nrow(mtcars) ##[1] 32
ncol(mtcars) ##[1] 11
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
tail(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
quantile(mtcars$wt)
## 0% 25% 50% 75% 100%
## 1.51300 2.58125 3.32500 3.61000 5.42400
dat <- iris # load the iris dataset and rename it dat
summary(dat)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
head(dat) # first 6 observations
quantile(dat$Sepal.Length, 0.5)
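# Calculating summary
# myData is not defined in the source; the built-in cars dataset is assumed here
myData <- cars
data_summary <- summary(myData)   # avoid naming the result "summary", which masks the function
print(data_summary)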
library(datasets)
str(iris)
# Aggregate the dataset by Species and calculate the mean Sepal.Width for each Species
aggregated_df <- aggregate(Sepal.Width ~ Species, iris, mean)
print(aggregated_df)
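The exercise also asks for str() and for the cars dataset, neither of which is shown above. A short sketch using only built-in datasets; the choice of columns and probabilities is arbitrary.

```r
# Structure of each dataset: number of observations, column names and types
str(mtcars)
str(cars)

# Quartiles of one column, requested explicitly through probs
q <- quantile(cars$speed, probs = c(0.25, 0.5, 0.75))
print(q)

# Interquartile range: distance between the 3rd and 1st quartile
print(IQR(cars$speed))

# Any other quantile can be requested the same way, e.g. the deciles
print(quantile(mtcars$mpg, probs = seq(0.1, 0.9, by = 0.1)))
```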
Week 7:
a) Reading different types of data sets (.txt, .csv) from Web or disk and writing
in file in specific disk location.
Source Code:
File reading in R
One of the important formats to store a file is in a text file.
R provides various methods to read data from a text file.
read.delim(): This method is used for reading "tab-separated value" files
(".txt").
By default, point (".") is used as the decimal point.
Syntax: read.delim(file, header = TRUE, sep = "\t", dec = ".", …)
Parameters:
file: the path to the file containing the data to be read into R.
header: a logical value. If TRUE, read.delim() assumes that your file has a
header row, so row 1 is the name of each column.
If that's not the case, you can add the argument header = FALSE.
sep: the field separator character. "\t" is used for a tab-delimited file.
dec: the character used in the file for decimal points.
Output:
Name.Age.Qualification.Address
1 Amiya,18,MCA,BBS
2 Niru,23,Msc,BLS
3 Debi,23,BCA,SBP
4 Biku,56,ISC,JJP
file.choose(): You can also use file.choose() with read.csv() just like before.
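The read and write calls for this exercise are not shown in the source. The sketch below writes a small data frame to a chosen location as a tab-separated .txt file and as a .csv file, then reads both back. The file names and the use of tempdir() as the target folder are assumptions; replace them with your own disk location.

```r
students <- data.frame(
  Name = c("Amiya", "Niru", "Debi", "Biku"),
  Age = c(18, 23, 23, 56),
  Qualification = c("MCA", "Msc", "BCA", "ISC"),
  Address = c("BBS", "BLS", "SBP", "JJP")
)

# Target folder (assumption): a temporary directory; any writable path works
out_dir <- tempdir()
txt_path <- file.path(out_dir, "students.txt")
csv_path <- file.path(out_dir, "students.csv")

# Write a tab-separated text file and a comma-separated file
write.table(students, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
write.csv(students, csv_path, row.names = FALSE)

# Read them back; read.delim() expects tabs, read.csv() expects commas
from_txt <- read.delim(txt_path)
from_csv <- read.csv(csv_path)
print(from_txt)
print(from_csv)
```

Reading a comma-separated file with read.delim() leaves every row in a single column, because no tab is found to split on. Both functions also accept a URL in place of a local path.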
b) Reading Excel data sheet in R.
Source Code:
install.packages("XML")
Creating XML file:
XML files can be created by saving the data with the respective tags containing information
about the content and saving it with ‘.xml’.
<RECORDS>
<STUDENT>
<ID>1</ID>
<NAME>Alia</NAME>
<MARKS>620</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>Brijesh</NAME>
<MARKS>440</MARKS>
<BRANCH>Commerce</BRANCH>
</STUDENT>
<STUDENT>
<ID>3</ID>
<NAME>Yash</NAME>
<MARKS>600</MARKS>
<BRANCH>Humanities</BRANCH>
</STUDENT>
<STUDENT>
<ID>4</ID>
<NAME>Mallika</NAME>
<MARKS>660</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>5</ID>
<NAME>Zayn</NAME>
<MARKS>560</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
</RECORDS>
Reading XML File:
The XML file can be read after installing the package and then parsing it with the xmlParse()
function, which takes as input the XML file name and returns the parsed content of the file.
# loading the library and other important packages
library("XML")
library("methods")
# parse the XML file (the file name "sample.xml" is an assumption; use your own path)
data <- xmlParse(file = "sample.xml")
print(data)
Output:
1
Alia
620
IT
2
Brijesh
440
Commerce
3
Yash
600
Humanities
4
Mallika
660
IT
5
Zayn
560
IT
Week 8:
a) Implement R Script to create a Pie chart, Bar Chart, scatter plot and
Histogram (Introduction to ggplot2 graphics)
Source Code:
# Load the ggplot2 package
library(ggplot2)
# Pie Chart
pie_data <- iris$Species
pie_chart <- ggplot(data = iris, aes(x = "")) +
geom_bar(aes(fill = Species), width = 1) +
coord_polar("y", start = 0) +
labs(title = "Pie Chart of Iris Species") +
scale_fill_manual(values = c("#F8766D", "#00BA38", "#619CFF"), labels =
levels(iris$Species)) +
theme_void()
# Bar Chart
bar_chart <- ggplot(data = iris, aes(x = Species)) +
geom_bar(fill = "#619CFF") +
labs(title = "Bar Chart of Iris Species", x = "Species", y = "Count") +
theme_minimal()
# Scatter Plot
scatter_plot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color =
Species)) +
geom_point() +
labs(title = "Scatter Plot of Sepal Length vs Sepal Width", x = "Sepal Length", y
= "Sepal Width") +
theme_minimal()
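# Histogram
histogram <- ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
geom_histogram(binwidth = 0.2, position = "identity") +
labs(title = "Histogram of Sepal Length", x = "Sepal Length", y = "Count") +
theme_minimal()
# Display the charts (ggplot objects are drawn only when printed)
print(pie_chart)
print(bar_chart)
print(scatter_plot)
print(histogram)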
1. Histogram
A histogram is a graphical tool that works on a single variable. The values of the variable are
grouped into bins, and the number of values falling in each bin, termed the frequency, is counted.
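The binning step can be inspected without drawing anything. The sketch below uses base R's hist() with plot = FALSE on the iris data; the bin width of 0.5 is an arbitrary choice for illustration.

```r
# Compute the bins and frequencies only; nothing is drawn
h <- hist(iris$Sepal.Length, breaks = seq(4, 8, by = 0.5), plot = FALSE)

# One row per bin: its lower edge, upper edge and frequency
bins <- data.frame(
  bin_start = head(h$breaks, -1),
  bin_end = h$breaks[-1],
  frequency = h$counts
)
print(bins)

# Every observation falls into exactly one bin
sum(h$counts) == nrow(iris)
```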
b) Implement R Script to perform mean, median, mode, range,
summary, variance, standard deviation operations.
Source Code:
> x <- c(1,2,3,4,5,1,2,3,1,2,4,5,2,3,1,1,2,3,5,6) # our data set
> mean.result = mean(x) # calculate mean
> print (mean.result)
[1] 2.8
> mode.result = mode(x) # calculate mode (needs a custom function named 'mode'; base R's mode() returns the storage type)
> print (mode.result)
[1] 1
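The custom mode function used in the session above is not shown in the source, and base R's own mode() returns the storage type of an object rather than the most frequent value. One possible definition is sketched below under the hypothetical name get_mode(), chosen so that the built-in function is not masked.

```r
# Hypothetical helper: returns the most frequent value of a vector.
# When several values tie, the one that appears first in the data is returned.
get_mode <- function(v) {
  uniq <- unique(v)
  counts <- tabulate(match(v, uniq))   # frequency of each unique value
  uniq[which.max(counts)]
}

x <- c(1,2,3,4,5,1,2,3,1,2,4,5,2,3,1,1,2,3,5,6)
print(get_mode(x))
```

In this data set the values 1 and 2 both occur five times, so the helper reports 1, the first of the two to appear.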
(or)
# Sample data
data <- c(3, 5, 2, 8, 6, 9, 10, 7, 1, 4)
# Mean
mean_val <- mean(data)
# Median
median_val <- median(data)
# Range
range_val <- range(data)
# Summary
summary_val <- summary(data)
# Variance
variance_val <- var(data)
# Standard Deviation
standard_deviation_val <- sd(data)
Week 9:
a) Implement R Script to perform Normal, Binomial distributions.
Source Code:
# Generate random numbers from a Normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)
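The source shows only the random draw from a Normal distribution. A sketch of the other functions in the same family, together with their Binomial counterparts, follows; the seed, sample size and parameters are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so the random draws are reproducible

# Normal distribution: r = random draws, d = density, p = cumulative probability, q = quantile
normal_data <- rnorm(1000, mean = 0, sd = 1)
mean(normal_data)        # close to 0
sd(normal_data)          # close to 1
dnorm(0)                 # density at the mean
pnorm(1.96)              # P(X <= 1.96)
qnorm(0.975)             # value below which 97.5% of the distribution lies

# Binomial distribution: 10 trials, success probability 0.5
binomial_data <- rbinom(1000, size = 10, prob = 0.5)
table(binomial_data)                 # how often each number of successes occurred
dbinom(5, size = 10, prob = 0.5)     # P(exactly 5 successes)
pbinom(3, size = 10, prob = 0.5)     # P(at most 3 successes)
```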
Correlation
------------
> x<-seq(-10,10, 1)
> y<-x*x
> plot(x,y)
> cor(x,y)
[1] 0
Output:
---------
fatdata<-fat[,c(1,2,5:11)]
> summary(fatdata[,-1]) # do you remember what the negative index (-1) here means?
> lm1 <- lm(pctfat.brozek ~ neck, data = fatdata)
> plot(pctfat.brozek ~ neck, data = fatdata)
> abline(lm1)
> names(lm1)
> summary(lm1)
> lm1
> summary(lm1)
> plot(fitted(lm1), resid(lm1))
> qqnorm(resid(lm1))
##Residuals:
Coefficients:
Estimate Std. Error t value Pr(>|t|) #These are the comprehensive results
H0: There is no linear association between pctfat.brozek and age, fatfreeweight and neck.
Ha: There is a linear association between pctfat.brozek and age, fatfreeweight and neck.
Call:
lm(formula = pctfat.brozek ~ age + fatfreeweight + neck, data = fatdata)
Residuals:
Coefficients:
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Week 12:
Data sources: SQLite examples for relational databases, Loading SPSS and SAS
files, Reading from Google Spreadsheets, API and web scraping examples
Correlation:
========================
# Create two numeric vectors
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)
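The source defines the two vectors but stops before computing anything. A minimal sketch of the correlation step using base R's cor():

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

# Pearson correlation coefficient (the default method)
r <- cor(x, y)
print(r)

# Rank-based alternative
r_spearman <- cor(x, y, method = "spearman")
print(r_spearman)
```

Because y is exactly x + 5, the two vectors are perfectly linearly related and the coefficient is 1.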
Linear Regression:
========================
# Create two numeric vectors
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)
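The model fit itself is missing from the source. A minimal sketch with lm(), assuming y is the response and x the predictor; the new value x = 6 used for prediction is hypothetical.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

# Fit the simple linear regression y = intercept + slope * x
model <- lm(y ~ x)
print(coef(model))

# Predict the response for a new x value
predicted <- predict(model, newdata = data.frame(x = 6))
print(predicted)
```

The data lie exactly on the line y = 5 + x, so the fit has no residual error; summary(model) warns about an essentially perfect fit in that situation.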
Multiple Regression:
========================
# Create three numeric vectors
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(6, 7, 8, 9, 10)
y <- c(11, 12, 13, 14, 15)
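The fit is again missing from the source. In the sketch below the first model uses the vectors as given; since x2 is exactly x1 + 5, the two predictors are perfectly collinear and lm() reports NA for the second coefficient. The second model, on the built-in mtcars data, is an added illustration with predictors that are not collinear; the choice of mpg, wt and hp is an assumption.

```r
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(6, 7, 8, 9, 10)
y <- c(11, 12, 13, 14, 15)

# Multiple regression with two predictors
model <- lm(y ~ x1 + x2)
print(coef(model))   # the x2 coefficient is NA because x2 = x1 + 5

# Same technique on data where the predictors carry separate information
model_cars <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(model_cars))
```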