R - Lab Experiments - Manual

The document discusses installing R and RStudio on Windows OS and the basic functionality of R including data types, variables, operators and some example codes to demonstrate conditional statements and functions. It provides step-by-step instructions to install R and RStudio and explains concepts like data types, variables, operators used in R language.

Uploaded by

bmn889735

Lab Experiments:

Week 1:
Installing R and RStudio
Basic functionality of R, variable, data types in R

Installing R on Windows OS
To install R on Windows OS:

Go to the CRAN website.

Click on "Download R for Windows".

Click on "install R for the first time" link to download the R executable (.exe) file.

Run the R executable file to start installation, and allow the app to make changes to your
device.

Select the installation language.

Follow the installation instructions.


Click on "Finish" to exit the installer.

R has now been successfully installed on your Windows OS. Open the R GUI to start
writing R code.

Installing RStudio Desktop

To install RStudio Desktop on your computer, do the following:

1. Go to the RStudio website.


2. Click on "DOWNLOAD" in the top-right corner.
3. Click on "DOWNLOAD" under the "RStudio Open Source License".
4. Download RStudio Desktop recommended for your computer.
5. Run the RStudio executable file (.exe) for Windows OS or the Apple Disk Image file
(.dmg) for macOS.

6. Follow the installation instructions to complete RStudio Desktop installation.


RStudio is now successfully installed on your computer.
[Figure: the RStudio Desktop IDE interface]

Basic functionality of R
Features of R Programming Language
R Packages: One of the major features of R is its wide availability of libraries. CRAN
(the Comprehensive R Archive Network) is a repository holding more than 10,000
packages.
Distributed Computing: Distributed computing is a model in which components of a
software system are shared among multiple computers to improve efficiency and
performance. Two packages for distributed programming in R, ddR and multidplyr,
were released in November 2015.
Statistical Features of R
Basic Statistics: The most common basic statistics terms are the mean, mode, and median.
These are all known as “Measures of Central Tendency.” So using the R language we can
measure central tendency very easily.
Static graphics: R is rich with facilities for creating and developing interesting static graphics.
R contains functionality for many plot types including graphic maps, mosaic plots, biplots,
and the list goes on.
Probability distributions: Probability distributions play a vital role in statistics and by using R
we can easily handle various types of probability distributions such as Binomial Distribution,
Normal Distribution, Chi-squared Distribution, and many more.
Data analysis: It provides a large, coherent, and integrated collection of tools for data
analysis.
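
The statistical features listed above map onto functions that ship with base R. The following is a minimal sketch; the sample values in scores are made up for illustration, and the distribution parameters are arbitrary choices rather than anything prescribed by this manual.

```r
# Measures of central tendency for a small made-up sample
scores <- c(4, 8, 6, 5, 3, 8)
mean(scores)      # 5.666667
median(scores)    # 5.5

# Normal distribution (mean 0, sd 1 by default)
dnorm(0)          # density at 0
pnorm(1.96)       # P(X <= 1.96), about 0.975
qnorm(0.975)      # quantile, about 1.96

# Binomial distribution: 10 trials, success probability 0.5
dbinom(5, size = 10, prob = 0.5)   # P(X = 5) = 0.2460938
pbinom(5, size = 10, prob = 0.5)   # P(X <= 5)

# Chi-squared distribution with 2 degrees of freedom
pchisq(3, df = 2)                  # P(X <= 3)
```

The d, p, q and r prefixes (density, cumulative probability, quantile, random draw) follow the same pattern for every distribution in base R.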

Data types in R

Basic Data Types
================
Basic data types in R can be divided into the following types:

numeric - (10.5, 55, 787)
integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
complex - (9 + 3i, where "i" is the imaginary part)
character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
logical (a.k.a. boolean) - (TRUE or FALSE)

We can use the class() function to check the data type of a variable:

# numeric
x <- 10.5
class(x)

# integer
x <- 1000L
class(x)

# complex
x <- 9i + 3
class(x)

# character/string
x <- "R is exciting"
class(x)

# logical
x <- TRUE
class(x)
Output
[1] "numeric"
[1] "integer"
[1] "complex"
[1] "character"
[1] "logical"

Variables in R

Creating Variables in R
============================
Variables are containers for storing data values.
R does not have a command for declaring a variable.
A variable is created the moment you first assign a value to it.
To assign a value to a variable, use the <- sign. To output (or print) the variable value, just
type the variable name:

name <- "John"


age <- 40

name # output "John"


age # output 40

Output:
[1] "John"
[1] 40

Multiple Variables
=======================
R allows you to assign the same value to multiple variables in one line:

Example
# Assign the same value to multiple variables in one line
var1 <- var2 <- var3 <- "Orange"

# Print variable values


var1
var2
var3
Output:
[1] "Orange"
[1] "Orange"
[1] "Orange"

Variable Names:
===============
A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume). Rules for R variables are:
A variable name must start with a letter or a period (.) and can be a combination of letters,
digits, periods (.) and underscores (_). If it starts with a period, it cannot be followed by a digit.
A variable name cannot start with a number or an underscore (_)
Variable names are case-sensitive (age, Age and AGE are three different variables)
Reserved words cannot be used as variables (TRUE, FALSE, NULL, if...)
# Legal variable names:
myvar <- "John"
my_var <- "John"
myVar <- "John"
MYVAR <- "John"
myvar2 <- "John"
.myvar <- "John"

# Illegal variable names:


2myvar <- "John"
my-var <- "John"
my var <- "John"
_my_var <- "John"
my_v@ar <- "John"
TRUE <- "John"

Output:
Each of the illegal assignments above stops with an error instead of creating a variable.
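
The naming rules can be checked without triggering errors. The sketch below relies on base R's make.names(), which rewrites a string into a syntactically valid name; a name that comes back unchanged was already valid. The helper is_valid_name is a hypothetical name introduced here, not a base R function.

```r
# Hypothetical helper: TRUE when `name` is already a syntactically
# valid, non-reserved variable name (make.names() leaves it unchanged).
is_valid_name <- function(name) {
  name == make.names(name)
}

candidates <- c("myvar", "my_var", ".myvar", "2myvar", "my-var",
                "my var", "_my_var", "my_v@ar", "TRUE")

# Show each candidate next to the verdict
data.frame(name = candidates, valid = is_valid_name(candidates))
```

The first three candidates are reported as valid and the remaining six as invalid, matching the legal and illegal lists above.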

Week 2:
a) Implement R script to show the usage of various operators available in R
language.
Source Code:
# R Arithmetic Operators Example for numbers

a <- 7.5
b <- 2

print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(a / b)    # division
print(a %% b)   # remainder (modulus)
print(a %/% b)  # quotient (integer division)
print(a ^ b)    # power
Output
[1] 9.5
[1] 5.5
[1] 15
[1] 3.75
[1] 1.5
[1] 3
[1] 56.25

# R Operators - R Arithmetic Operators Example for vectors

a <- c(8, 9, 6)
b <- c(2, 4, 5)

print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(a / b)    # division
print(a %% b)   # remainder (modulus)
print(a %/% b)  # quotient (integer division)
print(a ^ b)    # power
Output
[1] 10 13 11
[1] 6 5 1
[1] 16 36 30
[1] 4.00 2.25 1.20
[1] 0 1 1
[1] 4 2 1
[1] 64 6561 7776
# R Operators - R Relational Operators Example for Numbers

a <- 7.5
b <- 2

print(a < b)   # less than
print(a > b)   # greater than
print(a == b)  # equal to
print(a <= b)  # less than or equal to
print(a >= b)  # greater than or equal to
print(a != b)  # not equal to
Output
$ Rscript r_op_relational.R
[1] FALSE
[1] TRUE
[1] FALSE
[1] FALSE
[1] TRUE
[1] TRUE
# R Operators - R Relational Operators Example for vectors
a <- c(7.5, 3, 5)
b <- c(2, 7, 0)

print ( a<b ) # less than


print ( a>b ) # greater than
print ( a==b ) # equal to
print ( a<=b ) # less than or equal to
print ( a>=b ) # greater than or equal to
print ( a!=b ) # not equal to
Output
[1] FALSE TRUE FALSE
[1] TRUE FALSE TRUE
[1] FALSE FALSE FALSE
[1] FALSE TRUE FALSE
[1] TRUE FALSE TRUE
[1] TRUE TRUE TRUE

# R Operators - R Logical Operators Example for basic logical elements

a <- 0 # logical FALSE
b <- 2 # logical TRUE

print(a & b)   # logical AND, element-wise
print(a | b)   # logical OR, element-wise
print(!a)      # logical NOT, element-wise
print(a && b)  # logical AND for single values (short-circuits)
print(a || b)  # logical OR for single values (short-circuits)
Output
[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE
[1] TRUE

# R Operators - R Logical Operators Example for boolean vectors

a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

print(a & b)   # logical AND, element-wise
print(a | b)   # logical OR, element-wise
print(!a)      # logical NOT, element-wise
# && and || work on single values only; since R 4.3.0 they raise an error
# for longer vectors, so the first elements are compared explicitly here
print(a[1] && b[1])  # logical AND of the first elements
print(a[1] || b[1])  # logical OR of the first elements
Output
[1] TRUE FALSE FALSE FALSE
[1] TRUE TRUE TRUE FALSE
[1] FALSE FALSE TRUE TRUE
[1] TRUE
[1] TRUE
# R Operators - R Assignment Operators

a=2
print ( a )

a <- TRUE
print ( a )

454 -> a
print ( a )

a <<- 2.9
print ( a )

c(6, 8, 9) -> a
print ( a )
Output
[1] 2
[1] TRUE
[1] 454
[1] 2.9
[1] 6 8 9
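
At the top level all five assignment forms behave alike, so the difference between <- and <<- only shows up inside a function. The following sketch, with hypothetical function names, illustrates that <<- updates a variable in the enclosing environment while <- creates a local copy.

```r
counter <- 0

increment_local <- function() {
  counter <- counter + 1    # creates a new local variable; the outer one is untouched
  counter
}

increment_global <- function() {
  counter <<- counter + 1   # updates `counter` in the enclosing environment
  counter
}

increment_local()    # returns 1
counter              # still 0
increment_global()   # returns 1
counter              # now 1
```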

b) Implement R script to read a person's age from the keyboard
and display whether the person is eligible to vote or not.
Source Code:
{
  age <- as.integer(readline(prompt = "Enter your age :"))

  if (age >= 18) {
    print(paste("You are valid for voting :", age))
  } else {
    print(paste("You are not valid for voting :", age))
  }
}

Output:
Enter your age :48
[1] "You are valid for voting : 48"
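
The same rule can be wrapped in a function so that it can be tried without keyboard input. This is a sketch only: voting_message and its min_age argument are hypothetical names, and the NA branch is an addition for input that cannot be converted to a number.

```r
# Hypothetical helper: returns the message instead of printing it,
# so the rule can be checked for several ages in one go.
voting_message <- function(age, min_age = 18) {
  if (is.na(age)) {
    return("Age is not a valid number")
  }
  if (age >= min_age) {
    paste("You are valid for voting :", age)
  } else {
    paste("You are not valid for voting :", age)
  }
}

voting_message(48)
voting_message(17)
# as.integer() turns non-numeric text into NA (with a warning)
voting_message(suppressWarnings(as.integer("abc")))
```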
c) Implement R script to find biggest number between two numbers.
Source Code:

a=readline(prompt="enter first number:")


b=readline(prompt="enter second number:")
a<-as.integer(a)
b<-as.integer(b)
if(a>b)
{
print(paste(a,"is biggest"))
}else
{
print(paste(b,"is biggest"))
}

(or, for three numbers)

{
x <- as.integer(readline(prompt = "Enter first number :"))
y <- as.integer(readline(prompt = "Enter second number :"))
z <- as.integer(readline(prompt = "Enter third number :"))

if (x > y && x > z) {


print(paste("Greatest is :", x))
} else if (y > z) {
print(paste("Greatest is :", y))
} else{
print(paste("Greatest is :", z))
}

}
Output:
Enter first number :2
Enter second number :22
Enter third number :4
[1] "Greatest is : 22"

d) Implement R script to check whether the given year is a leap year or not.


Source Code:
# Function to check for a leap year
is_leap_year <- function(year) {
if ((year %% 4 == 0 && year %% 100 != 0) || year %% 400 == 0) {
return(TRUE)
} else {
return(FALSE)
}
}

# Input year
input_year <- 2024

# Check if it's a leap year


if (is_leap_year(input_year)) {
print(paste(input_year, "is a leap year."))
} else {
print(paste(input_year, "is not a leap year."))
}
Output:

[1] "2024 is a leap year."

(or)

# Program to check if the input year is a leap year or not

year = as.integer(readline(prompt="Enter a year: "))


if((year %% 4) == 0) {
if((year %% 100) == 0) {
if((year %% 400) == 0) {
print(paste(year,"is a leap year"))
} else {
print(paste(year,"is not a leap year"))
}
} else {
print(paste(year,"is a leap year"))
}
} else {
print(paste(year,"is not a leap year"))
}

Output:
Enter a year: 2000
[1] "2000 is a leap year"
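
Both programs above test one year at a time. As a sketch of an alternative, the same rule can be written with the element-wise operators & and | so that it accepts a whole vector of years; the function name is_leap is a hypothetical choice.

```r
# Vectorised variant of the leap-year rule: divisible by 4 and not by 100,
# or divisible by 400.
is_leap <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

years <- c(1900, 2000, 2023, 2024)
is_leap(years)
# [1] FALSE  TRUE FALSE  TRUE
```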

Week 3:
a) Implement R Script to create a list.
Source Code:
We can create a list using the list() function.
x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
x
Output
$a
[1] 2.5

$b
[1] TRUE

$c
[1] 1 2 3
b) Implement R Script to access elements in the list.
Source Code:
Lists can be accessed in similar fashion to vectors. Integer, logical or character
vectors can be used for indexing.
x <- list(name = "John", age = 19, speaks = c("English", "French"))

# access elements by name


x$name
x$age
x$speaks

# access elements by integer index


x[c(1, 2)]
x[-2]

# access elements by logical index


x[c(TRUE, FALSE, FALSE)]

# access elements by character index


x[c("age", "speaks")]

Output
======
[1] "John"
[1] 19
[1] "English" "French"
$name
[1] "John"

$age
[1] 19
$name
[1] "John"

$speaks
[1] "English" "French"

$name
[1] "John"

$age
[1] 19

$speaks
[1] "English" "French"
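
Single brackets always return a list, as the output above shows. To get at the element itself, double brackets (or $) are used. A brief sketch using the same list follows; the city value is made up for illustration.

```r
x <- list(name = "John", age = 19, speaks = c("English", "French"))

class(x["age"])      # "list": single brackets keep the list wrapper
class(x[["age"]])    # "numeric": double brackets return the element itself
x[["speaks"]][2]     # "French": index into the extracted vector

x[["age"]] <- 20     # modify an element
x$city <- "Paris"    # add a new element (hypothetical value)
names(x)
```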
c) Implement R Script to merge two or more lists. Implement R Script to
perform matrix operation
Source Code:
n1 = list(1,2,3)
c1 = list("Red", "Green", "Black")
print("Original lists:")
print(n1)
print(c1)
print("Merge the said lists:")
mlist = c(n1, c1)
print("New merged list:")
print(mlist)

Output:
==========
[1] "Original lists:"
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[1]]
[1] "Red"

[[2]]
[1] "Green"

[[3]]
[1] "Black"

[1] "Merge the said lists:"


[1] "New merged list:"
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] "Red"

[[5]]
[1] "Green"

[[6]]
[1] "Black"
Week 4:
Implement R script to perform following operations:
a) various operations on vectors.
Source Code:
The c() function is used for creating a vector in R. This function returns a one-
dimensional array, also known as vector.
1) Using the colon (:) operator
z<-x:y
This operator creates a vector with elements from x to y and assigns it to z.
a<-4:-10
a
Output
=======
[1] 4 3 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10

2) Using the seq() function
We can create a vector with the help of the seq() function.
The seq() function is used in two ways, i.e., by setting the step size
with the 'by' parameter or by specifying the length of the vector with the 'length.out'
parameter.

seq_vec<-seq(1,4,by=0.5)
seq_vec
class(seq_vec)

Output
======
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
[1] "numeric"

seq_vec<-seq(1,4,length.out=6)
seq_vec
class(seq_vec)
Output

[1] 1.0 1.6 2.2 2.8 3.4 4.0


[1] "numeric"

Numeric vector
The decimal values are known as numeric data types in R.
d<-45.5
num_vec<-c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)
Output

[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"

Integer vector
A numeric value without a fractional part can be stored as integer data.
R reports this type as "integer" (abbreviated to int in str() output).
R stores integers as 32-bit (4-byte) signed values.
d<-as.integer(5)
e<-5L
int_vec<-c(1,2,3,4,5)
int_vec<-as.integer(int_vec)
int_vec1<-c(1L,2L,3L,4L,5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)
Output

[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"

Character vector
Character data in R is stored as strings. In R, there are two different
ways to create a character value, i.e., using the as.character() function
or typing the text between double quotes ("") or single quotes ('').
d<-'shubham'
e<-"Arpita"
f<-65
f<-as.character(f)
d
e
f
char_vec<-c(1,2,3,4,5)
char_vec<-as.character(char_vec)
char_vec1<-c("shubham","arpita","nishka","vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)
Output
=======
[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"

Logical vector
The logical data type has only two values, i.e., TRUE or FALSE.
These values are based on which condition is satisfied.
A vector which contains Boolean values is known as the logical vector.

d<-as.integer(5)
e<-as.integer(6)
f<-as.integer(7)
g<-d>e
h<-e<f
g
h
log_vec<-c(d<e, d<f, e<d,e<f,f<d,f<e)
log_vec
class(g)
class(h)
class(log_vec)
Output
=====
[1] FALSE
[1] TRUE
[1] TRUE TRUE FALSE TRUE FALSE FALSE
[1] "logical"
[1] "logical"
[1] "logical"
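
The examples so far show how vectors of each type are created. As a short sketch of common operations on an existing vector (the values are arbitrary), base R offers the following:

```r
v <- c(12, 5, 9, 20, 3)

length(v)                    # number of elements
v[2]                         # second element (R indexes from 1)
v[v > 8]                     # elements greater than 8
v * 2                        # arithmetic is applied element-wise
sort(v)                      # ascending order
sort(v, decreasing = TRUE)   # descending order
v <- append(v, 7)            # add an element at the end
v[-1]                        # everything except the first element
```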
b) Finding the sum and average of given numbers using arrays.
Source Code:
vec = c(1, 2, 3 , 4)
print("Sum of the vector:")

# inbuilt sum method


print(sum(vec))

# using inbuilt mean method


print("Mean of the vector:")
print(mean(vec))

# using inbuilt product method


print("Product of the vector:")
print(prod(vec))

Output:
========
[1] "Sum of the vector:"
[1] 10
[1] "Mean of the vector:"
[1] 2.5
[1] "Product of the vector:"
[1] 24

c) To display elements of list in reverse order.


Source Code:
In R, the rev() function is used to reverse the elements of a
vector object.
x <- list("a", "b", "c")
result = rev(x)
print(result)
Output:

[[1]]
[1] "c"
[[2]]
[1] "b"
[[3]]
[1] "a"

d) Finding the minimum and maximum elements in the array.


Source Code:
In R, we can find the minimum or maximum value of a vector or data frame. We use the
min() and max() function to find minimum and maximum value respectively. The min()
function returns the minimum value of a vector or data frame. The max() function returns
the maximum value of a vector or data frame.
numbers <- c(2,4,6,8,10)

# return minimum value present in numbers


min(numbers) # 2

characters <- c("s", "a", "p", "b")

# return alphabetically minimum value in characters


min(characters) # "a"

Output
=======
[1] 2
[1] "a"

numbers <- c(2,4,6,8,10)


# return largest value present in numbers
max(numbers) # 10

characters <- c("s", "a", "p", "b")

# return alphabetically maximum value in characters


max(characters) # "s"

Output
======
[1] 10
[1] "s"
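
Beyond the values themselves, it is often useful to know where the minimum and maximum sit and how missing values are treated. A minimal sketch with base R functions follows; the with_missing vector is made up for illustration.

```r
numbers <- c(2, 4, 6, 8, 10)

range(numbers)       # minimum and maximum together
which.min(numbers)   # position of the smallest value
which.max(numbers)   # position of the largest value

# NA values propagate unless they are removed explicitly
with_missing <- c(4, NA, 9)
max(with_missing)                 # NA
max(with_missing, na.rm = TRUE)   # 9
```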
Week 5:
a) Implement R Script to perform various operations on matrices
Source Code:
Sample matrices
A <- matrix(c(10, 8,
5, 12), ncol = 2, byrow = TRUE)
A

B <- matrix(c(5, 3,
15, 6), ncol = 2, byrow = TRUE)
B
Output
#A #B
[, 1] [, 2] [, 1] [, 2]
[1, ] 10 8 [1, ] 5 3
[2, ] 5 12 [2, ] 15 6
A+B
#Addition of A and B
[, 1] [, 2]
[1, ] 15 11
[2, ] 20 18

A-B
#Subtraction of A and B
[, 1] [, 2]
[1, ] 5 5
[2, ] -10 6

Transpose a matrix in R
============================
To find the transpose of a matrix in R you just need to use the t function as
follows:

t(A)
#Transpose of A
[, 1] [, 2]
[1, ] 10 5
[2, ] 8 12
t(B)
#Transpose of B
[, 1] [, 2]
[1, ] 5 15
[2, ] 3 6

Matrix multiplication in R
============================
There are different types of matrix multiplication:
by a scalar, element-wise multiplication, matrix multiplication, outer product and
Kronecker product.

Multiplication by a scalar
--------------------------------
In order to multiply or divide a matrix by a scalar you can make use of the * or /
operators, respectively:

2*A
Output
[, 1] [, 2]
[1, ] 20 16
[2, ] 10 24
A/2
Output
[, 1] [, 2]
[1, ] 5.0 4
[2, ] 2.5 6
Element-wise multiplication
----------------------------
The element-wise multiplication of two matrices of the same dimensions can also
be computed with the * operator.
The output will be a matrix of the same dimensions of the original matrices.

A*B
Element-wise multiplication of A and B
[, 1] [, 2]
[1, ] 50 24
[2, ] 75 72
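
The remaining kinds of multiplication named at the start of this section have their own operators in base R. A brief sketch with the same sample matrices A and B:

```r
A <- matrix(c(10, 8,
              5, 12), ncol = 2, byrow = TRUE)
B <- matrix(c(5, 3,
              15, 6), ncol = 2, byrow = TRUE)

A %*% B   # matrix product (rows of A times columns of B)
#      [,1] [,2]
# [1,]  170   78
# [2,]  205   87

A %o% B   # outer product: a 2 x 2 x 2 x 2 array
A %x% B   # Kronecker product: a 4 x 4 matrix
```

Note that * and %*% give different results: A*B multiplies matching positions, while A %*% B is the usual linear-algebra product.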

Power of a matrix in R
There is no built-in function in base R to calculate the power of a matrix, so we
will provide two different alternatives.

On the one hand, you can make use of the %^% operator of the expm package as
follows:

# install.packages("expm")
library(expm)

A %^% 2
Power of A
[, 1] [, 2]
[1, ] 140 176
[2, ] 110 184
On the other hand the matrixcalc package provides the matrix.power function:

# install.packages("matrixcalc")
library(matrixcalc)

matrix.power(A, 2)
Power of A
[, 1] [, 2]
[1, ] 140 176
[2, ] 110 184

Determinant of a matrix in R
The determinant of a matrix A, generally denoted by |A|, is a scalar value that encodes some
properties of the matrix.
In R you can make use of the det function to calculate it.

det(A) # 80
det(B) # -15
Inverse of a matrix in R
In order to calculate the inverse of a matrix in R you can make use of the solve
function.

M <- solve(A)
M
Inverse of A
[, 1] [, 2]
[1, ] 0.1500 -0.100
[2, ] -0.0625 0.125

Rank of a matrix in R
The rank of a matrix is the maximum number of columns (rows) that are linearly
independent.
qr(A)$rank # 2
qr(B)$rank # 2

# Equivalent to:
library(Matrix)
rankMatrix(A)[1] # 2

b) Implement R Script to extract the data from dataframes.


Source Code:

# R program to illustrate dataframe

# A vector which is a character vector


Name = c("Amiya", "Raj", "Asish")

# A vector which is a character vector


Language = c("R", "Python", "Java")

# A vector which is a numeric vector


Age = c(22, 25, 45)

# To create dataframe use data.frame command and


# then pass each of the vectors
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)

print(df)

Output:
=========
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
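
With the data frame in place, data can be extracted by column, by row, or by condition. A minimal sketch using the same three records:

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

df$Name                         # one column as a vector
df[2, ]                         # second row
df[df$Age > 23, ]               # rows that satisfy a condition
df[, c("Name", "Age")]          # selected columns
df[df$Language == "R", "Age"]   # a single value: 22
```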
# R program to illustrate operation on a data frame

# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before adding row\n")
print(df)

# Add a new row using rbind()


newDf = rbind(df, data.frame(Name = "Sandeep",
Language = "C",
Age = 23
))
cat("After adding a row\n")
print(newDf)
Output:
Before adding row
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45

After adding a row


Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
4 Sandeep C 23
Adding extra columns: We can add extra column using the command cbind(). The syntax
for this is given below,
newDF = cbind(df, the entries for the new column you have to add )
df = Original data frame
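
As a concrete sketch of that syntax, the example below adds one column to the data frame used earlier. The column name City and its values are made up for illustration.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# "City" and its values are hypothetical; one entry is needed per existing row
newDf <- cbind(df, City = c("Pune", "Delhi", "Goa"))
print(newDf)

# An equivalent way to add a column directly
df$City <- c("Pune", "Delhi", "Goa")
```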

c) Write R script to display file contents.


Source Code:
# R program to read a delimited text file
# Get content into a data frame
# (sep = "\t" means the fields in this file are separated by tabs)
data <- read.csv("CSVFileExample.csv",
header = FALSE, sep = "\t")

# Printing content of the file


print(data)
Output:

V1 V2 V3
1 100 AB ab
2 200 CD cd
3 300 EF ef
4 400 GH gh
5 500 IJ ij

d) Write R script to copy file contents from one file to another


Source Code:
# Get list of file names
my_files <- list.files("C:/Desktop/User/Murali/my directory A")
my_files
# [1] "file no 1.docx" "file no 2.txt" "file no 3.xlsx"
# Copy files

file.copy(from = paste0("C:/Desktop/User/Murali/my directory A/", my_files),


to = paste0("C:/Desktop/User/Murali/my directory B/", my_files))
Output:
# [1] TRUE TRUE TRUE
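
The script above copies every file in a directory. To copy the contents of a single file, the sketch below shows two options; temporary files stand in for real paths, so src and dest are assumptions to be replaced with actual file names.

```r
# Temporary files stand in for real paths (an assumption of this sketch)
src  <- tempfile(fileext = ".txt")
dest <- tempfile(fileext = ".txt")

writeLines(c("first line", "second line"), src)  # create a sample source file

# Option 1: copy the file as a whole
file.copy(from = src, to = dest, overwrite = TRUE)

# Option 2: read the contents and write them to the other file
contents <- readLines(src)
writeLines(contents, dest)

readLines(dest)   # display the copied contents
```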

Week 6:
a) Write an R script to find basic descriptive statistics using the summary,
str and quantile functions on the mtcars & cars datasets.
Source Code:
data() ##List of pre-loaded data
data(mtcars) ##Loading a built-in R data
head(mtcars, 6) ##Print the first 6 rows
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

nrow(mtcars) ##[1] 32
ncol(mtcars) ##[1] 11

head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

tail(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000

quantile(mtcars$wt)
## 0% 25% 50% 75% 100%
## 1.51300 2.58125 3.32500 3.61000 5.42400

dat <- iris # load the iris dataset and renamed it dat

summary(dat)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
head(dat) # first 6 observations

str(dat) # structure of dataset

quantile(dat$Sepal.Length, 0.5)

quantile(dat$Sepal.Length, 0.25) # first quartile


## 25%
## 5.1
quantile(dat$Sepal.Length, 0.75) # third quartile
## 75%
## 6.4

b) Write an R script to find a subset of a dataset by using the subset() and aggregate()
functions on the iris dataset
Source Code:
# Import the data using read.csv()
myData = read.csv("iris.csv",
stringsAsFactors = FALSE)

# Calculating summary
# (stored under a different name so that it does not shadow the summary() function)
iris_summary = summary(myData)
print(iris_summary)

library(datasets)
str(iris)

## 'data.frame': 150 obs. of 5 variables:


## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

# Get first 5 rows of each subset


subset(iris, Species == "setosa")[1:5,]

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species


## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa

subset(iris, Species == "versicolor")[1:5,]


## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 51 7.0 3.2 4.7 1.4 versicolor
## 52 6.4 3.2 4.5 1.5 versicolor
## 53 6.9 3.1 4.9 1.5 versicolor
## 54 5.5 2.3 4.0 1.3 versicolor
## 55 6.5 2.8 4.6 1.5 versicolor

subset(iris, Species == "virginica")[1:5,]


## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 101 6.3 3.3 6.0 2.5 virginica
## 102 5.8 2.7 5.1 1.9 virginica
## 103 7.1 3.0 5.9 2.1 virginica
## 104 6.3 2.9 5.6 1.8 virginica
## 105 6.5 3.0 5.8 2.2 virginica

# Subset the dataset based on a condition


subset_df <- subset(iris, Sepal.Length > 5)

# Print the subset


print(subset_df)

# Aggregate the dataset by Species and calculate the mean Sepal.Width for each Species
aggregated_df <- aggregate(Sepal.Width ~ Species, iris, mean)

# Print the aggregated dataset
print(aggregated_df)

##      Species Sepal.Width
## 1     setosa       3.428
## 2 versicolor       2.770
## 3  virginica       2.974
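
Both functions accept further arguments that are handy in practice. The following sketch uses subset() with its select argument and aggregates two columns at once; the threshold of 7 and the variable names long_sepals and species_means are arbitrary choices for illustration.

```r
# Keep only two columns for the rows that match the condition
long_sepals <- subset(iris, Sepal.Length > 7, select = c(Sepal.Length, Species))
head(long_sepals)

# Mean of two measurements per species in a single call
species_means <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                           data = iris, FUN = mean)
print(species_means)
```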

Week 7:
a) Reading different types of data sets (.txt, .csv) from Web or disk and writing
in file in specific disk location.
Source Code:
File reading in R
One of the important formats to store a file is in a text file.
R provides various methods that one can read data from a text file.
read.delim(): This method is used for reading "tab-separated value" files
(".txt").
By default, point (".") is used as decimal points.
Syntax: read.delim(file, header = TRUE, sep = "\t", dec = ".", ...)
Parameters:
file: the path to the file containing the data to be read into R.
header: a logical value. If TRUE, read.delim() assumes that your file has a
header row, so row 1 is the name of each column.
If that's not the case, you can add the argument header = FALSE.
sep: the field separator character. "\t" is used for a tab-delimited file.
dec: the character used in the file for decimal points.

# R program reading a text file using file.choose()
# If you use the code below in RStudio,
# you will be asked to choose a file
myFile = read.delim(file.choose(), header = FALSE)
print(myFile)

# R program to read a file in table format


# Using read.table()
myData = read.table("basic.csv")
print(myData)
Output:
1 Name,Age,Qualification,Address
2 Amiya,18,MCA,BBS
3 Niru,23,Msc,BLS
4 Debi,23,BCA,SBP
5 Biku,56,ISC,JJP
read.csv(): read.csv() is used for reading "comma separated value" files
(".csv"). In this also the data will be imported as a data frame.
Syntax: read.csv(file, header = TRUE, sep = ",", dec = ".", ...)
Parameters:
file: the path to the file containing the data to be imported into R.
header: logical value. If TRUE, read.csv() assumes that your file has a header
row, so row 1 is the name of each column. If that's not the case, you can add
the argument header = FALSE.
sep: the field separator character
dec: the character used in the file for decimal points.

# R program to read a file in table format
# Using read.csv2()
myData = read.csv2("basic.csv")
print(myData)

Output:
Name.Age.Qualification.Address
1 Amiya,18,MCA,BBS
2 Niru,23,Msc,BLS
3 Debi,23,BCA,SBP
4 Biku,56,ISC,JJP
read.csv2() expects semicolon-separated fields with a comma as the decimal mark, so this
comma-separated file is read as a single column.
file.choose(): You can also use file.choose() with read.csv() just like before.
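
The task also asks for writing to a specific disk location. A minimal round-trip sketch follows; the students data and the temporary out_path are assumptions, and out_path can be replaced with any writable location on disk.

```r
# Hypothetical sample data, written to a temporary location
students <- data.frame(Name = c("Amiya", "Niru"), Age = c(18, 23))
out_path <- file.path(tempdir(), "students.csv")

write.csv(students, out_path, row.names = FALSE)   # write to disk
read_back <- read.csv(out_path)                    # read it back
print(read_back)

# read.csv() also accepts a URL in place of a file path, e.g.
# read.csv("https://example.com/data.csv")   # placeholder URL, not a real dataset
```

Setting row.names = FALSE keeps R's row numbers out of the file, so the columns read back are exactly the ones that were written.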
b) Reading Excel data sheet in R.
Source Code:

# Install openxlsx package
install.packages("openxlsx")
# Load openxlsx
library(openxlsx)
# Read excel file
read.xlsx('/Users/admin/new_file.xlsx')

c) Reading XML dataset in R


Source Code:

install.packages("XML")
Creating XML file:

XML files can be created by saving the data with the respective tags containing information
about the content and saving it with ‘.xml’.
<RECORDS>
<STUDENT>
<ID>1</ID>
<NAME>Alia</NAME>
<MARKS>620</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>Brijesh</NAME>
<MARKS>440</MARKS>
<BRANCH>Commerce</BRANCH>
</STUDENT>
<STUDENT>
<ID>3</ID>
<NAME>Yash</NAME>
<MARKS>600</MARKS>
<BRANCH>Humanities</BRANCH>
</STUDENT>
<STUDENT>
<ID>4</ID>
<NAME>Mallika</NAME>
<MARKS>660</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>5</ID>
<NAME>Zayn</NAME>
<MARKS>560</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
</RECORDS>
Reading XML File:
The XML file can be read after installing the package and then parsing it with the xmlParse()
function,
which takes the XML file name as input and returns the parsed document, which can then be printed.
# loading the library and other important packages
library("XML")
library("methods")

# the contents of sample.xml are parsed


data <- xmlParse(file = "sample.xml")

print(data)
Output:
(The parsed document is printed as XML, with the same STUDENT records and tags as the
sample file shown above.)

Week 8:
a) Implement R Script to create a Pie chart, Bar Chart, scatter plot and
Histogram (Introduction to ggplot2 graphics)
Source Code:
# Load the ggplot2 package
library(ggplot2)

# Pie Chart
pie_data <- iris$Species
pie_chart <- ggplot(data = iris, aes(x = "")) +
geom_bar(aes(fill = Species), width = 1) +
coord_polar("y", start = 0) +
labs(title = "Pie Chart of Iris Species") +
scale_fill_manual(values = c("#F8766D", "#00BA38", "#619CFF"), labels =
levels(iris$Species)) +
theme_void()

# Bar Chart
bar_chart <- ggplot(data = iris, aes(x = Species)) +
geom_bar(fill = "#619CFF") +
labs(title = "Bar Chart of Iris Species", x = "Species", y = "Count") +
theme_minimal()

# Scatter Plot
scatter_plot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color =
Species)) +
geom_point() +
labs(title = "Scatter Plot of Sepal Length vs Sepal Width", x = "Sepal Length", y
= "Sepal Width") +
theme_minimal()

# Histogram
histogram <- ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
geom_histogram(binwidth = 0.2, position = "identity") +
labs(title = "Histogram of Sepal Length", x = "Sepal Length", y = "Count") +
theme_minimal()

# Print the plots


print(pie_chart)
print(bar_chart)
print(scatter_plot)
print(histogram)

1. Histogram
A histogram is a graphical tool that works on a single variable. The variable's values are
grouped into bins, and the number of values falling into each bin, termed the frequency, is counted.
b) Implement R Script to perform mean, median, mode, range,
summary, variance, standard deviation operations.
Source Code:
> x <- c(1,2,3,4,5,1,2,3,1,2,4,5,2,3,1,1,2,3,5,6) # our data set
> mean.result = mean(x) # calculate mean
> print (mean.result)
[1] 2.8

> x <- c(1,2,3,4,5,1,2,3,1,2,4,5,2,3,1,1,2,3,5,6) # our data set


> median.result = median(x) # calculate median
> print (median.result)
[1] 2.5

> # note: defining mode() here masks base R's mode(), which reports an object's storage mode
> mode <- function(x) {
+   ux <- unique(x)
+   ux[which.max(tabulate(match(x, ux)))]
+ }

> x <- c(1,2,3,4,5,1,2,3,1,2,4,5,2,3,1,1,2,3,5,6) # our data set

> mode.result = mode(x) # calculate mode (with our custom function named ‘mode’)
> print (mode.result)
[1] 1

> variance.result = var(x) # calculate variance


> print (variance.result)
[1] 2.484211

> sd.result = sqrt(var(x)) # calculate standard deviation


> print (sd.result)
[1] 1.576138

(or)
# Sample data
data <- c(3, 5, 2, 8, 6, 9, 10, 7, 1, 4)

# Mean
mean_val <- mean(data)

# Median
median_val <- median(data)

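# Mode (base R has no statistical mode function, so define one)
get_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
# every value in this sample occurs once, so the first value (3) is returned
mode_val <- get_mode(data)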

# Range
range_val <- range(data)

# Summary
summary_val <- summary(data)

# Variance
variance_val <- var(data)

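# Standard Deviation
standard_deviation_val <- sd(data)

# Print the results
print(c(mean = mean_val, median = median_val, mode = mode_val))
print(range_val)
print(summary_val)
print(c(variance = variance_val, sd = standard_deviation_val))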
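The variance and standard deviation printed in the first version can be checked by hand. The sketch below assumes the same 20-value data set and applies the sample-variance formula (sum of squared deviations divided by n - 1). It also shows range() and summary(), which the first version does not call. The names var_manual and sd_manual are introduced here for illustration only.

```r
# Data set from the first version above
x <- c(1,2,3,4,5,1,2,3,1,2,4,5,2,3,1,1,2,3,5,6)

n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
var_manual <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation is the square root of the variance
sd_manual <- sqrt(var_manual)

print(var_manual)  # agrees with var(x)
print(sd_manual)   # agrees with sd(x)

# Range: range() returns the minimum and maximum; diff() gives their spread
print(range(x))
print(diff(range(x)))

# Five-number summary plus the mean
print(summary(x))
```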

Week 9:
a) Implement R Script to perform Normal, Binomial distributions.
Source Code:
# Load ggplot2 for the plots below
library(ggplot2)

# Generate random numbers from a Normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)

# Create a histogram of the Normal distribution


ggplot(data = data.frame(x = normal_data), aes(x)) +
geom_histogram(binwidth = 0.2, fill = "#619CFF", color = "black") +
labs(title = "Normal Distribution", x = "Values", y = "Frequency") +
theme_minimal()

# Generate random numbers from a Binomial distribution


binomial_data <- rbinom(1000, size = 10, prob = 0.5)

# Create a bar chart of the Binomial distribution


ggplot(data = data.frame(x = binomial_data), aes(x)) +
geom_bar(fill = "#F8766D", color = "black") +
labs(title = "Binomial Distribution", x = "Number of Successes", y = "Count") +
theme_minimal()
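Besides drawing random samples with rnorm() and rbinom(), base R can evaluate both distributions directly. The following sketch uses the same parameters as the script above (mean 0, sd 1; size 10, prob 0.5). The evaluation points 0, 1.96 and 5 and the variable names are arbitrary choices for illustration.

```r
# Normal distribution with mean 0 and sd 1
dens_0 <- dnorm(0, mean = 0, sd = 1)        # density at 0
p_196 <- pnorm(1.96, mean = 0, sd = 1)      # P(X <= 1.96), about 0.975
q_975 <- qnorm(0.975, mean = 0, sd = 1)     # quantile, about 1.96

# Binomial distribution with 10 trials and success probability 0.5
p5 <- dbinom(5, size = 10, prob = 0.5)      # P(X = 5) = 252/1024
p_le5 <- pbinom(5, size = 10, prob = 0.5)   # P(X <= 5) = 638/1024
all_probs <- dbinom(0:10, size = 10, prob = 0.5)

print(c(dens_0, p_196, q_975))
print(c(p5, p_le5))
print(sum(all_probs))                       # probabilities over 0..10 sum to 1
```

The prefixes follow one pattern for every distribution in R: d for the density or probability mass, p for the cumulative probability, q for the quantile and r for random draws.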

b) Implement R Script to perform correlation, Linear and multiple regression.


Source Code:

Correlation

------------
> x<-seq(-10,10, 1)

> y<-x*x

> plot(x,y)

> cor(x,y)

[1] 0

Output: [scatter plot of y against x produced by plot(x, y); figure not reproduced]
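A correlation of 0 here does not mean that x and y are unrelated: y is fully determined by x, but the relationship is not linear, and cor() measures only linear association. The sketch below computes the Pearson coefficient from its definition and then uses abs(x) as a transformed predictor. That transformation is an assumption made for illustration and is not part of the original exercise.

```r
x <- seq(-10, 10, 1)
y <- x * x

# Pearson correlation computed from its definition
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
print(r_manual)          # matches cor(x, y)

# After folding x onto its absolute value the relationship is monotone,
# and the correlation becomes strongly positive
r_abs <- cor(abs(x), y)
print(r_abs)
```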

Simple Linear Regression

> # fat must already be loaded as a data frame; this manual does not show where it comes from
> fatdata <- fat[, c(1, 2, 5:11)]
> summary(fatdata[, -1]) # do you remember what the negative index (-1) here means?
> lm1 <- lm(pctfat.brozek ~ neck, data = fatdata)
> plot(pctfat.brozek ~ neck, data = fatdata)
> abline(lm1)
> names(lm1)
[1] "coefficients"  "residuals"     "effects"       "rank"
[5] "fitted.values" "assign"        "qr"            "df.residual"
[9] "xlevels"       "call"          "terms"         "model"
> lm1
> summary(lm1)
> plot(fitted(lm1), resid(lm1))
> qqnorm(resid(lm1))

Output of summary(lm1) (these are the comprehensive results):

Residuals:
     Min       1Q   Median       3Q      Max
-14.0076  -4.9450  -0.2405   5.0321  21.1344

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -40.5985     6.6857  -6.072 4.66e-09 ***
neck          1.5671     0.1756   8.923  < 2e-16 ***

Multiple Linear Regression:


> lm2 <- lm(pctfat.brozek ~ age + fatfreeweight + neck, data = fatdata)
> summary(lm2)
> lm3 <- lm(pctfat.brozek ~ age + fatfreeweight + neck + factor(bmi), data = fatdata) # adds bmi as a categorical predictor
> summary(lm3)

The model lm2 corresponds to the following multiple linear regression model:

pctfat.brozek = β0 + β1*age + β2*fatfreeweight + β3*neck + ε

The F-test reported by summary(lm2) tests the following hypotheses:

H0: There is no linear association between pctfat.brozek and age, fatfreeweight and neck.
Ha: There is a linear association between pctfat.brozek and age, fatfreeweight and neck.

Call:
lm(formula = pctfat.brozek ~ age + fatfreeweight + neck, data = fatdata)

Residuals:
      Min        1Q    Median        3Q       Max
-16.67871  -3.62536   0.07768   3.65100  16.99197

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   -53.01330    5.99614  -8.841  < 2e-16 ***
age             0.03832    0.03298   1.162    0.246
fatfreeweight  -0.23200    0.03086  -7.518 1.02e-12 ***
neck            2.72617    0.22627  12.049  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.901 on 248 degrees of freedom
Multiple R-squared: 0.4273, Adjusted R-squared: 0.4203
F-statistic: 61.67 on 3 and 248 DF, p-value: < 2.2e-16
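The last lines of this output are linked by simple formulas. As a worked check, the sketch below recomputes the F-statistic and the adjusted R-squared from the printed Multiple R-squared and degrees of freedom. The inputs are the rounded values shown above, so the results agree with the printed ones only up to rounding. The variable names are chosen here for illustration.

```r
# Values read from the summary(lm2) output above
r_squared <- 0.4273   # Multiple R-squared
k <- 3                # number of predictors: age, fatfreeweight, neck
df_resid <- 248       # residual degrees of freedom
n <- df_resid + k + 1 # number of observations (252)

# Overall F-statistic: explained variance per predictor over
# unexplained variance per residual degree of freedom
f_stat <- (r_squared / k) / ((1 - r_squared) / df_resid)

# Adjusted R-squared penalises the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

# p-value of the F-test (upper tail of the F distribution)
p_value <- pf(f_stat, k, df_resid, lower.tail = FALSE)

print(c(F = f_stat, adj_R2 = adj_r_squared, p = p_value))
```

Because the p-value is far below 0.05, H0 (no linear association with any of the three predictors) is rejected, even though the coefficient for age alone is not significant.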


Week 10:
Introduction to Non-Tabular Data Types: Time series, spatial
data, Network data. Data Transformations: Converting
Numeric Variables into Factors, Date Operations, String
Parsing, Geocoding.
Time series:
A time series is a series of data points in which each data point
is associated with a timestamp. A simple example is the price
of a stock in the stock market at different points of time on a
given day. Another example is the amount of rainfall in a
region in different months of the year. R provides many
functions to create, manipulate and plot time series data. The
data for a time series is stored in an R object called a
time-series object. It is also an R data object, like a vector
or data frame.
The time series object is created by using the ts() function.
The basic syntax for the ts() function in time series analysis is:
syntax:
----------
object.name <- ts(data, start, end, frequency)
Following is the description of the parameters used:
data is a vector or matrix containing the values used in the time series.
start specifies the start time for the first observation in the time series.
end specifies the end time for the last observation in the time series.
frequency specifies the number of observations per unit time.
Except for the parameter "data", all other parameters are optional.
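To see how start and frequency interact, here is a small sketch with quarterly data. The values and the name sales_ts are made up for illustration. Only base R functions are used: ts(), start(), end(), frequency() and window().

```r
# Eight quarterly observations (made-up values for illustration)
sales <- c(12, 15, 14, 18, 13, 16, 15, 20)

# frequency = 4 means four observations per year; start = c(2020, 1) is Q1 2020
sales_ts <- ts(sales, start = c(2020, 1), frequency = 4)
print(sales_ts)

# Inspect the time attributes that ts() attached to the data
print(start(sales_ts))      # first period
print(end(sales_ts))        # last period, derived from start and frequency
print(frequency(sales_ts))

# Extract the observations from Q1 2021 onwards
sales_2021 <- window(sales_ts, start = c(2021, 1))
print(sales_2021)
```

Since end was not supplied, R derives it from the number of observations, which is why "end" is optional in most cases.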

# Get the data points in the form of an R vector.
rainfall <- c(799, 1174.8, 865.1, 1334.6, 635.4, 918.5, 685.5, 998.6, 784.2, 985, 882.8, 1071)
# Convert it to a time series object.
rainfall.timeseries <- ts(rainfall, start = c(2012, 1), frequency = 12)
# Print the timeseries data.
print(rainfall.timeseries)
# Give the chart file a name.
png(file = "rainfall.png")
# Plot a graph of the time series.
plot(rainfall.timeseries)
# Save the file.
dev.off()

When we execute the above code, it prints the following result and saves the chart to rainfall.png:
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep
2012  799.0 1174.8  865.1 1334.6  635.4  918.5  685.5  998.6  784.2
        Oct    Nov    Dec
2012  985.0  882.8 1071.0
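The data transformations named in the Week 10 heading can be sketched in base R as follows. This is only an illustration under stated assumptions: the grade cut-offs, the two dates and the comma-separated record are invented here, and the marks reuse the values from the XML example. Geocoding is left out because it needs an external service.

```r
# Numeric variable to factor: bin marks into grades (cut-offs are assumed)
marks <- c(620, 440, 600, 660, 560)
grade <- cut(marks, breaks = c(0, 500, 600, Inf),
             labels = c("low", "medium", "high"))
print(table(grade))

# Date operations: parse text into Date objects, then do arithmetic
d1 <- as.Date("2012-01-15")
d2 <- as.Date("01/03/2012", format = "%d/%m/%Y")
days_between <- as.numeric(d2 - d1)   # difference in days
print(days_between)
print(format(d1, "%d/%m/%Y"))         # reformat a date as text
print(seq(d1, by = "month", length.out = 3))

# String parsing: split a delimited record and convert the numeric field
record <- "Zayn,560,IT"
fields <- strsplit(record, ",")[[1]]
student_marks <- as.numeric(fields[2])
print(fields)
print(toupper(substr(fields[1], 1, 3)))
```

Note that cut() uses intervals closed on the right by default, so a mark of exactly 600 falls in the "medium" group.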
Week 11:
Introduction to Dirty data problems: Missing values, data manipulation, duplicates,
forms of data dates, outliers, spelling
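Three of the problems listed above (missing values, duplicates and outliers) can be illustrated with a minimal base R sketch. The records data frame is hypothetical: it contains one NA, one repeated row and one mark that looks like a typing error. The 1.5 * IQR boxplot rule is only one possible way to flag outliers.

```r
# Hypothetical student records with typical problems
records <- data.frame(
  name  = c("Alia", "Brijesh", "Yash", "Yash", "Mallika", "Zayn",
            "Ravi", "Meera", "Kiran"),
  marks = c(620, 540, 600, 600, NA, 5600, 580, 610, 660)
)

# Missing values: count them
n_missing <- sum(is.na(records$marks))

# Duplicates: drop rows that repeat an earlier row exactly
deduped <- records[!duplicated(records), ]

# Outliers: values beyond 1.5 * IQR from the hinges (the boxplot rule)
valid <- deduped$marks[!is.na(deduped$marks)]
outliers <- boxplot.stats(valid)$out

# Mean with the missing value ignored and the outlier removed
clean_mean <- mean(valid[!valid %in% outliers])

print(n_missing)
print(nrow(deduped))
print(outliers)
print(clean_mean)
```

Whether a flagged value is really an error is a judgment call; here 5600 is most plausibly 560 with an extra digit, but the code only detects it and does not guess the correction.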

Week 12:
Data sources: SQLite examples for relational databases, Loading SPSS and SAS
files, Reading from Google Spreadsheets, API and web scraping examples

Correlation:
========================
# Create two numeric vectors
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

# Compute correlation coefficient


correlation <- cor(x, y)

# Print the correlation coefficient


print(correlation)

Linear Regression:
========================
# Create two numeric vectors
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

# Fit a linear regression model


model <- lm(y ~ x)

# Print the summary of the linear regression model


summary(model)
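Once the model is fitted, its coefficients can be extracted and used for prediction. The sketch below repeats the same x and y so that it runs on its own. The new x values 6 and 7 are arbitrary illustration values. Because y equals x + 5 exactly in this data, the fit is perfect and summary() may warn that its statistics are unreliable.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)
model <- lm(y ~ x)

# Estimated intercept and slope
coefs <- coef(model)
print(coefs)

# Predicted y for new x values (6 and 7 are arbitrary illustration values)
new_points <- data.frame(x = c(6, 7))
predicted <- predict(model, newdata = new_points)
print(predicted)
```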

Multiple Regression:
========================
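# Create three numeric vectors
# (x2 must not be an exact linear function of x1, and y must not be fitted
# perfectly; otherwise lm() returns an NA coefficient and step() has nothing
# meaningful to select)
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(6, 4, 9, 7, 12, 8, 13, 11)
y <- c(11.2, 11.9, 13.1, 13.8, 15.3, 15.9, 17.2, 17.8)

# Combine response and predictors into a data frame
data <- data.frame(y, x1, x2)

# Fit a multiple regression model
model <- lm(y ~ ., data = data)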

# Print the summary of the multiple regression model


summary(model)

# Stepwise selection of variables


stepModel <- step(model)

# Print the summary of the stepwise regression model


summary(stepModel)
