Lab Record



VIT Bhopal University

NAS1001 – Associative Data Analytics (LTP-4)

Slot: B11+B12+B13+B14
Class ID: BL2023241000207
FALL SEMESTER 2023-2024

Course Instructor: Dr. D Lakshmi

Name of the Student: Aniket Shrivastava

List of Experiments


List of Challenging Experiments (Indicative)
SLO: 1, 2, 5, 9, 12

1. Understanding of R System and installation and configuration of R Environment and RStudio; understanding R packages, their installation and management

2. Understanding of nuts and bolts of R:
a. R program Structure
b. R Data Type, Command Syntax and Control Structures
c. File Operations in R

3. Excel and R integration with R connector

4. Preparing Data in R
a. Data Cleaning
b. Data imputation
c. Data conversion

5. Outliers detection using R

6. Correlation and Regression Analysis in R

7. Clustering Algorithms implementation using R

8. Classification Algorithm implementation using R: Classification (Spam/Not spam)

9. Case study on Stock Market Analysis and applications. Stock data can be obtained from Yahoo! Finance or Google Finance. A team of students can apply statistical modeling on the stock data to uncover hidden patterns. R provides tools for moving averages, auto regression and time-series analysis, which form the crux of financial applications.

10. Detect credit card fraudulent transactions. The dataset can be obtained from Kaggle. The team will use a variety of machine learning algorithms to distinguish fraudulent transactions from non-fraudulent ones.

Experiment No: 1


Aim: Understanding of R System and installation and configuration of R Environment and RStudio, Understanding R Packages, their installation and management

Data Description: R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing.
Designed by: Ross Ihaka, Robert Gentleman

Installing R:

Download R:

1. Go to the R Project's official website: https://www.r-project.org/
2. Click the "CRAN" link under "Download", choose a mirror, and download the installer for your operating system.
3. For Windows: Double-click the downloaded executable file and follow the installation instructions.
4. For macOS: Double-click the downloaded package file and follow the installation instructions.
5. For Linux: Follow the installation instructions specific to your Linux distribution.

Installing RStudio:

Download RStudio:

1. Go to the RStudio download page: https://www.rstudio.com/products/rstudio/download/
2. Under "RStudio Desktop," click the appropriate download link for your operating system (Windows, macOS, or Linux).

Install RStudio:

1. For Windows: Double-click the downloaded installer and follow the installation instructions.
2. For macOS: Double-click the downloaded disk image (.dmg) file, drag the RStudio icon to the Applications folder, and then open RStudio from the Applications folder.
3. For Linux: Follow the installation instructions specific to your Linux distribution.

Installing R packages
Installing packages is a fundamental part of working with R. R packages contain pre-built functions, data sets, and documentation that extend the capabilities of the R programming language. Here are the steps to install R packages using the R console within RStudio:

Open RStudio:
Launch RStudio on your computer.

Open R Console:
Once RStudio is open, you'll see several panes. By default the R Console is on the left (at the lower left once a script is open in the editor). This is where you can directly interact with R by typing commands.

Install a Package:
To install an R package, you'll use the install.packages() function followed by the name of the package you want to install. For example, to install the "ggplot2" package, type the following command in the R Console and press Enter:
install.packages("ggplot2")

Load the Package:

After installing a package, you need to load it into your R session to use its functions. Use the
library() function for this purpose. For example, to load the "ggplot2" package, type:
library(ggplot2)
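
The install and load steps above can be combined, and the same console is used to manage packages afterwards. The following is a minimal sketch, not part of the original record: the helper name ensure_package is hypothetical, and "stats" is used only because it ships with R, so the example runs without a network connection.

```r
# Install a package only if it is missing, then load it.
# ensure_package is a hypothetical helper name used for illustration.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

ensure_package("stats")  # "stats" ships with R, so nothing is downloaded

# Package management: list installed packages and check a version
installed <- rownames(installed.packages())
print("stats" %in% installed)
print(packageVersion("stats"))

# Other management functions (commented out because they change the library):
# update.packages(ask = FALSE)   # update all installed packages
# remove.packages("ggplot2")     # uninstall a package
```

Calling ensure_package("ggplot2") instead would download ggplot2 from CRAN on first use and simply load it on later runs.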

Experiment No: 2

Aim: Understanding of nuts and bolts of R:

a. R program Structure
b. R Data Type, Command Syntax and Control Structures
c. File Operations in R

Data Description

a. R Program Structure: An R program consists of a series of commands that are executed sequentially. These commands can be typed directly into the R console or saved in a script file with a .R extension.

b. R Data Types, Command Syntax, and Control Structures: R supports various data types, including numeric, character, logical, factor, and more. Here's a quick overview:
Numeric: Used for storing numeric values (integers or decimals).
Character: Used for storing text data.
Logical: Represents binary values TRUE or FALSE.
Factor: Represents categorical data with levels or categories.

c. File Operations in R: R provides functions for reading and writing files.

R Code

a. R Program Structure:

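# Load any packages the script needs (package_name is a placeholder)
library(package_name)

# Define functions before they are used
my_function <- function(arg1, arg2) {
  result <- arg1 + arg2  # placeholder body
  return(result)
}

# Call the function and display the result (value1 and value2 are placeholders)
result <- my_function(value1, value2)
print(result)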

b. R Data Types, Command Syntax, and Control Structures:

x <- 5
name <- "John"
is_valid <- TRUE
sum_result <- 3 + 7
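
The assignments above cover data types and command syntax; control structures can be illustrated in the same style. This is a minimal sketch under the assumption that the variable names below (grade, label, total, n) are purely illustrative and not from the original record.

```r
# Data types: inspect the class of each value
x <- 5
name <- "John"
is_valid <- TRUE
grade <- factor(c("low", "high", "low"))
types <- c(class(x), class(name), class(is_valid), class(grade))
print(types)

# if / else: choose a value based on a condition
label <- if (x > 3) "big" else "small"

# for loop: sum the integers 1 to 5
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while loop: count up until the condition fails
n <- 0
while (n < 3) {
  n <- n + 1
}

print(label)
print(total)
print(n)
```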

c. File Operations in R:
Reading files
# Reading text files
data <- read.table("data.txt", header = TRUE)

# Reading CSV files

data <- read.csv("data.csv")

# Reading Excel files (requires 'readxl' package)

library(readxl)
data <- read_excel("data.xlsx")

Writing files
# Writing data to text file
write.table(data, "output.txt", sep = "\t", row.names = FALSE)

# Writing data to CSV file

write.csv(data, "output.csv", row.names = FALSE)

# Writing data to Excel file (requires 'openxlsx' package)

library(openxlsx)
write.xlsx(data, "output.xlsx")
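
A quick way to check the read and write functions above is a round trip through a temporary file. The sketch below uses only base R; the data frame values are hypothetical and exist only so that the example is self-contained.

```r
# Hypothetical data used only for illustration
data <- data.frame(
  experience_years = c(1, 3, 5),
  salary = c(40000, 50000, 60000)
)

# Write to a temporary CSV file, then read it back
path <- tempfile(fileext = ".csv")
write.csv(data, path, row.names = FALSE)
data_back <- read.csv(path)

# Basic file checks
print(file.exists(path))
print(dim(data_back))

# Clean up the temporary file
unlink(path)
```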

Experiment No: 3

Aim: Excel and R integration with R connector.

Data Description:
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.
Sample rows and columns


R Code
> # read.csv() is part of base R (the utils package), so no extra package is needed
> Salary_Dataset <- read.csv(file.choose(), header = TRUE)
> Salary_Dataset

Sample Input and Output


Experiment No: 4

Aim: Preparing Data in R

a. Data Cleaning
b. Data imputation
c. Data conversion

Data Description

In this example, the CSV file has two columns:

experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns


R Code
# Load libraries
library(dplyr)
library(missForest)

# Read dataset
data <- read.csv("data.csv")

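# Data Cleaning: drop duplicate rows and a column that is not needed
# (Irrelevant_Column is a placeholder name)
cleaned_data <- data %>%
  distinct() %>%
  select(-Irrelevant_Column)

# Check for missing values
missing_values <- sum(is.na(cleaned_data))

if (missing_values > 0) {
  # Data Imputation: missForest() returns a list; the imputed data frame is in $ximp
  imputed_data <- missForest(cleaned_data, verbose = TRUE)$ximp
} else {
  imputed_data <- cleaned_data
}

# Data Conversion (if needed; Categorical_Column is a placeholder name)
imputed_data$Categorical_Column <- as.factor(imputed_data$Categorical_Column)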

# Display prepared dataset

print(imputed_data)
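
For a dataset as small as the two-column salary file, the same cleaning and imputation steps can also be sketched in base R. Note that this swaps in simple mean imputation rather than missForest, and that the data values and the helper name impute_mean are hypothetical.

```r
# Hypothetical data with a duplicate row and missing values
df <- data.frame(
  experience_years = c(1, 2, NA, 4, 5, 5),
  salary = c(40000, NA, 50000, 55000, 60000, 60000)
)

# Data cleaning: remove exact duplicate rows
df <- unique(df)

# Data imputation: replace each NA with the mean of its column
# (mean imputation, a simpler stand-in for missForest)
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}
df[] <- lapply(df, impute_mean)

# Data conversion: store experience as integer years
df$experience_years <- as.integer(df$experience_years)

print(df)
```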

Sample Input and Output


Experiment No: 5

Aim: Outliers detection using R

Data Description
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns


R Code
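
A minimal sketch of one common approach, the 1.5 x IQR rule, applied to a two-column dataset shaped like the one in the Data Description. This is an assumption about a reasonable method, not the code from the original record, and the data values are hypothetical.

```r
# Hypothetical salary data with one obviously extreme value
salary_data <- data.frame(
  experience_years = 1:10,
  salary = c(40000, 42000, 45000, 47000, 50000,
             52000, 55000, 57000, 60000, 200000)
)

# Quartiles and interquartile range of the salary column
q1 <- quantile(salary_data$salary, 0.25)
q3 <- quantile(salary_data$salary, 0.75)
iqr <- q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
is_outlier <- salary_data$salary < lower | salary_data$salary > upper
outliers <- salary_data[is_outlier, ]

print(outliers)

# A box plot marks the same points visually:
# boxplot(salary_data$salary, main = "Salary")
```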

Sample Input and Output

Experiment No: 6

Aim: Correlation and Regression Analysis in R

Data Description

In this example, the CSV file has two columns:

experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.


Sample rows and columns

R Code
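
A minimal sketch of correlation and simple linear regression on the two columns named in the Data Description, using cor() and lm() from base R. The data values are hypothetical and this is not the code from the original record.

```r
# Hypothetical data: salary rises roughly linearly with experience
salary_data <- data.frame(
  experience_years = 1:10,
  salary = c(36000, 39500, 45500, 49000, 56000,
             59500, 66000, 69000, 76000, 79500)
)

# Pearson correlation between experience and salary
r <- cor(salary_data$experience_years, salary_data$salary)
print(r)

# Simple linear regression: salary as a function of experience
model <- lm(salary ~ experience_years, data = salary_data)
print(summary(model))

# Predict the salary for a person with 12 years of experience
predicted <- predict(model, newdata = data.frame(experience_years = 12))
print(predicted)

# Scatter plot with the fitted line:
# plot(salary ~ experience_years, data = salary_data)
# abline(model)
```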


Experiment No: 7

Aim: Clustering Algorithms implementation using R

Sample rows and columns


R Code
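
A minimal sketch of k-means clustering with the base R kmeans() function. Because the dataset used in the original record is not shown, this example assumes the built-in iris dataset; the choice of three clusters is likewise an assumption.

```r
# Use the four numeric measurement columns of the built-in iris dataset
features <- iris[, 1:4]

# k-means with 3 clusters; nstart runs several random starts and keeps the best
set.seed(42)
km <- kmeans(features, centers = 3, nstart = 20)

# Cluster sizes and centers
print(km$size)
print(km$centers)

# Compare cluster assignments with the known species labels
print(table(Cluster = km$cluster, Species = iris$Species))

# Hierarchical clustering on the same data for comparison
hc <- hclust(dist(features), method = "complete")
hc_clusters <- cutree(hc, k = 3)
```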


Sample Input and Output

Experiment No: 8

Aim: Classification Algorithm implementation using R

Classification (Spam/Not spam)

R Code

# Load required libraries

library(tm) # Text mining
library(e1071) # For Naive Bayes classifier
library(caret) # For model evaluation


# Load the SpamAssassin dataset (replace with your actual file path)
spam_data <- read.csv("path/to/spamassassin_data.csv", stringsAsFactors = FALSE)

# Preprocess the text data

corpus <- Corpus(VectorSource(spam_data$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)

# Create a document-term matrix

dtm <- DocumentTermMatrix(corpus)

# Convert the document-term matrix to a data frame

spam_df <- as.data.frame(as.matrix(dtm))
colnames(spam_df) <- make.names(colnames(spam_df))

# Combine with labels

spam_df$label <- as.factor(spam_data$label)  # confusionMatrix() requires factor labels

# Split data into training and testing sets

set.seed(123)
train_indices <- sample(1:nrow(spam_df), 0.7 * nrow(spam_df))
train_data <- spam_df[train_indices, ]
test_data <- spam_df[-train_indices, ]

# Train a Naive Bayes classifier

naive_bayes_model <- naiveBayes(label ~ ., data = train_data)

# Make predictions
predictions <- predict(naive_bayes_model, newdata = test_data, type = "class")

# Evaluate the model

conf_matrix <- confusionMatrix(predictions, test_data$label)
print(conf_matrix)
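
To make the preprocessing and document-term matrix steps above concrete, here is a minimal base R sketch that builds the same kind of matrix by hand, without the tm package. The three messages are hypothetical examples, and the cleaning here covers only lowercasing and punctuation removal.

```r
# Hypothetical messages used only for illustration
messages <- c("Win cash now!", "Meeting at noon", "Win a free prize now")

# Lowercase, strip everything except letters and spaces, split into words
cleaned <- gsub("[^a-z ]", "", tolower(messages))
tokens <- strsplit(cleaned, " +")

# Vocabulary: every distinct word, sorted
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary word in each message (one row per message)
count_terms <- function(tk) as.integer(table(factor(tk, levels = vocab)))
dtm <- t(vapply(tokens, count_terms, integer(length(vocab))))
colnames(dtm) <- vocab

print(dtm)
```

Each row is one message and each column one term, which is the structure that DocumentTermMatrix() produces and that the classifier is trained on.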

Sample Input and Output


Experiment No: 9

Aim: Case study on Stock Market Analysis and applications. Stock data can be obtained from Yahoo! Finance or Google Finance. A team of students can apply statistical modeling on the stock data to uncover hidden patterns. R provides tools for moving averages, auto regression and time-series analysis, which form the crux of financial applications.

Data Description

Stock data imported from Yahoo! Finance.


R Code
# Load required libraries
library(dplyr)
library(lubridate)
library(TTR)  # provides SMA()

# Read the stock data CSV file (or load data from API)
stock_data <- read.csv("stock_data.csv")

# Convert date column to Date format

stock_data$Date <- ymd(stock_data$Date)

# Calculate 50-day and 200-day moving averages

stock_data$MA_50 <- SMA(stock_data$Close, n = 50)
stock_data$MA_200 <- SMA(stock_data$Close, n = 200)

# Load required library

library(forecast)

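# Convert data to time series format
# (frequency = 365 assumes one observation per calendar day;
# decompose() below needs at least two full periods of data)

stock_ts <- ts(stock_data$Close, frequency = 365)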

# Fit auto-regression model (ARIMA)

ar_model <- auto.arima(stock_ts)

# Load plotting library (forecast is already loaded above)

library(ggplot2)

# Decompose time series into trend, seasonal, and residual components

decomposed <- decompose(stock_ts)

# Plot decomposed components

plot(decomposed)

# Create a time series plot of stock prices and moving averages

ggplot(stock_data, aes(x = Date)) +
geom_line(aes(y = Close, color = "Stock Price")) +
geom_line(aes(y = MA_50, color = "50-day MA")) +
geom_line(aes(y = MA_200, color = "200-day MA")) +
labs(title = "Stock Price and Moving Averages", y = "Price") +
scale_color_manual(values = c("Stock Price" = "blue", "50-day MA" = "red", "200-day MA" =
"green"))
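
To show what a simple moving average computes, the sketch below reproduces a 3-period average in base R with stats::filter(). The prices are hypothetical, and the short window is chosen only to keep the example readable; it is an illustration, not a replacement for SMA().

```r
# Hypothetical closing prices
prices <- c(10, 11, 12, 13, 14, 15)

# Trailing 3-period simple moving average:
# each value is the mean of the current price and the two before it
window <- 3
sma3 <- stats::filter(prices, rep(1 / window, window), sides = 1)
sma3 <- as.numeric(sma3)

# The first (window - 1) values are NA because the window is incomplete
print(sma3)
```

The same leading NA values appear in MA_50 and MA_200, which is why the moving-average lines in the plot start later than the price line.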

Sample Input and Output


Experiment No: 10

Aim: Detect credit card fraudulent transactions. The dataset can be obtained from Kaggle. The team will use a variety of machine learning algorithms to distinguish fraudulent transactions from non-fraudulent ones.

Data Description
The dataset was obtained from Kaggle

R Code
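# Load required libraries
library(randomForest)

# Load the credit card fraud dataset downloaded from Kaggle
# (the file path is a placeholder; point it at your local copy)
CreditCardFraud <- read.csv("creditcard.csv")

# Treat the target as a factor so that randomForest performs classification
CreditCardFraud$Class <- as.factor(CreditCardFraud$Class)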

# Split data into training and testing sets (70% training, 30% testing)
set.seed(123)
train_indices <- sample(1:nrow(CreditCardFraud), 0.7 * nrow(CreditCardFraud))
train_data <- CreditCardFraud[train_indices, ]
test_data <- CreditCardFraud[-train_indices, ]

# Build Random Forest model

rf_model <- randomForest(Class ~ ., data = train_data, ntree = 100)

# Make predictions
predictions <- predict(rf_model, newdata = test_data)


# Calculate accuracy
accuracy <- sum(predictions == test_data$Class) / nrow(test_data)
print(paste("Accuracy score on Test Data:", accuracy))
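
Fraud datasets are usually heavily imbalanced, so accuracy alone can look high even when most fraud cases are missed. The sketch below computes a confusion matrix, precision, and recall in base R; the label vectors are hypothetical stand-ins for test_data$Class and predictions.

```r
# Hypothetical labels: 1 = fraud, 0 = legitimate
actual    <- factor(c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1), levels = c(0, 1))
predicted <- factor(c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0), levels = c(0, 1))

# Confusion matrix: rows are predictions, columns are true classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Counts for the fraud class
tp <- cm["1", "1"]  # fraud correctly flagged
fp <- cm["1", "0"]  # legitimate flagged as fraud
fn <- cm["0", "1"]  # fraud that was missed

accuracy  <- sum(diag(cm)) / sum(cm)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

print(c(accuracy = accuracy, precision = precision, recall = recall))
```

Here accuracy is well above precision and recall, which is the pattern to watch for when evaluating the random forest on the real data.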

Sample Input and Output
