R Programming Lab

data

Uploaded by

Voleti Sambasivarao


I.

Download and install R-Programming environment and install basic packages using
install.packages() command in R.

To download, install, and set up the R programming environment, follow these steps:

Step 1: Download R

1. Go to the official R project website: https://cran.r-project.org/.

2. Click on the "Download R" link, and then choose a CRAN mirror (choose a server
location close to you).
3. Download the appropriate version of R for your operating system (Windows, macOS, or
Linux).

Step 2: Install R

1. For Windows:
o Run the downloaded .exe file and follow the installation instructions.
2. For macOS:
o Open the downloaded .pkg file and follow the installation instructions.
3. For Linux:
o Depending on your distribution, you can install R via your package manager. For
example, on Ubuntu/Debian:

sudo apt-get update
sudo apt-get install r-base

Step 3: Install RStudio (Optional but Recommended)

RStudio provides a user-friendly integrated development environment (IDE) for R.

1. Download RStudio from https://rstudio.com/products/rstudio/download/.

2. Install it by following the instructions for your operating system.

Step 4: Install Basic R Packages

Once you have R installed, you can install R packages using the install.packages() command.
Here’s how to do that:

1. Open R or RStudio.
2. In the R console, install the necessary packages by running the following commands:

# Install basic packages
install.packages("ggplot2") # for data visualization
install.packages("dplyr") # for data manipulation
install.packages("tidyr") # for data tidying
install.packages("readr") # for reading data
install.packages("stringr") # for string manipulation
install.packages("lubridate") # for date/time manipulation
install.packages("shiny") # for building web apps
install.packages("caret") # for machine learning

install.packages("devtools") # for package development tools

These commands will download and install the specified R packages from CRAN.
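
install.packages() also accepts a character vector, so several packages can be requested in one call. The sketch below first checks which packages are already present. The package list is only an example, and the install line is left commented out so that nothing is downloaded by accident.

```r
# Packages we would like to have (example list, adjust as needed)
pkgs <- c("ggplot2", "dplyr", "tidyr")

# requireNamespace() returns TRUE if a package is installed and can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them from CRAN
} else {
  message("All listed packages are already installed.")
}
```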

Step 5: Verify Installation

After the packages are installed, you can load them to check if everything is working correctly:

library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
library(stringr)
library(lubridate)
library(shiny)
library(caret)

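If you do not see any error messages, the packages have been installed and loaded successfully. Startup messages, such as the notes about masked objects printed when dplyr or lubridate loads, are informational and are not errors.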

II. Learn all the basics of R-Programming (Data types, Variables, Operators, etc.)

1. Basic Syntax

R is case-sensitive, and commands are run from the console or from a script. Commands are normally written one per line; several commands can share a line if they are separated by a semicolon (;). Comments start with #.

Example:

# This is a comment
print("Hello, World!") # Outputs: [1] "Hello, World!"
2. Data Types in R

R has various data types that are fundamental to working with different kinds of data.

a. Numeric

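Numbers. By default R stores a number as a double-precision floating-point value, even when it has no decimal part; an integer is created with the L suffix.

x <- 10 # Numeric (stored as a double, not an integer)
n <- 10L # Integer
y <- 5.67 # Decimal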

b. Character

Strings or text data. Enclosed in single or double quotes.

name <- "Alice"

c. Logical

Boolean values that represent TRUE or FALSE.

is_true <- TRUE
is_false <- FALSE

d. Complex

Numbers with imaginary parts.

z <- 2 + 3i

e. Factor

Used to handle categorical data. Factors can be ordered or unordered.

gender <- factor(c("Male", "Female", "Female", "Male"))
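
The type of any value can be checked with class() and typeof(). The short sketch below repeats the variables from this section; n is an extra variable added here only to show the L suffix for integers.

```r
x <- 10          # plain number, stored as a double
n <- 10L         # the L suffix makes an integer (added for illustration)
name <- "Alice"
is_true <- TRUE
z <- 2 + 3i
gender <- factor(c("Male", "Female", "Female", "Male"))

class(x)         # "numeric"
typeof(x)        # "double"
typeof(n)        # "integer"
class(name)      # "character"
class(is_true)   # "logical"
class(z)         # "complex"
class(gender)    # "factor"
levels(gender)   # "Female" "Male" (levels are sorted alphabetically)
```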
3. Variables

Variables in R are assigned using the assignment operator <-, although = is also used.

x <- 10 # Assigning 10 to x
y = 20 # Also valid, but `<-` is preferred

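Variable names:

• Must start with a letter (A-Z or a-z), or with a period that is not followed by a digit
• Can contain letters, digits, periods (.), and underscores (_)
• Cannot contain spaces, begin with a digit or an underscore, or be a reserved word such as if or TRUE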
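
One way to test these rules is base R's make.names(), which converts any string into a syntactically valid name; a string is already valid when make.names() leaves it unchanged. The candidate names below are made up for illustration.

```r
# Candidate names (hypothetical examples)
candidates <- c("total_1", ".hidden", "2nd", "my var", "_x", "if")

# A name is syntactically valid if make.names() does not need to change it
is_valid <- make.names(candidates) == candidates

data.frame(name = candidates, valid = is_valid)
# total_1 and .hidden are valid; 2nd, "my var", _x and if are not
```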

4. Operators in R

a. Arithmetic Operators

Used for basic mathematical calculations:

• +: Addition
• -: Subtraction
• *: Multiplication
• /: Division
• ^: Exponentiation
• %%: Modulus (remainder)
• %/%: Integer Division

Example:

x <- 5
y <- 2
x + y # Output: 7
x ^ y # Output: 25
x %% y # Output: 1 (remainder of 5/2)

b. Relational Operators

Used for comparisons:

• ==: Equal to
• !=: Not equal to
• >: Greater than
• <: Less than
• >=: Greater than or equal to
• <=: Less than or equal to

Example:

x <- 5
y <- 10
x == y # Output: FALSE
x < y # Output: TRUE

c. Logical Operators

Used to combine or negate logical conditions:

• &: AND
• |: OR
• !: NOT

Example:

x <- TRUE
y <- FALSE
x & y # Output: FALSE (because both aren't TRUE)
x | y # Output: TRUE (because at least one is TRUE)
!x # Output: FALSE (negation of TRUE)
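
The operators & and | work element by element on whole vectors. R also has && and ||, which compare single values and are the usual choice inside if conditions. A small sketch with made-up values:

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b   # TRUE FALSE FALSE (element-wise AND)
a | b   # TRUE TRUE TRUE   (element-wise OR)

# && and || expect single values, which suits if conditions
x <- 5
in_range <- x > 3 && x < 10
if (in_range) {
  print("x is between 3 and 10")
}
```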

5. Data Structures in R

R has several key data structures that allow you to store and manipulate data efficiently.

a. Vectors

A vector is a sequence of elements of the same data type.

v <- c(1, 2, 3, 4) # Numeric vector
v_char <- c("a", "b", "c") # Character vector

• Indexing: Vectors are 1-indexed (the first element is at position 1).

v[1] # Output: 1

• Vector Operations: Operations are applied element by element to the entire vector.

v <- c(1, 2, 3)
v + 1 # Output: 2 3 4

b. Matrices

Matrices are 2-dimensional arrays, where all elements must have the same data type.

m <- matrix(1:9, nrow=3, ncol=3) # Creates a 3x3 matrix

c. Lists

Lists are collections of elements that can be of different data types.

my_list <- list(1, "Hello", TRUE) # A list with a number, a string, and a logical

d. Data Frames

Data frames are used for storing tabular data, and each column can contain different data types.

df <- data.frame(Name=c("John", "Alice"), Age=c(25, 30))

• Accessing Data Frame Columns:

df$Name # Output: "John" "Alice"

6. Control Structures

a. Conditional Statements

You can control the flow of your program using if, else if, and else statements.

x <- 5
if (x > 3) {
  print("x is greater than 3")
} else if (x == 3) {
  print("x is 3")
} else {
  print("x is less than 3")
}

b. Loops

For Loop

for (i in 1:5) {
  print(i)
}

While Loop

i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}

Repeat Loop

i <- 1
repeat {
  print(i)
  i <- i + 1
  if (i > 5) {
    break
  }
}

7. Functions

Functions in R are defined using the function keyword.

Example:

# A simple function to calculate the square of a number
square <- function(x) {
  return(x^2)
}

square(4) # Output: 16

8. Reading and Writing Data

a. Reading Data

# Read a CSV file
data <- read.csv("data.csv")

# Read a table (txt file)

data <- read.table("data.txt")

b. Writing Data

# Write a data frame to a CSV file
write.csv(df, "output.csv")

9. Plotting in R

R provides powerful built-in functions for data visualization. One of the most popular libraries is
ggplot2, but base R also allows simple plots.

# Simple plot using base R
x <- 1:10
y <- x^2
plot(x, y)

For advanced visualizations, use ggplot2:

library(ggplot2)
# Create a simple scatter plot
ggplot(data=df, aes(x=Age, y=Name)) + geom_point()

III. Write R commands to

i) Illustrate summation, subtraction, multiplication, and division operations on vectors.
ii) Enumerate multiplication and division operations between matrices and vectors in the R console.

Step 1: Define the vectors

Let’s define two numeric vectors for the illustration:

# Define two numeric vectors
v1 <- c(2, 4, 6)
v2 <- c(1, 3, 5)

Step 2: Perform operations on the vectors

1. Summation of vectors

When two vectors are added, R adds the corresponding elements.

# Summation of vectors
sum_result <- v1 + v2
print(sum_result)

Output:

[1] 3 7 11

Explanation:

• 2 + 1 = 3
• 4 + 3 = 7
• 6 + 5 = 11

2. Subtraction of vectors

Subtraction works similarly, with corresponding elements subtracted.

# Subtraction of vectors
sub_result <- v1 - v2
print(sub_result)

Output:

[1] 1 1 1

Explanation:

• 2 - 1 = 1
• 4 - 3 = 1
• 6 - 5 = 1

3. Multiplication of vectors

Element-wise multiplication multiplies corresponding elements of the vectors.

# Multiplication of vectors
mul_result <- v1 * v2
print(mul_result)

Output:

[1] 2 12 30

Explanation:

• 2 * 1 = 2
• 4 * 3 = 12
• 6 * 5 = 30

4. Division of vectors

Element-wise division divides corresponding elements of the vectors.

# Division of vectors
div_result <- v1 / v2
print(div_result)

Output:

[1] 2.000000 1.333333 1.200000

Explanation:

• 2 / 1 = 2.0
• 4 / 3 ≈ 1.33
• 6 / 5 = 1.2

Summary:

• Addition: v1 + v2 results in a new vector where each element is the sum of corresponding elements.
• Subtraction: v1 - v2 results in a new vector where each element is the difference of corresponding elements.
• Multiplication: v1 * v2 results in a new vector where each element is the product of corresponding elements.
• Division: v1 / v2 results in a new vector where each element is the quotient of corresponding elements.

This illustrates the element-wise operations performed on vectors in R.
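
When the two vectors differ in length, R recycles the shorter one until it matches the longer one. The following sketch uses made-up vectors (long_vec and short_vec are not part of the exercise) to show this behaviour.

```r
long_vec <- c(1, 2, 3, 4)
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20
recycled_sum <- long_vec + short_vec
print(recycled_sum)   # [1] 11 22 13 24

# A single number is a vector of length 1, so it is recycled for every element
doubled <- long_vec * 2
print(doubled)        # [1] 2 4 6 8
```

If the longer length is not a multiple of the shorter one, R still computes a result but issues a warning.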

ii) Enumerate multiplication and division operations between matrices and vectors in R console.

In R, you can perform multiplication and division between matrices and vectors. The result depends on the operator and on the dimensions of the matrix and the vector. There are two main kinds of operation:

1. Element-wise operations (* and /): Operations are performed element by element, with the vector recycled as needed.

2. Matrix multiplication (%*%): This follows the rules of linear algebra.

Step 1: Define a matrix and a vector

Let's start by defining a matrix and a vector:

# Define a 3x3 matrix
matrix_1 <- matrix(1:9, nrow = 3, ncol = 3)

# Define a vector of length 3

vector_1 <- c(2, 4, 6)

Step 2: Perform Multiplication

1. Element-wise multiplication between a matrix and a vector

For element-wise multiplication, R treats the matrix as a single vector stored column by column and recycles the shorter vector along it. With a 3x3 matrix and a vector of length 3, the vector is therefore reused down each column, so every element in row i is multiplied by element i of the vector. R does not align the vector with the columns, even when its length equals the number of columns.

# Element-wise multiplication of matrix and vector (vector recycled down each column)
elementwise_mul <- matrix_1 * vector_1
print(elementwise_mul)

Output:

     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    8   20   32
[3,]   18   36   54

Explanation:

• The first row of the matrix (1, 4, 7) is multiplied by the first element of the vector: 1 * 2, 4 * 2, 7 * 2
• The second row (2, 5, 8) by the second element: 2 * 4, 5 * 4, 8 * 4
• The third row (3, 6, 9) by the third element: 3 * 6, 6 * 6, 9 * 6
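
Because * recycles the vector down the columns, it scales the rows of the matrix. If the goal is instead to scale each column by the matching vector element, base R's sweep() can be used. The sketch below compares the two; by_row and by_col are names chosen here for illustration.

```r
matrix_1 <- matrix(1:9, nrow = 3, ncol = 3)
vector_1 <- c(2, 4, 6)

# Plain *: row i of the matrix is multiplied by vector_1[i]
by_row <- matrix_1 * vector_1

# sweep() over margin 2: column j of the matrix is multiplied by vector_1[j]
by_col <- sweep(matrix_1, 2, vector_1, `*`)

print(by_row)
print(by_col)
```

Here by_col has the columns (2, 4, 6), (16, 20, 24), and (42, 48, 54), while by_row has the rows (2, 8, 14), (8, 20, 32), and (18, 36, 54).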

2. Matrix multiplication with a vector (using %*%)

Matrix multiplication requires the number of columns of the matrix to match the length of the vector. The operation is performed as a linear algebra matrix-vector multiplication. Note that matrix_1 is filled column by column, so its rows are (1, 4, 7), (2, 5, 8), and (3, 6, 9).

# Matrix multiplication (matrix-vector product)
matrix_mul <- matrix_1 %*% vector_1
print(matrix_mul)

Output:

     [,1]
[1,]   60
[2,]   72
[3,]   84

Explanation:

• 1*2 + 4*4 + 7*6 = 60
• 2*2 + 5*4 + 8*6 = 72
• 3*2 + 6*4 + 9*6 = 84

Step 3: Perform Division

1. Element-wise division between a matrix and a vector

R recycles the vector down the columns of the matrix and performs element-wise division, so every element in row i is divided by element i of the vector.

# Element-wise division of matrix by vector (vector recycled down each column)
elementwise_div <- matrix_1 / vector_1
print(elementwise_div)

Output:

     [,1] [,2] [,3]
[1,]  0.5 2.00  3.5
[2,]  0.5 1.25  2.0
[3,]  0.5 1.00  1.5

Explanation:

• The first row of the matrix (1, 4, 7) is divided by the first element of the vector: 1 / 2, 4 / 2, 7 / 2
• The second row (2, 5, 8) by the second element: 2 / 4, 5 / 4, 8 / 4
• The third row (3, 6, 9) by the third element: 3 / 6, 6 / 6, 9 / 6

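2. Matrix division (solve)

In linear algebra, dividing by a matrix means multiplying by its inverse. To "divide" a vector by a matrix, you solve the system of linear equations A %*% x = b with solve(). The matrix must be square and non-singular. matrix_1 above is singular (its determinant is 0), so solve(matrix_1, vector_1) stops with an error; the example below uses an invertible matrix instead:

# Define an invertible 3x3 matrix (matrix_1 is singular and cannot be used here)
matrix_2 <- matrix(c(2, 1, 0, 1, 3, 1, 0, 1, 4), nrow = 3)

# Solve the system of equations matrix_2 %*% x = vector_1
solve_result <- solve(matrix_2, vector_1)
print(solve_result)

This returns the vector x such that matrix_2 %*% x = vector_1.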
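
A result from solve() can be checked by multiplying it back. The sketch below assumes a small non-singular matrix A and a right-hand side b whose values are chosen only for illustration; solve() fails for a singular matrix such as matrix(1:9, nrow = 3).

```r
# Illustrative non-singular matrix and right-hand side (assumed values)
A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3)
b <- c(2, 4, 6)

x <- solve(A, b)

# Multiplying back should reproduce b, up to floating-point tolerance
check <- all.equal(as.vector(A %*% x), b)
print(check)   # TRUE
```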

Summary:

• Element-wise multiplication (*): Multiplies each element of the matrix by an element of the vector, recycling the vector down the columns.
• Matrix multiplication (%*%): Computes the matrix-vector product, requiring the vector length to match the number of columns of the matrix.
• Element-wise division (/): Divides each element of the matrix by an element of the vector, with the same recycling.
• Solving (solve()): Solves the linear system A %*% x = b, the matrix analogue of division; A must be square and invertible.

IV. Write R commands to

i) Illustrate the usage of vector subsetting and matrix subsetting.

ii) Write a program to create an array of 3×3 matrices with 3 rows and 3 columns.

i) Illustrate the usage of vector subsetting and matrix subsetting

1. Vector Subsetting

Vector subsetting allows you to access or extract specific elements from a vector. There are
multiple ways to subset a vector:

a. By Indexing
You can extract elements using their positions (indices). In R, indexing starts at 1.

# Define a vector
v <- c(10, 20, 30, 40, 50)

# Extract the second element

v[2] # Output: 20

# Extract multiple elements (e.g., 2nd and 4th elements)

v[c(2, 4)] # Output: 20 40

# Extract a sequence of elements (e.g., from 2nd to 4th)

v[2:4] # Output: 20 30 40

b. By Logical Conditions

You can subset based on conditions. Logical subsetting returns elements where the condition is
TRUE.

# Subset elements greater than 25
v[v > 25] # Output: 30 40 50

# Subset elements equal to 40

v[v == 40] # Output: 40

c. By Logical Vector

If you provide a logical vector with TRUE or FALSE, R will select the elements corresponding
to TRUE.

# Define a logical vector
logical_vec <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

# Use the logical vector for subsetting

v[logical_vec] # Output: 10 30 50

d. By Negative Indexing

You can exclude specific elements by using negative indices.

# Exclude the second element
v[-2] # Output: 10 30 40 50

# Exclude the second and fourth elements

v[-c(2, 4)] # Output: 10 30 50

2. Matrix Subsetting

Matrix subsetting allows you to extract rows, columns, or specific elements from a matrix. In R,
a matrix is a 2-dimensional structure, and subsetting is done using row and column indices.

a. By Indexing (Row and Column)

You can specify the row and column indices to extract elements. The format is matrix[row,
column].

# Define a 3x3 matrix
m <- matrix(1:9, nrow = 3, byrow = TRUE)
print(m)
# Output:
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 8 9

# Extract the element at the 2nd row, 3rd column

m[2, 3] # Output: 6

# Extract the entire 1st row

m[1, ] # Output: 1 2 3

# Extract the entire 2nd column

m[, 2] # Output: 2 5 8

# Extract a submatrix (1st and 3rd rows, 2nd and 3rd columns)
m[c(1, 3), c(2, 3)]
# Output:
# [,1] [,2]
# [1,] 2 3
# [2,] 8 9

b. By Logical Conditions

You can subset a matrix based on a condition. Applied to the whole matrix, this returns a vector of the elements that satisfy the condition, in column-major order (down the first column, then the second, and so on).

# Subset matrix elements greater than 5
m[m > 5] # Output: 7 8 6 9
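
To see where the matching elements sit, and not only their values, which() with arr.ind = TRUE returns their row and column positions. A minimal sketch on the same matrix:

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# Row and column positions of the elements greater than 5
idx <- which(m > 5, arr.ind = TRUE)
print(idx)

# The position matrix can itself be used as an index
values <- m[idx]
print(values)   # [1] 7 8 6 9
```

The positions are listed column by column: (3, 1), (3, 2), (2, 3), (3, 3).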

c. Subset Rows or Columns by Logical Vector

You can use a logical vector to select specific rows or columns.

# Define a logical vector to select rows
row_select <- c(TRUE, FALSE, TRUE)

# Subset matrix to keep 1st and 3rd rows

m[row_select, ]
# Output:
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 7 8 9

# Subset matrix to keep 1st and 3rd columns

col_select <- c(TRUE, FALSE, TRUE)
m[, col_select]
# Output:
# [,1] [,2]
# [1,] 1 3
# [2,] 4 6
# [3,] 7 9

d. By Dropping Dimensions

By default, subsetting returns the result in the lowest possible dimension. However, you can
prevent this behavior by setting drop = FALSE.

# Extract the 1st column (with default behavior)
m[, 1] # Output: 1 4 7 (a vector)

# Extract the 1st column as a matrix (without dropping dimensions)

m[, 1, drop = FALSE]
# Output:
# [,1]
# [1,] 1
# [2,] 4
# [3,] 7

Summary:

• Vector Subsetting:
o Can be done using indices, logical conditions, or logical vectors.
o Negative indexing allows you to exclude specific elements.

• Matrix Subsetting:
o Uses row and column indices in the format matrix[row, column].
o Allows subsetting by logical conditions or logical vectors.
o Can extract entire rows or columns, or submatrices, and can retain dimensions by setting drop = FALSE.

These operations help you manipulate and access data efficiently in R, depending on the
structure of your data.

ii) Write a program to create an array of 3×3 matrices with 3 rows and 3 columns.

You can create an array of 3x3 matrices in R using the array() function. An array in R can have
more than two dimensions, and in this case, we'll create an array with multiple 3x3 matrices.

Here’s a program that creates an array of 3x3 matrices, with 3 rows and 3 columns, and two
matrices:

# Define the elements for the array (total 3*3*2 = 18 elements)
elements <- 1:18

# Create an array with 3 rows, 3 columns, and 2 matrices (depth = 2)

array_3x3 <- array(elements, dim = c(3, 3, 2))

# Print the array

print(array_3x3)

Output:

, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

Explanation:

• elements <- 1:18: Creates a sequence of numbers from 1 to 18.
• array(elements, dim = c(3, 3, 2)): Creates an array of size 3x3 with two matrices (depth = 2). The dimension parameter dim = c(3, 3, 2) indicates 3 rows, 3 columns, and 2 matrices.

This creates two 3x3 matrices stacked together in a 3D array. You can modify the depth (number
of matrices) or the elements as needed.
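
Individual matrices and elements are extracted from the array with three indices, in the order [row, column, matrix]. A short sketch on the same array (the result names are chosen here for illustration):

```r
array_3x3 <- array(1:18, dim = c(3, 3, 2))

dim(array_3x3)                        # 3 3 2

second_matrix <- array_3x3[, , 2]     # the whole second 3x3 matrix
one_element   <- array_3x3[2, 3, 1]   # row 2, column 3 of the first matrix
first_row     <- array_3x3[1, , 2]    # first row of the second matrix

print(second_matrix)
print(one_element)   # [1] 8
print(first_row)     # [1] 10 13 16
```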

V. Write an R program to draw i) Pie chart, ii) 3D Pie Chart, iii) Bar Chart along with chart legend by considering a suitable CSV file.

I) Pie chart

In R, you can draw a pie chart using the pie() function. Here’s an example program that creates a
pie chart with labeled slices:

Example Program to Draw a Pie Chart in R:

# Define data for the pie chart (example: percentage of sales by product category)
sales <- c(25, 15, 35, 10, 15)

# Define labels for the pie chart (example: product categories)

categories <- c("Electronics", "Furniture", "Clothing", "Books", "Others")

# Create a pie chart

pie(sales,
    labels = categories,
    main = "Sales Distribution by Category", # Title of the chart
    col = rainbow(length(sales))) # Color palette

Explanation:

• sales: A vector representing the size of each slice in the pie chart (e.g., percentage of sales in different categories).
• categories: A vector containing labels for each slice.
• pie(): The function to draw the pie chart.
o labels = categories: Labels for each slice.
o main = "Sales Distribution by Category": Adds a title to the pie chart.
o col = rainbow(length(sales)): Assigns different colors to each slice using the rainbow() function.

Output:

The pie chart will display slices with the corresponding categories and colors, showing the sales
distribution across different categories.

You can also customize the colors, labels, and add percentage values or legends to the pie chart
if needed.
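
As one possible customization, the sketch below labels each slice with its percentage and moves the category names into a legend. The variable names slice_labels and slice_colors and the legend position are choices made here for illustration.

```r
sales <- c(25, 15, 35, 10, 15)
categories <- c("Electronics", "Furniture", "Clothing", "Books", "Others")

# Percentage of the total for each slice, used as the slice label
percent <- round(100 * sales / sum(sales))
slice_labels <- paste0(percent, "%")
slice_colors <- rainbow(length(sales))

pie(sales,
    labels = slice_labels,
    main = "Sales Distribution by Category",
    col = slice_colors)

# Legend that maps each color to its category
legend("topright", legend = categories, fill = slice_colors, cex = 0.8)
```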

II) 3D Pie Chart

A 3D pie chart can be drawn with the pie3D() function from the plotrix package. If library(plotrix) reports that the package is not found, install it first. Here’s the complete process:

Step-by-Step Instructions:

1. Install the plotrix package: Run the following command to install the package:

install.packages("plotrix")

This will download and install the plotrix package.

2. Load the plotrix package: After installation, load the package using:

library(plotrix)

3. Create the 3D pie chart: Use the pie3D() function to create the 3D pie chart. The complete program is:

# Install the plotrix package (needed only once; skip if already installed)
install.packages("plotrix")

# Load the plotrix package

library(plotrix)

# Define data for the pie chart

sales <- c(25, 15, 35, 10, 15)

# Define labels for the pie chart

categories <- c("Electronics", "Furniture", "Clothing", "Books", "Others")

# Create a 3D pie chart

pie3D(sales,
labels = categories,
explode = 0.1, # Adds separation between the slices
main = "3D Sales Distribution by Category", # Title of the chart
col = rainbow(length(sales))) # Color palette

What to do if the installation fails:

If the installation fails because of network or system issues, restart R or update your R installation, then run install.packages("plotrix") again.

III) Bar Chart along with chart legend by considering a suitable CSV file

A CSV file for the bar chart can be created from a data frame with the command write.csv(sales_data, file = "sales_data.csv", row.names = FALSE). This command creates a CSV file named sales_data.csv in your working directory, and the row.names = FALSE argument ensures that row names are not included in the CSV file.

Here’s a brief summary of what this command does:

• sales_data: The data frame you want to save.
• file = "sales_data.csv": Specifies the name of the CSV file to be created.
• row.names = FALSE: Omits row names from the CSV file, resulting in a cleaner output.

Example Usage in R

Here’s the complete process:

1. Create a Data Frame:

# Create a data frame
sales_data <- data.frame(
Product = c("Electronics", "Furniture", "Clothing", "Books"),
North = c(120, 80, 140, 70),
South = c(100, 90, 150, 60),
East = c(140, 100, 120, 90),
West = c(160, 110, 130, 80)
)

2. Write the Data Frame to a CSV File:

# Write the data frame to a CSV file
write.csv(sales_data, file = "sales_data.csv", row.names = FALSE)

Verifying the CSV File

After running the command, you should see a file named sales_data.csv in your current working
directory. You can verify its content by opening it with a text editor or spreadsheet application.

To check your current working directory in R, you can use:

getwd()

If you want to set a different directory, use:

setwd("path/to/your/directory")

Replace "path/to/your/directory" with the path where you want to save the CSV file.
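
With the CSV file in place, a grouped bar chart with a legend can be drawn using base R's barplot(). The following is a minimal sketch, not the only way to do it: it writes the file to a temporary folder so the example is self-contained (csv_path, counts, and region_colors are names chosen here), and the colors and labels are arbitrary.

```r
# Recreate the CSV in a temporary folder so the example runs on its own
csv_path <- file.path(tempdir(), "sales_data.csv")
sales_data <- data.frame(
  Product = c("Electronics", "Furniture", "Clothing", "Books"),
  North = c(120, 80, 140, 70),
  South = c(100, 90, 150, 60),
  East = c(140, 100, 120, 90),
  West = c(160, 110, 130, 80)
)
write.csv(sales_data, file = csv_path, row.names = FALSE)

# Read the CSV and reshape it: one row per region, one column per product
sales <- read.csv(csv_path)
counts <- t(as.matrix(sales[, c("North", "South", "East", "West")]))
colnames(counts) <- sales$Product

# Grouped bar chart: one group per product, one bar per region, with a legend
region_colors <- c("steelblue", "tomato", "gold", "seagreen")
barplot(counts,
        beside = TRUE,
        col = region_colors,
        main = "Sales by Product and Region",
        xlab = "Product",
        ylab = "Sales",
        legend.text = rownames(counts),
        args.legend = list(x = "topright", title = "Region", cex = 0.8))
```

Setting beside = FALSE would stack the regions on top of each other instead of placing them side by side.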

VI. Create a CSV file having Speed and Distance attributes with 1000 records. Write an R program to draw
i) Box plots
ii) Histogram
iii) Line Graph
iv) Multiple line graphs
v) Scatter plot
to demonstrate the relation between the car's speed and the distance.

To create a CSV file with Speed and Distance attributes, each having 1000 records, and then use
it to draw a scatter plot in R, follow these steps:

Step 1: Create the CSV File

1. Generate Data: You’ll need to create a dataset with random or synthetic data for Speed
and Distance.
2. Write to CSV: Save this dataset to a CSV file.

Here's how you can do this in R:

# Set seed for reproducibility
set.seed(123)

# Generate random data

speed <- runif(1000, min = 20, max = 120) # Random speeds between 20 and 120
distance <- runif(1000, min = 50, max = 500) # Random distances between 50 and 500

# Create a data frame

data <- data.frame(Speed = speed, Distance = distance)

# Write the data frame to a CSV file

write.csv(data, file = "speed_distance.csv", row.names = FALSE)

Step 2: Draw a Scatter Plot

Once you have the speed_distance.csv file, you can read it into R and create a scatter plot. Here's
how you can do it:

# Load the necessary library
library(ggplot2)

# Read the CSV file

data <- read.csv("speed_distance.csv")

# Create a scatter plot

ggplot(data, aes(x = Speed, y = Distance)) +
geom_point(color = "blue", alpha = 0.6) + # Scatter plot with blue points
labs(title = "Scatter Plot of Speed vs. Distance",
x = "Speed (km/h)",
y = "Distance (km)") +
theme_minimal()

Explanation:

1. Generating Random Data:

o runif(1000, min, max): Generates 1000 random numbers between the specified
minimum and maximum values.
o data.frame(Speed = speed, Distance = distance): Creates a data frame with Speed
and Distance columns.
2. Writing Data to CSV:
o write.csv(data, file = "speed_distance.csv", row.names = FALSE): Saves the data
frame to a CSV file named speed_distance.csv.
3. Reading Data and Plotting:
o ggplot(data, aes(x = Speed, y = Distance)): Initializes the ggplot object with the
Speed and Distance variables.
o geom_point(color = "blue", alpha = 0.6): Plots the data points with blue color and
some transparency.
o labs(): Adds titles and axis labels.
o theme_minimal(): Applies a minimal theme to the plot.

This R script generates a CSV file with 1000 records and then creates a scatter plot of Speed against Distance. Because the two columns are drawn independently with runif(), the plot shows no real trend between them; adjust the range and distribution of the data as needed for your specific use case.

Box plots:

Box plots are a great way to visualize the distribution of a dataset, showing the median, quartiles,
and potential outliers. You can create box plots in R using the boxplot() function or with ggplot2
for more customization.

Example: Creating a Box Plot in R

Let's use the dataset from the previous example (speed_distance.csv) and create a box plot to
visualize the distribution of Speed and Distance.
Using Base R:

1. Generate or Read Data:

# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)

# Create a data frame

data <- data.frame(Speed = speed, Distance = distance)

# Save to CSV (if needed)

write.csv(data, file = "speed_distance.csv", row.names = FALSE)

2. Create Box Plots:

# Read the CSV file
data <- read.csv("speed_distance.csv")

# Create box plots for Speed and Distance

par(mfrow = c(1, 2)) # Set up the plotting area to have 1 row and 2 columns

# Box plot for Speed

boxplot(data$Speed,
main = "Box Plot of Speed",
ylab = "Speed (km/h)",
col = "lightblue")

# Box plot for Distance

boxplot(data$Distance,
main = "Box Plot of Distance",
ylab = "Distance (km)",
col = "lightgreen")

Explanation:

• par(mfrow = c(1, 2)): Sets up the plotting area to show two plots side by side.
• boxplot(data$Speed): Creates a box plot for the Speed variable.
• main: Title of the box plot.
• ylab: Label for the y-axis.
• col: Color of the box plot.

Using ggplot2:

1. Load the ggplot2 Package:

library(ggplot2)

2. Create Box Plots:

# Read the CSV file
data <- read.csv("speed_distance.csv")

# Create box plots with ggplot2

ggplot(data, aes(x = "", y = Speed)) +
geom_boxplot(fill = "lightblue") +
labs(title = "Box Plot of Speed", y = "Speed (km/h)") +
theme_minimal()

ggplot(data, aes(x = "", y = Distance)) +

geom_boxplot(fill = "lightgreen") +
labs(title = "Box Plot of Distance", y = "Distance (km)") +
theme_minimal()

Explanation:

• ggplot(data, aes(x = "", y = Speed)): Initializes the ggplot object with Speed as the y-axis variable.
• geom_boxplot(fill = "lightblue"): Creates the box plot with a light blue fill.
• labs(): Adds titles and y-axis labels.
• theme_minimal(): Applies a minimal theme to the plot.

Both methods will provide a visual representation of the data distribution, highlighting the median, quartiles, and potential outliers. Choose the method based on your preference and the level of customization you need.

Histogram:

A histogram is used to visualize the distribution of a continuous variable by dividing the data into bins or intervals and counting the number of observations in each bin. In R, histograms can be created using the base hist() function or ggplot2 for more customization.

Example: Creating a Histogram in R

Let’s continue with the dataset (speed_distance.csv) and create histograms for both Speed and
Distance.

Using Base R:

1. Generate or Read Data:

If you have already generated or saved the data as shown before, you can load the CSV.
Otherwise, generate new data:

# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)

# Create a data frame

data <- data.frame(Speed = speed, Distance = distance)

# Save to CSV (if needed)

write.csv(data, file = "speed_distance.csv", row.names = FALSE)

2. Create Histograms:

# Read the CSV file
data <- read.csv("speed_distance.csv")

# Set up plotting area to show 2 histograms side by side

par(mfrow = c(1, 2))

# Histogram for Speed

hist(data$Speed,
main = "Histogram of Speed",
xlab = "Speed (km/h)",
col = "lightblue",
border = "black")

# Histogram for Distance

hist(data$Distance,
main = "Histogram of Distance",
xlab = "Distance (km)",
col = "lightgreen",
border = "black")

Explanation:

• par(mfrow = c(1, 2)): Arranges the plots to be displayed side by side.
• hist(): Creates the histogram.
o data$Speed or data$Distance: The variable you are plotting.
o main: Title of the histogram.
o xlab: Label for the x-axis.
o col: Fill color of the histogram bars.
o border: Border color of the bars.
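
By default hist() chooses the bins itself. The breaks argument sets them explicitly, and plot = FALSE returns the bin counts without drawing, which is a handy way to inspect what the histogram will show. The sketch below regenerates the Speed values so it runs on its own; the 10 km/h bin width is an arbitrary choice.

```r
set.seed(123)
speed <- runif(1000, min = 20, max = 120)

# Compute the bins only (no plot): 10 bins of width 10 from 20 to 120
bins <- hist(speed, breaks = seq(20, 120, by = 10), plot = FALSE)
print(bins$counts)        # number of observations in each bin
print(sum(bins$counts))   # [1] 1000

# Draw the histogram with the same explicit bins
hist(speed,
     breaks = seq(20, 120, by = 10),
     main = "Histogram of Speed (10 km/h bins)",
     xlab = "Speed (km/h)",
     col = "lightblue",
     border = "black")
```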

Using ggplot2:

1. Load the ggplot2 Package:

library(ggplot2)

2. Create Histograms:

# Read the CSV file
data <- read.csv("speed_distance.csv")

# Histogram for Speed

ggplot(data, aes(x = Speed)) +
geom_histogram(binwidth = 5, fill = "lightblue", color = "black") +
labs(title = "Histogram of Speed", x = "Speed (km/h)", y = "Frequency") +
theme_minimal()

# Histogram for Distance

ggplot(data, aes(x = Distance)) +
geom_histogram(binwidth = 20, fill = "lightgreen", color = "black") +
labs(title = "Histogram of Distance", x = "Distance (km)", y = "Frequency") +
theme_minimal()

Explanation:

• ggplot(data, aes(x = Speed)): Initializes the ggplot object with Speed as the x-axis variable.
• geom_histogram(): Creates the histogram with customizable bin width and colors.
o binwidth: Controls the width of each bin.
o fill: Color of the bars.
o color: Border color of the bars.
• labs(): Adds titles and axis labels.
• theme_minimal(): Applies a minimal theme to the plot.

Line Graph:

A line graph is a great way to visualize trends in data, especially when there is a continuous
relationship between variables. In R, you can create line graphs using the base plot() function or
with ggplot2 for more customization.

Example: Creating a Line Graph in R

Let’s use the same dataset (speed_distance.csv) and create a line graph to show the trend
between Speed and Distance.
Using Base R:

1. Generate or Read Data:

If you already have the speed_distance.csv file, you can load it. Otherwise, you can
generate new data:

# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)

# Create a data frame

data <- data.frame(Speed = speed, Distance = distance)

# Save to CSV (if needed)

write.csv(data, file = "speed_distance.csv", row.names = FALSE)

2. Create a Line Graph:

# Read the CSV file
data <- read.csv("speed_distance.csv")

# Sort the data by Speed to make the line graph more meaningful
data <- data[order(data$Speed), ]

# Create a line plot

plot(data$Speed, data$Distance, type = "l", # 'l' stands for line graph
col = "blue", # Line color
main = "Line Graph of Speed vs Distance",
xlab = "Speed (km/h)",
ylab = "Distance (km)")

Explanation:

 plot(): Base R function to create a scatter plot or line graph.

o data$Speed, data$Distance: Variables to be plotted on the x and y axes.
o type = "l": Specifies that the plot should be a line graph.
o col: Sets the color of the line.
o main, xlab, ylab: Add titles and axis labels.
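
The sorting step matters because type = "l" connects points in row order. A small sketch of what order() does, using made-up values rather than the CSV data:

```r
# Small unsorted example (values are illustrative, not from the CSV)
df <- data.frame(Speed = c(80, 20, 120, 50),
                 Distance = c(300, 100, 450, 200))

# order() returns the row positions that sort Speed ascending
idx <- order(df$Speed)
print(idx)

# Reordering whole rows keeps each Speed paired with its Distance
sorted_df <- df[idx, ]
print(sorted_df)
```

Without this step the line would jump back and forth across the plot instead of running left to right.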
Using ggplot2:

1. Load the ggplot2 Package:

r
library(ggplot2)

2. Create a Line Graph:

r
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Sort the data by Speed to make the line graph more meaningful
data <- data[order(data$Speed), ]

# Create the line graph using ggplot

ggplot(data, aes(x = Speed, y = Distance)) +
geom_line(color = "blue") +
labs(title = "Line Graph of Speed vs Distance",
x = "Speed (km/h)",
y = "Distance (km)") +
theme_minimal()

Explanation:

• ggplot(data, aes(x = Speed, y = Distance)): Initializes the ggplot object with Speed on the x-axis and Distance on the y-axis.
• geom_line(color = "blue"): Adds a line graph with a blue line.
• labs(): Adds titles and axis labels.
• theme_minimal(): Applies a minimal theme to the plot for a clean appearance.
Multiple line graphs

Creating multiple line graphs on the same plot allows you to compare trends between different
variables. In R, this can be achieved using either base R or ggplot2.

Example: Creating Multiple Line Graphs in R

Let’s assume you want to compare two or more sets of data, such as Speed, Distance, and
perhaps another variable like FuelConsumption. Here’s how to do that.

Using Base R:

1. Generate or Read Data:

First, let’s create the dataset, including an additional variable (FuelConsumption):

r
# Generate random data
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)
fuel_consumption <- runif(1000, min = 5, max = 20)

# Create a data frame

data <- data.frame(Speed = speed, Distance = distance, FuelConsumption =
fuel_consumption)

# Save to CSV (if needed)

write.csv(data, file = "speed_distance_fuel.csv", row.names = FALSE)

2. Create Multiple Line Graphs:

r
# Read the CSV file
data <- read.csv("speed_distance_fuel.csv")

# Sort the data by Speed for meaningful plotting

data <- data[order(data$Speed), ]

# Plot the first line (Speed vs Distance)

plot(data$Speed, data$Distance, type = "l", col = "blue",
     ylim = range(data$Distance, data$FuelConsumption), # y-axis spans both series
     xlab = "Speed (km/h)", ylab = "Value",
     main = "Multiple Line Graph: Speed vs Distance and Fuel Consumption")

# Add the second line (Speed vs Fuel Consumption)

lines(data$Speed, data$FuelConsumption, type = "l", col = "red")

# Add a legend
legend("topright", legend = c("Distance", "Fuel Consumption"), col = c("blue", "red"), lty
= 1)

Explanation:

• plot(): Plots the first line (Speed vs Distance) with type "l" for a line graph.
• lines(): Adds another line (Speed vs Fuel Consumption) on the same plot.
• ylim: Sets the y-axis range to ensure both lines fit.
• legend(): Adds a legend to the plot.
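
Distance (50 to 500) and FuelConsumption (5 to 20) differ by an order of magnitude, so on a shared y-axis the fuel line looks almost flat. One possible workaround, sketched below, is to rescale each series to the range 0 to 1 before plotting. The helper rescale01 is a hypothetical name introduced for this illustration, not a base R function.

```r
# Min-max rescaling to [0, 1] (hypothetical helper for illustration)
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

set.seed(123)
distance <- runif(100, min = 50, max = 500)
fuel_consumption <- runif(100, min = 5, max = 20)

distance_scaled <- rescale01(distance)
fuel_scaled <- rescale01(fuel_consumption)

# Both series now share the same range, so neither is flattened
print(range(distance_scaled))
print(range(fuel_scaled))
```

The rescaled vectors can be passed to plot() and lines() exactly as above. The trade-off is that the y-axis no longer shows the original units.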

Using ggplot2:

1. Load the ggplot2 Package:

r
library(ggplot2)

2. Create Multiple Line Graphs:

r
# Read the CSV file
data <- read.csv("speed_distance_fuel.csv")

# Reshape the data for ggplot (long format)

library(reshape2)
data_long <- melt(data, id.vars = "Speed", variable.name = "Variable", value.name =
"Value")

# Create the line graph

ggplot(data_long, aes(x = Speed, y = Value, color = Variable)) +
geom_line() +
labs(title = "Multiple Line Graph: Speed vs Distance and Fuel Consumption",
x = "Speed (km/h)",
y = "Value") +
theme_minimal() +
scale_color_manual(values = c("Distance" = "blue", "FuelConsumption" = "red"))

Explanation:

• melt(): Converts the data to long format, where each row holds one Speed value, a variable name (Distance or FuelConsumption), and the corresponding value. ggplot2 needs this layout to map the variable to color.
• geom_line(): Draws one line per variable.
• aes(color = Variable): Uses different colors for Distance and FuelConsumption.
• scale_color_manual(): Manually sets the colors for the variables.
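
To make the wide-to-long reshaping concrete, here is a sketch that builds the same long layout by hand in base R, without reshape2. The three rows of data are made up for illustration.

```r
# Tiny wide-format example (values are illustrative)
wide <- data.frame(Speed = c(20, 60, 100),
                   Distance = c(100, 250, 400),
                   FuelConsumption = c(6, 10, 15))

# Stack the two measured columns into a single Value column
measure_cols <- c("Distance", "FuelConsumption")
long <- data.frame(
  Speed = rep(wide$Speed, times = length(measure_cols)),
  Variable = rep(measure_cols, each = nrow(wide)),
  Value = unlist(wide[measure_cols], use.names = FALSE)
)

print(long)
```

Three wide rows with two measured columns become six long rows, one per (Speed, Variable) pair.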
Scatter plot

A scatter plot is used to visualize the relationship between two continuous variables by plotting
points where each point represents an observation. In R, scatter plots can be created using either
base R or ggplot2.

Example: Creating a Scatter Plot in R

Let’s use the speed_distance.csv dataset and create a scatter plot of Speed and Distance.

Using Base R:

1. Generate or Read Data:

If you already have the speed_distance.csv file, you can load it. Otherwise, you can
generate new data:

r
# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)

# Create a data frame

data <- data.frame(Speed = speed, Distance = distance)

# Save to CSV (if needed)

write.csv(data, file = "speed_distance.csv", row.names = FALSE)

2. Create a Scatter Plot:

r
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Create the scatter plot

plot(data$Speed, data$Distance,
main = "Scatter Plot of Speed vs Distance",
xlab = "Speed (km/h)", ylab = "Distance (km)",
col = "blue", pch = 19)

Explanation:

• plot(): The base R function for creating scatter plots.

o data$Speed, data$Distance: Variables plotted on the x and y axes.
o main: Title of the plot.
o xlab, ylab: Labels for the x and y axes.
o col: Color of the points.
o pch: Shape of the points (19 is a filled circle).
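
A scatter plot shows a relationship visually; the correlation coefficient puts a number on it. Because Speed and Distance in this lab are generated independently, their correlation should be close to zero. The sketch below instead assumes a made-up linear relationship, purely to show what cor() reports when one exists.

```r
set.seed(42)
speed <- runif(200, min = 20, max = 120)

# Assumption for illustration: distance grows linearly with speed, plus noise
distance <- 2 * speed + rnorm(200, mean = 0, sd = 10)

# Pearson correlation: values near +1 or -1 indicate a strong linear relationship
r <- cor(speed, distance)
print(r)
```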

Using ggplot2:

1. Load the ggplot2 Package:

r
library(ggplot2)
2. Create a Scatter Plot:

r
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Create the scatter plot using ggplot

ggplot(data, aes(x = Speed, y = Distance)) +
geom_point(color = "blue") +
labs(title = "Scatter Plot of Speed vs Distance",
x = "Speed (km/h)",
y = "Distance (km)") +
theme_minimal()

Explanation:

• ggplot(data, aes(x = Speed, y = Distance)): Initializes the ggplot object with Speed and Distance as the x and y axes.
• geom_point(): Adds the points (scatter plot).
• labs(): Adds the title and axis labels.
• theme_minimal(): Applies a minimal theme for a clean look.
VII. 7. Implement different data structures in R (Vectors, Lists, Data Frames)

In R, three commonly used data structures are vectors, lists, and data frames. Each has different
characteristics and use cases. Here is how you can implement each of them:

1. Vectors:

A vector is a simple data structure that contains elements of the same type (numeric, character,
logical, etc.).

Example:
r
# Numeric vector
num_vector <- c(1, 2, 3, 4, 5)

# Character vector
char_vector <- c("apple", "banana", "cherry")

# Logical vector
log_vector <- c(TRUE, FALSE, TRUE)

# Print the vectors

print(num_vector)
print(char_vector)
print(log_vector)
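
The "same type" rule has a practical consequence: mixing types in c() silently converts every element to a common type. The short sketch below shows that coercion together with vectorized arithmetic and indexing.

```r
# Mixing types coerces every element to the most flexible type
mixed <- c(1, "apple", TRUE)
print(class(mixed))        # "character"

# Arithmetic applies to every element at once
num_vector <- c(1, 2, 3, 4, 5)
doubled <- num_vector * 2
print(doubled)             # 2 4 6 8 10

# Indexing is 1-based; a logical condition selects matching elements
first_value <- num_vector[1]
large_values <- num_vector[num_vector > 3]
print(first_value)         # 1
print(large_values)        # 4 5
```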

2. Lists:

A list is a more flexible data structure in R. It can contain elements of different types (vectors,
matrices, even other lists).

Example:
r
# Create a list with different types of elements
my_list <- list(name = "John", age = 30, scores = c(85, 90, 95))

# Access elements of the list

print(my_list$name) # Access by name
print(my_list[[2]]) # Access by index
print(my_list$scores) # Access the vector inside the list
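
A common source of confusion with lists is the difference between single and double brackets. The sketch below reuses the list from the example; the added city element is a hypothetical field for illustration.

```r
my_list <- list(name = "John", age = 30, scores = c(85, 90, 95))

# Single brackets return a list containing the selected element
sub_list <- my_list["age"]
print(class(sub_list))     # "list"

# Double brackets return the element itself
age_value <- my_list[["age"]]
print(class(age_value))    # "numeric"

# Add a new element and compute on the nested vector
my_list$city <- "Paris"    # hypothetical extra field
mean_score <- mean(my_list$scores)
print(mean_score)          # 90
print(names(my_list))
```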

3. Data Frames:

A data frame is a table-like structure, similar to a spreadsheet or SQL table. Different columns
can hold different types of data (e.g., numeric, character), but all values within a single column
must be of the same type.

Example:
r
# Create a data frame
df <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 35),
Score = c(85, 90, 88)
)

# View the data frame

print(df)

# Access a specific column

print(df$Name)

# Access a specific row (the second row)

print(df[2,])

# Access a specific value (second row, third column)

print(df[2, 3])
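
Beyond reading values, data frames are usually modified and filtered. A brief sketch on the same data; the pass threshold of 88 and the age cutoff of 26 are arbitrary values chosen for illustration.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85, 90, 88)
)

# Add a derived column (the threshold of 88 is an arbitrary example)
df$Passed <- df$Score >= 88

# Keep only the rows that satisfy a condition
older <- df[df$Age > 26, ]
print(older)

# Column-wise summary
avg_score <- mean(df$Score)
print(avg_score)
```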

VIII. 8. Write an R program to read a CSV file and analyze the data in the file using EDA
(Exploratory Data Analysis) techniques.

Here’s a basic R program that reads a CSV file and performs Exploratory Data Analysis (EDA)
on the data. The program includes reading the file, summarizing data, visualizing distributions,
and identifying correlations.

r
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(readr)
library(tidyr)
library(corrplot)
library(psych)

# 1. Read CSV file

data <- read_csv("yourfile.csv")

# 2. Get an overview of the data

# Display the first few rows of the data
head(data)

# Display the structure of the data (column types, etc.)

str(data)

# Display summary statistics (mean, median, min, max, etc.)

summary(data)

# Check for missing values

missing_values <- colSums(is.na(data))
print(missing_values)

# 3. Analyze categorical variables

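# read_csv() imports text columns as character (not factor), so match both types
cat_vars <- data %>% select_if(function(col) is.character(col) || is.factor(col))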

# Frequency distribution of categorical variables

for (var in colnames(cat_vars)) {
cat("Frequency distribution of ", var, ":\n")
print(table(data[[var]]))
}

# 4. Analyze numerical variables

num_vars <- data %>% select_if(is.numeric)

# Summary statistics for numerical variables

describe(num_vars)

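# Histograms for each numerical variable

for (var in colnames(num_vars)) {
  # Build the plot first, then print it explicitly (required inside a loop)
  # bins = 30 lets the bin width adapt to each variable's range
  p <- ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(bins = 30, fill = "blue", alpha = 0.7) +
    labs(title = paste("Distribution of", var), x = var) +
    theme_minimal()
  print(p)
}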
# 5. Analyze correlations between numerical variables
corr_matrix <- cor(num_vars, use = "complete.obs")
corrplot(corr_matrix, method = "circle")

# 6. Boxplots to detect outliers

for (var in colnames(num_vars)) {
  # Build the plot first, then print it explicitly (required inside a loop)
  p <- ggplot(data, aes(y = .data[[var]])) +
    geom_boxplot(fill = "orange", alpha = 0.6) +
    labs(title = paste("Boxplot of", var), y = var) +
    theme_minimal()
  print(p)
}

# 7. Scatterplot matrix for numerical variables

pairs(num_vars)

# 8. Conclusion of the analysis

# Print correlation matrix for numeric variables
print(corr_matrix)

# Print the overall summary

print(summary(data))

Key Sections of the Code:

1. Reading the CSV: The file is read using read_csv() from the readr library.
2. Overview: head(), str(), and summary() provide an initial understanding of the data.
3. Missing Values: The is.na() function helps identify missing values in each column.
4. Categorical Analysis: Frequency distributions are printed for categorical variables.
5. Numerical Analysis: Histograms and summary statistics (describe() from psych
package) give an overview of the distribution.
6. Correlations: The corrplot library visualizes the correlation matrix between numerical
variables.
7. Outlier Detection: Boxplots help in detecting outliers for each numerical variable.
8. Scatterplot Matrix: pairs() creates a matrix of scatterplots for pairwise relationships
between numerical variables.

This script provides a solid starting point for exploring a dataset. You can modify it based on the
specific variables and data structure you're working with.
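
Boxplots flag points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule numerically with quantile(), using made-up values. Boxplot functions compute their hinges slightly differently, so results can differ a little on small samples.

```r
# Illustrative values with one obvious extreme point
x <- c(10, 12, 11, 13, 12, 14, 11, 95)

# First and third quartiles and the interquartile range
q <- quantile(x, probs = c(0.25, 0.75))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

outliers <- x[x < lower | x > upper]
print(outliers)
```

Listing the flagged values this way complements the boxplot when you need to inspect or filter the rows involved.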
IX. 9. Write an R program to illustrate Linear Regression and Multiple Linear Regression
using a suitable CSV file.

Here's an R program that illustrates Simple Linear Regression and Multiple Linear
Regression using a CSV file. The program will read the data from a CSV file, fit both linear
models, and evaluate their performance. I’ve included basic steps for handling both simple and
multiple linear regression.

Sample CSV Data

The CSV file could have columns like:

advertising.csv:
TV, Radio, Newspaper, Sales
230.1, 37.8, 69.2, 22.1
44.5, 39.3, 45.1, 10.4
17.2, 45.9, 69.3, 9.3
151.5, 41.3, 58.5, 18.5

In this example, we are trying to predict Sales based on the features: TV, Radio, and Newspaper
advertising budgets.
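
If no such file is at hand, the four sample rows above can be written to disk from R. This is a sketch for checking the file format only; it writes to a temporary directory, and the path is an assumption to replace with "advertising.csv" in your working directory.

```r
# The four sample rows shown above
advertising <- data.frame(
  TV = c(230.1, 44.5, 17.2, 151.5),
  Radio = c(37.8, 39.3, 45.9, 41.3),
  Newspaper = c(69.2, 45.1, 69.3, 58.5),
  Sales = c(22.1, 10.4, 9.3, 18.5)
)

# Write to a temporary location (assumption: use "advertising.csv" for real work)
csv_path <- file.path(tempdir(), "advertising.csv")
write.csv(advertising, file = csv_path, row.names = FALSE)

# Read it back to confirm the round trip
check <- read.csv(csv_path)
print(check)
```

Four rows are too few to fit the multiple regression meaningfully, since that model estimates four coefficients. A real dataset needs many more observations.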

R Program

r
# Load required libraries
library(ggplot2)
library(readr)
library(car) # Provides vif() for the multicollinearity check in step 6

# 1. Read the CSV file

data <- read_csv("advertising.csv")

# 2. Get an overview of the data

str(data)
summary(data)

# 3. Simple Linear Regression (using TV as predictor)

# Fit the model

simple_model <- lm(Sales ~ TV, data = data)

# Display model summary

summary(simple_model)

# Plot the regression line with the data points

ggplot(data, aes(x = TV, y = Sales)) +
geom_point(color = 'blue', alpha = 0.5) +
geom_smooth(method = "lm", se = FALSE, color = 'red') +
labs(title = "Simple Linear Regression: Sales vs TV",
x = "TV Advertising Budget",
y = "Sales") +
theme_minimal()

# Predict Sales using the model

simple_predictions <- predict(simple_model, newdata = data)

# Evaluate the model performance

simple_mse <- mean((data$Sales - simple_predictions)^2)
cat("Simple Linear Regression MSE: ", simple_mse, "\n")

# 4. Multiple Linear Regression (using TV, Radio, and Newspaper as predictors)

# Fit the multiple regression model

multi_model <- lm(Sales ~ TV + Radio + Newspaper, data = data)

# Display model summary

summary(multi_model)

# Predict Sales using the model

multi_predictions <- predict(multi_model, newdata = data)

# Evaluate the multiple regression model

multi_mse <- mean((data$Sales - multi_predictions)^2)
cat("Multiple Linear Regression MSE: ", multi_mse, "\n")

# 5. Visualizing the residuals for the multiple regression model

# Collect predictions and residuals in their own data frame for plotting
residual_df <- data.frame(Predicted = multi_predictions,
                          Residual = data$Sales - multi_predictions)

ggplot(residual_df, aes(x = Predicted, y = Residual)) +
  geom_point(color = 'blue', alpha = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed", color = 'red') +
  labs(title = "Residual Plot for Multiple Linear Regression",
       x = "Predicted Sales",
       y = "Residuals") +
  theme_minimal()

# 6. Checking for multicollinearity (optional)

vif(multi_model) # Variance Inflation Factor

Key Sections of the Code:

1. Reading the CSV: The read_csv() function is used to load the data from a CSV file. You
can replace "advertising.csv" with your actual file path.
2. Simple Linear Regression:
o The linear regression model is created with the formula Sales ~ TV.
o The lm() function fits the linear model.
o The summary() function gives an overview of the model's performance, including
coefficients and p-values.
o A scatterplot of TV vs. Sales with a regression line is created using ggplot2.
3. Multiple Linear Regression:
o A multiple linear regression model is created with the formula Sales ~ TV +
Radio + Newspaper.
o The model summary shows the coefficients and overall fit statistics.
o Predictions are made using the predict() function.
o The Mean Squared Error (MSE) is calculated to evaluate the model’s
performance.
4. Model Performance Evaluation:
o The residual plot helps visualize how well the model fits the data.
o vif() (Variance Inflation Factor) can be used to check for multicollinearity in the
multiple regression model.
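
The variance inflation factor for a predictor can also be computed by hand: regress that predictor on the others, take the resulting R-squared, and evaluate 1 / (1 - R^2). The sketch below does this on simulated data, in which Radio is deliberately constructed to correlate with TV. The data and coefficients are assumptions for illustration, not the advertising dataset.

```r
# Simulated predictors (assumption: Radio is deliberately correlated with TV)
set.seed(1)
n <- 200
TV <- runif(n, 0, 300)
Radio <- 0.1 * TV + rnorm(n, sd = 5)
Newspaper <- runif(n, 0, 100)

# Regress one predictor on the others and take R-squared
aux_model <- lm(TV ~ Radio + Newspaper)
r_squared <- summary(aux_model)$r.squared

# VIF for TV: 1 / (1 - R^2); 1 means no collinearity, larger means more
vif_tv <- 1 / (1 - r_squared)
print(vif_tv)
```

A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity, though the cutoff is a convention rather than a strict test.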

The following is a more defensive variant of the EDA program from Section VIII. It loads the CSV,
summarizes the data, detects missing values, and generates basic visualizations, and it skips
steps that do not apply (for example, correlations when fewer than two numeric columns exist).

R Program for EDA

r
# Load necessary libraries
library(ggplot2)
library(readr)
library(dplyr)
library(tidyr)
library(corrplot)
library(psych)

# 1. Read CSV file

data <- read_csv("yourfile.csv")

# 2. Get an overview of the data

# Display the first few rows
print(head(data))

# Display structure (data types)

str(data)

# Display summary statistics (mean, median, min, max, etc.)

print(summary(data))

# Check for missing values

missing_values <- colSums(is.na(data))
print("Missing values in each column:")
print(missing_values)

# 3. Analyze categorical variables

cat_vars <- data %>% select_if(is.character)

# Frequency distribution of categorical variables

if (ncol(cat_vars) > 0) {
print("Categorical Variable Distribution:")
for (var in colnames(cat_vars)) {
print(paste("Distribution of", var))
print(table(data[[var]]))
}
}

# 4. Analyze numerical variables

num_vars <- data %>% select_if(is.numeric)

# Summary statistics for numerical variables

print("Numerical Variables Summary:")
print(describe(num_vars))

# Histograms for numerical variables

print("Histograms for numerical variables:")
for (var in colnames(num_vars)) {
  # .data[[var]] replaces the deprecated aes_string()
  # bins = 30 lets the bin width adapt to each variable's range
  p <- ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(bins = 30, fill = "blue", alpha = 0.7) +
    labs(title = paste("Distribution of", var), x = var, y = "Count") +
    theme_minimal()
  print(p)
}

# 5. Correlation analysis for numerical variables

if (ncol(num_vars) > 1) {
corr_matrix <- cor(num_vars, use = "complete.obs")
print("Correlation Matrix:")
print(corr_matrix)
corrplot(corr_matrix, method = "circle")
}

# 6. Boxplots to detect outliers for numerical variables

print("Boxplots for numerical variables:")
for (var in colnames(num_vars)) {
  # .data[[var]] replaces the deprecated aes_string()
  p <- ggplot(data, aes(y = .data[[var]])) +
    geom_boxplot(fill = "orange", alpha = 0.6) +
    labs(title = paste("Boxplot of", var), y = var) +
    theme_minimal()
  print(p)
}

# 7. Pairwise scatterplots for numerical variables

if (ncol(num_vars) > 1) {
print("Pairwise scatterplots for numerical variables:")
pairs(num_vars)
}

# 8. Conclusion of the analysis

print("Exploratory Data Analysis Completed")

Key Sections of the Program:

1. Reading the CSV: The file is read using read_csv(). Replace "yourfile.csv" with your
actual file path.
2. Data Overview: head(), str(), and summary() provide an initial understanding of the data.
3. Missing Values: is.na() helps identify missing values.
4. Categorical Variables: Categorical columns are analyzed using table(), which shows
frequency counts.
5. Numerical Variables: The describe() function from the psych package provides
summary statistics. Histograms and boxplots give insights into the distribution and
potential outliers.
6. Correlation Analysis: For numerical variables, a correlation matrix is generated and
visualized using corrplot().
7. Pairwise Scatterplots: For all numeric columns, scatterplots visualize the relationships
between pairs of variables.
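
Once missing values have been counted, the usual next step is deciding how to handle them. The sketch below shows two simple options on a made-up data frame: dropping incomplete rows and computing statistics that ignore missing entries. The column names are hypothetical.

```r
# Illustrative data frame with gaps
df <- data.frame(
  height = c(170, NA, 165, 180),
  weight = c(65, 72, NA, 80),
  city = c("A", "B", "B", NA)
)

# Missing values per column
missing_counts <- colSums(is.na(df))
print(missing_counts)

# Rows with no missing values at all
complete_rows <- df[complete.cases(df), ]
print(nrow(complete_rows))

# Mean that ignores missing entries
mean_height <- mean(df$height, na.rm = TRUE)
print(mean_height)
```

Dropping rows is simple but can discard a lot of data when gaps are spread across columns, as here, where only one of four rows is complete.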

Example Visualizations:

• Histograms: Show the distribution of each numeric variable.
• Correlation Plot: Displays correlations between numeric variables.
• Boxplots: Identify potential outliers for each numeric variable.
• Pairwise Scatterplots: Display relationships between numeric variables.
This program serves as a template for performing EDA on any dataset. You can adjust it based
on the structure of your data.
