R Programming Lab
I. Download and install the R programming environment and install basic packages using the
install.packages() command in R.
To download, install, and set up the R programming environment, follow these steps:
Step 1: Download R
Download the R installer for your operating system from CRAN (https://cran.r-project.org).
Step 2: Install R
1. For Windows:
o Run the downloaded .exe file and follow the installation instructions.
2. For macOS:
o Open the downloaded .pkg file and follow the installation instructions.
3. For Linux:
o Depending on your distribution, you can install R via your package manager. For
example, on Ubuntu/Debian:
sudo apt-get update
sudo apt-get install r-base
Once you have R installed, you can install R packages using the install.packages() command.
Here’s how to do that:
1. Open R or RStudio.
2. In the R console, install the necessary packages by running the following commands:
# Install basic packages
install.packages("ggplot2") # for data visualization
install.packages("dplyr") # for data manipulation
install.packages("tidyr") # for data tidying
install.packages("readr") # for reading data
install.packages("stringr") # for string manipulation
install.packages("lubridate") # for date/time manipulation
install.packages("shiny") # for building web apps
install.packages("caret") # for machine learning
These commands will download and install the specified R packages from CRAN.
After the packages are installed, you can load them to check if everything is working correctly:
library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
library(stringr)
library(lubridate)
library(shiny)
library(caret)
If you do not see any error messages, the packages have been installed and loaded successfully.
II. Learn all the basics of R programming (data types, variables, operators, etc.)
1. Basic Syntax
R is case-sensitive, and commands are run from the console or from a script. Each command
normally sits on its own line; several commands can share a line when separated by a semicolon (;).
Comments start with #.
Example:
# This is a comment
print("Hello, World!") # Outputs: [1] "Hello, World!"
2. Data Types in R
R has various data types that are fundamental to working with different kinds of data.
a. Numeric
x <- 10 # Numeric (stored as a double; write 10L for an integer)
y <- 5.67 # Decimal
b. Character
name <- "Alice"
c. Logical
is_true <- TRUE
is_false <- FALSE
d. Complex
z <- 2 + 3i
e. Factor
gender <- factor(c("Male", "Female", "Female", "Male"))
3. Variables
Variables in R are assigned using the assignment operator <-, although = is also used.
x <- 10 # Assigning 10 to x
y = 20 # Also valid, but `<-` is preferred
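Variable names:
o Must start with a letter, or with a dot that is not followed by a digit.
o May contain letters, digits, dots (.), and underscores (_).
o Are case-sensitive (x and X are different variables) and cannot be reserved words such as if, TRUE, or NULL.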
4. Operators in R
a. Arithmetic Operators
+: Addition
-: Subtraction
*: Multiplication
/: Division
^: Exponentiation
%%: Modulus (remainder)
%/%: Integer Division
Example:
x <- 5
y <- 2
x + y # Output: 7
x ^ y # Output: 25
x %% y # Output: 1 (remainder of 5/2)
b. Relational Operators
==: Equal to
!=: Not equal to
>: Greater than
<: Less than
>=: Greater than or equal to
<=: Less than or equal to
Example:
x <- 5
y <- 10
x == y # Output: FALSE
x < y # Output: TRUE
c. Logical Operators
&: AND
|: OR
!: NOT
Example:
x <- TRUE
y <- FALSE
x & y # Output: FALSE (because not both are TRUE)
x | y # Output: TRUE (because at least one is TRUE)
!x # Output: FALSE (negation of TRUE)
5. Data Structures in R
R has several key data structures that allow you to store and manipulate data efficiently.
a. Vectors
v <- c(1, 2, 3, 4) # Numeric vector
v_char <- c("a", "b", "c") # Character vector
Arithmetic on a vector is applied to every element:
v <- c(1, 2, 3)
v + 1 # Output: 2 3 4
b. Matrices
Matrices are 2-dimensional arrays, where all elements must have the same data type.
m <- matrix(1:9, nrow=3, ncol=3) # Creates a 3x3 matrix
c. Lists
my_list <- list(1, "Hello", TRUE) # A list with a number, a string, and a logical
d. Data Frames
Data frames are used for storing tabular data, and each column can contain different data types.
df <- data.frame(Name=c("John", "Alice"), Age=c(25, 30))
Individual columns are accessed with the $ operator:
df$Name # Output: "John" "Alice"
6. Control Structures
a. Conditional Statements
You can control the flow of your program using if, else if, and else statements.
x <- 5
if (x > 3) {
  print("x is greater than 3")
} else if (x == 3) {
  print("x is 3")
} else {
  print("x is less than 3")
}
b. Loops
For Loop
for (i in 1:5) {
  print(i)
}
While Loop
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}
Repeat Loop
i <- 1
repeat {
  print(i)
  i <- i + 1
  if (i > 5) {
    break
  }
}
7. Functions
Example:
# A simple function to calculate the square of a number
square <- function(x) {
  return(x^2)
}
square(4) # Output: 16
8. Reading and Writing Data
a. Reading Data
# Read a CSV file
data <- read.csv("data.csv")
b. Writing Data
# Write a data frame to a CSV file
write.csv(df, "output.csv")
9. Plotting in R
R provides powerful built-in functions for data visualization. One of the most popular libraries is
ggplot2, but base R also allows simple plots.
# Simple plot using base R
x <- 1:10
y <- x^2
plot(x, y)
library(ggplot2)
# Create a simple scatter plot
ggplot(data=df, aes(x=Age, y=Name)) + geom_point()
# Define two numeric vectors
v1 <- c(2, 4, 6)
v2 <- c(1, 3, 5)
1. Summation of vectors
# Summation of vectors
sum_result <- v1 + v2
print(sum_result)
Output:
[1]  3  7 11
Explanation:
2 + 1 = 3
4 + 3 = 7
6 + 5 = 11
2. Subtraction of vectors
# Subtraction of vectors
sub_result <- v1 - v2
print(sub_result)
Output:
[1] 1 1 1
Explanation:
2 - 1 = 1
4 - 3 = 1
6 - 5 = 1
3. Multiplication of vectors
# Multiplication of vectors
mul_result <- v1 * v2
print(mul_result)
Output:
[1]  2 12 30
Explanation:
2 * 1 = 2
4 * 3 = 12
6 * 5 = 30
4. Division of vectors
# Division of vectors
div_result <- v1 / v2
print(div_result)
Output:
[1] 2.000000 1.333333 1.200000
Explanation:
2 / 1 = 2.0
4 / 3 ≈ 1.33
6 / 5 = 1.2
Summary:
The operators +, -, * and / work element by element on vectors of the same length.
ii) Enumerate multiplication and division operations between matrices and vectors in R console.
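In R, you can perform multiplication and division between matrices and vectors. The result
depends on the dimensions of the matrix and the vector. There are two main types of
multiplication:
o Element-wise multiplication with the * operator.
o Matrix (linear algebra) multiplication with the %*% operator.
# Define a 3x3 matrix (filled column by column)
matrix_1 <- matrix(1:9, nrow = 3, ncol = 3)
# Define a vector of length 3
vector_1 <- c(2, 4, 6)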
For element-wise multiplication, R recycles the vector down the columns of the matrix. When the
vector length equals the number of rows, every element in row i is multiplied by the i-th
element of the vector.
# Element-wise multiplication of matrix and vector (the vector is recycled down each column)
elementwise_mul <- matrix_1 * vector_1
print(elementwise_mul)
Output:
     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    8   20   32
[3,]   18   36   54
Explanation:
matrix_1 is filled column by column, so its rows are (1, 4, 7), (2, 5, 8) and (3, 6, 9). With the
vector (2, 4, 6):
The first row of the matrix is multiplied by the first element of the vector: 1 * 2, 4 * 2, 7 * 2
The second row by the second element: 2 * 4, 5 * 4, 8 * 4
The third row by the third element: 3 * 6, 6 * 6, 9 * 6
Matrix multiplication requires the number of columns of the matrix to match the length of the
vector. The operation is performed as a linear algebra matrix-vector multiplication.
# Matrix multiplication (dot product)
matrix_mul <- matrix_1 %*% vector_1
print(matrix_mul)
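Output:
     [,1]
[1,]   60
[2,]   72
[3,]   84
Explanation:
Each entry is the dot product of one matrix row with the vector (2, 4, 6): 1*2 + 4*4 + 7*6 = 60,
2*2 + 5*4 + 8*6 = 72, and 3*2 + 6*4 + 9*6 = 84.
For division, R recycles the vector down the columns of the matrix in the same way and performs
element-wise division.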
# Element-wise division of matrix by vector (column-wise)
elementwise_div <- matrix_1 / vector_1
print(elementwise_div)
Output:
     [,1] [,2] [,3]
[1,]  0.5 2.00  3.5
[2,]  0.5 1.25  2.0
[3,]  0.5 1.00  1.5
Explanation:
The first row of the matrix is divided by the first element of the vector: 1 / 2, 4 / 2, 7 / 2
The second row by the second element: 2 / 4, 5 / 4, 8 / 4
The third row by the third element: 3 / 6, 6 / 6, 9 / 6
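In linear algebra there is no matrix division as such; "dividing" by a matrix means multiplying
by its inverse. To find x such that A %*% x = b, solve the system of linear equations with
solve(). The matrix must be square and invertible. matrix_1 above is singular (its determinant
is 0), so solve(matrix_1, vector_1) stops with an error; the example below uses an invertible
matrix chosen for illustration:
# Solve the system of equations A %*% x = vector_1
# A is an illustrative invertible matrix (matrix_1 is singular, so solve() would fail on it)
A <- matrix(c(2, 1, 0, 1, 3, 1, 0, 1, 4), nrow = 3)
vector_1 <- c(2, 4, 6)
solve_result <- solve(A, vector_1)
print(solve_result)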
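Summary:
o * and / work element by element; a vector is recycled down the columns of the matrix.
o %*% performs linear algebra matrix-vector multiplication.
o solve() solves a system of linear equations, which takes the place of matrix division.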
ii) Write a program to create an array of 3×3 matrices with 3 rows and 3 columns.
1. Vector Subsetting
Vector subsetting allows you to access or extract specific elements from a vector. There are
multiple ways to subset a vector:
a. By Indexing
You can extract elements using their positions (indices). In R, indexing starts at 1.
# Define a vector
v <- c(10, 20, 30, 40, 50)
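The indexing described above can be sketched as follows. The variable names second, first_three and picked are illustrative only.

```r
# Define a vector
v <- c(10, 20, 30, 40, 50)

second <- v[2]          # a single element by position (indexing starts at 1)
first_three <- v[1:3]   # a range of positions
picked <- v[c(1, 4)]    # several positions at once

print(second)
print(first_three)
print(picked)
```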
b. By Logical Conditions
You can subset based on conditions. Logical subsetting returns elements where the condition is
TRUE.
# Subset elements greater than 25
v[v > 25] # Output: 30 40 50
c. By Logical Vector
If you provide a logical vector with TRUE or FALSE, R will select the elements corresponding
to TRUE.
# Define a logical vector
logical_vec <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
# Keep only the elements at the TRUE positions
v[logical_vec] # Output: 10 30 50
d. By Negative Indexing
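A negative index excludes the element at that position instead of selecting it. A minimal sketch, reusing the vector from the indexing example (the result names are illustrative):

```r
v <- c(10, 20, 30, 40, 50)

without_first <- v[-1]          # everything except the 1st element
without_2_and_4 <- v[-c(2, 4)]  # everything except the 2nd and 4th elements

print(without_first)
print(without_2_and_4)
```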
2. Matrix Subsetting
Matrix subsetting allows you to extract rows, columns, or specific elements from a matrix. In R,
a matrix is a 2-dimensional structure, and subsetting is done using row and column indices.
a. By Row and Column Indices
You can specify the row and column indices to extract elements. The format is matrix[row,
column].
# Define a 3x3 matrix
m <- matrix(1:9, nrow = 3, byrow = TRUE)
print(m)
# Output:
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9
# Extract a submatrix (1st and 3rd rows, 2nd and 3rd columns)
m[c(1, 3), c(2, 3)]
# Output:
#      [,1] [,2]
# [1,]    2    3
# [2,]    8    9
b. By Logical Conditions
You can subset a matrix based on conditions. If applied to the whole matrix, this will return the
elements that satisfy the condition.
# Subset matrix elements greater than 5
m[m > 5] # Output: 7 8 6 9 (a vector, in column-major order)
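c. By Logical Vector
A logical vector can select whole rows or columns: positions marked TRUE are kept.
# Define a logical vector to select rows
row_select <- c(TRUE, FALSE, TRUE)
m[row_select, ] # Keeps the 1st and 3rd rows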
d. By Dropping Dimensions
By default, subsetting returns the result in the lowest possible dimension. However, you can
prevent this behavior by setting drop = FALSE.
# Extract the 1st column (with default behavior)
m[, 1] # Output: 1 4 7 (a vector)
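To see the effect of drop = FALSE, compare the dimensions of the two results. This is a small sketch; col_vec and col_mat are illustrative names.

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

col_vec <- m[, 1]                # default: the result is simplified to a vector
col_mat <- m[, 1, drop = FALSE]  # the result stays a 3x1 matrix

print(dim(col_vec))  # NULL, because a plain vector has no dimensions
print(dim(col_mat))  # 3 1
```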
Summary:
Vector Subsetting:
o Can be done using indices, logical conditions, or logical vectors.
o Negative indexing allows you to exclude specific elements.
Matrix Subsetting:
o Uses row and column indices in the format matrix[row, column].
o Allows subsetting by logical conditions or logical vectors.
o Can extract entire rows or columns, or submatrices, and can retain dimensions by
setting drop = FALSE.
These operations help you manipulate and access data efficiently in R, depending on the
structure of your data.
ii) Write a program to create an array of 3×3 matrices with 3 rows and 3 columns.
You can create an array of 3x3 matrices in R using the array() function. An array in R can have
more than two dimensions, and in this case, we'll create an array with multiple 3x3 matrices.
Here’s a program that creates an array of 3x3 matrices, with 3 rows and 3 columns, and two
matrices:
# Define the elements for the array (total 3*3*2 = 18 elements)
elements <- 1:18
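The array itself can then be built with array(), passing the dimensions as rows, columns, and number of matrices. A minimal sketch (the name arr is illustrative):

```r
# Define the elements for the array (3 * 3 * 2 = 18 elements)
elements <- 1:18

# dim = c(rows, columns, number of matrices)
arr <- array(elements, dim = c(3, 3, 2))
print(arr)

# Access the second matrix, and a single element (row 2, column 3, matrix 1)
print(arr[, , 2])
print(arr[2, 3, 1])  # 8
```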
Output:
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18
Explanation:
This creates two 3x3 matrices stacked together in a 3D array. You can modify the depth (number
of matrices) or the elements as needed.
IV. Write an R program to draw i) Pie chart, ii) 3D Pie Chart, iii) Bar Chart along with chart
legend by considering a suitable CSV file.
I) Pie chart
In R, you can draw a pie chart using the pie() function. Here’s an example program that creates a
pie chart with labeled slices:
# Define data for the pie chart (example: percentage of sales by product category)
sales <- c(25, 15, 35, 10, 15)
sales: A vector representing the size of each slice in the pie chart (e.g., percentage of
sales in different categories).
categories: A vector containing labels for each slice.
pie(): The function to draw the pie chart.
o labels = categories: Labels for each slice.
o main = "Sales Distribution by Category": Adds a title to the pie chart.
o col = rainbow(length(sales)): Assigns different colors to each slice using the
rainbow() function.
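The pieces described above can be assembled as in the sketch below. The category labels are hypothetical, since the original labels are not shown, and the legend() call is an addition to cover the chart legend requested in the task.

```r
# Define data for the pie chart (percentage of sales by product category)
sales <- c(25, 15, 35, 10, 15)

# Hypothetical category labels, one per slice
categories <- c("Electronics", "Furniture", "Clothing", "Books", "Toys")
slice_colors <- rainbow(length(sales))

# Draw the pie chart with labels, a title, and one color per slice
pie(sales, labels = categories, main = "Sales Distribution by Category", col = slice_colors)

# Add a legend that maps colors to categories
legend("topright", legend = categories, fill = slice_colors, cex = 0.8)
```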
Output:
The pie chart will display slices with the corresponding categories and colors, showing the sales
distribution across different categories.
You can also customize the colors, labels, and add percentage values or legends to the pie chart
if needed.
3D Pie Chart:
The pie3D() function comes from the plotrix package, which is not part of base R. If the package is
not installed, library(plotrix) stops with an error, so install it first.
Step-by-Step Instructions:
1. Install the plotrix package: Run the following command to install the package:
install.packages("plotrix")
2. Load the package:
library(plotrix)
3. Create the 3D pie chart: Now, you can use the pie3D() function to create your 3D pie
chart. Start by making sure the package is available:
# Install the plotrix package if not installed
if (!requireNamespace("plotrix", quietly = TRUE)) install.packages("plotrix")
If the installation still fails due to network or system issues, try restarting R or updating your R
installation, then attempt the installation again using install.packages("plotrix").
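Once plotrix is available, a 3D pie chart might be drawn as sketched below. The sales figures are taken from the pie chart example; the category labels are hypothetical, and the requireNamespace() guard is an addition so that the script skips the plot instead of failing when the package is missing.

```r
sales <- c(25, 15, 35, 10, 15)
# Hypothetical category labels, one per slice
categories <- c("Electronics", "Furniture", "Clothing", "Books", "Toys")

if (requireNamespace("plotrix", quietly = TRUE)) {
  # explode separates the slices slightly; main sets the chart title
  plotrix::pie3D(sales,
                 labels = categories,
                 explode = 0.1,
                 col = rainbow(length(sales)),
                 main = "3D Sales Distribution by Category")
} else {
  message("plotrix is not installed; skipping the 3D pie chart")
}
```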
# Create a data frame
sales_data <- data.frame(
  Product = c("Electronics", "Furniture", "Clothing", "Books"),
  North = c(120, 80, 140, 70),
  South = c(100, 90, 150, 60),
  East = c(140, 100, 120, 90),
  West = c(160, 110, 130, 80)
)
# Write the data frame to a CSV file
write.csv(sales_data, file = "sales_data.csv", row.names = FALSE)
After running the command, you should see a file named sales_data.csv in your current working
directory. You can verify its content by opening it with a text editor or spreadsheet application.
To check the current working directory:
getwd()
To change the working directory:
setwd("path/to/your/directory")
Replace "path/to/your/directory" with the path where you want to save the CSV file.
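With the CSV file in place, a grouped bar chart with a legend can be drawn from it. The sketch below is one possible approach, not the original program: it writes the same data to a temporary file (an assumption made so the example runs anywhere), reads it back, and passes a region-by-product matrix to barplot().

```r
# Recreate the data and write it to a CSV file (a temporary path is used here)
sales_data <- data.frame(
  Product = c("Electronics", "Furniture", "Clothing", "Books"),
  North = c(120, 80, 140, 70),
  South = c(100, 90, 150, 60),
  East = c(140, 100, 120, 90),
  West = c(160, 110, 130, 80)
)
csv_path <- file.path(tempdir(), "sales_data.csv")
write.csv(sales_data, file = csv_path, row.names = FALSE)

# Read the CSV and build a matrix with regions as rows and products as columns
sales_csv <- read.csv(csv_path)
sales_matrix <- t(as.matrix(sales_csv[, c("North", "South", "East", "West")]))
colnames(sales_matrix) <- sales_csv$Product

# Grouped bar chart; legend.text = TRUE uses the row names (regions) as legend entries
region_colors <- c("steelblue", "orange", "forestgreen", "firebrick")
barplot(sales_matrix, beside = TRUE, col = region_colors,
        main = "Sales by Product and Region", xlab = "Product", ylab = "Sales",
        legend.text = TRUE, args.legend = list(x = "topright"))
```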
VI. 6. Create a CSV file having Speed and Distance attributes with 1000 records. Write an R
program to draw
i) Box plots
ii) Histogram
iii) Line Graph
iv) Multiple line graphs
v) Scatter plot
to demonstrate the relation between the car's speed and the distance.
To create a CSV file with Speed and Distance attributes, each having 1000 records, and then use
it to draw a scatter plot in R, follow these steps:
1. Generate Data: You’ll need to create a dataset with random or synthetic data for Speed
and Distance.
2. Write to CSV: Save this dataset to a CSV file.
# Set seed for reproducibility
set.seed(123)
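The two steps above can be sketched as follows, using the same uniform ranges that appear in the later examples of this section. The data frame name speed_distance is illustrative. Because Speed and Distance are drawn independently, the synthetic data will not show a real relationship between them.

```r
# Set seed for reproducibility
set.seed(123)

# Generate 1000 synthetic records
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)
speed_distance <- data.frame(Speed = speed, Distance = distance)

# Write the data to a CSV file in the current working directory
write.csv(speed_distance, "speed_distance.csv", row.names = FALSE)

# Quick check of the first few records
print(head(speed_distance))
```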
Once you have the speed_distance.csv file, you can read it into R and create a scatter plot. Here's
how you can do it:
# Load the necessary library
library(ggplot2)
Explanation:
This R script will generate a CSV file with 1000 records and then create a scatter plot showing
the relationship between Speed and Distance. Adjust the range and distribution of data as needed
for your specific use case.
Box plots:
Box plots are a great way to visualize the distribution of a dataset, showing the median, quartiles,
and potential outliers. You can create box plots in R using the boxplot() function or with ggplot2
for more customization.
Let's use the dataset from the previous example (speed_distance.csv) and create a box plot to
visualize the distribution of Speed and Distance.
Using Base R:
# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)
# Read the CSV file
data <- read.csv("speed_distance.csv")
Explanation:
par(mfrow = c(1, 2)): Sets up the plotting area to show two plots side by side.
boxplot(data$Speed): Creates a box plot for the Speed variable.
main: Title of the box plot.
ylab: Label for the y-axis.
col: Color of the box plot.
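The calls described above can be put together as in this sketch. It generates the data inline rather than reading speed_distance.csv so that it runs on its own; the titles and colors are illustrative choices.

```r
set.seed(123)
data <- data.frame(Speed = runif(1000, min = 20, max = 120),
                   Distance = runif(1000, min = 50, max = 500))

# Two plots side by side; keep the old settings so they can be restored
old_par <- par(mfrow = c(1, 2))
boxplot(data$Speed, main = "Box Plot of Speed", ylab = "Speed", col = "lightblue")
boxplot(data$Distance, main = "Box Plot of Distance", ylab = "Distance", col = "lightgreen")
par(old_par)
```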
Using ggplot2:
library(ggplot2)
# Read the CSV file
data <- read.csv("speed_distance.csv")
Explanation:
ggplot(data, aes(x = "", y = Speed)): Initializes the ggplot object with Speed as the y-
axis variable.
geom_boxplot(fill = "lightblue"): Creates the box plot with a light blue fill.
labs(): Adds titles and y-axis labels.
theme_minimal(): Applies a minimal theme to the plot.
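A ggplot2 version built from the functions listed above might look like this. The data is generated inline so the sketch is self-contained, and the requireNamespace() guard is an addition that skips the plot when ggplot2 is not installed.

```r
set.seed(123)
data <- data.frame(Speed = runif(1000, min = 20, max = 120))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # x = "" gives a single box; Speed is mapped to the y-axis
  p <- ggplot(data, aes(x = "", y = Speed)) +
    geom_boxplot(fill = "lightblue") +
    labs(title = "Box Plot of Speed", x = NULL, y = "Speed") +
    theme_minimal()
  print(p)
}
```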
Both methods will provide a visual representation of the data distribution, highlighting the
median, quartiles, and potential outliers. Choose the method based on your preference and the
level of customization you need.
Histogram:
A histogram is used to visualize the distribution of a continuous variable by dividing
the data into bins or intervals and counting the number of observations in each bin. In R,
histograms can be created using the base hist() function or ggplot2 for more customization.
Let’s continue with the dataset (speed_distance.csv) and create histograms for both Speed and
Distance.
Using Base R:
# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)
2. Create Histograms:
# Read the CSV file
data <- read.csv("speed_distance.csv")
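The histograms themselves can be drawn with hist(), as in the sketch below. The data is generated inline instead of being read from the CSV so that the example runs on its own; the number of breaks and the colors are illustrative choices.

```r
set.seed(123)
data <- data.frame(Speed = runif(1000, min = 20, max = 120),
                   Distance = runif(1000, min = 50, max = 500))

# Two histograms side by side
old_par <- par(mfrow = c(1, 2))
hist(data$Speed, breaks = 20, main = "Histogram of Speed",
     xlab = "Speed", col = "lightblue", border = "black")
hist(data$Distance, breaks = 20, main = "Histogram of Distance",
     xlab = "Distance", col = "lightgreen", border = "black")
par(old_par)

# The bin counts can also be computed without drawing anything
speed_bins <- hist(data$Speed, breaks = 20, plot = FALSE)
print(sum(speed_bins$counts))
```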
Explanation:
Using ggplot2:
library(ggplot2)
2. Create Histograms:
# Read the CSV file
data <- read.csv("speed_distance.csv")
Explanation:
ggplot(data, aes(x = Speed)): Initializes the ggplot object with Speed as the x-axis
variable.
geom_histogram(): Creates the histogram with customizable bin width and colors.
o binwidth: Controls the width of each bin.
o fill: Color of the bars.
o color: Border color of the bars.
labs(): Adds titles and axis labels.
theme_minimal(): Applies a minimal theme to the plot.
Line Graph:
A line graph is a great way to visualize trends in data, especially when there is a continuous
relationship between variables. In R, you can create line graphs using the base plot() function or
with ggplot2 for more customization.
Let’s use the same dataset (speed_distance.csv) and create a line graph to show the trend
between Speed and Distance.
Using Base R:
If you already have the speed_distance.csv file, you can load it. Otherwise, you can
generate new data:
# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)
# Read the CSV file
data <- read.csv("speed_distance.csv")
# Sort the data by Speed to make the line graph more meaningful
data <- data[order(data$Speed), ]
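After sorting, the line graph can be drawn with plot() and type = "l". A self-contained sketch, with the data generated inline and illustrative titles:

```r
set.seed(123)
data <- data.frame(Speed = runif(1000, min = 20, max = 120),
                   Distance = runif(1000, min = 50, max = 500))

# Sort by Speed so the line moves from left to right
data <- data[order(data$Speed), ]

# type = "l" draws a line instead of points
plot(data$Speed, data$Distance, type = "l", col = "blue",
     main = "Line Graph of Speed vs Distance",
     xlab = "Speed", ylab = "Distance")
```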
Using ggplot2:
library(ggplot2)
# Read the CSV file
data <- read.csv("speed_distance.csv")
# Sort the data by Speed to make the line graph more meaningful
data <- data[order(data$Speed), ]
Explanation:
ggplot(data, aes(x = Speed, y = Distance)): Initializes the ggplot object with Speed on
the x-axis and Distance on the y-axis.
geom_line(color = "blue"): Adds a line graph with a blue line.
labs(): Adds titles and axis labels.
theme_minimal(): Applies a minimal theme to the plot for a clean appearance.
Multiple line graphs
Creating multiple line graphs on the same plot allows you to compare trends between different
variables. In R, this can be achieved using either base R or ggplot2.
Let’s assume you want to compare two or more sets of data, such as Speed, Distance, and
perhaps another variable like FuelConsumption. Here’s how to do that.
Using Base R:
# Generate random data
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)
fuel_consumption <- runif(1000, min = 5, max = 20)
# Read the CSV file
data <- read.csv("speed_distance_fuel.csv")
# Add a legend
legend("topright", legend = c("Distance", "Fuel Consumption"), col = c("blue", "red"), lty = 1)
Explanation:
plot(): Plots the first line (Speed vs Distance) with type "l" for a line graph.
lines(): Adds another line (Speed vs Fuel Consumption) on the same plot.
ylim: Sets the y-axis range to ensure both lines fit.
legend(): Adds a legend to the plot.
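The four calls listed above fit together as in the following sketch. The data is generated inline so the example does not depend on speed_distance_fuel.csv, and the column name FuelConsumption follows the name used later in this section.

```r
set.seed(123)
data <- data.frame(Speed = runif(1000, min = 20, max = 120),
                   Distance = runif(1000, min = 50, max = 500),
                   FuelConsumption = runif(1000, min = 5, max = 20))
data <- data[order(data$Speed), ]

# A y-axis range wide enough for both series
y_range <- range(data$Distance, data$FuelConsumption)

# First line: Speed vs Distance
plot(data$Speed, data$Distance, type = "l", col = "blue", ylim = y_range,
     main = "Distance and Fuel Consumption vs Speed",
     xlab = "Speed", ylab = "Value")
# Second line on the same plot: Speed vs Fuel Consumption
lines(data$Speed, data$FuelConsumption, col = "red")
# Add a legend
legend("topright", legend = c("Distance", "Fuel Consumption"),
       col = c("blue", "red"), lty = 1)
```

Because the two variables are on very different scales, the fuel consumption line sits near the bottom of the plot; rescaling one series or using two panels is an alternative.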
Using ggplot2:
library(ggplot2)
# Read the CSV file
data <- read.csv("speed_distance_fuel.csv")
Explanation:
melt() (from the reshape2 package): Converts the data to long format, where each row corresponds
to a single observation (necessary for ggplot).
geom_line(): Adds line graphs to the plot.
aes(color = Variable): Uses different colors for Distance and FuelConsumption.
scale_color_manual(): Manually sets the colors for the variables.
Scatter plot
A scatter plot is used to visualize the relationship between two continuous variables by plotting
points where each point represents an observation. In R, scatter plots can be created using either
base R or ggplot2.
Let’s use the speed_distance.csv dataset and create a scatter plot of Speed and Distance.
Using Base R:
# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)
# Read the CSV file
data <- read.csv("speed_distance.csv")
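A base R scatter plot of the two variables can be sketched as below, with the data generated inline so the example runs without the CSV. The correlation check at the end is an addition: for independently generated data it should be close to zero.

```r
set.seed(123)
data <- data.frame(Speed = runif(1000, min = 20, max = 120),
                   Distance = runif(1000, min = 50, max = 500))

# pch = 19 draws solid circles
plot(data$Speed, data$Distance, pch = 19, col = "blue",
     main = "Scatter Plot of Speed vs Distance",
     xlab = "Speed", ylab = "Distance")

# Pearson correlation between the two variables
speed_distance_cor <- cor(data$Speed, data$Distance)
print(speed_distance_cor)
```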
Explanation:
Using ggplot2:
library(ggplot2)
2. Create a Scatter Plot:
# Read the CSV file
data <- read.csv("speed_distance.csv")
Explanation:
ggplot(data, aes(x = Speed, y = Distance)): Initializes the ggplot object with Speed and
Distance as the x and y axes.
geom_point(): Adds the points (scatter plot).
labs(): Adds the title and axis labels.
theme_minimal(): Applies a minimal theme for a clean look.
VII. 7. Implement different data structures in R (Vectors, Lists, Data Frames)
In R, three commonly used data structures are vectors, lists, and data frames. Each has different
characteristics and use cases. Here is how you can implement each of them:
1. Vectors:
A vector is a simple data structure that contains elements of the same type (numeric, character,
logical, etc.).
Example:
# Numeric vector
num_vector <- c(1, 2, 3, 4, 5)
# Character vector
char_vector <- c("apple", "banana", "cherry")
# Logical vector
log_vector <- c(TRUE, FALSE, TRUE)
2. Lists:
A list is a more flexible data structure in R. It can contain elements of different types (vectors,
matrices, even other lists).
Example:
# Create a list with different types of elements
my_list <- list(name = "John", age = 30, scores = c(85, 90, 95))
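List elements can be read by name with $ or [[ ]], and new elements can be added by assignment. A short sketch; the city element is a made-up addition for illustration.

```r
my_list <- list(name = "John", age = 30, scores = c(85, 90, 95))

person_name <- my_list$name             # access by name with $
second_score <- my_list[["scores"]][2]  # [[ ]] returns the element itself

# Add a new element (hypothetical field)
my_list$city <- "Paris"

print(person_name)
print(second_score)
print(names(my_list))
```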
3. Data Frames:
A data frame is a table-like structure, similar to a spreadsheet or SQL table. Different columns
can hold different types of data (e.g., numeric, character), but all values within a single
column must be of the same type.
Example:
# Create a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85, 90, 88)
)
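Typical operations on such a data frame include inspecting its structure, filtering rows, and adding columns. A brief sketch; the Passed column and the threshold of 88 are made up for illustration.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85, 90, 88)
)

str(df)                              # column names and types
older <- df[df$Age > 28, ]           # rows where Age is above 28
df$Passed <- df$Score >= 88          # new logical column (hypothetical rule)
mean_score <- mean(df$Score)         # summary of one column

print(older)
print(df)
print(mean_score)
```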
VIII. 8. Write an R program to read a CSV file and analyze the data in the file using EDA
(Exploratory Data Analysis) techniques.
Here’s a basic R program that reads a CSV file and performs Exploratory Data Analysis (EDA)
on the data. The program includes reading the file, summarizing data, visualizing distributions,
and identifying correlations.
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(readr)
library(tidyr)
library(corrplot)
library(psych)
1. Reading the CSV: The file is read using read_csv() from the readr library.
2. Overview: head(), str(), and summary() provide an initial understanding of the data.
3. Missing Values: The is.na() function helps identify missing values in each column.
4. Categorical Analysis: Frequency distributions are printed for categorical variables.
5. Numerical Analysis: Histograms and summary statistics (describe() from psych
package) give an overview of the distribution.
6. Correlations: The corrplot library visualizes the correlation matrix between numerical
variables.
7. Outlier Detection: Boxplots help in detecting outliers for each numerical variable.
8. Scatterplot Matrix: pairs() creates a matrix of scatterplots for pairwise relationships
between numerical variables.
This script provides a solid starting point for exploring a dataset. You can modify it based on the
specific variables and data structure you're working with.
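The steps listed above can be sketched with base R alone. This is not the original script: read.csv(), summary(), and cor() stand in for read_csv(), describe(), and corrplot(), and the data set (Group, Height, Weight) is synthetic and written to a temporary file so the example is self-contained.

```r
# Create a small synthetic CSV file to analyze (hypothetical columns)
set.seed(42)
sample_data <- data.frame(
  Group = sample(c("A", "B", "C"), 100, replace = TRUE),
  Height = rnorm(100, mean = 170, sd = 10),
  Weight = rnorm(100, mean = 70, sd = 12)
)
sample_data$Weight[c(5, 20)] <- NA   # introduce two missing values
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_data, csv_path, row.names = FALSE)

# 1-2. Read the CSV and get an overview
data <- read.csv(csv_path, stringsAsFactors = TRUE)
print(head(data))
str(data)
print(summary(data))

# 3. Missing values per column
missing_counts <- colSums(is.na(data))
print(missing_counts)

# 4. Frequency table for the categorical variable
print(table(data$Group))

# 5-7. Distributions, correlations, and outliers for the numeric columns
numeric_cols <- data[sapply(data, is.numeric)]
hist(data$Height, main = "Histogram of Height", xlab = "Height", col = "lightblue")
cor_matrix <- cor(numeric_cols, use = "complete.obs")
print(cor_matrix)
boxplot(numeric_cols, main = "Box Plots of Numeric Variables")

# 8. Scatterplot matrix
pairs(numeric_cols)
```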
IX. 9. Write an R program to illustrate Linear Regression and Multiple Linear Regression
considering a suitable CSV file.
Here's an R program that illustrates Simple Linear Regression and Multiple Linear
Regression using a CSV file. The program will read the data from a CSV file, fit both linear
models, and evaluate their performance. I’ve included basic steps for handling both simple and
multiple linear regression.
In this example, we are trying to predict Sales based on the features: TV, Radio, and Newspaper
advertising budgets.
R Program
# Load required libraries
library(ggplot2)
library(readr)
library(caret) # For performance metrics
1. Reading the CSV: The read_csv() function is used to load the data from a CSV file. You
can replace "advertising.csv" with your actual file path.
2. Simple Linear Regression:
o The linear regression model is created with the formula Sales ~ TV.
o The lm() function fits the linear model.
o The summary() function gives an overview of the model's performance, including
coefficients and p-values.
o A scatterplot of TV vs. Sales with a regression line is created using ggplot2.
3. Multiple Linear Regression:
o A multiple linear regression model is created with the formula Sales ~ TV +
Radio + Newspaper.
o The model summary shows the coefficients and overall fit statistics.
o Predictions are made using the predict() function.
o The Mean Squared Error (MSE) is calculated to evaluate the model’s
performance.
4. Model Performance Evaluation:
o The residual plot helps visualize how well the model fits the data.
o vif() (Variance Inflation Factor) from the car package can be used to check for
multicollinearity in the multiple regression model.
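The modeling steps above can be sketched with base R as follows. Since advertising.csv is not included here, the sketch generates synthetic TV, Radio, Newspaper, and Sales values; the coefficients used to simulate Sales are arbitrary assumptions, not results from a real data set.

```r
# Synthetic stand-in for the advertising data (all values are simulated)
set.seed(1)
n <- 200
advertising <- data.frame(
  TV = runif(n, 0, 300),
  Radio = runif(n, 0, 50),
  Newspaper = runif(n, 0, 100)
)
advertising$Sales <- 3 + 0.045 * advertising$TV + 0.19 * advertising$Radio +
  rnorm(n, sd = 1.5)

# Simple linear regression: Sales explained by TV only
simple_model <- lm(Sales ~ TV, data = advertising)
print(summary(simple_model))

# Multiple linear regression: Sales explained by all three budgets
multi_model <- lm(Sales ~ TV + Radio + Newspaper, data = advertising)
print(summary(multi_model))

# Predictions and Mean Squared Error on the training data
predictions <- predict(multi_model, newdata = advertising)
mse <- mean((advertising$Sales - predictions)^2)
print(mse)

# Residual plot: points scattered evenly around 0 suggest a reasonable fit
plot(fitted(multi_model), resid(multi_model),
     main = "Residuals vs Fitted", xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Note that the MSE here is computed on the same data used to fit the model, so it understates the error on new data; holding out a test set would give a fairer estimate.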
Here’s an R program that reads a CSV file and performs Exploratory Data Analysis (EDA). The
program will load the CSV, summarize the data, detect missing values, and generate basic
visualizations.
# Load necessary libraries
library(ggplot2)
library(readr)
library(dplyr)
library(tidyr)
library(corrplot)
library(psych)
1. Reading the CSV: The file is read using read_csv(). Replace "yourfile.csv" with your
actual file path.
2. Data Overview: head(), str(), and summary() provide an initial understanding of the data.
3. Missing Values: is.na() helps identify missing values.
4. Categorical Variables: Categorical columns are analyzed using table(), which shows
frequency counts.
5. Numerical Variables: The describe() function from the psych package provides
summary statistics. Histograms and boxplots give insights into the distribution and
potential outliers.
6. Correlation Analysis: For numerical variables, a correlation matrix is generated and
visualized using corrplot().
7. Pairwise Scatterplots: For all numeric columns, scatterplots visualize the relationships
between pairs of variables.
Example Visualizations: