R Programming Lab


I. Download and install the R programming environment and install basic packages using the
install.packages() command in R.

To download, install, and set up the R programming environment, follow these steps:

Step 1: Download R

1. Go to the official R project website: https://cran.r-project.org/.
2. Click on the "Download R" link, and then choose a CRAN mirror (choose a server
location close to you).
3. Download the appropriate version of R for your operating system (Windows, macOS, or
Linux).

Step 2: Install R

1. For Windows:
o Run the downloaded .exe file and follow the installation instructions.
2. For macOS:
o Open the downloaded .pkg file and follow the installation instructions.
3. For Linux:
o Depending on your distribution, you can install R via your package manager. For
example, on Ubuntu/Debian:

bash
Copy code
sudo apt-get update
sudo apt-get install r-base

Step 3: Install RStudio (Optional but Recommended)

RStudio provides a user-friendly integrated development environment (IDE) for R.

1. Download RStudio from https://rstudio.com/products/rstudio/download/.
2. Install it by following the instructions for your operating system.

Step 4: Install Basic R Packages

Once you have R installed, you can install R packages using the install.packages() command.
Here’s how to do that:

1. Open R or RStudio.
2. In the R console, install the necessary packages by running the following commands:

r
Copy code
# Install basic packages
install.packages("ggplot2") # for data visualization
install.packages("dplyr") # for data manipulation
install.packages("tidyr") # for data tidying
install.packages("readr") # for reading data
install.packages("stringr") # for string manipulation
install.packages("lubridate") # for date/time manipulation
install.packages("shiny") # for building web apps
install.packages("caret") # for machine learning

# devtools provides package development tools; install.packages()
# also installs the dependencies each package requires
install.packages("devtools")

These commands will download and install the specified R packages from CRAN.
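
Running install.packages() again reinstalls packages that are already present. The sketch below shows one way to install only what is missing. The helper name missing_packages and the vector wanted are illustrative assumptions, not part of the lab text, and the install call is left commented out so the snippet has no side effects.

```r
# Hypothetical helper: return the packages in `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# A few of the packages from the list above (shortened for illustration)
wanted <- c("ggplot2", "dplyr", "tidyr")
to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to install only the missing packages:
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first keeps repeated lab sessions fast, because nothing is downloaded when every package is already available.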

Step 5: Verify Installation

After the packages are installed, you can load them to check if everything is working correctly:

r
Copy code
library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
library(stringr)
library(lubridate)
library(shiny)
library(caret)

If you do not see any error messages, the packages have been installed and loaded successfully.

II. Learn all the basics of R programming (data types, variables, operators, etc.)

1. Basic Syntax

R is case-sensitive, and commands are run from the console or from a script. Each command
normally ends at the end of a line; several commands can share one line if they are separated
by semicolons (;). Comments start with #.

Example:

r
Copy code
# This is a comment
print("Hello, World!") # Outputs: [1] "Hello, World!"
2. Data Types in R

R has various data types that are fundamental to working with different kinds of data.

a. Numeric

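Numbers, stored as double-precision floating point by default. A whole number is stored as an
integer only when it is written with the L suffix.

r
Copy code
x <- 10 # Numeric (double), even though it has no decimal part
n <- 10L # Integer (note the L suffix)
y <- 5.67 # Decimal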

b. Character

Strings or text data. Enclosed in single or double quotes.

r
Copy code
name <- "Alice"

c. Logical

Boolean values that represent TRUE or FALSE.

r
Copy code
is_true <- TRUE
is_false <- FALSE

d. Complex

Numbers with imaginary parts.

r
Copy code
z <- 2 + 3i

e. Factor

Used to handle categorical data. Factors can be ordered or unordered.

r
Copy code
gender <- factor(c("Male", "Female", "Female", "Male"))
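
The type of any value can be checked with class() and typeof(). The following is a small sketch using only base R; the variable names are illustrative.

```r
x <- 10        # numeric literal, stored as a double
n <- 10L       # the L suffix makes it an integer
name <- "Alice"
flag <- TRUE
z <- 2 + 3i
gender <- factor(c("Male", "Female", "Female", "Male"))

print(class(x))       # "numeric"
print(typeof(x))      # "double"
print(typeof(n))      # "integer"
print(class(name))    # "character"
print(class(flag))    # "logical"
print(class(z))       # "complex"
print(class(gender))  # "factor"
print(levels(gender)) # "Female" "Male" (levels are sorted alphabetically)
```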
3. Variables

Variables in R are assigned using the assignment operator <-, although = is also used.

r
Copy code
x <- 10 # Assigning 10 to x
y = 20 # Also valid, but `<-` is preferred

Variable names:

• Must start with a letter (A-Z or a-z), or with a period that is not followed by a digit
• Can contain letters, digits, periods (.), and underscores (_)
• Cannot contain spaces, and cannot begin with a digit or an underscore
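
These naming rules can be checked from the console. The sketch below relies on base R's make.names(), which turns any string into a syntactically valid name; the helper is_valid_name and the candidate names are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: a name is syntactically valid if make.names() leaves it unchanged
is_valid_name <- function(name) {
  make.names(name) == name
}

candidates <- c("total_sales", "x.1", ".hidden", "2nd_value", "my var", "_temp")
print(data.frame(name = candidates, valid = is_valid_name(candidates)))

# make.names() also shows how R would repair an invalid name
print(make.names("my var"))
```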

4. Operators in R

a. Arithmetic Operators

Used for basic mathematical calculations:

• +: Addition
• -: Subtraction
• *: Multiplication
• /: Division
• ^: Exponentiation
• %%: Modulus (remainder)
• %/%: Integer division

Example:

r
Copy code
x <- 5
y <- 2
x + y # Output: 7
x ^ y # Output: 25
x %% y # Output: 1 (remainder of 5/2)

b. Relational Operators

Used for comparisons:

• ==: Equal to
• !=: Not equal to
• >: Greater than
• <: Less than
• >=: Greater than or equal to
• <=: Less than or equal to

Example:

r
Copy code
x <- 5
y <- 10
x == y # Output: FALSE
x < y # Output: TRUE

c. Logical Operators

Used to combine or negate logical conditions:

• &: AND
• |: OR
• !: NOT

Example:

r
Copy code
x <- TRUE
y <- FALSE
x & y # Output: FALSE (because not both are TRUE)
x | y # Output: TRUE (because at least one is TRUE)
!x # Output: FALSE (negation of TRUE)
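
The operators above work element by element on vectors. R also has && and ||, which compare single values and stop evaluating as soon as the result is known, so they are the usual choice inside if conditions. A short sketch with illustrative values:

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

print(a & b)    # element-wise AND: TRUE FALSE FALSE
print(a | b)    # element-wise OR:  TRUE TRUE TRUE
print(!a)       # element-wise NOT: FALSE TRUE FALSE

# any() and all() reduce a logical vector to a single value
print(any(a))   # TRUE
print(all(a))   # FALSE

# && and || work on single values and short-circuit:
# the right-hand side is not evaluated when the left side decides the result
x <- 5
if (x > 3 && x < 10) {
  print("x is between 3 and 10")
}
```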

5. Data Structures in R

R has several key data structures that allow you to store and manipulate data efficiently.

a. Vectors

A vector is a sequence of elements of the same data type.

r
Copy code
v <- c(1, 2, 3, 4) # Numeric vector
v_char <- c("a", "b", "c") # Character vector

• Indexing: Vectors are 1-indexed (the first element is at position 1).

r
Copy code
v[1] # Output: 1

• Vector Operations: Operations can be applied to entire vectors.

r
Copy code
v <- c(1, 2, 3)
v + 1 # Output: 2 3 4
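
When the two operands have different lengths, R reuses (recycles) the shorter one. The sketch below illustrates this together with a few common summary functions; the values are illustrative.

```r
v <- c(1, 2, 3, 4)

print(v * 2)            # 2 4 6 8
print(v + c(10, 20))    # 11 22 13 24 (c(10, 20) is reused twice)
print(sum(v))           # 10
print(mean(v))          # 2.5
print(length(v))        # 4
```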

b. Matrices

Matrices are 2-dimensional arrays, where all elements must have the same data type.

r
Copy code
m <- matrix(1:9, nrow=3, ncol=3) # Creates a 3x3 matrix

c. Lists

Lists are collections of elements that can be of different data types.

r
Copy code
my_list <- list(1, "Hello", TRUE) # A list with a number, a string, and a logical

d. Data Frames

Data frames are used for storing tabular data, and each column can contain different data types.

r
Copy code
df <- data.frame(Name=c("John", "Alice"), Age=c(25, 30))

• Accessing Data Frame Columns:

r
Copy code
df$Name # Output: "John" "Alice"
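
Beyond reading a single column, a data frame can be inspected, filtered by a condition, and extended with new columns. A minimal sketch with the same df (the column AgeNextYear is an illustrative addition):

```r
df <- data.frame(Name = c("John", "Alice"), Age = c(25, 30))

print(nrow(df))              # number of rows: 2
print(names(df))             # column names: "Name" "Age"
str(df)                      # compact summary of the structure

# Select rows with a logical condition on a column
older <- df[df$Age > 26, ]
print(older$Name)            # the only match is Alice

# Add a new column computed from an existing one
df$AgeNextYear <- df$Age + 1
print(df)
```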

6. Control Structures

a. Conditional Statements

You can control the flow of your program using if, else if, and else statements.

r
Copy code
x <- 5
if (x > 3) {
  print("x is greater than 3")
} else if (x == 3) {
  print("x is 3")
} else {
  print("x is less than 3")
}

b. Loops

For Loop

r
Copy code
for (i in 1:5) {
  print(i)
}

While Loop

r
Copy code
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}

Repeat Loop

r
Copy code
i <- 1
repeat {
  print(i)
  i <- i + 1
  if (i > 5) {
    break
  }
}

7. Functions

Functions in R are defined using the function keyword.

Example:

r
Copy code
# A simple function to calculate the square of a number
square <- function(x) {
  return(x^2)
}

square(4) # Output: 16
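
Function arguments can have default values, and a function returns the value of its last expression even without an explicit return(). The function power below is a hypothetical example written for illustration:

```r
# Hypothetical function: raise x to a power, squaring by default
power <- function(x, exponent = 2) {
  x^exponent
}

print(power(4))                 # 16 (uses the default exponent)
print(power(2, exponent = 3))   # 8
print(power(c(1, 2, 3)))        # 1 4 9 (works on whole vectors)
```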

8. Reading and Writing Data

a. Reading Data

r
Copy code
# Read a CSV file
data <- read.csv("data.csv")

# Read a table (txt file)


data <- read.table("data.txt")

b. Writing Data

r
Copy code
# Write a data frame to a CSV file
write.csv(df, "output.csv")
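
By default write.csv() also writes the row names as an extra first column. The sketch below writes a data frame without them and reads it back to confirm the round trip. The temporary file path is an assumption made so that the example does not depend on the working directory.

```r
df <- data.frame(Name = c("John", "Alice"), Age = c(25, 30))

# tempfile() gives a path inside R's temporary directory for this session
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE prevents an extra column of row numbers in the file
write.csv(df, csv_path, row.names = FALSE)

df_back <- read.csv(csv_path)
print(df_back)
print(names(df_back))   # "Name" "Age"
```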

9. Plotting in R

R provides powerful built-in functions for data visualization. One of the most popular libraries is
ggplot2, but base R also allows simple plots.

r
Copy code
# Simple plot using base R
x <- 1:10
y <- x^2
plot(x, y)

For advanced visualizations, use ggplot2:

r
Copy code
library(ggplot2)
# Create a simple scatter plot
ggplot(data=df, aes(x=Age, y=Name)) + geom_point()

III. Write R commands to

i) Illustrate summation, subtraction, multiplication, and division operations on vectors.
ii) Enumerate multiplication and division operations between matrices and vectors in the R
console.

Step 1: Define the vectors

Let’s define two numeric vectors for the illustration:

r
Copy code
# Define two numeric vectors
v1 <- c(2, 4, 6)
v2 <- c(1, 3, 5)

Step 2: Perform operations on the vectors

1. Summation of vectors

When two vectors are added, R adds the corresponding elements.

r
Copy code
# Summation of vectors
sum_result <- v1 + v2
print(sum_result)

Output:

r
Copy code
[1] 3 7 11

Explanation:

• 2 + 1 = 3
• 4 + 3 = 7
• 6 + 5 = 11

2. Subtraction of vectors

Subtraction works similarly, with corresponding elements subtracted.

r
Copy code
# Subtraction of vectors
sub_result <- v1 - v2
print(sub_result)

Output:

r
Copy code
[1] 1 1 1

Explanation:

• 2 - 1 = 1
• 4 - 3 = 1
• 6 - 5 = 1

3. Multiplication of vectors

Element-wise multiplication multiplies corresponding elements of the vectors.

r
Copy code
# Multiplication of vectors
mul_result <- v1 * v2
print(mul_result)

Output:

r
Copy code
[1] 2 12 30

Explanation:

• 2 * 1 = 2
• 4 * 3 = 12
• 6 * 5 = 30

4. Division of vectors

Element-wise division divides corresponding elements of the vectors.

r
Copy code
# Division of vectors
div_result <- v1 / v2
print(div_result)

Output:

r
Copy code
[1] 2.000000 1.333333 1.200000

Explanation:

• 2 / 1 = 2.0
• 4 / 3 ≈ 1.33
• 6 / 5 = 1.2

Summary:

• Addition: v1 + v2 results in a new vector where each element is the sum of
corresponding elements.
• Subtraction: v1 - v2 results in a new vector where each element is the difference of
corresponding elements.
• Multiplication: v1 * v2 results in a new vector where each element is the product of
corresponding elements.
• Division: v1 / v2 results in a new vector where each element is the quotient of
corresponding elements.

This illustrates the element-wise operations performed on vectors in R.

ii) Enumerate multiplication and division operations between matrices and vectors in R console.

In R, you can perform multiplication and division between matrices and vectors. The result
depends on the dimensions of the matrix and the vector and on the operator used. There are two
main kinds of operation:

1. Element-wise operations (* and /): Operations are performed element by element.
2. Matrix multiplication (%*%): This follows the rules of linear algebra.

Step 1: Define a matrix and a vector

Let's start by defining a matrix and a vector:

r
Copy code
# Define a 3x3 matrix (filled column by column, so the first column is 1, 2, 3)
matrix_1 <- matrix(1:9, nrow = 3, ncol = 3)

# Define a vector of length 3
vector_1 <- c(2, 4, 6)

Step 2: Perform Multiplication

1. Element-wise multiplication between a matrix and a vector

For element-wise multiplication, R treats the matrix as one long vector stored column by column
and recycles the shorter vector along it. With a vector whose length equals the number of rows,
every element in row i is therefore multiplied by the i-th element of the vector.

r
Copy code
# Element-wise multiplication of matrix and vector (vector recycled down each column)
elementwise_mul <- matrix_1 * vector_1
print(elementwise_mul)

Output:

r
Copy code
     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    8   20   32
[3,]   18   36   54

Explanation:

• The first row of the matrix (1, 4, 7) is multiplied by the first element of the vector: 1 * 2,
4 * 2, 7 * 2
• The second row (2, 5, 8) by the second element: 2 * 4, 5 * 4, 8 * 4
• The third row (3, 6, 9) by the third element: 3 * 6, 6 * 6, 9 * 6
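
Because R recycles a vector down the columns of a matrix, matrix_1 * vector_1 scales each row by one vector element. If the intent is instead to scale each column by one vector element, base R's sweep() is one option. This is a sketch using the same matrix_1 and vector_1; the names by_row and by_col are illustrative.

```r
matrix_1 <- matrix(1:9, nrow = 3, ncol = 3)   # columns are 1:3, 4:6, 7:9
vector_1 <- c(2, 4, 6)

# Default recycling: row i is multiplied by vector_1[i]
by_row <- matrix_1 * vector_1
print(by_row)

# sweep() with MARGIN = 2: column j is multiplied by vector_1[j]
by_col <- sweep(matrix_1, 2, vector_1, "*")
print(by_col)

# The same column-wise result via a double transpose
print(t(t(matrix_1) * vector_1))
```

Stating the margin explicitly with sweep() makes the intended direction visible in the code instead of relying on the recycling order.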

2. Matrix multiplication with a vector (using %*%)

Matrix multiplication requires the number of columns of the matrix to match the length of the
vector. R treats the vector as a column vector, and each element of the result is the dot product
of one matrix row with the vector.

r
Copy code
# Matrix-vector multiplication
matrix_mul <- matrix_1 %*% vector_1
print(matrix_mul)

Output:

r
Copy code
     [,1]
[1,]   60
[2,]   72
[3,]   84

Explanation:

• 1*2 + 4*4 + 7*6 = 60
• 2*2 + 5*4 + 8*6 = 72
• 3*2 + 6*4 + 9*6 = 84

Step 3: Perform Division

1. Element-wise division between a matrix and a vector

R recycles the vector down the columns of the matrix, so every element in row i is divided by
the i-th element of the vector.

r
Copy code
# Element-wise division of matrix by vector (vector recycled down each column)
elementwise_div <- matrix_1 / vector_1
print(elementwise_div)

Output:

r
Copy code
     [,1] [,2] [,3]
[1,]  0.5 2.00  3.5
[2,]  0.5 1.25  2.0
[3,]  0.5 1.00  1.5

Explanation:

• The first row of the matrix (1, 4, 7) is divided by the first element of the vector: 1 / 2,
4 / 2, 7 / 2
• The second row (2, 5, 8) by the second element: 2 / 4, 5 / 4, 8 / 4
• The third row (3, 6, 9) by the third element: 3 / 6, 6 / 6, 9 / 6

2. Matrix division (solve)

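Linear algebra has no direct matrix division; "dividing" by a matrix means multiplying by its
inverse, which is only possible when the matrix is square and non-singular. To find the vector x
that satisfies A %*% x = b, use solve(A, b). Note that matrix_1 above is singular (its
determinant is 0), so solve(matrix_1, vector_1) fails with an error. The example below therefore
uses a different, invertible matrix.

r
Copy code
# matrix_2 is an illustrative invertible 3x3 matrix (matrix_1 is singular)
matrix_2 <- matrix(c(2, 1, 0, 1, 3, 1, 0, 1, 4), nrow = 3, ncol = 3)

# Solve the system of equations matrix_2 %*% x = vector_1
solve_result <- solve(matrix_2, vector_1)
print(solve_result)

This returns the vector x such that matrix_2 %*% x = vector_1.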

Summary:

• Element-wise multiplication (*): Multiplies each matrix element by a vector element,
recycling the vector down the columns of the matrix.
• Matrix multiplication (%*%): Computes the matrix-vector product, requiring the vector
length to match the number of columns of the matrix.
• Element-wise division (/): Divides each matrix element by a vector element, recycling the
vector in the same way.
• Matrix division (solve()): Solves a system of linear equations, which is equivalent to
multiplying by the inverse of a non-singular matrix.

IV. Write R commands to

i) Illustrate the usage of vector subsetting and matrix subsetting.

ii) Write a program to create an array of 3×3 matrices with 3 rows and 3 columns.

i) Illustrate the usage of vector subsetting and matrix subsetting

1. Vector Subsetting

Vector subsetting allows you to access or extract specific elements from a vector. There are
multiple ways to subset a vector:

a. By Indexing
You can extract elements using their positions (indices). In R, indexing starts at 1.

r
Copy code
# Define a vector
v <- c(10, 20, 30, 40, 50)

# Extract the second element


v[2] # Output: 20

# Extract multiple elements (e.g., 2nd and 4th elements)


v[c(2, 4)] # Output: 20 40

# Extract a sequence of elements (e.g., from 2nd to 4th)


v[2:4] # Output: 20 30 40

b. By Logical Conditions

You can subset based on conditions. Logical subsetting returns elements where the condition is
TRUE.

r
Copy code
# Subset elements greater than 25
v[v > 25] # Output: 30 40 50

# Subset elements equal to 40


v[v == 40] # Output: 40

c. By Logical Vector

If you provide a logical vector with TRUE or FALSE, R will select the elements corresponding
to TRUE.

r
Copy code
# Define a logical vector
logical_vec <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

# Use the logical vector for subsetting


v[logical_vec] # Output: 10 30 50

d. By Negative Indexing

You can exclude specific elements by using negative indices.


r
Copy code
# Exclude the second element
v[-2] # Output: 10 30 40 50

# Exclude the second and fourth elements


v[-c(2, 4)] # Output: 10 30 50
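
Three related techniques are often used alongside the ones above: finding positions with which(), selecting by name, and replacing elements through a subset. A brief sketch, where the element names are illustrative:

```r
v <- c(10, 20, 30, 40, 50)

# which() returns the positions where a condition is TRUE
print(which(v > 25))        # 3 4 5

# Elements can be named and then selected by name
names(v) <- c("a", "b", "c", "d", "e")
print(v[c("b", "d")])       # b = 20, d = 40

# Subsetting on the left of <- replaces the selected elements
v[v > 40] <- 0
print(v)                    # the last element (50) becomes 0
```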

2. Matrix Subsetting

Matrix subsetting allows you to extract rows, columns, or specific elements from a matrix. In R,
a matrix is a 2-dimensional structure, and subsetting is done using row and column indices.

a. By Indexing (Row and Column)

You can specify the row and column indices to extract elements. The format is matrix[row,
column].

r
Copy code
# Define a 3x3 matrix
m <- matrix(1:9, nrow = 3, byrow = TRUE)
print(m)
# Output:
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 8 9

# Extract the element at the 2nd row, 3rd column


m[2, 3] # Output: 6

# Extract the entire 1st row


m[1, ] # Output: 1 2 3

# Extract the entire 2nd column


m[, 2] # Output: 2 5 8

# Extract a submatrix (1st and 3rd rows, 2nd and 3rd columns)
m[c(1, 3), c(2, 3)]
# Output:
# [,1] [,2]
# [1,] 2 3
# [2,] 8 9

b. By Logical Conditions
You can subset a matrix based on conditions. If applied to the whole matrix, this will return the
elements that satisfy the condition.

r
Copy code
# Subset matrix elements greater than 5
m[m > 5] # Output: 7 8 6 9 (matching elements are returned in column-by-column order)

c. Subset Rows or Columns by Logical Vector

You can use a logical vector to select specific rows or columns.

r
Copy code
# Define a logical vector to select rows
row_select <- c(TRUE, FALSE, TRUE)

# Subset matrix to keep 1st and 3rd rows


m[row_select, ]
# Output:
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 7 8 9

# Subset matrix to keep 1st and 3rd columns


col_select <- c(TRUE, FALSE, TRUE)
m[, col_select]
# Output:
# [,1] [,2]
# [1,] 1 3
# [2,] 4 6
# [3,] 7 9

d. Preserving Dimensions (drop = FALSE)

By default, extracting a single row or column returns a plain vector (the lowest possible
dimension). You can keep the result as a matrix by setting drop = FALSE.

r
Copy code
# Extract the 1st column (with default behavior)
m[, 1] # Output: 1 4 7 (a vector)

# Extract the 1st column as a matrix (without dropping dimensions)


m[, 1, drop = FALSE]
# Output:
# [,1]
# [1,] 1
# [2,] 4
# [3,] 7

Summary:

• Vector Subsetting:
  o Can be done using indices, logical conditions, or logical vectors.
  o Negative indexing allows you to exclude specific elements.

• Matrix Subsetting:
  o Uses row and column indices in the format matrix[row, column].
  o Allows subsetting by logical conditions or logical vectors.
  o Can extract entire rows or columns, or submatrices, and can retain dimensions by
    setting drop = FALSE.

These operations help you manipulate and access data efficiently in R, depending on the
structure of your data.

ii) Write a program to create an array of 3×3 matrices with 3 rows and 3 columns.

You can create an array of 3x3 matrices in R using the array() function. An array in R can have
more than two dimensions, and in this case, we'll create an array with multiple 3x3 matrices.

Here’s a program that creates an array of 3x3 matrices, with 3 rows and 3 columns, and two
matrices:

r
Copy code
# Define the elements for the array (total 3*3*2 = 18 elements)
elements <- 1:18

# Create an array with 3 rows, 3 columns, and 2 matrices (depth = 2)


array_3x3 <- array(elements, dim = c(3, 3, 2))

# Print the array


print(array_3x3)

Output:

r
Copy code
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

Explanation:

• elements <- 1:18: Creates a sequence of numbers from 1 to 18.
• array(elements, dim = c(3, 3, 2)): Creates an array of size 3x3 with two matrices (depth
= 2). The dimension parameter dim = c(3, 3, 2) indicates 3 rows, 3 columns, and 2
matrices.

This creates two 3x3 matrices stacked together in a 3D array. You can modify the depth (number
of matrices) or the elements as needed.
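
Elements of the array are addressed with three indices in the order row, column, matrix. The sketch below rebuilds the same array and shows a few ways to read from it:

```r
array_3x3 <- array(1:18, dim = c(3, 3, 2))

print(dim(array_3x3))          # 3 3 2
print(array_3x3[2, 3, 1])      # row 2, column 3 of the first matrix: 8
print(array_3x3[, , 2])        # the whole second matrix

# apply() over the third dimension gives one value per matrix
print(apply(array_3x3, 3, sum))   # 45 126
```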

V. Write an R program to draw i) Pie chart, ii) 3D Pie Chart, iii) Bar Chart along with chart
legend by considering a suitable CSV file.

I) Pie chart

In R, you can draw a pie chart using the pie() function. Here’s an example program that creates a
pie chart with labeled slices:

Example Program to Draw a Pie Chart in R:

r
Copy code
# Define data for the pie chart (example: percentage of sales by product category)
sales <- c(25, 15, 35, 10, 15)

# Define labels for the pie chart (example: product categories)


categories <- c("Electronics", "Furniture", "Clothing", "Books", "Others")

# Create a pie chart


pie(sales,
labels = categories,
main = "Sales Distribution by Category", # Title of the chart
col = rainbow(length(sales))) # Color palette

Explanation:

• sales: A vector representing the size of each slice in the pie chart (e.g., percentage of
sales in different categories).
• categories: A vector containing labels for each slice.
• pie(): The function to draw the pie chart.
  o labels = categories: Labels for each slice.
  o main = "Sales Distribution by Category": Adds a title to the pie chart.
  o col = rainbow(length(sales)): Assigns different colors to each slice using the
    rainbow() function.

Output:

The pie chart will display slices with the corresponding categories and colors, showing the sales
distribution across different categories.

You can also customize the colors, labels, and add percentage values or legends to the pie chart
if needed.
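
The customization mentioned above can be sketched as follows: the slices are labeled with percentages and a legend maps each color to its category. The names pct, slice_labels, and slice_colors are illustrative, and the legend position is an arbitrary choice.

```r
sales <- c(25, 15, 35, 10, 15)
categories <- c("Electronics", "Furniture", "Clothing", "Books", "Others")

# Convert each value to a percentage of the total
pct <- round(100 * sales / sum(sales))
slice_labels <- paste0(pct, "%")
slice_colors <- rainbow(length(sales))

pie(sales,
    labels = slice_labels,
    main = "Sales Distribution by Category",
    col = slice_colors)

# The legend maps each color to its category
legend("topright", legend = categories, fill = slice_colors, cex = 0.8)
```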

II) 3D Pie Chart

The pie3D() function used for 3D pie charts is not part of base R; it is provided by the plotrix
package, which must be installed and loaded first.

Step-by-Step Instructions:

1. Install the plotrix package: Run the following command to install the package:

r
Copy code
install.packages("plotrix")

This will download and install the plotrix package.


2. Load the plotrix package: After installation, load the package using:

r
Copy code
library(plotrix)

3. Create the 3D pie chart: Now, you can use the pie3D() function to create your 3D pie
chart. The complete code is:

r
Copy code
# Install the plotrix package if not installed
install.packages("plotrix")

# Load the plotrix package


library(plotrix)

# Define data for the pie chart


sales <- c(25, 15, 35, 10, 15)

# Define labels for the pie chart


categories <- c("Electronics", "Furniture", "Clothing", "Books", "Others")

# Create a 3D pie chart


pie3D(sales,
labels = categories,
explode = 0.1, # Adds separation between the slices
main = "3D Sales Distribution by Category", # Title of the chart
col = rainbow(length(sales))) # Color palette

If the installation fails:

If the installation fails because of network or system issues, restart R or update your R
installation, then run install.packages("plotrix") again.

III) Bar Chart along with chart legend by considering a suitable CSV file

A bar chart needs a data source, so the first step is to create a suitable CSV file. The command
write.csv(sales_data, file = "sales_data.csv", row.names = FALSE) writes a data frame to a CSV
file named sales_data.csv in your working directory:

• sales_data: The data frame you want to save.
• file = "sales_data.csv": Specifies the name of the CSV file to be created.
• row.names = FALSE: Omits row names from the CSV file, resulting in a cleaner output.

Example Usage in R

Here’s the complete process:

1. Create a Data Frame:

r
Copy code
# Create a data frame
sales_data <- data.frame(
Product = c("Electronics", "Furniture", "Clothing", "Books"),
North = c(120, 80, 140, 70),
South = c(100, 90, 150, 60),
East = c(140, 100, 120, 90),
West = c(160, 110, 130, 80)
)

2. Write the Data Frame to a CSV File:

r
Copy code
# Write the data frame to a CSV file
write.csv(sales_data, file = "sales_data.csv", row.names = FALSE)

Verifying the CSV File

After running the command, you should see a file named sales_data.csv in your current working
directory. You can verify its content by opening it with a text editor or spreadsheet application.

To check your current working directory in R, you can use:

r
Copy code
getwd()

If you want to set a different directory, use:

r
Copy code
setwd("path/to/your/directory")

Replace "path/to/your/directory" with the path where you want to save the CSV file.
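
With the CSV file in place, the bar chart and its legend can be drawn with base R's barplot(). The sketch below is one possible approach rather than a prescribed solution: it writes the same sales_data to a temporary path (an assumption made so the example is self-contained), reads it back, and plots one group of bars per product with one bar per region. The colors and the y-axis limit are arbitrary choices.

```r
sales_data <- data.frame(
  Product = c("Electronics", "Furniture", "Clothing", "Books"),
  North = c(120, 80, 140, 70),
  South = c(100, 90, 150, 60),
  East = c(140, 100, 120, 90),
  West = c(160, 110, 130, 80)
)

# Write to a temporary CSV file and read it back, as the exercise requires a CSV source
csv_path <- file.path(tempdir(), "sales_data.csv")
write.csv(sales_data, file = csv_path, row.names = FALSE)
sales <- read.csv(csv_path)

# barplot() expects a matrix: one row per region, one column per product
region_matrix <- t(as.matrix(sales[, c("North", "South", "East", "West")]))
colnames(region_matrix) <- sales$Product

barplot(region_matrix,
        beside = TRUE,                              # bars side by side, not stacked
        col = c("steelblue", "orange", "seagreen", "firebrick"),
        ylim = c(0, 200),                           # head room for the legend
        main = "Sales by Product and Region",
        xlab = "Product",
        ylab = "Sales",
        legend.text = rownames(region_matrix),      # legend entries are the regions
        args.legend = list(x = "topright", title = "Region"))
```

Setting beside = FALSE instead would stack the regions into a single bar per product.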

VI. Create a CSV file having Speed and Distance attributes with 1000 records. Write an R
program to draw
i) Box plots
ii) Histogram
iii) Line graph
iv) Multiple line graphs
v) Scatter plot
to demonstrate the relation between a car's speed and the distance.

To create a CSV file with Speed and Distance attributes, each having 1000 records, and then use
it to draw a scatter plot in R, follow these steps:

Step 1: Create the CSV File

1. Generate Data: You’ll need to create a dataset with random or synthetic data for Speed
and Distance.
2. Write to CSV: Save this dataset to a CSV file.

Here's how you can do this in R:

r
Copy code
# Set seed for reproducibility
set.seed(123)

# Generate random data


speed <- runif(1000, min = 20, max = 120) # Random speeds between 20 and 120
distance <- runif(1000, min = 50, max = 500) # Random distances between 50 and 500

# Create a data frame


data <- data.frame(Speed = speed, Distance = distance)

# Write the data frame to a CSV file


write.csv(data, file = "speed_distance.csv", row.names = FALSE)

Step 2: Draw a Scatter Plot

Once you have the speed_distance.csv file, you can read it into R and create a scatter plot. Here's
how you can do it:

r
Copy code
# Load the necessary library
library(ggplot2)

# Read the CSV file


data <- read.csv("speed_distance.csv")

# Create a scatter plot


ggplot(data, aes(x = Speed, y = Distance)) +
geom_point(color = "blue", alpha = 0.6) + # Scatter plot with blue points
labs(title = "Scatter Plot of Speed vs. Distance",
x = "Speed (km/h)",
y = "Distance (km)") +
theme_minimal()

Explanation:

1. Generating Random Data:
o runif(1000, min, max): Generates 1000 random numbers between the specified
minimum and maximum values.
o data.frame(Speed = speed, Distance = distance): Creates a data frame with Speed
and Distance columns.
2. Writing Data to CSV:
o write.csv(data, file = "speed_distance.csv", row.names = FALSE): Saves the data
frame to a CSV file named speed_distance.csv.
3. Reading Data and Plotting:
o ggplot(data, aes(x = Speed, y = Distance)): Initializes the ggplot object with the
Speed and Distance variables.
o geom_point(color = "blue", alpha = 0.6): Plots the data points with blue color and
some transparency.
o labs(): Adds titles and axis labels.
o theme_minimal(): Applies a minimal theme to the plot.

This R script will generate a CSV file with 1000 records and then create a scatter plot showing
the relationship between Speed and Distance. Adjust the range and distribution of data as needed
for your specific use case.
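
Because Speed and Distance are drawn independently of each other here, the scatter plot shows no trend. If the aim is to demonstrate a relationship, the data can be generated so that distance depends on speed. The sketch below makes one such assumption (distance grows linearly with speed, plus random noise). The coefficient 2, the noise level, and the name related are arbitrary illustrative choices, and base R's plot() is used in place of ggplot2.

```r
set.seed(123)
speed <- runif(1000, min = 20, max = 120)

# Assumed relationship for illustration: distance = 2 * speed + noise
distance <- 2 * speed + rnorm(1000, mean = 0, sd = 20)
related <- data.frame(Speed = speed, Distance = distance)

# A correlation close to 1 indicates a strong positive linear relationship
print(cor(related$Speed, related$Distance))

plot(related$Speed, related$Distance,
     col = "blue", pch = 16, cex = 0.5,
     main = "Scatter Plot of Speed vs. Distance",
     xlab = "Speed (km/h)", ylab = "Distance (km)")

# Add the least-squares regression line
abline(lm(Distance ~ Speed, data = related), col = "red", lwd = 2)
```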
Box plots:

Box plots are a great way to visualize the distribution of a dataset, showing the median, quartiles,
and potential outliers. You can create box plots in R using the boxplot() function or with ggplot2
for more customization.

Example: Creating a Box Plot in R

Let's use the dataset from the previous example (speed_distance.csv) and create a box plot to
visualize the distribution of Speed and Distance.
Using Base R:

1. Generate or Read Data:

r
Copy code
# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)

# Create a data frame


data <- data.frame(Speed = speed, Distance = distance)

# Save to CSV (if needed)


write.csv(data, file = "speed_distance.csv", row.names = FALSE)

2. Create Box Plots:

r
Copy code
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Create box plots for Speed and Distance


par(mfrow = c(1, 2)) # Set up the plotting area to have 1 row and 2 columns

# Box plot for Speed


boxplot(data$Speed,
main = "Box Plot of Speed",
ylab = "Speed (km/h)",
col = "lightblue")

# Box plot for Distance


boxplot(data$Distance,
main = "Box Plot of Distance",
ylab = "Distance (km)",
col = "lightgreen")

Explanation:

• par(mfrow = c(1, 2)): Sets up the plotting area to show two plots side by side.
• boxplot(data$Speed): Creates a box plot for the Speed variable.
• main: Title of the box plot.
• ylab: Label for the y-axis.
• col: Color of the box plot.

Using ggplot2:

1. Load the ggplot2 Package:

r
Copy code
library(ggplot2)

2. Create Box Plots:

r
Copy code
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Create box plots with ggplot2


ggplot(data, aes(x = "", y = Speed)) +
geom_boxplot(fill = "lightblue") +
labs(title = "Box Plot of Speed", y = "Speed (km/h)") +
theme_minimal()

ggplot(data, aes(x = "", y = Distance)) +


geom_boxplot(fill = "lightgreen") +
labs(title = "Box Plot of Distance", y = "Distance (km)") +
theme_minimal()

Explanation:

• ggplot(data, aes(x = "", y = Speed)): Initializes the ggplot object with Speed as the
y-axis variable.
• geom_boxplot(fill = "lightblue"): Creates the box plot with a light blue fill.
• labs(): Adds titles and y-axis labels.
• theme_minimal(): Applies a minimal theme to the plot.

Both methods will provide a visual representation of the data distribution, highlighting the
median, quartiles, and potential outliers. Choose the method based on your preference and the
level of customization you need.

Histogram:

A histogram is used to visualize the distribution of a continuous variable by dividing
the data into bins or intervals and counting the number of observations in each bin. In R,
histograms can be created using the base hist() function or ggplot2 for more customization.

Example: Creating a Histogram in R

Let’s continue with the dataset (speed_distance.csv) and create histograms for both Speed and
Distance.

Using Base R:

1. Generate or Read Data:


If you have already generated or saved the data as shown before, you can load the CSV.
Otherwise, generate new data:

r
Copy code
# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)

# Create a data frame


data <- data.frame(Speed = speed, Distance = distance)

# Save to CSV (if needed)


write.csv(data, file = "speed_distance.csv", row.names = FALSE)

2. Create Histograms:

r
Copy code
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Set up plotting area to show 2 histograms side by side


par(mfrow = c(1, 2))

# Histogram for Speed


hist(data$Speed,
main = "Histogram of Speed",
xlab = "Speed (km/h)",
col = "lightblue",
border = "black")

# Histogram for Distance


hist(data$Distance,
main = "Histogram of Distance",
xlab = "Distance (km)",
col = "lightgreen",
border = "black")

Explanation:

• par(mfrow = c(1, 2)): Arranges the plots to be displayed side by side.
• hist(): Creates the histogram.
  o data$Speed or data$Distance: The variable you are plotting.
  o main: Title of the histogram.
  o xlab: Label for the x-axis.
  o col: Fill color of the histogram bars.
  o border: Border color of the bars.

Using ggplot2:

1. Load the ggplot2 Package:

r
Copy code
library(ggplot2)

2. Create Histograms:

r
Copy code
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Histogram for Speed


ggplot(data, aes(x = Speed)) +
geom_histogram(binwidth = 5, fill = "lightblue", color = "black") +
labs(title = "Histogram of Speed", x = "Speed (km/h)", y = "Frequency") +
theme_minimal()

# Histogram for Distance


ggplot(data, aes(x = Distance)) +
geom_histogram(binwidth = 20, fill = "lightgreen", color = "black") +
labs(title = "Histogram of Distance", x = "Distance (km)", y = "Frequency") +
theme_minimal()

Explanation:

• ggplot(data, aes(x = Speed)): Initializes the ggplot object with Speed as the x-axis
variable.
• geom_histogram(): Creates the histogram with customizable bin width and colors.
  o binwidth: Controls the width of each bin.
  o fill: Color of the bars.
  o color: Border color of the bars.
• labs(): Adds titles and axis labels.
• theme_minimal(): Applies a minimal theme to the plot.
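
The bin counts behind a histogram can also be inspected directly. In the sketch below, base R's hist() is called with plot = FALSE so that it returns the bin data without drawing. The 5 km/h bin edges mirror the binwidth used above, and the name h is illustrative.

```r
set.seed(123)
speed <- runif(1000, min = 20, max = 120)

# Explicit bin edges every 5 km/h; plot = FALSE returns the bin data without drawing
h <- hist(speed, breaks = seq(20, 120, by = 5), plot = FALSE)

print(length(h$counts))   # number of bins: 20
print(sum(h$counts))      # every observation falls in some bin: 1000
print(h$mids[1:3])        # bin midpoints: 22.5 27.5 32.5

# Plotting the stored object draws the histogram itself
plot(h, main = "Histogram of Speed", xlab = "Speed (km/h)", col = "lightblue")
```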

Line Graph:

A line graph is a great way to visualize trends in data, especially when there is a continuous
relationship between variables. In R, you can create line graphs using the base plot() function or
with ggplot2 for more customization.

Example: Creating a Line Graph in R

Let’s use the same dataset (speed_distance.csv) and create a line graph to show the trend
between Speed and Distance.
Using Base R:

1. Generate or Read Data:

If you already have the speed_distance.csv file, you can load it. Otherwise, you can
generate new data:

r
Copy code
# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)

# Create a data frame


data <- data.frame(Speed = speed, Distance = distance)

# Save to CSV (if needed)


write.csv(data, file = "speed_distance.csv", row.names = FALSE)

2. Create a Line Graph:

r
Copy code
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Sort the data by Speed to make the line graph more meaningful
data <- data[order(data$Speed), ]

# Create a line plot


plot(data$Speed, data$Distance, type = "l", # 'l' stands for line graph
col = "blue", # Line color
main = "Line Graph of Speed vs Distance",
xlab = "Speed (km/h)",
ylab = "Distance (km)")

Explanation:

• plot(): Base R function to create a scatter plot or line graph.
o data$Speed, data$Distance: Variables to be plotted on the x and y axes.
o type = "l": Specifies that the plot should be a line graph.
o col: Sets the color of the line.
o main, xlab, ylab: Add titles and axis labels.
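
The sorting step matters because plot() with type = "l" connects points in row order, not in x order. A minimal sketch of what order() does, using four made-up rows (the values are assumptions for illustration only):

```r
# Small unsorted example (values are made up for illustration)
df <- data.frame(Speed = c(80, 20, 60, 40), Distance = c(300, 100, 250, 150))

# order() returns the row positions that sort Speed in ascending order
idx <- order(df$Speed)
print(idx)  # 2 4 3 1

# Reordering the rows makes type = "l" draw the line from left to right
sorted <- df[idx, ]
print(sorted)
```

Without the reordering, the line would jump back and forth across the plot.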
Using ggplot2:

1. Load the ggplot2 Package:

r
library(ggplot2)

2. Create a Line Graph:

r
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Sort the data by Speed to make the line graph more meaningful
data <- data[order(data$Speed), ]

# Create the line graph using ggplot
ggplot(data, aes(x = Speed, y = Distance)) +
  geom_line(color = "blue") +
  labs(title = "Line Graph of Speed vs Distance",
       x = "Speed (km/h)",
       y = "Distance (km)") +
  theme_minimal()

Explanation:

• ggplot(data, aes(x = Speed, y = Distance)): Initializes the ggplot object with Speed on
the x-axis and Distance on the y-axis.
• geom_line(color = "blue"): Adds a line graph with a blue line.
• labs(): Adds titles and axis labels.
• theme_minimal(): Applies a minimal theme to the plot for a clean appearance.
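
Because Speed and Distance are generated independently in this example, a line through all 1000 points is mostly noise. One optional way to make a trend readable is to average Distance within Speed bins before plotting. The sketch below is an illustration under assumptions: the 10 km/h bin width is arbitrary and the data are regenerated rather than read from the CSV.

```r
# Regenerate the example data (same ranges as the lab's data set)
set.seed(123)
data <- data.frame(
  Speed = runif(1000, min = 20, max = 120),
  Distance = runif(1000, min = 50, max = 500)
)

# Group Speed into 10 km/h bins (the bin width is an arbitrary choice)
data$SpeedBin <- cut(data$Speed, breaks = seq(20, 120, by = 10))

# Mean Distance within each bin
trend <- aggregate(Distance ~ SpeedBin, data = data, FUN = mean)
print(trend)
```

The resulting ten rows can then be drawn as a line in place of the raw points.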
Multiple line graphs

Creating multiple line graphs on the same plot allows you to compare trends between different
variables. In R, this can be achieved using either base R or ggplot2.

Example: Creating Multiple Line Graphs in R

Let’s assume you want to compare two or more sets of data, such as Speed, Distance, and
perhaps another variable like FuelConsumption. Here’s how to do that.

Using Base R:

1. Generate or Read Data:


First, let’s create the dataset, including an additional variable (FuelConsumption):

r
# Generate random data
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)
fuel_consumption <- runif(1000, min = 5, max = 20)

# Create a data frame
data <- data.frame(Speed = speed, Distance = distance,
                   FuelConsumption = fuel_consumption)

# Save to CSV (if needed)
write.csv(data, file = "speed_distance_fuel.csv", row.names = FALSE)

2. Create Multiple Line Graphs:

r
# Read the CSV file
data <- read.csv("speed_distance_fuel.csv")

# Sort the data by Speed for meaningful plotting
data <- data[order(data$Speed), ]

# Plot the first line (Speed vs Distance)
plot(data$Speed, data$Distance, type = "l", col = "blue",
     ylim = c(min(data$Distance, data$FuelConsumption),
              max(data$Distance, data$FuelConsumption)),
     xlab = "Speed (km/h)", ylab = "Value",
     main = "Multiple Line Graph: Speed vs Distance and Fuel Consumption")

# Add the second line (Speed vs Fuel Consumption)
lines(data$Speed, data$FuelConsumption, type = "l", col = "red")

# Add a legend
legend("topright", legend = c("Distance", "Fuel Consumption"),
       col = c("blue", "red"), lty = 1)

Explanation:

• plot(): Plots the first line (Speed vs Distance) with type "l" for a line graph.
• lines(): Adds another line (Speed vs Fuel Consumption) on the same plot.
• ylim: Sets the y-axis range to ensure both lines fit.
• legend(): Adds a legend to the plot.
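
A shared y-axis only works well when the series have similar scales. Here Distance spans 50 to 500 while FuelConsumption spans 5 to 20, so the second line sits near the bottom of the plot. The sketch below shows how the combined limits are obtained and one possible remedy, min-max rescaling. The three-value vectors and the helper rescale01() are hypothetical, introduced only for illustration.

```r
distance <- c(50, 275, 500)  # made-up values on the Distance scale
fuel <- c(5, 12, 20)         # made-up values on the FuelConsumption scale

# range() over both vectors equals c(min(...), max(...))
y_limits <- range(distance, fuel)
print(y_limits)  # 5 500

# Hypothetical helper: min-max rescaling maps a series onto [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

print(rescale01(distance))  # 0.0 0.5 1.0
print(rescale01(fuel))
```

After rescaling, both series use the full height of the plot, at the cost of losing the original units on the y-axis.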

Using ggplot2:

1. Load the ggplot2 Package:

r
library(ggplot2)

2. Create Multiple Line Graphs:

r
# Read the CSV file
data <- read.csv("speed_distance_fuel.csv")

# Reshape the data for ggplot (long format)
library(reshape2)
data_long <- melt(data, id.vars = "Speed", variable.name = "Variable",
                  value.name = "Value")

# Create the line graph
ggplot(data_long, aes(x = Speed, y = Value, color = Variable)) +
  geom_line() +
  labs(title = "Multiple Line Graph: Speed vs Distance and Fuel Consumption",
       x = "Speed (km/h)",
       y = "Value") +
  theme_minimal() +
  scale_color_manual(values = c("Distance" = "blue", "FuelConsumption" = "red"))

Explanation:

• melt(): Converts the data from wide to long format, so that each row holds one Speed
value, the name of the measured variable (Distance or FuelConsumption), and its value.
ggplot needs this layout to map the variable name to a color.
• geom_line(): Adds line graphs to the plot.
• aes(color = Variable): Uses different colors for Distance and FuelConsumption.
• scale_color_manual(): Manually sets the colors for the variables.
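
The wide-to-long reshaping can also be sketched in base R, which helps to see what the long format looks like. This is an illustration using stack() on three made-up rows, not a replacement for the melt() call in the lab; the column names Variable and Value are chosen to match the example above.

```r
# Small wide data frame (made-up values)
wide <- data.frame(
  Speed = c(20, 60, 100),
  Distance = c(100, 250, 400),
  FuelConsumption = c(6, 9, 15)
)

# stack() puts the selected columns into one value column plus a label column
stacked <- stack(wide[c("Distance", "FuelConsumption")])

# Rebuild the long layout: Speed is repeated once per stacked variable
long <- data.frame(
  Speed = rep(wide$Speed, times = 2),
  Variable = stacked$ind,
  Value = stacked$values
)
print(long)
```

Three wide rows with two measured columns become six long rows, one per Speed and variable combination.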
Scatter plot

A scatter plot is used to visualize the relationship between two continuous variables by plotting
points where each point represents an observation. In R, scatter plots can be created using either
base R or ggplot2.

Example: Creating a Scatter Plot in R

Let’s use the speed_distance.csv dataset and create a scatter plot of Speed and Distance.

Using Base R:

1. Generate or Read Data:


If you already have the speed_distance.csv file, you can load it. Otherwise, you can
generate new data:

r
# Generate random data if not already done
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)

# Create a data frame
data <- data.frame(Speed = speed, Distance = distance)

# Save to CSV (if needed)
write.csv(data, file = "speed_distance.csv", row.names = FALSE)

2. Create a Scatter Plot:

r
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Create the scatter plot
plot(data$Speed, data$Distance,
     main = "Scatter Plot of Speed vs Distance",
     xlab = "Speed (km/h)", ylab = "Distance (km)",
     col = "blue", pch = 19)

Explanation:

• plot(): The base R function for creating scatter plots.
o data$Speed, data$Distance: Variables plotted on the x and y axes.
o main: Title of the plot.
o xlab, ylab: Labels for the x and y axes.
o col: Color of the points.
o pch: Shape of the points (19 is a filled circle).
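
A scatter plot shows a relationship visually; cor() puts a number on it. The sketch below regenerates the example data (an assumption, so that it runs without the CSV) and compares its correlation with that of a perfectly linear pair.

```r
set.seed(123)
speed <- runif(1000, min = 20, max = 120)
distance <- runif(1000, min = 50, max = 500)

# Pearson correlation ranges from -1 (perfect negative) to 1 (perfect positive)
r <- cor(speed, distance)
print(r)

# A perfectly linear relationship, for comparison
r_linear <- cor(speed, 2 * speed + 10)
print(r_linear)
```

Since the two variables are drawn independently, the first value should be close to zero, which matches the shapeless cloud of points in the scatter plot.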

Using ggplot2:

1. Load the ggplot2 Package:

r
library(ggplot2)

2. Create a Scatter Plot:

r
# Read the CSV file
data <- read.csv("speed_distance.csv")

# Create the scatter plot using ggplot
ggplot(data, aes(x = Speed, y = Distance)) +
  geom_point(color = "blue") +
  labs(title = "Scatter Plot of Speed vs Distance",
       x = "Speed (km/h)",
       y = "Distance (km)") +
  theme_minimal()

Explanation:

• ggplot(data, aes(x = Speed, y = Distance)): Initializes the ggplot object with Speed and
Distance as the x and y axes.
• geom_point(): Adds the points (scatter plot).
• labs(): Adds the title and axis labels.
• theme_minimal(): Applies a minimal theme for a clean look.
VII. Implement different data structures in R (Vectors, Lists, Data Frames)

In R, three commonly used data structures are vectors, lists, and data frames. Each has different
characteristics and use cases. Here is how you can implement each of them:

1. Vectors:

A vector is a simple data structure that contains elements of the same type (numeric, character,
logical, etc.).

Example:
r
# Numeric vector
num_vector <- c(1, 2, 3, 4, 5)

# Character vector
char_vector <- c("apple", "banana", "cherry")

# Logical vector
log_vector <- c(TRUE, FALSE, TRUE)

# Print the vectors
print(num_vector)
print(char_vector)
print(log_vector)
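
The "same type" rule has a practical consequence: mixing types in c() silently converts every element to one common type. The short sketch below illustrates this along with element-wise arithmetic and indexing; the variable names mixed and doubled are introduced here for illustration.

```r
# Mixing types in c() coerces everything to one common type
mixed <- c(1, "apple", TRUE)
print(class(mixed))   # "character"

# Arithmetic is applied element by element
num_vector <- c(1, 2, 3, 4, 5)
doubled <- num_vector * 2
print(doubled)        # 2 4 6 8 10

# Indexing starts at 1; a logical condition selects matching elements
print(num_vector[2])               # 2
print(num_vector[num_vector > 3])  # 4 5
```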

2. Lists:

A list is a more flexible data structure in R. It can contain elements of different types (vectors,
matrices, even other lists).

Example:
r
# Create a list with different types of elements
my_list <- list(name = "John", age = 30, scores = c(85, 90, 95))

# Access elements of the list
print(my_list$name)   # Access by name
print(my_list[[2]])   # Access by index
print(my_list$scores) # Access the vector inside the list
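
A common source of confusion with lists is the difference between single and double brackets. The sketch below reuses the same list and also shows how elements can be added or changed; the city field and its value are hypothetical additions for illustration.

```r
my_list <- list(name = "John", age = 30, scores = c(85, 90, 95))

# Single brackets return a sub-list; double brackets return the element itself
print(class(my_list["age"]))    # "list"
print(class(my_list[["age"]]))  # "numeric"

# Elements can be added or changed by name
my_list$city <- "Pune"          # hypothetical new field
my_list$age <- my_list$age + 1

print(names(my_list))
print(mean(my_list$scores))     # 90
```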

3. Data Frames:

A data frame is a table-like structure, similar to a spreadsheet or SQL table. Different columns
can hold different types of data (e.g., numeric, character), but all values within a single column
must be of the same type.

Example:
r
# Create a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85, 90, 88)
)

# View the data frame
print(df)

# Access a specific column
print(df$Name)

# Access a specific row (the second row)
print(df[2, ])

# Access a specific value (second row, third column)
print(df[2, 3])
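
Beyond reading values, typical data frame work involves adding columns and filtering rows. The sketch below continues with the same three rows; the Passed column and the cut-off of 87 are assumptions made up for this illustration.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(85, 90, 88)
)

# Each column has a single type
print(sapply(df, class))

# Add a derived column (the 87 cut-off is an arbitrary example)
df$Passed <- df$Score > 87

# Keep only the rows that satisfy a condition
older <- df[df$Age >= 30, ]
print(older)

# Column-wise summary
print(mean(df$Score))
```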

VIII. Write an R program to read a CSV file and analyze the data in the file using EDA
(Exploratory Data Analysis) techniques.

Here’s a basic R program that reads a CSV file and performs Exploratory Data Analysis (EDA)
on the data. The program includes reading the file, summarizing data, visualizing distributions,
and identifying correlations.

r
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(readr)
library(tidyr)
library(corrplot)
library(psych)

# 1. Read CSV file
data <- read_csv("yourfile.csv")

# 2. Get an overview of the data

# Display the first few rows of the data
head(data)

# Display the structure of the data (column types, etc.)
str(data)

# Display summary statistics (mean, median, min, max, etc.)
summary(data)

# Check for missing values
missing_values <- colSums(is.na(data))
print(missing_values)

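# 3. Analyze categorical variables
# read_csv() imports text columns as character, not factor, so accept both types
cat_vars <- data %>% select_if(function(col) is.character(col) || is.factor(col))

# Frequency distribution of categorical variables
for (var in colnames(cat_vars)) {
  cat("Frequency distribution of", var, ":\n")
  print(table(data[[var]]))
}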

# 4. Analyze numerical variables
num_vars <- data %>% select_if(is.numeric)

# Summary statistics for numerical variables
describe(num_vars)

# Histograms for each numerical variable
# Inside a loop, a ggplot object must be stored and printed explicitly
for (var in colnames(num_vars)) {
  p <- ggplot(data, aes(x = .data[[var]])) +
    # bins = 30 adapts to each variable's range
    geom_histogram(bins = 30, fill = "blue", alpha = 0.7) +
    labs(title = paste("Distribution of", var), x = var) +
    theme_minimal()
  print(p)
}

# 5. Analyze correlations between numerical variables
corr_matrix <- cor(num_vars, use = "complete.obs")
corrplot(corr_matrix, method = "circle")

# 6. Boxplots to detect outliers
# Inside a loop, a ggplot object must be stored and printed explicitly
for (var in colnames(num_vars)) {
  p <- ggplot(data, aes(y = .data[[var]])) +
    geom_boxplot(fill = "orange", alpha = 0.6) +
    labs(title = paste("Boxplot of", var), y = var) +
    theme_minimal()
  print(p)
}

# 7. Scatterplot matrix for numerical variables
pairs(num_vars)

# 8. Conclusion of the analysis

# Print correlation matrix for numeric variables
print(corr_matrix)

# Print the overall summary
print(summary(data))

Key Sections of the Code:

1. Reading the CSV: The file is read using read_csv() from the readr library.
2. Overview: head(), str(), and summary() provide an initial understanding of the data.
3. Missing Values: The is.na() function helps identify missing values in each column.
4. Categorical Analysis: Frequency distributions are printed for categorical variables.
5. Numerical Analysis: Histograms and summary statistics (describe() from psych
package) give an overview of the distribution.
6. Correlations: The corrplot library visualizes the correlation matrix between numerical
variables.
7. Outlier Detection: Boxplots help in detecting outliers for each numerical variable.
8. Scatterplot Matrix: pairs() creates a matrix of scatterplots for pairwise relationships
between numerical variables.
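
Two of the checks above, missing values and outliers, can be tried on a tiny data set where the answer is easy to verify by eye. The sketch below uses made-up numbers (an assumption for illustration) and base R's boxplot.stats(), which applies the 1.5 × IQR rule used by base R boxplots without drawing anything.

```r
# Made-up data with one missing value and one extreme value
df <- data.frame(
  x = c(10, 12, 11, 13, 12, 95),
  y = c(1.5, NA, 2.0, 1.8, 2.2, 1.9)
)

# Missing values per column
missing_values <- colSums(is.na(df))
print(missing_values)

# Values beyond 1.5 * IQR from the hinges are reported as outliers
outliers <- boxplot.stats(df$x)$out
print(outliers)  # 95
```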

This script provides a solid starting point for exploring a dataset. You can modify it based on the
specific variables and data structure you're working with.
IX. Write an R program to illustrate Linear Regression and Multiple Linear Regression
using a suitable CSV file.

Here's an R program that illustrates Simple Linear Regression and Multiple Linear
Regression using a CSV file. The program reads the data from a CSV file, fits both linear
models, and evaluates their performance using the mean squared error (MSE) and a residual
plot.

Sample CSV Data

The CSV file could have columns like:


advertising.csv:
TV,Radio,Newspaper,Sales
230.1,37.8,69.2,22.1
44.5,39.3,45.1,10.4
17.2,45.9,69.3,9.3
151.5,41.3,58.5,18.5

In this example, we are trying to predict Sales based on the features: TV, Radio, and Newspaper
advertising budgets.
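
If no such file is at hand, the four sample rows can be written to disk to check that loading works. The sketch below writes them to a temporary directory (the path is illustrative) and reads them back with base R's read.csv(). Note that four rows are only enough to test the file handling: a model with three predictors and an intercept would fit four observations perfectly, so a real analysis needs many more rows.

```r
# Write the four sample rows to a temporary file (the path is illustrative)
csv_path <- file.path(tempdir(), "advertising.csv")
writeLines(c(
  "TV,Radio,Newspaper,Sales",
  "230.1,37.8,69.2,22.1",
  "44.5,39.3,45.1,10.4",
  "17.2,45.9,69.3,9.3",
  "151.5,41.3,58.5,18.5"
), csv_path)

# Read it back and confirm the column names and types
ads <- read.csv(csv_path)
str(ads)
```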

R Program

r
# Load required libraries
library(ggplot2)
library(readr)
library(car) # Provides vif() for the multicollinearity check

# 1. Read the CSV file
data <- read_csv("advertising.csv")

# 2. Get an overview of the data
str(data)
summary(data)

# 3. Simple Linear Regression (using TV as predictor)

# Fit the model
simple_model <- lm(Sales ~ TV, data = data)

# Display model summary
summary(simple_model)

# Plot the regression line with the data points
ggplot(data, aes(x = TV, y = Sales)) +
  geom_point(color = 'blue', alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = 'red') +
  labs(title = "Simple Linear Regression: Sales vs TV",
       x = "TV Advertising Budget",
       y = "Sales") +
  theme_minimal()

# Predict Sales using the model
simple_predictions <- predict(simple_model, newdata = data)

# Evaluate the model performance
simple_mse <- mean((data$Sales - simple_predictions)^2)
cat("Simple Linear Regression MSE: ", simple_mse, "\n")

# 4. Multiple Linear Regression (using TV, Radio, and Newspaper as predictors)

# Fit the multiple regression model
multi_model <- lm(Sales ~ TV + Radio + Newspaper, data = data)

# Display model summary
summary(multi_model)

# Predict Sales using the model
multi_predictions <- predict(multi_model, newdata = data)

# Evaluate the multiple regression model
multi_mse <- mean((data$Sales - multi_predictions)^2)
cat("Multiple Linear Regression MSE: ", multi_mse, "\n")

# 5. Visualizing the residuals for the multiple regression model
# Collect predictions and residuals in their own data frame for plotting
residual_data <- data.frame(
  Predicted = multi_predictions,
  Residual = data$Sales - multi_predictions
)

ggplot(residual_data, aes(x = Predicted, y = Residual)) +
  geom_point(color = 'blue', alpha = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed", color = 'red') +
  labs(title = "Residual Plot for Multiple Linear Regression",
       x = "Predicted Sales",
       y = "Residuals") +
  theme_minimal()

# 6. Checking for multicollinearity (optional)
car::vif(multi_model) # Variance Inflation Factor, from the car package

Key Sections of the Code:

1. Reading the CSV: The read_csv() function is used to load the data from a CSV file. You
can replace "advertising.csv" with your actual file path.
2. Simple Linear Regression:
o The linear regression model is created with the formula Sales ~ TV.
o The lm() function fits the linear model.
o The summary() function gives an overview of the model's performance, including
coefficients and p-values.
o A scatterplot of TV vs. Sales with a regression line is created using ggplot2.
3. Multiple Linear Regression:
o A multiple linear regression model is created with the formula Sales ~ TV +
Radio + Newspaper.
o The model summary shows the coefficients and overall fit statistics.
o Predictions are made using the predict() function.
o The Mean Squared Error (MSE) is calculated to evaluate the model’s
performance.
4. Model Performance Evaluation:
o The residual plot helps visualize how well the model fits the data.
o vif() (Variance Inflation Factor), provided by the car package, can be used to check
for multicollinearity in the multiple regression model.
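
The quantities reported by lm() and vif() can be reproduced by hand, which clarifies what they measure. The sketch below is an illustration on simulated data, not on the lab's CSV: the seed, sample size, and coefficients are all assumptions. It computes the simple-regression slope as cov(x, y) / var(x), the MSE as the mean squared residual, and the VIF of one predictor as 1 / (1 - R²), where R² comes from regressing that predictor on the others.

```r
set.seed(42)
n <- 200

# Simulated budgets and sales (assumed data, not the lab's CSV)
tv <- runif(n, 0, 300)
radio <- runif(n, 0, 50)
newspaper <- 0.5 * radio + runif(n, 0, 50)  # deliberately related to radio
sales <- 3 + 0.05 * tv + 0.2 * radio + rnorm(n)

# Simple regression by hand: slope = cov(x, y) / var(x)
slope <- cov(tv, sales) / var(tv)
intercept <- mean(sales) - slope * mean(tv)
fit <- lm(sales ~ tv)
print(c(intercept, slope))
print(coef(fit))

# MSE is the mean of the squared residuals
mse <- mean(residuals(fit)^2)
print(mse)

# VIF by hand: regress one predictor on the others, then 1 / (1 - R^2)
r2_radio <- summary(lm(radio ~ tv + newspaper))$r.squared
vif_radio <- 1 / (1 - r2_radio)
print(vif_radio)
```

A VIF of 1 means the predictor is unrelated to the others; larger values indicate that its coefficient is estimated less precisely because of overlap with other predictors.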

The following is a more defensive variant of the EDA program from Section VIII. It loads the
CSV, summarizes the data, detects missing values, and generates basic visualizations, and it
skips the categorical, correlation, and scatterplot steps when the data has no suitable columns.

R Program for EDA

r
# Load necessary libraries
library(ggplot2)
library(readr)
library(dplyr)
library(tidyr)
library(corrplot)
library(psych)

# 1. Read CSV file
data <- read_csv("yourfile.csv")

# 2. Get an overview of the data

# Display the first few rows
print(head(data))

# Display structure (data types)
str(data)

# Display summary statistics (mean, median, min, max, etc.)
print(summary(data))

# Check for missing values
missing_values <- colSums(is.na(data))
print("Missing values in each column:")
print(missing_values)

# 3. Analyze categorical variables
cat_vars <- data %>% select_if(is.character)

# Frequency distribution of categorical variables
if (ncol(cat_vars) > 0) {
  print("Categorical Variable Distribution:")
  for (var in colnames(cat_vars)) {
    print(paste("Distribution of", var))
    print(table(data[[var]]))
  }
}

# 4. Analyze numerical variables
num_vars <- data %>% select_if(is.numeric)

# Summary statistics for numerical variables
print("Numerical Variables Summary:")
print(describe(num_vars))

# Histograms for numerical variables
# .data[[var]] looks the column up by name (aes_string() is deprecated in ggplot2)
print("Histograms for numerical variables:")
for (var in colnames(num_vars)) {
  p <- ggplot(data, aes(x = .data[[var]])) +
    # bins = 30 adapts to each variable's range
    geom_histogram(bins = 30, fill = "blue", alpha = 0.7) +
    labs(title = paste("Distribution of", var), x = var, y = "Count") +
    theme_minimal()
  print(p)
}

# 5. Correlation analysis for numerical variables
if (ncol(num_vars) > 1) {
  corr_matrix <- cor(num_vars, use = "complete.obs")
  print("Correlation Matrix:")
  print(corr_matrix)
  corrplot(corr_matrix, method = "circle")
}

# 6. Boxplots to detect outliers for numerical variables
# .data[[var]] looks the column up by name (aes_string() is deprecated in ggplot2)
print("Boxplots for numerical variables:")
for (var in colnames(num_vars)) {
  p <- ggplot(data, aes(y = .data[[var]])) +
    geom_boxplot(fill = "orange", alpha = 0.6) +
    labs(title = paste("Boxplot of", var), y = var) +
    theme_minimal()
  print(p)
}

# 7. Pairwise scatterplots for numerical variables
if (ncol(num_vars) > 1) {
  print("Pairwise scatterplots for numerical variables:")
  pairs(num_vars)
}

# 8. Conclusion of the analysis
print("Exploratory Data Analysis Completed")

Key Sections of the Program:

1. Reading the CSV: The file is read using read_csv(). Replace "yourfile.csv" with your
actual file path.
2. Data Overview: head(), str(), and summary() provide an initial understanding of the data.
3. Missing Values: is.na() helps identify missing values.
4. Categorical Variables: Categorical columns are analyzed using table(), which shows
frequency counts.
5. Numerical Variables: The describe() function from the psych package provides
summary statistics. Histograms and boxplots give insights into the distribution and
potential outliers.
6. Correlation Analysis: For numerical variables, a correlation matrix is generated and
visualized using corrplot().
7. Pairwise Scatterplots: For all numeric columns, scatterplots visualize the relationships
between pairs of variables.
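
The split into categorical and numerical columns, and the guard that skips the correlation step when fewer than two numeric columns exist, can be sketched in base R on a small mixed-type data frame. The city, temp, and humid columns and their values are made up for illustration.

```r
# Made-up mixed-type data
df <- data.frame(
  city = c("Pune", "Delhi", "Pune", "Mumbai"),
  temp = c(31.5, 35.2, 30.1, 29.8),
  humid = c(60, 40, 65, 80),
  stringsAsFactors = FALSE
)

# Split columns by type, as the program does with select_if()
is_num <- sapply(df, is.numeric)
num_vars <- df[, is_num, drop = FALSE]
cat_vars <- df[, !is_num, drop = FALSE]

# Frequency table for the categorical column
city_counts <- table(cat_vars$city)
print(city_counts)

# A correlation matrix needs at least two numeric columns
if (ncol(num_vars) > 1) {
  print(cor(num_vars, use = "complete.obs"))
}
```

The drop = FALSE argument keeps the result a data frame even when only one column is selected, so ncol() and colnames() continue to work in the later steps.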

Example Visualizations:

• Histograms: Show the distribution of each numeric variable.
• Correlation Plot: Displays correlations between numeric variables.
• Boxplots: Identify potential outliers for each numeric variable.
• Pairwise Scatterplots: Display relationships between numeric variables.

This program serves as a template for performing EDA on any dataset. You can adjust it based
on the structure of your data.