Credit goes to www.scribd.com


BDS306C - Imp Questions & Answers - Module 2-2

This document covers key concepts in R programming, including factors and strings, matrix operations, date differences, lists, data frames, and arrays. It provides examples of how to create and manipulate these data structures, as well as practical applications in data analysis. Additionally, it includes a program for reading and summarizing CSV files.

Uploaded by truanimea351
Copyright © All Rights Reserved

MODULE - 2

1. Illustrate the concept of factors and strings in R.

1. Factors in R

A factor is used to represent categorical data. Internally, a factor stores each value as an
integer code together with a set of text labels (its levels), and R treats factors specially in
statistical modeling of categorical variables (e.g., gender, country). Factors are useful
when the data has a fixed number of unique values (categories or levels).
Key Features of Factors:

● Factors can be ordered or unordered.
● They are used in statistical modeling because categorical variables are often
involved in models like regression or classification.
● Factors store levels (unique categories) and assign a level to each
observation.
Example of Factor Creation:
# Creating a factor for Gender
gender <- factor(c("Male", "Female", "Female", "Male", "Female"))

# Output the factor
print(gender)
# [1] Male   Female Female Male   Female
# Levels: Female Male
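The integer codes and levels described above can be inspected directly. This is a small sketch using only base R functions (levels(), nlevels(), as.integer(), table()); the values in the comments assume R's default alphabetical ordering of levels.

```r
# Recreate the factor from the example above
gender <- factor(c("Male", "Female", "Female", "Male", "Female"))

# Levels are sorted alphabetically by default
levels(gender)       # "Female" "Male"
nlevels(gender)      # 2

# Each observation is stored as an integer code that points into levels(gender)
as.integer(gender)   # 2 1 1 2 1

# Count how many observations fall into each level
table(gender)        # Female: 3, Male: 2
```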

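Modifying Levels of a Factor:
# Changing levels of the factor: the new labels replace the old levels in order
# (the levels are Female, Male, so Female -> F and Male -> M)
levels(gender) <- c("F", "M")
print(gender)
# [1] M F F M F
# Levels: F M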

Ordered Factor:

You can create ordered factors for categories that have a natural order (e.g., Low,
Medium, High).
# Creating an ordered factor for levels of education
education <- factor(c("High School", "College", "Masters", "PhD", "High School"),
                    levels = c("High School", "College", "Masters", "PhD"),
                    ordered = TRUE)

print(education)
# Levels: High School < College < Masters < PhD
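Declaring ordered = TRUE is what makes comparisons and min()/max() meaningful. The sketch below reuses the education example above; on an unordered factor the same >= comparison would return NA with a warning instead.

```r
education <- factor(c("High School", "College", "Masters", "PhD", "High School"),
                    levels = c("High School", "College", "Masters", "PhD"),
                    ordered = TRUE)

# Comparisons follow the declared level order
education >= "Masters"   # FALSE FALSE TRUE TRUE FALSE

# min() and max() are defined for ordered factors
max(education)           # PhD
min(education)           # High School

# Sorting respects the level order, not alphabetical order
sort(education)          # High School, High School, College, Masters, PhD
```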

Factor Benefits:

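● Efficient storage of categorical data: each category label is stored once as a level,
and each observation holds only an integer code.
● Factors allow for easy aggregation and summarization in data analysis.
● Useful for regression models, where categorical variables need to be treated
differently from continuous ones (model functions such as lm() expand a factor
into indicator variables).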
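As a sketch of the aggregation benefit, a factor can be used directly as the grouping variable in table() and tapply(). The ages below are made-up values chosen only for illustration.

```r
# Hypothetical data: one age per respondent, grouped by a gender factor
gender <- factor(c("Male", "Female", "Female", "Male", "Female"))
age    <- c(34, 28, 40, 52, 31)

# Number of observations per category
table(gender)                 # Female 3, Male 2

# Mean age within each category (one result per level)
tapply(age, gender, mean)     # Female 33, Male 43
```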

2. Strings in R

Strings are sequences of characters and are used to store text data in R. Unlike
factors, strings do not have levels or categories. They are treated as simple text and
are often used for names, labels, or general text processing tasks.
Key Features of Strings:

● Strings can be concatenated, manipulated, and processed using various string
functions.
● Strings are used when you need text data without any categorization.
Example of Creating Strings:
# Creating a string
name <- "John Doe"
print(name)

Common String Functions in R:

1. Concatenating Strings: The paste() and paste0() functions are used to
concatenate (combine) strings.
# Concatenating two strings with a space
full_name <- paste("John", "Doe")
print(full_name) # Output: "John Doe"

# Concatenating without a space (stored separately so full_name keeps "John Doe")
joined_name <- paste0("John", "Doe")
print(joined_name) # Output: "JohnDoe"

2. Changing Case: Convert strings to uppercase or lowercase using the toupper()
and tolower() functions.
# Convert to upper case
upper_name <- toupper(full_name)
print(upper_name) # Output: "JOHN DOE"

# Convert to lower case
lower_name <- tolower(full_name)
print(lower_name) # Output: "john doe"

3. Extracting Substrings: You can extract parts of a string using substring() or
substr().

# Extracting a substring (characters 1 to 4) from a string
substr(full_name, 1, 4) # Output: "John"
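A few more base R string functions round out the list above. This sketch shows nchar(), the differences between substring() and substr() (an optional end position and vectorised positions), strsplit(), and sprintf() for formatted text.

```r
full_name <- "John Doe"

nchar(full_name)                # 8 (the space counts as a character)
substring(full_name, 6)         # "Doe": substring() lets you omit the end position
substring(full_name, 1:4, 1:4)  # "J" "o" "h" "n": positions are vectorised
strsplit(full_name, " ")[[1]]   # "John" "Doe"
sprintf("Name: %s (%d characters)", full_name, nchar(full_name))
# "Name: John Doe (8 characters)"
```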

String Manipulation for Data Analysis:

● String functions can be very useful in cleaning or formatting data, such as
manipulating column names or parsing text data.
● Useful in tasks such as text mining, where string processing is essential.
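Cleaning column names is a typical use of these functions. The sketch below assumes some hypothetical messy names (the kind that might come from a spreadsheet header) and normalises them with trimws(), tolower() and gsub(); the underscore convention is just one common choice.

```r
# Hypothetical messy column names
raw_names <- c(" Student Name ", "Total Marks", "AGE (years)")

clean_names <- tolower(trimws(raw_names))              # trim spaces, lower-case
clean_names <- gsub("[^a-z0-9]+", "_", clean_names)    # runs of other characters -> "_"
clean_names <- gsub("^_|_$", "", clean_names)          # drop leading/trailing "_"

print(clean_names)  # "student_name" "total_marks" "age_years"
```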
Practical Applications in Data Analysis

● Factors: When working with survey data, for example, gender, education
level, or country of residence are often represented as factors since they are
categorical.
● Strings: Strings are used in data cleaning tasks, such as cleaning column
names, parsing text files, or creating descriptive labels for data.
Example: Factor and String Usage in Data Analysis
# Data frame with factors and strings
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),           # String (character)
  Gender = factor(c("Female", "Male", "Male")),  # Factor
  Age = c(25, 30, 22)                            # Numeric
)

# Output the data frame
print(data)

# Summarizing the Gender factor: counts per level (Female 1, Male 2)
summary(data$Gender)
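To confirm which columns are strings and which are factors, the class of every column can be checked. This sketch assumes R 4.0.0 or later, where data.frame() no longer converts character columns to factors by default (stringsAsFactors = FALSE).

```r
data <- data.frame(
  Name   = c("Alice", "Bob", "Charlie"),
  Gender = factor(c("Female", "Male", "Male")),
  Age    = c(25, 30, 22)
)

# Class of every column: Name stays character, Gender is a factor
sapply(data, class)   # Name "character", Gender "factor", Age "numeric"

# summary() counts levels for a factor column
summary(data$Gender)  # Female 1, Male 2
```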
2. Create two 3x3 matrices A and B in R and perform various operations.
# Creating two 3x3 matrices A and B
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
B <- matrix(c(9, 8, 7, 6, 5, 4, 3, 2, 1), nrow = 3, byrow = TRUE)

# Display matrices
cat("Matrix A:\n")
print(A)

cat("\nMatrix B:\n")
print(B)

# i) Transpose of Matrices
transpose_A <- t(A)
transpose_B <- t(B)

cat("\nTranspose of Matrix A:\n")
print(transpose_A)
cat("\nTranspose of Matrix B:\n")
print(transpose_B)

# ii) Matrix Addition (element by element)
matrix_addition <- A + B
cat("\nMatrix Addition (A + B):\n")
print(matrix_addition)

# iii) Matrix Subtraction (element by element)
matrix_subtraction <- A - B
cat("\nMatrix Subtraction (A - B):\n")
print(matrix_subtraction)

# iv) Matrix Multiplication (%*% is the linear-algebra product; A * B would be element-wise)
matrix_multiplication <- A %*% B
cat("\nMatrix Multiplication (A %*% B):\n")
print(matrix_multiplication)
Explanation of the Code:

1. Matrix Creation:
○ A and B are two 3 × 3 matrices created using the matrix() function.
○ The nrow = 3 argument specifies that the matrix should have 3
rows, and byrow = TRUE ensures that elements are filled row-wise.
2. Matrix Transposition:
○ The t() function is used to compute the transpose of matrices A and
B.
3. Matrix Addition:
○ Matrix addition is done by using the + operator between two matrices,
which adds corresponding elements.
4. Matrix Subtraction:
○ Matrix subtraction is performed using the - operator between two
matrices, which subtracts corresponding elements.
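5. Matrix Multiplication:
○ The matrix multiplication is done using the %*% operator, which
performs matrix multiplication according to the rules of linear algebra (each
entry is the sum of products of a row of A with a column of B). The *
operator, by contrast, multiplies corresponding elements.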
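The difference between * and %*% is easy to check with the same A and B. This sketch prints both results, verifies one entry of the matrix product by hand, and shows that the product is not commutative.

```r
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
B <- matrix(c(9, 8, 7, 6, 5, 4, 3, 2, 1), nrow = 3, byrow = TRUE)

elementwise <- A * B     # multiplies matching cells: 9 16 21 / 24 25 24 / 21 16 9
product     <- A %*% B   # row-by-column product: 30 24 18 / 84 69 54 / 138 114 90

print(elementwise)
print(product)

# Entry [1, 1] of the product is row 1 of A times column 1 of B, summed
sum(A[1, ] * B[, 1])     # 30

# Matrix multiplication is not commutative
identical(A %*% B, B %*% A)   # FALSE
```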

3. Develop an R program to calculate the difference between two dates and
determine the number of days, weeks, and months between them. Use the
appropriate functions to handle date conversions and arithmetic.

# lubridate provides interval() and months(); install once with install.packages("lubridate")
library(lubridate)

# Function to calculate the difference between two dates
calculate_date_difference <- function(date1, date2) {
  # Convert the input strings to Date objects
  date1 <- as.Date(date1, format = "%Y-%m-%d")
  date2 <- as.Date(date2, format = "%Y-%m-%d")

  # Calculate the difference in days
  difference_in_days <- as.numeric(difftime(date2, date1, units = "days"))

  # Calculate the difference in weeks (7 days per week)
  difference_in_weeks <- difference_in_days / 7

  # Calculate the difference in months: divide the interval by a one-month period
  difference_in_months <- interval(date1, date2) / months(1)

  # Print the results
  cat("Difference in Days:", difference_in_days, "\n")
  cat("Difference in Weeks:", difference_in_weeks, "\n")
  cat("Difference in Months:", round(difference_in_months, 2), "\n")
}

# Example usage of the function
date1 <- "2023-01-01"
date2 <- "2024-01-01"

# Call the function to calculate the differences
calculate_date_difference(date1, date2)
# Difference in Days: 365
# Difference in Weeks: 52.14286
# Difference in Months: 12

Explanation:

1. Date Conversion:
○ The dates are input as strings and then converted to Date objects
using the as.Date() function.
○ The format "%Y-%m-%d" is specified to match the input format (e.g.,
"2023-01-01").
2. Difference Calculation:
○ difftime() is used to calculate the difference between the two
dates in terms of days.
○ The difference in weeks is calculated by dividing the number of days
by 7.
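○ The lubridate package's interval() function creates a time span between
the two dates; dividing it by a one-month period, months(1), gives the
difference in months, and the result is rounded to two decimal places for
clarity. The package must be installed beforehand.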
3. Output:
○ The function prints the difference in days, weeks, and months.
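If lubridate is not available, base R can cover the same calculations. In this sketch, whole_months_between() is a hypothetical helper, and it assumes one particular convention: a month is counted only once the day of the month has been reached. It handles a single pair of dates (the if() is not vectorised).

```r
# Hypothetical helper: whole calendar months between two dates (base R only)
whole_months_between <- function(start, end) {
  s <- as.POSIXlt(as.Date(start))
  e <- as.POSIXlt(as.Date(end))
  months <- (e$year - s$year) * 12 + (e$mon - s$mon)
  # Assumed convention: do not count the last month until its day is reached
  if (e$mday < s$mday) months <- months - 1
  months
}

d1 <- as.Date("2023-01-01")
d2 <- as.Date("2024-01-01")

as.numeric(d2 - d1)                              # 365 days
as.numeric(difftime(d2, d1, units = "weeks"))    # 365 / 7, about 52.14 weeks
whole_months_between(d1, d2)                     # 12
whole_months_between("2023-01-31", "2023-03-15") # 1
```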

4. Explain lists and data frames in R with examples.

Lists in R

A list in R is a one-dimensional data structure that can hold multiple types of data
(e.g., numbers, characters, other lists, and even functions). Lists are highly flexible
because each element can be of a different type, unlike atomic vectors, which are
restricted to storing elements of a single type.
Key Features of Lists:

1. Heterogeneous Data: Lists can hold various data types such as numbers, strings,
vectors, matrices, other lists, or even functions. This flexibility makes lists
suitable for tasks that require combining different data types.
2. Indexing by Name or Position: List elements can be accessed using their
position or by names (if assigned).
3. Nested Structures: Lists can contain other lists as elements, making them
useful for hierarchical or nested data structures.

Creation and Accessing Elements:

You can create a list in R using the list() function. Here's how you can create
and access a list:

# Create a list with different data types
my_list <- list(
  Name = "Alice",
  Age = 25,
  Scores = c(90, 85, 88),
  Contact = list(phone = "123-456", email = "[email protected]")
)

# Accessing elements by name
print(my_list$Name)   # Outputs: "Alice"
print(my_list$Scores) # Outputs: 90, 85, 88

# Accessing elements by index
print(my_list[[2]]) # Outputs: 25

# Accessing elements from a nested list
print(my_list$Contact$phone) # Outputs: "123-456"
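Two points the access examples above rely on are worth making explicit: single brackets return a sub-list while double brackets return the element itself, and elements can be added or removed after creation. This sketch uses a trimmed copy of my_list; the Passed field is made up for illustration.

```r
my_list <- list(Name = "Alice", Age = 25, Scores = c(90, 85, 88))

# Single brackets return a smaller list; double brackets return the element itself
class(my_list[2])      # "list"
class(my_list[[2]])    # "numeric"

# Add an element by assigning to a new name (Passed is a hypothetical field)
my_list$Passed <- TRUE

# Remove an element by assigning NULL
my_list$Age <- NULL

names(my_list)         # "Name" "Scores" "Passed"
length(my_list)        # 3

# Apply a function to every element and collect the results
sapply(my_list, length)  # Name 1, Scores 3, Passed 1
```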

Use Cases:

● Storing Heterogeneous Data: Lists are used when you need to store diverse
elements like model results, vectors, and even functions in one container.
● Storing Function Results: In machine learning, lists are commonly used to
store model outcomes (e.g., coefficients, residuals, and diagnostics).
● Nested Data Structures: When dealing with hierarchical data such as
JSON, you can use lists to represent different levels of nesting.

# Create a simple list for storing student details
student <- list(
  name = "Alice",
  age = 18,
  subjects = c("Math", "English", "Science")
)

# Access student's name
print(student$name) # Outputs: "Alice"

# Access student's age
print(student$age) # Outputs: 18

# Access student's subjects
print(student$subjects) # Outputs: "Math", "English", "Science"
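The "storing function results" use case can be sketched in two ways: a function of your own that bundles several results into a named list (describe_scores() below is a hypothetical helper), and a fitted model from lm(), which is itself a list. The lm() call uses the built-in mtcars dataset purely as an example.

```r
# Hypothetical helper: bundles several results into one named list
describe_scores <- function(scores) {
  list(
    n     = length(scores),
    mean  = mean(scores),
    range = range(scores)
  )
}

result <- describe_scores(c(90, 85, 88))
result$n       # 3
result$mean    # 87.66667
result$range   # 85 90

# Model objects are lists too: coefficients, residuals, etc. are named elements
fit <- lm(mpg ~ wt, data = mtcars)
is.list(fit)           # TRUE
names(fit)[1:2]        # "coefficients" "residuals"
```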
Data Frames in R

A data frame is a 2-dimensional table-like structure where each column can store
data of a different type, but all values in a column must be of the same type. A data
frame is similar to a table in a relational database or a spreadsheet. Data frames
are widely used for storing datasets in R because of their organized row-column
structure.
Key Features of Data Frames:

1. Tabular Structure: Data frames have rows and columns. Each column can have
different types (e.g., numeric, character, or logical), but every value in a
particular column must be of the same type.
2. Column Names: Columns typically have names, which makes data frames
easy to reference.
3. Access by Row/Column: You can access data in a data frame by specifying
row and column indices, or by column name.
Creation and Accessing Elements:

You can create a data frame using the data.frame() function in R.

# Create a simple data frame
students <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(22, 24, 23),
  Marks = c(88, 95, 78)
)

# Access the entire data frame
print(students)

# Access a specific column
print(students$Name) # Outputs: "John", "Alice", "Bob"

# Access a specific row
print(students[2, ]) # Outputs the second row

# Access a specific value (row 2, column 3)
print(students[2, 3]) # Outputs: 95 (Marks of Alice)
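Beyond reading values, data frames are often modified in place. This sketch adds a derived column, appends a row with rbind(), and sorts by a column with order(). The Above80 column, its threshold, and the extra student "Dev" are hypothetical values for illustration.

```r
students <- data.frame(
  Name  = c("John", "Alice", "Bob"),
  Age   = c(22, 24, 23),
  Marks = c(88, 95, 78)
)

# Add a derived logical column (80 is an arbitrary example threshold)
students$Above80 <- students$Marks > 80

# Append a row: the new data frame must have the same column names
new_row  <- data.frame(Name = "Dev", Age = 21, Marks = 82, Above80 = TRUE)
students <- rbind(students, new_row)

# Sort rows by Marks, highest first
ranked <- students[order(students$Marks, decreasing = TRUE), ]
print(ranked)

dim(students)   # 4 rows, 4 columns
```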

Use Cases:

● Storing and Analyzing Datasets: Data frames are essential for storing structured
datasets where different columns represent different variables, and rows
represent different observations.
● Data Manipulation: Data frames are the primary data structure used in R
for data cleaning, transformation, and analysis.
● Working with CSV Files: Data frames are used to import and work with
data from CSV files, Excel spreadsheets, and other external data sources.
Example:

Consider a scenario where you're analyzing the scores of students in a class. You
can store the data in a data frame and then calculate summaries or perform
analyses.

# Example: Data frame for students' scores
students <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(22, 24, 23),
  Marks = c(88, 95, 78)
)

# Calculate the average marks
average_marks <- mean(students$Marks)
print(average_marks) # Outputs: 87

# Subsetting data frame (only students older than 22)
older_students <- students[students$Age > 22, ]
print(older_students)
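The same filtering can be written with base R's subset(), which also lets you pick columns, and several numeric columns can be summarised at once with colMeans(). This is a sketch on the same students data.

```r
students <- data.frame(
  Name  = c("John", "Alice", "Bob"),
  Age   = c(22, 24, 23),
  Marks = c(88, 95, 78)
)

# Rows with Age > 22, keeping only the Name and Marks columns
older <- subset(students, Age > 22, select = c(Name, Marks))
print(older)    # Alice 95, Bob 78

# Mean of several numeric columns in one call
colMeans(students[, c("Age", "Marks")])   # Age 23, Marks 87
```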
5. Develop an R program that reads a CSV file and summarizes the data.
# Step 1: Read the CSV file into a data frame
data <- read.csv("your_file.csv")

# Step 2: Display the structure of the data (column names and data types)
print("Structure of the data:")
str(data)

# Step 3: Display a summary of the data (mean, median, min, max, etc. for numeric columns)
# print() makes the output appear even when the script is run with source()
print("Summary statistics:")
print(summary(data))

# Step 4: Display the first few rows of the data to inspect
print("First few rows of the data:")
print(head(data))

# Step 5: Count the number of rows and columns
print(paste("Number of rows:", nrow(data)))
print(paste("Number of columns:", ncol(data)))

Explanation of the Program:

1. read.csv() function reads the CSV file into a data frame.
2. str() function displays the structure of the data: the number of rows, and
each column's name, data type, and first few values.
3. summary() function provides summary statistics (min, quartiles, median,
mean, max) for numeric columns, counts for factor columns, and
length/class/mode for character columns.
4. head() shows the first few rows of the data (six by default).
5. nrow() and ncol() are used to count the number of rows and columns,
respectively.

Usage:

● Make sure to replace "your_file.csv" with the actual path to your
CSV file. You can load any dataset in CSV format and use this program to
analyze it quickly.

6. Demonstrate array with an example.

An array in R is a multi-dimensional data structure that holds elements of a
single type in any number of dimensions.
A vector has one dimension and a matrix is simply a two-dimensional array, so
arrays are mainly used when data needs three or more dimensions.
They are useful for data that can be categorized along several dimensions at once,
like students' scores in various subjects over multiple semesters.

Syntax for Creating an Array in R:

array(data, dim = c(dim1, dim2, dim3, ...), dimnames = NULL)

data: A vector of elements that will be arranged in the array (filled column-wise,
with the first dimension varying fastest).
dim: A vector specifying the dimensions (e.g., rows, columns, and layers).
dimnames: (Optional) A list of names for the dimensions.
Example of Creating a Simple Array in R:
# Step 1: Define the data (marks)
marks <- c(85, 90, 78, 88, 92, 81, 87, 89)

# Step 2: Define dimensions (2 subjects, 2 semesters, 2 students)
array_dims <- c(2, 2, 2) # rows = subjects, columns = semesters, layers = students

# Step 3: Create the array
student_marks <- array(marks, dim = array_dims)

# Step 4: Print the array
print(student_marks)

, , 1

     [,1] [,2]
[1,]   85   78
[2,]   90   88

, , 2

     [,1] [,2]
[1,]   92   87
[2,]   81   89

Explanation:

● Data Input: The marks vector holds the students' marks.
● Array Creation: The array() function arranges the data into the
specified dimensions, filling subjects first, then semesters, then students.
● Output: The array is printed one layer (", , k") per student; within each
layer, rows are subjects and columns are semesters.
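The optional dimnames argument and apply() make arrays much easier to work with. This sketch rebuilds the same array with hypothetical labels (Maths/Physics, Sem1/Sem2, S1/S2), indexes an element by position and by name, and averages along a chosen dimension.

```r
marks <- c(85, 90, 78, 88, 92, 81, 87, 89)

# Hypothetical labels for each dimension: subject x semester x student
student_marks <- array(
  marks,
  dim = c(2, 2, 2),
  dimnames = list(
    Subject  = c("Maths", "Physics"),
    Semester = c("Sem1", "Sem2"),
    Student  = c("S1", "S2")
  )
)

# Index by position or by name: Physics, Sem2, second student
student_marks[2, 2, 2]                   # 89
student_marks["Physics", "Sem2", "S2"]   # 89

# apply() summarises along a chosen dimension (3 = students, 1 = subjects)
apply(student_marks, 3, mean)   # S1 85.25, S2 87.25
apply(student_marks, 1, mean)   # Maths 85.5, Physics 87
```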
