Machine Learning - R Programming
Unit IV
1. Manipulating Objects
2. Viewing Objects within Objects (Forms of Data Objects)
3. Convert a Matrix to a Data Frame
4. Convert a Data Frame into a Matrix
5. Convert a Data Frame into a List
6. Convert a Matrix into a List
7. Convert a List into a Matrix
1. Manipulating Objects
For manipulating data, R users commonly rely on the dplyr package, which
provides many functions for working with data frames. dplyr is not part of
base R: install it once with install.packages("dplyr"), then load it in each
session with library(dplyr) before using the functions below.
Function Name   Description
filter()        Produces a subset of the rows of a data frame.
distinct()      Removes duplicate rows from a data frame.
arrange()       Reorders the rows of a data frame.
select()        Keeps only the required columns of a data frame.
rename()        Renames columns (variables).
mutate()        Creates new columns while keeping the existing ones.
transmute()     Creates new columns and drops the existing ones.
summarize()     Computes summary values such as the average or sum.
filter() method
The filter() function returns the subset of rows that satisfy the condition
given to it. The condition can use comparison operators (==, !=, >, <, >=, <=),
logical operators (&, |, !), is.na() to test for missing values, and range or
membership tests such as x >= a & x <= b or %in%. Rows for which the condition
evaluates to NA are dropped. The syntax of filter() function is given below-
filter(dataframeName, condition)
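The condition types listed above can be combined in one call. The sketch below reuses the player statistics from the Example later in this section; the result variable names are only illustrative. A row is dropped whenever its condition evaluates to NA, which is why player C (whose wickets value is missing) is excluded from high_both.

```r
library(dplyr)

# same values as the stats data frame in the Example; wickets has a missing value
stats <- data.frame(player  = c("A", "B", "C", "D"),
                    runs    = c(100, 200, 408, 19),
                    wickets = c(17, 20, NA, 5))

# AND: both conditions must hold (a comma works the same as &)
high_both <- filter(stats, runs > 50, wickets > 10)

# OR: either condition may hold
either <- filter(stats, runs > 300 | wickets > 15)

# rows whose wickets value is missing
missing_wk <- filter(stats, is.na(wickets))

# range and membership tests
in_range <- filter(stats, runs >= 19 & runs <= 200)
chosen <- filter(stats, player %in% c("A", "D"))

print(high_both)
print(missing_wk)
```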
distinct() method
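The distinct() method removes duplicate rows from a data frame. When column
names are given, rows are compared only on those columns, and .keep_all = TRUE
keeps the remaining columns (taken from the first occurrence of each distinct
combination). The syntax of distinct() method is given below-
distinct(dataframeName, col1, col2, ..., .keep_all = TRUE)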
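The stats data used in this section's Example contains no duplicate rows, so distinct() returns it unchanged. The made-up games data frame below (a hypothetical example) does contain a repeated row, which makes the effect of each form visible.

```r
library(dplyr)

# hypothetical data: rows 1 and 2 are identical, and team "Y" appears twice
games <- data.frame(team = c("X", "X", "Y", "Y"),
                    year = c(2020, 2020, 2020, 2021))

# drops the exact duplicate row, leaving 3 rows
d_all <- distinct(games)
# keeps only the team column, one row per distinct team
d_team <- distinct(games, team)
# one row per team (the first occurrence), with all columns kept
d_keep <- distinct(games, team, .keep_all = TRUE)

print(d_all)
print(d_team)
print(d_keep)
```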
arrange() method
The arrange() method orders the rows by the values of the specified column,
in ascending order by default; wrap the column in desc() to sort in descending
order. The syntax of arrange() method is specified below-
arrange(dataframeName, columnName)
select() method
The select() method returns a new data frame containing only the columns whose
names are listed in the call. The syntax of select() method is mentioned below-
select(dataframeName, col1,col2,…)
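Besides listing column names, select() accepts a few other ways of choosing columns. A short sketch, again assuming the same stats values as the Example in this section:

```r
library(dplyr)

stats <- data.frame(player  = c("A", "B", "C", "D"),
                    runs    = c(100, 200, 408, 19),
                    wickets = c(17, 20, NA, 5))

# a minus sign drops a column instead of keeping it
no_wickets <- select(stats, -wickets)
# a colon selects a contiguous range of columns
first_two <- select(stats, player:runs)
# helper functions match columns by a name pattern
w_cols <- select(stats, starts_with("w"))

print(names(no_wickets))  # "player" "runs"
print(names(first_two))   # "player" "runs"
print(names(w_cols))      # "wickets"
```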
rename() method
The rename() function is used to change the column names. This can be done
by the below syntax-
rename(dataframeName, newName=oldName)
summarize() method
The summarize() method (also spelled summarise()) reduces the data frame to
summary values computed with aggregate functions such as sum() and mean().
The syntax of summarize() method is specified below-
summarize(dataframeName, aggregate_function(columnName))
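Summary values can be given names, and missing values need care: mean() returns NA if any input is NA, which matters for the wickets column. A minimal sketch using the same stats values; the output names total_runs, mean_runs and mean_wickets are illustrative.

```r
library(dplyr)

stats <- data.frame(player  = c("A", "B", "C", "D"),
                    runs    = c(100, 200, 408, 19),
                    wickets = c(17, 20, NA, 5))

# name = expression gives each summary column a readable name
totals <- summarize(stats,
                    total_runs   = sum(runs),
                    mean_runs    = mean(runs),
                    # without na.rm = TRUE the NA in wickets would make this NA
                    mean_wickets = mean(wickets, na.rm = TRUE))

print(totals)  # one row: 727, 181.75, 14
```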
mutate() & transmute() methods
These methods are used to create new variables. The mutate() function
creates new variables without dropping the old ones but transmute() function
drops the old variables and creates new variables. The syntax of both
methods is mentioned below-
mutate(dataframeName, newVariable=formula)
transmute(dataframeName, newVariable=formula)
Example
library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 19),
wickets=c(17, 20, NA, 5))
# fetch players who scored more
# than 100 runs
filter(stats, runs>100)
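# remove rows that are exact duplicates (stats has none, so all rows are kept)
distinct(stats)
# remove duplicates based on a column, keeping the other columns
distinct(stats, player, .keep_all = TRUE)
arrange(stats, runs) # Ascending order
arrange(stats, desc(runs)) # Descending order
# keep only the player and wickets columns
select(stats, player, wickets)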
rename() changes the column name and returns a new data frame; the original
data frame remains the same.
# change the name of the runs column to runs_scored
NewData <- rename(stats, runs_scored = runs)
NewData
# total and mean of the runs column
summarize(stats, sum(runs), mean(runs))
# add a new column avg
# mutate() returns a new data frame with the old columns plus the new one;
# the original data frame remains the same
NewColumnAdded <- mutate(stats, avg = runs / 4)
NewColumnAdded
# transmute() drops all existing columns and creates only the new one;
# the data frame passed to the function remains the same, and the result
# contains only the columns built from the given expressions
DF_RemoveColumn <- transmute(stats, avg = runs / 4)
DF_RemoveColumn
2. Viewing Objects within Objects
There are six basic types of objects in the R language:
● Vectors
● List
● Array
● Matrix
● Factors
● DataFrame
2.1 Vectors
Atomic vectors are one of the basic types of objects in R programming. All
elements of an atomic vector have the same type: character, double, integer,
logical, complex, or raw. A single value is also a vector, of length 1.
x <- c(1, 2, 3, 4)
y <- c("a", "b", "c", "d")
z <- 5
# Print vector and class of vector
print(x)
print(class(x))
print(y)
print(class(y))
print(z)
print(class(z))
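Because an atomic vector holds only one type, c() coerces mixed inputs to a common type. A quick check (the variable names are arbitrary):

```r
# mixing types in c() coerces every element to the most flexible type
mixed <- c(1, "a", TRUE)
print(mixed)          # "1" "a" "TRUE"
print(class(mixed))   # "character"

# integer combined with double becomes double
nums <- c(1L, 2.5)
print(class(nums))    # "numeric"

# a single value is a vector of length 1
z <- 5
print(length(z))      # 1
print(is.vector(z))   # TRUE
```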
2.2 Lists
A list is another type of object in R programming. Unlike an atomic vector, a
list can hold elements of different types, including vectors and other lists.
Example:
# Create list
ls <- list(c(1, 2, 3, 4), list("a", "b", "c"))
# Print
print(ls)
print(class(ls))
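To look inside a list that contains other objects, str() prints the nested structure compactly, and [[ ]] reaches the objects stored inside. A minimal sketch using the same ls:

```r
ls <- list(c(1, 2, 3, 4), list("a", "b", "c"))

# str() shows each element and what it contains
str(ls)

# [[ ]] extracts an element; chain it to reach nested objects
inner <- ls[[2]]        # the inner list
print(inner[[3]])       # "c"
print(ls[[1]][2])       # 2, the second value of the first vector

# single brackets return a sub-list rather than the element itself
print(class(ls[2]))     # "list"
print(length(ls))       # 2
```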
2.3 Matrices
Matrices store values as a 2-dimensional array in which all elements have the
same type. The data and the number of rows and columns are passed to the
matrix() function; by default the values fill the matrix column by column.
Syntax:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
x <- c(1, 2, 3, 4, 5, 6)
# Create matrix
mat <- matrix(x, nrow = 2)
print(mat)
print(class(mat))
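The byrow and dimnames arguments from the syntax above change how the matrix is filled and labelled. A sketch using the same vector x; the row and column names are made up:

```r
x <- c(1, 2, 3, 4, 5, 6)

# default: values fill the matrix column by column
by_col <- matrix(x, nrow = 2)
# byrow = TRUE fills it row by row instead
by_row <- matrix(x, nrow = 2, byrow = TRUE)

print(by_col)        # rows: 1 3 5 and 2 4 6
print(by_row)        # rows: 1 2 3 and 4 5 6
print(dim(by_col))   # 2 3

# dimnames labels the rows and columns (hypothetical names)
named <- matrix(x, nrow = 2,
                dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(named["r2", "b"])  # 4
```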
2.4 Factors
A factor stores categorical data. It records the distinct values of a vector
as levels and stores each element as one of those levels.
Example:
# Create vector
s <- c("spring", "autumn", "winter", "summer",
"spring", "autumn")
print(factor(s))
print(nlevels(factor(s)))
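levels() lists the levels, and as.integer() shows the integer code stored for each element. A short sketch with the same seasons vector; the explicit level order in f2 is just an example:

```r
s <- c("spring", "autumn", "winter", "summer",
       "spring", "autumn")
f <- factor(s)

print(levels(f))      # "autumn" "spring" "summer" "winter" (sorted)
print(as.integer(f))  # 2 1 4 3 2 1: each element is stored as a level index
print(table(f))       # how many times each level occurs

# levels can be given explicitly to control their order
f2 <- factor(s, levels = c("spring", "summer", "autumn", "winter"))
print(levels(f2))
```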
2.5 Arrays
The array() function creates an n-dimensional array. Its dim argument is a
vector giving the length of each dimension, and the data values fill the array
with the first dimension varying fastest.
Syntax:
array(data, dim = length(data), dimnames = NULL)
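A minimal array example (the dimensions 2 x 3 x 4 are arbitrary):

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))      # 2 3 4
print(arr[, , 1])    # the first 2 x 3 layer, holding the values 1 to 6
print(arr[1, 2, 3])  # 15, because the first dimension varies fastest
```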
2.6 Data Frames
Data frames are 2-dimensional tabular data objects in R programming. A data
frame consists of multiple columns, each of which is a vector of the same
length. Unlike a matrix, different columns of a data frame can hold different
types of data.
Example:
# Create vectors
x <- 1:5
y <- LETTERS[1:5]
z <- c("Albert", "Bob", "Charlie", "Denver", "Elie")
# Create data frame of vectors
df <- data.frame(x, y, z)
# Print data frame
print(df)
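To view the objects (columns) inside a data frame, functions such as str(), names(), $ and [ , ] can be used. A sketch on the same df:

```r
df <- data.frame(x = 1:5,
                 y = LETTERS[1:5],
                 z = c("Albert", "Bob", "Charlie", "Denver", "Elie"))

str(df)                   # one line per column: name, type, first values
print(dim(df))            # 5 3
print(names(df))          # "x" "y" "z"
print(df$z[3])            # the third value of column z
print(df[2, ])            # the second row, as a one-row data frame
print(sapply(df, class))  # the class of each column
```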
3. Convert a Matrix to a Data Frame
A matrix can be converted to a data frame with the as.data.frame() function.
Each column of the matrix becomes a column of the data frame; if the matrix has
no column names, the columns are named V1, V2, and so on.
Syntax:
as.data.frame(matrix_data)
Ex 1
matrix_data=matrix(c(1,2,3,4,5,6,7,8),nrow=4)
# display the data
print(matrix_data)
# convert the matrix into dataframe
dataframe_data=as.data.frame(matrix_data)
# print dataframe data
print(dataframe_data)
Ex 2
# create a matrix with 8 rows and 2 columns; a matrix holds a single
# type, so the numbers are coerced to character along with the names
matrix_data=matrix(c("bobby","pinkey","rohith","gnanesh",5.3,6.6,7,8,11:18),nrow=8)
# display the data
print(matrix_data)
# convert the matrix into dataframe
dataframe_data=as.data.frame(matrix_data)
# print dataframe data
print(dataframe_data)
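Column names carry over from the matrix when it has them. A small sketch; the names height and weight are hypothetical:

```r
# the matrix's column names become the data frame's column names
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(NULL, c("height", "weight")))
df_named <- as.data.frame(m)
print(names(df_named))    # "height" "weight"

# without column names, R generates V1, V2, ...
df_default <- as.data.frame(matrix(1:6, nrow = 3))
print(names(df_default))  # "V1" "V2"
```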
4. Convert a Data Frame into a Matrix
data.matrix() function in R Language is used to create a matrix by converting
all the values of a Data Frame into numeric mode and then binding them as a
matrix.
Syntax: data.matrix(df)
# Creating a dataframe
df1 = data.frame(
"Name" = c("Amar", "Akbar", "Ronald"),
"Language" = c("R", "Python", "C#"),
"Age" = c(26, 38, 22)
)
# Printing data frame
print(df1)
# Converting into numeric matrix
df2 <- data.matrix(df1)
df2
Output
Name Language Age
[1,] 2 3 26
[2,] 1 2 38
[3,] 3 1 22
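Character columns are first converted to factors, whose levels are sorted
alphabetically, and then replaced by the integer codes of those levels. This is
why Akbar, Amar and Ronald become 1, 2 and 3, and C#, Python and R become 1, 2
and 3. Numeric columns such as Age are kept as they are.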
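The integer codes can be mapped back to the original strings through the factor levels. A sketch that recreates df1 from above; name_levels is an illustrative name:

```r
df1 <- data.frame(Name = c("Amar", "Akbar", "Ronald"),
                  Language = c("R", "Python", "C#"),
                  Age = c(26, 38, 22))
m <- data.matrix(df1)

# the levels are the sorted distinct strings; the codes index into them
name_levels <- levels(factor(df1$Name))
print(name_levels)               # "Akbar" "Amar" "Ronald"
print(name_levels[m[, "Name"]])  # "Amar" "Akbar" "Ronald"

# numeric columns keep their values
print(m[, "Age"])                # 26 38 22
```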
5. Convert a Data Frame into a List
as.list() function in R Language is used to convert an object to a list.
These objects can be Vectors, Matrices, Factors, and data frames.
Syntax: as.list( object )
df<-data.frame(c1=c(1:5),
c2=c(6:10),
c3=c(11:15),
c4=c(16:20))
print("Sample Dataframe")
print (df)
lst <- as.list(df) # each column becomes one element of the list
print("After Conversion of Dataframe into list of Vectors")
print(lst)
Data Frame with String Values
df <- data.frame(name = c("Test", "for", "Mark"),
roll_no = c(10, 20, 30),
age=c(20,21,22)
)
print("Sample Dataframe")
print (df)
print("Our list after being converted from a dataframe: ")
lst <- as.list(df)
lst
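Each element of the resulting list is one column vector, named after its column. A short sketch with the same df (the names lst and back are arbitrary):

```r
df <- data.frame(name = c("Test", "for", "Mark"),
                 roll_no = c(10, 20, 30),
                 age = c(20, 21, 22))
lst <- as.list(df)

print(names(lst))    # "name" "roll_no" "age": one element per column
print(lst$roll_no)   # 10 20 30: each element is the column vector
print(length(lst))   # 3

# a list of equal-length vectors can be turned back into a data frame
back <- as.data.frame(lst)
print(back)
```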
5.1 Data frame rows as a list
The split() function divides a vector or data frame into groups defined by the
factor provided. Splitting a data frame by 1:nrow(df) puts every row in its own
group, so the result is a list of one-row data frames.
Syntax: split(x, f)
Parameters:
x: represents data vector or data frame
f: represents factor to divide the data
df<-data.frame(c1=c(1:5),
c2=c(6:10),
c3=c(11:15),
c4=c(16:20))
print("Sample Dataframe")
print (df)
print("Result after conversion")
split(df, 1:nrow(df))
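Because split() yields one-row data frames, applying unlist() to each piece gives the rows as named vectors. A minimal sketch, assuming the same df as above:

```r
df <- data.frame(c1 = 1:5, c2 = 6:10, c3 = 11:15, c4 = 16:20)

# split() gives a list of one-row data frames ...
rows_df <- split(df, seq_len(nrow(df)))
# ... and unlist() turns each of them into a named vector
rows_vec <- lapply(rows_df, unlist)

print(rows_vec[[1]])    # c1 = 1, c2 = 6, c3 = 11, c4 = 16
print(names(rows_vec))  # "1" "2" "3" "4" "5"
```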
6. Convert a Matrix into a List
The as.list() function is an inbuilt function that takes an R object, such as a
vector, matrix, factor or data frame, and converts it into a list. For a
matrix, as.list() returns a list with one element per matrix entry, taken in
column-major order (down the first column, then down the second, and so on).
Wrapping the result in unlist() flattens it back into a single vector, which
prints more compactly:
unlist(as.list(mat))
To get the elements in row-major order instead, transpose the matrix first and
then convert it to a list.
Example: Column Major
mat = matrix(1:12,nrow=3, ncol=4)
print("Sample matrix:")
print(mat)
print("Matrix into a single list")
unlist(as.list(mat))
Example: Row Major
Find the transpose of the matrix using the t() function, and convert it into a
list using as.list().
mat = matrix(1:12,nrow=3, ncol=4)
print("Sample matrix:")
print(mat)
print("Result after conversion")
unlist(as.list(t(mat)))
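Besides the element-wise conversion, a matrix is often turned into a list with one vector per column or per row. One way to do this (a sketch, not the only approach) uses lapply() with column or row indexing:

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)

# as.list() gives one list element per matrix entry, column by column
elems <- as.list(mat)
print(length(elems))  # 12
print(elems[[4]])     # 4, the first entry of column 2

# one vector per column
cols <- lapply(seq_len(ncol(mat)), function(j) mat[, j])
print(cols[[2]])      # 4 5 6

# one vector per row
rows <- lapply(seq_len(nrow(mat)), function(i) mat[i, ])
print(rows[[1]])      # 1 4 7 10
```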
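7. Convert a List into a Matrix
unlist() flattens the list into a single vector (here 8 x 25 = 200 values), and
matrix() reshapes that vector into the requested number of columns. With
ncol = 10 the result has 20 rows, and byrow = TRUE fills it row by row.
x<-list(1:25,26:50,51:75,76:100,101:125,126:150,151:175,176:200)
x <- matrix(unlist(x), ncol = 10, byrow = TRUE)
x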
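When each list element should become one row, do.call() with rbind() is a common alternative that does not require stating the number of columns, assuming all elements have the same length. A minimal sketch with a made-up list:

```r
# hypothetical list of three equal-length vectors
x <- list(1:3, 4:6, 7:9)

# each list element becomes one row of the matrix
m_rows <- do.call(rbind, x)
print(dim(m_rows))   # 3 3
print(m_rows[2, ])   # 4 5 6

# cbind() instead makes each element a column
m_cols <- do.call(cbind, x)
print(m_cols[, 2])   # 4 5 6
```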