BSC – V SEMESTER
Computational Statistics and R Programming
(Lab Manual)
1. Create a vector in R and perform operations on it.
# Use of 'c' function
# to combine the values as a vector.
# by default the type will be double
X <- c(1, 4, 5, 2, 6, 7)
print('using c function')
print(X)
# using the seq() function to generate
# a sequence of continuous values
# with different step-size and length.
# length.out defines the length of vector.
Y <- seq(1, 10, length.out = 5)
print('using seq() function')
print(Y)
# using ':' operator to create
# a vector of continuous values.
Z <- 5:10
print('using colon')
print(Z)
Output:
[1] "using c function"
[1] 1 4 5 2 6 7
[1] "using seq() function"
[1]  1.00  3.25  5.50  7.75 10.00
[1] "using colon"
[1]  5  6  7  8  9 10
# Accessing elements using the position number.
X <- c(2, 5, 8, 1, 2)
print('using Subscript operator')
print(X[4])
# Accessing specific values by passing
# a vector inside another vector.
Y <- c(4, 5, 2, 1, 7)
print('using c function')
print(Y[c(4, 1)])
# Logical indexing
Z <- c(5, 2, 1, 4, 4, 3)
print('Logical indexing')
print(Z[Z>3])
Output:
[1] "using Subscript operator"
[1] 1
[1] "using c function"
[1] 1 4
[1] "Logical indexing"
[1] 5 4 4
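Two further ways of accessing elements are sketched below: a negative index drops an element, and which() returns the positions where a condition holds. The vector v is a hypothetical example, not part of the practical above.

```r
# Hypothetical vector used only for illustration
v <- c(5, 2, 1, 4, 4, 3)

# A negative index drops the element at that position
v_without_first <- v[-1]

# which() returns the positions where the condition is TRUE
positions <- which(v > 3)

print(v_without_first)
print(positions)
```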
# Creating a vector
V.SAI KRISHNA M.Sc.,M.Tech.,(PhD) –SLNDC - ATP Page 1
X <- c(2, 5, 1, 7, 8, 2)
# modify a specific element
X[3] <- 11
print('Using subscript operator')
print(X)
# Modify using different logics.
X[X>9] <- 0
print('Logical indexing')
print(X)
# Modify by specifying the position or elements.
X <- X[c(5, 2, 1)]
print('using c function')
print(X)
Output:
[1] "Using subscript operator"
[1]  2  5 11  7  8  2
[1] "Logical indexing"
[1] 2 5 0 7 8 2
[1] "using c function"
[1] 8 5 2
# Creating a vector
X <- c(5, 2, 1, 6)
# Deleting a vector
X <- NULL
print('Deleted vector')
print(X)
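Assigning NULL empties the vector, but the name X still exists in the workspace. The sketch below contrasts this with rm(), which removes the name itself. The helper variable names are assumptions made for illustration.

```r
X <- c(5, 2, 1, 6)

# After assigning NULL the name still exists, but holds no elements
X <- NULL
exists_after_null <- exists("X", inherits = FALSE)
is_null <- is.null(X)
len <- length(X)

# rm() removes the name itself from the current environment
rm(X)
exists_after_rm <- exists("X", inherits = FALSE)

print(c(exists_after_null, is_null, exists_after_rm))
print(len)
```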
# Creating Vectors
X <- c(5, 2, 5, 1, 51, 2)
Y <- c(7, 9, 1, 5, 2, 1)
# Addition
Z <- X + Y
print('Addition')
print(Z)
# Subtraction
S <- X - Y
print('Subtraction')
print(S)
# Multiplication
M <- X * Y
print('Multiplication')
print(M)
# Division
D <- X / Y
print('Division')
print(D)
Output:
[1] "Addition"
[1] 12 11  6  6 53  3
[1] "Subtraction"
[1] -2 -7  4 -4 49  1
[1] "Multiplication"
[1]  35  18   5   5 102   2
[1] "Division"
[1]  0.7142857  0.2222222  5.0000000  0.2000000 25.5000000  2.0000000
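The operations above work element by element on vectors of equal length. When the lengths differ, R recycles the shorter vector. A minimal sketch, using the hypothetical vectors a and b:

```r
# Hypothetical vectors of unequal length
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)

# b is recycled as 10 20 10 20 10 20 before the addition
recycled_sum <- a + b

# A single number is recycled in the same way
doubled <- a * 2

print(recycled_sum)
print(doubled)
```

R gives a warning when the longer length is not a multiple of the shorter one, so it is safest to keep the lengths compatible.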
Sorting of Vectors
To sort a vector, use the sort() function, which sorts the vector in ascending
order by default.
# Creating a Vector
X <- c(5, 2, 5, 1, 51, 2)
# Sort in ascending order
A <- sort(X)
print('sorting done in ascending order')
print(A)
# sort in descending order.
B <- sort(X, decreasing = TRUE)
print('sorting done in descending order')
print(B)
Output:
[1] "sorting done in ascending order"
[1]  1  2  2  5  5 51
[1] "sorting done in descending order"
[1] 51  5  5  2  2  1
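A related function is order(), which returns the positions of the elements in sorted order instead of the sorted values. The sketch below reuses the vector X from the example; the name idx is an assumption.

```r
X <- c(5, 2, 5, 1, 51, 2)

# order() gives the positions that would sort the vector
idx <- order(X)
print(idx)

# Indexing with those positions gives the same result as sort(X)
print(X[idx])
```

This is useful when a second vector has to be rearranged in the same order as X.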
2. Create integer, complex, logical, character data type objects in R
and print their values and their class using print and class functions.
Data Types
In programming, data type is an important concept. Variables can store data
of different types, and different types can do different things.
In R, variables do not need to be declared with any particular type, and can
even change type after they have been set:
Example
my_var <- 30 # my_var is of type numeric
my_var <- "Sally" # my_var is now of type character (a.k.a. string)
my_var # print my_var
[1] "Sally"
R has a variety of data types and object classes. You will learn much more
about these as you continue to get to know R.
Basic Data Types
Basic data types in R can be divided into the following types:
numeric - (10.5, 55, 787)
integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
complex - (9 + 3i, where "i" is the imaginary part)
character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
logical (a.k.a. boolean) - (TRUE or FALSE)
You can use the class() function to check the data type of a variable:
Example
# numeric
x <- 10.5
class(x)
# integer
x <- 1000L
class(x)
# complex
x <- 9i + 3
class(x)
# character/string
x <- "R is exciting"
class(x)
# logical/boolean
x <- TRUE
class(x)
[1] "numeric"
[1] "integer"
[1] "complex"
[1] "character"
[1] "logical"
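Values can also be converted from one type to another with the as.*() functions. The following is a small sketch; the variable names are assumptions made for illustration.

```r
x <- 10.5

# as.integer() drops the fractional part
x_int <- as.integer(x)

# as.character() turns the number into a string
x_chr <- as.character(x)

# as.logical() converts the string "TRUE" into the logical value TRUE
x_lgl <- as.logical("TRUE")

print(class(x_int))
print(class(x_chr))
print(class(x_lgl))

# typeof() shows the internal storage type; numeric values are stored as double
print(typeof(x))
```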
3. Write code in R to demonstrate sum(), min(), max() and seq()
functions.
sum() function in R
The sum() function in R returns the sum of the values passed to it as
arguments.
Syntax: sum(…)
Parameters:
numeric or complex or logical vectors
sum() Function in R Example
R program to add two numbers
Here the sum() function is used to add two numbers.
R
a1 = c(12, 13)
sum(a1)
Output:
[1] 25
Sum() function with vector
Here the sum() function is used with vectors: several vectors are created and
then passed to sum() as arguments, first one at a time and then all together.
R
# R program to illustrate
# sum function
# Creating Vectors
x <- c(10, 20, 30, 40)
y <- c(1.8, 2.4, 3.9)
z <- c(0, -2, 4, -6)
# Calling the sum() function
sum(x)
sum(y)
sum(z)
sum(x, y, z)
Output:
[1] 100
[1] 8.1
[1] -4
[1] 104.1
Sum() function in a range
Here a range of numbers is passed to the sum() function.
R
# R program to illustrate
# sum function
# Calling the sum() function
sum(1:5) # Adding a range
sum(-1:-10)
sum(4:12)
Output:
[1] 15
[1] -55
[1] 72
Sum() function with NA
Here a vector containing an NA value is created and then summed using the
sum() function with na.rm = TRUE, which removes the NA before adding.
R
x = c(1,2,-4,5,12,NA)
sum(x,na.rm=TRUE)
Output:
[1] 16
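For comparison, the sketch below shows what happens without na.rm = TRUE, and how the missing values can be counted. The variable names are assumptions.

```r
x <- c(1, 2, -4, 5, 12, NA)

# Without na.rm = TRUE a single NA makes the whole sum NA
with_na <- sum(x)

# With na.rm = TRUE the NA is dropped before adding
without_na <- sum(x, na.rm = TRUE)

# is.na() marks the missing values; summing the TRUEs counts them
n_missing <- sum(is.na(x))

print(with_na)
print(without_na)
print(n_missing)
```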
Sum() function with Single Dataframe Column.
R
data = data.frame(iris)
print(head(data))
sum(data$Sepal.Width)
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
[1] 458.6
Sum() function with Multiple Dataframe Column.
R
data = data.frame(iris)
print(head(data))
sum(data$Sepal.Length, data$Sepal.Width, data$Petal.Length, data$Petal.Width)
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
[1] 2078.7
Calculate the sum of values in a matrix.
R
# Create a 2x3 matrix of numeric values
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
# Calculate the sum of values in the matrix
sum_m <- sum(m)
# Print the result
print(sum_m)
Output:
[1] 21
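sum() adds every value in the matrix. If the totals are needed row by row or column by column, the base R functions rowSums() and colSums() can be used, as in this small sketch with its own 2x3 matrix:

```r
# A 2x3 matrix, filled column by column
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Total of each row
row_totals <- rowSums(m)

# Total of each column
col_totals <- colSums(m)

print(row_totals)
print(col_totals)
```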
This section discusses the min() and max() functions in R.
Min:
The min() function returns the minimum value in a vector or a data frame.
Syntax:
In a vector:
min(vector_name)
In a single column of a data frame:
min(dataframe$column_name)
Across multiple columns of a data frame:
min(c(dataframe$column1,dataframe$column2,......,dataframe$columnn))
For every column of a whole data frame:
sapply(dataframe,min)
Max:
The max() function returns the maximum value in a vector or a data frame.
Syntax:
In a vector:
max(vector_name)
In a single column of a data frame:
max(dataframe$column_name)
Across multiple columns of a data frame:
max(c(dataframe$column1,dataframe$column2,......,dataframe$columnn))
For every column of a whole data frame:
sapply(dataframe,max)
Example 1:
This example is an R program to get minimum and maximum values in the
vector.
R
# create a vector
data = c(23, 4, 56, 21, 34, 56, 73)
# get the minimum value
print(min(data))
# get the maximum value
print(max(data))
Output:
[1] 4
[1] 73
Example 2:
This example is an R program to get the minimum and maximum values in
the data frame column.
R
# create a dataframe
data=data.frame(column1=c(23,4,56,21),
column2=c("sai","deepu","ram","govind"),
column3=c(1.3,4.6,7.8,6.3))
# get the minimum value in first column
print(min(data$column1))
# get the minimum value in second column
print(min(data$column2))
# get the minimum value in third column
print(min(data$column3))
# get the maximum value in first column
print(max(data$column1))
# get the maximum value in second column
print(max(data$column2))
# get the maximum value in third column
print(max(data$column3))
Output:
[1] 4
[1] "deepu"
[1] 1.3
[1] 56
[1] "sai"
[1] 7.8
When applied to character data, min() and max() compare strings character by
character, using the sort order of the current locale. For plain lower case
English words this is the usual alphabetical order, which is also the order of
their ASCII (American Standard Code for Information Interchange) codes.
min() therefore returns the string that sorts first ("deepu"), while max()
returns the string that sorts last ("sai").
Example 3:
This example is an R program to get the min and max values of every column of
the data frame using the sapply() function.
R
# create a dataframe
data = data.frame(column1=c(23, 4, 56, 21),
column2=c("sai", "deepu", "ram", "govind"),
column3=c(1.3, 4.6, 7.8, 6.3))
# get the minimum value across dataframe
print(sapply(data, min))
# get the maximum value across dataframe
print(sapply(data, max))
Output:
column1 column2 column3
"4" "deepu" "1.3"
column1 column2 column3
"56" "sai" "7.8"
Example 4:
Max() and Min() function in R with NA values.
R
# A vector with missing values
x <- c(2, 4, 1, NA, 3)
# Finding maximum value
max(x)
# Finding minimum value
min(x)
Output:
> max(x)
[1] NA
> min(x)
[1] NA
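Both functions return NA because the vector contains a missing value. As with sum(), the argument na.rm = TRUE removes the NA first. A minimal sketch (the result names are assumptions):

```r
# A vector with a missing value
x <- c(2, 4, 1, NA, 3)

# na.rm = TRUE ignores the NA
max_value <- max(x, na.rm = TRUE)
min_value <- min(x, na.rm = TRUE)

# range() returns the minimum and the maximum together
both <- range(x, na.rm = TRUE)

print(max_value)
print(min_value)
print(both)
```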
seq() is a function in R for creating a sequence of numbers.
Syntax:
seq(from, to, by, length.out)
where,
from – the starting number in the sequence
to – the last number in the sequence
by – the difference between consecutive numbers in the sequence (optional)
length.out – the desired length of the sequence (optional)
The following example demonstrates the seq() function in R.
Step 1 - Define a vector and use seq()
x = seq(1,100) # Sequence of numbers between 1 - 100
print(x)
"Output of the code is:"
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
y = seq(1,100,by=2) # Sequence of numbers between 1 - 100 with a difference of 2
print(y)
"Output of the code is:"
[1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
[26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99
z = seq(1,100,length.out = 10) # Sequence of 10 evenly spaced numbers between 1 - 100
print(z)
"Output of the code is:"
[1] 1 12 23 34 45 56 67 78 89 100
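A few more uses of seq() and its helper functions are sketched below: a decreasing sequence with a negative step, and seq_len() and seq_along(), which are convenient for generating index vectors. The variable names are assumptions.

```r
# A decreasing sequence needs a negative 'by'
down <- seq(10, 1, by = -3)

# seq_len(n) gives 1, 2, ..., n
first_five <- seq_len(5)

# seq_along(x) gives one index per element of x
letters_abc <- c("a", "b", "c")
positions <- seq_along(letters_abc)

print(down)
print(first_five)
print(positions)
```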
4. Write code in R to manipulate text in R using grep(), toupper(),
tolower() and substr() functions.
String Manipulation: Definition
String manipulation is the process of analyzing strings. It is not limited to
analysis: it also includes changing, slicing, and parsing strings. The R
programming language has built-in functions that handle most of these tasks.
The functions covered in this section are listed below.
The paste() Function
The substr() Function
The strsplit() Function
The grep() Function
The abbreviate() Function
The toupper() Function
The tolower() Function
The casefold() Function
The paste() Function
The paste() function in R concatenates multiple string values to create a
longer string. The “sep =” argument sets the separator placed between the
strings. The “collapse =” argument joins the resulting strings into a single
long string. The syntax for the paste() function is as shown below:
paste(…, sep = " ", collapse = NULL)
Where,
... – specifies one or more R objects to be converted to character strings and
concatenated.
sep – specifies the separator placed between the objects. It is an optional
argument.
collapse – specifies an optional string used to join the results into a single
string. With the default NULL, no collapsing is done.
Let us see some examples for the paste() function.
> #Using paste() function to concatenate strings
> paste("My", "name", "is", "Lalit") #Without sep = argument
[1] "My name is Lalit" #Output
>
> paste("My", "name", "is", "Lalit", sep = ",") #With sep = argument
[1] "My,name,is,Lalit"
In the first example, the “sep =” argument is not specified, so the default
separator (a single space) is used. In the second example, a comma is used as
the separator.
> a <- paste(c(2:5), c(10:13), sep = "--")
> print(a)
[1] "2--10" "3--11" "4--12" "5--13"
Here, the first and second arguments are of the same length, therefore each
element of the first is concatenated with the corresponding element of the
second, using the separator “--”.
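The “collapse =” argument described above is not used in these examples, so a short sketch is given here. It also shows paste0(), which is paste() with an empty separator. The vector words is a hypothetical example.

```r
# Hypothetical character vector
words <- c("R", "is", "fun")

# collapse joins the elements of one vector into a single string
sentence <- paste(words, collapse = " ")

# paste0() concatenates without any separator
labels <- paste0("x", 1:3)

# sep works between the arguments, collapse then joins the results
ids <- paste("id", 1:3, sep = "_", collapse = ", ")

print(sentence)
print(labels)
print(ids)
```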
The substr() Function
The substr() function in R extracts or replaces a substring of a given string
or of each string in a vector. The substring is identified by its starting
index and its ending index.
The syntax for this function is as shown below:
substr(x, start, stop)
Where,
x – specifies a string or a vector of strings.
start – the index of the first character of the substring to extract or replace.
stop – the index of the last character of the substring to extract or replace.
Let us see some examples of the substr() function
> #Extracting a substring out of a string
>
> x <- "Programming is fun"
> substr(x, 16, 18)
[1] "fun" #substring extracted from index 16 upto index 18
Similarly, you can replace a substring of a given string using a combination of
the assignment operator and the substr() function. Let us replace “fun” with
“luv”.
> #Replacing a substring out of a string
> substr(x, 16, 18) <- "luv"
> print(x)
[1] "Programming is luv"
This code replaces the substring from the 16th to the 18th position (i.e.
“fun”) with the new string “luv”.
> p <- c("I", "Like", "R", "Programming")
> substr(p, 1, 1) <- "$" #Replacing the first character of each string in p
> print(p)
[1] "$" "$ike" "$" "$rogramming"
Here in this example, the first character of each string is replaced
with “$” from the original vector of strings.
> p <- c("I", "Like", "R", "Programming")
> #Replacing first character of each string with respective symbols as shown below:
> substr(p, 1, 1) <- c("@", "#", "$", "%")
> print(p)
[1] "@" "#ike" "$" "%rogramming"
Here, the first character of each string is replaced with the corresponding
symbol from the replacement vector.
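substr() also extracts from every string of a vector at once, and nchar() gives the length of each string. A minimal sketch using the same vector p (the result names are assumptions):

```r
p <- c("I", "Like", "R", "Programming")

# Number of characters in each string
lengths_p <- nchar(p)

# First three characters of each string; shorter strings are returned whole
first_three <- substr(p, 1, 3)

print(lengths_p)
print(first_three)
```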
The strsplit() Function
To split a string or a vector of strings into multiple substrings using a
literal or a regular expression, use the strsplit() function. This function
takes a string or a vector of strings as input and returns a list with one
element per input string. Each element is a character vector of the pieces
obtained by splitting at the delimiter.
This function can be very useful in the case of Text-Mining or Text Analytics
and has syntax as shown below:
strsplit(x, split, fixed = FALSE)
Here,
x – specifies a vector of character strings or a string.
split – specifies the delimiter at which the split will happen. This can also be a
vector of delimiters, which are recycled along x.
fixed – if set to TRUE, split is matched exactly as written instead of being
interpreted as a regular expression.
> z <- "Next few months will be important to keep COVID-19 down."
> strsplit(z, " ") #Making a split at every space.
[[1]]
[1] "Next" "few" "months" "will" "be" "important"
[7] "to" "keep" "COVID-19" "down."
Here in this code, split happens at the occurrence of every space.
> o <- "Let's-split-text-at-every-hyphen"
> strsplit(o, "-") #Making split at every hyphen.
[[1]]
[1] "Let's" "split" "text" "at" "every" "hyphen"
In the example above, the split happens at each hyphen. Note that the output
generated by strsplit() is a list, with one element for each string in the
original vector.
> class(strsplit(o, "-"))
[1] "list"
> class(strsplit(z, " "))
[1] "list"
If you want to convert the output into a simple atomic vector, you can wrap
the strsplit() call in the unlist() function.
> unlist(strsplit(z, " "))
[1] "Next" "few" "months" "will" "be" "important"
[7] "to" "keep" "COVID-19" "down."
>
> unlist(strsplit(o, "-"))
[1] "Let's" "split" "text" "at" "every" "hyphen"
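The effect of the “fixed =” argument can be seen when the delimiter is a character that has a special meaning in regular expressions, such as the dot. The sketch below uses the hypothetical string s.

```r
# Hypothetical string with dots as delimiters
s <- "a.b.c"

# As a regular expression "." matches any character,
# so every character is treated as a delimiter and only empty pieces remain
regex_split <- unlist(strsplit(s, "."))

# With fixed = TRUE the dot is taken literally
fixed_split <- unlist(strsplit(s, ".", fixed = TRUE))

# An empty delimiter splits the string into single characters
chars <- unlist(strsplit("abc", ""))

print(regex_split)
print(fixed_split)
print(chars)
```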
The grep() Function
The grep() function is a pattern matcher as well as a searcher in R. When used
on a string or a vector of strings, it searches for a specific pattern and
returns the index of every element in which the pattern is found. Please note
that by default it returns only the indices of the matching elements and not
the strings themselves. Syntax of this function is as shown below:
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE)
Where,
pattern – specifies the string pattern to be matched in the given string vector.
x – specifies the string or vector of strings in which to search for the pattern.
ignore.case – controls case sensitivity. If FALSE, the pattern match is case
sensitive; if TRUE, case is ignored.
perl – a logical argument that specifies whether Perl-compatible regular
expressions should be used.
value – a logical argument that specifies whether the matching values
themselves are returned (TRUE) or just their indices (FALSE).
fixed – if TRUE, the pattern is matched as a literal string rather than as a
regular expression. This overrides all conflicting arguments.
useBytes – if set to TRUE, matching is done byte-by-byte rather than
character-by-character.
invert – if set to TRUE, the function returns the indices (or values) of the
elements that do not match the pattern.
Time for the examples associated with grep() function:
> #Using grep() to return indices of matched strings
> vect <- c("Mohan", "Sam", "Ben", "Eliza", "Tucker")
> grep("a", vect, value = FALSE)
[1] 1 2 4 #Returning indices of strings where pattern matched.
>
> #using grep() to return the strings with matched pattern instead of indices
> vect <- c("Mohan", "Sam", "Ben", "Eliza", "Tucker")
> grep("a", vect, value = TRUE)
[1] "Mohan" "Sam" "Eliza" #Returning string where pattern matched.
In the first example, the grep() function returns the indices of the elements
in which the pattern (i.e. “a”) is found. In the second example, it returns
the matching values themselves.
#Inverting the results of grep() function
> vect <- c("Mohan", "Sam", "Ben", "Eliza", "Tucker")
> grep("a", vect, value = TRUE, invert = TRUE)
[1] "Ben" "Tucker"
In this example, the “invert = TRUE” argument is used, which inverts the
match: the function returns all the string values in which the pattern is
not found.
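The sketch below shows a few further options on the same vector: grepl(), a companion of grep() that returns TRUE or FALSE for every element, the “ignore.case =” argument, and a simple regular expression. The result names are assumptions.

```r
vect <- c("Mohan", "Sam", "Ben", "Eliza", "Tucker")

# grepl() returns one logical value per element
has_a <- grepl("a", vect)

# ignore.case = TRUE matches both "e" and "E"
with_e <- grep("e", vect, ignore.case = TRUE, value = TRUE)

# "^S" is a regular expression: strings that start with S
starts_with_s <- grep("^S", vect, value = TRUE)

print(has_a)
print(with_e)
print(starts_with_s)
```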
The abbreviate() Function
You can abbreviate strings in R using the abbreviate() function. By default it
abbreviates each string to at least four characters (you can reduce this
length, for example to two characters) in such a way that the abbreviations
remain unique (unless the original strings contain duplicates).
Following is the syntax for this function:
abbreviate(names.arg, minlength = 4, use.classes = TRUE, dot = FALSE, strict
= FALSE, method = c("left.kept", "both.sides"), named = TRUE)
Here in this function:
names.arg – a vector of strings/names to be abbreviated.
minlength – specifies the minimum length of the abbreviations. The default is
four characters.
use.classes – a logical argument: should lower case letters be removed first?
It is currently ignored by R.
dot – a logical argument that specifies whether a dot should be appended to
the abbreviated texts.
strict – if TRUE, minlength is strictly observed, even if the abbreviations
are then no longer unique. The default value is FALSE.
method – specifies how the truncation should happen: keep characters from the
left only, or abbreviate from both sides.
named – if TRUE, the result is named with the original strings.
> #using abbreviate() function with default arguments
> continents <- c("Asia", "Europe", "North America", "South America",
"Antartica", "Africa", "Oceania")
> abbreviate(continents)
Asia Europe North America South America Antartica
"Asia" "Eurp" "NrtA" "SthA" "Antr"
Africa Oceania
"Afrc" "Ocen"
You can see the abbreviations with a minimum length of four characters. Also,
the original names of the elements are kept in the output.
> #abbreviating the string to two characters
> abbreviate(continents, minlength = 2)
Asia Europe North America South America Antartica
"As" "Er" "NA" "SA" "An"
Africa Oceania
"Af" "Oc"
Here, the minimum length is set to 2 characters, so the names are abbreviated
to two characters.
> #You can also remove the original names
> abbreviate(continents, minlength = 2, named = FALSE)
[1] "As" "Er" "NA" "SA" "An" "Af" "Oc"
Since the “named =” argument is specified as FALSE, the original names are
dropped from the output.
The toupper() Function
When you need to convert an entire string into upper case, you can use the
built-in toupper() function. This function takes a string or a vector of
strings as an argument and converts it into upper case.
Syntax of the toupper() function is as below:
toupper(x)
Where,
x – specifies a character/string vector.
> #The toupper() function converts the string into upper case
> x <- "Let's manipulate the strings in R using different Functions"
> toupper(x)
[1] "LET'S MANIPULATE THE STRINGS IN R USING DIFFERENT FUNCTIONS"
This function, as said above, converts the string into upper case.
The tolower() Function
In contrast to the toupper() function, the tolower() function converts the
given string or vector of strings into lower case. Syntax for this function is
as shown below:
tolower(x)
Where,
x – specifies a character vector (typically containing upper case characters).
> # Converting string into lower case
> x <- "LET'S MANIPULATE THE STRINGS IN R USING DIFFERENT FUNCTIONS"
> tolower(x)
[1] "let's manipulate the strings in r using different functions"
The casefold() Function
The casefold() function is not much different from
the tolower() and toupper() functions. It also converts the given string
argument into lower or upper case depending on the value of
argument “upper”. If the value for upper is specified as TRUE, it converts the
string into the upper case; else converts the string into lower case by default.
Syntax of the casefold() function is as shown below:
casefold(x, upper = FALSE)
Where,
x – represents the character vector that needs to be converted to lower or
upper case.
upper – a logical argument that specifies whether the string should be
converted into upper case (TRUE) or lower case (FALSE). The default value is
FALSE.
> #Example of casefold() Function
> x <- "Let's use the casefold function"
> casefold(x, upper = FALSE) #Converting to lower case
[1] "let's use the casefold function"
> casefold(x, upper = TRUE) #Converting to upper case.
[1] "LET'S USE THE CASEFOLD FUNCTION"
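The functions of this section can be combined. As one possible sketch, the code below capitalises the first letter of each word using toupper(), substr(), nchar() and paste0(). The vector words is a hypothetical example.

```r
# Hypothetical lower case names
words <- c("amiya", "raj", "asish")

# Upper case version of the first character of each word
first_letter <- toupper(substr(words, 1, 1))

# Everything from the second character to the end of each word
rest <- substr(words, 2, nchar(words))

# Join the two parts without a separator
capitalised <- paste0(first_letter, rest)

print(capitalised)
```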
5. Create data frame in R and perform operations on it.
DataFrame Operations in R
Data frames are generic data objects in R that are used to store tabular data.
They are among the most popular data objects in R programming because data is
convenient to analyze in tabular form. A data frame can also be thought of as
a matrix in which each column can have a different data type. A data frame is
made up of three principal components: the data, the rows, and the columns.
Operations that can be performed on a DataFrame are:
Creating a DataFrame
Accessing rows and columns
Selecting the subset of the data frame
Editing dataframes
Adding extra rows and columns to the data frame
Add new variables to dataframe based on existing ones
Delete rows and columns in a data frame
Creating a DataFrame
In the real world, a DataFrame is usually created by loading a dataset from
existing storage, such as an SQL database, a CSV file, or an Excel file.
A DataFrame can also be created from vectors in R. Following are some of the
ways to create a DataFrame:
Creating a data frame using vectors: To create a data frame, use the
data.frame() function and pass each of the vectors you have created as
arguments to the function.
Example:
# R program to illustrate dataframe
# A vector which is a character vector
Name = c("Amiya", "Raj", "Asish")
# A vector which is a character vector
Language = c("R", "Python", "Java")
# A vector which is a numeric vector
Age = c(22, 25, 45)
# To create dataframe use data.frame command and
# then pass each of the vectors
# user have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)
print(df)
Output:
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
Creating a data frame using data from a file: Data frames can also be created
by importing data from a file. For this, use the function read.table().
Syntax:
newDF = read.table(file = "Path of the file")
To create a dataframe from a CSV file in R:
Syntax:
newDF = read.csv("FileName.csv")
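To try this without an existing file, a data frame can first be written to a CSV file and then read back. The sketch below uses a temporary file created with tempfile(); the file path and the variable names are assumptions made for illustration.

```r
# A small data frame to write out
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Age = c(22, 25, 45)
)

# Hypothetical temporary file used only for this illustration
csv_path <- tempfile(fileext = ".csv")

# Write the data frame without the row numbers
write.csv(df, csv_path, row.names = FALSE)

# Read the file back into a new data frame
newDF <- read.csv(csv_path)
print(newDF)

# Remove the temporary file
unlink(csv_path)
```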
Accessing rows and columns
The syntax for accessing rows and columns is given below,
df[val1, val2]
df = dataframe object
val1 = rows of a data frame
val2 = columns of a data frame
Here "val1" and "val2" can be vectors of indices such as 1:2 or 2:3. If you
specify only df[val2], it refers to the set of columns that you need to access
from the data frame.
Example: Row selection
# R program to illustrate operations
# on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
print(df)
# Accessing first and second row
cat("Accessing first and second row\n")
print(df[1:2, ])
Output:
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
Accessing first and second row
Name Language Age
1 Amiya R 22
2 Raj Python 25
Example: Column selection
# R program to illustrate operations
# on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
print(df)
# Accessing first and second column
cat("Accessing first and second column\n")
print(df[, 1:2])
Output:
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
Accessing first and second column
Name Language
1 Amiya R
2 Raj Python
3 Asish Java
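Columns and single cells can also be accessed by name. The sketch below shows the $ operator, the [[ ]] operator and a row/column pair on the same data frame. The result names are assumptions.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45),
  stringsAsFactors = FALSE
)

# A whole column as a vector, using $ or [[ ]]
ages <- df$Age
languages <- df[["Language"]]

# A single cell: second row, column "Language"
raj_language <- df[2, "Language"]

# drop = FALSE keeps a single column as a data frame
first_col_df <- df[, 1, drop = FALSE]

print(ages)
print(languages)
print(raj_language)
print(first_col_df)
```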
Selecting the subset of the DataFrame
A subset of a DataFrame can also be created based on certain conditions with
the help of following syntax.
newDF = subset(df, conditions)
df = Original dataframe
conditions = Certain conditions
Example:
# R program to illustrate operations
# on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45))
print(df)
# Selecting the subset of the data frame
# where Name is equal to Amiya
# OR age is greater than 30
newDf = subset(df, Name =="Amiya"|Age>30)
cat("After Selecting the subset of the data frame\n")
print(newDf)
Output:
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After Selecting the subset of the data frame
Name Language Age
1 Amiya R 22
3 Asish Java 45
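The same rows can be selected without subset() by placing a logical condition in the row position of df[ , ]. A minimal sketch (the name by_index is an assumption):

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45),
  stringsAsFactors = FALSE
)

# Logical condition in the row position; columns are referred to as df$...
by_index <- df[df$Name == "Amiya" | df$Age > 30, ]
print(by_index)

# Selecting some columns at the same time
names_only <- df[df$Age > 30, "Name"]
print(names_only)
```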
Editing DataFrames
In R, DataFrames can be edited in two ways:
Editing data frames by direct assignment: Much like a list in R, a data frame
can be edited by direct assignment.
Example:
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before editing the dataframe\n")
print(df)
# Editing dataframes by direct assignments
# [[3]] accessing the top level components
# Here Age in this case
# [[3]][3] accessing inner level components
# Here Age of Asish in this case
df[[3]][3] = 30
cat("After editing the dataframe\n")
print(df)
Output:
Before editing the dataframe
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After editing the dataframe
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 30
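Instead of position numbers, the cell to be edited can be chosen by column name and by a condition on the rows. The new values in this sketch are hypothetical.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45),
  stringsAsFactors = FALSE
)

# Change the Age of the row where Name is "Raj"
df$Age[df$Name == "Raj"] <- 26

# Change the Language of the row where Name is "Asish"
df[df$Name == "Asish", "Language"] <- "C"

print(df)
```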
Editing dataframes using the edit() command:
Follow the given steps to edit a DataFrame:
Step 1: Create an instance of a data frame. Here an empty data frame named
“myTable” is created with the data.frame() command.
myTable = data.frame()
Step 2: Use the edit() function to launch the data editor. Note that the
result of edit() is assigned back to “myTable”, so the changes made in the
editor are saved to the original object.
myTable = edit(myTable)
When the above command is executed, a data editor window opens.
Step 3: Fill in the table with the small roster printed in Step 4.
Note that variable names can be changed by clicking on their labels and typing
the new names. Variables can also be set as numeric or character. Once the
data has been entered, close the editor. Changes are saved automatically.
Step 4: Check out the resulting data frame by printing it.
> myTable
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
Adding rows and columns to the data frame
Adding extra rows: You can add an extra row using the rbind() command. The
syntax for this is given below,
newDF = rbind(df, the entries for the new row you have to add )
df = Original data frame
Note that you have to be careful when using rbind(): the data type of each
entry in the new row should match the data type of the corresponding column
in the existing rows.
Example:
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45))
cat("Before adding row\n")
print(df)
# Add a new row using rbind()
newDf = rbind(df, data.frame(Name = "Sandeep",
Language = "C",
Age = 23
))
cat("After Added a row\n")
print(newDf)
Output:
Before adding row
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After Added a row
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
4 Sandeep C 23
Adding extra columns: You can add an extra column using the cbind() command.
The syntax for this is given below,
newDF = cbind(df, the entries for the new column you have to add )
df = Original data frame
Example:
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45))
cat("Before adding column\n")
print(df)
# Add a new column using cbind()
newDf = cbind(df, Rank=c(3, 5, 1))
cat("After adding a column\n")
print(newDf)
Output:
Before adding column
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After adding a column
Name Language Age Rank
1 Amiya R 22 3
2 Raj Python 25 5
3 Asish Java 45 1
Adding new variables to DataFrame
In R, you can add new variables to a data frame based on existing ones. To do
that, first load the dplyr package using library(). Then call the mutate()
function, which adds new columns computed from existing ones.
Syntax:
library(dplyr)
newDF = mutate(df, new_var = <expression using existing_var>)
df = original data frame
new_var = name of the new variable
existing_var = an existing column; the expression is the operation applied to
it (e.g. log value, multiply by 10)
Example:
# R program to illustrate operation on a data frame
# Importing the dplyr library
library(dplyr)
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Original Dataframe\n")
print(df)
# Creating an extra variable column
# "log_Age" which is log of variable column "Age"
# Using mutate() command
newDf = mutate(df, log_Age = log(Age))
cat("After creating extra variable column\n")
print(newDf)
Output:
Original Dataframe
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After creating extra variable column
Name Language Age log_Age
1 Amiya R 22 3.091042
2 Raj Python 25 3.218876
3 Asish Java 45 3.806662
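If dplyr is not installed, the same derived column can be produced with base R. The following is a minimal sketch, not part of the original example: it uses transform() and plain $ assignment on the same data frame.

```r
# Same data frame as in the mutate() example
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# transform() is part of base R and evaluates the expression using column names
newDf <- transform(df, log_Age = log(Age))

# Direct assignment with $ adds the same column to df itself
df$log_Age <- log(df$Age)

print(newDf)
```

transform() returns a new data frame and leaves the original unchanged, while $ assignment modifies the data frame in place.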
Deleting rows and columns from a data frame
To delete a row or a column, refer to it by its index and put a negative sign
in front of that index. The negative sign tells R to drop that row or column
from the result.
Syntax:
newDF = df[-rowNo, -colNo]
df = original data frame
Example:
R
# R program to illustrate operation on a data frame
# Creating a dataframe
df = data.frame(
"Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45)
)
cat("Before deleting the 3rd row and 2nd column\n")
print(df)
# delete the third row and the second column
newDF = df[-3, -2]
cat("After deleting the 3rd row and 2nd column\n")
print(newDF)
Output:
Before deleting the 3rd row and 2nd column
Name Language Age
1 Amiya R 22
2 Raj Python 25
3 Asish Java 45
After deleting the 3rd row and 2nd column
Name Age
1 Amiya 22
2 Raj 25
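Negative indices are not the only option. The sketch below, using the same illustrative data, drops a column by name, drops rows by a condition, and shows one pitfall: when only a single column is left, R returns a plain vector unless drop = FALSE is given. The variable names are chosen for this illustration only.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Drop a column by name instead of by position
noLang <- df[, names(df) != "Language"]

# Drop rows by condition: keep only the rows with Age below 40
young <- df[df$Age < 40, ]

# Removing all but one column gives a vector unless drop = FALSE is used
asVector <- df[, -c(2, 3)]
asFrame <- df[, -c(2, 3), drop = FALSE]
print(class(asVector))
print(class(asFrame))
```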
6. Import data into R from text and excel files using read.table() and
read.csv() functions.
How To Import Data from a File in R Programming
Data is a collection of facts and can come in different forms. To analyze data
using the R programming language, it must first be imported into R from a file,
which can be in formats such as txt, CSV, or any other delimiter-separated
file. Once imported, the data can be manipulated, analyzed, and reported.
This section shows how to import different kinds of files in R.
Import CSV file into R
Method 1: Using the read.csv() function.
Here we import a CSV file using the read.csv() function in R.
Syntax: read.csv(path, header = TRUE, sep = ",")
Arguments :
path : The path of the file to be imported
header : Whether the first line contains the column headings. TRUE by default.
sep = "," : The separator for the values in each row.
R
# specifying the path
path <- "/gfg.csv"
# reading contents of csv file
content <- read.csv(path)
# contents of the csv file
print (content)
Output:
ID Name Post Age
1 5 H CA 67
2 6 K SDE 39
3 7 Z Admin 28
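Method 2: Using the read.table() function.
Here we use the read.table() function to import a CSV file into R. The
separator must be given explicitly as a single character, because read.table()
splits on whitespace by default.
R
# simple R program to read csv file using read.table()
# strip.white = TRUE removes the spaces that follow each comma in the file
x <- read.table("D://Data//myfile.csv", header = TRUE, sep = ",", strip.white = TRUE)
# print x
print(x)
Output:
Col1 Col2 Col3
1 100 a1 b1
2 200 a2 b2
3 300 a3 b3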
Importing Data from a Text File
You can import a .txt file using the base R function read.table(), which reads
a file in table format. This function is easy to use and flexible.
Syntax:
# read data stored in .txt file
x <- read.table("file_name.txt", header = TRUE)   # or header = FALSE
R
# Simple R program to read txt file
x<-read.table("D://Data//myfile.txt", header=FALSE)
# print x
print(x)
Output:
V1 V2 V3
1 100 a1 b1
2 200 a2 b2
3 300 a3 b3
If the header argument is set to TRUE, the first line of the file is read as
the column names. The example above uses header=FALSE, so the columns receive
the default names V1, V2 and V3.
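The effect of the header argument can be tried without any external file. The sketch below writes a small illustrative table to a temporary file (the contents are made up for this demonstration) and reads it back both ways.

```r
# Write a small whitespace-separated file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("Col1 Col2 Col3",
             "100 a1 b1",
             "200 a2 b2",
             "300 a3 b3"), tmp)

# header = TRUE: the first line supplies the column names
with_header <- read.table(tmp, header = TRUE)
print(with_header)

# header = FALSE: the first line is treated as data, columns are named V1, V2, ...
no_header <- read.table(tmp, header = FALSE)
print(no_header)

# Remove the temporary file
unlink(tmp)
```

With header = FALSE the heading line becomes an ordinary data row, so the result has one extra row and every column is read as text.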
Importing Data from a delimited file
R has a function read.delim() to read delimited files into a data frame. By
default the fields are separated by a tab, written as sep="\t"; the separator
can also be a comma (,), a dollar symbol ($), etc.
Syntax: read.delim("file_name.txt", sep="\t", header=TRUE)
R
x <- read.delim("D://Data//myfile.csv", sep="|", header=TRUE)
# print x
print(x)
# print type of x
typeof(x)
Output:
X.V1.V2.V3
1 1, 100, a1, b1
2 2, 200, a2, b2
3 3, 300, a3, b3
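[1] "list"
typeof() reports "list" because a data frame is stored internally as a list of
columns. Each line ends up in a single column here because the file does not
actually use | as its separator.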
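For comparison, here is a minimal sketch in which the file really is pipe-delimited. The file contents are invented for illustration and written to a temporary file, so the example runs on its own.

```r
# Create a small pipe-delimited file in a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("V1|V2|V3",
             "100|a1|b1",
             "200|a2|b2",
             "300|a3|b3"), tmp)

# sep must match the delimiter actually used in the file
x <- read.delim(tmp, sep = "|", header = TRUE)
print(x)

# class() shows the object is a data frame; typeof() shows its storage type
print(class(x))
print(typeof(x))

# Remove the temporary file
unlink(tmp)
```

When the separator matches, each field lands in its own column instead of the whole line being read as one value.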
Importing a JSON file in R
Here we use the rjson package to import a JSON file into R.
R
# Read a JSON file
# Load the package required to read JSON files.
library("rjson")
# Give the input file name to the function.
res <- fromJSON(file = "E:\\exp.json")
# Print the result.
print(res)
Output:
$ID
[1] "1" "2" "3" "4" "5"
$Name
[1] "Mithuna" "Tanushree" "Parnasha" "Arjun" "Pankaj"
$Salary
[1] "722.5" "815.2" "1611" "2829" "843.25"
Importing an XML file in R
To import an XML file, we use the XML package in R.
XML file for demonstration:
XML
<RECORDS>
<STUDENT>
<ID>1</ID>
<NAME>Alia</NAME>
<MARKS>620</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>Brijesh</NAME>
<MARKS>440</MARKS>
<BRANCH>Commerce</BRANCH>
</STUDENT>
<STUDENT>
<ID>3</ID>
<NAME>Yash</NAME>
<MARKS>600</MARKS>
<BRANCH>Humanities</BRANCH>
</STUDENT>
<STUDENT>
<ID>4</ID>
<NAME>Mallika</NAME>
<MARKS>660</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
<STUDENT>
<ID>5</ID>
<NAME>Zayn</NAME>
<MARKS>560</MARKS>
<BRANCH>IT</BRANCH>
</STUDENT>
</RECORDS>
Reading XML file:
It can be read after installing the package, by parsing it with the xmlParse()
function.
R
# loading the library and other important packages
library("XML")
library("methods")
# the contents of sample.xml are parsed
data <- xmlParse(file = "sample.xml")
print(data)
Output:
1
Alia
620
IT
2
Brijesh
440
Commerce
3
Yash
600
Humanities
4
Mallika
660
IT
5
Zayn
560
IT
Importing an SPSS .sav File into R
Here we read an SPSS .sav file in R. For this, we use the read_sav() function
from the haven package.
Syntax: read_sav("FileName.sav")
R
# import haven library package
library("haven")
# Use read_sav() function to read SPSS file
dataframe <- read_sav("SPSS.sav")
dataframe
7. Write code in R to find out whether number is prime or not.
R Program to Check Prime Number
A prime number is a positive integer greater than 1 whose only factors are 1
and the number itself. Numbers greater than 1 that are not prime are called
composite numbers. To decide whether a number is prime, we look for its
factors. This section presents four different ways to check for a prime number
in the R programming language.
Examples of prime numbers: 2, 3, 5, 7, 11, 13, 17, 19, ...
Note: 1 is neither a prime nor a composite number.
Concepts related to the topic
Prime Number: A positive integer greater than 1 that is divisible only by 1
and the number itself.
which( ): Returns the indices (positions) of the elements of a vector that
satisfy a condition.
all( ): Checks whether all elements of a logical vector are TRUE. It returns
TRUE if they are, else FALSE.
length( ): Returns the number of elements of a vector.
sum( ): Adds all the numeric values in a vector.
seq(from, to, by): Generates a sequence of numbers running from `from` to `to`
in steps of `by`.
Example 1: General way to find prime number
R
Find_Prime_No <- function(n1) {
if (n1 == 2) {
return(TRUE)
}
if (n1 <= 1) {
return(FALSE)
}
for (i in 2:(n1-1)) {
if (n1 %% i == 0) {
return(FALSE)
}
}
return(TRUE)
}
numb_1 <- 13
if (Find_Prime_No(numb_1)) {
# Using paste function to include the number in the output
print(paste(numb_1, "is a prime number"))
} else {
print("It is not a prime number")
}
Output:
[1] "13 is a prime number"
First we define a function that checks whether a number is prime or not.
An if condition handles the number 2, a special case because it is the only
even prime number.
A second if condition returns FALSE when n1 is less than or equal to 1.
A for loop then runs from 2 to (n1 - 1) and checks whether n1 is divisible by
any of these numbers.
If it is, n1 has a factor other than 1 and itself, so the function returns
FALSE.
If the loop finishes without finding a divisor, n1 is prime and the function
returns TRUE.
Example 2: Use ‘all’ Function to find prime number
R
# Find_Prime_No function to check if a number is prime number or not
Find_Prime_No <- function(n1) {
#if n1 is less than equal to 1 than return FALSE
if (n1 <= 1) {
return(FALSE)
}
#return true if n1 is 2 because 2 is special case i.e only even prime number
if (n1 == 2) {
return(TRUE)
}
#div_vector holds the candidate divisors from 2 to (n1-1)
div_vector <- 2:(n1-1)
#all() returns TRUE only if the remainder is non-zero for every element of div_vector
return(all(n1 %% div_vector != 0))
}
# Define a number to check whether it is prime or not
numb_1 <- 39
# use if statement to check whether the number is prime using the 'all' function
if (Find_Prime_No(numb_1)) {
# print that it is a prime number if the condition is true
print(paste(numb_1, "is a prime number"))
} else {
# print that it is not a prime number if the condition is false
print(paste(numb_1, "is not a prime number"))
}
Output:
[1] "39 is not a prime number"
In this example, we define the Find_Prime_No function to check whether a
number is prime or not.
An if condition returns FALSE when n1 is less than or equal to 1.
A second if condition handles the number 2, a special case because it is the
only even prime number.
div_vector is declared as the numbers from 2 to (n1-1).
The all() function checks that the remainder is not equal to 0 when n1 is
divided by each element of div_vector.
If this holds for every element, the function returns TRUE and the program
prints that the number is prime; otherwise it returns FALSE.
Example 3: Use ‘which’ Function to find prime number
R
#this function to check if a number is prime or not
Find_Prime_No <- function(n1) {
#return true if n1 is 2 because 2 is special case i.e only even prime number
if (n1 == 2) {
return(TRUE)
}
#if n1 is less than equal to 1 than return FALSE
if (n1 <= 1) {
return(FALSE)
}
#div_vector from 2 to square root of n1
div_vector <- 2:sqrt(n1)
#which() identifies the positions where n1 %% div_vector == 0 is TRUE
#length() counts the number of elements in the result obtained from which()
#If the length is 0, it means that no divisors were found and return true
return(length(which(n1 %% div_vector == 0)) == 0)
}
# Define a Prime_no to check whether it is prime or not
Prime_no <- 31
# use if statement to check the number is prime using the 'which' function
if (Find_Prime_No(Prime_no)) {
# print "Yes it is a prime number" if the condition is true
print("Yes it is a prime number")
} else {
#print "No it is not a prime number" if the condition is false
print("No it is not a prime number")
}
Output:
[1] "Yes it is a prime number"
In this example, we define the Find_Prime_No function to check whether a
number is prime or not.
An if condition handles the number 2, a special case because it is the only
even prime number.
A second if condition returns FALSE when n1 is less than or equal to 1.
div_vector is declared as the numbers from 2 to the square root of n1.
The which() function identifies the positions where n1 %% div_vector == 0 is
TRUE.
The length() function counts the number of elements in the result obtained
from which().
If the length is 0, no divisors were found, so the function returns TRUE and
the program prints that the number is prime; otherwise it returns FALSE.
Example 4: Find Prime Number Using Vectorized Operations
R
# This function to check if a number is prime or not
Find_Prime_No <- function(n1) {
#return true if n1 is 2 because 2 is special case i.e only even prime number
if (n1 == 2) {
return(TRUE)
}
#if n1 is less than equal to 1 then return FALSE
if (n1 <= 1) {
return(FALSE)
}
#div_vector from 2 to (n1-1)
div_vector <- 2:(n1-1)
# n1 %% div_vector finds the remainder when n1 is divided by each element of div_vector.
#'sum' counts the TRUE values resulting from the condition.
#If the sum is 0, then there are no divisors other than 1 and the number itself.
return(sum(n1 %% div_vector == 0) == 0)
}
# Define a number to check whether it is prime or not
numb_1 <- 19
# use if statement to check the number is prime using the 'sum' function
if (Find_Prime_No(numb_1)) {
# print "Yes it is a prime number" if the condition is true
print("Yes it is a prime number")
} else {
#print "No it is not a prime number" if the condition is false
print("No it is not a prime number")
}
Output:
[1] "Yes it is a prime number"
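The examples above test one number at a time. As a further sketch, the same idea can be applied to a whole range to list all primes in it. The helper name is_prime and the range 1 to 20 are choices made for this illustration; the helper tries divisors only up to the square root and treats 2 and 3 separately so that the divisor range is never reversed.

```r
# Primality test that only tries divisors up to the square root of n1
is_prime <- function(n1) {
  if (n1 <= 1) return(FALSE)
  if (n1 <= 3) return(TRUE)        # 2 and 3 are prime
  divisors <- 2:floor(sqrt(n1))    # n1 >= 4 here, so the range starts at 2 and ends at 2 or more
  all(n1 %% divisors != 0)
}

# Apply the check to every number from 1 to 20 and keep the primes
candidates <- 1:20
primes <- candidates[sapply(candidates, is_prime)]
print(primes)
```

sapply() calls is_prime() once per candidate and returns a logical vector, which is then used for logical indexing.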
8. Print numbers from 1 to 100 using while loop and for loop in R.
R while Loop
while loops are used when you don't know the exact number of times a block
of code is to be repeated. The basic syntax of while loop in R is:
while (test_expression)
{
# block of code
}
number = 1
# while loop to print numbers from 1 to 5
while(number <= 10) {
print(number)
# increment number by 1
number = number + 1
# break if number is 6
if (number == 6) {
break
}
}
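The example above stops at 5 because of the break statement. For the task stated in the heading, a minimal sketch of a while loop that prints the numbers from 1 to 100 could look like this:

```r
# Start the counter at 1
number <- 1

# Repeat as long as the counter has not passed 100
while (number <= 100) {
  print(number)
  # increment the counter, otherwise the loop would never end
  number <- number + 1
}
```

The loop ends when number becomes 101, because the condition number <= 100 is then FALSE.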
For-loops in R
In many programming languages, a for-loop is a way to iterate across a
sequence of values, repeatedly running some code for each value in the list. In
R, the general syntax of a for-loop is
for(var in sequence) {
code
}
for (i in 1:100){
if(i %% 2 == 0 && i %% 5 == 0){
print("both")
}
else if(i %% 2==0){
print("two")
}
else if (i %% 5 ==0){
print("five")
}
else{
print(i)
}
}
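The loop above replaces multiples of 2 and 5 with words. To simply print every number from 1 to 100, as the task in the heading asks, a minimal sketch is:

```r
# i takes each value of the sequence 1, 2, ..., 100 in turn
for (i in 1:100) {
  print(i)
}
```

Unlike the while loop, no manual increment is needed, because the for loop steps through the sequence 1:100 by itself.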
9. Write a program to import data from csv file and print the data on
the console.
Working with CSV files in R Programming
CSV files are plain text files in which the values of each row are separated
by a delimiter, such as a comma or a tab. This section uses the following
sample CSV file:
sample.csv
id,name,department,salary,projects
1,A,HR,60754,14
2,B,Tech,59640,3
3,C,Marketing,69040,8
4,D,HR,65043,5
5,E,Tech,59943,2
6,F,IT,65000,5
7,G,HR,69000,7
Reading a CSV file
The contents of a CSV file can be read as a data frame in R using the
read.csv(…) function. The CSV file to be read should be either present in the
current working directory or the directory should be set accordingly using the
setwd(…) command in R. The CSV file can also be read from a URL
using read.csv() function.
Examples:
csv_data <- read.csv(file = 'sample.csv')
print(csv_data)
# print number of columns
print (ncol(csv_data))
# print number of rows
print(nrow(csv_data))
Output:
id name department salary projects
1 1 A HR 60754 14
2 2 B Tech 59640 3
3 3 C Marketing 69040 8
4 4 D HR 65043 5
5 5 E Tech 59943 2
6 6 F IT 65000 5
7 7 G HR 69000 7
[1] 5
[1] 7
The header argument is set to TRUE by default. The header line is not
included in the count of rows, therefore this CSV has 7 rows and 5 columns.
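To try this without preparing a file by hand, the sketch below first writes the rows shown in the output above to a temporary CSV file and then reads it back. The use of a temporary file is an assumption made so that the example is self-contained; in the lab the file would be sample.csv in the working directory.

```r
# Create a temporary CSV file holding the sample data
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,department,salary,projects",
             "1,A,HR,60754,14",
             "2,B,Tech,59640,3",
             "3,C,Marketing,69040,8",
             "4,D,HR,65043,5",
             "5,E,Tech,59943,2",
             "6,F,IT,65000,5",
             "7,G,HR,69000,7"), tmp)

# Read the file into a data frame and print it on the console
csv_data <- read.csv(tmp)
print(csv_data)

# dim() gives the number of rows and columns in one call
print(dim(csv_data))

# str() shows the type that read.csv() chose for each column
str(csv_data)

# Remove the temporary file
unlink(tmp)
```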
Querying with CSV files
SQL-like queries can be performed on the CSV content, and the matching rows
can be retrieved using the subset(csv_data, condition) function in R. Several
conditions can be combined in one call by joining them with logical operators
(& for AND, | for OR). The result is returned as a data frame.
Examples:
csv_data <- read.csv(file ='sample.csv')
min_pro <- min(csv_data$projects)
print (min_pro)
Output:
[1] 2
Aggregate functions (min, max, sum etc.) can be applied to the CSV data.
Here the min() function is applied to the projects column, selected with the
$ symbol. The minimum number of projects, which is 2, is returned.
csv_data <- read.csv(file ='sample.csv')
new_csv <- subset(csv_data, department == "HR" & projects <10)
print (new_csv)
Output:
id name department salary projects
4 4 D HR 65043 5
7 7 G HR 69000 7
The subset that satisfies the conditions given as arguments is returned as a
data frame. Employees D and G are in HR and have fewer than 10 projects. The
row numbers of the original data frame are retained in the result.
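The following sketch shows both logical operators in subset(). To keep it self-contained, the data frame is built directly from the values in the output table above instead of being read from sample.csv, and the second query (IT department or salary above 69000) is a made-up example.

```r
# Same data as the sample CSV, built directly as a data frame
csv_data <- data.frame(
  id = 1:7,
  name = c("A", "B", "C", "D", "E", "F", "G"),
  department = c("HR", "Tech", "Marketing", "HR", "Tech", "IT", "HR"),
  salary = c(60754, 59640, 69040, 65043, 59943, 65000, 69000),
  projects = c(14, 3, 8, 5, 2, 5, 7)
)

# Conditions joined with & : both must hold
hr_small <- subset(csv_data, department == "HR" & projects < 10)
print(hr_small)

# Conditions joined with | : either may hold; select keeps only some columns
it_or_high <- subset(csv_data, department == "IT" | salary > 69000,
                     select = c(name, department, salary))
print(it_or_high)
```

The select argument is optional and restricts the result to the listed columns.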
Writing into a CSV file
The contents of a data frame can be written into a CSV file using
write.csv(data_frame, "output_name.csv"). The file is stored in the current
working directory under the given name.
Examples:
csv_data <- read.csv(file ='sample.csv')
new_csv <- subset(csv_data, department == "HR" & projects <10)
write.csv(new_csv, "new_sample.csv")
new_data <-read.csv(file ='new_sample.csv')
print(new_data)
Output:
X id name department salary projects
1 4 4 D HR 65043 5
2 7 7 G HR 69000 7
The column X contains the row numbers of the original data frame. To remove
it, pass the additional argument row.names = FALSE to the write.csv()
function.
csv_data <- read.csv(file ='sample.csv')
new_csv <- subset(csv_data, department == "HR" & projects <10)
write.csv(new_csv, "new_sample.csv", row.names = FALSE)
new_data <-read.csv(file ='new_sample.csv')
print(new_data)
Output:
id name department salary projects
1 4 D HR 65043 5
2 7 G HR 69000 7
The original row numbers are removed from the new CSV.
10. Write a program to demonstrate histogram in R.
Histograms in R language
A histogram is a graphical representation that groups data points into
successive numerical intervals (bins). Each bin is drawn as a rectangle whose
height is proportional to the frequency of the values falling in that interval.
It resembles a vertical bar graph, except that there are no gaps between the
bars.
R – Histograms
You can create histograms in the R programming language using the hist()
function.
Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)
Parameters:
v: A vector of numeric values used in the histogram.
main: The title of the chart.
col: The fill color of the bars.
xlab: The label for the horizontal axis.
border: The border color of each bar.
xlim: The range of values shown on the x-axis.
ylim: The range of values shown on the y-axis.
breaks: The break points between bars, or a suggested number of bars.
Creating a simple Histogram in R
A simple histogram is created using the parameters above. The vector v is
plotted using hist().
Example:
R
# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19,
27, 39)
# Create the histogram.
hist(v, xlab = "No.of Articles ",
col = "green", border = "black")
Output:
[Figure: Histograms in R language]
Range of X and Y values
To set the range of values shown on each axis:
Use the xlim and ylim parameters for the X-axis and Y-axis respectively.
Pass them along with the other parameters required to make the histogram.
Example
R
# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14,
19, 27, 39)
# Create the histogram.
hist(v, xlab = "No.of Articles", col = "green",
border = "black", xlim = c(0, 50),
ylim = c(0, 5), breaks = 5)
Output:
[Figure: Histograms in R language]
Using histogram return values for labels using text()
hist() returns an object holding the bin counts and midpoints, which text()
can use to label each bar with its count.
R
# Creating data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19,
27, 39, 120, 40, 70, 90)
# Creating the histogram.
m<-hist(v, xlab = "Weight", ylab ="Frequency",
col = "darkmagenta", border = "pink",
breaks = 5)
# Setting labels
text(m$mids, m$counts, labels = m$counts,
adj = c(0.5, -0.5))
Output:
[Figure: Histograms in R language]
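The values that text() uses above can be inspected directly. In the sketch below, hist() is called with plot = FALSE so that nothing is drawn and only the returned object is examined; the data vector is the same one used in the example.

```r
# Same data as in the example above
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19,
       27, 39, 120, 40, 70, 90)

# plot = FALSE computes the histogram without drawing it
m <- hist(v, breaks = 5, plot = FALSE)

# Break points between the bars
print(m$breaks)
# Number of values falling in each bar
print(m$counts)
# Midpoint of each bar, used as the x position of the labels
print(m$mids)
```

Note that when breaks is a single number, R treats it as a suggestion, so the number of bars actually produced may differ from it.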
Histogram using non-uniform width
A histogram with bars of different widths can be created by passing a vector
of unequal break points to the breaks parameter.
Example
R
# Creating data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14,19,
27, 39, 120, 40, 70, 90)
# Creating the histogram.
hist(v, xlab = "Weight", ylab = "Frequency",
xlim = c(50, 100),
col = "darkmagenta", border = "pink",
breaks = c(5, 55, 60, 70, 75, 80, 100, 140))
Output: