R and RStudio
Oliver Hinder
12/23/2020
R and RStudio
- R is a data analysis environment:
  - the programming language
  - the system of libraries and books around it
- RStudio is an integrated development environment (IDE).
  - If you are already a computer programmer, you can use your own editing tools to work in R.
  - RStudio includes an editor, console, environment viewer, and history.
  - It also gives access to help files.
- Goal of this lecture: a quick-start guide to R if either
  - you have programmed before but not in R, or
  - you are a little rusty in R.
R Markdown
- R Markdown is a format for writing documents and presentations.
- This is an R Markdown presentation.
- For more details see http://rmarkdown.rstudio.com.
- When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code.
- HWs and projects should be done in R Markdown.
Action item: Open this presentation
(IntroductionToSoftwareTools/LectureS2.Rmd) and try knitting it.
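Knitting can also be started from the console. The following is a minimal sketch, assuming the rmarkdown package and pandoc (which RStudio bundles) are available; the file contents and title are made up for illustration.

```r
# Build a tiny R Markdown file in a temporary location.
# The title and chunk contents are hypothetical examples.
fence <- strrep("`", 3)  # three backticks, the delimiter of an R chunk
rmd_lines <- c(
  "---",
  "title: \"My first document\"",
  "output: html_document",
  "---",
  "",
  "Some text, followed by an embedded R chunk:",
  "",
  paste0(fence, "{r}"),
  "3 + 5",
  fence
)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Render the file; this is roughly what the Knit button does for the open file.
# The render step is skipped if rmarkdown or pandoc is not installed.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```

The generated HTML file should contain both the text and the output of the chunk (the result of 3 + 5).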
Cheatsheets
- Click Help in the menu bar, then Cheatsheets, then click the cheatsheet you want to look at.
Action item: Explore the RStudio IDE Cheat Sheet and the R Markdown Cheat Sheet.
Interactive R
R can be used as a calculator
3+5
## [1] 8
Run a function
x <- rnorm(20)
y <- rexp(20)
plot(x,y)
[Figure: scatter plot of y against x]
What does a function do?
Type ? in front of a function's name in the console to find out what the function does.
?rnorm
?rexp
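Besides ?, base R has a few other helpers for inspecting a function. A brief sketch (the variable name arg_names is just for illustration):

```r
# args() prints a function's argument list without opening the help page.
args(rnorm)

# formals() returns the arguments and their default values as a list.
arg_names <- names(formals(rnorm))
print(arg_names)

# In an interactive session you can also try:
#   help("rnorm")     # long form of ?rnorm
#   ??"exponential"   # search all help pages for a keyword
#   example("mean")   # run the examples from a help page
```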
Creating your own function
add_two <- function(input_number) {
output_number <- input_number + 2
return(output_number)
}
add_two(3)
## [1] 5
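Functions can take several arguments, and arguments can have default values. A small sketch, where add_n is a hypothetical generalization of add_two:

```r
# add_n is a hypothetical generalization of add_two:
# the amount to add becomes a second argument with a default value.
add_n <- function(input_number, n = 2) {
  input_number + n  # the last evaluated expression is returned
}

add_n(3)           # uses the default n = 2, giving 5
add_n(3, n = 10)   # overrides the default, giving 13
add_n(c(1, 2, 3))  # works elementwise on a vector, giving 3 4 5
```

Note that return() is optional here; an R function returns the value of its last expression.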
Adding comments
# This is a comment in R.
# Remember to use comments to document your code.
Some data types
Some standard data types
- Logical
TRUE == FALSE # True equals false?
## [1] FALSE
!FALSE # Not false?
## [1] TRUE
FALSE | TRUE # Either true or false?
## [1] TRUE
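Logical values most often come from comparisons, and they work elementwise on vectors. A short sketch with made-up numbers:

```r
TRUE & FALSE      # AND: FALSE
xor(TRUE, FALSE)  # exclusive OR: TRUE

# Comparisons produce logical values, one per element.
values <- c(1, 5, 10)
is_large <- values > 3  # FALSE TRUE TRUE

any(is_large)  # TRUE: at least one element is TRUE
all(is_large)  # FALSE: not every element is TRUE
sum(is_large)  # 2: TRUE counts as 1 and FALSE as 0
```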
More data types
- Floats
x <- 1.0
# Double means a floating-point number stored with 64 bits of precision
# https://en.wikipedia.org/wiki/Double-precision_floating-point_
typeof(x)
## [1] "double"
class(x)
## [1] "numeric"
y <- 1
class(y) # This is still a floating point number
## [1] "numeric"
- Integers
z <- 1L # Add L to specify that it is an integer
class(z)
## [1] "integer"
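The difference between integers and doubles shows up when converting and comparing numbers. A brief sketch:

```r
is.integer(1)     # FALSE: a plain 1 is stored as a double
is.integer(1L)    # TRUE
typeof(1L + 1.5)  # "double": mixing integer and double gives a double
as.integer(3.9)   # 3: conversion truncates toward zero, it does not round

# Floating-point arithmetic is approximate, so exact comparison can surprise.
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: compares within a small tolerance
```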
More data types
- Characters
my_string <- "hello"
class(my_string)
## [1] "character"
print(my_string)
## [1] "hello"
- Dates
course_start <- as.Date("2020-01-18")
course_end <- as.Date("2020-05-01")
course_length <- course_end - course_start
print(course_length)
## Time difference of 104 days
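A few more operations on characters and dates, as a sketch that reuses the values from this slide (greeting is a name introduced here for illustration):

```r
my_string <- "hello"
nchar(my_string)                       # 5 characters
toupper(my_string)                     # "HELLO"
greeting <- paste(my_string, "world")  # joins with a space: "hello world"

course_start <- as.Date("2020-01-18")
course_end <- as.Date("2020-05-01")
course_length <- course_end - course_start

as.numeric(course_length)  # 104, as a plain number of days
course_start + 7           # adding a number shifts the date by that many days
course_end > course_start  # TRUE: dates can be compared
```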
Some data structures
- Vectors (numbers)
  - You can apply functions over entire vectors (vectorization)
a <- c(3, 5, 3, 7, 10)
b <- c(1, 2, 3, 4, 5)
1/b
## [1] 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000
sum((a-mean(a))^2)/(length(a)-1)
## [1] 8.8
var(a)
## [1] 8.8
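Vectors also support elementwise arithmetic and indexing. A short sketch using the same a and b:

```r
a <- c(3, 5, 3, 7, 10)
b <- c(1, 2, 3, 4, 5)

a + b      # elementwise sum: 4 7 6 11 15
a[2]       # second element (indexing starts at 1): 5
a[a > 4]   # elements greater than 4: 5 7 10
length(a)  # 5
sd(a)^2    # the squared standard deviation equals var(a)
```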
More data structures
- Factors are a more efficient way of storing categorical data (optionally ordered)
  - Each unique categorical value is replaced by an integer
  - Less memory than storing each string
factored_data <- factor(c("rain", "sun",
"rain", "rain",
"sun"))
factored_data
## [1] rain sun rain rain sun
## Levels: rain sun
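The integer codes behind a factor can be inspected directly. A sketch; the ordered factor sizes at the end is a made-up example:

```r
factored_data <- factor(c("rain", "sun", "rain", "rain", "sun"))

levels(factored_data)      # the unique values: "rain" "sun"
as.integer(factored_data)  # the stored integer codes: 1 2 1 1 2
weather_counts <- table(factored_data)  # counts per level: rain 3, sun 2

# An ordered factor (hypothetical example): levels have a meaningful order.
sizes <- factor(c("low", "high", "medium"),
                levels = c("low", "medium", "high"),
                ordered = TRUE)
sizes[1] < sizes[2]  # TRUE: "low" comes before "high"
```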
Sequences
- You can construct a sequence of values, much like the counter of a for loop
seq(from=-5, to=5, by=2)
## [1] -5 -3 -1 1 3 5
seq(-5, 5, 2)
## [1] -5 -3 -1 1 3 5
seq(-5, 5, by=2)
## [1] -5 -3 -1 1 3 5
?seq
- You can construct a sequence using :
-2:4
## [1] -2 -1 0 1 2 3 4
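Sequences are what a for loop iterates over. A small sketch, with total as an illustrative variable name:

```r
# Loop over a sequence and accumulate its values.
total <- 0
for (i in seq(-5, 5, by = 2)) {
  total <- total + i
}
total  # 0, since the values are symmetric around zero

seq(0, 1, length.out = 5)  # fix the number of points: 0.00 0.25 0.50 0.75 1.00
seq_len(4)                 # 1 2 3 4
rev(1:3)                   # 3 2 1
```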
Matrices
- Create a matrix from a single vector
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
mat
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
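By default matrix() fills column by column, as the output above shows. A sketch of indexing and of filling by row instead (by_row is an illustrative name):

```r
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)

dim(mat)   # number of rows and columns: 3 2
mat[2, 2]  # element in row 2, column 2: 5
mat[, 1]   # the whole first column: 1 2 3
t(mat)     # transpose, a 2 x 3 matrix

# byrow = TRUE fills the matrix row by row.
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
by_row[1, ]  # first row: 1 2
```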
More matrices
- Create a matrix by binding columns together
mat2 <- cbind(1:4, c("dog", "cat", "bird", "dog"))
mat2
##      [,1] [,2]
## [1,] "1"  "dog"
## [2,] "2"  "cat"
## [3,] "3"  "bird"
## [4,] "4"  "dog"
- Create a matrix by binding rows together
mat3 <- rbind(c(1,2,4,5), c(6,7,0,4))
mat3
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    4    5
## [2,]    6    7    0    4
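Note that the numbers in mat2 are printed in quotes: a matrix holds a single data type, so binding numbers with strings converts everything to character. A sketch (ids is an illustrative name):

```r
mat2 <- cbind(1:4, c("dog", "cat", "bird", "dog"))
typeof(mat2)                  # "character": the numbers were converted
ids <- as.numeric(mat2[, 1])  # convert the first column back: 1 2 3 4

mat3 <- rbind(c(1, 2, 4, 5), c(6, 7, 0, 4))
dim(mat3)      # 2 4
rowSums(mat3)  # sum of each row: 12 17
colSums(mat3)  # sum of each column: 7 9 4 9
```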
Data frames
- Like a matrix, but the columns can have different data types.
- Columns can be named.
- Data frames are the primary data structure you will work with.
- Many of the most useful R packages work on data frames:
  - ggplot2, tidyr, reshape, dplyr
students <- data.frame(c("Cedric","Fred","George","Cho",
"Draco","Ginny"),
c(3,2,2,1,0,-1),
c("H", "G", "G", "R", "S", "G"))
Data frames
- Naming the columns:
names(students) <- c("name", "year", "house")
students
##     name year house
## 1 Cedric    3     H
## 2   Fred    2     G
## 3 George    2     G
## 4    Cho    1     R
## 5  Draco    0     S
## 6  Ginny   -1     G
- Access each column by its name
students$name
## [1] "Cedric" "Fred" "George" "Cho" "Draco" "Ginny"
students$year
## [1] 3 2 2 1 0 -1
students$house
## [1] "H" "G" "G" "R" "S" "G"
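Rows can be selected with a logical condition on a column. A sketch that rebuilds the same data frame so it runs on its own; stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version, and gryffindor_names is an illustrative name:

```r
students <- data.frame(
  name = c("Cedric", "Fred", "George", "Cho", "Draco", "Ginny"),
  year = c(3, 2, 2, 1, 0, -1),
  house = c("H", "G", "G", "R", "S", "G"),
  stringsAsFactors = FALSE
)

nrow(students)                     # 6 rows
students[students$house == "G", ]  # all columns, only the rows in house G
gryffindor_names <- students$name[students$house == "G"]  # "Fred" "George" "Ginny"
students[2, "name"]                # row 2 of column name: "Fred"
mean(students$year)                # average of the year column
str(students)                      # compact summary of column types
```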
Dataframes from vectors
- You can also create a data frame from a set of vectors
hogwarts_students <- data.frame(name = students$name,
year = students$year,
house = students$house)
hogwarts_students
##     name year house
## 1 Cedric    3     H
## 2   Fred    2     G
## 3 George    2     G
## 4    Cho    1     R
## 5  Draco    0     S
## 6  Ginny   -1     G
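Once a data frame exists, columns can be added and rows reordered. A sketch starting from standalone vectors; the column is_gryffindor and the variable sorted_students are made up for illustration:

```r
name <- c("Cedric", "Fred", "George", "Cho", "Draco", "Ginny")
year <- c(3, 2, 2, 1, 0, -1)
house <- c("H", "G", "G", "R", "S", "G")

# Column names are taken from the vector names.
hogwarts_students <- data.frame(name, year, house, stringsAsFactors = FALSE)
names(hogwarts_students)  # "name" "year" "house"

# Add a derived column by assigning to a new name.
hogwarts_students$is_gryffindor <- hogwarts_students$house == "G"
sum(hogwarts_students$is_gryffindor)  # 3 students in house G

# Sort the rows by year, smallest first.
sorted_students <- hogwarts_students[order(hogwarts_students$year), ]
sorted_students$name[1]  # "Ginny", who has the smallest year value
```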
Tips for coding
- Use descriptive variable names, e.g., hogwarts_students rather than x.
- Don't be afraid to use Google/Stack Overflow to figure out issues.
- Play around in the console to understand how things work.
- Come to office hours!
Installing packages
What makes R powerful is the available packages.
install.packages("name_of_package_goes_here")
For this course:
# paste this into the R console
install.packages(c("DMwR", "rattle"),
dependencies = c("Depends", "Suggests"))
This will install most of the packages we need.
RStudio will often detect when a package is not installed and suggest that you install it.
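To avoid reinstalling packages that are already present, you can first check which ones are missing. A minimal sketch, where needed and missing_pkgs are illustrative names; the install call is left commented out so that nothing is downloaded by accident:

```r
# Packages named on this slide.
needed <- c("DMwR", "rattle")

# requireNamespace() returns TRUE if a package is installed and can be loaded.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]
print(missing_pkgs)

# Install only what is missing (uncomment to run):
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# After installation, load a package with library() before using it:
# library(rattle)
```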
ModernDive
As an introduction to the R packages we will find useful in this course, we cover Chapters 1-4 of ModernDive before switching to the Applied Predictive Modeling textbook for the core course content.
Learning checkpoint
- Read Chapter 1 of ModernDive.
- Create an R Markdown document containing the answers to the learning checks in Chapter 1 of ModernDive.
- Submit the Rmd file with your answers via Canvas.