Statistics-with-R (R Cheatsheet)

Cleaning Data:

library(dplyr)

Peek into data
glimpse(data)
str(data)
head(data, number_of_rows)
tail(data)
args(function_name) # Shows the arguments of a function, e.g. args(head)
summary(data) # Gives min, max, mean, median, quartiles and the number of NA's
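The inspection functions above can be tried on a small data frame. This is a minimal sketch; the data frame `data` and its columns `id` and `score` are made up for illustration.

```r
library(dplyr)

# Hypothetical example data with one missing value
data <- data.frame(id = 1:5, score = c(2.5, NA, 4.0, 3.5, 5.0))

glimpse(data)               # compact overview: one line per column
str(data)                   # base R structure overview
first_rows <- head(data, 2) # second argument is the number of rows
last_rows  <- tail(data, 2) # last 2 rows
summary(data)               # per-column summary, including the NA count
```

Note that the second argument of `head()` and `tail()` counts rows, not columns.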

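Change class of a column
class(data$column) # Shows the current class
data$column <- as.numeric(data$column) # Assign the result back to change the column
data$column <- as.factor(data$column)
...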
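A short sketch of a class conversion, assuming a made-up data frame whose `column` holds numbers stored as text. The conversion functions return a new vector, so the result has to be assigned back to the column.

```r
# Hypothetical data: numbers stored as character strings
data <- data.frame(column = c("1", "2", "3"), stringsAsFactors = FALSE)

class_before <- class(data$column)        # "character"
data$column  <- as.numeric(data$column)   # convert and assign back
class_numeric <- class(data$column)       # "numeric"
data$column  <- as.factor(data$column)    # convert to a factor
class_factor <- class(data$column)        # "factor"
```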

======================================================================================
Re-order data

library(tidyr)

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)
The most important function in tidyr is gather(). It should be used when you have columns that are not variables and you want to collapse them into key-value pairs.
Arguments: data, name of the new key column, name of the new value column, columns to gather (prefix a column with - to leave it out)
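As an illustration of gather(), the sketch below collapses two measurement columns into key-value pairs. The data frame `wide` and the column names `name`, `t1`, `t2`, `time` and `score` are invented for this example. In tidyr 1.0.0 and later, pivot_longer() is the recommended replacement, but gather() still works.

```r
library(tidyr)

# Hypothetical wide data: t1 and t2 are values of a "time" variable, not variables
wide <- data.frame(
  name = c("A", "B"),
  t1 = c(1, 3),
  t2 = c(2, 4)
)

# Collapse every column except name into a key column (time) and a value column (score)
long <- gather(wide, key = "time", value = "score", -name)
long
```

The result has one row per name and time combination, so 2 rows by 2 gathered columns give 4 rows.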

spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE, sep = NULL)
The opposite of gather() is spread(), which takes key-value pairs and spreads them across multiple columns. This is useful when values in a column should actually be column names (i.e. variables). It can also make data more compact and easier to read.
Arguments: data, key column, value column
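A minimal sketch of spread(), assuming a made-up long data frame in which the `time` column holds what should be column names. All names here are hypothetical. In tidyr 1.0.0 and later, pivot_wider() is the recommended replacement.

```r
library(tidyr)

# Hypothetical long data: one row per name and time point
long <- data.frame(
  name = c("A", "A", "B", "B"),
  time = c("t1", "t2", "t1", "t2"),
  score = c(1, 2, 3, 4)
)

# Turn the values of time into columns, filled with the matching score
wide <- spread(long, key = time, value = score)
wide
```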

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...)
The separate() function allows you to separate one column into multiple columns. Unless you tell it otherwise, it will attempt to separate on any character that is not a letter or number. You can also specify a specific separator using the sep argument.
Arguments: data, column to be separated, names of new columns, separator ("-", "/", "_", etc.)
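The following sketch splits a date-like string into three columns. The data frame and the column names `date`, `year`, `month` and `day` are assumptions made for the example.

```r
library(tidyr)

# Hypothetical data: one column holding year, month and day together
data <- data.frame(date = c("2020-01-15", "2021-12-31"))

# Default separator: any run of non-alphanumeric characters, here the "-"
parts <- separate(data, date, into = c("year", "month", "day"))

# Explicit separator, with convert = TRUE to turn the pieces into numbers
parts_num <- separate(data, date, into = c("year", "month", "day"),
                      sep = "-", convert = TRUE)
```

Without convert = TRUE the new columns stay character, so the month of the first row is "01" rather than 1.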

unite(data, col, ..., sep = "_", remove = TRUE)
The opposite of separate() is unite(), which takes multiple columns and pastes them together. By default, the contents of the columns will be separated by underscores in the new column, but this behavior can be altered via the sep argument.
Arguments: data, name of new united column, columns to be united, separator ("-", "/", "_", etc.)
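A small sketch of unite(), pasting two made-up columns `year` and `month` into a new column. The name `year_month` is hypothetical.

```r
library(tidyr)

# Hypothetical data: year and month stored in separate columns
data <- data.frame(year = c("2020", "2021"), month = c("01", "12"))

# Paste year and month together with "-" instead of the default "_"
united <- unite(data, "year_month", year, month, sep = "-")
united
```

Because remove = TRUE is the default, the original year and month columns are dropped from the result.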

===============================================================================
String Manipulation with stringr

library(stringr)

str_trim(" this is string ")
[1] "this is string"

# Pad a string with any character on either side
str_pad("256459", width = 10, side = "left", pad = "0")
[1] "0000256459"

toupper("string")
[1] "STRING"
tolower("STRING")
[1] "string"
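The trimming, padding and case functions above can be combined as in this sketch. The input strings are invented for illustration.

```r
library(stringr)

trimmed <- str_trim("  this is string  ")                     # removes leading and trailing spaces
padded_left  <- str_pad("42", width = 5, side = "left", pad = "0")
padded_right <- str_pad("42", width = 5, side = "right", pad = "0")
upper <- toupper("string")   # base R
lower <- tolower("STRING")   # base R
```

The width argument is the total length of the result, so a 2-character string padded to width 5 receives 3 pad characters.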

str_detect(data$column, "string")
Detects if "string" is present in each element of the column and returns a vector of TRUE and FALSE values.

str_replace(data$column, "string", "newstring")
Replaces the first match of "string" in each element of the column with "newstring". Use str_replace_all() to replace every match.
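The sketch below shows str_detect() together with the difference between str_replace() and str_replace_all(). The vector `x` is a made-up example.

```r
library(stringr)

# Hypothetical character vector
x <- c("banana", "apple", "cherry")

has_an    <- str_detect(x, "an")           # is "an" present in each element?
first_only <- str_replace(x, "a", "A")     # replaces only the first "a" per element
every_one  <- str_replace_all(x, "a", "A") # replaces every "a" per element
```

For "banana", str_replace() gives "bAnana" while str_replace_all() gives "bAnAnA".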

===============================================================================
Missing data

is.na(data)
Returns a matrix of TRUE and FALSE values, TRUE if the corresponding cell is NA

any(is.na(data))
Returns TRUE if there is any NA cell
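A minimal sketch of the missing-data checks, using a made-up data frame with two NA cells. The call to sum() is an addition not listed above; it counts the NA cells because TRUE is treated as 1.

```r
# Hypothetical data with one NA in each column
data <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

na_matrix <- is.na(data)   # logical matrix with the same shape as data
has_na    <- any(na_matrix) # TRUE if at least one cell is NA
na_count  <- sum(na_matrix) # number of NA cells
```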
