DSCI 100 Cheat Sheet

UBC DSCI 100 Cheat Sheet

Uploaded by

tholee0617


Week1:

print(data_frame, n = 10) <- prints the first n rows of a data frame (here 10); name the n argument.

nrow(data_frame) <- returns the number of rows.

filter(data_frame, column == value) <- keeps only the rows whose value in this column satisfies the
condition. Use == to compare; a single = does not test equality.
filter(data_frame, condition) <- keeps only the rows that satisfy the condition.

select(data_frame, column1, column2) <- keeps only the listed columns.

mutate(data_frame, new_column_name = expression) <- creates a new column (or overwrites an existing
one) computed from the expression.
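
The Week 1 verbs can be tried on a built-in data set. This is only a sketch: it uses mtcars, which is not part of the course notes, and the column name wt_lb is made up for illustration.

```r
library(dplyr)

# mtcars ships with R; any data frame would work here.
cars <- as_tibble(mtcars)

# Keep only the rows where the car has 4 cylinders.
four_cyl <- filter(cars, cyl == 4)

# Keep only the mpg and wt columns.
mpg_wt <- select(four_cyl, mpg, wt)

# Add a column computed from an existing one (wt is in 1000s of lb).
mpg_wt <- mutate(mpg_wt, wt_lb = wt * 1000)

print(mpg_wt, n = 5)   # show the first 5 rows
nrow(mpg_wt)           # number of rows left after filtering
```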

ggplot(data_frame, aes(x = x_column, y = y_column)) + geom_*() + xlab("") + ylab("") + theme(text
= element_text(size = n)) <- makes a plot. geom_*() picks the type of graph (e.g. geom_point), xlab and
ylab set the axis labels, and theme changes the label font size.
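
A minimal plot built from those layers might look like this. The data set (mtcars), the axis labels and the font size are assumptions chosen for illustration.

```r
library(ggplot2)

# Scatter plot of fuel economy against weight, with labels and larger text.
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  xlab("Weight (1000 lb)") +
  ylab("Miles per gallon") +
  theme(text = element_text(size = 14))

scatter   # printing the object draws the plot
```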

Week2:

Arguments:
file <- the file name, path to a file, or URL.
delim <- the character that separates columns in the file.
col_names <- TRUE if the first row of the file contains the column labels, FALSE if it does not. Can
also be a character vector of names to use as the column labels.
skip <- the number of lines to ignore at the top of the file, e.g. because they contain metadata.

CSV vs CSV2
read.csv is for data where commas are the separators and periods are the decimal marks, while
read.csv2 is for data where semicolons are the separators and commas are the decimal marks. The
readr functions read_csv and read_csv2 follow the same convention.

metadata <- data about data: not the data itself, but any information that describes some aspect of
the data.

read_csv("data/filename") <- reads a comma-separated file.

read_delim(file = "data/filename", delim = "separator") <- reads files with any separator: comma ",",
semicolon ";", tab "\t", …

read_excel(path = x, sheet = int) <- reads Excel spreadsheets (.xls or .xlsx files). sheet can be the
sheet number or the sheet name.

skip <- argument of the read_* functions; skips the first few lines, e.g. lines that summarize the
data.

col_names <- argument of the read_* functions; a character vector used to name the columns.
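
To see skip and col_names in action, the sketch below writes a tiny file to a temporary location and reads it back. The file contents and the column names are invented for the example.

```r
library(readr)

# Write a small example file so the sketch is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Source: made-up example",
  "Collected: unknown",
  "name,score",
  "Ann,90",
  "Bob,85"
), path)

# skip = 2 ignores the two metadata lines; the next line is the header.
with_header <- read_csv(path, skip = 2, show_col_types = FALSE)

# skip = 3 also skips the header, so we supply our own column names.
own_names <- read_csv(path, skip = 3,
                      col_names = c("student", "mark"),
                      show_col_types = FALSE)

# The same file read with read_delim and an explicit delimiter.
with_delim <- read_delim(path, delim = ",", skip = 2, show_col_types = FALSE)
```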

dbConnect(RSQLite::SQLite(), "data/can_lang.db") <- opens a connection to an SQLite database.

dbConnect(RPostgres::Postgres(), dbname = "xxx",
          host = "xxx", port = 5432,
          user = "xxx", password = "xxx") <- opens a connection to a PostgreSQL database.

dbListTables(conn_lang_data) <- lists the names of all the tables in the database.

ggplot(faithful, aes(x = …)) + geom_histogram(bins = n) <- plots a histogram with n bins. A histogram
takes only an x variable; the y-axis shows the count.

collect(table_reference) <- downloads the data from the database into R as a data frame.
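
The database functions fit together roughly as follows. This sketch uses an in-memory SQLite database filled with mtcars instead of a real .db file, and it assumes the RSQLite and dbplyr packages are installed; the table name "cars" is made up.

```r
library(DBI)
library(dplyr)   # tbl() and collect() on a database also need dbplyr installed

# An in-memory database stands in for a file such as "data/can_lang.db".
conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "cars", mtcars)

dbListTables(conn)               # names of the tables in the database

cars_db <- tbl(conn, "cars")     # a reference to the table, not the data itself
heavy <- cars_db %>%
  filter(wt > 5) %>%             # translated to SQL and run inside the database
  collect()                      # now the result is a data frame in R

dbDisconnect(conn)
```

Filtering before collect() keeps the download small, because only the matching rows leave the database.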

download.file(url, destfile = "file_location") <- downloads a file from the internet.

make.names(colnames(data_frame)) <- turns the column names into valid R names, e.g. replaces white
space with a period.
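
A quick check of what make.names does to awkward column names (the names below are invented for the example):

```r
df <- data.frame(a = 1:2, b = 3:4)
colnames(df) <- c("first name", "2nd col")

# Spaces become periods, and a name that starts with a digit gets an X prefix.
colnames(df) <- make.names(colnames(df))
colnames(df)   # "first.name" "X2nd.col"
```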

plot_grid(plot1, plot2, ncol = 1) <- combines two plots into one figure, stacked in a single column
(from the cowplot package).

write_csv(data_frame, "file/data_frame.csv") <- writes a data frame to a csv file.

col_names = FALSE <- used when the file has no row of column names.

Week3:

vector <- an ordered collection of one or more values of the same data type.

list <- an ordered collection of one or more values of possibly different data types.

data frame <- a list of vectors or lists of the same length, with column names. We typically use
a data frame to represent a data set.
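
The three structures side by side, using made-up values:

```r
v <- c(1, 2, 3)                      # vector: every value is numeric
mixed <- c(1, "a")                   # mixing types in a vector coerces them to character
l <- list(1, "a", TRUE)              # list: each element keeps its own type
df <- data.frame(id = v, label = c("x", "y", "z"))   # columns of equal length

class(mixed)    # "character"
is.list(df)     # TRUE: a data frame is a list of columns
```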

rows = observations, columns = variables, cells = values

across(col1:col2, function) <- applies function(s) to multiple columns; used inside mutate or summarize.

filter(data_frame, col == "value") <- subsets rows of a data frame.

group_by(data_frame, col) <- takes an existing data set and converts it into a grouped data set where
operations are performed "by group".

mutate(data_frame, new_name = expression) <- adds or modifies columns in a data frame.

map(data_frame, function) <- applies a function to each element of the input (each column, for a
data frame) and returns a list the same length as the input.

pivot_longer(data_frame, cols = (cols we want to combine), names_to = (name of the new col that
holds the names of the combined cols), values_to = (name of the new col that holds the values of
the combined cols)) <- generally makes the data frame longer and narrower.

pivot_wider(data_frame, names_from = (name of the col from which to take the variable
names), values_from = (name of the col from which to take the values)) <- generally makes a data
frame wider and decreases the number of rows.
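
A small round trip shows the two pivots undoing each other. The city and population values are invented for the example.

```r
library(tidyr)
library(tibble)

wide <- tibble(
  city   = c("A", "B"),
  `2020` = c(10, 20),
  `2021` = c(11, 21)
)

# Two year columns become one "year" column plus one "population" column.
long <- pivot_longer(wide, cols = c(`2020`, `2021`),
                     names_to = "year", values_to = "population")

# Spread the years back out into their own columns.
wide_again <- pivot_wider(long, names_from = year, values_from = population)
```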

rowwise() <- makes later functions work one row at a time, so values are combined across columns
within each row.

separate(data_frame, col = (name of the col we need to split), into = c("newcolname1",
"newcolname2"), sep = "/") <- splits a character column into multiple columns.
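
For instance, a date column can be split on "/" like this (the dates and the new column names are assumptions for the example):

```r
library(tidyr)
library(tibble)

dates <- tibble(date = c("2021/01/15", "2022/12/31"))

# One character column becomes three; the new columns are still character.
parts <- separate(dates, col = date,
                  into = c("year", "month", "day"), sep = "/")
```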

select(data_frame, col_name) <- subsets columns of a data frame.

summarize(data_frame, name = mean(col)) <- calculates summaries of the inputs (here the mean) and
returns a new data frame. The syntax is similar to mutate.

%>% <- takes the output of one statement and makes it the input of the next statement (makes the
code more human-readable).

facet_wrap(~ factor(Month, levels = c("Jan","Feb","Mar","Apr","May","Jun",
                                      "Jul","Aug","Sep","Oct","Nov","Dec"))) <- splits a plot into
one panel per month; the levels put the panels in calendar order instead of alphabetical order.
na.rm = TRUE <- argument of functions such as mean; ignores NA values in the calculation.

arrange(data_frame, col) or arrange(data_frame, desc(col)) <- sorts rows in ascending (the default) or
descending order. There is no asce() function.

arrange() <- reorders the rows of a data frame/table by the values of the named columns.

slice() <- lets you index rows by their (integer) locations.
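
Sorting and slicing on a three-row table with made-up scores:

```r
library(dplyr)

scores <- tibble(name = c("Ann", "Bob", "Cy"), score = c(85, 92, 78))

ascending  <- arrange(scores, score)         # lowest score first (the default)
descending <- arrange(scores, desc(score))   # highest score first
top_two    <- slice(descending, 1:2)         # rows 1 and 2 by position
```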

group_by + summarize<- Calculating summary statistics on one or more column(s) for each group.
It creates a new data frame—with one row for each group—containing the summary statistic(s) for
each column being summarized.
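
As a sketch on the built-in mtcars data (not part of the course notes), grouping by cyl gives one summary row per cylinder count. The result names mean_mpg and n_cars are made up.

```r
library(dplyr)

# One row per value of cyl, with the mean mpg and the group size.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE), n_cars = n())

# across() applies the same summary to several columns at once.
several <- mtcars %>%
  group_by(cyl) %>%
  summarize(across(c(mpg, wt), mean))
```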

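map_df(data_frame, mean, na.rm = TRUE) <- finds the average of each column in the data set, ignoring
NA values.

select(data_frame, -name) <- a minus sign in front of a name takes that column out. slice(data_frame,
-n) takes row n out in the same way.

map_df() <- iterates over the columns of a data frame, calculates a value for each column, and
returns the results as a data frame.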
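
The difference between map and map_df is the shape of the result. A small sketch with invented data (map_df also needs the dplyr package installed):

```r
library(purrr)

df <- data.frame(a = c(1, 2, NA), b = c(4, 5, 6))

# map returns a list with one element per column.
as_list <- map(df, mean, na.rm = TRUE)

# map_df returns the same values as a one-row data frame.
as_df <- map_df(df, mean, na.rm = TRUE)
```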

Week4:

is.na() <- returns TRUE for the values that are NA. Use !is.na() to find the values that are not NA.

!= <- not equal to; filter(data_frame, col != value) keeps the rows where col is not equal to value.
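
Both tests are usually used inside filter. The names and scores below are made up.

```r
library(dplyr)

df <- tibble(name = c("Ann", "Bob", "Cy"), score = c(85, NA, 78))

is.na(df$score)                          # FALSE TRUE FALSE

complete <- filter(df, !is.na(score))    # drop the row with a missing score
not_ann  <- filter(df, name != "Ann")    # keep everyone except Ann
```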

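ggplot(data_frame, aes(x = , y = )) + geom_point(aes(colour = (col to colour by), shape = (col to
shape by))) + labs(x = "", y = "", colour = "", shape = "") <- scatter plot with points coloured and
shaped by a variable; labs sets the axis and legend titles.

facet_grid(cols = vars(col1)) or facet_grid(rows = vars(col1)) <- splits one graph into side-by-side or
stacked panels, one per value of col1, so they can be compared. A layer added to ggplot.

ggplot(data_frame, aes(x = , y = )) + geom_bar(stat = "identity") + xlab("x") + ylab("y") +
theme(text = element_text(size = 20)) <- plots a bar chart with bar heights taken from the y column.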

coord_flip ()<- Flip cartesian coordinates so that horizontal becomes vertical, and vertical, horizontal.
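
A bar chart with flipped coordinates, using invented counts:

```r
library(ggplot2)

counts <- data.frame(fruit = c("apple", "banana", "cherry"), n = c(5, 3, 8))

# stat = "identity" uses the n column as the bar length;
# coord_flip() then draws the bars horizontally.
bars <- ggplot(counts, aes(x = fruit, y = n)) +
  geom_bar(stat = "identity") +
  xlab("Fruit") +
  ylab("Count") +
  coord_flip()

bars
```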

semi_join(data_frame1, data_frame2) <- keeps the rows of the first data frame that have a match in the
second; no columns from the second are added.
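
A sketch with two made-up tables shows that semi_join filters the first table without adding columns or duplicating rows:

```r
library(dplyr)

students <- tibble(id = 1:4, name = c("Ann", "Bob", "Cy", "Di"))
enrolled <- tibble(id = c(2L, 4L, 4L), course = c("stats", "stats", "math"))

# Rows of students whose id appears in enrolled. Di matches twice but
# is returned once, and the course column is not carried over.
matched <- semi_join(students, enrolled, by = "id")
```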

ggplot(marks_df, aes(x = , fill = y)) + geom_histogram() + facet_grid(rows = vars(y)) <- creates
histograms in separate panels (charts), one per value of y.
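
The same pattern on the built-in mtcars data, used here only as a stand-in for marks_df:

```r
library(ggplot2)

# One histogram of mpg per cylinder count, stacked in rows.
hists <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(bins = 10) +
  facet_grid(rows = vars(cyl)) +
  labs(x = "Miles per gallon", y = "Count", fill = "Cylinders")

hists
```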
