DSCI 100 Cheat Sheet

UBC DSCI 100 Cheat Sheet

Uploaded by

tholee0617

Week1:

print(data_frame, n = x) <- prints the first x rows.

nrow(data_frame) <- returns the number of rows.

filter(data_frame, column == value) <- keeps only the rows whose value in this column satisfies the
condition (use ==, not =, to test equality).
filter(data_frame, condition) <- keeps only the rows that satisfy the condition.

select(data_frame, column1, column2) <- keeps only the listed columns.

mutate(data_frame, new_column_name = expression) <- creates a new column computed from existing
columns.

ggplot(data_frame, aes(x = x_column, y = y_column)) + geom_*() + xlab("") + ylab("") +
theme(text = element_text(size = n)) <- makes a plot; geom_*() picks the type of graph, xlab and ylab
set the axis labels, and theme changes the label font size.
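
As a minimal sketch of how the Week 1 functions fit together, the example below uses R's built-in mtcars data set. The data set, the cut-off of 3, and the column name hp_per_wt are assumptions chosen for illustration; they are not from the course notes.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R; it stands in for any data frame here.
heavy_cars <- mtcars |>
  filter(wt > 3) |>            # keep rows where weight exceeds 3 (in 1000 lbs)
  select(mpg, wt, hp) |>       # keep only these three columns
  mutate(hp_per_wt = hp / wt)  # add a column computed from existing ones

nrow(heavy_cars)               # number of rows left after filtering

# Scatter plot with axis labels and a larger font size.
scatter <- ggplot(heavy_cars, aes(x = wt, y = mpg)) +
  geom_point() +
  xlab("Weight (1000 lbs)") +
  ylab("Miles per gallon") +
  theme(text = element_text(size = 14))
scatter
```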

Week2:

Arguments:
file <- The file name, path to a file, or URL.
delim <- The character that separates columns in the file.
col_names <- Specifies whether the first row of the file holds the column labels (TRUE/FALSE). Can
also be a character vector of names to use as column labels.
skip <- The number of lines at the top of the file to ignore, for example because they contain metadata.

CSV vs CSV2
read.csv is for data where commas are used as separators and periods as decimal marks, while
read.csv2 is for data where semicolons are used as separators and commas as decimal marks. The
same split applies to readr's read_csv and read_csv2.

metadata <- Data about data: not the data itself, but any information that describes some aspect of
the data.

read_csv("data/filename") <- reads comma-separated files.

read_delim(file = "data/filename", delim = "separator") <- reads files with any separator: comma (","),
semicolon (";"), tab ("\t"), …

read_excel(path = x, sheet = int) <- reads Excel spreadsheets (files ending in .xls or .xlsx); sheet picks
the sheet by position or by name.

skip <- argument within read_* functions, used to skip the first few lines when they summarize
information about the data.

col_names <- argument within read_* functions; a character vector used to name the columns.
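
The skip and col_names arguments are easiest to see on a tiny file. The sketch below writes a temporary semicolon-separated file with two metadata lines and reads it back; the file contents and the column names city and temp_c are made up for the example.

```r
library(readr)

# Create a small file: two metadata lines, then data without a header row.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Source: made-up example",
  "Collected: unknown",
  "Vancouver;12.5",
  "Toronto;9.1"
), path)

cities <- read_delim(
  file = path,
  delim = ";",                     # columns are separated by semicolons
  skip = 2,                        # ignore the two metadata lines
  col_names = c("city", "temp_c")  # no header row, so supply the names
)
cities
```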

dbConnect(RSQLite::SQLite(), "data/can_lang.db") <- connects to a SQLite database.

dbConnect(RPostgres::Postgres(), dbname = "xxx", host = "xxx", port = 5432,
user = "xxx", password = "xxx") <- connects to a PostgreSQL database.

dbListTables(conn_lang_data) <- lists the names of all the tables in the database.

ggplot(faithful, aes(x = …)) + geom_histogram(bins = x) <- plots a histogram; only the x aesthetic is
needed, and bins sets the number of bars.

collect(database_table) <- downloads the data from the database into R as a data frame.
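
The database functions can be tried without a real database file. The sketch below assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed, and uses an in-memory SQLite database filled with the built-in mtcars data; the table name "cars" is made up for the example.

```r
library(DBI)
library(dplyr)  # tbl() on a database connection also needs dbplyr installed

# ":memory:" creates a temporary database that disappears on disconnect.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cars", mtcars)  # copy a data frame into the database

dbListTables(con)                  # names of the tables in the database

cars_db <- tbl(con, "cars")        # a reference to the table, not the data
fast <- cars_db |>
  filter(hp > 200) |>              # translated to SQL and run in the database
  collect()                        # download the result as a data frame

dbDisconnect(con)
fast
```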

download.file(url, destfile = "file_location") <- downloads a file from the internet.

make.names(colnames(data_frame)) <- turns column names into valid R names, replacing white space
with periods.

plot_grid(plot1, plot2, ncol = 1) <- combines two plots into one figure (here stacked in a single column).

write_csv(data_frame, "file/data_frame.csv") <- writes a data frame to a .csv file.

col_names = FALSE<- Used when there are no column names.

Week3:

vector<- An ordered collection of one, or more, values of the same data type.

list<- An ordered collection of one, or more, values of possibly different data types.

data frame<- A list of either vectors or lists of the same length, with column names. We typically use
a data frame to represent a data set.

rows = observations, columns = variables, cells = values

across(col1:col2, function) <- used inside mutate or summarize to apply function(s) to multiple columns.

filter(data_frame, col == "value") <- subsets rows of a data frame.

group_by(data_frame, col) <- takes an existing data set and converts it into a grouped data set where
operations are performed "by group".

mutate(data_frame, new_name = expression) <- adds or modifies columns in a data frame.

map(data_frame, function) <- applies a function to each element of the input (each column, for a data
frame) and returns a list the same length as the input.

pivot_longer(data_frame, cols = (cols we want to combine), names_to = "name of the new col that
holds the old col names", values_to = "name of the new col that holds the old cols' values") <-
generally makes the data frame longer and narrower.

pivot_wider(data_frame, names_from = (col from which to take the new col names),
values_from = (col from which to take the values)) <- generally makes a data frame wider and
decreases the number of rows.
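
A small round trip shows what the two pivot functions do. The data frame below, with one column per year, is made up for the example, as are the names year and count.

```r
library(tidyr)
library(tibble)

# Wide form: one row per region, one column per year.
wide <- tibble(
  region = c("A", "B"),
  `2020` = c(10, 20),
  `2021` = c(11, 21)
)

# Longer and narrower: the year columns collapse into two new columns.
long <- pivot_longer(wide, cols = `2020`:`2021`,
                     names_to = "year", values_to = "count")

# Wider again: the values in year become column names.
back <- pivot_wider(long, names_from = year, values_from = count)

long
back
```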

rowwise ()<- applies functions across columns within one row

separate(data_frame, col = (name of the col we need to split), into = c("newcolname1",
"newcolname2"), sep = "/") <- splits up a character column into multiple columns.
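
For instance, a column that stores "year/month" strings can be split in two. This is only a sketch; the data and the column names date, year, and month are made up.

```r
library(tidyr)
library(tibble)

dates <- tibble(date = c("2021/03", "2022/11"))

# Split the date column at "/" into two character columns.
parts <- separate(dates, col = date, into = c("year", "month"), sep = "/")
parts
```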

select(data_frame, col_name)<- subsets columns of a data frame


summarize(data_frame, name = mean(col)) <- calculates summaries of columns and returns them in a
new data frame; the syntax is similar to mutate.

%>% <- takes the output of one statement and makes it the input of the next statement (makes the
code more human-readable).

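facet_wrap(~ factor(Month, levels = c("Jan","Feb","Mar","Apr","May","Jun",
"Jul","Aug","Sep","Oct","Nov","Dec"))) <- splits a plot into one panel per month; the factor levels put
the panels in calendar order instead of alphabetical order.

na.rm = TRUE <- argument to summary functions such as mean; ignores NA values in the calculation.

arrange(data_frame, col) <- reorders the rows of a data frame by the values in a column, in ascending
order by default.

arrange(data_frame, desc(col)) <- reorders the rows in descending order.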

slice () <- lets you index rows by their (integer) locations.
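
Sorting and then slicing is a common way to pick the top rows. A minimal sketch on the built-in mtcars data set (chosen only for illustration):

```r
library(dplyr)

# Sort by mpg from highest to lowest, then keep rows 1 to 3.
top_mpg <- mtcars |>
  arrange(desc(mpg)) |>
  slice(1:3)
top_mpg
```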

group_by + summarize<- Calculating summary statistics on one or more column(s) for each group.
It creates a new data frame—with one row for each group—containing the summary statistic(s) for
each column being summarized.
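
As an illustration, the sketch below groups the built-in mtcars data by number of cylinders and computes one row of summaries per group. The data set and the names mean_mpg and n_cars are assumptions for the example.

```r
library(dplyr)

by_cyl <- mtcars |>
  group_by(cyl) |>                         # one group per cylinder count
  summarize(
    mean_mpg = mean(mpg, na.rm = TRUE),    # average mpg within each group
    n_cars = n()                           # number of rows in each group
  )
by_cyl
```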

map_df(data_frame, mean, na.rm = TRUE) <- finds the average of each column in the data set.

-name <- a minus sign in front of a column name in select() drops that column; negative indices in
slice() drop those rows.

map_df() <- iterates over the columns of a data frame, calculates a value for each column, and returns
the results as a data frame.
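
A short sketch of map_df on a made-up data frame with one missing value; it assumes the purrr and dplyr packages are installed (map_df uses dplyr to build its result).

```r
library(purrr)

scores <- data.frame(a = c(1, 2, NA), b = c(4, 5, 6))

# Apply mean() to every column; na.rm = TRUE is passed on to mean().
col_means <- map_df(scores, mean, na.rm = TRUE)
col_means
```

Without na.rm = TRUE the mean of column a would be NA, because one of its values is missing.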

Week4:

is.na() <- returns TRUE for values that are NA; use !is.na() inside filter to keep the rows that are not NA.

!= <- "not equal to"; used inside filter to keep rows that are not equal to something.
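
Both checks are typically used inside filter. A minimal sketch on a made-up data frame:

```r
library(dplyr)

marks <- data.frame(name = c("a", "b", "c"), score = c(90, NA, 70))

# Keep rows whose score is not missing.
no_missing <- filter(marks, !is.na(score))

# Keep rows whose name is not "a".
not_a <- filter(marks, name != "a")

no_missing
not_a
```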

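ggplot(data_frame, aes(x = , y = )) + geom_point(aes(colour = column, shape = column)) +
labs(x = "", y = "", colour = "", shape = "") <- scatter plot with point colour and shape based on
columns; labs sets the axis and legend titles.

facet_grid(cols = vars(col1)) or facet_grid(rows = vars(col1)) <- splits one graph into side-by-side
(cols) or stacked (rows) panels, one per value of col1, for comparison. A layer added to ggplot.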

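data_frame |> filter(st %in% c("value1", "value2")) |> group_by(name) |> summarise(n = n()) |>
arrange(desc(n)) |> head(n = x) <- counts the rows for each name among the rows with the chosen
values of st and keeps the x most frequent names (used to find the top x restaurants on the west coast).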

ggplot(data_frame, aes(x = , y = )) + geom_bar(stat = "identity") + xlab("x") + ylab("y") +
theme(text = element_text(size = 20)) <- Used to plot bar charts.

coord_flip ()<- Flip cartesian coordinates so that horizontal becomes vertical, and vertical, horizontal.

semi_join(data_frame1, data_frame2) <- keeps the rows of the first data frame that have a match in
the second data frame (only columns from the first are returned).
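
A minimal sketch of semi_join on two made-up data frames that share an id column:

```r
library(dplyr)

students <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bo", "Cy"))
enrolled <- data.frame(id = c(2, 3, 4))

# Keep the students whose id also appears in enrolled.
matched <- semi_join(students, enrolled, by = "id")
matched
```

The id 4 appears only in enrolled, so it does not show up in the result, and no columns from enrolled are added.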

ggplot(marks_df, aes(x = , fill = y)) + geom_histogram() + facet_grid(rows = vars(y)) <- Used to
create histograms in separate panels (charts).
