

Week4 CheatSheet ModelDevelopment

This document serves as a cheat sheet for model development in R, detailing commands and their syntax for installing packages, loading libraries, and performing various types of regression analysis. It includes examples for functions such as lm(), filter(), ggplot(), and others, providing a quick reference for data manipulation and visualization. Additionally, it outlines methods for assessing model performance, including R-squared and mean squared error calculations.

Uploaded by

moonb4115
Copyright: © All Rights Reserved

CheatSheet - Model Development

Command Syntax Description Example


install package
  Syntax: install.packages("packagename")
  Description: install.packages() downloads a package from a repository such as CRAN and installs it into the local R library.
  Example: install.packages("tidyverse")
load package
  Syntax: library(packagename)
  Description: library() loads an installed package from the R library.
  Example: library(tidyverse)
download.file
  Syntax: download.file(url, destfile, method, quiet = FALSE, mode = "w", cacheOK = TRUE, headers = NULL, …)
  Description: download.file() downloads the file at a URL and saves it locally.
  Example: download.file(url, destfile = "lax_to_jfk.tar.gz")
untar
  Syntax: untar()
  Description: untar(), from the utils package, extracts files from a tar archive.
  Example: untar("lax_to_jfk.tar.gz")
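The extraction step can be tried without a network connection by building a small archive first. The sketch below is only an illustration: the scratch directory and the file names flights.csv and flights.tar.gz are hypothetical stand-ins, not files from the course.

```r
# Build a small archive locally so the example needs no network access.
work_dir <- tempfile("untar_demo_")          # hypothetical scratch directory
dir.create(work_dir)
old_wd <- setwd(work_dir)

writeLines(c("a,b", "1,2"), "flights.csv")   # hypothetical data file
tar("flights.tar.gz", files = "flights.csv", compression = "gzip")

# Extract into a separate directory, in the same way as untar("lax_to_jfk.tar.gz").
untar("flights.tar.gz", exdir = "extracted")
extracted_files <- list.files("extracted")
print(extracted_files)

setwd(old_wd)                                # restore the previous working directory
```

With a real download, download.file() would create the archive and only the untar() call would be needed.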
Simple Linear Regression
Linear Model Function
  Syntax: lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, …)
  Description: lm() is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although aov may provide a more convenient interface for these).
  Arguments:
    formula: an object of class "formula"; a symbolic description of the model to be fitted.
    na.action: a function which indicates what should happen when the data contain NAs.
    method: the method to be used; for fitting, currently only method = "qr" is supported; method = "model.frame" returns the model frame (the same as with model = TRUE, see below).
    model, x, y, qr: logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.
    singular.ok: logical. If FALSE (the default in S but not in R) a singular fit is an error.
  Example: lm(arrdelayminutes ~ depdelayminutes, data = aa_delays)
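A minimal sketch of fitting a simple linear regression with lm(). The aa_delays data frame here is a synthetic stand-in generated on the spot (the real course data is not reproduced), so the coefficients are illustrative only.

```r
# Synthetic stand-in for the aa_delays data (values are made up for illustration).
set.seed(42)
aa_delays <- data.frame(depdelayminutes = runif(100, min = 0, max = 120))
aa_delays$arrdelayminutes <- 5 + 0.9 * aa_delays$depdelayminutes + rnorm(100, sd = 10)

# Fit arrival delay as a linear function of departure delay.
linear_model <- lm(arrdelayminutes ~ depdelayminutes, data = aa_delays)

# Inspect the fitted intercept and slope, then the full summary.
print(coef(linear_model))
print(summary(linear_model))
```

The formula reads "response ~ predictor"; the data argument tells lm() where to look up both column names.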
filter
  Syntax: filter()
  Description: filter() screens out observations based on their values, keeping only the rows that satisfy every condition.
  Example: filter(carrierDelay != "na", reporting_airline == "aa")
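The filter() example omits its data argument, which suggests it is meant to be used in a pipeline. A sketch under that assumption, using dplyr and a hypothetical four-row flights table whose column names follow the example:

```r
library(dplyr)

# Hypothetical miniature flights table; column names follow the cheat sheet's example.
flights <- data.frame(
  reporting_airline = c("aa", "aa", "dl", "aa"),
  carrierDelay      = c("12", "na", "7", "30"),
  stringsAsFactors  = FALSE
)

# Keep only rows for airline "aa" whose carrierDelay is not the string "na".
aa_delays <- flights %>%
  filter(carrierDelay != "na", reporting_airline == "aa")

print(aa_delays)
```

Note that the string "na" is not the same as a missing value; for true missing values the condition would be !is.na(carrierDelay).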
head
  Syntax: head(x)
  Description: head(x) returns the first part of a vector, matrix, table, data frame or function.
  Example: head(aa_delays)
summary
  Syntax: summary(model)
  Description: summary() is a generic function used to produce summaries of the results of various model fitting functions.
  Example: summary(linear_model)
data.frame
  Syntax: data.frame(object)
  Description: data.frame() creates data frames, tightly coupled collections of variables which share many of the properties of matrices and of lists.
  Example: data.frame(depdelayminutes = c(12, 19, 24))
predict
  Syntax: predict(object, ...)
  Description: predict() returns predicted values from a fitted model for the input data supplied.
  Example: predict(linear_model, newdata = new_depdelay, interval = "confidence")
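The data.frame() and predict() entries fit together: the new data frame supplies predictor values, and predict() returns fitted values for them. A sketch on synthetic data (the generated aa_delays is an assumption, not the course data set):

```r
# Synthetic training data and model (made up for illustration).
set.seed(1)
aa_delays <- data.frame(depdelayminutes = runif(50, 0, 100))
aa_delays$arrdelayminutes <- 3 + aa_delays$depdelayminutes + rnorm(50, sd = 8)
linear_model <- lm(arrdelayminutes ~ depdelayminutes, data = aa_delays)

# New departure delays; the column name must match the predictor in the formula.
new_depdelay <- data.frame(depdelayminutes = c(12, 19, 24))

# Returns a matrix with the fitted value (fit) and confidence bounds (lwr, upr).
pred <- predict(linear_model, newdata = new_depdelay, interval = "confidence")
print(pred)
```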
Multiple Linear Regression
MLR model Function
  Syntax: lm(y ~ x1 + x2 + x3 ..., data)
  Description: In multiple regression we build a model having more than one predictor variable and one response variable.
  Example: lm(arrdelayminutes ~ depdelayminutes + lateaircraftdelay, data = aa_delays)

$ (dollar symbol)
  Syntax: df$object
  Description: The $ operator is used to extract or subset a specific part of a data object.
  Example: mlr$coefficients
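A short sketch combining the two entries above: a multiple regression with two predictors, followed by $ to extract the coefficients. The data are synthetic and the true coefficients (0.8 and 0.3) are arbitrary choices for illustration.

```r
# Synthetic data with two predictors (values are made up for illustration).
set.seed(7)
n <- 200
aa_delays <- data.frame(
  depdelayminutes   = runif(n, 0, 120),
  lateaircraftdelay = runif(n, 0, 60)
)
aa_delays$arrdelayminutes <- 2 + 0.8 * aa_delays$depdelayminutes +
  0.3 * aa_delays$lateaircraftdelay + rnorm(n, sd = 5)

# Predictors are joined with + on the right-hand side of the formula.
mlr <- lm(arrdelayminutes ~ depdelayminutes + lateaircraftdelay, data = aa_delays)

# $ pulls one named component out of the fitted-model object.
print(mlr$coefficients)
```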
Assessing Models Visually
ggplot
  Syntax: ggplot(df, aes(x, y, other aesthetics))
  Description: ggplot() comes from ggplot2, a plotting package that makes it simple to create complex plots from data in a data frame.
  Arguments:
    data: Default dataset to use for plot. If not already a data.frame, will be converted to one by fortify(). If not specified, must be supplied in each layer added to the plot.
    mapping: Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.
    ...: Other arguments passed on to methods. Not currently used.
    environment: DEPRECATED. Used prior to tidy evaluation.
  Example: ggplot(aa_delays, aes(x = depdelayminutes, y = arrdelayminutes))
geom_point
  Syntax: geom_point()
  Description: geom_point() adds a layer of points to your plot, which creates a scatterplot.
  Example: ggplot(data = NULL, aes(x, noisy.y)) + geom_point() + geom_smooth(method = "lm")
geom_smooth
  Syntax: geom_smooth(...)
  Description: geom_smooth() adds smoothed conditional means; with method = "lm" it draws a regression line.
  Example: ggplot(data = NULL, aes(x, noisy.y)) + geom_point() + geom_smooth(method = "lm")
geom_segment
  Syntax: geom_segment(mapping = NULL, data = NULL, ...)
  Description: geom_segment() draws a straight line between points (x, y) and (xend, yend).
  Example: geom_segment(aes(xend = depdelayminutes, yend = predicted), alpha = .2)
theme_bw
  Syntax: theme_bw(base_size = 12, base_family = "")
  Description: A theme with white background and black gridlines.
  Example: ggplot(data = NULL, aes(x, noisy.y)) + geom_point() + geom_smooth(method = "lm") + theme_bw()
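The plotting entries above can be combined into a single residual plot. The following is a sketch on synthetic data, assuming ggplot2 is installed; the predicted column is added here so that geom_segment() can draw each residual as a vertical line from the observed point to the fitted line.

```r
library(ggplot2)

# Synthetic stand-in for aa_delays (values are made up for illustration).
set.seed(3)
aa_delays <- data.frame(depdelayminutes = runif(80, 0, 120))
aa_delays$arrdelayminutes <- 4 + 0.9 * aa_delays$depdelayminutes + rnorm(80, sd = 12)

# Store fitted values so each residual can be drawn as a vertical segment.
linear_model <- lm(arrdelayminutes ~ depdelayminutes, data = aa_delays)
aa_delays$predicted <- predict(linear_model)

delay_plot <- ggplot(aa_delays, aes(x = depdelayminutes, y = arrdelayminutes)) +
  geom_point() +                                            # scatterplot layer
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # fitted regression line
  geom_segment(aes(xend = depdelayminutes, yend = predicted), alpha = .2) +
  theme_bw()

print(delay_plot)   # renders the plot on the active graphics device
```

Layers are added with +, and each layer inherits the data and aesthetic mappings given to ggplot() unless it overrides them.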
cor
  Syntax: cor(x, y)
  Description: cor() computes the correlation coefficient between two variables.
  Example: cor(aa_delays$depdelayminutes, aa_delays$arrdelayminutes)
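A brief sketch of cor() on synthetic vectors, which also shows how it relates to a simple linear fit. The data are generated for illustration only.

```r
# Synthetic delays (values are made up for illustration).
set.seed(11)
depdelayminutes <- runif(100, 0, 120)
arrdelayminutes <- 5 + 0.9 * depdelayminutes + rnorm(100, sd = 10)

# Pearson correlation by default; values near 1 indicate a strong positive linear relationship.
r <- cor(depdelayminutes, arrdelayminutes)
print(r)

# For simple linear regression, the squared correlation equals the model's R-squared.
fit <- lm(arrdelayminutes ~ depdelayminutes)
print(all.equal(r^2, summary(fit)$r.squared))
```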
Polynomial Regression
Polynomial regression function
  Syntax: lm(y ~ poly(x, degree, raw = TRUE))
  Description: Polynomial regression is a form of linear regression in which the relationship between the independent variable x and dependent variable y is modeled as an nth degree polynomial.
  Example: lm(temp ~ poly(time, 4, raw = TRUE))
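A sketch of a degree-4 polynomial fit on made-up temperature readings. It also fits the same model with explicit powers, to illustrate what raw = TRUE means; the variable names time and temp follow the cheat sheet's example, and the data are an assumption.

```r
# Made-up readings that follow a cubic trend plus noise.
set.seed(5)
time <- seq(0, 10, length.out = 60)
temp <- 20 + 3 * time - 0.5 * time^2 + 0.02 * time^3 + rnorm(60, sd = 1)

# raw = TRUE uses plain powers time, time^2, ... instead of orthogonal polynomials.
poly_model <- lm(temp ~ poly(time, 4, raw = TRUE))
print(coef(poly_model))

# The same fit written with explicit powers; I() protects ^ inside a formula.
explicit_model <- lm(temp ~ time + I(time^2) + I(time^3) + I(time^4))
print(all.equal(unname(coef(poly_model)), unname(coef(explicit_model))))
```

The model is still linear in its coefficients, which is why lm() can fit it.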
Assessing the Model
R-squared
  Syntax: r.squared(object, model = NULL, type = c("cor", "rss", "ess"), dfcor = FALSE)
  Description: r.squared() computes R squared or adjusted R squared for plm objects. It lets you choose on which transformation of the data the (adjusted) R squared is computed and which calculation method is used. For a model fitted with lm(), R squared is available from the model summary, as in the example.
  Arguments:
    object: an object of class "plm".
    model: on which transformation of the data the R-squared is to be computed.
    type: indicates the method which is used to compute R squared.
    dfcor: if TRUE, the adjusted R squared is computed.
  Example: summary(linear_model)$r.squared
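Mean Squared Error (MSE)
  Syntax: mean(x, …)
  Description: mean() computes the arithmetic mean of its argument. Applied to the squared residuals of a model, it gives the mean squared error regression loss.
  Example: mean(linear_model$residuals^2)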
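Both assessment measures can be computed from a fitted lm object. The sketch below uses synthetic data and also recomputes R-squared from its definition, 1 - RSS/TSS, as a cross-check; the variable names x and y are placeholders.

```r
# Synthetic data and model (values are made up for illustration).
set.seed(9)
x <- runif(100, 0, 120)
y <- 5 + 0.9 * x + rnorm(100, sd = 10)
linear_model <- lm(y ~ x)

# MSE: average of the squared residuals (divides by n, not by residual degrees of freedom).
mse <- mean(linear_model$residuals^2)

# R-squared from the summary, and the same quantity from its definition 1 - RSS/TSS.
r2_summary <- summary(linear_model)$r.squared
r2_manual  <- 1 - sum(linear_model$residuals^2) / sum((y - mean(y))^2)

print(c(mse = mse, r2_summary = r2_summary, r2_manual = r2_manual))
```

A lower MSE and an R-squared closer to 1 both indicate a closer fit to the data the model was trained on.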
Author(s)
D.M. Naidu

Changelog
Date Version Changed by Change Description
2020-08-11 1.0 D.M. Naidu Initial Version
