1_DATASCITOOLBOX/Data Scientists Toolbox Course Notes.Rmd
Lines changed: 12 additions & 11 deletions
Original file line number
Diff line number
Diff line change
@@ -2,15 +2,16 @@
2
2
title: "Data Scientist’s Toolbox Course Notes"
3
3
author: "Xing Su"
4
4
output:
5
+
pdf_document:
6
+
toc: yes
7
+
toc_depth: 3
5
8
html_document:
6
9
highlight: pygments
7
10
theme: spacelab
8
11
toc: yes
9
-
pdf_document:
10
-
toc: yes
11
-
toc_depth: 3
12
12
---
13
13
14
+
$\pagebreak$
14
15
15
16
## CLI (Command Line Interface)
16
17
@@ -34,21 +35,21 @@ output:
34
35
* `move <file> <directory>` = move file to directory
35
36
* `move <fileName> <newName>` = rename file
36
37
* `echo` = print arguments you give/variables
37
-
*`date` = print current date
38
+
*`date` = print current date
38
39
39
40
40
41
41
42
## GitHub
42
43
43
-
***Workflow**
44
+
***Workflow**
44
45
1. make edits in workspace
45
46
2. update index/add files
46
-
3. commit to local repo
47
+
3. commit to local repo
47
48
4. push to remote repository
48
49
* `git add .` = add all new files to be tracked
49
50
* `git add -u` = updates tracking for files that are renamed or deleted
50
51
* `git add -A` = both of the above
51
-
****Note**: `add` is performed before committing*
52
+
****Note**: `add` is performed before committing*
52
53
* `git commit -m "message"` = commit the changes you want to be saved to the local copy
53
54
* `git checkout -b branchname` = create a new branch and switch to it
54
55
* `git branch` = tells you what branch you are on
@@ -68,7 +69,7 @@ output:
68
69
69
70
## R Packages
70
71
71
-
* Primary location for R packages --> CRAN
72
+
* Primary location for R packages $\rightarrow$ CRAN
72
73
* `available.packages()` = all packages available
73
74
* `head(rownames(a),3)` = returns first three names of a
74
75
* `install.packages("nameOfPackage")` = install single package
@@ -83,7 +84,7 @@ output:
83
84
84
85
## Types of Data Science Questions
85
86
86
-
* in order of difficulty: ***Descriptive***-->***Exploratory***-->***Inferential***-->***Predictive***-->***Causal***-->***Mechanistic***
87
+
* in order of difficulty: ***Descriptive***$\rightarrow$***Exploratory***$\rightarrow$***Inferential***$\rightarrow$***Predictive***$\rightarrow$***Causal***$\rightarrow$***Mechanistic***
87
88
* **Descriptive analysis** = describe set of data, interpret what you see (census, Google Ngram)
88
89
* **Exploratory analysis** = discovering connections (correlation does not = causation)
89
90
* **Inferential analysis** = use conclusions drawn from data on a smaller population to generalize to the broader group
@@ -101,7 +102,7 @@ output:
101
102
* **Big data** = it is now possible to collect data cheaply, but not all of it is necessarily useful (need the right data)
102
103
103
104
## Experimental Design
104
-
* Formulate your question in advance
105
+
* Formulate your question in advance
105
106
* **Statistical inference** = select subset, run experiment, calculate descriptive statistics, use inferential statistics to determine if results can be applied broadly
2_RPROG/R Programming Course Notes.Rmd
Lines changed: 13 additions & 13 deletions
Original file line number
Diff line number
Diff line change
@@ -19,7 +19,7 @@ $\pagebreak$
19
19
* 1988 rewritten in C (version 3 of language)
20
20
* 1998 version 4 (what we use today)
21
21
* **History of S**
22
-
* Bell labs --> insightful --> Lucent --> Alcatel-Lucent
22
+
* Bell labs $\rightarrow$ insightful $\rightarrow$ Lucent $\rightarrow$ Alcatel-Lucent
23
23
* in 1998, S won the Association for Computing Machinery’s Software System Award
24
24
* **History of R**
25
25
* 1991 created in New Zealand by Ross Ihaka & Robert Gentleman
@@ -105,7 +105,7 @@ $\pagebreak$
105
105
106
106
### Vectors and Lists
107
107
* **atomic vector** = contains one data type, most basic object
108
-
*`vector <- c(value1, value2, …)` = creates a vector with specified values
108
+
*`vector <- c(value1, value2, ...)` = creates a vector with specified values
109
109
* `vector1*vector2` = element by element multiplication (rather than matrix multiplication)
110
110
* if the vectors are of different lengths, shorter vector will be recycled until the longer runs out
111
111
* computations on vectors/between vectors (`+`, `-`, `==`, `/`, etc.) are done element by element by default
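The element-by-element and recycling behavior described in this hunk can be sketched in a few lines of R. The vector names `x` and `y` and their values are illustrative assumptions, not taken from the notes.

```r
# Illustrative vectors; the names and values are assumptions for this sketch.
x <- c(1, 2, 3, 4)
y <- c(10, 20)

# Element-by-element multiplication: y is recycled to c(10, 20, 10, 20)
# because it is shorter than x.
prod_xy <- x * y

# Comparison operators are also applied element by element.
eq_xy <- x == c(1, 0, 3, 0)

print(prod_xy)
print(eq_xy)
```

Because the length of `x` is a whole multiple of the length of `y`, R recycles silently; when it is not, R still recycles but emits a warning.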
@@ -122,7 +122,7 @@ $\pagebreak$
122
122
* `as.character(list)` = converts list into a character vector
123
123
* **implicit coercion**
124
124
* a matrix/vector can only contain one data type, so when a matrix/vector is created from values of different classes, coercion is forced so that every element has the same class
125
-
* *least common denominator* is the approach used (basically everything is converted to a class that all values can take, numbers --> characters) and *no errors generated*
125
+
* *least common denominator* is the approach used (basically everything is converted to a class that all values can take, numbers $\rightarrow$ characters) and *no errors generated*
126
126
* coercion occurs to give every element the same class (implicit)
127
127
- `x <- c(NA, 2, "D")` will create a vector of character class
128
128
* `list()` = special vector with different classes of elements
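A minimal sketch of the coercion rule in this hunk, using the `x <- c(NA, 2, "D")` example from the notes. The other object names (`mixed_num`, `l`) are hypothetical and only serve the illustration.

```r
# From the notes: mixing NA, a number, and a string yields a character vector.
x <- c(NA, 2, "D")
print(class(x))

# Hypothetical extra case: logical and integer values are promoted to double
# when combined with a double, again without any error or warning.
mixed_num <- c(TRUE, 2L, 3.5)
print(class(mixed_num))

# A list keeps each element's own class instead of coercing.
l <- list(1, "a", TRUE)
print(sapply(l, class))

# as.character() converts the list into a character vector.
l_chr <- as.character(l)
print(l_chr)
```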
@@ -131,15 +131,15 @@ $\pagebreak$
131
131
* **logical vectors** = contain values `TRUE`, `FALSE`, and `NA`, values are generated as result of logical conditions comparing two objects/values
132
132
* `paste(characterVector, collapse = " ")` = joins the elements of the vector into one string, separated by the `collapse` parameter
133
133
* `paste(vec1, vec2, sep = " ")` = joins different vectors element by element, separated by the `sep` parameter
134
-
****Note**: vector recycling applies here too*
134
+
****Note**: vector recycling applies here too*
135
135
* `LETTERS`, `letters` = predefined vectors of all 26 uppercase and lowercase letters
136
136
* `unique(values)` = returns vector with all duplicates removed
137
137
138
138
### Matrices and Data Frames
139
139
* `matrix` can contain **only 1** type of data
140
140
* `data.frame` can contain **multiple**
141
141
* `matrix(values, nrow = n, ncol = m)` = creates an n by m matrix
142
-
* constructed **COLUMN WISE**--> the elements are placed into the matrix from top to bottom for each column, and by column from left to right
142
+
* constructed **COLUMN WISE**$\rightarrow$ the elements are placed into the matrix from top to bottom for each column, and by column from left to right
143
143
* matrices can also be created by adding the dimension attribute to vector
144
144
* `dim(m) <- c(2, 5)`
145
145
* matrices can also be created by binding columns and rows
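The three ways of building a matrix mentioned in this hunk can be sketched as follows. The object names (`m`, `v`, `by_col`, `by_row`) and the values are assumptions made for the illustration.

```r
# matrix() fills column by column: top to bottom, then left to right.
m <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
print(m)

# Adding a dimension attribute turns an existing vector into a matrix.
v <- 1:10
dim(v) <- c(2, 5)
print(v)

# Binding vectors as columns or as rows also produces matrices.
by_col <- cbind(1:3, 4:6)  # 3 rows, 2 columns
by_row <- rbind(1:3, 4:6)  # 2 rows, 3 columns
print(dim(by_col))
print(dim(by_row))
```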
<li>in 1998, S won the Association for Computing Machinery’s Software System Award</li>
156
156
</ul></li>
157
157
<li><strong>History of R</strong>
@@ -269,7 +269,7 @@ <h3>Vectors and Lists</h3>
269
269
<ul>
270
270
<li><strong>atomic vector</strong> = contains one data type, most basic object
271
271
<ul>
272
-
<li><code>vector <- c(value1, value2, …)</code> = creates a vector with specified values</li>
272
+
<li><code>vector <- c(value1, value2, ...)</code> = creates a vector with specified values</li>
273
273
<li><code>vector1*vector2</code> = element by element multiplication (rather than matrix multiplication)
274
274
<ul>
275
275
<li>if the vectors are of different lengths, shorter vector will be recycled until the longer runs out</li>
@@ -297,7 +297,7 @@ <h3>Vectors and Lists</h3>
297
297
<ul>
298
298
<li>a matrix/vector can only contain one data type, so when a matrix/vector is created from values of different classes, coercion is forced so that every element has the same class
299
299
<ul>
300
-
<li><em>least common denominator</em> is the approach used (basically everything is converted to a class that all values can take, numbers –> characters) and <em>no errors generated</em></li>
300
+
<li><em>least common denominator</em> is the approach used (basically everything is converted to a class that all values can take, numbers <span class="math">\(\rightarrow\)</span> characters) and <em>no errors generated</em></li>
301
301
<li>coercion occurs to make every element to same class (implicit)</li>
302
302
<li><code>x <- c(NA, 2, "D")</code> will create a vector of character class</li>
303
303
</ul></li>
@@ -311,7 +311,7 @@ <h3>Vectors and Lists</h3>
311
311
<li><code>paste(characterVector, collapse = " ")</code> = join together elements of the vector and separating with the <code>collapse</code> parameter</li>
312
312
<li><code>paste(vec1, vec2, sep = " ")</code> = join together different vectors and separating with the <code>sep</code> parameter
313
313
<ul>
314
-
<li><em><strong>Note</strong>: vector recycling applies here too</em></li>
314
+
<li><em><strong>Note</strong>: vector recycling applies here too</em></li>
315
315
<li><code>LETTERS</code>, <code>letters</code> = predefined vectors of all 26 uppercase and lowercase letters</li>
316
316
</ul></li>
317
317
<li><code>unique(values)</code> = returns vector with all duplicates removed</li>
@@ -324,7 +324,7 @@ <h3>Matrices and Data Frames</h3>
324
324
<li><code>data.frame</code> can contain <strong>multiple</strong></li>
325
325
<li><code>matrix(values, nrow = n, ncol = m)</code> = creates an n by m matrix
326
326
<ul>
327
-
<li>constructed <strong>COLUMN WISE</strong> –> the elements are placed into the matrix from top to bottom for each column, and by column from left to right</li>
327
+
<li>constructed <strong>COLUMN WISE</strong> <span class="math">\(\rightarrow\)</span> the elements are placed into the matrix from top to bottom for each column, and by column from left to right</li>
328
328
<li>matrices can also be created by adding the dimension attribute to vector
329
329
<ul>
330
330
<li><code>dim(m) <- c(2, 5)</code></li>
@@ -413,7 +413,7 @@ <h3>Arrays</h3>
413
413
<li><code>data</code> = data to be stored in array</li>
<li><code>factorVar1, factorVar2</code> = factor variables to split the data by</li>
779
779
<li><em><strong>Note</strong>: order matters here in terms of how to break down the data </em></li>
780
780
<li><code>function</code> = what is applied to the subsets of data, can be sum/mean/median/etc</li>
781
-
<li><code>na.rm = TRUE</code> –> removes NA values</li>
781
+
<li><code>na.rm = TRUE</code> <span class="math">\(\rightarrow\)</span> removes NA values</li>
782
782
</ul></li>
783
783
</ul>
784
784
</div>
@@ -798,10 +798,10 @@ <h2>Simulation</h2>
798
798
</ul></li>
799
799
<li>Each probability distribution usually has 4 functions associated with it:
800
800
<ul>
801
-
<li><code>r***</code> function (for “random”) –> random number generation (ex. <code>rnorm</code>)</li>
802
-
<li><code>d***</code> function (for “density”) –> calculate density (ex. <code>dunif</code>)</li>
803
-
<li><code>p***</code> function (for “probability”) –> cumulative distribution (ex. <code>ppois</code>)</li>
804
-
<li><code>q***</code> function (for “quantile”) –> quantile function (ex. <code>qbinom</code>)</li>
801
+
<li><code>r***</code> function (for “random”) <span class="math">\(\rightarrow\)</span> random number generation (ex. <code>rnorm</code>)</li>
802
+
<li><code>d***</code> function (for “density”) <span class="math">\(\rightarrow\)</span> calculate density (ex. <code>dunif</code>)</li>
803
+
<li><code>p***</code> function (for “probability”) <span class="math">\(\rightarrow\)</span> cumulative distribution (ex. <code>ppois</code>)</li>
804
+
<li><code>q***</code> function (for “quantile”) <span class="math">\(\rightarrow\)</span> quantile function (ex. <code>qbinom</code>)</li>
805
805
</ul></li>
806
806
<li>If <span class="math">\(\Phi\)</span> is the cumulative distribution function for a standard Normal distribution, then <code>pnorm(q)</code> = <span class="math">\(\Phi(q)\)</span> and <code>qnorm(p)</code> = <span class="math">\(\Phi^{-1}(p)\)</span>.</li>
807
807
<li><code>set.seed()</code> = sets seed for random number generator to ensure that the same data/analysis can be reproduced</li>
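The four function families and the role of `set.seed()` described in this hunk can be sketched in R. The seed value, sample size, and distribution parameters below are arbitrary assumptions chosen for the illustration.

```r
# Resetting the same seed reproduces the same random draws.
set.seed(42)                # seed value is arbitrary
draws1 <- rnorm(5)          # r*: random number generation
set.seed(42)
draws2 <- rnorm(5)
print(identical(draws1, draws2))

# d*: density of Uniform(0, 1) evaluated at 0.5
dens <- dunif(0.5, min = 0, max = 1)

# p*: cumulative probability P(X <= 2) for Poisson with rate 1
cum <- ppois(2, lambda = 1)

# q*: quantile function, here the median of Binomial(10, 0.5)
med <- qbinom(0.5, size = 10, prob = 0.5)

# pnorm and qnorm are inverses of each other for the standard Normal.
p <- 0.975
q <- qnorm(p)
round_trip <- pnorm(q)

print(c(dens = dens, cum = cum, med = med, q = q, round_trip = round_trip))
```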
@@ -948,7 +948,7 @@ <h2>Reading Tabular Data</h2>
948
948
<h3>Larger Tables</h3>
949
949
<ul>
950
950
<li><em><strong>Note</strong>: help page for read.table important</em></li>
951
-
<li>need to know how much RAM is required –> calculating memory requirements
951
+
<li>need to know how much RAM is required <span class="math">\(\rightarrow\)</span> calculating memory requirements
952
952
<ul>
953
953
<li><code>numRow</code> x <code>numCol</code> x 8 bytes/numeric value = size required in bytes</li>
954
954
<li>double the above results and convert into GB = amount of memory recommended</li>
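The memory estimate in this hunk can be worked through as a short calculation. The table dimensions below are hypothetical, and the sketch assumes every column is numeric (8 bytes per value) and that 1 GB is 2^30 bytes.

```r
# Hypothetical table size; neither figure comes from the notes.
num_row <- 1500000
num_col <- 120

# Raw size: rows x columns x 8 bytes per numeric value.
size_bytes <- num_row * num_col * 8

# Convert to GB (assuming 1 GB = 2^30 bytes), then double it to get the
# recommended amount of memory for reading the table.
size_gb <- size_bytes / 2^30
recommended_gb <- 2 * size_gb

print(size_bytes)
print(round(size_gb, 2))
print(round(recommended_gb, 2))
```

The doubling follows the rule of thumb stated in the notes; it is a rough guide rather than an exact requirement.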