NYC Data Science Academy
12-Week Data Science Bootcamp Curriculum
Week 1
Data Science Toolkit - Linux, Git, Bash, and SQL
Data Science with R - Data Analytics Part I
Linux system
  o Introduce the Linux environment
  o Learn Linux commands
  o I/O redirection and pipes
  o Introduce server-side Linux usage
Git
  o Introduce modern source code management
  o Learn common Git operations
  o Set up GitHub and a personal portfolio page
Other server-related topics
  o Text editors and IDEs
  o ssh: how to communicate with a remote server
  o Linux environment variables
SQL
  o Introduction to relational databases
  o Introduction to Structured Query Language
  o Major SQL commands and examples
Programming foundation in R I
  o Syntax
  o Data objects: vectors, matrices, data frames, and lists
  o Common functions
  o RStudio environment and package management
  o Local data input/output
  o Introduction to R data visualization
Programming foundation in R II
  o Data sorting and merging
  o String manipulation
  o Dates and times
  o Connecting to an external database
Week 2
Data Science with R - Data Analytics Part II
Data manipulation with dplyr
  o Tables in R
  o Join
  o Subset
  o Advanced manipulations with dplyr
Data Visualization with ggplot2
Updated June 29, 2016
  o Histograms
  o Point graphics
  o Columnar graphics
  o Line charts
  o Pie charts
  o Box plots
  o Scatter plots
  o Visualizing multivariate data
  o Matrix-based visualizations
  o Maps
Introduction to Shiny
  o Shiny introduction
  o Design the user interface
  o Control widgets
  o Build reactive output
  o Use data tables in Shiny apps
  o Use R scripts, data, and packages
  o UI and server for the app
  o Make Shiny perform quickly
  o Matrix-based visualizations
  o Use reactive expressions
  o Share and deploy Shiny apps
Lab: Moneyball
Project 1 Due: Exploratory Data Visualization
Week 3
Data Science with Python - Data Analytics Part I
Python Programming Language I
  o Simple values and expressions
  o Functions
  o Lists
  o Conditionals
  o Functional programming: map, filter, and reduce
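A minimal sketch of the map/filter/reduce pattern, using Python's built-in `map` and `filter` together with `functools.reduce`. The function name `sum_of_even_squares` and the sample list are illustrative only, not course material.

```python
from functools import reduce


def sum_of_even_squares(numbers):
    """Hypothetical helper: square each number, keep even squares, add them up."""
    squares = map(lambda n: n * n, numbers)          # lazily square every element
    evens = filter(lambda n: n % 2 == 0, squares)    # keep only the even squares
    return reduce(lambda acc, n: acc + n, evens, 0)  # fold into one total, starting at 0


print(sum_of_even_squares([1, 2, 3, 4, 5]))  # 4 + 16 = 20
```

The initial value `0` passed to `reduce` makes the function return 0 for an empty input instead of raising an error.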
Python Programming Language II
  o String operations
  o File input/output and searching
  o Data structures:
    - Mutating operations on lists
    - Tuples, sets, and dictionaries
Python Programming Language III
  o Control flow
  o Errors and exceptions
  o Object-oriented programming
Web scraping
  o Regular expressions
  o HTML, Beautiful Soup, and Scrapy
  o NoSQL and MongoDB
Week 4
Data Science with Python - Data Analytics Part II
NumPy and SciPy
  o Basic data structures and operations
  o Matrices and linear algebra
  o Stats module
  o Random sampling
Pandas
  o Series and data frames
  o I/O of pandas data frames
  o Concatenation and merge
  o Arithmetic, drop, apply, and describe
  o Selection and filtering
  o Missing values
  o Grouping and aggregation
  o Time series
  o Interacting with databases
Matplotlib and Seaborn
  o Basic plots
  o Statistical plots:
    - Scatter plots
    - Histograms
    - Box plots
    - Bar charts
  o Multiple figures
  o Advanced plots with Seaborn
Python lab: linear regression from scratch
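The lab's exact approach is not described here. As one possible starting point, the sketch below fits simple (one-feature) least-squares regression in plain Python using the closed-form estimates slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The function names are hypothetical, and the lab may instead use NumPy, matrix algebra, or gradient descent.

```python
def fit_simple_linear_regression(xs, ys):
    """Hypothetical helper: return (intercept, slope) minimizing squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x around its mean.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean  # the fitted line passes through (x_mean, y_mean)
    return intercept, slope


def predict(intercept, slope, x):
    """Hypothetical helper: evaluate the fitted line at x."""
    return intercept + slope * x


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x, so the fit should recover it
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)  # 1.0 2.0
```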
Project 2 Due: R Shiny Interactive Applications
Week 5
Data Science with R - Machine Learning Part I
Foundations of Statistics
  o Descriptive statistics
    - Measures of centrality
    - Measures of variability
    - Frequency, proportion, and contingency tables
    - Correlation
  o Hypothesis testing
    - One-sample t-test
    - Two-sample t-test
    - F-test
    - One-way ANOVA
    - Chi-squared (χ²) test of independence
  o Introduction to machine learning
    - Supervised learning: regression, classification
    - Unsupervised learning: clustering, dimension reduction
Missingness & Imputation
  o Types of missingness
    - MCAR (missing completely at random)
    - MAR (missing at random)
    - MNAR (missing not at random)
  o Basic methods of imputation
    - Mean value imputation
    - Simple random imputation
    - Regression prediction
  o K-nearest neighbors
    - Voronoi tessellations
    - KNN for classification
    - KNN for regression
    - Distance measures
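A sketch of KNN for classification, assuming Euclidean distance and a simple majority vote. The curriculum also covers other distance measures and KNN for regression, which would average the neighbors' values instead of voting. Function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(train_points, train_labels, query, k=3, distance=euclidean):
    """Hypothetical helper: majority label among the k training points nearest to query."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training indices by their distance to the query and keep the k closest.
    nearest = sorted(range(len(train_points)),
                     key=lambda i: distance(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    # With two classes and an odd k, a tied vote cannot occur.
    return votes.most_common(1)[0][0]


points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
labels = ["a", "a", "b", "b", "b"]
print(knn_classify(points, labels, (1.2, 1.4), k=3))  # a
```

Passing a different `distance` function is how this sketch would swap in another distance measure.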
Linear Regression I
  o Simple linear regression
    - From a mathematical standpoint
    - Accuracy of the coefficient estimates
    - Performing hypothesis tests
    - Constructing confidence intervals
  o Assumptions & diagnostics
  o Transformations
    - Power transformation
    - Box-Cox transformation
  o The coefficient of determination (R²)
Linear Regression II
  o Multiple linear regression
    - From a mathematical standpoint
  o Assumptions & diagnostics
  o Potential problems
  o Research questions
  o Variable selection
  o Factors
  o Interactions
  o Higher-order terms
Week 6
Data Science with R - Machine Learning Part II
Lab: Building Bridges
Generalized Linear Models
  o Logistic regression
The Curse of Dimensionality
  o Ridge regression
  o Lasso regression
  o Cross-validation
  o Bias/variance tradeoff
  o Density
  o Principal component analysis
Guest Lecture: Dataiku Part I
Project 3 Due: Python Web Scraping
Week 7
Data Science with R - Machine Learning Part III
Classification
  o Feature selection
  o Support vector machines
  o Decision trees
  o Pruning/purity/entropy/Gini
  o Random forests
  o Bagging
  o Boosting
Cluster Analysis
  o K-means clustering
  o Agglomerative clustering
  o Hierarchical clustering
Neural Networks
Week 8
Data Science with R - Machine Learning Part IV
Introduction to Natural Language Processing Case Study: Spam Detection
Association Rules
  o Market basket analysis
Naïve Bayes Analysis
Introduction to Natural Language Processing
  o Creating a corpus: stemming and lemmatization
  o POS tagging and chunking
  o Text classification
Time Series Analysis
  o Smoothing
  o Seasonal decomposition
  o ARIMA
Guest Lecture: Dataiku Part II
Week 9
Data Science with Python - Machine Learning
Machine Learning Recap / Linear Regression
  o Introduction to scikit-learn
  o Simple linear regression
  o Multiple linear regression
  o Stats module
Classification Part I
  o Logistic regression
  o Discriminant analysis
  o Naïve Bayes
Model Selection
  o Cross-validation
  o Bootstrap
  o Feature selection
  o Regularization
  o Grid search
Classification Part II
  o Support vector machines
  o Decision trees
  o Random forests
Unsupervised Learning
  o Principal component analysis
  o K-means and hierarchical clustering
Project 4 Due: Machine Learning Project (it can be a Kaggle competition, a hiring partner project, or a non-profit project from our partners)
Week 10
Big Data
Parallel Processing: Introduction to Hadoop and MapReduce
  o HDFS
  o MapReduce
    - Conceptual framework
    - Streaming and Python
  o Examples and lab work
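The MapReduce conceptual framework (map, shuffle/sort, reduce) can be simulated on a single machine in plain Python, as in the word-count sketch below. This is not Hadoop code: with Hadoop Streaming, the mapper and reducer would be separate scripts reading lines from stdin, and the framework would handle the shuffle. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: group every value emitted for the same key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Hypothetical driver chaining the three phases over a list of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["the cat sat", "the cat ran"]))
# {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```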
MapReduce design patterns
  o Filtering patterns
    - Simple filtering
    - Top N
  o Summarization patterns
    - Numerical summarizations
    - Inverted index summarizations
Apache Hive
  o Databases for Hadoop
  o Hive
    - SELECT
    - Joins
  o Compiling HiveQL to MapReduce
  o Technical aspects of Hive
  o Extending Hive with TRANSFORM
Spark
  o Basic concepts
    - RDDs, transformations, and actions
    - PairRDDs
  o Examples
    - Word count
    - Mean and variance
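The "mean and variance" example fits the summarize-then-merge pattern that distributed aggregation relies on: each partition computes a small summary, and the summaries are combined. The sketch below uses plain Python lists as stand-in partitions instead of Spark. Each partition is summarized with Welford's update and summaries are merged with the Chan et al. parallel formula. A merge function like this is the kind of combiner one might hand to PySpark's `RDD.aggregate`, but that wiring is not shown. Names and data are hypothetical.

```python
from functools import reduce


def partial_stats(values):
    """Per-partition summary (count, mean, M2), where M2 = sum of squared deviations."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's numerically stable update
    return n, mean, m2


def merge_stats(a, b):
    """Combine two partial summaries (parallel variance formula of Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


# Stand-in "partitions" of the values 1..6 (hypothetical data).
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
n, mean, m2 = reduce(merge_stats, map(partial_stats, partitions))
variance = m2 / n  # population variance; use m2 / (n - 1) for the sample variance
print(n, mean, variance)
```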
Week 11
Big Data and Algorithms
Spark MLlib
Amazon Web Services (AWS)
Introduction to Algorithms
  o Analysis of algorithms: big-O notation
Sorting
  o Elementary sorts
  o Merge sort
  o Quicksort
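As a reference point for the merge sort topic, here is a minimal top-down implementation in Python. It runs in O(n log n) time and returns a new list. The function names are illustrative, not taken from the course.

```python
def merge_sort(items):
    """Return a new sorted list using top-down merge sort (O(n log n), stable)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # sort each half recursively
    right = merge_sort(items[mid:])
    return merge(left, right)


def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in their original order
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # at most one of these two still has elements
    result.extend(right[j:])
    return result


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```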
Searching
  o Linear search
  o Binary search
  o Hash tables
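A minimal iterative binary search over a sorted list, shown as one way to illustrate the O(log n) search topic. The function name is illustrative.

```python
def binary_search(sorted_items, target):
    """Return an index of target in sorted_items, or -1 if absent. O(log n)."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1


data = [1, 3, 5, 7, 9, 11]
print(binary_search(data, 7))  # 3
print(binary_search(data, 4))  # -1
```

Python's standard library also provides `bisect.bisect_left` for binary search on sorted lists.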
Machine Learning Theory Defense Practice
Week 12
Capstone Project Presentations and Review
Machine Learning Theory Defense Practice
SQL Code Review
R Code Review
Python Code Review
From the start of the bootcamp, you work on hands-on projects. The Capstone Project lets you create your own data product that showcases your interests and talents. Students are free to use anything covered in class on this project.
Project 5 Due: Capstone Project