- Intro to Data Science UW / Coursera
- Topics: Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.
- Linear Algebra / Levandosky Stanford / Book
- Linear Programming (Math 407) University of Washington / Course
- Statistics Stats in a Nutshell / Book
- Forecasting: Principles and Practice Monash University / Book *uses R
- Problem-Solving Heuristics "How To Solve It" Polya / Book
- Coding the Matrix: Linear Algebra through Computer Science Applications Brown / Coursera
- Think Bayes Allen Downey / Book

Algorithms
- Algorithms Design & Analysis I Stanford / Coursera
- Algorithm Design Kleinberg & Tardos / Book

Databases
- Introduction to Databases Stanford / Coursera
- SQL Tutorial W3Schools / Tutorials <-- Currently doing this

Data Mining
- Mining Massive Data Sets Stanford / Book
- Mining The Social Web O'Reilly / Book
- Introduction to Information Retrieval Stanford / Book

Machine Learning
- Machine Learning / Ng Stanford / Coursera
- A Course in Machine Learning / Hal Daumé III UMD Online Book
- Programming Collective Intelligence O'Reilly / Book
- Statistics The Elements of Statistical Learning

Probabilistic Graphical Models
- Probabilistic Programming and Bayesian Methods for Hackers Github / Tutorials
- PGMs / Koller Stanford / Coursera

Natural Language Processing
- NLP with Python O'Reilly / Book

Analysis
- Python for Data Analysis O'Reilly / Book
- Big Data Analysis with Twitter UC Berkeley / Lectures
- Social and Economic Networks: Models and Analysis / Stanford / Coursera
- Information Visualization "Envisioning Information" Tufte / Book

Python (Learning)
- New To Python: Learn Python the Hard Way, Google's Python Class

Python (Libraries)
- Basic Packages Python, virtualenv, NumPy, SciPy, matplotlib and IPython
- Data Science in iPython Notebooks (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering)
- Bayesian Inference | pymc
- Labeled data structures objects, statistical functions, etc pandas (See: Python for Data Analysis)
- Python wrapper for the Twitter API twython
- Tools for Data Mining & Analysis scikit-learn
- Network Modeling & Viz networkx
- Natural Language Toolkit NLTK

- Toy Data Ideas
- Capstone Analysis of Your Own Design; Quora's Idea Compendium
- Healthcare Twitter Analysis Coursolve & UW Data Science
- Coursera
- Khan Academy
- Wolfram Alpha
- Wikipedia
- Quora
- Kindle .mobis
- Great PopSci Read: The Signal and The Noise Nate Silver
- Zipfian Academy's List of Resources
- A Software Engineer's Guide to Getting Started with Data Science
- Data Scientist Interviews Metamarkets
This is an introduction geared toward those with at least a minimal understanding of programming and (perhaps obviously) an interest in the components of Data Science, like statistics and distributed computing. Out of personal preference and a need for focus, I geared the original curriculum toward Python tools and resources, so I've explicitly marked resources that use other tools (like R) to teach conceptual material.
Please Share and Contribute Your Ideas -- it's Open Source!
Here's my transcript; please showcase your own on the wiki!