- Introduction: Motivation, software installation, and data visualization [Slides]
- Version control with Git(Hub) [Slides]
- Learning to love the shell [Slides]
- R language basics [Slides]
- Data wrangling & tidying: (1) Tidyverse [Slides] and (2) data.table [Slides]
- Webscraping: (1) Server-side and CSS [Notebook]
- Webscraping: (2) Client-side and APIs [Notebook]
- Regression analysis in R [Notebook]
- Spatial analysis in R [Notebook]
- Functions in R: (1) Introductory concepts [Notebook]
- Functions in R: (2) Advanced concepts [Notebook]
- Parallel programming [Notebook]
- Docker [rOpenSci tutorial]
- Cloud computing with Google Compute Engine [Notebook]
- High performance computing (UO Talapas cluster) [Slides from Nick Maggio guest lecture.]
- Databases: SQL(ite) and BigQuery [Notebook]
- Spark [Notebook]
- Machine learning: (1)
- Machine learning: (2)
This is a graduate course taught by Grant McDermott at the University of Oregon. Here is the course description, taken from the syllabus:
This seminar is targeted at economics PhD students and will introduce you to the modern data science toolkit. While some material will likely overlap with your other quantitative and empirical methods courses, this is not just another econometrics course. Rather, my goal is to bring you up to speed on the practical tools and techniques that I feel will most benefit your dissertation work and future research career. This includes seemingly mundane skills, generally excluded from the core graduate curriculum, which are nevertheless essential to any scientific project. We will cover topics like version control (Git) and project management; data acquisition, cleaning and visualization; efficient programming; and tools for big data analysis (e.g. relational databases, cloud computation and machine learning). In short, we will cover things that I wish someone had taught me when I was starting out in graduate school. While I will occasionally draw on examples from my own research (environmental economics), the tools and methods apply broadly. Students from other fields and specialisations are thus welcome to register.
Please do read the rest of the syllabus before you go through the lectures. This will detail software requirements and installation, and give you a better sense of the full aims and scope of the course. I also have an "FAQ" section at the end that covers frequently asked questions (or, at least, potentially asked questions). Speaking of which, here follow answers to some questions that are more specifically related to this repo.
Please note that this is a work in progress, with new material being added every week.
If you just want to read the lecture slides or HTML notebooks in your browser, then you should simply scroll up to the Lecture outline and quicklinks section at the top of this page. Completed lectures will be hyperlinked as soon as they have been added. Remember to check back in regularly to get any updates. Or, you can watch or star the repo to get notified automatically.
If you actually want to run the analysis and code on your own system (highly recommended), then you will need to download the material to your local machine. The best way to do this is to clone the repo via Git and then pull regularly to get updates. Please take a look at these slides if you are unfamiliar with Git or are unsure how to do any of that. Once that's done, you will find each lecture contained in a numbered folder (e.g. 01-intro). The lectures themselves are written in R Markdown and then exported to HTML format. Click on the HTML files if you just want to view the slides or notebooks.
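For anyone unsure what "clone once, then pull regularly" looks like in practice, here is a minimal sketch of that workflow. To keep it runnable offline, it builds a throwaway local repository as a stand-in for this repo. The paths, the `02-git` folder name, the file contents, and the commit messages are all made up for illustration. In real use you would skip the stand-in steps and pass the clone URL shown on the repo's GitHub page to `git clone`.

```shell
#!/bin/sh
set -eu

# Scratch directory for the demo; removed automatically when the script exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# --- Stand-in for the course repo (illustration only) ---
# In real use, "$upstream" would be the repo's clone URL instead.
upstream="$work/upstream"
git init --quiet "$upstream"
mkdir "$upstream/01-intro"
echo "<html></html>" > "$upstream/01-intro/01-intro.html"
git -C "$upstream" add .
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com \
  commit --quiet -m "Add lecture 1"

# --- Step 1: clone once to get a local copy ---
git clone --quiet "$upstream" "$work/lectures"

# Simulate a new lecture being added upstream (hypothetical folder name).
mkdir "$upstream/02-git"
echo "<html></html>" > "$upstream/02-git/02-git.html"
git -C "$upstream" add .
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com \
  commit --quiet -m "Add lecture 2"

# --- Step 2: pull regularly to fetch new material ---
# --ff-only refuses to merge if your local copy has diverged from upstream.
git -C "$work/lectures" pull --quiet --ff-only

# List the numbered lecture folders now available locally.
lectures=$(ls "$work/lectures")
echo "$lectures"
```

If you plan to edit the lecture files yourself, consider working in a copy or on a separate branch so that later pulls can still fast-forward cleanly.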
Please open a new issue. Better yet, please fork the repo and submit an upstream pull request. I'm very grateful for any contributions, but may be slow to respond while this course is still being developed. Similarly, I am unlikely to help with software troubleshooting or conceptual difficulties for non-enrolled students. Others may feel free to jump in, though.
Sure. That's partly why I have made everything publicly available. I only ask two favours. 1) Please let me know (email/Twitter) if you do use material from this course, or have found it useful in other ways. 2) An acknowledgment somewhere in your own syllabus or notes would be much appreciated.
Possibly. Please contact me if you would like to discuss further.
Depends on a lot of things and I'm too time-constrained right now... but I'm thinking about it. Preliminary working title: "Data science for economists (and other animals)".
The material in this repository is made available under the MIT license.