This project shares the Python code covered in the LMU Data Analytics course, in a clearly structured and carefully commented form, for learning and communication purposes.
The reason I put these two answers together is that they rest on the same fact: real-world data is messy. It comes in different formats and from different sources, and it often contains missing values, inconsistencies, and problems you might never expect. The flip side of this coin is that if we can handle data sources at their messiest, we gain a huge advantage in understanding phenomena that others may miss.
As previously suggested, the main focus of this course is data preparation. Dr. Mathias devoted only a fraction of the lectures to the actual statistical inference results, perhaps because what matters most is the data itself. Enjoy~
The main part of the Data Analytics course consists of 3 projects:
- Agriculture Datasets Merging
- Missing Value Filling with the Dataset Scraped from Clergy Database
- Canada Parliament Election Analysis with Dynamic Website Scraping and Text Processing

Each project breaks down into the following steps:
- Agriculture Datasets Merging
  - Download the 46 sub-datasets (variable names are inconsistent across sub-datasets)
  - Choose a list of variables of interest (later used to construct the dependent variable, DV, called Farm_Performance)
  - Pick the chosen variables from each sub-dataset (you may choose a different time period to work on)
  - Merge the variables of interest into one dataframe named "agriculture_df"
  - Merge agriculture_df with election_df (election_df can be downloaded from Moodle)
  - Done
- Missing Value Filling with the Dataset Scraped from Clergy Database
  - Scrape the Clergy Database website to collect a large number of HTML files
  - Parse the HTML files and convert them into CSV files
  - Extract the appointment data from the CSV files
  - Merge the appointment data with england_population_df (england_population_df can be downloaded from Moodle)
  - Predict the missing values in england_population_df from the appointment data
  - Done
- Canada Parliament Election Analysis with Dynamic Website Scraping and Text Processing
  - Scrape the Canada Parliament data from a dynamic website
  - Collect the resulting CSV files and merge them for later use (the speech data is later used to augment this dataset)
  - Download the raw speech data from Moodle
  - Clean and prepare the speech text
  - Extract features from the prepared speech text (using a vectorizer and a clustering algorithm)
  - (to be continued...)
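
For the Agriculture Datasets Merging project, the core difficulty is that the 46 sub-datasets name the same variable differently. Below is a minimal, standard-library sketch of one way to handle it: map every spelling to one canonical name, keep only the variables of interest, outer-merge the sub-datasets on a shared key, then inner-join with the election data. The column names, the alias table, and the (region, year) merge key are hypothetical assumptions, not the course's actual variables.

```python
# Hypothetical alias table: canonical variable name -> spellings found in sub-datasets.
ALIASES = {
    "region": {"region", "Region", "county_name"},
    "year": {"year", "Year", "yr"},
    "crop_yield": {"crop_yield", "Yield", "yield_t_ha"},
    "farm_income": {"farm_income", "Income", "net_income"},
}


def canonical(column):
    """Return the canonical name for a column, or None if it is not a variable of interest."""
    for name, spellings in ALIASES.items():
        if column in spellings:
            return name
    return None


def harmonize(rows):
    """Rename columns to canonical names and drop every other column."""
    return [
        {canonical(col): value for col, value in row.items() if canonical(col) is not None}
        for row in rows
    ]


def merge_subdatasets(subdatasets, keys=("region", "year")):
    """Outer-merge harmonized sub-datasets into one record per (region, year) key."""
    merged = {}
    for rows in subdatasets:
        for row in harmonize(rows):
            key = tuple(row[k] for k in keys)
            merged.setdefault(key, {}).update(row)
    return merged


def join_election(agriculture, election_rows, keys=("region", "year")):
    """Inner-join the merged agriculture records with election rows on the same keys."""
    joined = []
    for row in election_rows:
        key = tuple(row[k] for k in keys)
        if key in agriculture:
            joined.append({**agriculture[key], **row})
    return joined


# Invented toy data: two sub-datasets that spell the same variables differently.
sub_a = [{"Region": "North", "Year": 2010, "Yield": 3.2}]
sub_b = [{"county_name": "North", "yr": 2010, "net_income": 51000, "notes": "n/a"}]
agriculture = merge_subdatasets([sub_a, sub_b])
election = [{"region": "North", "year": 2010, "vote_share": 0.41}]
print(join_election(agriculture, election))
# [{'region': 'North', 'year': 2010, 'crop_yield': 3.2, 'farm_income': 51000, 'vote_share': 0.41}]
```

In pandas, the same steps would typically be `DataFrame.rename(columns=...)`, column selection, and `DataFrame.merge` with `how="outer"` for the sub-datasets and `how="inner"` for the election join.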
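
For the Clergy Database project, the step of parsing the HTML files into CSV files can be sketched with Python's built-in `html.parser`. This assumes, hypothetically, that the useful records sit in ordinary `<table>` rows with closed `<td>`/`<th>` tags; the real pages may be laid out differently, and the sample markup below is invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td>/<th> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_to_csv(html_text, out):
    """Parse table rows out of html_text and write them as CSV to the file-like `out`."""
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    csv.writer(out).writerows(parser.rows)
    return parser.rows


# Invented sample markup; real Clergy Database pages may be structured differently.
page = """<table>
<tr><th>Name</th><th>Year</th><th>Post</th></tr>
<tr><td>John  Smith</td><td>1801</td><td>Curate, St Mary</td></tr>
</table>"""

buffer = io.StringIO()
html_to_csv(page, buffer)
print(buffer.getvalue())
```

For many pages, one would loop over the downloaded files and write one CSV per page. The parser ignores text outside table cells and does not handle nested tables or unclosed `<td>` tags, so a more forgiving parser such as BeautifulSoup may be the more practical choice for messy pages.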
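
The last two steps of the Clergy Database project (merge the appointment data with england_population_df, then predict the missing values) can be sketched as below. Purely for illustration, this assumes both tables share hypothetical `parish` and `year` columns and that population grows roughly linearly with the number of appointments. The sketch fits a one-variable ordinary least squares line on the rows where population is known and uses it to fill the rest; the course's actual model and merge keys may differ.

```python
from collections import Counter


def count_appointments(appointments):
    """Count appointment records per (parish, year) key (hypothetical column names)."""
    return Counter((a["parish"], a["year"]) for a in appointments)


def attach_counts(population_rows, counts):
    """Add an 'appointments' column to each population row (Counter yields 0 for no match)."""
    return [{**r, "appointments": counts[(r["parish"], r["year"])]} for r in population_rows]


def fit_ols(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fill_missing(rows, x_key="appointments", y_key="population"):
    """Fit on rows with a known population, then predict the rows where it is None."""
    known = [r for r in rows if r[y_key] is not None]
    slope, intercept = fit_ols([r[x_key] for r in known], [r[y_key] for r in known])
    filled = []
    for r in rows:
        r = dict(r)
        r["imputed"] = r[y_key] is None  # flag predicted values for later checks
        if r["imputed"]:
            r[y_key] = intercept + slope * r[x_key]
        filled.append(r)
    return filled


# Invented toy data: parish D has no recorded population.
appointments = (
    [{"parish": "A", "year": 1801}] * 2
    + [{"parish": "B", "year": 1801}] * 4
    + [{"parish": "C", "year": 1801}] * 6
    + [{"parish": "D", "year": 1801}] * 5
)
population = [
    {"parish": "A", "year": 1801, "population": 200},
    {"parish": "B", "year": 1801, "population": 400},
    {"parish": "C", "year": 1801, "population": 600},
    {"parish": "D", "year": 1801, "population": None},
]
filled = fill_missing(attach_counts(population, count_appointments(appointments)))
print(filled[3])
# {'parish': 'D', 'year': 1801, 'population': 500.0, 'appointments': 5, 'imputed': True}
```

Keeping an `imputed` flag makes it easy to check later how much of the analysis rests on predicted rather than observed values. Note that `fit_ols` fails if every known row has the same appointment count, since the slope is then undefined.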
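
For the Canada Parliament project, merging the many scraped CSV files into one table can be sketched with the standard `csv` and `glob` modules. The file names and columns below (`name`, `party`, `riding`) are hypothetical. The sketch takes the union of all columns, fills gaps with empty strings, and records which file each row came from.

```python
import csv
import glob
import os
import tempfile


def merge_csv_files(pattern, out_path):
    """Stack every CSV matching `pattern` into out_path, adding a source_file column."""
    rows, fieldnames = [], []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                row["source_file"] = os.path.basename(path)
                for col in row:
                    if col not in fieldnames:
                        fieldnames.append(col)  # union of columns, in first-seen order
                rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # restval="" fills columns that a given file did not have
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demo with two small hypothetical files in a temporary folder.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "members_1.csv"), "w", newline="", encoding="utf-8") as f:
    f.write("name,party\nAlice,Liberal\n")
with open(os.path.join(folder, "members_2.csv"), "w", newline="", encoding="utf-8") as f:
    f.write("name,party,riding\nBob,Conservative,Ottawa\n")
out_file = os.path.join(tempfile.mkdtemp(), "members_all.csv")
print(merge_csv_files(os.path.join(folder, "*.csv"), out_file))  # 2
```

The merged file is written to a different folder so the glob pattern does not pick it up on a second run. With pandas, `pd.concat` over a list of `pd.read_csv` results does the same job.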
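
The step of cleaning and preparing the speech text might look like the sketch below. The cleaning rules and the tiny stop-word list are assumptions for illustration, not the course's exact pipeline. Because Canadian parliamentary speeches include French, the regular expression keeps Unicode letters such as "é" rather than only `a-z`.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real project would use a fuller list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that",
             "we", "this", "for", "on", "it", "be", "are"}


def clean_speech(text, min_len=3):
    """Lowercase, strip bracketed annotations, punctuation and digits, then drop stop words."""
    text = text.lower()
    text = re.sub(r"\[[^\]]*\]", " ", text)    # bracketed annotations, if the transcripts have them
    text = re.sub(r"[^\w\s]|\d|_", " ", text)  # keep letters only, including accented ones
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) >= min_len]


print(clean_speech("Mr. Speaker, the economy [Translation] is growing in 2023!"))
# ['speaker', 'economy', 'growing']
```

Dropping tokens shorter than `min_len` also removes abbreviations such as "mr"; whether that is desirable depends on the later analysis.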
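
For extracting features from the prepared speech text, the vectorizer-plus-clustering idea can be shown without extra libraries: TF-IDF weighting followed by k-means (Lloyd's algorithm). In practice, scikit-learn's `TfidfVectorizer` and `KMeans` would normally be used. This hand-written sketch uses the smoothed idf formula ln((1 + n) / (1 + df)) + 1 that scikit-learn documents for `smooth_idf=True`, and a deterministic farthest-point seeding in place of random or k-means++ initialization. The toy token lists are invented.

```python
import math
from collections import Counter


def tfidf(docs):
    """Turn token lists into L2-normalised TF-IDF vectors over a sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}  # smoothed idf
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sqdist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_point_init(vectors, k):
    """Deterministic seeding: start from the first vector, then repeatedly add the farthest one."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sqdist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iter=50):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = farthest_point_init(vectors, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: sqdist(v, centroids[c])) for v in vectors]
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            # Mean of the members; keep the old centroid if the cluster ended up empty.
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


speeches = [
    ["tax", "budget", "deficit"],
    ["tax", "budget", "spending"],
    ["hockey", "arena", "sport"],
    ["sport", "hockey", "team"],
]
vocab, vectors = tfidf(speeches)
labels, _ = kmeans(vectors, k=2)
print(labels)  # [0, 0, 1, 1]
```

Because k-means only finds a local optimum, results can depend on initialization; with real speeches it is worth trying several values of k and several initializations.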
This is still a work in progress; I will do my best to keep the code I share as systematic and easy to read as possible. If you find any problems or have any questions, please contact me. Let's learn together~ Email: [email protected]
Sincerely, Henry