| output |
|---|
| html_document |
The purpose of this project is to demonstrate the ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis. The output of this project is a tidy data set produced by the run_analysis.R script, and a code book, CodeBook.md, that describes the variables, the data, and any transformations or work performed to clean up the data. This file, README.md, explains how all of the scripts work and how they are connected.
-
The script first installs any packages that are used for the data cleaning.
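
A minimal sketch of this step, assuming the script only installs packages that are not already present. The package list `required_pkgs` and the helper `missing_packages()` are hypothetical; the packages the script actually uses are not listed here.

```r
# Hypothetical package list; replace with the packages the script really needs.
required_pkgs <- c("stats", "utils")

# Return the packages from `pkgs` that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install only what is missing, so repeated runs do not reinstall anything.
to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```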
-
To clean the data, the script downloads the raw dataset from an online zip file; the variable "url" defines the path of the zip file.
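
The download step might look roughly like the sketch below. Only the variable name `url` comes from the script; the placeholder address, the helper `fetch_dataset()`, and the default paths `zip_path` and `data_dir` are assumptions made for illustration.

```r
# Placeholder address; the real zip file location is defined in run_analysis.R.
url <- "https://example.com/dataset.zip"

# Download the archive only if it is not already on disk, then extract it.
# Returns the paths of the extracted files (invisibly, as unzip() does).
fetch_dataset <- function(url, zip_path = "dataset.zip", data_dir = "data") {
  if (!file.exists(zip_path)) {
    # mode = "wb" keeps the binary archive intact on Windows.
    download.file(url, destfile = zip_path, mode = "wb")
  }
  unzip(zip_path, exdir = data_dir)
}

# Example call (commented out so the sketch does not touch the network):
# fetch_dataset(url)
```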
-
The filelist variable stores the set of filenames.
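
One way to build such a list is with `list.files()`, sketched here against a throwaway folder. The folder layout and the file names are made up for the example; only the name `filelist` is taken from the script.

```r
# Create a small stand-in for the unzipped data (hypothetical file names).
data_dir <- file.path(tempdir(), "filelist_demo")
dir.create(file.path(data_dir, "test"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(data_dir, "train"), recursive = TRUE, showWarnings = FALSE)
writeLines("1", file.path(data_dir, "test", "subject_test.txt"))
writeLines("1", file.path(data_dir, "train", "subject_train.txt"))

# Collect every .txt file below data_dir, with paths relative to data_dir.
filelist <- list.files(data_dir, pattern = "\\.txt$", recursive = TRUE)
```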
-
The code then reads each file and binds the records using cbind(), after checking that the number of records in each file is consistent within the "test" and "train" folders.
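
The check-then-bind idea can be sketched as follows. The toy data frames stand in for files that the real script reads from disk, and the helper `bind_if_consistent()` is a hypothetical name, not a function from run_analysis.R.

```r
# Toy stand-ins for the files of one folder; the real files are read from disk.
subject  <- data.frame(subject = c(1, 1, 2))
activity <- data.frame(activity_code = c(5, 4, 5))
features <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c(-0.5, -0.4, -0.3))

# Bind the pieces column-wise only when they all have the same number of rows.
bind_if_consistent <- function(...) {
  pieces <- list(...)
  n_rows <- vapply(pieces, nrow, integer(1))
  if (length(unique(n_rows)) != 1) {
    stop("Row counts differ: ", paste(n_rows, collapse = ", "))
  }
  do.call(cbind, pieces)
}

test_df <- bind_if_consistent(subject, activity, features)
```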
-
A column name variable, "label_df", is created based on the number of columns required for the test and train datasets, and is assigned to the merged dataset train_test_df.
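
As an illustration, the sketch below assigns one name per column and checks that the counts match first. The names `label_df` and `train_test_df` follow the script; the toy values, the column order, and the feature names in `feature_names` are assumptions.

```r
# Toy merged dataset with four unnamed columns.
train_test_df <- data.frame(c(1, 2), c(5, 4), c(0.1, 0.2), c(-0.5, -0.4))

# Illustrative feature names; the real ones come with the raw dataset.
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X")

# Assumed layout: two identifier columns followed by the feature columns.
label_df <- c("subject", "activity_code", feature_names)

# Guard against a mismatch before overwriting the column names.
stopifnot(length(label_df) == ncol(train_test_df))
names(train_test_df) <- label_df
```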
-
As the project requires, the next step narrows the train_test_df dataset down to only the feature variables whose names contain "mean()" or "std()". The grepl function is used to create narrow_df.
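
A sketch of the selection on toy data follows. It assumes the identifier columns are kept alongside the matching features, and it uses `fixed = TRUE`, which may differ from the script: with a plain regular expression, the pattern "mean()" would also match names such as "meanFreq()". The column names are illustrative.

```r
# Toy dataset: two identifier columns and four illustrative feature columns.
train_test_df <- data.frame(1:2, c(5, 4), c(0.1, 0.2), c(-0.5, -0.4),
                            c(0.3, 0.7), c(0.9, 0.8))
names(train_test_df) <- c("subject", "activity_code",
                          "tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                          "tBodyAcc-mad()-X", "fBodyAcc-meanFreq()-X")

# fixed = TRUE matches the literal text, parentheses included.
col_names <- names(train_test_df)
is_id   <- col_names %in% c("subject", "activity_code")
is_mean <- grepl("mean()", col_names, fixed = TRUE)
is_std  <- grepl("std()", col_names, fixed = TRUE)

# Keep the identifiers plus every mean()/std() feature.
narrow_df <- train_test_df[, is_id | is_mean | is_std]
```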
-
A merge function is used to attach a descriptive activity name to each activity code in the narrow_df dataset; the new dataset is named tidy_df.
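
The lookup can be sketched with `merge()` on a shared key. The lookup table `activity_labels`, the column names, and the label values are illustrative assumptions. Note that `merge()` sorts the result by the key and moves the key column to the front, so row order is not preserved.

```r
# Toy narrowed dataset.
narrow_df <- data.frame(
  subject = c(1, 1, 2),
  activity_code = c(2, 1, 2),
  mean_x = c(0.1, 0.2, 0.3)
)

# Hypothetical lookup table mapping each code to a descriptive name.
activity_labels <- data.frame(
  activity_code = c(1, 2),
  activity_name = c("WALKING", "WALKING_UPSTAIRS")
)

# Join on the shared key; every row gains its activity_name.
tidy_df <- merge(narrow_df, activity_labels, by = "activity_code")
```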
-
Finally, to calculate the mean of every variable grouped by subject and activity code, the aggregate function is applied to tidy_df. The result is saved as mean_dataset, and the write.table function writes mean_dataset to a txt file.
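
A sketch of the grouping and export on toy data follows. The formula interface, the output path, and `row.names = FALSE` are assumptions rather than the script's exact call. With the `.` shorthand every non-grouping column is averaged, so non-numeric columns such as an activity name would need to be dropped or added to the grouping first.

```r
# Toy tidy dataset: subject 1 has two rows for activity 1.
tidy_df <- data.frame(
  subject = c(1, 1, 1, 2),
  activity_code = c(1, 1, 2, 1),
  mean_x = c(0.2, 0.4, 0.6, 1.0),
  std_x = c(-0.2, -0.4, -0.6, -1.0)
)

# Average every remaining column within each subject/activity pair.
mean_dataset <- aggregate(. ~ subject + activity_code, data = tidy_df, FUN = mean)

# Write the result as plain text (hypothetical output location).
out_file <- file.path(tempdir(), "mean_dataset.txt")
write.table(mean_dataset, out_file, row.names = FALSE)
```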
The code book is included as part of the project files. It describes the variables used in the analysis, including their type and purpose.