The files are expected to be in the working directory, in the same folders as in the zip file: features.txt directly in the working directory, and all other files in either the test or the train folder.
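Before running anything, a quick check of this layout can save time. The Python sketch below is not part of the project's script. It assumes the zip is the UCI HAR Dataset, whose layout matches the description (features.txt at the top, subject_*.txt, y_*.txt and X_*.txt inside train and test). The helpers expected_paths and missing_files are hypothetical.

```python
from pathlib import Path

# Assumption: file names follow the UCI HAR Dataset zip.
# expected_paths and missing_files are hypothetical helpers.
SPLITS = ("train", "test")


def expected_paths(root="."):
    """Map a short label to the path each input file should have."""
    base = Path(root)
    paths = {"features": base / "features.txt"}
    for split in SPLITS:
        folder = base / split
        paths[f"subject_{split}"] = folder / f"subject_{split}.txt"
        paths[f"activity_{split}"] = folder / f"y_{split}.txt"
        paths[f"data_{split}"] = folder / f"X_{split}.txt"
    return paths


def missing_files(root="."):
    """Return the expected files that are not present under root."""
    return [str(p) for p in expected_paths(root).values() if not p.is_file()]


if __name__ == "__main__":
    missing = missing_files()
    print("all input files found" if not missing else f"missing: {missing}")
```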
The script first reads the features file. Its entries are used as column names when reading the data files.
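The next Python sketch shows how the feature names could be pulled out of the features file. It is not the project's script. It assumes each line holds an index and a name separated by whitespace (for example "1 tBodyAcc-mean()-X"). The function name read_feature_names is hypothetical.

```python
def read_feature_names(lines):
    """Return the feature names from features.txt-style lines.

    Assumes each line holds an index and a name separated by whitespace,
    e.g. '1 tBodyAcc-mean()-X'.
    """
    names = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        names.append(parts[1].strip())  # drop the trailing newline
    return names


# With the real file: with open("features.txt") as f: names = read_feature_names(f)
sample = ["1 tBodyAcc-mean()-X\n", "2 tBodyAcc-mean()-Y\n", "4 tBodyAcc-std()-X\n"]
print(read_feature_names(sample))
# ['tBodyAcc-mean()-X', 'tBodyAcc-mean()-Y', 'tBodyAcc-std()-X']
```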
For both the training and the test data, the script then does the following:
- Read the subject file and assign the column name "subject"
- Read the activity file and assign the column name "activity"
- Read the data file and assign the column names coming from the features file
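The three reading steps above can be sketched in Python with one generic parser. This is not the project's script. Rows are represented as dicts keyed by column name, and the column names are assumed to be unique. The function read_table and the sample values are made up for illustration.

```python
def read_table(lines, column_names, convert=float):
    """Parse whitespace-separated lines into a list of row dicts.

    Assumes every non-empty line has exactly len(column_names) fields
    and that the column names are unique.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(column_names):
            raise ValueError(
                f"expected {len(column_names)} fields, got {len(fields)}")
        rows.append({name: convert(v) for name, v in zip(column_names, fields)})
    return rows


# Made-up inputs standing in for the subject, activity and data files
subjects = read_table(["1", "1"], ["subject"], int)
activities = read_table(["5", "4"], ["activity"], int)
data = read_table([" 0.28 -0.02", " 0.27 -0.01"],
                  ["tBodyAcc-mean()-X", "tBodyAcc-mean()-Y"])
```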
Next, separately for the training and the test data, it binds the columns of these three files: "subject" becomes column 1, "activity" column 2, followed by the data columns.
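Binding columns side by side can be sketched in Python by merging row dicts in order. This is not the project's script. The helper bind_columns is hypothetical. Because dicts keep insertion order, "subject" ends up first, "activity" second, and the data columns after them.

```python
def bind_columns(subjects, activities, data):
    """Combine three equally long lists of row dicts side by side.

    Column order follows insertion order: subject, activity, then the data.
    """
    if not (len(subjects) == len(activities) == len(data)):
        raise ValueError("all three inputs must have the same number of rows")
    return [{**s, **a, **d} for s, a, d in zip(subjects, activities, data)]


rows = bind_columns(
    [{"subject": 1}, {"subject": 1}],
    [{"activity": 5}, {"activity": 4}],
    [{"tBodyAcc-mean()-X": 0.28}, {"tBodyAcc-mean()-X": 0.27}],
)
print(list(rows[0]))  # ['subject', 'activity', 'tBodyAcc-mean()-X']
```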
To get the final, combined dataset, the training and test sets are then stacked by rows.
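Stacking the training and test sets can be sketched in Python as follows. This is not the project's script, and bind_rows is a hypothetical helper. It refuses to combine tables whose columns differ, because silently mixing column layouts would corrupt the merged set.

```python
def bind_rows(*tables):
    """Stack tables (lists of row dicts) on top of each other.

    Every row must have the same columns in the same order.
    """
    combined = []
    columns = None
    for table in tables:
        for row in table:
            if columns is None:
                columns = list(row)  # take the layout from the first row
            elif list(row) != columns:
                raise ValueError("tables have different columns")
            combined.append(row)
    return combined


train = [{"subject": 1, "activity": 5, "x": 0.28}]
test = [{"subject": 2, "activity": 1, "x": 0.25}]
merged = bind_rows(train, test)
```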
At this point, the column names still contain several symbols.
They are transformed to lower case, and the brackets and hyphens are removed.
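The renaming rule can be sketched in Python with a single regular expression. This is not the project's script, and clean_name is a hypothetical helper. The document does not say whether other characters, such as commas, are also removed, so this sketch only strips the parentheses and hyphens it names.

```python
import re


def clean_name(name):
    """Lower-case a feature name and drop brackets '(' ')' and hyphens."""
    return re.sub(r"[()\-]", "", name.lower())


print(clean_name("tBodyAcc-mean()-X"))  # tbodyaccmeanx
```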
Of the feature columns, only those whose names contain "mean" or "std" are kept for further analysis, together with the subject and activity columns.
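The column filter can be sketched in Python as a plain substring test on the already lower-cased names. This is not the project's script, and select_mean_std is a hypothetical helper. A substring test also keeps names such as "meanfreq" variants, which is worth keeping in mind when counting the resulting columns.

```python
def select_mean_std(rows, keep=("subject", "activity")):
    """Keep the id columns plus every column whose name contains 'mean' or 'std'."""
    if not rows:
        return []
    columns = [c for c in rows[0]
               if c in keep or "mean" in c or "std" in c]
    return [{c: row[c] for c in columns} for row in rows]


rows = [{"subject": 1, "activity": 5, "tbodyaccmeanx": 0.28,
         "tbodyaccmadx": -0.99, "fbodyaccmeanfreqx": 0.1, "tbodyaccstdx": -0.9}]
kept = select_mean_std(rows)
print(list(kept[0]))
# ['subject', 'activity', 'tbodyaccmeanx', 'fbodyaccmeanfreqx', 'tbodyaccstdx']
```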
To make grouping and summarizing easier, the dataset is converted into a data.table.
The data is then grouped by subject and activity, and the mean of each variable is calculated for every group.
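The grouping step can be sketched in Python with a dictionary of groups. This is not the project's script, which uses data.table. The helper average_by_group is hypothetical, and every non-key column is assumed to be numeric.

```python
from collections import defaultdict
from statistics import fmean


def average_by_group(rows, keys=("subject", "activity")):
    """Return one row per (subject, activity) with the mean of every other column."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    result = []
    for group_key, members in groups.items():
        out = dict(zip(keys, group_key))
        for col in members[0]:
            if col not in keys:
                out[col] = fmean(m[col] for m in members)
        result.append(out)
    return result


rows = [{"subject": 1, "activity": 5, "x": 0.25},
        {"subject": 1, "activity": 5, "x": 0.75},
        {"subject": 2, "activity": 5, "x": 1.0}]
averages = average_by_group(rows)
```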
To improve readability, the tidy dataset is ordered by subject and then by activity.
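Ordering can be sketched in Python with a tuple sort key. This is not the project's script, and order_rows is a hypothetical helper. With numeric activity codes the order is numeric. With text labels it would be alphabetical instead.

```python
def order_rows(rows, keys=("subject", "activity")):
    """Sort rows by subject, then by activity (both ascending)."""
    return sorted(rows, key=lambda row: tuple(row[k] for k in keys))


rows = [{"subject": 2, "activity": 1},
        {"subject": 1, "activity": 6},
        {"subject": 1, "activity": 2}]
ordered = order_rows(rows)
```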
The final set consists of 180 rows and 88 columns (2 for subject and activity, 86 containing the averages).
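The dimensions follow from simple arithmetic. The Python sketch below assumes the UCI HAR Dataset's 30 subjects and 6 activities, with every subject performing every activity. Under that assumption there is one row per pair and two id columns plus one average per kept feature. The helpers tidy_shape and shape are hypothetical.

```python
def tidy_shape(n_subjects, n_activities, n_features):
    """Expected (rows, columns) when every subject performed every activity.

    Assumption: one row per (subject, activity) pair, plus two id columns.
    """
    return n_subjects * n_activities, 2 + n_features


def shape(rows):
    """Actual (rows, columns) of a list of row dicts."""
    return len(rows), (len(rows[0]) if rows else 0)


print(tidy_shape(30, 6, 86))  # (180, 88)
```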