Project for using a gated recurrent unit (GRU) to improve the detection of changes in variance in multivariate time series data.

dpweix/gruMEWMA

Set Up

The repository is meant as an accompaniment to the paper Monitoring Covariance in Multivariate Time Series: Comparing Machine Learning and Statistical Approaches.

The original Excel data file is bw30-navajo-raw-data.xlsx. The bw30-navajo.csv and bw30-navajo.rds files contain cleaned versions of the data. The .csv file can be used for analysis in any programming language, whereas the .rds file includes additional formatting and can only be used in R. Our analysis is based on the bw30-navajo.rds file. Both bw30-navajo.csv and bw30-navajo.rds are created from bw30-navajo-raw-data.xlsx in the first section of the 1-case-study-data.R script.

To run the following scripts, first install the mlmewma package from https://github.com/dpweix/mlmewma.git. The gated recurrent unit (GRU) is fitted with TensorFlow in Python. The Python code is included in the downloaded R package at mlmewma/inst/python/gru_functions.py. Wherever a script sets the path_py variable, provide the absolute path to this file; otherwise, the package cannot find the Python code.

Once the path is determined, use source_python from the reticulate package to load all Python functions required by the mlmewma package. Note that TensorFlow must be installed in the Python installation that reticulate is configured to use.

Additionally, these scripts use the here package so that file locations, aside from gru_functions.py, do not need absolute paths. For the here package to work, open this repository as an R project. Alternatively, each user can replace the here calls with absolute paths in each file.

```r
library("reticulate")
# Custom package https://github.com/dpweix/mlmewma.git
library("mlmewma")

# Load GRU functions
path_py <- "~/git/mlmewma/inst/python/gru_functions.py"
source_python(path_py)
```
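
If loading the GRU functions fails, one likely cause is that TensorFlow is missing from the Python interpreter that reticulate uses. The following is a minimal sketch of a check that can be run with that interpreter. The helper name `package_available` is hypothetical and is not part of mlmewma or gru_functions.py.

```python
import importlib.util
import sys


def package_available(name: str) -> bool:
    """Return True if the running interpreter can locate the top-level package `name`."""
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # sys.executable is the interpreter running this script; it should be
    # the same one that reticulate is configured to use.
    print("Python interpreter:", sys.executable)
    print("TensorFlow found:", package_available("tensorflow"))
```

From the R side, `reticulate::py_config()` reports which Python installation is in use, and `reticulate::py_module_available("tensorflow")` performs a comparable check.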

File Descriptions

| File name | Description |
| --- | --- |
| 0-test-methods.R | This short script can be used to confirm that the mlmewma package has been properly installed and the path to the Python file is correct. |
| 1-case-study-data.R | This script reads and cleans the data from the case study. The resulting clean data are saved as .rds files. Additionally, the breaks separating the training data (model), training data (estimate h), and testing data are defined here. |
| 2-case-study-apply-methods.R | This script fits the models and applies the methods to the data cleaned in file 1. The resulting model predictions, plotting statistics, and control limits can be saved. |
| 3-case-study-figures.R | This script creates all figures concerning the case study used in the paper. This file depends on the output from scripts 1 and 2. |
| 4-broken-methods.R | Do not run this script. This script only contains the function gen_sim_study_brk. This function runs the entire simulation study based on a choice of data structure. The return values for this function are the plotting statistic, run length, and control limit for each method type. |
| 5-arl-study.R | Do not run this script. This script imports gen_sim_study_brk and runs a single simulation with the chosen settings. The resulting run length and control limit are saved in a .rds file in the results folder. |
| 6-sim-study.R | This script runs the simulation study under conditions chosen at the top of the file. After the simulation settings are chosen, file 5 is repeatedly submitted as an RStudio background job to encourage parallel running of the simulations. Once all files are saved for each data generation structure, the ARL tables can be loaded in as data frames and converted to LaTeX tables. |
| 7-visualize-study.R | This script produces time series plots and run length histograms for all the different data generation structures and their respective faults. |
| 8-visualize-study-3D.R | This script generates 3D scatter plots for all the different data generation structures and their respective faults. |
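
The simulation scripts report a run length and a control limit for each method, and the ARL tables summarize the run lengths. As a rough illustration of how these quantities relate, the sketch below assumes the common control chart convention that a method signals the first time its plotting statistic exceeds the control limit h. The function names, the strict inequality, the 1-based indexing, and the toy numbers are assumptions for illustration only; they are not taken from gen_sim_study_brk.

```python
from statistics import mean
from typing import Optional, Sequence


def run_length(stats: Sequence[float], h: float) -> Optional[int]:
    """Return the 1-based index of the first statistic above h, or None if none exceeds it."""
    for t, value in enumerate(stats, start=1):
        if value > h:
            return t
    return None


def average_run_length(run_lengths: Sequence[Optional[int]]) -> float:
    """Average the run lengths of the replications that signalled (assumed convention)."""
    observed = [rl for rl in run_lengths if rl is not None]
    if not observed:
        raise ValueError("no replication produced a signal")
    return mean(observed)


# Toy plotting statistics for three hypothetical replications.
replications = [
    [0.2, 0.5, 1.4, 0.9],
    [0.1, 0.3, 0.4, 1.2],
    [0.3, 0.2, 0.1, 0.4],
]
h = 1.0  # illustrative control limit
run_lengths = [run_length(stats, h) for stats in replications]
arl = average_run_length(run_lengths)
```

Dropping replications that never signal is a simplification that biases the average downward; a real study would need to decide how to treat such censored runs.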

How to use this repository

This repository can be used to recreate the results in the paper or as a general guide to using the mlmewma package. The package is intended for multivariate statistical process monitoring of the covariance of multivariate time series data, particularly autocorrelated data with non-linear relationships between variables.
