TPers: Exploring Persistent Homology for Time-series with Total Persistence

Installation

I'm using Python 3.8. Running

pip install -r requirements.txt

might work. The only dependencies other than the usual (numpy, scipy, sklearn, matplotlib, pandas, tqdm) are ripser for fast persistence computations, persim for computing persistence images, and sklearn-som.

Usage

To do a simple viz of raw data from the default data/test set run

python main.py --plot input --show

To run and view (but not save) the default data/test set run

python main.py --preset --interact

Interaction may not work. I think you have to set your matplotlib backend to something like Qt5Agg. Try clicking on the TPers plot. I have

backend: Qt5Agg

in ~/.matplotlib/matplotlibrc.

To replicate the results detailed in the report (saving to figures/{DATASET}/{TESTSET}) call

python main.py --preset --som --plot input pre tpers --save --set {DATASET} --test {TESTSET}

For example,

python main.py --preset --som --plot input pre tpers --save --set SystemSLogs --test cpuhog

is the default behavior. Running the bash script

./mkfigs.sh

will run presets on all data/test sets in the data directory, generating the figures included in the report (hopefully).

A Note

The --som flag attempts to load a .pkl file containing a pre-trained self-organizing map (SOM) for the specified data/test set. I don't know if .pkl files will survive. New ones can be trained by running

python mksom.py --set {DATASET} --test {TESTSET}

The script trains a model using the training data (tr.log) file for the specified data/test set, tests it on the corresponding test data set (te.log), and plots the results against an existing model in cache/som_{DATASET}-{TESTSET}.pkl, if available. Pass anything (other than n or no) to override the existing model. If no model exists it just saves.
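
The cache lookup and overwrite rule described above can be sketched roughly as follows. This is a guess at the logic, not the code in mksom.py: the helper names som_cache_path and should_overwrite are hypothetical, and only the path pattern and the n/no rule come from the text.

```python
import os

def som_cache_path(dataset, testset, cache_dir="./cache"):
    """Path pattern from the text: cache/som_{DATASET}-{TESTSET}.pkl."""
    return os.path.join(cache_dir, f"som_{dataset}-{testset}.pkl")

def should_overwrite(answer):
    """Anything other than 'n' or 'no' means overwrite (ignoring case is an assumption)."""
    return answer.strip().lower() not in ("n", "no")

path = som_cache_path("SystemSLogs", "cpuhog")
if os.path.exists(path):
    decision = "existing model found: compare, then ask whether to overwrite"
else:
    decision = "no existing model: just save"
```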

Arguments

Processing

Passing something like

--pre A B C

will run operations A, B, and C in order. Default behavior (--preset 0) is

--pre scale pca=4
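
A minimal sketch of how an ordered list of operations like this might be read with argparse. The parse_op helper and the (name, argument) tuple layout are assumptions for illustration; main.py may parse these differently.

```python
import argparse

def parse_op(token):
    """Split 'pca=4' into ('pca', '4') and 'scale' into ('scale', None)."""
    name, sep, arg = token.partition("=")
    return name, (arg if sep else None)

parser = argparse.ArgumentParser()
# The default mirrors the documented default: --pre scale pca=4
parser.add_argument("--pre", nargs="*", default=["scale", "pca=4"])

args = parser.parse_args(["--pre", "scale=min,all", "diff", "pca=4"])
ops = [parse_op(token) for token in args.pre]  # command-line order is preserved
```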

Available operations are as follows:

scale: Min-Max scaling on features independently
scale=min: Min scaling only on features independently,
scale=all: Min-Max scaling on the whole dataset (min and max of all entries),
scale=min,all: Min scaling on the whole dataset (min of all entries),
diff: apply difference transform (discrete derivative) to each feature independently,
power: apply power transform to each feature independently,
detrend: detrend each feature independently,
ma: apply moving average to each feature independently. ma=w convolves with a 2*w+1 point window,
pca: PCA transform. pca=n reduces to n principal components.
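
For a single feature, a few of these operations might look like the sketch below. These are plain-Python stand-ins, not the functions in tpers: it assumes "min scaling" means subtracting the minimum, that ma=w uses a centered window of 2*w + 1 points, and that windows are shortened at the edges.

```python
def minmax_scale(xs):
    """scale: map one feature onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in xs]

def min_scale(xs):
    """scale=min: shift one feature so its minimum is 0 (assumed meaning)."""
    lo = min(xs)
    return [x - lo for x in xs]

def diff(xs):
    """diff: discrete derivative; the output is one sample shorter."""
    return [b - a for a, b in zip(xs, xs[1:])]

def moving_average(xs, w):
    """ma=w: centered mean over up to 2*w + 1 samples (fewer at the edges)."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - w): i + w + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

The scale=all variants would use the minimum and maximum over every entry of the dataset instead of one feature at a time.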

The same operations can be applied to the total persistence curve by passing them as arguments to --post. Note that this has no effect on kmeans prediction on persistence, but does affect prediction via threshold on tpers (see below).

Window

--length {n}: set the window length to n,
--overlap {w}: set the window overlap to w,
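
One way to read these two arguments: consecutive frames of n samples that share w samples, so each frame starts n - w samples after the previous one. That step size is an assumption, and sliding_windows is a hypothetical helper rather than the project's windowing code.

```python
def sliding_windows(xs, length, overlap):
    """Split xs into frames of `length` samples sharing `overlap` samples."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    step = length - overlap
    # Trailing samples that do not fill a whole frame are dropped.
    return [xs[i:i + length] for i in range(0, len(xs) - length + 1, step)]

frames = sliding_windows(list(range(10)), length=4, overlap=2)
```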

Transform

If none of --period, --fft, or --torus are passed, persistence will be run on raw windowed data.

--period {t}: Period of cycle in the complex plane. Equal to the length of the window if passed without argument. If passed with argument and --torus, the complex data with the specified period will be provided to the torus transform (untested).
--fft: Run Fourier transform on each frame with a Blackman window. If passed with --torus, the complex frequency domain output of the Fourier transform will be passed as phase and amplitude to the torus transform (super cool, kinda works... sometimes).
--torus: Apply torus transform (warning: don't do the torus transform on more than two values/features without setting --nperm less than ~50, usually 20 works).
--exp {p}: Apply p as an exponent to all data, for fun. Executed before all other transforms.
--abs: Take the absolute value of all data, for science. Executed before all other transforms.
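
As a rough illustration of what --fft describes, the sketch below tapers one frame with a Blackman window and takes a naive discrete Fourier transform, returning amplitude and phase per frequency bin. It uses only the standard library for clarity; the project itself presumably relies on numpy/scipy, and how amplitude and phase are then fed to the torus transform is not shown here. Both function names are hypothetical.

```python
import cmath
import math

def blackman(n):
    """Symmetric Blackman window with the standard 0.42 / 0.5 / 0.08 coefficients."""
    if n == 1:
        return [1.0]
    return [0.42
            - 0.5 * math.cos(2 * math.pi * k / (n - 1))
            + 0.08 * math.cos(4 * math.pi * k / (n - 1))
            for k in range(n)]

def windowed_dft(frame):
    """Return (amplitude, phase) for each frequency bin of the tapered frame."""
    n = len(frame)
    tapered = [x * w for x, w in zip(frame, blackman(n))]
    spectrum = [sum(x * cmath.exp(-2j * math.pi * f * k / n)
                    for k, x in enumerate(tapered))
                for f in range(n)]
    return [(abs(c), cmath.phase(c)) for c in spectrum]

bins = windowed_dft([1.0, 2.0, 3.0, 2.0, 1.0])
```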

Persistence

The following arguments are passed to ripser for each frame.

--dim {d}: Maximum Rips/persistence dimension,
--thresh {t}: Maximum distance to compute in the Rips complex,
--nperm {n}: Number of greedy permutations. Probably safe to set to 20 for all applications. Huge speedup.
--metric {euclidean, manhattan, cosine}: Metric for Rips computation. Default: euclidean.
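
These four arguments line up with keyword arguments of ripser's ripser function (maxdim, thresh, n_perm, metric). The mapping below is an assumption about how main.py forwards them, and ripser_kwargs is a hypothetical helper; the actual call is left as a comment so the sketch runs without ripser installed.

```python
def ripser_kwargs(dim=1, thresh=float("inf"), nperm=None, metric="euclidean"):
    """Translate the command-line options into keyword arguments for ripser."""
    if metric not in ("euclidean", "manhattan", "cosine"):
        raise ValueError(f"unsupported metric: {metric}")
    return {"maxdim": dim, "thresh": thresh, "n_perm": nperm, "metric": metric}

kwargs = ripser_kwargs(dim=1, nperm=20)

# With ripser installed, each frame (an array of points) could be processed as:
#   from ripser import ripser
#   dgms = ripser(frame, **kwargs)["dgms"]  # one diagram per dimension 0..dim
```

Note that n_perm cannot exceed the number of points in a frame.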

Total Persistence (almost deprecated)

--invert {d,...}: Invert provided dimensions (multiply by -1 in the sum),
--entropy: Compute persistent entropy, for fun (and science),
--average: Compute average total persistence in each dimension,
--pmin {m}: Only include diagram features with total persistence at least m.
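
To make these options concrete, here is a small sketch that treats a persistence diagram as a list of (birth, death) pairs, one diagram per dimension. It assumes total persistence is the sum of finite lifetimes, that infinite features are skipped, and that dimensions are added together with inverted ones negated. The function names are hypothetical, not the ones in tpers.

```python
import math

def lifetimes(diagram, pmin=0.0):
    """Finite lifetimes (death - birth) that are at least pmin."""
    return [d - b for b, d in diagram if math.isfinite(d) and d - b >= pmin]

def total_persistence(diagrams, invert=(), average=False, pmin=0.0):
    """Sum (or average) lifetimes per dimension, negating inverted dimensions."""
    total = 0.0
    for dim, diagram in enumerate(diagrams):
        ls = lifetimes(diagram, pmin)
        value = sum(ls)
        if average and ls:
            value /= len(ls)
        total += -value if dim in invert else value
    return total

def persistent_entropy(diagram, pmin=0.0):
    """Shannon entropy of the lifetimes normalized to sum to 1."""
    ls = [l for l in lifetimes(diagram, pmin) if l > 0]
    s = sum(ls)
    return -sum((l / s) * math.log(l / s) for l in ls) if ls else 0.0

dgms = [[(0.0, 1.0), (0.0, 3.0), (0.0, math.inf)], [(1.0, 1.5)]]
tp = total_persistence(dgms)
```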

Program Arguments

Just run

python main.py -h

It's the same thing.

--data: Print available data/test sets,
--dir {DATA_DIR}: Data directory. Default: ./data,
--set {DATASET}: Dataset. Default: SystemSLogs,
--test {TESTSET}: Test set. Default: cpuhog,
--file {LOGFILE}: File name. Default: te.log,
--cache {CACHE}: Cache directory. Currently only for SOM models. Default: ./cache,
--preset {?i}: Preset to run. Default for dataset provided if passed without argument,
--show-presets: Print available presets,
--values {COLUMN_NAME1 COLUMN_NAME2 ...}: Data values (features) to use,
--plot {input, pre, window, transform, persistence, tpers, post}: Modules to plot,
--nroc {n}: Number of points on ROC curve,
--frame {f}: Frame to plot (if window or transform passed to plot). For saving purposes. Warning: untested,
--show: Show plot, otherwise it will just quit (if neither --save nor --interact is passed),
--save {?fdir}: Save plots to directory. Default: ./figures/{DATASET}/{TESTSET},
--predict {threshold,SOM,kmeans,minkmeans,maxkmeans}: Don't pass SOM. min/max kmeans are dumb. Threshold is just prediction by thresholding each feature.
--analyze {input, pre, persistence, tpers, post}: Modules to analyze. Pass --analyze {MODULE}={PREDICT} to override prediction type passed by --predict for a given module,
--aplot {input, pre, persistence, tpers, post}: Analyze and plot module,
--interact: Interact with terminal module. Useful for viewing data from individual frames for "framed" modules passed to --plot such as transform, window, and persistence. Default behavior is to plot persistence diagram. Very useful with transform passed to --plot.
--som: Compare with saved SOM model (in cache),
--lead {W}: SOM predict lead (anomaly pending) time. Default: 10,
--streak {s}: Streak of anomalies required to raise SOM alarm. Default: 3
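
Two of the ideas above, prediction by threshold and requiring a streak of anomalies before raising an alarm, can be sketched in a few lines. Both helpers are hypothetical and simplified to a single feature; the strict "greater than" comparison is an assumption.

```python
def threshold_predict(values, threshold):
    """Flag every sample whose value exceeds the threshold."""
    return [v > threshold for v in values]

def raise_alarms(flags, streak=3):
    """Raise an alarm only once `streak` consecutive samples have been flagged."""
    alarms, run = [], 0
    for flag in flags:
        run = run + 1 if flag else 0
        alarms.append(run >= streak)
    return alarms

flags = threshold_predict([0.1, 0.9, 0.8, 0.7, 0.2, 0.9], threshold=0.5)
alarms = raise_alarms(flags, streak=3)
```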

System Specification

The dataset should be specified and included in main.py. The UBL data is included (?), and specified in ubl_data.py. At this time the user is required to specify the following.

Available Data

AVAIL_DATA: Dictionary of available data/test sets,
AVAIL_VALUES: List of available values (features) for each data point.

Data Defaults

DIR: Directory containing data,
DATASET: Default dataset,
TESTSET: Default test set,
LOGFILE: Default test file,
VALUES: Default values (features).

Program Defaults

LENGTH: Transformation window length,
OVERLAP: Transformation window overlap,
DIM: Max persistence dimension.

Presets

PRESETS: List of argument presets,
PRESET_DICT: Data/test set presets (when --preset is passed without argument).
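
A specification module along these lines might look like the sketch below. Only the variable names and the defaults mentioned elsewhere in this README (./data, SystemSLogs, cpuhog, te.log, --pre scale pca=4) come from the text. The column names, the numeric defaults, the nesting of PRESET_DICT, and the spec_problems helper are all placeholders, so check ubl_data.py for the real layout.

```python
# Hypothetical dataset specification; values marked "placeholder" are invented.
AVAIL_DATA = {"SystemSLogs": ["cpuhog"]}                       # dataset -> test sets
AVAIL_VALUES = ["value_a", "value_b", "value_c", "value_d"]    # placeholder columns

DIR = "./data"
DATASET = "SystemSLogs"
TESTSET = "cpuhog"
LOGFILE = "te.log"
VALUES = list(AVAIL_VALUES)

LENGTH = 20    # placeholder window length
OVERLAP = 10   # placeholder window overlap
DIM = 1        # placeholder max persistence dimension

PRESETS = ["--pre scale pca=4"]                  # preset 0, as documented above
PRESET_DICT = {"SystemSLogs": {"cpuhog": 0}}     # assumed nesting: dataset -> test set -> index

def spec_problems():
    """Return a list of inconsistencies in the spec (empty if none)."""
    problems = []
    if TESTSET not in AVAIL_DATA.get(DATASET, []):
        problems.append("default test set not listed in AVAIL_DATA")
    if not set(VALUES) <= set(AVAIL_VALUES):
        problems.append("default VALUES not in AVAIL_VALUES")
    if not 0 <= OVERLAP < LENGTH:
        problems.append("OVERLAP must be smaller than LENGTH")
    if PRESET_DICT[DATASET][TESTSET] >= len(PRESETS):
        problems.append("preset index out of range")
    return problems
```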

InputData class (todo)

Definition of an InputData object that extends TimeSeriesData and contains raw input data for a given data/test set.

shirtd/tpers

Folders and files

Latest commit

History

Repository files navigation

TPers: Exploring Persistent Homology for Time-series with Total Persistence

Installation

Usage

A Note

Arguments

Processing

Window

Transform

Persistence

Total Persistence (almost depricated)

Program Arguments

System Specification

Available Data

Data Defaults

Program Defaults

Presets

InputData class (todo)

GLHF

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages