See the update below on selecting the NLMS order for LFZip.
LFZip (NLMS prediction mode) is now available on conda through the conda-forge channel. For the neural network prediction mode or to run from source, see the next section.
conda create --name lfzip_env
conda activate lfzip_env
conda config --add channels conda-forge
conda install lfzip
After the installation, LFZip (NLMS) can be run using the command lfzip-nlms. To install LFZip in a conda virtual environment, follow the instructions here.
Download:
git clone https://github.com/shubhamchandak94/LFZip.git
To set up the virtual environment and dependencies (on Linux):
cd LFZip/src/
python3 -m venv env
source env/bin/activate
./install.sh
On macOS, you need the gcc compiler to run BSC, the entropy coder used in LFZip. To get it, install gcc@9 using brew as follows:
brew update
brew install gcc@9
and then replace the last statement of the Linux instructions with
./install_macos.sh
If you get an error related to the compilation flags, please look at issue #6 which might help you out.
For processors that lack the AVX instructions required by the latest TensorFlow package (e.g., Intel Pentium/Celeron), do the following instead (requires a working conda installation):
cd LFZip/src/
conda create --name no_avx_env python=3.6
conda activate no_avx_env
./install_without_avx.sh
- Note that LFZip (NLMS), LFZip (NN) and CA (critical aperture) expect the input to be in numpy array (.npy) format and support only float32 arrays.
- LFZip (NLMS) additionally supports multivariate time series with at most 256 variables, where the input is a numpy array of shape (k,T), where k is the number of variables and T is the length of the time series.
- During compression, the reconstructed time series is also generated as a byproduct and stored as compressed_file.bsc.recon.npy. This can be used to verify the correctness of the compression-decompression pipeline.
- Examples are shown after the usages below.
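As a minimal sketch of preparing input that meets the requirements above, the following casts data to float32, checks the shape, and saves it in .npy format. The helper `prepare_input`, the sample data and the file name `multivariate_input.npy` are made up for illustration, and the sketch assumes a univariate series is stored as a 1-D array.

```python
import os
import tempfile

import numpy as np


def prepare_input(data, path):
    """Save `data` as a float32 .npy file of shape (T,) or (k, T)."""
    arr = np.asarray(data, dtype=np.float32)  # only float32 arrays are supported
    if arr.ndim not in (1, 2):
        raise ValueError("expected a 1-D series or a 2-D array of shape (k, T)")
    if arr.ndim == 2 and arr.shape[0] > 256:
        raise ValueError("at most 256 variables are supported")
    np.save(path, arr)
    return arr


# Hypothetical example: k = 3 variables, T = 1000 time steps.
t = np.linspace(0.0, 10.0, 1000)
series = np.stack([np.sin(t), np.cos(t), 0.1 * t])  # shape (3, 1000), float64
out_path = os.path.join(tempfile.mkdtemp(), "multivariate_input.npy")
prepare_input(series, out_path)

reloaded = np.load(out_path)
print(reloaded.shape, reloaded.dtype)
```

The saved file can then be passed as the input file to the compressor.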
If installed using conda, replace python3 nlms_compress.py by lfzip-nlms.
python3 nlms_compress.py [-h] --mode MODE --infile INFILE --outfile OUTFILE
[--NLMS_order N [N ...]] [--mu MU [MU ...]]
[--absolute_error MAXERROR [MAXERROR ...]]
[--quantization_bytes QUANTIZATION_BYTES [QUANTIZATION_BYTES ...]]
with the parameters:
-h, --help show this help message and exit
--mode MODE, -m MODE c or d (compress/decompress)
--infile INFILE, -i INFILE
infile .npy/.bsc
--outfile OUTFILE, -o OUTFILE
outfile .bsc/.npy
--NLMS_order N [N ...], -n N [N ...]
order of NLMS filter for compression (default 32) -
single value or one per variable
--mu MU [MU ...] learning rate of NLMS for compression (default 0.5) -
single value or one per variable
--absolute_error MAXERROR [MAXERROR ...], -a MAXERROR [MAXERROR ...]
max allowed error for compression - single value or
one per variable
--quantization_bytes QUANTIZATION_BYTES [QUANTIZATION_BYTES ...], -q QUANTIZATION_BYTES [QUANTIZATION_BYTES ...]
number of bytes used to encode quantized error -
decides number of quantization levels. Valid values
are 1, 2 (default: 2) - single value or one per variable
Note that nlms_compress_python.py is an older and slower version with a similar interface
but with the core NLMS compression code written in Python instead of C++.
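To illustrate how --absolute_error and --quantization_bytes can interact, here is a simplified sketch of uniform quantization of a single prediction error. It assumes a step size of twice the maximum error and a signed integer range set by the number of bytes. These are illustrative assumptions and not a description of the actual C++ implementation, and `quantize_error` is a hypothetical helper.

```python
def quantize_error(error, maxerror, quantization_bytes=2):
    """Quantize a prediction error to an integer index (illustrative only).

    Assumes a uniform quantizer with step 2 * maxerror, so the reconstructed
    error is within maxerror of the true error whenever the index fits.
    Returns (index, reconstructed_error, fits_in_range).
    """
    if quantization_bytes not in (1, 2):
        raise ValueError("valid values are 1 and 2")
    step = 2.0 * maxerror
    index = round(error / step)  # nearest quantization level
    # Assumed signed range for the given number of bytes.
    bits = 8 * quantization_bytes
    lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    fits = lowest <= index <= highest
    return index, index * step, fits


index, recon, fits = quantize_error(0.137, maxerror=0.01)
print(index, recon, fits)
```

Under these assumptions, 1 byte gives 256 levels and 2 bytes give 65536 levels, so a larger prediction error relative to the allowed maximum error is more likely to fall outside the range with 1 byte.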
While the default order for NLMS is 32, we have found that for certain datasets, the optimal order is 0 (i.e., the prediction step is skipped). We recommend that the user try out both values using the -n flag for a given data source before selecting the order. We are currently working on making this process automatic.
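Until this is automated, the comparison can be scripted. The sketch below compresses the same file once per order and picks the order with the smallest output. It assumes the conda installation (the `lfzip-nlms` command); the helper names and the output file naming are hypothetical.

```python
import os
import subprocess


def build_command(infile, outfile, order, maxerror, executable="lfzip-nlms"):
    """Build the compression command for a given NLMS order."""
    return [executable, "-m", "c", "-i", infile, "-o", outfile,
            "-n", str(order), "-a", str(maxerror)]


def compare_orders(infile, maxerror, orders=(0, 32)):
    """Compress `infile` once per order and return {order: size in bytes}."""
    sizes = {}
    for order in orders:
        outfile = f"{infile}.order{order}.bsc"  # hypothetical output name
        subprocess.run(build_command(infile, outfile, order, maxerror), check=True)
        sizes[order] = os.path.getsize(outfile)
    return sizes


def best_order(sizes):
    """Return the order whose compressed file is smallest."""
    return min(sizes, key=sizes.get)
```

For example, `best_order(compare_orders("nanopore_test.npy", 0.01))` would return 0 or 32, depending on which order compresses that file better.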
First select the appropriate function from models.py, e.g., FC or biGRU. Then call
python3 nn_trainer.py -train training_data.npy -val validation_data.npy -model_file saved_model.h5 \
-model_name model_name -model_params model_params [-lr lr -noise noise -epochs epochs]
with the parameters:
model_name: (str) name of model (function name from models.py)
model_params: space separated list of parameters to the function model_name
lr: (float) learning rate (default 1e-3 for Adam)
noise: (float) noise added to input during training (uniform[-noise,noise]), default 0
epochs: (int) number of epochs to train (0 means store random model)
CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python3 nn_compress.py [-h] --mode MODE --infile INFILE --outfile OUTFILE
[--absolute_error MAXERROR] --model_file MODEL_FILE
[--quantization_bytes QUANTIZATION_BYTES]
[--model_update_period MODEL_UPDATE_PERIOD] [--lr LR]
[--epochs NUM_EPOCHS]
with the parameters:
-h, --help show this help message and exit
--mode MODE, -m MODE c or d (compress/decompress)
--infile INFILE, -i INFILE
infile .npy/.bsc
--outfile OUTFILE, -o OUTFILE
outfile .bsc/.npy
--absolute_error MAXERROR, -a MAXERROR
max allowed error for compression
--model_file MODEL_FILE
model file
--quantization_bytes QUANTIZATION_BYTES, -q QUANTIZATION_BYTES
number of bytes used to encode quantized error -
decides number of quantization levels. Valid values
are 1, 2 (default: 2)
--model_update_period MODEL_UPDATE_PERIOD
train model (both during compression & decompression)
after seeing these many symbols (default: never train)
--lr LR learning rate for Adam when model update used
--epochs NUM_EPOCHS number of epochs to train when model update used
The CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 environment variables are set to ensure that the decompression works precisely the same as the compression and generates the correct reconstruction.
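When nn_compress.py is launched from another Python script rather than from the shell, the same two variables can be passed through the subprocess environment. This is a minimal sketch; `deterministic_env` and `run_nn_compress` are hypothetical helper names, and it assumes the script is run from the directory containing nn_compress.py.

```python
import os
import subprocess


def deterministic_env(base=None):
    """Return a copy of the environment with both variables set."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = ""  # hide CUDA devices so the model runs on CPU
    env["PYTHONHASHSEED"] = "0"       # fix the seed for Python's hash randomization
    return env


def run_nn_compress(args):
    """Run nn_compress.py with the given list of command-line arguments."""
    return subprocess.run(["python3", "nn_compress.py", *args],
                          env=deterministic_env(), check=True)
```

The same environment should be used for both the compression and the decompression call.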
WARNING: in some cases, the maxerror constraint can be slightly violated (by ~1e-5) due to numerical precision issues (only for the CA implementation).
python3 ca_compress.py [-h] --mode MODE --infile INFILE --outfile OUTFILE
[--absolute_error MAXERROR]
Input
optional arguments:
-h, --help show this help message and exit
--mode MODE, -m MODE c or d (compress/decompress)
--infile INFILE, -i INFILE
infile .npy/.bsc
--outfile OUTFILE, -o OUTFILE
outfile .bsc/.npy
--absolute_error MAXERROR, -a MAXERROR
max allowed error for compression
- data/dat_to_np.py: convert a .dat file (with 1 time series value in plaintext per line) to .npy file
- data/npy_to_bin.py: convert a .npy file to binary file used as input to SZ
- data/bin_to_npy.py: convert a .bin file to .npy file
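For readers whose data is in the plaintext format described above, the conversion amounts to parsing one value per line into a float32 array. The following is an illustrative sketch of that idea and not the code of the provided script; `dat_to_array` and the sample values are hypothetical.

```python
import io

import numpy as np


def dat_to_array(lines):
    """Parse one plaintext value per line into a float32 array."""
    values = [float(line) for line in lines if line.strip()]  # skip blank lines
    return np.array(values, dtype=np.float32)


# Hypothetical input standing in for an open .dat file.
sample = io.StringIO("0.5\n1.25\n-3.0\n")
arr = dat_to_array(sample)
print(arr.dtype, arr.shape)
```

The resulting array can be written with `np.save` to obtain a .npy file.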
If installed using conda, replace python nlms_compress.py by lfzip-nlms. See also update above on selecting the NLMS order.
Compression:
python nlms_compress.py -m c -i ../data/evaluation_datasets/dna/nanopore_test.npy -o nanopore_test_compressed.bsc -a 0.01
Decompression:
python nlms_compress.py -m d -i nanopore_test_compressed.bsc -o nanopore_test.decompressed.npy
Verification:
cmp nanopore_test.decompressed.npy nanopore_test_compressed.bsc.recon.npy
Training a fully connected model (FC in models.py) with input_dim = 32, num_hidden_layers = 4, hidden_layer_size = 128 for 5 epochs with uniform noise in [-0.05,0.05] added to the input.
python nn_trainer.py -train ../data/evaluation_datasets/dna/nanopore_train.npy -val ../data/evaluation_datasets/dna/nanopore_val.npy -model_name FC -model_params 32 4 128 -model_file nanopore_trained.h5 -noise 0.05 -epochs 5
Compression:
CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python nn_compress.py -m c -i ../data/evaluation_datasets/dna/nanopore_test.npy -o nanopore_test_compressed.bsc -a 0.01 --model_file nanopore_trained.h5
Decompression:
CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python nn_compress.py -m d -i nanopore_test_compressed.bsc -o nanopore_test.decompressed.npy --model_file nanopore_trained.h5
Verification:
cmp nanopore_test.decompressed.npy nanopore_test_compressed.bsc.recon.npy
Compression:
python ca_compress.py -m c -i ../data/evaluation_datasets/dna/nanopore_test.npy -o nanopore_test_compressed.bsc -a 0.01
Decompression:
python ca_compress.py -m d -i nanopore_test_compressed.bsc -o nanopore_test.decompressed.npy
Verification:
cmp nanopore_test.decompressed.npy nanopore_test_compressed.bsc.recon.npy
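The cmp checks above confirm that decompression reproduces the reconstruction written during compression. To also check the error bound against the original data, the maximum absolute error can be computed directly. This is a minimal sketch that assumes the decompressed array has the same shape as the input; the helper names and the `slack` parameter (for the small CA violations mentioned earlier) are hypothetical.

```python
import numpy as np


def max_abs_error(original, reconstruction):
    """Return the largest absolute difference between two arrays."""
    orig = np.asarray(original, dtype=np.float32)
    recon = np.asarray(reconstruction, dtype=np.float32)
    if orig.shape != recon.shape:
        raise ValueError("original and reconstruction have different shapes")
    # Subtract in float64 so the comparison itself adds no rounding error.
    diff = orig.astype(np.float64) - recon.astype(np.float64)
    return float(np.max(np.abs(diff)))


def within_bound(original, reconstruction, maxerror, slack=0.0):
    """Check that the reconstruction respects maxerror (plus optional slack)."""
    return max_abs_error(original, reconstruction) <= maxerror + slack


# Hypothetical data standing in for np.load(...) of the two files.
original = [0.0, 1.0, 2.0]
reconstruction = [0.005, 0.995, 2.0]
print(within_bound(original, reconstruction, maxerror=0.01))
```

In practice, `original` and `reconstruction` would come from `np.load` on the input file and the decompressed file, with `maxerror` set to the value passed through -a.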