fast-nmtf

Fast optimization of non-negative matrix tri-factorization.

Installation

This project relies on the numpy and scipy libraries. For best results, we recommend installing it inside an Anaconda environment, which simplifies setup by providing libraries optimized for matrix operations (such as Intel MKL).

   git clone https://github.com/acopar/fast-nmtf
   cd fast-nmtf
   conda env create -f environment.yml
   conda activate fast-nmtf
   pip install -e .

Data

To download preprocessed benchmark datasets, use the provided get_datasets.sh script.

    scripts/get_datasets.sh

This script downloads datasets that have already been preprocessed and converted into npz (NumPy compressed) format.
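A downloaded dataset can be inspected directly with numpy. The sketch below writes a toy matrix to an npz archive and reads it back; the array key `data` is an assumption, so list `archive.files` on a real dataset to see its actual keys:

```python
import os
import tempfile

import numpy as np

# Write a toy dense matrix to a compressed .npz archive, then read
# it back. The same np.load call works on the downloaded datasets;
# the key name "data" here is an assumption.
X = np.random.rand(8, 5)
path = os.path.join(tempfile.mkdtemp(), "toy.npz")
np.savez_compressed(path, data=X)

archive = np.load(path)
print(archive.files)   # names of the arrays stored in the archive
Y = archive["data"]
assert np.allclose(X, Y)
```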

Example

    python fnmtf/factorize.py -t cod -k 20 data/aldigs.npz

The following optimization techniques can be selected with the -t option:

  • mu: multiplicative updates
  • als: alternating least squares
  • pg: projected gradient
  • cod: coordinate descent
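As an illustration of the first technique, here is a minimal dense sketch of generic Frobenius-norm multiplicative updates for the tri-factorization X ≈ USVᵀ. This is a textbook update scheme, not necessarily the exact rules implemented in this repository:

```python
import numpy as np

def nmtf_mu(X, k, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for X ~= U @ S @ V.T, all factors non-negative.

    Generic Frobenius-norm update rules; a sketch only, not the
    implementation used in fnmtf.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k))
    S = rng.random((k, k))
    V = rng.random((m, k))
    for _ in range(iters):
        # U <- U * (X V S^T) / (U S V^T V S^T)
        SVt = S @ V.T
        U *= (X @ SVt.T) / (U @ (SVt @ SVt.T) + eps)
        # S <- S * (U^T X V) / (U^T U S V^T V)
        S *= (U.T @ X @ V) / ((U.T @ U) @ S @ (V.T @ V) + eps)
        # V <- V * (X^T U S) / (V S^T U^T U S)
        US = U @ S
        V *= (X.T @ US) / (V @ (US.T @ US) + eps)
    return U, S, V

X = np.abs(np.random.default_rng(1).random((30, 20)))
U, S, V = nmtf_mu(X, k=5)
err = np.linalg.norm(X - U @ S @ V.T) / np.linalg.norm(X)
print(f"relative error: {err:.3f}")
```

Each factor is rescaled elementwise by a ratio of the negative and positive parts of the gradient, which keeps all entries non-negative throughout.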

Reproduce results

To exactly reproduce the experiments, where each dataset is run ten times with each optimization technique, run the following command. Depending on your configuration, this may take days.

    bash scripts/full.sh

The long test evaluates convergence (using the same factorization rank, k=20). It takes hours to complete (up to ten times faster than the full test).

    bash scripts/long.sh

There is a shorter version of the experiments, with a lower convergence threshold (epsilon=10^-5) and a maximum of 2000 iterations. This test completes in a few hours.

    bash scripts/short.sh
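These runs stop once the objective stops improving meaningfully. Assuming the -e exponent maps to a threshold of 10^-e (this mapping is an assumption, not confirmed by the source), a relative-change stopping rule could look like:

```python
def converged(prev_err, curr_err, e=6, min_iter=1, it=0):
    """Stop when the relative improvement drops below 10**-e.

    Sketch only: the exact criterion used in fnmtf may differ.
    """
    if it < min_iter:
        return False          # always run at least min_iter iterations
    if prev_err == 0:
        return True           # nothing left to improve
    return abs(prev_err - curr_err) / prev_err < 10 ** (-e)

print(converged(1.0, 0.9999999, e=6, it=10))  # tiny improvement: stop
print(converged(1.0, 0.9, e=6, it=10))        # large improvement: continue
```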

After the experiments are done, you can visualize the output with the following command:

    python fnmtf/visualize.py

Command line arguments

  • -t [arg]: Optimization technique [mu, als, pg, cod]
  • -s: Use sparse matrices
  • -k [arg]: factorization rank, positive integer
  • -p [arg]: number of parallel workers
  • -S [arg]: random seed
  • -e [arg]: stopping criterion threshold (higher means more iterations), default=6
  • -m [arg]: minimum number of iterations
  • data: last argument is path to the dataset (required)
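Taken together, the options above map naturally onto an argparse interface. The following is a hypothetical reconstruction for illustration; the actual parser in fnmtf/factorize.py may differ in long option names, defaults, and help text:

```python
import argparse

# Hypothetical CLI sketch matching the options listed above; long
# names and defaults are assumptions, not the repository's parser.
parser = argparse.ArgumentParser(
    description="Fast non-negative matrix tri-factorization")
parser.add_argument("-t", "--technique", choices=["mu", "als", "pg", "cod"],
                    default="cod", help="optimization technique")
parser.add_argument("-s", "--sparse", action="store_true",
                    help="use sparse matrices")
parser.add_argument("-k", "--rank", type=int, default=20,
                    help="factorization rank, positive integer")
parser.add_argument("-p", "--parallel", type=int, default=1,
                    help="number of parallel workers")
parser.add_argument("-S", "--seed", type=int, default=0, help="random seed")
parser.add_argument("-e", "--epsilon", type=int, default=6,
                    help="stopping threshold (higher means more iterations)")
parser.add_argument("-m", "--min-iter", type=int, default=1,
                    help="minimum number of iterations")
parser.add_argument("data", help="path to the dataset (.npz)")

args = parser.parse_args(["-t", "cod", "-k", "20", "data/aldigs.npz"])
print(args.technique, args.rank, args.data)
```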
