fx-cmix is an updated implementation of fast-cmix.
Prize awarded on February 2, 2024. http://prize.hutter1.net/
This submission contains the following modifications on top of the recent fast-cmix Hutter Prize winner:
- paq8hp model is replaced with the fxcmv1 model, with the following notable additions:
  - Multiple state tables are used in predictors, which allows better prediction.
  - Most contexts are divided between 30 main predictors, which allows more efficient memory usage per context.
  - Added bracket, quote, first char, char in paragraph, column, table, template, and word stream/paragraph contexts. These contexts are parsed depending on the input, which includes parsing of wiki links, http links, tables, columns, paragraphs, quotes, brackets, lists, and templates.
  - Some contexts are swapped in predictors depending on what the current input is (table, column mode, word/paragraph, list).
  - Some predictors are switched on/off depending on the last char, link, or current bracket, which improves compression.
  - Predictions are mixed with contexts that are more aware of what the predictors are outputting.
  - Match model (not present in the paq8hp model).
  - Predictors are faster, allowing more complex contexts.
- New dictionary.
- A small change in the phda9 preprocessor and in two tables in cmix.
- Memory usage is larger in the fxcmv1 model than in the old paq8hp model.
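Of the additions listed above, the match model is the easiest to illustrate in isolation. The following is a minimal sketch of a generic byte-level match model, assuming that a hash of the last few bytes is used to find an earlier occurrence of the current context. It is not the fxcmv1 implementation: the class name `MatchModelSketch`, the hash constant, and the default parameters are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative byte-level match model. NOT the fxcmv1 code: the class name,
// hash function, and parameters are assumptions made for this sketch.
class MatchModelSketch {
 public:
  // min_len: number of trailing bytes hashed to find an earlier occurrence
  //          (assumed to be between 1 and kMaxVerify).
  // table_bits: log2 of the hash table size (assumed to be between 1 and 31).
  explicit MatchModelSketch(std::size_t min_len = 4, unsigned table_bits = 16)
      : min_len_(min_len),
        shift_(32 - table_bits),
        table_(std::size_t{1} << table_bits, 0) {}

  // Feed the next input byte and refresh the match state.
  void Update(std::uint8_t byte) {
    // Extend the current match if its prediction was right, else drop it.
    if (match_ptr_ > 0 && history_[match_ptr_] == byte) {
      ++match_ptr_;
      ++match_len_;
    } else {
      match_ptr_ = 0;
      match_len_ = 0;
    }
    history_.push_back(byte);
    if (history_.size() < min_len_) return;

    // Hash the last min_len_ bytes (simple multiplicative hash).
    std::uint32_t h = 0;
    for (std::size_t i = history_.size() - min_len_; i < history_.size(); ++i)
      h = (h + history_[i] + 1) * 0x9E3779B1u;
    const std::size_t slot = h >> shift_;

    if (match_ptr_ == 0) {
      // Candidate = position that followed this context last time (0 = none).
      // Compare backward to reject hash collisions.
      const std::size_t cand = table_[slot];
      std::size_t len = 0;
      while (len < cand && len < kMaxVerify &&
             history_[cand - 1 - len] == history_[history_.size() - 1 - len])
        ++len;
      if (len >= min_len_) {
        match_ptr_ = cand;
        match_len_ = len;
      }
    }
    table_[slot] = history_.size();  // index the next byte will occupy
  }

  bool HasPrediction() const { return match_ptr_ > 0; }
  // Only meaningful when HasPrediction() is true.
  std::uint8_t PredictedByte() const { return history_[match_ptr_]; }
  std::size_t MatchLength() const { return match_len_; }

 private:
  static constexpr std::size_t kMaxVerify = 32;  // cap on backward comparison
  std::size_t min_len_;
  unsigned shift_;
  std::vector<std::size_t> table_;
  std::vector<std::uint8_t> history_;
  std::size_t match_ptr_ = 0;  // index in history_ of the predicted next byte
  std::size_t match_len_ = 0;
};
```

In PAQ-style compressors the predicted byte and the match length are typically converted into a bit probability and passed to the mixer; that step is omitted here.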
Below is the fx-cmix result:
| Metric | Value |
|---|---|
| fx-cmix compressor's executable file size (S1) | 436707 bytes |
| fx-cmix self-extracting archive size (S2) | 112148343 bytes |
| Total size (S) | 112585050 bytes |
| Previous record (L) | 114156155 bytes |
| fx-cmix improvement (1 - S/L) | 1.38% |
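The derived rows of the table can be re-computed from the reported sizes. The snippet below is only an arithmetic check of S = S1 + S2 and of 1 - S/L; the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Figures copied from the table above.
constexpr std::int64_t kExecutableSize = 436707;     // S1, bytes
constexpr std::int64_t kArchiveSize = 112148343;     // S2, bytes
constexpr std::int64_t kPreviousRecord = 114156155;  // L, bytes

// Total size S = S1 + S2.
constexpr std::int64_t TotalSize() { return kExecutableSize + kArchiveSize; }

// Relative improvement 1 - S/L, as a fraction (multiply by 100 for percent).
constexpr double Improvement() {
  return 1.0 - static_cast<double>(TotalSize()) /
                   static_cast<double>(kPreviousRecord);
}

// Compile-time check that the total matches the value in the table.
static_assert(TotalSize() == 112585050, "S1 + S2 should equal S");
```

`Improvement()` evaluates to roughly 0.01376, which rounds to the 1.38% shown in the table.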
| Experiment platform | |
|---|---|
| Operating system | Ubuntu 20.04 |
| Processor | Intel(R) Xeon(R) CPU @ 2.20GHz, Geekbench score 706 |
| Memory | 32 GB |
| Decompression running time | 85 hours |
| Decompression RAM max usage | 9575124 KiB |
| Decompression disk usage | ~35 GB |
Time, disk, and RAM usage are approximately symmetric for compression and decompression.
The installation and usage instructions for fx-cmix are the same as for fast-cmix.
One important note: for normal use, it is recommended to change one variable, mmap_to_disk, in the PPM source code. From line 26 in src/models/ppmd.cpp:

    // If mmap_to_disk is set to false (recommended setting), PPM will only use RAM
    // for memory.
    // If mmap_to_disk is set to true, PPM memory will be saved to disk using mmap.
    // This will reduce RAM usage, but will be slower as well. *Warning*: this will
    // write a *lot* of data to disk, so can reduce the lifespan of SSDs. Not
    // recommended for normal usage.
    bool mmap_to_disk = true;

This variable is set to true by default, to comply with the Hutter Prize RAM limit.
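To make the trade-off concrete, here is a minimal sketch of how a flag like this can switch between a plain RAM allocation and a file-backed mapping on a POSIX system. It is not the code in src/models/ppmd.cpp: the helper names `AllocateModelMemory` and `FreeModelMemory` and the `path` parameter are hypothetical, and the real allocator may be organised differently.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (an assumption for illustration, not taken from cmix).
// Returns `size` bytes of zeroed memory, or nullptr on failure.
void* AllocateModelMemory(std::size_t size, bool mmap_to_disk,
                          const char* path) {
  if (!mmap_to_disk) {
    // RAM only: fastest, but the whole buffer counts against RAM usage.
    return std::calloc(1, size);
  }
  // Disk-backed: create a file of the requested size and map it. The kernel
  // can write pages back to the file, which lowers resident RAM at the cost
  // of speed and of heavy disk writes.
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return nullptr;
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    return nullptr;
  }
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the mapping remains valid after the descriptor is closed
  return p == MAP_FAILED ? nullptr : p;
}

// Releases memory obtained from AllocateModelMemory with the same flag.
void FreeModelMemory(void* p, std::size_t size, bool mmap_to_disk) {
  if (p == nullptr) return;
  if (mmap_to_disk) {
    munmap(p, size);
  } else {
    std::free(p);
  }
}
```

With the flag set to false the buffer lives entirely in RAM; with it set to true the same pointer interface is kept, but pages are backed by the file at `path`.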
Building the fx-cmix compressor from sources requires the clang-17, upx-ucl, and make packages. On Ubuntu, these packages can be installed by running the following scripts:

    ./install_tools/install_upx.sh
    ./install_tools/install_clang-17.sh

A bash script is provided for compiling the cmix-hp compressor from sources on Ubuntu. This script places the cmix-hp executable file, named cmix, in the ./run directory. The script can be run as

    ./build_and_construct_comp.sh

To run the cmix-hp compressor use

    cd ./run
    cmix -e <PATH_TO_ENWIK9> enwik9.comp

The compressor is expected to output an executable file named archive9 in the same directory (./run). When executed, archive9 is expected to reproduce the original enwik9 as a file named enwik9_restored. The executable file archive9 should be launched without arguments from the directory containing it.

    cd ./run
    ./archive9