A Neural Network Trainer, used to train NNUE-style networks for akimbo.
Also used by a number of other engines.
Used exclusively to train architectures of the form Input -> Nx2 -> Output.
Can use either a CPU or hand-written CUDA backend.
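For illustration, here is a minimal floating-point sketch of what inference of such an Input -> Nx2 -> Output network looks like from the engine side, assuming a perspective pair of accumulators and CReLU activation (the names, hidden size and activation here are assumptions for the example, not fixed by the trainer):

```rust
// Illustrative sketch only: an Input -> Nx2 -> Output net evaluated in floats.
// The "x2" is the same hidden layer seen from both perspectives (side to move
// first), with the concatenated pair fed into a single output neuron.
const N: usize = 256; // hypothetical hidden layer size

struct Network {
    feature_weights: Vec<[f32; N]>, // one row per input feature
    feature_bias: [f32; N],
    output_weights: [f32; 2 * N],
    output_bias: f32,
}

fn forward(net: &Network, stm_features: &[usize], ntm_features: &[usize]) -> f32 {
    // Accumulate the active input features into both perspectives.
    let mut acc = [net.feature_bias; 2];
    for (side, feats) in [stm_features, ntm_features].into_iter().enumerate() {
        for &f in feats {
            for i in 0..N {
                acc[side][i] += net.feature_weights[f][i];
            }
        }
    }

    // Activate (CReLU here, as one common choice) and apply the output layer.
    let mut out = net.output_bias;
    for side in 0..2 {
        for i in 0..N {
            out += acc[side][i].clamp(0.0, 1.0) * net.output_weights[side * N + i];
        }
    }
    out
}
```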
Currently Supported Games:
- Chess
- Ataxx
Raise an issue for support of a new game.
To learn how it works, read the wiki.
The trainer uses its own binary data format for each game.
The specifications for the data formats are found in the bulletformat crate.
Additionally, each type implements from_raw which is recommended for use if your engine is written in Rust (or you don't
mind FFI).
All data types at present are 32 bytes, so you can use marlinflow-utils to shuffle and interleave files.
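Because every record is a fixed 32 bytes, a data file can be treated as a flat array of records by generic tooling. As a rough sketch (not part of this repo; it assumes the rand crate and loads the whole file into memory), shuffling such a file is just:

```rust
use rand::seq::SliceRandom; // external crate, assumed available
use std::fs;

const RECORD_SIZE: usize = 32;

fn shuffle_file(input: &str, output: &str) -> std::io::Result<()> {
    let data = fs::read(input)?;
    assert_eq!(data.len() % RECORD_SIZE, 0, "file is not a whole number of records");

    // View the file as fixed-size records, shuffle them, and write them back out.
    let mut records: Vec<&[u8]> = data.chunks_exact(RECORD_SIZE).collect();
    records.shuffle(&mut rand::thread_rng());
    fs::write(output, records.concat())
}
```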
You can convert text format, where
- each line is of the form `<FEN> | <score> | <result>`
  - `FEN` has 'x'/'r', 'o'/'b' and '-' for red, blue and gaps/blockers, respectively, in the same format as FEN for chess
  - `score` is red relative and an integer
  - `result` is red relative and of the form 1.0 for win, 0.5 for draw, 0.0 for loss
by using the command
cargo r -r --bin convertataxx <input file path> <output file path>
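For reference, a small sketch of how a line of this format could be parsed; this is only an illustration of the format above, not the converter's actual code:

```rust
/// Parse one line of the form `<FEN> | <score> | <result>`.
/// The score is an integer and the result is 1.0/0.5/0.0, both red relative.
fn parse_line(line: &str) -> Option<(String, i32, f32)> {
    let mut parts = line.split('|').map(str::trim);
    let fen = parts.next()?.to_string();
    let score: i32 = parts.next()?.parse().ok()?;
    let result: f32 = parts.next()?.parse().ok()?;
    Some((fen, score, result))
}
```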
You can convert a Marlinformat file by running
cargo r -r --bin convertmf <input file path> <output file path> <threads>
It is up to the user to provide a valid Marlinformat file.
Additionally, you can convert legacy text format as in Marlinflow, where
- each line is of the form `<FEN> | <score> | <result>`
  - `score` is white relative and in centipawns
  - `result` is white relative and of the form 1.0 for win, 0.5 for draw, 0.0 for loss
by using the command
cargo r -r --bin convert <input file path> <output file path>
By default all trained nets are quantised with the QA and QB factors found in common/src/lib.rs.
However, if you have a params.bin file from a checkpoint folder then you can quantise this however you want with
cargo r -r --bin quantise <input file path> <output file path> <QA> <QB>
It is recommended to change QA to 181 or less if using SCReLU activation, since 181^2 = 32761 still fits in an i16; this lets you
utilise manual SIMD to achieve a significant speedup.
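As a rough illustration of what QA/QB quantisation means, the sketch below scales float parameters to integers. The exact layout, rounding and which parameters use which factor are simplified assumptions here; the real scheme is defined by the trainer and common/src/lib.rs:

```rust
// Simplified sketch: scale trained float parameters to integers for fast inference.
// Here feature-transformer params are scaled by QA and output weights by QB, so
// products of activations and output weights land on the QA * QB scale.
fn quantise(
    feature_params: &[f32],
    output_weights: &[f32],
    output_bias: f32,
    qa: f32,
    qb: f32,
) -> (Vec<i16>, Vec<i16>, i32) {
    let q = |x: f32, scale: f32| (x * scale).round() as i16;
    let feature_q: Vec<i16> = feature_params.iter().map(|&w| q(w, qa)).collect();
    let output_q: Vec<i16> = output_weights.iter().map(|&w| q(w, qb)).collect();
    let bias_q = (output_bias * qa * qb).round() as i32;
    (feature_q, output_q, bias_q)
}
```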
General architecture settings, which must be known at compile time, are found in common/src/lib.rs.
It is done this way because of Rust's limitations when it comes to const code.
After setting those as you please, you can run the trainer using the run.py script, and use
python3 run.py --help
to get a full description of all options.
A sample usage is
python3 run.py \
--data-path data.bin \
--test-id net \
--threads 6 \
--lr 0.001 \
--wdl 0.5 \
--max-epochs 40 \
--batch-size 16384 \
--save-rate 10 \
--lr-step 15 \
--lr-gamma 0.1
Of these options, only data-path, threads and lr-step differ from their default values.
NOTE: You may need to run cargo update if you pull a newer version from main.
There are 3 separate learning rate options:
- `lr-step N` drops the learning rate every `N` epochs by a factor of `lr-gamma`
- `lr-drop N` drops the learning rate once, at `N` epochs, by a factor of `lr-gamma`
- `lr-end x` is exponential LR, starting at `lr` and ending at `x` when at `max-epochs`; it is equivalent to `lr-step 1` with an appropriate `lr-gamma`
By default lr-gamma is set to 0.1, but no learning rate scheduler is chosen. It is highly
recommended to have at least one learning rate drop during training.
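To make the three schedules concrete, here is an illustrative sketch of the learning rate each one would give at a given epoch. It paraphrases the descriptions above; the exact epoch indexing and the exponent used by lr-end are assumptions, not the trainer's actual code:

```rust
// Illustrative learning-rate schedules matching the descriptions above.
enum Schedule {
    Step { every: usize, gamma: f32 },           // lr-step N
    Drop { at: usize, gamma: f32 },              // lr-drop N
    Exponential { end: f32, max_epochs: usize }, // lr-end x
}

fn lr_at(base_lr: f32, epoch: usize, schedule: &Schedule) -> f32 {
    match *schedule {
        // Drop by gamma every `every` epochs.
        Schedule::Step { every, gamma } => base_lr * gamma.powi((epoch / every) as i32),
        // Drop by gamma once, at epoch `at`.
        Schedule::Drop { at, gamma } => {
            if epoch >= at { base_lr * gamma } else { base_lr }
        }
        // Exponential decay from base_lr towards `end` at max_epochs;
        // equivalent to lr-step 1 with an appropriate gamma.
        Schedule::Exponential { end, max_epochs } => {
            let gamma = (end / base_lr).powf(1.0 / max_epochs as f32);
            base_lr * gamma.powi(epoch as i32)
        }
    }
}
```

With the sample command above (lr 0.001, lr-step 15, lr-gamma 0.1), the rate drops by a factor of 10 roughly every 15 epochs.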
Add --cuda to use CUDA; compilation will fail if CUDA is not available.
Note that for small net sizes (generally unbucketed & hidden layer size < 256),
CUDA may be slower than using the CPU.
At the time of writing, rustc does not emit AVX-512 via autovectorisation, so if you have an AVX-512 CPU, switch to the nightly
Rust channel and add the --simd flag to the run command to enable usage of hand-written SIMD.
This comes with the caveat that the hidden layer size must be a multiple of 32.
As Rust nightly is unstable and includes a bunch of experimental compiler changes, overall performance may be lower than when compiling on stable, so I'd recommend testing the two on your machine.
Every save-rate epochs and at the end of training, a quantised network is saved to /nets, and a checkpoint
is saved to /checkpoints (which contains the raw network params, if you want them). You can "resume" from a checkpoint by
adding --resume checkpoints/<name of checkpoint folder> to the run command.
This is designed so that using an identical command with the resume option appended behaves as if you had never stopped training. If you resume with a different command, be aware that training resumes from the epoch number at which the checkpoint was saved, meaning the learning rate schedule is fast-forwarded to that epoch.