Releases: Witek902/Caissa
Caissa v1.24
A small, incremental release: gaining Elo at this level is getting harder without a new architecture and much more compute. The new version focuses on new nets, search cleanups, and the first proper SMP tuning, with especially nice gains in DFRC.
Changes
Evaluation
- New neural net trained on 17B positions
- Added a castling-rights bonus for FRC/DFRC, combining well with the new nets for larger gains there than in regular chess.
- Removed KRvKR and KQvKQ specialty code (was causing low-time blunders).
Search & TT
- Store eval in TT as early as possible in QSearch.
- In PV nodes, skip QSearch on the TT move (idea by Viz from Stockfish).
- Increased NMP start depth (more conservative at shallow depth).
- Added a recapture extension.
- Increased the "improving" factor in RFP.
- Simplified move ordering and eval adjustment code paths.
- TT now updates the move field even when the entry is not overwritten.
- Removed current-move reporting from the search output.
- First dedicated multithreaded tuning (~31k games, 20+0.2, 8 threads), giving about +5 Elo on 4 threads.
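For readers unfamiliar with the "improving" term in reverse futility pruning (RFP), here is a minimal generic sketch of the technique. It is not Caissa's code: the function name `rfp_cutoff`, the parameter names, and every numeric default are hypothetical placeholders chosen only for illustration.

```python
def rfp_cutoff(static_eval, beta, depth, improving,
               base_margin=80, improving_factor=1.0, max_depth=6):
    """Return True if reverse futility pruning would cut this node.

    All defaults are hypothetical placeholders, not Caissa's tuned values.
    """
    if depth > max_depth:
        return False  # RFP is only applied at shallow depth
    # When the side to move is improving, shrink the margin so the cutoff
    # triggers more easily; a larger improving_factor shrinks it further.
    effective_depth = depth - (improving_factor if improving else 0.0)
    margin = base_margin * max(effective_depth, 0.0)
    return static_eval - margin >= beta


print(rfp_cutoff(150, 0, 2, improving=False))
print(rfp_cutoff(150, 0, 2, improving=True))
```

In this sketch, raising `improving_factor` makes pruning more aggressive only in nodes flagged as improving, which is one plausible reading of the change listed above.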
Time management & SMP
- Time-management tweak, with the biggest gain in high-increment games (about +4 Elo).
Progression test
SMP LTC (20+0.2, 4 threads)
Elo | 17.33 +- 3.96 (95%)
Conf | 20.0+0.20s Threads=4 Hash=128MB
Games | N: 6964 W: 1758 L: 1411 D: 3795
Penta | [5, 639, 1847, 986, 5]
https://furybench.com/test/3969/
LTC (60+0.6)
Elo | 8.84 +- 2.38 (95%)
Conf | 60.0+0.60s Threads=1 Hash=128MB
Games | N: 20010 W: 4835 L: 4326 D: 10849
Penta | [19, 2133, 5185, 2656, 12]
https://furybench.com/test/3935/
DFRC LTC (60+0.6)
Elo | 10.33 +- 2.17 (95%)
Conf | 60.0+0.60s Threads=1 Hash=128MB
Games | N: 17118 W: 2595 L: 2086 D: 12437
Penta | [28, 1110, 5792, 1583, 46]
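The Elo and error figures in these blocks can be approximately reproduced from the Penta line alone. The sketch below assumes the five counts are game-pair outcomes ordered from two losses to two wins, and that the interval is a 95% normal interval built from the pentanomial variance. The helper name `penta_elo` is hypothetical, and the testing framework's exact method may differ.

```python
import math


def penta_elo(penta, z=1.96):
    """Return (elo, half_width) from pentanomial game-pair counts.

    penta = [LL, LD, DD or WL, WD, WW], i.e. pair scores 0, 0.5, 1, 1.5, 2.
    """
    pairs = sum(penta)
    scores = [0.0, 0.25, 0.5, 0.75, 1.0]  # pair score normalised to [0, 1]
    mean = sum(n * s for n, s in zip(penta, scores)) / pairs
    var = sum(n * (s - mean) ** 2 for n, s in zip(penta, scores)) / pairs
    stderr = math.sqrt(var / pairs)

    def to_elo(score):
        # Logistic Elo difference implied by an expected score
        return -400.0 * math.log10(1.0 / score - 1.0)

    elo = to_elo(mean)
    half_width = (to_elo(mean + z * stderr) - to_elo(mean - z * stderr)) / 2.0
    return elo, half_width


# DFRC LTC pentanomial counts from the block above
elo, err = penta_elo([28, 1110, 5792, 1583, 46])
print(f"Elo | {elo:.2f} +- {err:.2f} (95%)")
```

Under these assumptions the result should land close to the reported 10.33 +- 2.17. Working on game pairs rather than single games accounts for the correlation between the two games played from the same opening.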
Caissa v1.23
A smaller, focused update. At this level gains are hard to come by and my dev time is limited, but 1.23 still squeezes out steady Elo at both STC and LTC, plus a big leap in a quirky "queens-only" test.
Changes
- Evaluation
  - New neural net.
  - Simplify Evaluate: leaner piece/phase accounting; removed the old castling-rights bonus.
- Search
  - Simplify extensions.
  - Fix SkipQuiets so quiets are truly skipped.
  - Simplify TT alpha cutoffs.
  - Use uint8 for TT entry depth (avoids signed wrap; widens depth range) and adjust a static-eval TT write.
  - Tweak correction history bonus.
  - QSearch: adjust best score toward beta on stand-pat exceed (idea from Stockfish).
  - Perform Null Move Pruning only in cut nodes.
- Tuning
  - Retuned all parameters at long time control (80+0.8).
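The uint8 change for the TT entry depth listed above can be illustrated with a small round-trip through 8-bit fields. This is only a sketch of the general signed-versus-unsigned behaviour, not Caissa's TT layout. The helper `store_depth` is hypothetical, and Python's `ctypes` stands in for the packed C++ field.

```python
import ctypes


def store_depth(depth, ctype):
    """Round-trip a depth through a fixed-width integer field,
    as a packed transposition-table entry would (hypothetical helper)."""
    return ctype(depth).value


# Depths above 127 wrap negative in a signed 8-bit field
# but survive intact in an unsigned one (up to 255).
for depth in (100, 127, 128, 200):
    signed = store_depth(depth, ctypes.c_int8)
    unsigned = store_depth(depth, ctypes.c_uint8)
    print(f"depth={depth:3d}  int8={signed:4d}  uint8={unsigned:3d}")
```

A depth that wraps to a negative value would make a deep entry look shallow, so comparisons such as "is the stored depth sufficient for a cutoff" could fail silently.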
Progression test (UHO book)
LTC
Elo | 15.85 +- 3.27 (95%)
Conf | 60.0+0.60s Threads=1 Hash=128MB
Games | N: 10000 W: 2532 L: 2076 D: 5392
Penta | [2, 923, 2704, 1359, 12]
STC
Elo | 7.92 +- 3.56 (95%)
Conf | 10.0+0.10s Threads=1 Hash=16MB
Games | N: 10006 W: 2466 L: 2238 D: 5302
Penta | [30, 1126, 2475, 1330, 42]
Special sauce: Queens-only test
Using more high-quality queens-only games in training continues to pay off.
Score of Caissa 1.23 BMI2 vs Caissa 1.22 BMI2: 295 - 19 - 463 [0.678] 777
... Caissa 1.23 BMI2 playing White: 159 - 11 - 218 [0.691] 388
... Caissa 1.23 BMI2 playing Black: 136 - 8 - 245 [0.665] 389
... White vs Black: 167 - 147 - 463 [0.513] 777
Elo difference: 129.0 +/- 14.7, LOS: 100.0 %, DrawRatio: 59.6 %
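The headline numbers in this summary follow from the win/loss/draw counts. The sketch below assumes the usual logistic Elo formula and a normal-approximation likelihood of superiority (LOS) that ignores draws. The function name `match_summary` is hypothetical, and the tool that produced the output may compute these slightly differently.

```python
import math


def match_summary(wins, losses, draws):
    """Return (score, elo, los_percent, draw_ratio_percent) for a W-L-D result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    # Likelihood of superiority: normal approximation based on decisive games only
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return score, elo, 100.0 * los, 100.0 * draws / games


# Caissa 1.23 vs 1.22 in the queens-only test: 295 - 19 - 463
score, elo, los, draw_ratio = match_summary(295, 19, 463)
print(f"[{score:.3f}] Elo {elo:.1f}, LOS {los:.1f} %, DrawRatio {draw_ratio:.1f} %")
```

With only 19 losses against 295 wins, the LOS term saturates at essentially 100 %, matching the reported line.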
Thanks to everyone running games and sharing feedback!
Full Changelog: 1.22...1.23
Caissa v1.22
Caissa just got a little smarter and a lot more fun - especially in strange "queens-only" or "rook-only" setups, where it now crushes Stockfish and beats its own previous version by +500 Elo.
Changes
- Tweaked how deep singular extensions go.
- Added extra negative extensions.
- Late-move reductions (LMR) are now fully deterministic.
- Quiescence search now averages stand-pat scores.
- Time-manager tweaks.
- Transposition table clears faster on multi-core CPUs.
- Fresh neural net trained on 15B positions.
Special sauce
In positions with only queens on the first rank (e.g. qqqqkqqq/pppppppp/8/8/8/8/PPPPPPPP/QQQQKQQQ w - - 0 1) Caissa 1.22 gains a massive +530 Elo (+/- 40) over 1.21, putting it far ahead of Stockfish in that odd corner of chess.
Measured gains
Book: UHO_Lichess_4852_v1.epd
Elo | 16.96 +- 3.61 (95%)
Conf | 60.0+0.60s Threads=1 Hash=128MB
Games | N: 8508 W: 2158 L: 1743 D: 4607
Penta | [4, 801, 2237, 1200, 12]
Elo | 16.62 +- 3.09 (95%)
Conf | 8.0+0.08s Threads=1 Hash=8MB
Games | N: 13284 W: 3449 L: 2814 D: 7021
Penta | [43, 1313, 3327, 1884, 75]
Not huge jumps - improvements are getting harder at this level...
Thanks for testing!
Caissa v1.21.7
Update 2-ply-back continuation history
Elo | 0.72 +- 0.62 (95%)
SPRT | 10.0+0.10s Threads=1 Hash=16MB
LLR | 2.11 (-2.94, 2.94) [0.00, 2.00]
Games | N: 332766 W: 80290 L: 79598 D: 172878
Penta | [1333, 39748, 83523, 40452, 1327]
Bench 7141438
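The (-2.94, 2.94) pair on the LLR line matches the standard Wald SPRT stopping bounds when both error rates are 5%. The sketch below assumes alpha = beta = 0.05, which is inferred from the bounds rather than stated in the notes. The function names are hypothetical.

```python
import math


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald SPRT stopping bounds for the log-likelihood ratio (LLR)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


def sprt_status(llr, alpha=0.05, beta=0.05):
    """Classify an LLR value against the stopping bounds."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"


lower, upper = sprt_bounds()
print(f"bounds: ({lower:.2f}, {upper:.2f})")
print(sprt_status(2.11))
```

The [0.00, 2.00] pair on the same line gives the two Elo hypotheses being compared. Computing the LLR itself from the game results is left out here because it depends on the framework's model.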
Caissa v1.21
I'm excited to announce Caissa version 1.21, the latest release packed with improvements to make gameplay even stronger and more stable.
Progression test
TC=60+0.6s, Book=UHO_Lichess_4852_v1.epd
Elo | 35.09 +- 4.16 (95%)
Conf | 60.0+0.60s Threads=1 Hash=128MB
Games | N: 6278 W: 1758 L: 1126 D: 3394
Penta | [2, 446, 1624, 1052, 15]
TC=8+0.08s, Book=UHO_Lichess_4852_v1.epd
Elo | 22.65 +- 3.68 (95%)
Conf | 8.0+0.08s Threads=1 Hash=8MB
Games | N: 10000 W: 2812 L: 2161 D: 5027
Penta | [31, 974, 2402, 1499, 94]
TC=10+0.1s, Book=DFRC.epd
Elo | 18.98 +- 5.35 (95%)
Conf | 10.0+0.10s Threads=1 Hash=16MB
Games | N: 5002 W: 1114 L: 841 D: 3047
Penta | [38, 462, 1258, 675, 68]
Changes
- Fixed problems detected by sanitizers that could lead to potential crashes.
- New neural network trained on a total of 13 billion positions.
- Improved eval correction (Stockfish style).
- Tuned parameters at long time control.
- Various search improvements.
Special thanks to @aronpetko for invaluable access to the OpenBench instance.
Caissa v1.20
Progression test
TC=40+0.4s, Book=UHO_Lichess_4852_v1.epd
Elo | 22.15 +- 5.24 (95%)
Conf | 40.0+0.40s Threads=1 Hash=64MB
Games | N: 7808 W: 2054 L: 1557 D: 4197
Penta | [7, 696, 2007, 1181, 13]
TC=10+0.1s, Book=UHO_Lichess_4852_v1.epd
Elo | 18.95 +- 2.35 (95%)
Conf | 10.0+0.10s Threads=1 Hash=16MB
Games | N: 40014 W: 10644 L: 8464 D: 20906
Penta | [98, 3904, 9948, 5834, 223]
TC=1+0s, Book=UHO_Lichess_4852_v1.epd
Elo | 71.57 +- 4.99 (95%)
Conf | 1.0+0.00s Threads=1 Hash=1MB
Games | N: 10240 W: 3832 L: 1752 D: 4656
Penta | [89, 731, 1889, 1833, 578]
Changes
- Bigger neural net (11 king buckets instead of 5) trained on a total of 12.6B positions
- Improved performance in ultra-short time controls without increment
Caissa v1.19
Progression test
TC=60+0.6, Book=UHO_Lichess_4852_v1.epd
Elo | 22.12 +- 7.60 (95%)
Conf | 60.0+0.60s Threads=1 Hash=64MB
Games | N: 3790 W: 1016 L: 775 D: 1999
Penta | [1, 344, 969, 575, 6]
Changes
- New neural net trained on a total of 12.5B positions
- Various search improvements
- Search parameter tuning at LTC
Caissa v1.18
Progression test
TC=8+0.08, Book=UHO_Lichess_4852_v1.epd
Elo | 21.60 +- 3.48 (95%)
Conf | 8.0+0.08s Threads=1 Hash=8MB
Games | N: 18582 W: 5085 L: 3931 D: 9566
Penta | [62, 1822, 4474, 2766, 167]
TC=60+0.6, Book=UHO_Lichess_4852_v1.epd
Elo | 21.98 +- 5.45 (95%)
Conf | 60.0+0.60s Threads=1 Hash=128MB
Games | N: 7266 W: 1924 L: 1465 D: 3877
Penta | [10, 654, 1850, 1105, 14]
Changes
- New neural net trained on a total of 9.3B positions. Introduced more positions from regular chess games (instead of DFRC games).
- Various search improvements
Caissa v1.17
Progression test
TC=8+0.08, Book=UHO_Lichess_4852_v1.epd
Elo | 20.60 +- 4.85 (95%)
Conf | 8.0+0.08s Threads=1 Hash=8MB
Games | N: 9690 W: 2670 L: 2096 D: 4924
Penta | [38, 947, 2337, 1449, 74]
TC=60+0.6, Book=UHO_Lichess_4852_v1.epd
Elo | 19.83 +- 6.47 (95%)
Conf | 60.0+0.60s Threads=1 Hash=64MB
Games | N: 5000 W: 1272 L: 987 D: 2741
Penta | [4, 442, 1332, 709, 13]
TC=8+0.08, Book=DFRC.epd
Elo | 15.59 +- 6.38 (95%)
Conf | 8.0+0.08s Threads=1 Hash=8MB
Games | N: 4394 W: 948 L: 751 D: 2695
Penta | [28, 433, 1110, 566, 60]
Changes
- New neural net trained on a total of 7.1B positions. Introduced more high-quality games from SPRT tests to the dataset (~220M positions) and random endgame positions scored with 7-man TB (~40M positions). Fine-tuned the previous net for over 50B iterations.
- Small speedup
- Search improvements:
- Prevent search explosions in LMR
- Simplify LMR history formula
- Additional history bonus based on score difference
- Store eval in TT as soon as possible
- Higher RFP margin if opponent is threatening a capture
- Use threats info to generate fewer illegal king moves
- Tweak transposition table replacement scheme
Caissa v1.16
Progression test
TC=10+0.1, Book=UHO_4060_v2.epd
Elo | 36.65 +- 4.71 (95%)
Conf | 10.0+0.10s Threads=1 Hash=16MB
Games | N: 10000 W: 2917 L: 1866 D: 5217
Penta | [25, 788, 2387, 1711, 89]
TC=60+0.6, Book=UHO_4060_v2.epd
Elo | 34.15 +- 7.41 (95%)
Conf | 60.0+0.60s Threads=1 Hash=128MB
Games | N: 3848 W: 1065 L: 688 D: 2095
Penta | [4, 276, 991, 645, 8]
TC=10+0.1, Book=DFRC.epd
Elo | 27.03 +- 7.27 (95%)
Conf | 10.0+0.10s Threads=1 Hash=16MB
Games | N: 3568 W: 866 L: 589 D: 2113
Penta | [23, 311, 879, 508, 63]
Changes
- New neural net trained on a total of 6.9B positions. Introduced high-quality games from SPRT tests to the dataset (~350M positions).
- SPSA parameter tuning at long time controls
- Eval correction improvements
- Smaller transposition table entries
- A few speedups (around 4% in total)
- Various search improvements