Dear Mr. Gévay and Mr. Danner,
I hope this message finds you well. I have been greatly inspired by your work on Nine Men’s Morris, Morabaraba, and Lasker Morris, particularly regarding extended and ultrastrong solutions. Building on the foundations you established, I am developing a preliminary research plan to integrate perfect database knowledge with traditional Alpha-Beta search in a more streamlined way, ultimately aiming to train a lightweight neural network for efficient, near-optimal play.
Below is the detailed design of my intended approach. While none of these steps have been implemented yet, I would be grateful for any guidance or suggestions you might share.
1. Overall Goal
Create a “delta-compressed” subset of the tablebase that captures only those states in which Alpha-Beta search fails to replicate the optimal solution. This compressed database should be dramatically smaller yet still ensure overall near-perfect decision-making.
2. Methodology in Detail
2.1 Segmenting the Database
- Phased or Subset Handling:
- We plan to split the full tablebase by game phase (e.g., opening/placing, midgame/moving, endgame/flying) or other meaningful partitions (piece counts, forced lines, etc.).
- By working on smaller subsets, we can systematically test and refine our methods before extending them to the entire database.
- Sampling Strategies:
- For each subset, we may combine random sampling for broad coverage with stratified sampling that focuses on higher-complexity states.
- This ensures we do not overlook “rare but critical” positions.
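A minimal sketch of this sampling combination (all function and variable names here are illustrative, not part of any existing codebase):

```python
import random

def stratified_sample(states, stratum_key, per_stratum, seed=0):
    """Group states by a stratum key (e.g., piece count or game phase)
    and draw a fixed number from each stratum, so rare but critical
    strata are not drowned out by common ones."""
    rng = random.Random(seed)
    strata = {}
    for s in states:
        strata.setdefault(stratum_key(s), []).append(s)
    picked = []
    for _, members in sorted(strata.items()):
        picked.extend(rng.sample(members, min(per_stratum, len(members))))
    return picked

# Toy states: (piece_count, state_id); stratify by piece count.
states = [(pc, i) for pc in (3, 5, 9) for i in range(100)]
sample = stratified_sample(states, stratum_key=lambda s: s[0], per_stratum=10)
```

In practice the stratum key could be any partition from section 2.1 (phase, piece counts, or forced-line membership), and per-stratum quotas could be weighted toward harder strata.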
2.2 Alpha-Beta Search Validation
- Running Alpha-Beta:
- We will use a minimax framework with iterative deepening, heuristic evaluation, and basic pruning.
- The search depth or time budget can be adjusted per subset to balance thoroughness and computation.
- Mismatch Detection:
- For every selected state (s), we compare the Alpha-Beta best move (and outcome, if applicable) with the tablebase result.
- If Alpha-Beta matches the database, we need not store additional information.
- If Alpha-Beta differs, we mark (s) as a conflict and record essential data (optimal move, value, or short continuation).
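The comparison loop could be sketched as follows, with `alpha_beta_move` and `tablebase_entry` as hypothetical stand-ins for the real engine and the perfect database:

```python
def build_delta(states, alpha_beta_move, tablebase_entry):
    """Keep only states where the Alpha-Beta choice differs from the
    tablebase; consistent states need no storage at all."""
    delta = {}
    for s in states:
        optimal_move, optimal_value = tablebase_entry(s)
        if alpha_beta_move(s) != optimal_move:
            delta[s] = (optimal_move, optimal_value)
    return delta

# Toy stand-ins: the "engine" errs only on state 2.
tb = {0: ("a1-a4", 1), 1: ("b2-b4", 0), 2: ("c3-c4", 1)}
ab = {0: "a1-a4", 1: "b2-b4", 2: "a1-a4"}
delta = build_delta(tb, ab.__getitem__, tb.__getitem__)
```

Ties deserve care here: if the tablebase accepts several optimal moves for a state, the comparison should check membership in that set rather than equality with a single move, or the delta set will inflate with false conflicts.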
2.3 Building the “Delta” Compressed Database
- Core Principle:
- Only states where Alpha-Beta conflicts with the tablebase are retained in the “delta” set.
- Each entry contains:
- State ID (e.g., Zobrist hash or another representation).
- Correct move/value according to the perfect database.
- Optional follow-up for forced lines, ensuring correct transitions in deeper sequences.
- Iterative Construction:
- We plan to add entries to the delta set incrementally, continuing until the main search engine rarely encounters undiscovered discrepancies.
- Use in Gameplay:
- During a live match, the system would run Alpha-Beta. For each state (s):
- If (s) is not in the delta set, Alpha-Beta is presumed correct.
- If (s) is in the delta set, we override Alpha-Beta with the tablebase’s recommended move.
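The entry layout and play-time override described above might look like the following sketch; the 64-bit Zobrist keys, the move notation, and the field names are assumptions for illustration:

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

POINTS, COLORS = 24, 2  # 24 board points; white = 0, black = 1 (assumed encoding)
_rng = random.Random(42)
ZOBRIST = [[_rng.getrandbits(64) for _ in range(COLORS)] for _ in range(POINTS)]

def zobrist_hash(occupied):
    """XOR the per-(point, color) keys; the order of pieces does not matter."""
    h = 0
    for point, color in occupied:
        h ^= ZOBRIST[point][color]
    return h

@dataclass(frozen=True)
class DeltaEntry:
    best_move: str                        # e.g., "d2-d3"; notation is illustrative
    value: int                            # +1 win / 0 draw / -1 loss
    continuation: Optional[Tuple[str, ...]] = None  # optional forced follow-up line

def choose_move(state_hash, alpha_beta_best, delta):
    """Trust Alpha-Beta unless the state appears in the delta set, in
    which case the tablebase's stored move overrides it."""
    entry = delta.get(state_hash)
    return entry.best_move if entry is not None else alpha_beta_best

h = zobrist_hash([(0, 0), (5, 1)])
delta = {h: DeltaEntry(best_move="d2-d3", value=1)}
```

Because lookup is a single hash-table probe, the override adds essentially no latency to the search loop.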
2.4 Neural Network Training
- Data Sources:
- Consistent States: positions where Alpha-Beta aligns with the tablebase, used to train on “typical” or “straightforward” play.
- Conflict States: positions from the delta table, used to highlight trickier scenarios in which Alpha-Beta needed correction.
- Network Structure:
- Potentially a convolutional or graph-based architecture to encode Nine Men’s Morris boards (24 points, occupant info, etc.).
- Outputs could be a value (win/draw/loss or numerical) and/or a policy (move probabilities).
- Training Workflow:
- Supervised Phase: We label each sampled position with the tablebase’s best move or value.
- Iterative Refinement: If the network remains uncertain in certain sub-regions, we may sample more states or refine heuristics for those areas, always referencing the delta table for ground truth.
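On the input side, a simple one-hot encoding of the 24 points could serve as a baseline before committing to a convolutional or graph layout; the channel order below is an arbitrary assumption:

```python
def encode_board(occupants):
    """Encode each of the 24 points as an (empty, white, black) one-hot
    triple, yielding a flat 72-float vector suitable as network input.
    `occupants` maps point index -> 'W' or 'B'; missing points are empty."""
    channel = {'W': 1, 'B': 2}
    vec = [0.0] * (24 * 3)
    for point in range(24):
        vec[point * 3 + channel.get(occupants.get(point), 0)] = 1.0
    return vec

x = encode_board({0: 'W', 4: 'B'})
```

Extra scalar features (pieces in hand during the placing phase, side to move, whether flying is active) could be appended to the same vector without changing this scheme.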
3. Potential Refinements
- Symmetry and Transformation:
- Exploit rotational/reflective symmetries to reduce storage and training data repetition.
- Further Distillation & Self-Play:
- After the initial supervised stage, it could be beneficial to integrate minimal self-play or a reinforcement step, referencing the delta table whenever the network encounters new or uncertain positions.
- Coverage vs. Table Size:
- We aim to keep the compressed database small, but must ensure that all critical lines—especially forced sequences—are captured.
- Practical Heuristics:
- If certain states are extremely deep or complex, we plan to refine our Alpha-Beta heuristics or incrementally increase search depth to catch hidden traps.
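The symmetry reduction mentioned above can be sketched as follows; the point numbering (three concentric rings of eight points each, numbered clockwise) is an assumption for illustration, and reflections plus the inner/outer ring swap would extend this to the board's full symmetry group:

```python
def rotate_90(board):
    """Rotate a position a quarter turn, assuming outer ring = points 0-7,
    middle = 8-15, inner = 16-23, each numbered clockwise; under that
    (assumed) numbering a quarter turn shifts each ring index by 2."""
    rotated = [0] * 24
    for point, occ in enumerate(board):
        ring, idx = divmod(point, 8)
        rotated[ring * 8 + (idx + 2) % 8] = occ
    return rotated

def canonical(board):
    """Use the lexicographically smallest of the four rotations as the
    canonical representative, so all rotated variants of a position
    share one storage key and one training example."""
    variants = [board]
    for _ in range(3):
        variants.append(rotate_90(variants[-1]))
    return min(variants)

board = [0] * 24
board[0], board[9] = 1, 2  # one white, one black stone (0=empty, 1=white, 2=black)
```

Storing only canonical positions divides both the delta table and the training set by up to the size of the symmetry group, at the cost of one canonicalization per lookup.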
4. Invitation for Your Guidance
As we have not yet begun implementation, your expertise could help us refine the initial strategy and prioritize our efforts. We welcome any broad or specific suggestions, such as:
- Ensuring comprehensive coverage of intricate states without over-expanding the delta table.
- Efficient indexing (e.g., hashing strategies) to guarantee fast lookups in both the tablebase and the compressed data.
- Integrating knowledge of specialized lines (e.g., punishing suboptimal moves) in the style of ultrastrong solutions.
We would be honored to incorporate any insights or recommendations you might share.
Thank you for taking the time to review this proposal. Your research has been a significant inspiration, and I look forward to any thoughts or advice you may be willing to offer.
Warm regards,
Calcitem