Wenxi Wang, Yang Hu, Mohit Tiwari, Sarfraz Khurshid, Kenneth McMillan, Risto Miikkulainen
If you use any part of our tool or the data in this repository or on HuggingFace, please cite our ICLR'24 NeuroBack paper:
@inproceedings{wang2024neuroback,
title={NeuroBack: Improving {CDCL} {SAT} Solving using Graph Neural Networks},
author={Wenxi Wang and Yang Hu and Mohit Tiwari and Sarfraz Khurshid and Kenneth McMillan and Risto Miikkulainen},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
}
NeuroBack is an innovative SAT solver that integrates Graph Neural Networks (GNNs) to enhance Conflict-Driven Clause Learning (CDCL) SAT solving. The tool consists of two primary modules:
- GNN Module: This module, built on PyTorch Geometric, manages the pretraining, finetuning, and inference of the GNN model for backbone prediction.
- Solver Module: Based on the Kissat solver, this module utilizes the GNN’s predictions to guide SAT solving.
The GNN module requires Python 3 and has been tested with Python 3.11. To set up the Python environment:
- Install Anaconda3, which includes most of the essential packages for machine learning and data science.
- Follow the official documentation to install PyTorch and PyTorch Geometric.
- Install additional dependencies using the following command:
pip install xtract texttable
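For reference, a full environment setup might look like the following (a sketch only; the right PyTorch and PyTorch Geometric builds depend on your OS and CUDA version, so follow the official instructions):
conda create -n neuroback python=3.11
conda activate neuroback
pip install torch # pick the build matching your CUDA version per the official docs
pip install torch_geometric
pip install xtract texttable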
To facilitate initial testing and understanding of NeuroBack, a small dataset containing a few SAT formulas and their backbone variable phases is provided. The directory structure is as follows:
|-data
|-cnf # SAT formulas in CNF format
| |-pretrain # For pretraining (backbone required)
| |-finetune # For finetuning (backbone required)
| |-validation # For validation (backbone required)
| |-test # For testing (backbone NOT required)
|
|-backbone # Backbone information
|-pretrain # For pretraining
|-finetune # For finetuning
|-validation # For validation
You can replace the provided datasets with your own SAT formulas and backbone variable information as needed.
Our DataBack dataset, which includes 120,286 SAT formulas in CNF format along with their backbone variable phases, is available on HuggingFace. If you opt to use the DataBack dataset, please ensure it adheres to the same directory structure as the initial small dataset.
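As a sketch, one way to fetch DataBack (the dataset repository path neuroback/DataBack is an assumption; check the actual HuggingFace page):
git lfs install
git clone https://huggingface.co/datasets/neuroback/DataBack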
To pretrain or finetune the GNN model on your own SAT formulas, use the cadiback backbone extractor to compute the backbone variable phases. Compress each backbone file in xz format and organize your dataset like the initial small dataset.
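For example, a per-formula workflow might look like this (hypothetical; cadiback's exact CLI and output format may differ, so consult its documentation):
cadiback formula.cnf > formula.backbone # hypothetical invocation of the backbone extractor
xz formula.backbone # produces formula.backbone.xz
mv formula.backbone.xz data/backbone/finetune/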
Once your dataset is in place, generate graph representations for each SAT formula by running the following commands:
python3 graph.py pretrain
python3 graph.py finetune
python3 graph.py validation
python3 graph.py test
Note that it is normal to see "backbone file does not exist" messages when running the last command, as the test dataset does not include precomputed backbones.
Graph representations are saved in the data/pt folder. To accelerate graph generation, you can adjust the n_cpu variable on line 393 of graph.py to allocate more CPUs (by default, n_cpu = 1); note that this may increase memory usage.
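For example, to use 8 CPUs without opening an editor (GNU sed; assumes the line still reads n_cpu = 1):
sed -i 's/n_cpu = 1/n_cpu = 8/' graph.py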
Once graph representations are ready, you can start the pretraining and finetuning process:
python3 learn.py pretrain # Pretrain the GNN model
python3 learn.py finetune # Finetune the GNN model
Pretrained model checkpoints are saved in model/pretrain, and finetuned model checkpoints in model/finetune. Each folder includes:
- [pretrain/finetune]-[i].ptg: the model checkpoint after the i-th epoch of pretraining/finetuning.
- [pretrain/finetune]-best.ptg: the model checkpoint with the best F1 score among those saved for each epoch. By default, pretrain-best.ptg is loaded at the beginning of finetuning.
Logs regarding pretraining/finetuning are saved in log/pretrain or log/finetune, respectively. gnn-load.log (if it exists) records the performance metrics of the loaded model checkpoint on the validation set. gnn-[i].log contains the performance metrics of the model checkpoint for the i-th epoch. Metrics in the logs include confusion matrices, losses, precision, recall, and F1 scores.
You can customize hyperparameters such as the learning rate and batch size by modifying lines 34-52 of learn.py. The defaults match the hyperparameter settings reported in our NeuroBack paper.
To predict the backbone variables for the SAT formulas in the ./data/cnf/test folder using the finetuned GNN model (by default, ./model/finetune/finetune-best.ptg), run:
python3 predict_cuda.py
This command leverages the GPU (cuda) for model inference and skips formulas that trigger cuda out-of-memory errors. Alternatively, you can perform inference using CPUs:
python3 predict_cpu.py # CPU-only inference
python3 predict_mix.py # Mixed GPU and CPU inference (GPU first, then CPU on cuda out-of-memory)
Predictions are saved in the ./prediction/{cuda|cpu|mix}/cmb_predictions folder. Each record contains a boolean variable ID and the estimated probability of being a positive or negative backbone (closer to 1 indicates a positive backbone; closer to 0 indicates a negative backbone).
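As an illustration, assuming each line of an uncompressed prediction file has the form "<variable_id> <probability>" (verify against your own output), you could list the high-confidence predictions like this:
awk '$2 >= 0.9 { print $1, "likely positive backbone" }
     $2 <= 0.1 { print $1, "likely negative backbone" }' example.res # example.res is a placeholder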
Logs for predictions are saved in ./log/predict_cuda, ./log/predict_cpu, or ./log/predict_mix. For each CNF file in the test dataset, a log file in csv format is generated, which records the CNF file name, the hardware used (i.e., cuda or cpu), and the time cost (in seconds) of model inference.
The Solver Module is built on top of Kissat. Below are the steps to compile the solver and apply the predicted backbones to it.
The source code for the solver module is located in the solver folder. Please compile the solver using the following commands:
cd solver
./configure && make
cd ..
After successful compilation, the solver binary will be available at solver/build/kissat.
To minimize disk usage, the GNN module automatically compresses the predicted backbone information. Before solving CNF formulas with our solver module, the corresponding backbone files must therefore be uncompressed. In particular, for each CNF formula located at ./data/cnf/test/[CNF_FILE_NAME], the backbone file predicted by the GNN module is located at ./prediction/{cuda|cpu|mix}/cmb_predictions/[CNF_FILE_NAME].res.tar.gz and can be uncompressed via tar.
The example below demonstrates how to uncompress a backbone file predicted using CUDA for the CNF file fee70cede2b5b55bfbdb6e48fbe7ce4f-DLTM_twitter690_74_16.cnf.xz in the ./data/cnf/test/ folder:
CNF_FILE_NAME=fee70cede2b5b55bfbdb6e48fbe7ce4f-DLTM_twitter690_74_16.cnf.xz
BACKBONE_FILE_NAME=$CNF_FILE_NAME.res.tar.gz
tar -xzvf ./prediction/cuda/cmb_predictions/$BACKBONE_FILE_NAME
UNCOMPRESSED_BACKBONE_FILE_PATH=./$CNF_FILE_NAME.res
cat $UNCOMPRESSED_BACKBONE_FILE_PATH # Optional: view the uncompressed backbone
After running these commands, the uncompressed backbone file for the above-mentioned CNF file will be available for use in the solver.
To solve a CNF formula using the uncompressed backbone, execute the following command:
./solver/build/kissat ./data/cnf/test/$CNF_FILE_NAME -q -n --stable=2 --neural_backbone_initial --neuroback_cfd=0.9 $UNCOMPRESSED_BACKBONE_FILE_PATH
The above command uses two NeuroBack-specific flags:
- --neural_backbone_initial: activates the use of the GNN's predicted backbone for phase initialization.
- --neuroback_cfd: specifies a confidence threshold between 0 and 1 that determines whether the predicted phases should be treated as backbone phases. Variables whose predicted phase confidence scores exceed the threshold are considered backbone variables, while those with scores below the threshold are classified as non-backbone variables and initialized with the default phase. We usually use a threshold of 0.9, but you can adjust this value to optimize solving performance for specific SAT problems.
In addition, passing --random_phase_initial randomly initializes the variable phases.
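For instance, to solve the same formula with random phase initialization instead of the GNN predictions (a usage sketch assuming the flag combines with the same base options):
./solver/build/kissat ./data/cnf/test/$CNF_FILE_NAME -q -n --stable=2 --random_phase_initial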
For questions, please reach out to Wenxi Wang at [email protected] or Yang Hu at [email protected].