Holey

A Python library for program synthesis and symbolic execution that combines SMT (Z3, CVC5, ...) constraint solving with LLM-guided synthesis. Put holes in your Python code and let holey fill them using formal constraints, natural language specifications, or both.

The symbolic execution is inspired by Philip Zucker's blog post "Symbolic Execution by Overloading __bool__", but explores all branches exhaustively instead of randomly and fleshes out the concepts towards solving Python Programming Puzzles.

The solver incorporates heuristics from LLMs in addition to symbolic execution.

Examples

# find a string s such that sat(s) holds
def sat(s: str):
    return s.count('o') == 1000 and s.count('oo') == 0
# SMT solver times out, but try 3 instead of 1000
"oCoDo"
# ask LLM to extrapolate from 3 to 1000
"o" + "Co" * 999

Another example:

See more in examples/.

Setup

Install dependencies

python with support for pip (e.g. conda), tested with Python 3.12
z3 or cvc5 or both -- on mac with Homebrew, can install with brew install z3 cvc5

Clone recursive

git clone --recursive https://github.com/namin/holey.git

Setup Python environment

conda create -n holey python=3.12
conda activate holey
pip install -e ".[test,ollama,anthropic]"

Setup LLM environments

For each LLM you want to use, provide an LLM API key, even if only a dummy one is needed. Only provide a key if you want to use that particular LLM provider. All provided keys will be used in parallel to generate a matrix of successes per LLM provider.

export OLLAMA_API_KEY=ollama
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...
export OPENAI_API_KEY=...

Run

Help reference

python puzzle_solver.py --help

Sanity check

python puzzle_solver.py --name-prefix HelloWorld:0

Run all puzzles, saving stdout/stderr to file results.txt

python puzzle_solver.py  >results.txt 2>&

Fallback to LLMs

python puzzle_solver.py --name-prefix ListIn:1  --llm

Current status

The symbolic execution alone currently solves:

69% (247 out of 360) of int puzzles,
45% (165 out of 363) of str puzzles,
24% (142 out of 591) of List[int] puzzles,
90% (46 out of 51) of float puzzles,
27% (38 out of 141) of List[str] puzzles,
42% (638 out of 1506) overall.

with the following errors:

51 timeouts after 3 seconds at staging time (while generating the SMTLIB program)
292 errors at staging time
41 SMTLIB programs returning sat but the original sat function failing on synthesized model input,
493 SMTLIB programs returning non-sat:
- 75 unsat
- 165 unknown
- 253 timeout (after 2 seconds)
- 0 sat but failed to extract solution variable
209 (out of 1715) puzzles not yet even attempted because their type is not int or str, such as float, list (of various specialization), etc.

Extrapolation

194 smaller problems tried
59 successes on smaller problem

Earlier extrapolation results

123 smaller problems tried
11 successes on smaller problem
9 successful extrapolations

Extrapolated puzzles

Study_1:0 PandigitalSquare:0 CircularShiftNum:2 WeirdDecodeVowels:0 TripleDouble:0 MaxDelta:0 MinConsecutiveSum:2 MaxConsecutiveSum:0 BirthdayParadox:0 BirthdayParadox:1 Tutorial5:0

Successfully extrapolated puzzles

Study_1:0 PandigitalSquare:0 WeirdDecodeVowels:0 TripleDouble:0 MaxDelta:0 MinConsecutiveSum:2 MaxConsecutiveSum:0 BirthdayParadox:0 Tutorial5:0

Matrix

claude (extrapolate) 8 1 1 0 1 1 0 1 1 1 0 1
claude (end-to-end) 5 1 1 0 1 0 0 0 0 1 1 0
claude (SMTLIB) 0 0 0 0 0 0 0 0 0 0 0 0
gemini (extrapolate) 5 0 1 0 1 1 1 1 0 0 0 0
gemini (end-to-end) 5 0 0 0 0 1 1 0 1 1 0 1
gemini (SMTLIB) 0 0 0 0 0 0 0 0 0 0 0 0
ollama (extrapolate) 2 0 0 0 0 0 0 1 0 0 0 1
ollama (end-to-end) 2 0 0 0 0 0 0 0 0 1 0 1
ollama (SMTLIB) 0 0 0 0 0 0 0 0 0 0 0 0

Source map

.
├── README.md
├── benchmarks
│   └── PythonProgrammingPuzzles benchmark added as git submodule
├── holey
│   ├── __init__.py
│   ├── backend.py backend to SMTLIB batch processes
│   ├── core.py includes tracer, symbolic classes, ...
│   ├── llm.py support for LLM generation and code extraction
│   └── preprocessor.py includes node transformer and sat driver
├── log
│   └── results.txt example run
├── puzzle_solver.py main routine for benchmark solver
├── pyproject.toml
└── tests
└── test_core.py ran with python -m pytest, basic and LLM-generated

Contribute!

I need help in completely fleshing out the symbolic executor as well as designing and implementing LLM-based heuristics to complement it. See the contributing guidelines, in particular discussing a workflow to find and fix issues driven by the benchmarks.

Name		Name	Last commit message	Last commit date
Latest commit History 518 Commits
analysis		analysis
benchmarks		benchmarks
examples		examples
examples_property		examples_property
holey		holey
log		log
tests		tests
.gitignore		.gitignore
.gitmodules		.gitmodules
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
PROPERTY.md		PROPERTY.md
README.md		README.md
diagram.md		diagram.md
diagram_staging_SMT.md		diagram_staging_SMT.md
exp.svg		exp.svg
property_tester.py		property_tester.py
puzzle_solver.py		puzzle_solver.py
pyproject.toml		pyproject.toml
regenerate_examples.py		regenerate_examples.py
regression.py		regression.py
results_graph_claude_extrapolation.png		results_graph_claude_extrapolation.png
results_graph_claude_extrapolation.py		results_graph_claude_extrapolation.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Holey

Examples

Setup

Install dependencies

Clone recursive

Setup Python environment

Setup LLM environments

Run

Help reference

Sanity check

Run all puzzles, saving stdout/stderr to file results.txt

Fallback to LLMs

Current status

Extrapolation

Earlier extrapolation results

Extrapolated puzzles

Successfully extrapolated puzzles

Matrix

Source map

Contribute!

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

namin/holey

Folders and files

Latest commit

History

Repository files navigation

Holey

Examples

Setup

Install dependencies

Clone recursive

Setup Python environment

Setup LLM environments

Run

Help reference

Sanity check

Run all puzzles, saving stdout/stderr to file results.txt

Fallback to LLMs

Current status

Extrapolation

Earlier extrapolation results

Extrapolated puzzles

Successfully extrapolated puzzles

Matrix

Source map

Contribute!

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages