This script repeatedly asks a large language model (LLM) to implement a function with a given signature until the given tests pass.
For many problems, implementing tests is less work than implementing a solution, so we can outsource the laborious task of finding a solution to an LLM.
Just write your desired function signature in signature.py and your tests in tests.py, start llama-server, run main.py, and then go get a coffee. If you are lucky, the problem will be solved by the time you come back.
Currently, only a simple brute-force approach is implemented. If time allows, I might implement something smarter, such as Language Agent Tree Search, in the distant future.
Given the following signature.py and tests.py files, the language model will (eventually) implement a function to invert the given matrix.
signature.py:

```python
def invert(A: list[list[float]]) -> list[list[float]]:
    # Invert the matrix A.
```

tests.py:

```python
import random

# Make tests reproducible
random.seed(0)

# Check that NumPy is not being used
with open(__file__, encoding="utf-8") as f:
    solution = f.read().rsplit("import random\n", 1)[0]
assert "np." not in solution

def matmul(A, B):
    return [[sum(a * b for a, b in zip(A_row, B_col)) for B_col in zip(*B)] for A_row in A]

for n in range(1, 10):
    for _ in range(10):
        # Create a random matrix
        A = [[random.random() for _ in range(n)] for _ in range(n)]
        A_inv = invert(A)
        # A * A_inv should be the identity matrix with 1 on its diagonal and 0 otherwise
        I = matmul(A, A_inv)
        for i in range(n):
            for j in range(n):
                assert abs(I[i][j] - float(i == j)) < 1e-5
```

- Clone the repository:
  ```
  git clone [email protected]:99991/blts.git
  cd blts
  ```
- Install Docker.
- Run
  ```
  docker build -t testimage .
  ```
  to build the Docker image from the Dockerfile.
- Install llama.cpp.
- Download a language model. For testing, Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF is probably good enough and should run even on weak hardware.
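As a side note, one way to fetch that model is with the `huggingface-cli` tool (this assumes `pip install huggingface_hub` has been run; the file name is the 4-bit quantization that the llama-server command in this README points at):

```shell
# Download the quantized model file into the current directory.
huggingface-cli download Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF \
    qwen2.5-coder-1.5b-instruct-q4_k_m.gguf --local-dir .
```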
- Start `llama-server` using a command similar to the following:
  ```
  llama-server \
      --model qwen2.5-coder-1.5b-instruct-q4_k_m.gguf \
      --host 127.0.0.1 \
      --port 8080 \
      --flash-attn \
      -ngl 999 \
      --ctx-size 16384 \
      --cache-type-k q8_0 \
      --cache-type-v q8_0 \
      --parallel 10
  ```
- Run `main.py`.
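For a sense of what the model is expected to produce for the example above, here is one hand-written solution that passes those tests, using Gauss-Jordan elimination with partial pivoting. This is just an illustration, not part of the repository, and the LLM may well converge on something different:

```python
def invert(A: list[list[float]]) -> list[list[float]]:
    # Invert the matrix A via Gauss-Jordan elimination with partial pivoting.
    n = len(A)
    # Augment A with the identity matrix: each row becomes [A_row | I_row].
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Normalize the pivot row so the pivot element becomes 1.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Eliminate this column from all other rows.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in M]
```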