PuzzleJAX

This repository contains the code for PuzzleJAX, a GPU-accelerated implementation of PuzzleScript (https://www.puzzlescript.net).

PuzzleScript is a concise and expressive game description language that has been used by designers to create a plethora of grid-based puzzle games. At its core are local pattern rewrite rules. We take advantage of the convolutional nature of these rewrite rules to implement the engine in JAX, allowing AI practitioners to e.g. efficiently train Reinforcement Learning agents to play arbitrary PuzzleScript games.
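To illustrate why local rewrite rules map naturally onto convolutions, here is a minimal sketch in plain Python. It is not the PuzzleJAX implementation (which uses JAX); it assumes a one-dimensional level encoded as one-hot object channels, and a hypothetical rule whose left-hand side is "player immediately followed by crate". Sliding the pattern over the level and summing the agreeing cells is a cross-correlation, and the rule matches wherever that sum equals the number of cells the pattern requires.

```python
# Hypothetical object names; a real game defines its own.
OBJECTS = ["player", "crate", "wall"]

def one_hot(level):
    """Encode a row of cells (each a set of object names) as channels."""
    return [[1 if obj in cell else 0 for cell in level] for obj in OBJECTS]

def match_positions(channels, kernel):
    """Slide `kernel` over `channels` (cross-correlation) and return the
    offsets where every cell required by the kernel is present."""
    width = len(channels[0])
    k_width = len(kernel[0])
    required = sum(sum(row) for row in kernel)
    hits = []
    for x in range(width - k_width + 1):
        score = sum(
            channels[c][x + dx] * kernel[c][dx]
            for c in range(len(kernel))
            for dx in range(k_width)
        )
        if score == required:
            hits.append(x)
    return hits

# Left-hand side of a rule like "[ Player | Crate ]": player, then crate.
KERNEL = one_hot([{"player"}, {"crate"}])

level = [{"wall"}, {"player"}, {"crate"}, set(), {"player"}, {"crate"}]
print(match_positions(one_hot(level), KERNEL))  # [1, 4]
```

On a GPU, this sliding sum over all positions (and over many rules and levels at once) is what a batched convolution computes in a single operation.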

Setup

Create a conda environment (or other virtual environment) running Python 3.13, then run:

pip install -r requirements.txt

Before running this command, make sure the lines in requirements.txt requiring jax or jax[cuda] are commented or uncommented appropriately, depending on whether CUDA is available on your system.
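Once installation has finished, one quick way to confirm which backend JAX picked up is to query it directly. The snippet below is a small sketch; the helper name `detect_jax_backend` is ours, and it returns None when JAX cannot be imported.

```python
def detect_jax_backend():
    """Return the name of JAX's default backend, or None if JAX is missing."""
    try:
        import jax
    except ImportError:
        return None
    # jax.default_backend() reports the platform JAX will use by default.
    return jax.default_backend()

backend = detect_jax_backend()
if backend is None:
    print("JAX is not installed in this environment.")
else:
    print(f"JAX default backend: {backend}")
```

If this reports a CPU backend on a machine with a GPU, the CUDA-enabled requirement line was probably left commented out.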

Collecting and parsing data

First, collect games from two sources: the original PuzzleScript website/editor/JavaScript-engine repository (a snapshot of which is included here under src) and an online archive. To do so, run:

python collect_games.py

This will also attempt to scrape a dataset of ~900 games from an online database. For this, you will need a GitHub REST API key saved in .env.
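As an illustration of the .env requirement, the sketch below parses KEY=VALUE lines using only the standard library. The variable name GITHUB_TOKEN is a placeholder assumption; check collect_games.py for the name it actually reads.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

example = """
# .env (never commit this file)
GITHUB_TOKEN="ghp_example_not_a_real_key"
"""
print(parse_env(example))
```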

To preprocess these files, so that we can validate, profile and benchmark them in the jax, nodejs, and javascript versions of PuzzleScript, run:

python preprocess_games.py

Interactive playtesting

To play a game interactively on a local machine, using the jax environment to run the engine, run, e.g.:

python human_env.py game=sokoban_basic jit=True debug=False

You can add new/custom games to the custom_games folder and refer to them in the above command line argument to playtest them yourself, or similarly supply the game as a command line argument to the RL training script below.

Note that when playing a new level, JAX traces and compiles the engine's step function during the first 2 timesteps, which can be slow (especially for games with more rules, objects, and larger levels).

You can toggle the jit and debug command line arguments to replace jitted functions with traditional Python control loops, and to print out verbose logging about rule applications, etc., respectively. (When jit=False, we're also able to print out text representations of intermediary level states, which is useful for fine-grained engine debugging and for understanding the rule execution order.)

Similarly, you can launch the javascript PuzzleScript editor with:

python server.py mode=None headless=False auto_launch_client=True port=8002

Then copy games into the editor, and compile and playtest them there as well.

Validating the jax engine

To generate solutions for PuzzleScript games by applying tree search to the original engine, run:

python profile_nodejs.py for_validation=True

This will run a standalone NodeJS version of the original PuzzleScript engine and save solutions (and terminal states) to disk.
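The idea of generating solutions by tree search can be sketched with a breadth-first search over a toy puzzle. This is not the repository's search code; the `step` function, the state encoding, and the action names below are hypothetical stand-ins for an engine.

```python
from collections import deque

ACTIONS = ["left", "right"]
GOAL = 3

def step(state, action):
    """Toy engine: the state is a position on a 0..4 corridor."""
    delta = -1 if action == "left" else 1
    return min(4, max(0, state + delta))

def is_win(state):
    return state == GOAL

def bfs(start, max_depth=20):
    """Return the shortest action sequence reaching a winning state,
    or None if none exists within max_depth moves."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_win(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None

print(bfs(0))  # ['right', 'right', 'right']
```

The returned action list, together with the terminal state it reaches, is the kind of record that can later be replayed in another engine.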

We can then validate that these solutions lead to the same win conditions and level states in PuzzleJAX with:

python validate_sols.py overwrite=True

This will run the solutions generated above in PuzzleJAX, and ensure that they lead to the same results.
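Conceptually, validation replays each saved action sequence in a second engine and compares the outcome. The sketch below shows that comparison with two hypothetical toy engines (`reference_step` and `candidate_step`) standing in for the NodeJS and JAX implementations; the solution record format is also an assumption.

```python
def reference_step(state, action):
    """Stand-in for the reference engine: move along a 0..4 corridor."""
    return min(4, max(0, state + (1 if action == "right" else -1)))

def candidate_step(state, action):
    """Stand-in for the engine under test (same rules, written differently)."""
    moved = state + {"left": -1, "right": 1}[action]
    return moved if 0 <= moved <= 4 else state

def replay(step_fn, start, actions):
    """Apply actions in order and return the final state."""
    state = start
    for action in actions:
        state = step_fn(state, action)
    return state

def validate(record, step_fn, goal=3):
    """Check that replaying a saved solution reproduces the saved
    terminal state and win flag."""
    final = replay(step_fn, record["start"], record["actions"])
    return final == record["final_state"] and (final == goal) == record["won"]

actions = ["right", "right", "left", "right", "right"]
record = {
    "start": 0,
    "actions": actions,
    "final_state": replay(reference_step, 0, actions),
    "won": True,
}
print(validate(record, candidate_step))  # True
```

Comparing the terminal state as well as the win flag catches cases where both engines report a win but disagree about where objects ended up.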

Profiling the speed of random actions

python profile_rand_jax.py

python profile_rand_nodejs.py
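In essence, profiling random actions means measuring engine steps per second under uniformly random inputs. A minimal sketch of such a measurement follows, using a hypothetical toy `step` function rather than either engine; the action count of 5 is an assumption.

```python
import random
import time

def step(state, action):
    """Toy stand-in for an engine step."""
    return (state + action) % 97

def random_rollout_fps(n_steps, n_actions=5, seed=0):
    """Run n_steps random actions; return (steps per second, final state)."""
    rng = random.Random(seed)
    state = 0
    start = time.perf_counter()
    for _ in range(n_steps):
        state = step(state, rng.randrange(n_actions))
    elapsed = time.perf_counter() - start
    return n_steps / max(elapsed, 1e-9), state

fps, _ = random_rollout_fps(100_000)
print(f"{fps:,.0f} steps/s")
```

With a JIT-compiled engine, the first calls include compilation time, so a fair measurement would warm up the step function before starting the clock.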

Reinforcement learning

To train an agent using reinforcement learning to play a particular game level, run, e.g.:

python train.py game=sokoban_basic level=0 n_envs=600 model=conv2 render_freq=5 hidden_dims=[128,128] seed=0

This will log plots and gifs to wandb.
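The commands in this README pass options as key=value pairs. The sketch below shows one way such arguments could be turned into typed Python values; it is only an illustration and not the repository's parser, which may rely on a configuration library.

```python
import ast

def parse_overrides(argv):
    """Turn ['key=value', ...] into a dict, decoding Python literals
    (True, 0, [128,128]) and leaving everything else as a string."""
    config = {}
    for arg in argv:
        key, sep, raw = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        try:
            config[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            config[key] = raw  # e.g. game names such as sokoban_basic
    return config

args = "game=sokoban_basic level=0 n_envs=600 hidden_dims=[128,128] seed=0".split()
print(parse_overrides(args))
```

Note that some shells treat square brackets specially, so a value such as hidden_dims=[128,128] may need quoting.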

LLM player agent

To run the LLM agent, use the llm_agent_loop.py script. This script iterates through a predefined list of games, running each one for a specified number of trials (--num_runs) across all its levels.

Basic Usage

To run the agent for all priority games with a specific model for a certain number of runs:

python llm_agent_loop.py --model gemini --num_runs 10

Supported models are 4o-mini, o3-mini, gemini, deepseek, qwen, and deepseek-r1.

Command-line options

The script offers several options to customize its execution:

--model: (Required) The LLM model to use.
--num_runs: The number of times to run each game level (default: 10).
--max_steps: The maximum number of steps allowed per episode (default: 100).
--resume_game_name: The name of the game to start from in the priority list.
--level: The level number to start from for each game (default: 0).
--reverse: Process the games in reverse order.
--force: Rerun all games, even if result files already exist.
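
For reference, the documented options could be declared with argparse roughly as follows. This is a sketch reconstructed from the list above, not the script's actual source; the argument types and the description string are assumptions.

```python
import argparse

MODELS = ["4o-mini", "o3-mini", "gemini", "deepseek", "qwen", "deepseek-r1"]

def build_parser():
    """Declare the flags documented above, with their stated defaults."""
    parser = argparse.ArgumentParser(description="Run the LLM agent loop.")
    parser.add_argument("--model", required=True, choices=MODELS)
    parser.add_argument("--num_runs", type=int, default=10)
    parser.add_argument("--max_steps", type=int, default=100)
    parser.add_argument("--resume_game_name", default=None)
    parser.add_argument("--level", type=int, default=0)
    parser.add_argument("--reverse", action="store_true")
    parser.add_argument("--force", action="store_true")
    return parser

args = build_parser().parse_args(["--model", "gemini", "--num_runs", "10"])
print(args)
```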

Execution flow

The script performs the following steps:

It loads a predefined list of priority games.
It iterates from run_id 1 to --num_runs.
In each run, it processes every game in the list.
For each game, it iterates through all available levels, starting from the specified --level.
The results for each run and level are saved as individual JSON files in the llm_agent_results/<model_name>/ directory.
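
The steps above can be sketched as a set of nested loops. Everything below is illustrative: the file naming scheme, the `dummy_play` stand-in for an agent episode, and the function signature are assumptions rather than the script's actual code.

```python
import json
import tempfile
from pathlib import Path

def dummy_play(game, level):
    """Stand-in for running one agent episode."""
    return {"game": game, "level": level, "win": False}

def run_loop(games, levels_per_game, num_runs, out_dir, start_level=0,
             reverse=False, force=False, play=dummy_play):
    """Iterate runs -> games -> levels, writing one JSON file per result.
    Existing result files are skipped unless force is set."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    order = list(reversed(games)) if reverse else list(games)
    written = []
    for run_id in range(1, num_runs + 1):
        for game in order:
            for level in range(start_level, levels_per_game[game]):
                # Hypothetical file naming scheme.
                path = out_dir / f"{game}_level{level}_run{run_id}.json"
                if path.exists() and not force:
                    continue
                path.write_text(json.dumps(play(game, level)))
                written.append(path.name)
    return written

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "llm_agent_results" / "gemini"
    first = run_loop(["sokoban_basic"], {"sokoban_basic": 2}, 2, target)
    second = run_loop(["sokoban_basic"], {"sokoban_basic": 2}, 2, target)
    print(len(first), len(second))  # 4 0
```

The second call writes nothing because every result file already exists, which mirrors the behavior that --force overrides.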

Generating Analysis

After running the agent, you can generate heatmaps and other analysis by running:

python llm_agent_results/analysis/result_analysis.py

This will save the generated images in the llm_agent_results/analysis/ directory.

Custom Environment Integration

To integrate this system with other environments, you need to implement the following functions:

Convert environment observations to ASCII representation
Convert agent actions to environment-acceptable format
Process environment feedback and update the agent

Refer to the implementation in jax_sokoban_agent.py, especially the following methods:

_observation_to_ascii
_action_to_env_action
run_episode
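
As a rough sketch of what these three pieces might look like for a grid environment: the method names follow jax_sokoban_agent.py, but the signatures, tile codes, and action table below are assumptions, and `choose_action` is a hypothetical stand-in for the LLM call.

```python
TILES = {0: ".", 1: "#", 2: "P", 3: "*"}                # hypothetical tile codes
ACTIONS = {"up": 0, "down": 1, "left": 2, "right": 3}   # hypothetical action ids

class ToyAgent:
    def _observation_to_ascii(self, grid):
        """Render a 2D grid of integer tile codes as text."""
        return "\n".join("".join(TILES[t] for t in row) for row in grid)

    def _action_to_env_action(self, action_name):
        """Map the agent's textual action to the environment's integer id."""
        return ACTIONS[action_name.strip().lower()]

    def choose_action(self, ascii_state):
        """Stand-in for the LLM call: always answer 'right'."""
        return "right"

    def run_episode(self, env_step, grid, max_steps=100):
        """Loop: observe, decide, act, and stop when the env reports done."""
        history = []
        for _ in range(max_steps):
            text = self._observation_to_ascii(grid)
            action = self._action_to_env_action(self.choose_action(text))
            grid, done = env_step(grid, action)
            history.append(action)
            if done:
                break
        return history

def toy_step(grid, action):
    """Move 'P' (code 2) one cell right; done when it reaches the last column."""
    row = grid[0]
    i = row.index(2)
    if action == ACTIONS["right"] and i + 1 < len(row):
        row = row[:i] + [0, 2] + row[i + 2:]
    return [row], row[-1] == 2

agent = ToyAgent()
print(agent._observation_to_ascii([[1, 2, 0, 3]]))  # #P.*
print(agent.run_episode(toy_step, [[2, 0, 0]]))     # [3, 3]
```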

Notes

Ensure environment variables are correctly configured, especially API keys.
For large game states, you may need to adjust the LLM token limit.

Citing this work

omitted for anonymity

License

MIT
