A fully static React + TypeScript app for evaluating AI agents against turn-based games. I made it because I wanted to see LLMs battle each other. Turns out they suck. I also made two games: Chess and Tic-Tac-Toe. It should work with OpenAI, Anthropic, Ollama, or a custom LLM endpoint; I've only tested DeepSeek and Anthropic.
The main idea: you implement the interface, and if your game satisfies it, it should work. You also provide the prompts for the LLM. See the metadata.json files in the public/ folder for examples.
If you want to try Chess or Tic-Tac-Toe, they are in the public/ folder.
The app loads user-provided WebAssembly games that implement a minimal standard interface.
Required exports (C/wasm-bindgen style names shown as snake_case):
- get_initial_state() -> char* // JSON string of initial game state
- get_valid_moves() -> char* // JSON array of strings
- apply_move(move_ptr: char*) -> char* // JSON string of the updated game state
- is_game_over() -> i32 // 1 or 0
- get_winner() -> char* // "player1" | "player2" | "draw"
- render() -> char* // optional pretty render string
Optional exports:
- get_game_name() -> char*
- get_current_player() -> char*
- get_game_description() -> char*
- get_move_notation(move_ptr: char*) -> char*
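The optional exports follow the same convention as the required ones: each returns a pointer to a null-terminated UTF-8 string. As a rough sketch (not taken from the bundled games), one of them in Rust might look like:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Illustrative only: returns the side to move as a C string whose
// pointer (and ownership) is handed to the host.
#[no_mangle]
pub extern "C" fn get_current_player() -> *mut c_char {
    CString::new("player1").unwrap().into_raw()
}
```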
JSON formats:
- State: free‑form per game, but must be a valid JSON string. Example: {"board":"...","current_player":"player1","move_count":0}
- Moves: array of strings. Example: ["e2e4","g1f3"] or ["up","down","left","right"].
- Winner: one of "player1", "player2", "draw", or empty/null while in‑progress.
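How you produce these JSON strings is up to you. As a sketch (assuming serde with the derive feature and serde_json as dependencies, which the bundled games may or may not use), a Rust engine could serialize a state struct and a move list like this:

```rust
use serde::Serialize;

// Hypothetical state shape for illustration; your game can use any fields,
// as long as the result is a single valid JSON string.
#[derive(Serialize)]
struct State {
    board: String,
    current_player: String,
    move_count: u32,
}

fn state_json(state: &State) -> String {
    serde_json::to_string(state).unwrap()
}

fn moves_json(moves: &[&str]) -> String {
    // e.g. ["e2e4","g1f3"]
    serde_json::to_string(moves).unwrap()
}
```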
Minimal metadata (supplied alongside WASM at upload time):
- name: string (required)
- Optional: description, gameType, tags, author, version, difficulty, aiPrompts
- Define your game logic and implement the exports above.
- Each exported function returns a pointer to a null‑terminated UTF‑8 string allocated in WASM memory.
- Provide a way to read input strings (e.g., apply_move receives a pointer to a C string).
- Build using wasm-pack with target web.
Example (very abbreviated Rust):
```rust
use std::os::raw::c_char;

#[no_mangle]
pub extern "C" fn get_initial_state() -> *mut c_char { c_string("{\"current_player\":\"player1\"}") }

#[no_mangle]
pub extern "C" fn get_valid_moves() -> *mut c_char { c_string("[\"a\",\"b\"]") }

#[no_mangle]
pub extern "C" fn apply_move(move_ptr: *const c_char) -> *mut c_char { /* read the move from move_ptr, update state, return new state JSON */ c_string("{...}") }

#[no_mangle]
pub extern "C" fn is_game_over() -> i32 { 0 }

#[no_mangle]
pub extern "C" fn get_winner() -> *mut c_char { c_string("") }

#[no_mangle]
pub extern "C" fn render() -> *mut c_char { c_string("ASCII board text") }
```
- Open the app → Upload WASM
- Select your .wasm file and optional metadata.json
- The app validates exports, loads the engine, and persists it in localStorage
- Start a match from Game Selection
Troubleshooting:
- If validation fails, ensure the required exports exist and memory is exported
- Make sure all returned strings are null‑terminated and valid UTF‑8
- Keep JSON outputs small to avoid memory issues; prefer concise encodings
Visit the live application at: https://nullwiz.github.io/llm-arena/
Or run it locally:

```bash
# Clone the repository
git clone https://github.com/nullwiz/llm-arena.git
cd llm-arena

# Install dependencies
npm install

# Start development server
npm run dev
```

To use AI opponents, you'll need API keys from:
OpenAI (GPT models)
- Visit OpenAI Platform
- Create an account and add billing information
- Generate a new API key
Anthropic (Claude models)
- Visit Anthropic Console
- Create an account and add billing information
- Generate a new API key
- Click "Settings" in the top navigation
- Go to "AI Models" tab
- Click "Add Configuration"
- Enter your API key and select a model
- Save the configuration
- Return to the main page
- Select a game (Tic-Tac-Toe or Connect Four)
- Choose player types for Player 1 and Player 2
- Click "Start Game"
Supported match types:
- Human vs AI: Human player vs LLM agent, or Human player vs Rule-based AI. Perfect for testing strategies against different AI types.
- AI vs AI: LLM agent vs LLM agent (watch AIs battle each other), or LLM agent vs Rule-based AI. Great for observing AI behavior and strategies.
- Human vs Human: Take turns on the same device.
- Human vs Empty slot: Practice mode or puzzle solving.
Models are pretty stupid when it comes to turn-based games. You might waste some tokens.