This repository provides code for Werewolf Arena - a framework for evaluating the social reasoning skills of large language models (LLMs) through the game of Werewolf.
Set up a Python virtual environment and install the dependencies. You only need to do this once.
python3 -m venv ./venv
source ./venv/bin/activate
pip install -r requirements.txt
To use OpenAI models, export your API key:
export OPENAI_API_KEY=<your api key>
The program will read from this environment variable.
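As a minimal sketch (not the repository's actual code), the key is typically read from the environment like this:

```python
import os

# Minimal sketch, not the repository's actual code: most OpenAI clients
# read the key from the OPENAI_API_KEY environment variable.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
```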
To use the Gemini models (pro1.5, flash) through Google Cloud:
- Install the gcloud CLI
- Authenticate and set your GCP project
- Create the application default credentials by running
gcloud auth application-default login
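As a minimal sketch (an assumption about the setup, not the repository's code), application default credentials created this way are discovered automatically by Google client libraries:

```python
# Minimal sketch, assuming the google-auth package is installed; the
# repository may access the credentials through a higher-level SDK instead.
import google.auth

credentials, project_id = google.auth.default()
print(f"Using application default credentials for project: {project_id}")
```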
To run a single game, where --v_models sets the villagers' model and --w_models sets the werewolves' model:
python3 main.py --run --v_models=pro1.5 --w_models=gpt4
To run an eval over multiple games, pass comma-separated lists of models:
python3 main.py --eval --num_games=5 --v_models=pro1.5,flash --w_models=gpt4,gpt4o
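One plausible reading of the model flags (an assumption; the exact pairing logic is defined in the repository) is that each villager model is evaluated against each werewolf model for the given number of games:

```python
# Illustration only: this is an assumption about how the comma-separated
# model lists might be combined, not the repository's actual eval logic.
import itertools

v_models = ["pro1.5", "flash"]   # villager models from the command above
w_models = ["gpt4", "gpt4o"]     # werewolf models from the command above
num_games = 5

for villager_model, werewolf_model in itertools.product(v_models, w_models):
    print(f"{num_games} games: villagers={villager_model} vs werewolves={werewolf_model}")
```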
To resume previously saved games:
python3 main.py --resume
The games to be resumed are currently hardcoded in runner.py as a list of the directories where their states are saved.
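As a hypothetical illustration of that list (the actual variable name and paths are defined in runner.py):

```python
# Hypothetical example only; edit runner.py to point at your own saved games.
RESUME_DIRECTORIES = [
    "logs/session_20240610_084702",  # hypothetical path to a saved game state
]
```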
Once a game is completed, you can use the interactive viewer to explore the game log, including each player's private reasoning, bids, votes, and prompts.
- npm install
- npm run start
- Open the browser at, e.g., http://localhost:8080/?session_id=session_20240610_084702