This repository contains a modified version of Windows Agent Arena (WAA) 🪟, a scalable Windows AI agent platform for testing and benchmarking multi-modal desktop AI agents. This modified version focuses on integration with UFO, a UI-Focused Agent for Windows OS Interaction.
We highly recommend reading the deployment guide in the original WindowsAgentArena repository first, because this guide assumes you are familiar with that deployment process. The following steps set up the environment for running the UFO agent in Windows Agent Arena.
Clone the repository
git clone https://github.com/nice-mee/WindowsAgentArena.git
cd WindowsAgentArena
Note: If you want to run OSWorld cases, check out the 2020-qqtcg/dev branch:
git checkout 2020-qqtcg/dev
Create a config.json file in the root of the WAA repo. The API key here does not matter, since UFO only uses the key from its own config file:
{
  "OPENAI_API_KEY": "placeholder"
}
Next, build the WinArena image locally:
cd scripts
chmod +x build-container-image.sh # (if required)
chmod +x prepare-agents.sh # (if required)
./build-container-image.sh --build-base-image true
This will create the windowsarena/winarena:latest image with the latest code from the src directory.
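The config.json and image-build steps above can be scripted. The sketch below is a minimal example, not part of WAA: write_placeholder_config is a hypothetical helper that writes the placeholder file with a heredoc and leaves any existing config.json untouched.

```bash
# Hypothetical helper (not part of WAA): write the placeholder config.json
# into the WAA repository root. The key value is irrelevant, because UFO
# reads its own key from ufo/config/config.json.
write_placeholder_config() {
  local repo_root="${1:-.}"            # WAA repo root, defaults to current dir
  local target="$repo_root/config.json"
  if [ -e "$target" ]; then
    # Never overwrite an existing file.
    echo "leaving existing file untouched: $target" >&2
    return 0
  fi
  cat > "$target" <<'EOF'
{
  "OPENAI_API_KEY": "placeholder"
}
EOF
}

# Usage, from the repository root:
#   write_placeholder_config .
#   (cd scripts && ./build-container-image.sh --build-base-image true)
```

Refusing to overwrite is a deliberate choice, so re-running the helper never clobbers a config.json you have already edited.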
First, configure UFO with ufo/config/config.json (refer to the UFO repo for details). Then, from the root of the repository, copy the entire ufo folder to WindowsAgentArena/src/win-arena-container/client/:
cp -r src/win-arena-container/vm/setup/mm_agents/UFO/ufo src/win-arena-container/client/
Remember to swap the order of @staticmethod and @functools.lru_cache() in src/win-arena-container/client/ufo/llm/openai.py so that @staticmethod becomes the outer decorator. This is needed because WAA uses Python 3.9 rather than a newer version (UFO targets Python 3.10), and in Python 3.9 staticmethod objects are not callable, so functools.lru_cache cannot wrap them.
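If you would rather not edit openai.py by hand, the swap can be scripted. This is a minimal sketch that assumes the two decorators sit on consecutive lines and that the decorator is spelled exactly @functools.lru_cache(); swap_decorators is a hypothetical helper, and the edited file is worth checking by eye afterwards.

```bash
# Hypothetical helper: where "@functools.lru_cache()" is immediately followed
# by "@staticmethod", swap the two lines so @staticmethod becomes the outer
# decorator (required on Python 3.9). Indentation and all other lines are kept.
swap_decorators() {
  local file="$1" tmp
  tmp="$(mktemp)" || return 1
  awk '
    # A held lru_cache line is emitted only once we see what follows it.
    held != "" {
      if ($0 ~ /^[ \t]*@staticmethod[ \t\r]*$/) {
        print $0
        print held
        held = ""
        next
      }
      print held
      held = ""
    }
    /^[ \t]*@functools\.lru_cache\(\)[ \t\r]*$/ { held = $0; next }
    { print }
    END { if (held != "") print held }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through cat so the original file permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage, from the repository root:
#   swap_decorators src/win-arena-container/client/ufo/llm/openai.py
```

Running it twice is harmless: lines already in the @staticmethod-first order are left alone.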
- Visit Microsoft Evaluation Center, accept the Terms of Service, and download a Windows 11 Enterprise Evaluation (90-day trial, English, United States) ISO file [~6GB]
- After downloading, rename the file to setup.iso and copy it to the directory WindowsAgentArena/src/win-arena-container/vm/image.
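The ISO step can also be done from a shell. In this sketch, place_setup_iso is a hypothetical helper; the downloaded filename varies, so it is passed in rather than guessed, and the second argument is the path to your WindowsAgentArena checkout.

```bash
# Hypothetical helper: copy a downloaded Windows 11 evaluation ISO to
# <repo>/src/win-arena-container/vm/image/setup.iso (rename + copy in one step).
place_setup_iso() {
  local iso="$1"                 # the downloaded .iso (its name varies)
  local repo_root="${2:-.}"      # WindowsAgentArena checkout
  local dest_dir="$repo_root/src/win-arena-container/vm/image"
  if [ ! -f "$iso" ]; then
    echo "ISO not found: $iso" >&2
    return 1
  fi
  mkdir -p "$dest_dir"
  cp "$iso" "$dest_dir/setup.iso"
}

# Usage (both paths are examples):
#   place_setup_iso ~/Downloads/<downloaded-file>.iso ~/WindowsAgentArena
```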
Before running the arena, you need to prepare a new WAA snapshot (also referred to as the WAA golden image). This 30GB snapshot represents a fully functional Windows 11 VM with all the programs needed to run the benchmark. This VM additionally hosts a Python server which receives and executes agent commands. To learn more about the components at play, see our local and cloud components diagrams.
To prepare the golden snapshot, run the following once:
cd ./scripts
./run-local.sh --mode dev --prepare-image true
Please do not interfere with the VM while it is being prepared. It will automatically shut down when the provisioning process is complete.
You will find the 30GB WAA golden image in WindowsAgentArena/src/win-arena-container/vm/storage.
Start the initial run with this command:
./run-local.sh --mode dev --json-name "evaluation_examples_windows/test_custom.json" --agent UFO --agent-settings '{"llm_type": "azure", "llm_endpoint": "https://cloudgpt-openai.azure-api.net/openai/deployments/gpt-4o-20240513/chat/completions?api-version=2024-04-01-preview", "llm_auth": {"type": "api-key", "token": ""}}'
After the VM boots, wait until the device code prompt shows up, but do not enter the device code. As long as the device code is not entered, the WAA server stays blocked.
Instead, visit localhost:8006 to control the WAA Windows VM and do the following:
- Disable Windows Firewall.
- Open Google Chrome and complete the initial setup.
- Open VLC and complete the initial setup.
After completing these steps, kill the WAA client, then copy the "golden" image in the storage folder to another location as a backup.
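One way to make that backup, sketched under the assumption that the golden image is the whole contents of src/win-arena-container/vm/storage and that the destination has roughly 30GB free; backup_golden_image is a hypothetical helper, not a WAA script.

```bash
# Hypothetical helper: copy the prepared VM storage (the "golden" image)
# to a backup directory. Run it only after the WAA client has been killed.
backup_golden_image() {
  local repo_root="$1"   # WindowsAgentArena checkout
  local backup_dir="$2"  # destination outside the repo, with ~30GB free
  local storage="$repo_root/src/win-arena-container/vm/storage"
  # Refuse to "back up" a missing or empty storage folder.
  if [ ! -d "$storage" ] || [ -z "$(ls -A "$storage")" ]; then
    echo "storage folder missing or empty: $storage" >&2
    return 1
  fi
  mkdir -p "$backup_dir"
  # -a preserves permissions and timestamps; "/." copies the folder contents.
  cp -a "$storage/." "$backup_dir/"
}

# Usage (example paths):
#   backup_golden_image ~/WindowsAgentArena /data/waa-golden
```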
Before an experiment run, do the following:
- Replace the image in the storage folder with the previously saved golden image
- Delete the UFO logs
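Both pre-run steps can be sketched as one helper. reset_for_run is hypothetical; it assumes the backup is a copy of the storage folder's contents, and it takes the UFO log directory as an argument because its location depends on your UFO configuration. Note that it deletes the current storage contents.

```bash
# Hypothetical helper: restore the golden image and clear old UFO logs
# before an experiment run. WARNING: the current storage contents are deleted.
reset_for_run() {
  local repo_root="$1"     # WindowsAgentArena checkout
  local backup_dir="$2"    # golden image backup made earlier
  local ufo_log_dir="$3"   # assumption: wherever your UFO setup writes logs
  if [ -z "$repo_root" ]; then
    echo "repo_root is required" >&2
    return 1
  fi
  local storage="$repo_root/src/win-arena-container/vm/storage"
  if [ ! -d "$backup_dir" ] || [ -z "$(ls -A "$backup_dir")" ]; then
    echo "golden image backup missing or empty: $backup_dir" >&2
    return 1
  fi
  # Replace the (possibly modified) image with a fresh copy of the golden one.
  rm -rf "$storage"
  mkdir -p "$storage"
  cp -a "$backup_dir/." "$storage/"
  # Remove everything inside the UFO log folder but keep the folder itself.
  if [ -n "$ufo_log_dir" ] && [ -d "$ufo_log_dir" ]; then
    find "$ufo_log_dir" -mindepth 1 -delete
  fi
}

# Usage (example paths; point the last argument at your UFO log folder):
#   reset_for_run ~/WindowsAgentArena /data/waa-golden /path/to/ufo/logs
```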
Then run this command:
./run-local.sh --mode dev --json-name "evaluation_examples_windows/test_full.json" --agent UFO --agent-settings '{"llm_type": "azure", "llm_endpoint": "https://cloudgpt-openai.azure-api.net/openai/deployments/gpt-4o-20240513/chat/completions?api-version=2024-04-01-preview", "llm_auth": {"type": "api-key", "token": ""}}'
You will probably use a different LLM type or endpoint, so make sure to change the --agent-settings parameter accordingly.
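To avoid editing the long one-liner for each endpoint, the settings string can be assembled from variables. make_agent_settings is a hypothetical helper: it emits only the keys shown in the commands above (llm_type, llm_endpoint, llm_auth), hardcodes the api-key auth type used there, and does no JSON escaping, so values must not contain double quotes or backslashes. Other llm_type values or auth types are not covered by this sketch.

```bash
# Hypothetical helper: build the --agent-settings JSON for run-local.sh from
# shell variables. It emits only the keys used in the commands above and does
# no JSON escaping, so values must not contain double quotes or backslashes.
make_agent_settings() {
  local llm_type="$1" llm_endpoint="$2" token="${3:-}"
  printf '{"llm_type": "%s", "llm_endpoint": "%s", "llm_auth": {"type": "api-key", "token": "%s"}}' \
    "$llm_type" "$llm_endpoint" "$token"
}

# Usage, from the scripts folder (LLM_ENDPOINT and LLM_TOKEN are your own values):
#   settings="$(make_agent_settings azure "$LLM_ENDPOINT" "$LLM_TOKEN")"
#   ./run-local.sh --mode dev --json-name "evaluation_examples_windows/test_full.json" \
#     --agent UFO --agent-settings "$settings"
```

Passing "$settings" in double quotes keeps the JSON as a single argument, which is what the single-quoted literal in the original command achieves.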
Note: test_full.json contains all the test cases where UIA works, while test_all.json contains all the test cases, including those where UIA does not work. So please use test_full.json if OmniParser is not used.