Note: This report outlines reproduction steps, environment details, and issues discovered when forking and running the Camel-AI OWL project using our Self-Evolving Workflow (SEW).
OWL is an open-source agentic framework developed by Camel-AI, designed to enable general-purpose multi-agent collaboration in solving real-world tasks. It provides a flexible and extensible platform for building, coordinating, and deploying multiple intelligent agents across a wide range of practical applications.
We optimized the OWL role-playing framework using our proposed SEWOptimizer, with a primary focus on improving the prompts within the framework. We report the performance of the original and optimized prompts on the full GAIA validation set in the following figure.
The results indicate that our optimized prompts improve the performance by 20% on average, with noticeable improvements on tasks from all three levels of the GAIA benchmark.
In our experiments, we used the OpenAI o1 model to optimize the prompts and GPT-4o as the backbone model during evaluation. The optimization process cost approximately $90 in total, and running the GPT-4o evaluation on the validation set cost around $40. These results indicate that our optimization process is cost-effective and can achieve substantial performance improvements.
- Reason: We chose OWL because it is claimed to rank #1 among open-source frameworks on the GAIA benchmark.
We made the following modifications to the original framework:
- We optimized the prompts within the OWL framework using our proposed SEWOptimizer. In our experiments, we randomly sampled 25 questions from the GAIA validation set and used them as a validation subset for optimization. These optimized prompts can be found in the 'prompt_process' folder.
- We changed the run script to use multi-threading to speed up evaluation.

We also encountered the following issues with the original repository:
- The code for reproducing the best-performing OWL variant, i.e., the workforce workflow, is missing (check more details here).
- The code for the second-best variant, i.e., the role-playing workflow, is not reproducible (check more details here).
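The sampling and multi-threading modifications described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SEW or OWL code: `run_single_task` is a hypothetical placeholder for one role-playing run on a GAIA question, and the `task_id`/`Level` fields are assumed. A fixed random seed keeps the 25-question optimization subset identical across runs, and threads suit this workload because each task spends most of its time waiting on LLM API calls.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sample_validation_subset(questions, k=25, seed=0):
    """Draw k distinct questions uniformly at random.

    The fixed seed makes the sampled subset reproducible across runs.
    """
    rng = random.Random(seed)
    return rng.sample(questions, k)


def run_single_task(question):
    # Hypothetical placeholder: a real script would run the OWL
    # role-playing agents on one question and score the final answer.
    return {"task_id": question["task_id"], "correct": False}


def run_in_parallel(questions, max_workers=8):
    """Evaluate questions concurrently in a thread pool.

    pool.map returns results in the same order as the input questions.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_single_task, questions))


if __name__ == "__main__":
    # Toy stand-ins for GAIA validation questions (hypothetical fields).
    pool_of_questions = [{"task_id": f"q{i}", "Level": i % 3 + 1} for i in range(100)]
    subset = sample_validation_subset(pool_of_questions)
    results = run_in_parallel(subset)
    print(len(subset), len(results))  # prints "25 25"
```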
```bash
# Clone github repo
git clone https://github.com/camel-ai/owl.git
# Change directory into project directory
cd owl
# Create a virtual environment
# For Python 3.10 (also works with 3.11, 3.12)
python3.10 -m venv .venv
# Activate the virtual environment
# For macOS/Linux
source .venv/bin/activate
# For Windows
.venv\Scripts\activate
# Install from requirements.txt
pip install -r requirements.txt --use-pep517
```

OWL requires various API keys to interact with different services.
You can set environment variables directly in your terminal:

- macOS/Linux (Bash/Zsh):
  ```bash
  export OPENAI_API_KEY="your-openai-api-key-here"  # Add other required API keys as needed
  ```
- Windows (Command Prompt):
  ```bat
  set OPENAI_API_KEY=your-openai-api-key-here
  ```
- Windows (PowerShell):
  ```powershell
  $env:OPENAI_API_KEY = "your-openai-api-key-here"
  ```
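Before starting a long GAIA run, you can check that the keys are visible to Python. The sketch below is minimal and assumes `OPENAI_API_KEY` is the only required key. `REQUIRED_KEYS` and `missing_keys` are hypothetical names, not part of OWL. Add any other keys your tool configuration needs.

```python
import os

# Hypothetical list of keys to check; OPENAI_API_KEY comes from the steps
# above, and other keys needed by your toolkits would be appended here.
REQUIRED_KEYS = ("OPENAI_API_KEY",)


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```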
```bash
cd owl/examples
python run_gaia_roleplaying.py
```

Results are saved as JSON files. The results shown in the previous figure can be found in the results folder.
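You can compute per-level accuracy from the saved JSON files along the lines of the sketch below. The result schema here is an assumption, not something taken from the OWL code. The sketch assumes each file holds a list of records with a `level` field and a boolean `correct` field, and `load_results`/`accuracy_by_level` are hypothetical helper names. Adjust the field names to match the files your run produces.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_results(results_dir):
    """Load every *.json file in results_dir (each assumed to hold a list)."""
    records = []
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records.extend(json.load(f))
    return records


def accuracy_by_level(records):
    """Return {level: accuracy} from records with 'level' and 'correct' keys."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        level = rec["level"]
        totals[level] += 1
        hits[level] += bool(rec["correct"])  # True counts as 1
    return {lvl: hits[lvl] / totals[lvl] for lvl in sorted(totals)}


if __name__ == "__main__":
    # Toy records for illustration; real ones would come from load_results().
    demo = [
        {"level": 1, "correct": True},
        {"level": 1, "correct": False},
        {"level": 2, "correct": True},
    ]
    print(accuracy_by_level(demo))  # prints "{1: 0.5, 2: 1.0}"
```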
To reproduce the performance of the original OWL prompts, open the config and set USE_SEW_PROMPT to false.
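After running both configurations, the comparison can be reported as an absolute gain (in accuracy points) or as a relative gain over the baseline. A figure such as "20% on average" reads differently under each, so it helps to state both. The sketch below is illustrative: `improvement` is a hypothetical helper, and the accuracies in the example are made-up numbers, not the results from the figure.

```python
def improvement(baseline_acc, optimized_acc):
    """Return (absolute gain in percentage points, relative gain in %).

    Both accuracies are fractions in [0, 1].
    """
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    delta = optimized_acc - baseline_acc
    return delta * 100, delta / baseline_acc * 100


if __name__ == "__main__":
    # Illustrative accuracies only, not numbers from this report.
    abs_pts, rel_pct = improvement(0.40, 0.48)
    print(f"{abs_pts:.1f} points absolute, {rel_pct:.1f}% relative")
    # prints "8.0 points absolute, 20.0% relative"
```

With these toy numbers, an 8-point absolute gain is a 20% relative gain. This is why reports should say which of the two they mean.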