OWL Project Reproduction Report

Note: This report outlines reproduction steps, environment details, and issues discovered when forking and running the Camel-AI OWL project using our Self-Evolving Workflow (SEW).
OWL is an open-source agentic framework developed by Camel-AI, designed to enable general-purpose multi-agent collaboration in solving real-world tasks. It provides a flexible and extensible platform for building, coordinating, and deploying multiple intelligent agents across a wide range of practical applications.

1. Comparison between SEW-enhanced OWL and the original OWL on the GAIA benchmark (validation set)

We optimized the OWL role-playing framework using our proposed SEWOptimizer, with a primary focus on improving the prompts within the framework. We report the performance of the original and optimized prompts on the full GAIA validation set in the following figure.

The results indicate that our optimized prompts improve performance by 20% on average, with noticeable gains on tasks at all three levels of the GAIA benchmark.
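"20% on average" can be read either as a relative gain or as a difference in percentage points, and the two can differ noticeably. The sketch below computes both from per-level accuracies. It is illustrative only: the function names are hypothetical, accuracies are assumed to be fractions in [0, 1], and no numbers from the figure are reproduced here.

```python
from statistics import mean


def absolute_gain(before: float, after: float) -> float:
    """Accuracy gain in percentage points; inputs are fractions in [0, 1]."""
    return (after - before) * 100


def relative_gain(before: float, after: float) -> float:
    """Accuracy gain relative to the baseline accuracy, in percent."""
    if before == 0:
        raise ValueError("a relative gain needs a non-zero baseline accuracy")
    return (after - before) / before * 100


def average_gains(
    original: dict[str, float], optimized: dict[str, float]
) -> tuple[float, float]:
    """Mean (absolute, relative) gain over the levels present in both mappings."""
    levels = sorted(original.keys() & optimized.keys())
    if not levels:
        raise ValueError("no common levels to compare")
    # Unweighted mean over levels: each GAIA level counts equally,
    # regardless of how many questions it contains.
    absolute = mean(absolute_gain(original[lv], optimized[lv]) for lv in levels)
    relative = mean(relative_gain(original[lv], optimized[lv]) for lv in levels)
    return absolute, relative
```

Averaging per-level gains weights each level equally; a pooled gain over all validation questions would instead weight levels by their question counts.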

In our experiments, we used the OpenAI o1 model to optimize the prompts and gpt-4o to run the framework during evaluation. The optimization process cost approximately $90 in total, and running the validation with gpt-4o cost around $40. These figures suggest that the optimization process is cost-effective relative to the performance improvement it achieves.

2. Why OWL?

We chose OWL because it is reported to rank #1 among open-source frameworks on the GAIA benchmark.

3. What have we changed?

We made the following modifications to the original framework:

We optimized the prompts within the OWL framework using our proposed SEWOptimizer. For the optimization, we randomly sampled 25 questions from the GAIA validation set and used them as a validation subset. The optimized prompts can be found in the 'prompt_process' folder.
We modified the run script to use multi-threading, which speeds up the process.
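The two changes above, drawing a fixed-size optimization subset and evaluating tasks concurrently, can be sketched as follows. This is a minimal illustration, not the code in this fork: it assumes each GAIA task is a dict with a 'task_id' key, and `sample_subset`, `run_concurrently`, and the `solve` callback are hypothetical names. A fixed seed makes the 25-question subset reproducible; threads are a reasonable fit on the assumption that each task spends most of its time waiting on model API calls.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def sample_subset(
    tasks: list[dict[str, Any]], k: int = 25, seed: int = 0
) -> list[dict[str, Any]]:
    """Draw k distinct tasks without replacement; the same seed gives the same subset."""
    if k > len(tasks):
        raise ValueError(f"cannot sample {k} tasks from {len(tasks)}")
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return rng.sample(tasks, k)


def run_concurrently(
    tasks: list[dict[str, Any]],
    solve: Callable[[dict[str, Any]], Any],
    max_workers: int = 8,
) -> dict[str, Any]:
    """Run solve(task) for every task in a thread pool; map task_id -> result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so they line up with tasks
        results = list(pool.map(solve, tasks))
    return {task["task_id"]: result for task, result in zip(tasks, results)}
```

In practice, `max_workers` would likely be bounded by the model provider's rate limits rather than by local CPU.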

4. Issues

The code for reproducing the best-performing variant of OWL, i.e., the workforce workflow, is missing (check more details here).
The code for the second-best variant of OWL, i.e., the role-playing workflow, is not reproducible (check more details here).

5. Environment

# Clone github repo
git clone https://github.com/camel-ai/owl.git

# Change directory into project directory
cd owl

# Create a virtual environment
# For Python 3.10 (also works with 3.11, 3.12)
python3.10 -m venv .venv

# Activate the virtual environment
# For macOS/Linux
source .venv/bin/activate
# For Windows
.venv\Scripts\activate

# Install from requirements.txt
pip install -r requirements.txt --use-pep517
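Since the setup targets Python 3.10 and notes that 3.11 and 3.12 also work, a quick check inside the activated virtual environment can confirm the interpreter version before installing dependencies. A minimal sketch; the accepted set simply mirrors the setup comments, and `is_supported_python` and `describe_current_python` are hypothetical helpers.

```python
import sys

# Versions named in the setup comments: 3.10 (primary), 3.11, 3.12
SUPPORTED_VERSIONS = {(3, 10), (3, 11), (3, 12)}


def is_supported_python(major: int, minor: int) -> bool:
    """Return True if major.minor is one of the versions the setup notes list."""
    return (major, minor) in SUPPORTED_VERSIONS


def describe_current_python() -> str:
    """Report whether the running interpreter is one of the listed versions."""
    major, minor = sys.version_info[:2]
    if is_supported_python(major, minor):
        return f"Python {major}.{minor} is one of the listed versions"
    return f"Python {major}.{minor} is not one of the listed versions (3.10-3.12)"


if __name__ == "__main__":
    print(describe_current_python())
```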

Set Up Environment Variables

OWL requires various API keys to interact with different services.

Setting Environment Variables Directly

You can set environment variables directly in your terminal:

macOS/Linux (Bash/Zsh):

export OPENAI_API_KEY="your-openai-api-key-here"
# Add other required API keys as needed

Windows (Command Prompt):

set OPENAI_API_KEY=your-openai-api-key-here

Windows (PowerShell):

$env:OPENAI_API_KEY = "your-openai-api-key-here"
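Before launching a run, it can help to fail fast when a key is missing. A minimal sketch using only `os.environ`; OPENAI_API_KEY is the only key this report names, so `REQUIRED_KEYS` is an assumption to be extended with whatever other keys your setup needs, and `missing_keys` is a hypothetical helper.

```python
import os
from collections.abc import Iterable, Mapping

# OPENAI_API_KEY is the only key named in this report; add others as needed.
REQUIRED_KEYS = ("OPENAI_API_KEY",)


def missing_keys(
    required: Iterable[str] = REQUIRED_KEYS,
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [key for key in required if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```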

6. Reproduce Results

cd owl/examples
python run_gaia_roleplaying.py

Results are saved as JSON files. The results shown in the figure in Section 1 can be found in the 'results' folder.
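To summarize a run from its JSON output, a script along these lines could tally accuracy per GAIA level. This is a hedged sketch: the result schema is not documented here, so the layout below (each file holding a JSON list of records with 'level' and 'correct' fields) is a hypothetical assumption whose keys will likely need adjusting, and `accuracy_by_level` is likewise a hypothetical name.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Union


def accuracy_by_level(results_dir: Union[str, Path]) -> dict[str, float]:
    """Aggregate accuracy per level over every *.json file in results_dir.

    Assumes (hypothetically) each file contains a JSON list of records such as
    {"level": 1, "correct": true}; adapt the keys to the real output format.
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for path in sorted(Path(results_dir).glob("*.json")):
        records = json.loads(path.read_text(encoding="utf-8"))
        for record in records:
            level = str(record["level"])
            totals[level] += 1
            hits[level] += int(bool(record["correct"]))
    # Only levels that actually appear in the files are reported.
    return {level: hits[level] / totals[level] for level in sorted(totals)}
```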

To reproduce the performance of the original OWL, go to the config and set USE_SEW_PROMPT to false.
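The config file that defines USE_SEW_PROMPT is not named here, so one way to locate it is to scan the repository for the flag. A minimal standard-library sketch; `find_flag` is a hypothetical helper, and the set of file suffixes it searches is an assumption. The shell equivalent is `grep -rn USE_SEW_PROMPT .` run from the repository root.

```python
from pathlib import Path


def find_flag(
    root: str,
    flag: str = "USE_SEW_PROMPT",
    suffixes: tuple[str, ...] = (".py", ".yaml", ".yml", ".json", ".toml"),
) -> list[tuple[Path, int, str]]:
    """Return (file, line_number, stripped_line) for every line under root mentioning flag."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if ".venv" in path.parts:  # skip the virtual environment created during setup
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # unreadable or non-UTF-8 file: ignore it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if flag in line:
                matches.append((path, lineno, line.strip()))
    return matches
```

Calling `find_flag(".")` from the repository root lists each matching file and line, after which the flag can be switched to false in the file that defines it.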

Repository: TedSIWEILIU/owl
