This repository provides the code for our ACL 2025 paper, A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents, and is organized so that our code contributions can be integrated into other projects more easily.
Pass Rate and average number of tool calls per passed query on G1-instruction, comparing complete and incomplete context across three inference-scaling algorithms (CoT@1, CoT@5, and DFS). Our key findings: (a) incomplete context easily leads to inefficiency (green line vs. purple line); (b) our proposed optimization system improves the efficiency of agents with incomplete context (orange line).
Our optimization framework consists of three key components: (1) a Stepwise and Pairwise Feedback Generator that produces verbalized feedback on the final response and tool calls; (2) an Improvement Suggestion Coordinator that generates separate improvement suggestions for instructions and tool descriptions; (3) a Batch Context Refiner that aggregates multiple improvement suggestions.
You can reproduce the experiments of our paper A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents.
Abstract

Large Language Models (LLMs) augmented with external tools have demonstrated remarkable capabilities in complex problem solving. Existing algorithms for tool utilization typically involve an LLM agent which contains instructions on using the description of the available tools to determine and call the tools required to solve the current problem. Most current algorithms utilize methods such as chain-of-thought and tree-of-thought reasoning, requiring significant computational overhead and rendering such methods impractical in real-world applications. In this work, we recognize and formalize the critical role of instructions provided in agent prompts and tool descriptions---collectively referred to as context---and show that incomplete context is one of the reasons for this computational overhead. To fill this efficiency gap, we propose an optimization framework that jointly refines both the instructions provided in the agent prompt and tool description, enhancing their interaction. Experiments on StableToolBench demonstrate that our optimized agents achieve superior efficiency while maintaining effectiveness. Our findings underscore the critical role of context optimization in improving LLM agents for tool utilization, paving the way for more responsive and cost-effective LLM agents.
.
├── README.md
├── requirements.txt
├── new_metrics # CAPR Evaluation
├── optimization # Optimization Framework
├── solvable_queries # Query Dataset
│ ├── agent_test_instruction # Agent Test Set
│ ├── test_instruction # original toolbench
│ ├── tool_test_instruction # Tool Test Set
│ ├── training_instruction # Training Set
├── toolbench # ToolBench
├── ...
To install the required packages, you can run the following commands.
- Set up a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate  # On Windows

- Install the requirements:

pip install -r requirements.txt

Please follow StableToolBench to download the dataset and deploy the tool server.
Add your API key to openai_key.json, inference_chatgpt_testing.sh, inference_chatgpt_training.sh, and run_pass_rate.sh.
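As a rough illustration only, openai_key.json is typically a JSON file holding your OpenAI credentials; the field names below are assumptions, so please check the StableToolBench documentation for the exact schema expected by this codebase:

```json
[
    {
        "api_key": "sk-REPLACE_WITH_YOUR_KEY"
    }
]
```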
Running on the training set to obtain the trials for optimization:

bash inference_chatgpt_training.sh

Running our optimization framework on the trials from the training set:

bash run_optimization.sh

Running on both the Tool Test Set and the Agent Test Set across CoT@5 (chain-based) and DFS (tree-based):

bash inference_chatgpt_testing.sh

Converting to the final answer and running evaluation (i.e., Pass Rate and Cost-Aware Pass Rate) on the answers:

bash run_convert_answer.sh
bash run_pass_rate.sh
bash run_capr.sh

If you have any questions regarding the code or the paper, please feel free to reach out to the authors at [email protected]. If you experience any difficulties while using the code or need to report a bug, feel free to open an issue. We kindly ask that you provide detailed information about the problem to help us provide effective support.
@inproceedings{wu-etal-2025-joint,
title = "A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents",
author = "Wu, Bin and
Meij, Edgar and
Yilmaz, Emine",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
pages = "22361--22373",
ISBN = "979-8-89176-256-5"
}
We would like to thank the authors of the following repositories for providing the codebase: