

A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents

This repository provides the code for our ACL 2025 paper titled A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents, making it easier to integrate our contributions into other projects.

arXiv | Made with PyTorch | License: MIT


Pass Rate and average number of tool calls for passed queries on G1-Instruction, under complete versus incomplete context, across three inference-scaling algorithms (CoT@1, CoT@5, and DFS). Our key findings: (a) incomplete context easily leads to inefficiency (green vs. purple line); (b) our proposed optimization framework improves the efficiency of the incomplete-context setting (orange line).


Our optimization framework consists of three key components: (1) a Stepwise and Pairwise Feedback Generator, which produces verbalized feedback on the final response and the tool calls; (2) an Improvement Suggestion Coordinator, which generates separate improvement suggestions for the instructions and the tool descriptions; and (3) a Batch Context Refiner, which aggregates multiple improvement suggestions into a single revision.
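
To make the pipeline concrete, below is a minimal conceptual sketch of how one optimization step might compose these three components. It is a sketch under assumed names only: Trial, the llm callable, and the prompts are illustrative placeholders, not this repository's actual API (see optimization/ for the real implementation).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Trial:
    final_response: str
    tool_calls: List[str]

def optimize_context(instructions: str, tool_desc: str,
                     trials: List[Trial],
                     llm: Callable[[str], str]) -> Tuple[str, str]:
    """One joint optimization step over a batch of training trials."""
    suggestions = []
    for t in trials:
        # (1) Stepwise and Pairwise Feedback Generator: verbalized feedback
        #     on the final response and the sequence of tool calls.
        feedback = llm(
            "Critique this response and its tool calls:\n"
            f"{t.final_response}\n{t.tool_calls}"
        )
        # (2) Improvement Suggestion Coordinator: separate suggestions for
        #     the agent instructions and for the tool descriptions.
        suggestions.append(llm(
            "Turn this feedback into separate suggestions for the agent "
            f"instructions and the tool descriptions:\n{feedback}"
        ))
    # (3) Batch Context Refiner: aggregate all suggestions into a single
    #     revision of the instructions and of the tool descriptions.
    new_instructions = llm(
        f"Revise these instructions:\n{instructions}\nSuggestions:\n{suggestions}"
    )
    new_tool_desc = llm(
        f"Revise these tool descriptions:\n{tool_desc}\nSuggestions:\n{suggestions}"
    )
    return new_instructions, new_tool_desc
```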

Overview

You can use this repository to reproduce the experiments in our paper A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents.

Abstract: Large Language Models (LLMs) augmented with external tools have demonstrated remarkable capabilities in complex problem solving. Existing algorithms for tool utilization typically involve an LLM agent which contains instructions on using the description of the available tools to determine and call the tools required to solve the current problem. Most current algorithms utilize methods such as chain-of-thought and tree-of-thought reasoning, requiring significant computational overhead and rendering such methods impractical in real-world applications. In this work, we recognize and formalize the critical role of instructions provided in agent prompts and tool descriptions (collectively referred to as context) and show that incomplete context is one of the reasons for this computational overhead. To fill this efficiency gap, we propose an optimization framework that jointly refines both the instructions provided in the agent prompt and tool description, enhancing their interaction. Experiments on StableToolBench demonstrate that our optimized agents achieve superior efficiency while maintaining effectiveness. Our findings underscore the critical role of context optimization in improving LLM agents for tool utilization, paving the way for more responsive and cost-effective LLM agents.

0. Project Structure

.
├── README.md
├── requirements.txt
├── new_metrics                       # CAPR evaluation
├── optimization                      # Optimization framework
├── solvable_queries                  # Query dataset
│   ├── agent_test_instruction        # Agent Test Set
│   ├── test_instruction              # Original ToolBench
│   ├── tool_test_instruction         # Tool Test Set
│   └── training_instruction          # Training Set
├── toolbench                         # ToolBench
└── ...

1. Requirements and Installation

To install the required packages, you can run the following commands.

  1. Set up a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate     # On Windows
  2. Install the requirements:
pip install -r requirements.txt

2. Environment

Please follow StableToolBench to download the dataset and deploy the tool server.

Add your API key to openai_key.json, inference_chatgpt_testing.sh, inference_chatgpt_training.sh, and run_pass_rate.sh.
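
For reference, openai_key.json in ToolBench-style setups typically holds a list of key entries. The exact schema is defined by StableToolBench, so treat the layout below as an assumption and check their documentation:

```json
[
    {
        "api_key": "sk-XXXX",
        "organization": ""
    }
]
```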

3. Usage

Running on Training Set

Run on the training set to obtain the trials used for optimization.

bash inference_chatgpt_training.sh

Running Optimization

Run our optimization framework on the trials collected from the training set.

bash run_optimization.sh

Running on Testing Set

Run on both the Tool Test Set and the Agent Test Set, across CoT@5 (chain-based) and DFS (tree-based).

bash inference_chatgpt_testing.sh 
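
For background on the two inference-scaling strategies: CoT@k samples k independent reasoning chains and counts a query as passed if any chain succeeds, while DFS explores a tree of tool-call states with backtracking. The sketch below is a conceptual illustration only, with hypothetical run_chain, expand, and is_success callables, not the ToolBench implementation.

```python
from typing import Callable, Iterable

def cot_at_k(run_chain: Callable[[], bool], k: int) -> bool:
    # CoT@k: sample up to k independent reasoning chains; the query
    # passes if any single chain reaches a successful final answer.
    return any(run_chain() for _ in range(k))

def dfs(state, expand: Callable[[object], Iterable],
        is_success: Callable[[object], bool], budget: list) -> bool:
    # DFS: explore a tree of tool-call states depth-first, backtracking
    # on failure, until success or the shared call budget runs out.
    if budget[0] <= 0:
        return False
    budget[0] -= 1
    if is_success(state):
        return True
    return any(dfs(child, expand, is_success, budget)
               for child in expand(state))
```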

Convert Answer and Evaluation

Convert the model outputs into final answers and run the evaluation (i.e., Pass Rate and Cost-Aware Pass Rate) on those answers.

bash run_convert_answer.sh
bash run_pass_rate.sh
bash run_capr.sh
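
As a rough illustration of the two metrics: Pass Rate is the fraction of queries answered successfully, and a cost-aware variant additionally rewards passes that use fewer tool calls. The discounting scheme below is an assumption for illustration only; the paper's actual CAPR definition is implemented in new_metrics/.

```python
from typing import Dict, List

def pass_rate(results: List[Dict]) -> float:
    # results: [{"passed": bool, "tool_calls": int}, ...]
    return sum(r["passed"] for r in results) / len(results)

def cost_aware_pass_rate(results: List[Dict], budget: int = 10) -> float:
    # Illustrative discounting only: each passed query scores higher the
    # fewer tool calls it used, relative to a fixed budget. The paper's
    # actual CAPR definition lives in new_metrics/ and may differ.
    credit = sum(max(0.0, 1.0 - r["tool_calls"] / budget)
                 for r in results if r["passed"])
    return credit / len(results)
```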

Bugs or questions?

If you have any questions regarding the code or the paper, please feel free to reach out to the authors at [email protected]. If you experience any difficulties while using the code or need to report a bug, feel free to open an issue. We kindly ask that you provide detailed information about the problem so we can offer effective support.

Citation

@inproceedings{wu-etal-2025-joint,
    title = "A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents",
    author = "Wu, Bin  and
      Meij, Edgar  and
      Yilmaz, Emine",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    pages = "22361--22373",
    ISBN = "979-8-89176-256-5"
}

Acknowledgements

We would like to thank the authors of ToolBench and StableToolBench for providing the codebase.
