
InfoMosaic Logo

🚀 One-Stop Automated Agent Evaluation & Tool Deployment Framework

Complete all environment configuration and tool deployment with a single command, with support for multimodal, multi-source information retrieval and evaluation.

InfoMosaic Overview

InfoMosaic-Bench Overview

📚 Project Overview

InfoMosaic is a comprehensive framework for advanced information retrieval, multi-step reasoning, and performance evaluation of large language models (LLMs). This project is based on the research paper "InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents" and leverages the InfoMosaic_Bench dataset for evaluation.

The framework enables:

  • Multi-source information retrieval and integration
  • Rigorous evaluation of LLMs' reasoning capabilities
  • Flexible tool usage for enhanced information acquisition
  • Parallel processing for efficient benchmarking

📊 Evaluation Results

Comparison of 14 LLM agents equipped with a web search tool on InfoMosaic-Bench, evaluated across six domains and the overall average. Metrics include Accuracy (Acc) and Pass Rate. The best overall Accuracy and Pass Rate are highlighted in bold.

Evaluation Results

InfoMosaic-Bench Evaluation Results

πŸ“ Project Structure

InfoMosaic/
β”œβ”€β”€ data/                   # Data preparation and management
β”œβ”€β”€ eval/                   # Evaluation utilities
β”œβ”€β”€ inference/              # Inference components
β”œβ”€β”€ tool_backends/          # Backend services for tools
β”‚   β”œβ”€β”€ MCP/                     # Multi-Content Protocol tools
β”‚   β”œβ”€β”€ api_proxy/               # API proxies
β”‚   β”œβ”€β”€ configs/                 # Configuration files
β”‚   └── test/                    # Test scripts for tools
β”œβ”€β”€ infer_answer.py         # Main inference script
β”œβ”€β”€ ensemble_answer.py      # Script to avoid answer leakage
β”œβ”€β”€ gen_result.py           # Result generation
β”œβ”€β”€ utils.py                # Utility functions
└── README.md               # Project documentation

🔧 Installation & Setup

Prerequisites

  • Python 3.8+ or Docker
  • API keys for external services (Serper, Google Maps, YouTube, SerpAPI)

Installation

# Clone the repository
git clone git@github.com:DorothyDUUU/Info-Mosaic.git
cd Info-Mosaic

# Install dependencies
pip install . 

🔑 API Key Configuration

For detailed instructions on configuring API keys, please refer to the API Key and Configuration Management Guide in the repository.

This document provides comprehensive information about configuring API keys for all external services used in InfoMosaic, including web search, maps, and other tools.

Starting Tool Backends

The InfoMosaic tool backend is designed for simple, automated deployment: it launches MCP servers on top of a Python sandbox, providing a one-click deployment solution.

To enable the full functionality of the tools, please refer to the detailed deployment guide:

This document provides complete deployment steps, including Docker deployment, quick deployment scripts, service management, and testing tools. All services can be configured and started with just one command.

🚀 Inference & Evaluation

Data Preparation

First, prepare the dataset by combining the HuggingFace benchmark data with ground truth answers:

python data/prepare_data.py

This script will:

  1. Download the InfoMosaic_Bench dataset from HuggingFace
  2. Load the ground truth answers from data/info_mosaic_gt_answer.jsonl
  3. Combine the datasets and save to data/info_mosaic_w_gt.jsonl
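Conceptually, the combine step is a key-based join between the benchmark questions and the ground-truth file. A minimal sketch of that pattern (the field names `id` and `gt_answer` are assumptions for illustration; see `data/prepare_data.py` for the actual schema):

```python
import json

def merge_with_gt(bench_rows, gt_rows, key="id"):
    """Join benchmark questions with their ground-truth answers on a shared key."""
    gt_by_key = {row[key]: row for row in gt_rows}
    merged = []
    for row in bench_rows:
        gt = gt_by_key.get(row[key])
        if gt is None:
            continue  # skip questions without a ground-truth entry
        merged.append({**row, "gt_answer": gt["gt_answer"]})
    return merged

def write_jsonl(path, rows):
    """Write one JSON object per line, as in data/info_mosaic_w_gt.jsonl."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```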

Running Inference

Run the inference script to evaluate a model on the benchmark:

export OPENAI_API_BASE_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="sk-..."
export SERPER_API_KEY="your_serper_api_key"
sh inference/run_infer.sh

Key arguments:

  • --model_name: Type of agent to use: agent_wo_tool (no tools), agent_w_web_tool (web tool only), or agent_w_multi_tool (multiple tools)
  • --parallel_size: Number of parallel threads for processing
  • --llm_name: Name of the LLM model to use, default is "gpt-5-mini"
  • --domain: Domain to evaluate, default is "all", optional values are 'all', 'map', 'bio', 'financial', 'web', 'video', 'multidomain'
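Internally, `--parallel_size` controls how many benchmark questions are answered concurrently. A minimal sketch of that threading pattern (the function names here are illustrative, not the repository's actual API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(questions, answer_fn, parallel_size=4):
    """Answer questions concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=parallel_size) as pool:
        return list(pool.map(answer_fn, questions))
```

Thread-based parallelism fits here because each worker spends most of its time waiting on LLM and tool API calls, so raising `parallel_size` mainly trades API rate-limit headroom for throughput.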

📊 Evaluation

Evaluate the model's performance using the pass rate evaluation script:

export OPENAI_API_BASE_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="sk-..."
sh eval/run_eval.sh

Key arguments:

  • --model_name: Type of agent to use: agent_wo_tool (no tools), agent_w_web_tool (web tool only), or agent_w_multi_tool (multiple tools)
  • --domain: Domain to evaluate, default is "all", optional values are 'all', 'map', 'bio', 'financial', 'web', 'video', 'multidomain'

This script will:

  1. Load the model's generated answers
  2. Use a judge LLM to evaluate the correctness of answers
  3. Calculate pass rates for sub-questions and final answers
  4. Generate detailed evaluation metrics
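The metric computation reduces to counting the judge LLM's verdicts. A minimal sketch, assuming a hypothetical per-record schema with `sub_correct` (one verdict per sub-question) and `final_correct` fields:

```python
def pass_rates(records):
    """Compute the sub-question pass rate and final-answer accuracy
    from judge verdicts. Each record is assumed to look like
    {"sub_correct": [bool, ...], "final_correct": bool}."""
    sub_total = sum(len(r["sub_correct"]) for r in records)
    sub_pass = sum(sum(r["sub_correct"]) for r in records)
    final_pass = sum(r["final_correct"] for r in records)
    return {
        "sub_pass_rate": sub_pass / sub_total if sub_total else 0.0,
        "accuracy": final_pass / len(records) if records else 0.0,
    }
```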

🔄 Data Synthesis Pipeline: InfoMosaic Flow (coming soon)

Looking forward to the release of InfoMosaic Flow!

InfoMosaic Flow

📚 Citation

If you use this framework or dataset in your research, please cite our paper:

@article{du2025infomosaic,
  title={InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents},
  author={Du, Yaxin and Zhang, Yuanshuo and Yang, Xiyuan and Zhou, Yifan and Wang, Cheng and Zou, Gongyi and Pang, Xianghe and Wang, Wenhao and Chen, Menglan and Tang, Shuo and others},
  journal={arXiv preprint arXiv:2510.02271},
  year={2025}
}

And the dataset:

@dataset{InfoMosaic_Bench,
  title = {InfoMosaic_Bench},
  author = {Dorothydu},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/Dorothydu/InfoMosaic_Bench}
}

🤝 Contributing

Contributions to improve InfoMosaic are welcome! Please refer to the project's GitHub repository for contribution guidelines.

πŸ™ Acknowledgements

We would like to express our gratitude to the following open-source projects that have inspired and supported the development of InfoMosaic:

πŸ“ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
