Important: OriGene is an open-source, self-evolving multi-agent system that acts as a virtual disease biologist. We also introduce the TRQA benchmark, a set of 1,921 expert-level questions for evaluating biomedical AI agents. OriGene launched at the 2025 WAIC!
| Try OriGene | Paper | Code | Hugging Face Benchmark |
1. Public online launch – OriGene is now live and available to try at https://origene.lglab.ac.cn/.
2. Open-source release – The entire OriGene codebase and benchmark are now available. Fork away!
3. Officially presented at the 2025 World Artificial Intelligence Conference (WAIC).
Therapeutic target discovery remains one of the most critical yet intuition-driven stages in drug development. We present OriGene, a self-evolving multi-agent system that functions as a virtual disease biologist to
identify and prioritize therapeutic targets at scale.
- Deploy the MCP Server

  OriGene relies on the MCP Server, which aggregates more than 600 bioinformatics tools. Follow the guidelines in OriGene MCP to deploy the MCP service and record the server endpoint (for example, `http://127.0.0.1:8788`).
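  Before moving on, it can help to confirm that the endpoint is reachable. A minimal sketch (assuming only that the server listens on the recorded host and port; `MCP_URL` is a placeholder for your endpoint):

  ```python
  import socket
  from urllib.parse import urlparse

  MCP_URL = "http://127.0.0.1:8788"  # replace with your recorded endpoint

  parsed = urlparse(MCP_URL)
  host, port = parsed.hostname, parsed.port or 80

  # A plain TCP connect verifies the service is listening without
  # assuming any particular HTTP route on the server.
  with socket.create_connection((host, port), timeout=5):
      print(f"MCP server reachable at {host}:{port}")
  ```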
- Configure OriGene

  Edit `src/local_deep_research/_settings/.secrets.toml` and fill in the MCP server URL together with your LLM API keys. Because OriGene is model-agnostic, you can freely switch between different base models or customize additional settings.
  ```toml
  [mcp]
  server_url = "Enter your mcp url"

  [embedding]
  api_key = "Enter your api key (match url: https://api.siliconflow.cn/v1/embeddings)"
  cache = "embedding_cache.pkl"

  [template]
  api_base = "https://ark.cn-beijing.volces.com/api/v3"
  api_key = "Enter your api key"

  [openai]
  api_base = "https://api.openai-proxy.org/v1"
  api_key = "Enter your api key"

  [deepseek]
  api_base = "https://api.deepseek.com"
  api_key = "Enter your api key"
  ```
- Install dependencies

  ```bash
  cd src
  uv sync
  ```

- Activate the virtual environment

  ```bash
  source ./.venv/bin/activate
  ```

- (Optional) Add the project root to `PYTHONPATH`

  ```bash
  export PYTHONPATH=$(pwd):$PYTHONPATH
  ```

Launch the interactive assistant:

```bash
uv run -m local_deep_research.main
```

You will see a prompt similar to the following:
```text
Welcome to the Advanced Research System
Type 'quit' to exit
Select output type:
1) Analysis (few minutes, answers questions, summarizes findings)
2) Detailed Report (more time, generates a comprehensive report with deep analysis)
Enter number (1 or 2):
```
After selecting an output type, enter your research query and OriGene will return the results.
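For scripted, non-interactive runs, you can drive the same CLI through a pipe. A minimal sketch (assuming, as the prompt above suggests, that the program reads the output type and then the query from stdin; adjust the working directory to wherever you ran `uv sync`):

```python
import subprocess

QUERY = "Prioritize therapeutic targets for triple-negative breast cancer"

# "1\n" selects the Analysis output type; the research query follows.
result = subprocess.run(
    ["uv", "run", "-m", "local_deep_research.main"],
    input=f"1\n{QUERY}\n",
    capture_output=True,
    text=True,
    cwd="src",  # directory containing the synced virtual environment
)
print(result.stdout)
```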
Run the benchmark to generate agent answers (you can use either command):
# From the project root (this directory), after activating the venv
uv run -m local_deep_research.evaluate_local
# Or using python
python -m local_deep_research.evaluate_localThen score the generated results (replace paths if you changed dataset/output names):
```bash
# Example: score TRQA-lit-choice core set results
python local_deep_research/score_evaluation_results.py \
    --agent_results benchmark/TRQA_lit_choice/agent_answers_test.txt \
    --original_data benchmark/TRQA_lit_choice/TRQA-lit-choice-172-coreset.csv \
    --model_name "OriAgent"

# Or using uv
uv run -m local_deep_research.score_evaluation_results \
    --agent_results benchmark/TRQA_lit_choice/agent_answers_test.txt \
    --original_data benchmark/TRQA_lit_choice/TRQA-lit-choice-172-coreset.csv \
    --model_name "OriAgent"
```
To evaluate performance, we constructed TRQA (Target Research-related Question Answering), a benchmark of 1,921 questions specific to therapeutic target identification tasks across multiple disease areas. It is a comprehensive evaluation benchmark designed to assess the capabilities of OriGene and similar systems in biomedical reasoning and target discovery.
TRQA evaluates core competencies including:
- Scientific planning
- Information retrieval
- Tool selection
- Reasoning toward biological conclusions
- Critical self-evolution
It spans domains such as fundamental biology, disease biology, pharmacology, and clinical medicine, integrating both scientific literature and real-world data from drug development pipelines and clinical trials.
TRQA includes two subsets:
- TRQA-lit: Focuses on recent research findings. Includes 172 multiple-choice questions (for rapid model/human comparison) and 1,108 short-answer questions covering key biomedical areas.
- TRQA-db: Centers on competitive landscape analysis. Includes 641 short-answer questions that evaluate the ability to retrieve, integrate, and reason over data related to drug R&D and clinical trials.
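The 172-question core set ships with the repository (it is the `--original_data` file in the scoring example above). A minimal sketch for inspecting it locally, assuming pandas is installed and the file is a standard CSV; the schema is printed rather than assumed:

```python
import pandas as pd

df = pd.read_csv("benchmark/TRQA_lit_choice/TRQA-lit-choice-172-coreset.csv")
print(df.shape)             # expect 172 rows, one per core-set question
print(df.columns.tolist())  # inspect the schema rather than assuming it
print(df.head(3))
```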
Target Research-related Question Answering (TRQA) benchmark leaderboard
| Method | TRQA-lit Choice (Core Set) | TRQA-lit Short-Answer | TRQA-db |
|---|---|---|---|
| OriGene | 0.601 | 0.826 | 0.721 |
| o3-mini | 0.578 | 0.720 | 0.487 |
| Claude-3.7-Sonnet | 0.558 | 0.695 | 0.504 |
| DeepSeek-R1 | 0.548 | 0.714 | 0.446 |
| DeepSeek-V3 | 0.541 | 0.768 | 0.466 |
| GPT-4o-search | 0.531 | 0.651 | 0.493 |
| Gemini-2.5-pro | 0.529 | 0.678 | 0.359 |
| GPT-4o | 0.512 | 0.696 | 0.392 |
| TxAgent | 0.190 | 0.472 | 0.426 |
| Human Group 3 (PhD + 3-5 year exp.) | 0.523 | ✗ | ✗ |
| Human Group 2 (PhD + 1-3 year exp.) | 0.378 | ✗ | ✗ |
| Human Group 1 (senior PhD candidates) | 0.215 | ✗ | ✗ |
OriGene integrates over 600 tools to support target discovery and biomedical reasoning.

- On the left, tools are grouped by multi-omics domains (e.g., genomics, transcriptomics, proteomics, phenomics, clinical evidence), highlighting OriGene’s ability to process biological data across scales.
- On the right, the same tools are reorganized by biomedical knowledge domains (fundamental biology, disease biology, pharmacology, and competitive landscape), reflecting how OriGene supports expert-level reasoning across diverse therapeutic tasks.
Any publication that discloses findings arising from the use of this source code, the model parameters, or outputs produced by them should cite:
```bibtex
@article{origene,
  title={{OriGene}: A Self-Evolving Virtual Disease Biologist Automating Therapeutic Target Discovery},
  author={Zhang, Zhongyue and Qiu, Zijie and Wu, Yingcheng and Li, Shuya and Wang, Dingyan and Zhou, Zhuomin and An, Duo and Chen, Yuhan and Li, Yu and Wang, Yongbo and Ou, Chubin and Wang, Zichen and Chen, Jack Xiaoyu and Zhang, Bo and Hu, Yusong and Zhang, Wenxin and Wei, Zhijian and Ma, Runze and Liu, Qingwu and Dong, Bo and He, Yuexi and Feng, Qiantai and Bai, Lei and Gao, Qiang and Sun, Siqi and Zheng, Shuangjia},
  journal={bioRxiv},
  year={2025},
  publisher={Cold Spring Harbor Laboratory}
}
```
This code repository is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0) (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at https://github.com/GENTEL-lab/OriGene/blob/main/LICENSE.
If you have any questions, please raise an issue or contact us at [email protected] or [email protected].
Thanks to DeepSeek, ChatGPT, Claude, and Gemini for providing powerful language models that made this project possible.
Special thanks to the human experts who assisted us in benchmarking and evaluating the agent's performance!