A Python package for LLM-based prior elicitation in Bayesian metabolomics models.
apriomics provides tools for generating informative priors for Bayesian metabolomics analysis using Large Language Models (LLMs). The package enables:
- LLM-based prior elicitation: Query LLMs to predict metabolite effect sizes and uncertainties for different experimental conditions
- Dual approach support: Both categorical mapping (qualitative predictions → numerical priors) and direct numerical estimation
- Database-enhanced predictions: Optional integration with HMDB (Human Metabolome Database) for context-aware predictions
The package focuses on prior generation for metabolomics studies, leveraging LLM knowledge for informed Bayesian inference.
# Install directly from GitHub
pip install git+https://github.com/chi-raag/apriomics.git

- Python >= 3.13
- OpenAI API key (for LLM-based priors)
- Core dependencies: pandas, numpy, requests, openai, pymc
from apriomics.priors.base import get_llm_priors, get_llm_quantitative_priors
from apriomics.priors.base import PriorData
# Define metabolites and experimental condition
metabolites = ["glucose", "lactate", "acetoacetate", "pyruvate"]
condition = "type 2 diabetes vs healthy controls"
# Create PriorData object
priors = PriorData(metabolites=metabolites)
# Option 1: Categorical mapping approach (qualitative → numerical)
llm_priors = get_llm_priors(
    priors=priors,
    condition=condition,
    model_name="gpt-4o-2024-08-06",
    use_hmdb_context=True,
)
# Option 2: Direct numerical estimation
quantitative_priors = get_llm_quantitative_priors(
    priors=priors,
    condition=condition,
    model_name="gpt-4o-2024-08-06",
    use_hmdb_context=True,
)
# Access prior parameters
for metabolite in metabolites:
    mu = llm_priors[metabolite]['mu']  # Prior mean
    sigma = llm_priors[metabolite]['sigma']  # Prior standard deviation
    print(f"{metabolite}: μ={mu:.3f}, σ={sigma:.3f}")

The generated priors can be used directly in Bayesian models:
import pymc as pm
with pm.Model() as model:
    # Use LLM-informed priors: one Normal coefficient per metabolite
    for metabolite in metabolites:
        pm.Normal(
            f"beta_{metabolite}",
            mu=llm_priors[metabolite]['mu'],
            sigma=llm_priors[metabolite]['sigma'],
        )

The examples directory contains implementations showing how to use LLM-generated priors in Bayesian metabolomics models:
- Gaussian Process Example: Demonstrates using LLM priors in a GP regression model for metabolomics data
Run the example with:
uv run python examples/gp_example.py

- get_llm_priors(): Categorical mapping approach (qualitative → numerical priors)
- get_llm_quantitative_priors(): Direct numerical prior elicitation
- PriorData: Data structure for metabolite information
- HMDB integration utilities for enhanced context
Categorical mapping approach:
- LLM Query: Ask the LLM for qualitative predictions (direction: increase/decrease; magnitude: small/moderate/large; confidence)
- Numerical Mapping: Convert categorical responses to numerical prior parameters
- Prior Generation: Use mapped values as μ and σ in Normal priors
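The categorical mapping steps above can be sketched as a simple lookup. This is a minimal illustration, not the package's implementation: the function `map_categorical_prior`, the category labels, and every numeric value in the tables are assumptions chosen for the example.

```python
# Hypothetical lookup tables. The labels and numbers are illustrative
# assumptions, not the mapping that apriomics actually uses.
DIRECTION_TO_SIGN = {"increase": 1.0, "decrease": -1.0, "unchanged": 0.0}
MAGNITUDE_TO_ABS_MU = {"small": 0.2, "moderate": 0.5, "large": 1.0}
CONFIDENCE_TO_SIGMA = {"high": 0.25, "medium": 0.5, "low": 1.0}


def map_categorical_prior(direction: str, magnitude: str, confidence: str) -> dict:
    """Convert one qualitative prediction into Normal prior parameters."""
    # Direction sets the sign of the mean, magnitude sets its size.
    mu = DIRECTION_TO_SIGN[direction] * MAGNITUDE_TO_ABS_MU[magnitude]
    # Lower confidence widens the prior.
    sigma = CONFIDENCE_TO_SIGMA[confidence]
    return {"mu": mu, "sigma": sigma}


example = map_categorical_prior("increase", "large", "high")
print(example)  # {'mu': 1.0, 'sigma': 0.25}
```

Keeping the mapping in plain dictionaries makes the assumed effect sizes easy to inspect and to adjust for a given study design.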
Direct numerical estimation:
- LLM Query: Ask the LLM directly for numerical estimates (mean log fold change, uncertainty)
- Prior Generation: Use LLM outputs directly as Normal prior parameters
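For the direct approach, the LLM's numbers need some validation before they become prior parameters. The sketch below assumes the model replies with JSON keyed by metabolite name; the reply format, the field names `mean_log_fold_change` and `uncertainty`, the helper `parse_quantitative_response`, and the `min_sigma` floor are all hypothetical and may differ from what the package does.

```python
import json
import math


def parse_quantitative_response(raw: str, min_sigma: float = 0.05) -> dict:
    """Parse a JSON reply into {metabolite: {"mu": ..., "sigma": ...}}."""
    parsed = json.loads(raw)
    priors = {}
    for metabolite, estimate in parsed.items():
        mu = float(estimate["mean_log_fold_change"])
        sigma = float(estimate["uncertainty"])
        if not math.isfinite(mu) or not math.isfinite(sigma):
            raise ValueError(f"Non-finite estimate for {metabolite}")
        # A Normal prior needs sigma > 0, so floor overconfident values.
        priors[metabolite] = {"mu": mu, "sigma": max(sigma, min_sigma)}
    return priors


# Made-up reply used only to exercise the parser.
raw_reply = (
    '{"glucose": {"mean_log_fold_change": 0.8, "uncertainty": 0.3},'
    ' "lactate": {"mean_log_fold_change": 0.2, "uncertainty": 0.0}}'
)
parsed_priors = parse_quantitative_response(raw_reply)
```

Flooring `sigma` keeps a zero or near-zero uncertainty from producing a degenerate prior that the data could never overturn.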
HMDB integration:
- Query the Human Metabolome Database for additional metabolite context
- Enhance LLM predictions with biochemical pathway information
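One way to picture the context enhancement is as extra lines added to the prompt before the LLM is queried. The sketch below makes no HMDB calls: the `build_prompt` helper and the `description` and `pathways` fields of the context record are hypothetical stand-ins for whatever the package retrieves.

```python
from typing import Optional


def build_prompt(metabolite: str, condition: str, context: Optional[dict] = None) -> str:
    """Assemble a prompt, adding database context when it is available."""
    lines = [f"Metabolite: {metabolite}", f"Condition: {condition}"]
    if context:
        description = context.get("description")
        if description:
            lines.append(f"Description: {description}")
        pathways = context.get("pathways", [])
        if pathways:
            lines.append("Pathways: " + ", ".join(pathways))
    lines.append("Predict the direction and size of the change.")
    return "\n".join(lines)


# Hand-written context record for illustration, not real HMDB output.
glucose_context = {
    "description": "A primary energy source for cells.",
    "pathways": ["Glycolysis", "Gluconeogenesis"],
}
prompt = build_prompt("glucose", "type 2 diabetes vs healthy controls", glucose_context)
print(prompt)
```

Because the context is optional, the same function covers the case where no database record is found or `use_hmdb_context` is turned off.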
MIT License