Advanced RAG Pipelines and Evaluation
Updated Dec 7, 2025 - Python
This repository contains code and resources for an in-depth analysis of OpenAI's HealthBench, a benchmark for evaluating large language models in healthcare.
Open-source medical LLM safety evaluation pipeline with reproducible benchmarks and high-risk clinical failure analysis.