Merged
Changes from all commits
Commits
35 commits
f7dfe35
add class
nkaenzig Oct 7, 2025
b0936b6
minor update to prompt text
nkaenzig Oct 7, 2025
6727011
updated PubMedQA dataset class to use new JSON answer template & adde…
nkaenzig Oct 7, 2025
b5a1a26
enable json prompts for pcam
nkaenzig Oct 8, 2025
b40888f
add unit tests for JsonMultipleChoicePromptTemplate
nkaenzig Oct 8, 2025
8ecd267
update unit tests for JsonMultipleChoicePromptTemplate
nkaenzig Oct 8, 2025
a2297dd
add unit tests for ExtractAnswerFromJson
nkaenzig Oct 8, 2025
031c2ff
remove num_samples
nkaenzig Oct 8, 2025
3434f39
implement missing_limit
nkaenzig Oct 8, 2025
68d0864
Merge remote-tracking branch 'origin/main' into 904-add-json-answer-t…
nkaenzig Oct 8, 2025
d05a349
fix unit tests
nkaenzig Oct 8, 2025
cfd380a
make prompt template configurable & provide default template
nkaenzig Oct 8, 2025
383d169
fix single word
nkaenzig Oct 9, 2025
b7c4511
add FreeFormQuestionPromptTemplate
nkaenzig Oct 10, 2025
c666685
resolved merge conflicts
nkaenzig Oct 10, 2025
baa14f9
delete dataset again
nkaenzig Oct 10, 2025
846f80d
move pubmedqa to multiple_choice folder
nkaenzig Oct 10, 2025
d4d4a31
add QuiltVQA dataset class
nkaenzig Oct 10, 2025
49a1c9f
renamed to test split
nkaenzig Oct 10, 2025
c725786
add unit tests & docs
nkaenzig Oct 10, 2025
a239a99
move FreeFormQuestionPromptTemplate to raw subfolder
nkaenzig Oct 10, 2025
740f5c0
add example enumeration and remove multiple blank lines
nkaenzig Oct 10, 2025
6ae15ed
mrege main
nkaenzig Oct 13, 2025
8fe601e
enable cot & moved bullet point formatting function into utils
nkaenzig Oct 13, 2025
1f88c07
add enable_cot option to RawMultipleChoicePromptTemplate
nkaenzig Oct 13, 2025
908fb9f
add enable_cot to render function
nkaenzig Oct 13, 2025
d611d0d
fix tests
nkaenzig Oct 13, 2025
a2ac76c
fixed pyright issues
nkaenzig Oct 13, 2025
4e89f1d
remove <think> tokens from prompts
nkaenzig Oct 15, 2025
6b6e410
rename format_as_bullet_points to format_list_items
nkaenzig Oct 15, 2025
8b1d69c
rename options to items
nkaenzig Oct 15, 2025
2ccf6f0
remove context from qa pairs
nkaenzig Oct 15, 2025
77494f2
fix tests
nkaenzig Oct 15, 2025
e75d6e4
remove file commited by accident
nkaenzig Oct 15, 2025
9747fce
use new line to join the example reason & answer
nkaenzig Oct 15, 2025
9 changes: 8 additions & 1 deletion docs/datasets/index.md
@@ -44,4 +44,11 @@

| Dataset | #Samples | Task | Domain | Download provided |
|---------|----------|------|---------|-------------------|
| [PubMedQA](pubmedqa.md) | 1,000 | Classification (3 classes) | Biomedical Q&A | Yes |
| [PubMedQA](pubmedqa.md) | 1,000 | Multiple Choice | Biomedical Q&A | Yes |


## Multimodal Datasets Overview
| Dataset | #Samples | Modality | Task | Domain | Download provided |
|---------|----------|----------|------|--------|-------------------|
| [PatchCamelyon](patch_camelyon.md) | 500 | Image + Text | Multiple Choice | Breast Cancer | Yes |
| [QuiltVQA](quilt_vqa.md) | 985 | Image + Text | Free-form VQA | Histopathology | Yes |
32 changes: 32 additions & 0 deletions docs/datasets/quilt_vqa.md
@@ -0,0 +1,32 @@
# Quilt_VQA

Quilt_VQA is a histopathology visual question answering dataset released with Quilt-LLaVA for evaluating multimodal models on realistic pathology questions. It pairs microscopy frames with naturally occurring questions and answers that were mined from expert-narrated videos and refined with GPT-4 plus manual review.

## Raw data

### Key stats
| Modality | Task | Domain | Sample Size | Question Format | License |
|----------|------|--------|-------------|-----------------|---------|
| Image + Text | Visual Question Answering (free-form) | Histopathology (medical) | 985 evaluation samples | Mix of closed-ended and open-ended questions with short textual answers | CC-BY-NC-ND-3.0 |

### Data organization
- Hugging Face exposes a single `default` configuration with 985 examples stored under a `train` split (eva treats this as the evaluation/test split).
- Each record provides an `image`, `question`, free-form `answer`, categorical `answer_type` (e.g., closed vs. open response), and a short textual `context` snippet from the source narration.
- The repository also packages the original Parquet export (`data/train-*.parquet`) alongside helper files (`quilt_vqa.zip`, `quiltvqa_test_w_ans.json`, `quiltvqa_test_wo_ans.jsonl`) that separate the open and closed subsets used by the Quilt benchmark.
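For orientation, a record with the fields listed above can be unpacked along these lines. The helper below is a hypothetical sketch (not part of eva), and the sample dict stands in for a real dataset row, whose `image` field would hold a PIL image:

```python
from typing import Any, Dict, Tuple

def to_qa_pair(record: Dict[str, Any]) -> Tuple[str, str, Dict[str, str]]:
    """Split a Quilt_VQA-style record into (question, answer, metadata).

    Field names mirror the Hugging Face schema described above; the image
    payload is not touched here.
    """
    metadata = {
        "answer_type": record.get("answer_type", ""),
        "context": record.get("context", ""),
    }
    return record["question"].strip(), record["answer"].strip(), metadata

# A stand-in row with the documented fields (a real row carries a PIL image).
sample = {
    "image": None,
    "question": "What type of tissue is shown? ",
    "answer": " Squamous epithelium",
    "answer_type": "open",
    "context": "narration snippet",
}
question, answer, meta = to_qa_pair(sample)
```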

## Download and preprocessing

Quilt_VQA is gated. Accept the terms on the [Hugging Face dataset page](https://huggingface.co/datasets/wisdomik/Quilt_VQA) and generate a user access token before triggering automated downloads.

Once access is granted, set `DOWNLOAD_DATA="true"` (and optionally `DATA_ROOT` for the cache directory) when launching eva with a configuration that references `QuiltVQA`. Provide your Hugging Face token via `HF_TOKEN` so the downloader can authenticate.
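A minimal environment setup matching the paragraph above might look like this; the variable names come from the text, while the token value is a placeholder you must replace with your own:

```shell
# Enable automated downloads for the gated QuiltVQA dataset.
export DOWNLOAD_DATA="true"

# Optional: cache directory for the downloaded data.
export DATA_ROOT="$HOME/.cache/eva"

# Hugging Face access token (replace with your own; never commit it).
export HF_TOKEN="hf_your_token_here"
```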

## Relevant links

- **Project**: [Quilt-LLaVA](https://quilt-llava.github.io/)
- **Dataset card (Hugging Face)**: https://huggingface.co/datasets/wisdomik/Quilt_VQA
- **Companion dataset**: [Quilt-1M](https://quilt1m.github.io/)
- **Paper**: [Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos](https://arxiv.org/abs/2312.04746)

## License information

Distributed under the [CC-BY-NC-ND 3.0](https://creativecommons.org/licenses/by-nc-nd/3.0/) license. Access is limited to non-commercial research use as outlined in the Hugging Face gated download agreement.
1 change: 1 addition & 0 deletions docs/index.md
@@ -72,6 +72,7 @@ Supported datasets & tasks include:
*Multimodal datasets*

- **[PatchCamelyon (image-text)](datasets/patch_camelyon.md)**: Vision-language benchmark variation for the popular vision Patch Camelyon task, where the goal is to classify breast cancer patches, using both the image and a text prompt.
- **[QuiltVQA (image-text)](datasets/quilt_vqa.md)**: Visual question answering for histopathology images.


To evaluate FMs, *eva* provides support for different model-formats, including models trained with PyTorch, models available on HuggingFace and ONNX-models. For other formats custom wrappers can be implemented.
2 changes: 1 addition & 1 deletion src/eva/language/data/datasets/__init__.py
@@ -1,7 +1,7 @@
"""Language Datasets API."""

from eva.language.data.datasets.base import LanguageDataset
from eva.language.data.datasets.classification import PubMedQA
from eva.language.data.datasets.multiple_choice import PubMedQA
from eva.language.data.datasets.prediction import TextPredictionDataset

__all__ = [
7 changes: 0 additions & 7 deletions src/eva/language/data/datasets/classification/__init__.py

This file was deleted.

7 changes: 7 additions & 0 deletions src/eva/language/data/datasets/multiple_choice/__init__.py
@@ -0,0 +1,7 @@
"""Multiple choice datasets API."""

from eva.language.data.datasets.multiple_choice.pubmedqa import PubMedQA

__all__ = [
"PubMedQA",
]
@@ -9,7 +9,7 @@
from loguru import logger
from typing_extensions import override

from eva.language.data.datasets.classification import base
from eva.language.data.datasets.multiple_choice import base
from eva.language.data.messages import MessageSeries, UserMessage
from eva.language.prompts import templates
from eva.language.prompts.templates.preambles import DEFAULT_QA_PREAMBLE
@@ -68,7 +68,6 @@ def __init__(

self.prompt_template = prompt_template or self._default_prompt_template
self.prompt_render_kwargs = prompt_render_kwargs or self._default_render_kwargs
prompt_render_kwargs = prompt_render_kwargs or self._default_render_kwargs

def _load_dataset(self, dataset_path: str | None) -> Dataset:
"""Loads the PubMedQA dataset from the local cache or downloads it.
3 changes: 2 additions & 1 deletion src/eva/language/prompts/templates/__init__.py
@@ -2,5 +2,6 @@

from eva.language.prompts.templates.base import PromptTemplate
from eva.language.prompts.templates.json import JsonMultipleChoicePromptTemplate
from eva.language.prompts.templates.raw.free_form import FreeFormQuestionPromptTemplate

__all__ = ["PromptTemplate", "JsonMultipleChoicePromptTemplate"]
__all__ = ["PromptTemplate", "JsonMultipleChoicePromptTemplate", "FreeFormQuestionPromptTemplate"]
2 changes: 1 addition & 1 deletion src/eva/language/prompts/templates/json/__init__.py
@@ -1,4 +1,4 @@
"""Prompt templating API."""
"""Prompt templates with json answer formats."""

from eva.language.prompts.templates.json.multiple_choice import JsonMultipleChoicePromptTemplate

26 changes: 7 additions & 19 deletions src/eva/language/prompts/templates/json/multiple_choice.py
@@ -10,6 +10,7 @@
from typing_extensions import override

from eva.language.prompts.templates import base
from eva.language.utils.text import format as format_utils


class JsonMultipleChoicePromptTemplate(base.PromptTemplate):
@@ -29,7 +30,7 @@ class JsonMultipleChoicePromptTemplate(base.PromptTemplate):
contains your answer, and "{{ reason_key }}" should contain a brief
explanation for why the provided answer was chosen.
{% if enable_cot -%}
Think step-by-step inside <think>...</think> tags before giving your final answer.
Think step-by-step before giving your final answer.
{%- endif -%}
{% if use_option_letters %}
The value for "{{ answer_key }}" must be the letter (e.g., "A", "B", "C", ...)
@@ -94,6 +95,7 @@ def render(
example_answer: str | None = None,
example_reason: str | None = None,
preamble: str | None = None,
enable_cot: bool | None = None,
) -> str:
"""Render the template with provided values.

@@ -104,6 +106,7 @@
example_answer: Optional example answer for the JSON snippet. Defaults to first option.
example_reason: Example reasoning string.
preamble: Optional preamble text to include at the top of the prompt.
enable_cot: Optionally override the instance's CoT setting for this render call.

Returns:
The rendered prompt string.
@@ -114,7 +117,7 @@
jinja_template = Template(self.template)
rendered = jinja_template.render(
question=question.strip(),
context=_format_context(context) if context else None,
context=format_utils.format_list_items(context) if context else None,
answer_options=_format_answer_options(
answer_options, use_option_letters=self.use_option_letters
),
@@ -128,10 +131,10 @@
example_reason=(example_reason or self._default_reason).strip(),
preamble=(preamble or "").strip(),
use_option_letters=self.use_option_letters,
enable_cot=self.enable_cot,
enable_cot=self.enable_cot if enable_cot is None else enable_cot,
)

return textwrap.dedent(rendered).strip() + "\n"
return format_utils.remove_multi_blank_lines(textwrap.dedent(rendered).strip()) + "\n"


def _format_answer_options(options: Sequence[str], use_option_letters: bool) -> str:
@@ -151,18 +154,3 @@ def _format_answer_options(options: Sequence[str], use_option_letters: bool) ->
return "\n".join(f"{letters[i]}. {opt.strip()}" for i, opt in enumerate(options))
else:
return "\n".join(f"- {opt.strip()}" for i, opt in enumerate(options))


def _format_context(context: str | Sequence[str]) -> str:
"""Formats the context for inclusion in the prompt.

Args:
context: The context string or list of context strings. If a list is provided,
the contexts will be formatted as a bullet point list.

Returns:
The formatted context string.
"""
if not isinstance(context, list):
context = [context] # type: ignore[assignment]
return "\n".join(f"- {item.strip()}" for item in context if item.strip())
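This diff replaces the local `_format_context` and `_format_answer_options` helpers with shared utilities from `eva.language.utils.text.format`. A self-contained sketch of what those utilities plausibly do, reconstructed only from the deleted code above (the real signatures in eva may differ):

```python
import re
import string
from typing import Sequence, Union

def format_list_items(items: Union[str, Sequence[str]], style: str = "bullets") -> str:
    """Render items as a bullet list or as letter-enumerated options (A., B., ...)."""
    if isinstance(items, str):
        items = [items]
    items = [item.strip() for item in items if item.strip()]
    if style == "letters":
        letters = string.ascii_uppercase
        if len(items) > len(letters):
            raise ValueError(f"If using option letters, max {len(letters)} options are supported.")
        return "\n".join(f"{letters[i]}. {item}" for i, item in enumerate(items))
    return "\n".join(f"- {item}" for item in items)

def remove_multi_blank_lines(text: str) -> str:
    """Collapse runs of two or more blank lines into a single blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)
```

For example, `format_list_items(["yes", "no"], style="letters")` yields `"A. yes\nB. no"`, matching the option formatting the deleted helper produced.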
3 changes: 2 additions & 1 deletion src/eva/language/prompts/templates/raw/__init__.py
@@ -1,5 +1,6 @@
"""Prompt templating API."""

from eva.language.prompts.templates.raw.free_form import FreeFormQuestionPromptTemplate
from eva.language.prompts.templates.raw.multiple_choice import RawMultipleChoicePromptTemplate

__all__ = ["RawMultipleChoicePromptTemplate"]
__all__ = ["RawMultipleChoicePromptTemplate", "FreeFormQuestionPromptTemplate"]
88 changes: 88 additions & 0 deletions src/eva/language/prompts/templates/raw/free_form.py
@@ -0,0 +1,88 @@
"""Prompt templates for free-form questions."""

from __future__ import annotations

import textwrap
from typing import Sequence

from jinja2 import Template
from typing_extensions import override

from eva.language.prompts.templates import base, typings
from eva.language.utils.text import format as format_utils


class FreeFormQuestionPromptTemplate(base.PromptTemplate):
"""Prompt template for free-form questions."""

template = textwrap.dedent(
"""\
{{ preamble }}

{% if examples %}
Below are some examples:

{% for ex in examples %}
Example {{ loop.index }}:
Question: {{ ex.question }}
Answer: {{ ex.answer }}
---
{% endfor %}
Now please answer the following question.
{%- if enable_cot %}
Think step-by-step before giving your final answer.
{% endif %}

{% endif %}
Question: {{ question }}
Answer:
"""
)
"""Base template to be rendered via Jinja2."""

def __init__(self, enable_cot: bool = False) -> None:
"""Initializes the prompt template.

Args:
enable_cot: Whether to explicitly prompt the model to use reasoning/CoT for answering.
"""
super().__init__()
self.enable_cot = enable_cot

@override
def render(
self,
*,
question: str,
context: str | Sequence[str] | None = None,
examples: Sequence[typings.QuestionAnswerExample] | None = None,
preamble: str | None = None,
enable_cot: bool | None = None,
) -> str:
"""Render the template with provided values.

Args:
question: The question to ask the model.
context: Supporting context text(s) for the question.
examples: A sequence of question & answer pairs to include as examples.
Expected format is a list of dicts with 'question', 'answer', and
optional 'context' keys.
preamble: Optional preamble text to include at the top of the prompt.
enable_cot: Optionally override the instance's CoT setting for this render call.

Returns:
The rendered prompt string.
"""
if not isinstance(question, str) or not question.strip():
raise ValueError("`question` must be a non-empty string.")

jinja_template = Template(self.template)
rendered = jinja_template.render(
question=question.strip(),
context=format_utils.format_list_items(context) if context else None,
examples=examples,
preamble=(preamble or "").strip(),
enable_cot=self.enable_cot if enable_cot is None else enable_cot,
)

return format_utils.remove_multi_blank_lines(textwrap.dedent(rendered).strip()) + "\n"
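Rendered with plain Jinja2, the few-shot section of the template above behaves roughly as follows; this is a trimmed-down, self-contained approximation, not eva's actual `FreeFormQuestionPromptTemplate`:

```python
import re
from jinja2 import Template

# A condensed version of the few-shot template shown in the diff.
template = Template(
    "{{ preamble }}\n\n"
    "{% if examples %}Below are some examples:\n\n"
    "{% for ex in examples %}Example {{ loop.index }}:\n"
    "Question: {{ ex.question }}\nAnswer: {{ ex.answer }}\n---\n{% endfor %}"
    "Now please answer the following question.\n\n{% endif %}"
    "Question: {{ question }}\nAnswer:"
)

rendered = template.render(
    preamble="You are a concise pathology assistant.",
    examples=[{"question": "Is this benign?", "answer": "yes"}],
    question="What stain is used?",
)
# Collapse the blank-line runs that Jinja2 control blocks can leave behind,
# mirroring the remove_multi_blank_lines call in the diff.
prompt = re.sub(r"\n{3,}", "\n\n", rendered).strip() + "\n"
```

Note that `{{ loop.index }}` is 1-based, which gives the "Example 1:", "Example 2:" enumeration the PR's commit messages mention.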
56 changes: 13 additions & 43 deletions src/eva/language/prompts/templates/raw/multiple_choice.py
@@ -1,5 +1,7 @@
"""Prompt templates for multiple choice questions without strict formatting requirements."""

# ruff: noqa: E501

from __future__ import annotations

import string
@@ -10,6 +12,7 @@
from typing_extensions import override

from eva.language.prompts.templates import base
from eva.language.utils.text import format as format_utils


class RawMultipleChoicePromptTemplate(base.PromptTemplate):
@@ -27,11 +30,10 @@ class RawMultipleChoicePromptTemplate(base.PromptTemplate):
Provide a brief explanation for your choice before stating your final answer.

{%- if enable_cot %}
Think step-by-step inside <think>...</think> tags before giving your answer.
{%- endif %}
Think step-by-step before giving your final answer.
{% endif %}

IMPORTANT: You must provide your reasoning first.
Then end your response with only your final choice
IMPORTANT: You must provide your reasoning first. Then end your response with only your final choice
{%- if use_option_letters %} letter
{%- else %} exactly as written below
{%- endif %}.
@@ -79,6 +81,7 @@ def render(
example_answer: str | None = None,
example_reason: str | None = None,
preamble: str | None = None,
enable_cot: bool | None = None,
) -> str:
"""Render the template with provided values.

@@ -89,15 +92,16 @@
example_answer: Optional example answer. Defaults to first option.
example_reason: Example reasoning string.
preamble: Optional preamble text to include at the top of the prompt.
enable_cot: Optionally override the instance's CoT setting for this render call.

Returns:
The rendered prompt string.
"""
if not isinstance(question, str) or not question.strip():
raise ValueError("`question` must be a non-empty string.")

answer_options = _format_answer_options(
answer_options, use_option_letters=self.use_option_letters
answer_options = format_utils.format_list_items(
answer_options, style="letters" if self.use_option_letters else "bullets"
)
example_answer = (
example_answer
@@ -109,46 +113,12 @@
jinja_template = Template(self.template)
rendered = jinja_template.render(
question=question.strip(),
context=_format_context(context) if context else None,
context=(format_utils.format_list_items(context, style="bullets") if context else None),
answer_options=answer_options,
preamble=(preamble or "").strip(),
use_option_letters=self.use_option_letters,
enable_cot=self.enable_cot,
enable_cot=self.enable_cot if enable_cot is None else enable_cot,
example_response="\n".join([example_reason, example_answer]),
)

return textwrap.dedent(rendered).strip() + "\n"


def _format_answer_options(options: Sequence[str], use_option_letters: bool) -> str:
"""Format answer options for inclusion in the prompt.

Args:
options: List of answer options.
use_option_letters: Whether to prefix options with letters (A, B, C, ...).
"""
if not options or not all(isinstance(opt, str) and opt.strip() for opt in options):
raise ValueError("`answer_options` must contain at least one non-empty option.")

if use_option_letters:
letters = string.ascii_uppercase
if len(options) > len(letters):
raise ValueError(f"If using option letters, max {len(letters)} options are supported.")
return "\n".join(f"{letters[i]}. {opt.strip()}" for i, opt in enumerate(options))
else:
return "\n".join(f"- {opt.strip()}" for i, opt in enumerate(options))


def _format_context(context: str | Sequence[str]) -> str:
"""Formats the context for inclusion in the prompt.

Args:
context: The context string or list of context strings. If a list is provided,
the contexts will be formatted as a bullet point list.

Returns:
The formatted context string.
"""
if not isinstance(context, list):
context = [context] # type: ignore[assignment]
return "\n".join(f"- {item.strip()}" for item in context if item.strip())
return format_utils.remove_multi_blank_lines(textwrap.dedent(rendered).strip()) + "\n"
10 changes: 10 additions & 0 deletions src/eva/language/prompts/templates/typings.py
@@ -0,0 +1,10 @@
"""Typings for prompt templates."""

from typing_extensions import TypedDict


class QuestionAnswerExample(TypedDict):
"""A question-answer example for few-shot prompting."""

question: str
answer: str
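`QuestionAnswerExample` is a plain `TypedDict`, so few-shot examples are ordinary dicts that type-checkers can validate. A quick sketch, using the stdlib `typing.TypedDict` rather than eva's `typing_extensions` import:

```python
from typing import TypedDict

class QuestionAnswerExample(TypedDict):
    """A question-answer example for few-shot prompting."""

    question: str
    answer: str

# At runtime a TypedDict instance is just a dict, so it can be passed
# anywhere a mapping with `question`/`answer` keys is expected.
example: QuestionAnswerExample = {"question": "Is this benign?", "answer": "yes"}
```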