Merged
Changes from all commits
Commits
35 commits
f7dfe35
add class
nkaenzig Oct 7, 2025
b0936b6
minor update to prompt text
nkaenzig Oct 7, 2025
6727011
updated PubMedQA dataset class to use new JSON answer template & adde…
nkaenzig Oct 7, 2025
b5a1a26
enable json prompts for pcam
nkaenzig Oct 8, 2025
b40888f
add unit tests for JsonMultipleChoicePromptTemplate
nkaenzig Oct 8, 2025
8ecd267
update unit tests for JsonMultipleChoicePromptTemplate
nkaenzig Oct 8, 2025
a2297dd
add unit tests for ExtractAnswerFromJson
nkaenzig Oct 8, 2025
031c2ff
remove num_samples
nkaenzig Oct 8, 2025
3434f39
implement missing_limit
nkaenzig Oct 8, 2025
68d0864
Merge remote-tracking branch 'origin/main' into 904-add-json-answer-t…
nkaenzig Oct 8, 2025
d05a349
fix unit tests
nkaenzig Oct 8, 2025
cfd380a
make prompt template configurable & provide default template
nkaenzig Oct 8, 2025
383d169
fix single word
nkaenzig Oct 9, 2025
b7c4511
add FreeFormQuestionPromptTemplate
nkaenzig Oct 10, 2025
c666685
resolved merge conflicts
nkaenzig Oct 10, 2025
baa14f9
delete dataset again
nkaenzig Oct 10, 2025
846f80d
move pubmedqa to multiple_choice folder
nkaenzig Oct 10, 2025
d4d4a31
add QuiltVQA dataset class
nkaenzig Oct 10, 2025
49a1c9f
renamed to test split
nkaenzig Oct 10, 2025
c725786
add unit tests & docs
nkaenzig Oct 10, 2025
a239a99
move FreeFormQuestionPromptTemplate to raw subfolder
nkaenzig Oct 10, 2025
740f5c0
add example enumeration and remove multiple blank lines
nkaenzig Oct 10, 2025
6ae15ed
mrege main
nkaenzig Oct 13, 2025
8fe601e
enable cot & moved bullet point formatting function into utils
nkaenzig Oct 13, 2025
1f88c07
add enable_cot option to RawMultipleChoicePromptTemplate
nkaenzig Oct 13, 2025
908fb9f
add enable_cot to render function
nkaenzig Oct 13, 2025
d611d0d
fix tests
nkaenzig Oct 13, 2025
a2ac76c
fixed pyright issues
nkaenzig Oct 13, 2025
4e89f1d
remove <think> tokens from prompts
nkaenzig Oct 15, 2025
6b6e410
rename format_as_bullet_points to format_list_items
nkaenzig Oct 15, 2025
8b1d69c
rename options to items
nkaenzig Oct 15, 2025
2ccf6f0
remove context from qa pairs
nkaenzig Oct 15, 2025
77494f2
fix tests
nkaenzig Oct 15, 2025
e75d6e4
remove file commited by accident
nkaenzig Oct 15, 2025
9747fce
use new line to join the example reason & answer
nkaenzig Oct 15, 2025
9 changes: 8 additions & 1 deletion docs/datasets/index.md
@@ -44,4 +44,11 @@

| Dataset | #Samples | Task | Domain | Download provided |
|---------|----------|------|---------|-------------------|
| [PubMedQA](pubmedqa.md) | 1,000 | Classification (3 classes) | Biomedical Q&A | Yes |
| [PubMedQA](pubmedqa.md) | 1,000 | Multiple Choice | Biomedical Q&A | Yes |


## Multimodal Datasets Overview
| Dataset | #Samples | Modality | Task | Domain | Download provided |
|---------|----------|----------|------|--------|-------------------|
| [PatchCamelyon](patch_camelyon.md) | 500 | Image + Text | Multiple Choice | Breast Cancer | Yes |
| [QuiltVQA](quilt_vqa.md) | 985 | Image + Text | Free-form VQA | Histopathology | Yes |
32 changes: 32 additions & 0 deletions docs/datasets/quilt_vqa.md
@@ -0,0 +1,32 @@
# Quilt_VQA

Quilt_VQA is a histopathology visual question answering dataset released with Quilt-LLaVA for evaluating multimodal models on realistic pathology questions. It pairs microscopy frames with naturally occurring questions and answers that were mined from expert-narrated videos and refined with GPT-4 plus manual review.

## Raw data

### Key stats
| Modality | Task | Domain | Sample Size | Question Format | License |
|----------|------|--------|-------------|-----------------|---------|
| Image + Text | Visual Question Answering (free-form) | Histopathology (medical) | 985 evaluation samples | Mix of closed-ended and open-ended questions with short textual answers | CC-BY-NC-ND-3.0 |

### Data organization
- Hugging Face exposes a single `default` configuration with 985 examples stored under a `train` split (eva treats this as the evaluation/test split).
- Each record provides an `image`, `question`, free-form `answer`, categorical `answer_type` (e.g., closed vs. open response), and a short textual `context` snippet from the source narration.
- The repository also packages the original Parquet export (`data/train-*.parquet`) alongside helper files (`quilt_vqa.zip`, `quiltvqa_test_w_ans.json`, `quiltvqa_test_wo_ans.jsonl`) that separate the open and closed subsets used by the Quilt benchmark.
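For orientation, a record with the fields listed above can be unpacked along these lines. The helper below is a hypothetical sketch (not part of eva), and the sample dict stands in for a real dataset row, whose `image` field would hold a PIL image:

```python
from typing import Any, Dict, Tuple

def to_qa_pair(record: Dict[str, Any]) -> Tuple[str, str, Dict[str, str]]:
    """Split a Quilt_VQA-style record into (question, answer, metadata).

    Field names mirror the Hugging Face schema described above; the image
    payload is not touched here.
    """
    metadata = {
        "answer_type": record.get("answer_type", ""),
        "context": record.get("context", ""),
    }
    return record["question"].strip(), record["answer"].strip(), metadata

# A stand-in row with the documented fields (a real row carries a PIL image).
sample = {
    "image": None,
    "question": "What type of tissue is shown? ",
    "answer": " Squamous epithelium",
    "answer_type": "open",
    "context": "narration snippet",
}
question, answer, meta = to_qa_pair(sample)
```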

## Download and preprocessing

Quilt_VQA is gated. Accept the terms on the [Hugging Face dataset page](https://huggingface.co/datasets/wisdomik/Quilt_VQA) and generate a user access token before triggering automated downloads.

Once access is granted, set `DOWNLOAD_DATA="true"` (and optionally `DATA_ROOT` for the cache directory) when launching eva with a configuration that references `QuiltVQA`. Provide your Hugging Face token via `HF_TOKEN` so the downloader can authenticate.
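A minimal environment setup matching the paragraph above might look like this; the variable names come from the text, while the token value is a placeholder you must replace with your own:

```shell
# Enable automated downloads for the gated QuiltVQA dataset.
export DOWNLOAD_DATA="true"

# Optional: cache directory for the downloaded data.
export DATA_ROOT="$HOME/.cache/eva"

# Hugging Face access token (replace with your own; never commit it).
export HF_TOKEN="hf_your_token_here"
```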

## Relevant links

- **Project**: [Quilt-LLaVA](https://quilt-llava.github.io/)
- **Dataset card (Hugging Face)**: https://huggingface.co/datasets/wisdomik/Quilt_VQA
- **Companion dataset**: [Quilt-1M](https://quilt1m.github.io/)
- **Paper**: [Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos](https://arxiv.org/abs/2312.04746)

## License information

Distributed under the [CC-BY-NC-ND 3.0](https://creativecommons.org/licenses/by-nc-nd/3.0/) license. Access is limited to non-commercial research use as outlined in the Hugging Face gated download agreement.
1 change: 1 addition & 0 deletions docs/index.md
@@ -72,6 +72,7 @@ Supported datasets & tasks include:
*Multimodal datasets*

- **[PatchCamelyon (image-text)](datasets/patch_camelyon.md)**: Vision-language benchmark variation for the popular vision Patch Camelyon task, where the goal is to classify breast cancer patches, using both the image and a text prompt.
- **[QuiltVQA (image-text)](datasets/quilt_vqa.md)**: Visual question answering for histopathology images.


To evaluate FMs, *eva* provides support for different model-formats, including models trained with PyTorch, models available on HuggingFace and ONNX-models. For other formats custom wrappers can be implemented.
2 changes: 1 addition & 1 deletion src/eva/language/data/datasets/__init__.py
@@ -1,7 +1,7 @@
"""Language Datasets API."""

from eva.language.data.datasets.base import LanguageDataset
from eva.language.data.datasets.classification import PubMedQA
from eva.language.data.datasets.multiple_choice import PubMedQA
from eva.language.data.datasets.prediction import TextPredictionDataset

__all__ = [
7 changes: 0 additions & 7 deletions src/eva/language/data/datasets/classification/__init__.py

This file was deleted.

7 changes: 7 additions & 0 deletions src/eva/language/data/datasets/multiple_choice/__init__.py
@@ -0,0 +1,7 @@
"""Multiple choice datasets API."""

from eva.language.data.datasets.multiple_choice.pubmedqa import PubMedQA

__all__ = [
"PubMedQA",
]
@@ -9,7 +9,7 @@
from loguru import logger
from typing_extensions import override

from eva.language.data.datasets.classification import base
from eva.language.data.datasets.multiple_choice import base
from eva.language.data.messages import MessageSeries, UserMessage
from eva.language.prompts import templates
from eva.language.prompts.templates.preambles import DEFAULT_QA_PREAMBLE
@@ -68,7 +68,6 @@ def __init__(

self.prompt_template = prompt_template or self._default_prompt_template
self.prompt_render_kwargs = prompt_render_kwargs or self._default_render_kwargs
prompt_render_kwargs = prompt_render_kwargs or self._default_render_kwargs

def _load_dataset(self, dataset_path: str | None) -> Dataset:
"""Loads the PubMedQA dataset from the local cache or downloads it.
3 changes: 2 additions & 1 deletion src/eva/language/prompts/templates/__init__.py
@@ -2,5 +2,6 @@

from eva.language.prompts.templates.base import PromptTemplate
from eva.language.prompts.templates.json import JsonMultipleChoicePromptTemplate
from eva.language.prompts.templates.raw.free_form import FreeFormQuestionPromptTemplate

__all__ = ["PromptTemplate", "JsonMultipleChoicePromptTemplate"]
__all__ = ["PromptTemplate", "JsonMultipleChoicePromptTemplate", "FreeFormQuestionPromptTemplate"]
2 changes: 1 addition & 1 deletion src/eva/language/prompts/templates/json/__init__.py
@@ -1,4 +1,4 @@
"""Prompt templating API."""
"""Prompt templates with json answer formats."""

from eva.language.prompts.templates.json.multiple_choice import JsonMultipleChoicePromptTemplate

26 changes: 7 additions & 19 deletions src/eva/language/prompts/templates/json/multiple_choice.py
@@ -10,6 +10,7 @@
from typing_extensions import override

from eva.language.prompts.templates import base
from eva.language.utils.text import format as format_utils


class JsonMultipleChoicePromptTemplate(base.PromptTemplate):
@@ -29,7 +30,7 @@ class JsonMultipleChoicePromptTemplate(base.PromptTemplate):
contains your answer, and "{{ reason_key }}" should contain a brief
explanation for why the provided answer was chosen.
{% if enable_cot -%}
Think step-by-step inside <think>...</think> tags before giving your final answer.
Think step-by-step before giving your final answer.
{%- endif -%}
{% if use_option_letters %}
The value for "{{ answer_key }}" must be the letter (e.g., "A", "B", "C", ...)
@@ -94,6 +95,7 @@ def render(
example_answer: str | None = None,
example_reason: str | None = None,
preamble: str | None = None,
enable_cot: bool | None = None,
) -> str:
"""Render the template with provided values.

@@ -104,6 +106,7 @@
example_answer: Optional example answer for the JSON snippet. Defaults to first option.
example_reason: Example reasoning string.
preamble: Optional preamble text to include at the top of the prompt.
enable_cot: Optionally override the instance's CoT setting for this render call.

Returns:
The rendered prompt string.
@@ -114,7 +117,7 @@
jinja_template = Template(self.template)
rendered = jinja_template.render(
question=question.strip(),
context=_format_context(context) if context else None,
context=format_utils.format_list_items(context) if context else None,
answer_options=_format_answer_options(
answer_options, use_option_letters=self.use_option_letters
),
@@ -128,10 +131,10 @@
example_reason=(example_reason or self._default_reason).strip(),
preamble=(preamble or "").strip(),
use_option_letters=self.use_option_letters,
enable_cot=self.enable_cot,
enable_cot=self.enable_cot if enable_cot is None else enable_cot,
)

return textwrap.dedent(rendered).strip() + "\n"
return format_utils.remove_multi_blank_lines(textwrap.dedent(rendered).strip()) + "\n"


def _format_answer_options(options: Sequence[str], use_option_letters: bool) -> str:
@@ -151,18 +154,3 @@ def _format_answer_options(options: Sequence[str], use_option_letters: bool) ->
return "\n".join(f"{letters[i]}. {opt.strip()}" for i, opt in enumerate(options))
else:
return "\n".join(f"- {opt.strip()}" for i, opt in enumerate(options))


def _format_context(context: str | Sequence[str]) -> str:
"""Formats the context for inclusion in the prompt.

Args:
context: The context string or list of context strings. If a list is provided,
the contexts will be formatted as a bullet point list.

Returns:
The formatted context string.
"""
if not isinstance(context, list):
context = [context] # type: ignore[assignment]
return "\n".join(f"- {item.strip()}" for item in context if item.strip())
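This diff replaces the local `_format_context` and `_format_answer_options` helpers with shared utilities from `eva.language.utils.text.format`. A self-contained sketch of what those utilities plausibly do, reconstructed only from the deleted code above (the real signatures in eva may differ):

```python
import re
import string
from typing import Sequence, Union

def format_list_items(items: Union[str, Sequence[str]], style: str = "bullets") -> str:
    """Render items as a bullet list or as letter-enumerated options (A., B., ...)."""
    if isinstance(items, str):
        items = [items]
    items = [item.strip() for item in items if item.strip()]
    if style == "letters":
        letters = string.ascii_uppercase
        if len(items) > len(letters):
            raise ValueError(f"If using option letters, max {len(letters)} options are supported.")
        return "\n".join(f"{letters[i]}. {item}" for i, item in enumerate(items))
    return "\n".join(f"- {item}" for item in items)

def remove_multi_blank_lines(text: str) -> str:
    """Collapse runs of two or more blank lines into a single blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)
```

For example, `format_list_items(["yes", "no"], style="letters")` yields `"A. yes\nB. no"`, matching the option formatting the deleted helper produced.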
3 changes: 2 additions & 1 deletion src/eva/language/prompts/templates/raw/__init__.py
@@ -1,5 +1,6 @@
"""Prompt templating API."""

from eva.language.prompts.templates.raw.free_form import FreeFormQuestionPromptTemplate
from eva.language.prompts.templates.raw.multiple_choice import RawMultipleChoicePromptTemplate

__all__ = ["RawMultipleChoicePromptTemplate"]
__all__ = ["RawMultipleChoicePromptTemplate", "FreeFormQuestionPromptTemplate"]
88 changes: 88 additions & 0 deletions src/eva/language/prompts/templates/raw/free_form.py
@@ -0,0 +1,88 @@
"""Prompt templates for free-form questions."""

from __future__ import annotations

import textwrap
from typing import Sequence

from jinja2 import Template
from typing_extensions import override

from eva.language.prompts.templates import base, typings
from eva.language.utils.text import format as format_utils


class FreeFormQuestionPromptTemplate(base.PromptTemplate):
"""Prompt template for free-form questions."""

template = textwrap.dedent(
"""\
{{ preamble }}

{% if examples %}
Below are some examples:

{% for ex in examples %}
Example {{ loop.index }}:
Question: {{ ex.question }}
Answer: {{ ex.answer }}
---
{% endfor %}
Now please answer the following question.
{%- if enable_cot %}
Think step-by-step before giving your final answer.
{% endif %}

{% endif %}
Question: {{ question }}
Answer:
"""
)
"""Base template to be rendered via Jinja2."""

def __init__(self, enable_cot: bool = False) -> None:
"""Initializes the prompt template.

Args:
enable_cot: Whether to explicitly prompt the model to use reasoning/CoT for answering.
"""
super().__init__()
self.enable_cot = enable_cot

@override
def render(
self,
*,
question: str,
context: str | Sequence[str] | None = None,
examples: Sequence[typings.QuestionAnswerExample] | None = None,
preamble: str | None = None,
enable_cot: bool | None = None,
) -> str:
"""Render the template with provided values.

Args:
question: The question to ask the model.
context: Supporting context text(s) for the question.
examples: A sequence of question & answer pairs to include as examples.
Expected format is a list of dicts with 'question', 'answer', and
optional 'context' keys.
preamble: Optional preamble text to include at the top of the prompt.
enable_cot: Optionally override the instance's CoT setting for this render call.

Returns:
The rendered prompt string.
"""
if not isinstance(question, str) or not question.strip():
raise ValueError("`question` must be a non-empty string.")

jinja_template = Template(self.template)
rendered = jinja_template.render(
question=question.strip(),
context=format_utils.format_list_items(context) if context else None,
examples=examples,
preamble=(preamble or "").strip(),
enable_cot=self.enable_cot if enable_cot is None else enable_cot,
)

return format_utils.remove_multi_blank_lines(textwrap.dedent(rendered).strip()) + "\n"
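Rendered with plain Jinja2, the few-shot section of the template above behaves roughly as follows; this is a trimmed-down, self-contained approximation, not eva's actual `FreeFormQuestionPromptTemplate`:

```python
import re
from jinja2 import Template

# A condensed version of the few-shot template shown in the diff.
template = Template(
    "{{ preamble }}\n\n"
    "{% if examples %}Below are some examples:\n\n"
    "{% for ex in examples %}Example {{ loop.index }}:\n"
    "Question: {{ ex.question }}\nAnswer: {{ ex.answer }}\n---\n{% endfor %}"
    "Now please answer the following question.\n\n{% endif %}"
    "Question: {{ question }}\nAnswer:"
)

rendered = template.render(
    preamble="You are a concise pathology assistant.",
    examples=[{"question": "Is this benign?", "answer": "yes"}],
    question="What stain is used?",
)
# Collapse the blank-line runs that Jinja2 control blocks can leave behind,
# mirroring the remove_multi_blank_lines call in the diff.
prompt = re.sub(r"\n{3,}", "\n\n", rendered).strip() + "\n"
```

Note that `{{ loop.index }}` is 1-based, which gives the "Example 1:", "Example 2:" enumeration the PR's commit messages mention.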
56 changes: 13 additions & 43 deletions src/eva/language/prompts/templates/raw/multiple_choice.py
@@ -1,5 +1,7 @@
"""Prompt templates for multiple choice questions without strict formatting requirements."""

# ruff: noqa: E501

from __future__ import annotations

import string
@@ -10,6 +12,7 @@
from typing_extensions import override

from eva.language.prompts.templates import base
from eva.language.utils.text import format as format_utils


class RawMultipleChoicePromptTemplate(base.PromptTemplate):
@@ -27,11 +30,10 @@ class RawMultipleChoicePromptTemplate(base.PromptTemplate):
Provide a brief explanation for your choice before stating your final answer.

{%- if enable_cot %}
Think step-by-step inside <think>...</think> tags before giving your answer.
{%- endif %}
Think step-by-step before giving your final answer.
{% endif %}

IMPORTANT: You must provide your reasoning first.
Then end your response with only your final choice
IMPORTANT: You must provide your reasoning first. Then end your response with only your final choice
{%- if use_option_letters %} letter
{%- else %} exactly as written below
{%- endif %}.
@@ -79,6 +81,7 @@ def render(
example_answer: str | None = None,
example_reason: str | None = None,
preamble: str | None = None,
enable_cot: bool | None = None,
) -> str:
"""Render the template with provided values.

@@ -89,15 +92,16 @@
example_answer: Optional example answer. Defaults to first option.
example_reason: Example reasoning string.
preamble: Optional preamble text to include at the top of the prompt.
enable_cot: Optionally override the instance's CoT setting for this render call.

Returns:
The rendered prompt string.
"""
if not isinstance(question, str) or not question.strip():
raise ValueError("`question` must be a non-empty string.")

answer_options = _format_answer_options(
answer_options, use_option_letters=self.use_option_letters
answer_options = format_utils.format_list_items(
answer_options, style="letters" if self.use_option_letters else "bullets"
)
example_answer = (
example_answer
@@ -109,46 +113,12 @@
jinja_template = Template(self.template)
rendered = jinja_template.render(
question=question.strip(),
context=_format_context(context) if context else None,
context=(format_utils.format_list_items(context, style="bullets") if context else None),
answer_options=answer_options,
preamble=(preamble or "").strip(),
use_option_letters=self.use_option_letters,
enable_cot=self.enable_cot,
enable_cot=self.enable_cot if enable_cot is None else enable_cot,
example_response="\n".join([example_reason, example_answer]),
)

return textwrap.dedent(rendered).strip() + "\n"


def _format_answer_options(options: Sequence[str], use_option_letters: bool) -> str:
"""Format answer options for inclusion in the prompt.

Args:
options: List of answer options.
use_option_letters: Whether to prefix options with letters (A, B, C, ...).
"""
if not options or not all(isinstance(opt, str) and opt.strip() for opt in options):
raise ValueError("`answer_options` must contain at least one non-empty option.")

if use_option_letters:
letters = string.ascii_uppercase
if len(options) > len(letters):
raise ValueError(f"If using option letters, max {len(letters)} options are supported.")
return "\n".join(f"{letters[i]}. {opt.strip()}" for i, opt in enumerate(options))
else:
return "\n".join(f"- {opt.strip()}" for i, opt in enumerate(options))


def _format_context(context: str | Sequence[str]) -> str:
"""Formats the context for inclusion in the prompt.

Args:
context: The context string or list of context strings. If a list is provided,
the contexts will be formatted as a bullet point list.

Returns:
The formatted context string.
"""
if not isinstance(context, list):
context = [context] # type: ignore[assignment]
return "\n".join(f"- {item.strip()}" for item in context if item.strip())
return format_utils.remove_multi_blank_lines(textwrap.dedent(rendered).strip()) + "\n"
10 changes: 10 additions & 0 deletions src/eva/language/prompts/templates/typings.py
@@ -0,0 +1,10 @@
"""Typings for prompt templates."""

from typing_extensions import TypedDict


class QuestionAnswerExample(TypedDict):
"""A question-answer example for few-shot prompting."""

question: str
answer: str
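`QuestionAnswerExample` is a plain `TypedDict`, so few-shot examples are ordinary dicts that type-checkers can validate. A quick sketch, using the stdlib `typing.TypedDict` rather than eva's `typing_extensions` import:

```python
from typing import TypedDict

class QuestionAnswerExample(TypedDict):
    """A question-answer example for few-shot prompting."""

    question: str
    answer: str

# At runtime a TypedDict instance is just a dict, so it can be passed
# anywhere a mapping with `question`/`answer` keys is expected.
example: QuestionAnswerExample = {"question": "Is this benign?", "answer": "yes"}
```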