mbzuai-oryx/ARB

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

Sara Ghaboura*, Ketan More*, Wafa Alghallabi, Omkar Thawakar,
Jorma Laaksonen, Hisham Cholakkal, Salman Khan, Rao M. Anwer
*Equal Contribution



If you like our project, please give us a star ⭐ on GitHub for the latest update.



Latest Updates

🔥 [22 May 2025] ARB, the first Arabic multimodal benchmark focused on step-by-step reasoning, is released.
🤗 [22 May 2025] The ARB dataset is available on HuggingFace.



ARB Scope and Diversity

ARB is the first benchmark focused on step-by-step reasoning in Arabic across both textual and visual modalities, covering 11 diverse domains spanning science, culture, OCR, and historical interpretation.

Figure: ARB Dataset Coverage

🌟 Key Features

  • 1,356 multimodal samples, each with an image, Arabic question, and reasoning-based answer.
  • 5,119 curated reasoning steps reflecting human logic.
  • 11 diverse domains, from visual reasoning to historical and scientific analysis.
  • Verified by native Arabic speakers and domain experts.
  • Hybrid sources: original Arabic data, high-quality translations, and synthetic samples.
  • Robust evaluation framework for final answer accuracy and reasoning quality.
  • Fully open-source dataset and toolkit to support research in Arabic reasoning and multimodal AI.

πŸ—οΈ ARB Construction Pipeline

Figure: ARB Pipeline Overview


ARB Collection

Figure: ARB Collection


ARB Data Distribution over Domains

Figure: ARB Data Distribution

Source Types Across Domains

| Domain | English Bench | Arabic Bench | Human-Created | Synthetic |
|---|---|---|---|---|
| Visual Reasoning | ✅ | – | – | – |
| OCR & Document Analysis | – | – | ✅ | ✅ |
| Chart & Data Table (CDT) | ✅ | ✅ | ✅ | ✅ |
| Math & Logic | ✅ | – | – | – |
| Social & Cultural | ✅ | – | – | – |
| Computer Vision Perception | ✅ | – | – | – |
| Medical Image Analysis | ✅ | ✅ | – | – |
| Scientific Reasoning | ✅ | – | – | – |
| Agricultural Interpretation | ✅ | – | ✅ | ✅ |
| Remote Sensing Understanding | – | ✅ | – | – |
| Historical & Anthropological | ✅ | – | ✅ | ✅ |

Download

from datasets import load_dataset

# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("MBZUAI/ARB")

Evaluation Protocol

We evaluated 12 open- and closed-source LMMs using:

  • Lexical and Semantic Similarity Scores: BLEU, ROUGE, BERTScore.

  • Cross-lingual semantic alignment: LaBSE.

  • Custom Rubric (Arabic): Our curated metric rubric includes 10 factors such as faithfulness, interpretive depth, coherence, and hallucination.
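
As an illustration of the lexical-overlap metrics listed above, ROUGE-L can be sketched in a few lines of Python. This is a minimal sketch and not the ARB evaluation code: the function names `lcs_length` and `rouge_l_f1` are hypothetical, tokens are split on whitespace only, and no Arabic-specific normalization (diacritics, letter variants) is applied, which a real evaluation would likely need.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L F1 over whitespace tokens (assumption: no stemming or normalization)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Applied to a predicted reasoning chain and its reference, `rouge_l_f1` returns a value between 0 and 1. Surface-overlap scores of this kind do not capture paraphrase, which is one reason to pair them with embedding-based metrics such as BERTScore and LaBSE.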

LLM-as-Judge (Arabic prompt-based)

We evaluate models using:

  • Step-by-step reasoning quality (coherence, informativeness, commonsense)
  • Final answer accuracy
  • Agreement with human raters (Krippendorff's Alpha > 87%)


Stepwise Evaluation Results

For Closed-Source Models:

| Metric | GPT-4o | GPT-4o-mini | GPT-4.1 | o4-mini | Gemini 1.5 Pro | Gemini 2.0 Flash |
|---|---|---|---|---|---|---|
| Final Answer (%) | 60.22 | 52.22 | 59.43 | 58.93 | 56.7 | 57.8 |
| Reasoning Steps (%) | 64.29 | 61.02 | 80.41 | 80.75 | 64.34 | 64.09 |

For Open-Source Models:

| Metric | Qwen2.5-VL-7B | Llama-3.2-11B | AIN | Llama-4 Scout | Aya-Vision-8B | InternVL3-8B |
|---|---|---|---|---|---|---|
| Final Answer (%) | 37.02 | 25.58 | 27.35 | 48.52 | 28.81 | 31.04 |
| Reasoning Steps (%) | 64.03 | 53.2 | 52.77 | 77.7 | 63.64 | 54.5 |
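
One way to read the two tables above together is to look at the gap between the reasoning-step score and the final-answer score for each model. The snippet below is a small sketch of that arithmetic. The gap is a derived quantity computed here for illustration, not a metric reported by the benchmark, and the names `SCORES` and `reasoning_gap` are hypothetical.

```python
# Scores copied from the two tables above: (final answer %, reasoning steps %).
SCORES = {
    "GPT-4o": (60.22, 64.29),
    "GPT-4o-mini": (52.22, 61.02),
    "GPT-4.1": (59.43, 80.41),
    "o4-mini": (58.93, 80.75),
    "Gemini 1.5 Pro": (56.7, 64.34),
    "Gemini 2.0 Flash": (57.8, 64.09),
    "Qwen2.5-VL-7B": (37.02, 64.03),
    "Llama-3.2-11B": (25.58, 53.2),
    "AIN": (27.35, 52.77),
    "Llama-4 Scout": (48.52, 77.7),
    "Aya-Vision-8B": (28.81, 63.64),
    "InternVL3-8B": (31.04, 54.5),
}


def reasoning_gap(scores):
    """Reasoning-step score minus final-answer score per model, largest gap first."""
    gaps = {model: round(steps - final, 2) for model, (final, steps) in scores.items()}
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for model, gap in reasoning_gap(SCORES).items():
        print(f"{model}: {gap:.2f} points")
```

Every model in both tables scores higher on reasoning steps than on final answers, so all gaps are positive.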

📂 Dataset Structure

Each sample includes:

  • image: Visual input
  • question: Arabic reasoning prompt
  • choices: The choices for MCQ
  • steps: Ordered reasoning chain
  • answer: Final solution (Arabic)
  • domain: One of 11 categories (e.g., OCR, Scientific, Visual, Math)
  • curriculum: One of the 4 curricula followed by the prompt for steps generation (Computational, Sci/Med, Textual/Partial, and General)
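
The field list above can be turned into a small sanity check for loaded samples. This is a minimal sketch: the field names come from the list, but the value types, the example values, and the helper names `missing_fields` and `count_by_domain` are assumptions made for illustration and are not taken from the dataset itself.

```python
from collections import Counter

# Field names taken from the list above; value types are assumptions.
EXPECTED_FIELDS = (
    "image", "question", "choices", "steps", "answer", "domain", "curriculum",
)


def missing_fields(sample):
    """Return the expected field names that are absent from a sample dict."""
    return [name for name in EXPECTED_FIELDS if name not in sample]


def count_by_domain(samples):
    """Count samples per value of the `domain` field."""
    return Counter(sample["domain"] for sample in samples)


# Hypothetical sample for illustration only; not an actual ARB record.
example = {
    "image": None,  # placeholder for the visual input
    "question": "ما ناتج 2 + 3؟",
    "choices": ["4", "5", "6"],
    "steps": ["نجمع العددين 2 و 3.", "الناتج هو 5."],
    "answer": "5",
    "domain": "Math & Logic",
    "curriculum": "Computational",
}

if __name__ == "__main__":
    print(missing_fields(example))
    print(count_by_domain([example]))
```

Iterating over a split of a Hugging Face dataset yields plain dicts, so the same helpers can be applied to samples returned by `load_dataset`.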


Citation

If you use the ARB dataset in your research, please consider citing:

@misc{ghaboura2025arbcomprehensivearabicmultimodal,
      title={ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark}, 
      author={Sara Ghaboura and Ketan More and Wafa Alghallabi and Omkar Thawakar and Jorma Laaksonen and Hisham Cholakkal and Salman Khan and Rao Muhammad Anwer},
      year={2025},
      eprint={2505.17021},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.17021}, 
}
