ChemTable is a large-scale benchmark designed to test the capabilities of multimodal large language models (MLLMs) in understanding real-world chemical tables—one of the most information-dense and visually complex formats in scientific literature.
📘 Built from over 1,300 tables from high-impact chemistry journals, ChemTable combines visual, textual, symbolic, and domain-specific information to push the boundaries of scientific AI.
- Multimodal Benchmark: Combines symbolic chemical formulas, table structures, visual molecule diagrams, and scientific text.
- Two Core Tasks
  - Table Recognition: Detect structure, extract content, and identify molecules.
  - Table Understanding: Answer descriptive and reasoning-based questions from tables.
- Challenging QA Dataset: Includes 9,000+ questions (descriptive + reasoning), curated with a mix of human annotation and LLM-assisted synthesis.
- Table Types: Reaction optimization, substrate screening, property comparison, molecular structure tables, and more.
- Visual Annotations: Bounding boxes, styles (bold/color), molecule diagrams.
- Logical Annotations: Row/column positions, cell values, chemical metadata.
| Subtask | Description | Metric |
|---|---|---|
| Value Retrieval | Locate exact content at given (row, column) | Accuracy |
| Position Retrieval | Infer position from given content | Accuracy |
| Molecular Recognition | Identify SMILES from embedded diagrams | Tanimoto |
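To make the subtasks and metrics in the table concrete, here is a minimal sketch in plain Python. It is not the repository's evaluation code: the toy table, the helper names (`value_at`, `position_of`, `accuracy`, `tanimoto`), and the choice to represent a molecular fingerprint as a set of on-bit indices are all assumptions made for illustration. The scripts in `eval/` may normalize answers differently and presumably derive fingerprints from SMILES with a cheminformatics toolkit.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical toy table (made-up values), indexed as table[row][column].
TABLE: List[List[str]] = [
    ["Entry", "Solvent", "Yield (%)"],
    ["1", "THF", "72"],
    ["2", "DMF", "85"],
]


def value_at(table: List[List[str]], row: int, col: int) -> str:
    """Value Retrieval: return the content at a given (row, column)."""
    return table[row][col]


def position_of(table: List[List[str]], content: str) -> Optional[Tuple[int, int]]:
    """Position Retrieval: return the first (row, column) holding `content`."""
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            if cell == content:
                return (r, c)
    return None


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Exact-match accuracy after trimming surrounding whitespace (assumed rule)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| over sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty fingerprints count as identical
    return len(a & b) / len(union)


if __name__ == "__main__":
    print(value_at(TABLE, 2, 1))                   # DMF
    print(position_of(TABLE, "85"))                # (2, 2)
    print(accuracy(["72", "85 "], ["72", "80"]))   # 0.5
    print(tanimoto({1, 2, 3}, {2, 3, 4}))          # 0.5
```

A Tanimoto score of 1.0 means the two fingerprints share every on-bit, so the metric gives partial credit to predicted structures that are close to the reference, which exact-match accuracy would not.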
A copy of the dataset is included directly in this repository for convenience.
Please refer to the `data/` directory for access and usage.
The `eval/` directory contains evaluation scripts organized by task:
- `smiles_eval.py`: Molecular structure recognition evaluation
- `TR_eval.py`: General table structure recognition evaluation
- `benzene_ring_eval.py`: Specific evaluation for benzene ring detection
- `evaluate_table_qa.py`: General QA evaluation
- `visual_reasoning_eval.py`: Visual reasoning capability evaluation
- `logical_reasoning_trend_eval.py`: Logical reasoning evaluation
- `multihop_reference_eval.py`: Multi-hop reasoning evaluation
Each script can be run independently and includes its own command-line arguments for customization. Check the script headers for specific usage instructions.
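As a convenience for checking the script headers, the sketch below prints the leading documentation of every script in `eval/`. It assumes each header is either a module docstring or a block of leading `#` comments. The README does not say which form the scripts use, and `script_header` is a hypothetical helper, not part of the repository.

```python
import ast
from pathlib import Path


def script_header(path) -> str:
    """Return a script's module docstring, or else its leading '#' comment block."""
    source = Path(path).read_text(encoding="utf-8")
    doc = ast.get_docstring(ast.parse(source))
    if doc:
        return doc
    lines = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#!"):
            continue  # skip a shebang line
        if stripped.startswith("#"):
            lines.append(stripped.lstrip("#").strip())
        elif stripped:
            break  # first line of real code ends the header
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    for script in sorted(Path("eval").glob("*.py")):
        print(f"== {script.name} ==")
        print(script_header(script) or "(no header found)")
```

If a script builds its arguments with `argparse`, running it with `--help` would list them as well, but that depends on how each script is written.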