This repository hosts the source code of the RACOON+ prototype system, which is introduced in the paper: "RACOON+: A System for LLM-based Table Understanding with a Knowledge Graph".
RACOON/
├── README.md
├── requirements.txt # Python dependencies
└── src/ # Source code directory
├── CTA/ # Column Type Annotation module
├── RE/ # Relation Extraction module
├── data/ # Datasets used and label mappings
├── KG_Linker.py/ # The KG-Linker component in RACOON+
├── pruning.py/ # The pruning module in RACOON+
└── utils.py # Shared utility functions for both the CTA and RE workloads
You can install the required packages with pip install -r requirements.txt. The experiments were run on Python 3.12.
The datasets used in the paper experiments are available on Google Drive:
Please download the datasets and place them in the src/data/ directory before running the experiments.
The RACOON+ system supports column type annotation and relation extraction using different context types for entity linking and different types of retrieved information to augment LLM prompts. Run the CTA or RE module with the following commands:
# Column Type Annotation
cd src/CTA
python RACOON_CTA.py [OPTIONS]
# Relation Extraction
cd src/RE
python RACOON_RE.py [OPTIONS]-
--model: OpenAI model to use (default:gpt-4o-mini)- Options: Any valid OpenAI model name
- Example:
--model gpt-4o
-
--context: Context type for knowledge graph linking (default:col)- Options:
wikiAPI,cell,table,col,hybrid wikiAPI: Uses Wikipedia API for entity linkingcell: Cell context for entity linkingtable: Table context for entity linkingcol: Column context for entity linkinghybrid: Hybrid context for entity linking
- Options:
-
--info: Information type to extract from knowledge graph (default:type)- Options:
entity,des,type,relation entity: Extract entity labels from Wikidatades: Extract entity labels and descriptions from Wikidatatype: Extract entity types from Wikidatarelation: Extract entity relations between cell pairs from Wikidata
- Options:
# Use GPT-4o with hybrid context and type information
python RACOON_CTA.py --model gpt-4o --context hybrid --info type
# Use cell-level context with entity information
python RACOON_CTA.py --context cell --info entityBoth programs generate CSV files with the format: {model}/RACOON_{context}_{info}.csv
CTA Output - Each row contains:
- Table ID
- Column index
- Predicted type
- Wikidata hint used
RE Output - Each row contains:
- Table ID
- Column pair index
- Predicted relation
- Wikidata hint used
Make sure to set the following environment variables:
OPENAI_API_KEY: Your OpenAI API keyDATA_DIR: Path to the data directory (defaults to../datarelative to the script)
To evaluate the results of CTA or RE experiments, use the evaluation scripts:
Column Type Annotation (CTA):
cd src/CTA
python CTA_eval.py <path_to_results_csv>Relation Extraction (RE):
cd src/RE
python RE_eval.py <path_to_results_csv>Output: The evaluation scripts compute and print the micro-averaged F1 score, which is the main metric used in the paper.