This is the official repo for our paper "Generative Knowledge-Guided Review System for Construction Disclosure Documents" (Advanced Engineering Informatics, 2025).
- Example data files are in `./example/`.
- The pretrained weights can be acquired from google_drive.
You can train the extraction module with the following commands:

```bash
# Train with default parameters
python train_extract.py -i dataset.csv

# Custom output file and training parameters
python train_extract.py -i dataset.csv -o my_model.pth -e 100 -l 1e-5 -b 32

# Use a different BERT model
python train_extract.py -i dataset.csv --model_name bert-base-multilingual-cased
```

| Argument | Short | Default | Description |
|---|---|---|---|
| `--input` | `-i` | Required | Input CSV file path |
| `--output` | `-o` | `pretrain-model.pth` | Output model file path |
| `--epochs` | `-e` | `200` | Number of training epochs |
| `--learning_rate` | `-l` | `5e-6` | Learning rate |
| `--batch_size` | `-b` | `16` | Batch size |
| `--split_ratio` | `-s` | `0.9` | Train/validation split ratio |
| `--max_length` | `-m` | `512` | Maximum sequence length |
| `--weight_decay` | `-w` | `0.01` | Weight decay |
| `--warmup_steps` | | `0` | Number of warmup steps |
| `--print_interval` | | `20` | Interval (in steps) at which the F1 score is printed |
| `--model_name` | | `bert-base-chinese` | BERT model name |
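Training writes a `.pth` checkpoint to the `--output` path. As a minimal sketch of inspecting it before reuse (assuming it is a standard PyTorch state dict over the chosen BERT backbone; the actual model class lives in `train_extract.py` and may differ):

```python
import torch
from transformers import BertTokenizerFast

# Assumption: the .pth file stores a state dict compatible with the model
# defined in train_extract.py; inspect its keys before wiring it up.
state_dict = torch.load('pretrain-model.pth', map_location='cpu')
print(list(state_dict.keys())[:5])  # peek at the parameter names

# The tokenizer should match the --model_name used during training.
tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')
```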
The input CSV file should contain the following columns:
| Column | Description | Required | Example |
|---|---|---|---|
| `Query` | The input text/query to be processed | ✅ | "How to train a machine learning model?" |
| `max` | Highest-priority chunk/span to extract | ✅ | "machine learning model" |
| `mid` | Medium-priority chunk/span to extract | ❌ | "train" |
| `lit` | Low-priority chunk/span to extract | ❌ | "How to" |
An example CSV:

```csv
Query,max,mid,lit
"How to train a machine learning model?","machine learning model","train","How to"
"What is deep learning?","deep learning",,
"Explain neural networks","neural networks","Explain",
```
Retrieval inference example:

```python
from CDDRS import GKGR

source_knowledge_base = 'path_to_knowledge_base'
query = 'your_retrieval_query'

retrieval_result = GKGR(
    query,
    source_knowledge_base,
    topk=3,
    llm='deepseek-chat',
    api='your_own_deepseek_api',
    base_url='https://api.deepseek.com'
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | `str` | Required | The search query text |
| `source_knowledge_base` | `str` | Required | Path to the document directory |
| `topk` | `int` | `3` | Number of top results to return |
| `llm` | `str` | `'deepseek-chat'` | LLM model name (`'gpt-4o'`, `'deepseek-chat'`, etc.) |
| `api` | `str` | `'your-api-key'` | API key for the LLM service |
| `base_url` | `str` | `'https://api.deepseek.com/v1'` | API base URL |
| `embedding_model` | `str` | `'./models/bge-m3'` | Path to the embedding model |
| `bert_model_path` | `str` | `'pretrain_model.pth'` | Path to the BERT query-expansion model |
| `chunk_size` | `int` | `512` | Maximum document chunk size for processing |
| `retrieval_mode` | `str` | `'gkgr'` | Retrieval mode: `'vector'`, `'kg'`, or `'gkgr'` |
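For instance, a call that sets the documented optional parameters explicitly (all values are placeholders):

```python
# Same query/source_knowledge_base as above; values are placeholders.
retrieval_result = GKGR(
    query,
    source_knowledge_base,
    topk=5,
    llm='gpt-4o',
    api='your-api-key',
    base_url='https://api.openai.com/v1',
    embedding_model='./models/bge-m3',
    bert_model_path='pretrain_model.pth',
    chunk_size=512,
    retrieval_mode='vector',  # plain vector retrieval; 'gkgr' enables the full pipeline
)
```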
Additional parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `force_reinit` | `bool` | `False` | Force reinitialization of cached instances |
| `fusion_weights` | `List[float]` | `[0.6, 0.4]` | Weights for combining vector and KG retrieval |
| `expansion_weights` | `List[float]` | `[0.5, 0.3, 0.2]` | Weights for the original and expanded queries |
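For intuition about `fusion_weights`, here is a toy sketch of score-level fusion under the default weights. This illustrates the idea only, not the repository's actual implementation:

```python
from typing import Dict, Sequence

def fuse_scores(vector_scores: Dict[str, float],
                kg_scores: Dict[str, float],
                fusion_weights: Sequence[float] = (0.6, 0.4)) -> Dict[str, float]:
    """Weighted sum of per-document scores from the two retrievers."""
    w_vec, w_kg = fusion_weights
    docs = set(vector_scores) | set(kg_scores)
    return {d: w_vec * vector_scores.get(d, 0.0) + w_kg * kg_scores.get(d, 0.0)
            for d in docs}

# Toy scores for three documents; expansion_weights would analogously
# weight scores from the original query and its expanded variants.
fused = fuse_scores({'doc1': 0.9, 'doc2': 0.4}, {'doc2': 0.8, 'doc3': 0.5})
print(sorted(fused, key=fused.get, reverse=True)[:3])  # topk=3
```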
Evaluation example:

```python
from utils.test import retrieve_test, generate_test

annotated_files = 'retrieve_files_with_annotated'
metric = ['MRR', 'Acc']  # for generation evaluation, use 'F1'
test_results = retrieve_test(annotated_files, metric, mode='retrieve')
```
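As a reference for the retrieval metric, MRR averages the reciprocal rank of the first relevant hit across queries. A minimal sketch, independent of `retrieve_test`'s actual internals:

```python
from typing import List

def mean_reciprocal_rank(ranked_results: List[List[str]],
                         gold: List[str]) -> float:
    """MRR over queries: 1/rank of the first relevant document, else 0."""
    total = 0.0
    for ranking, relevant in zip(ranked_results, gold):
        for rank, doc in enumerate(ranking, start=1):
            if doc == relevant:
                total += 1.0 / rank
                break
    return total / len(gold)

# Relevant doc ranked 1st and 2nd -> MRR = (1 + 0.5) / 2 = 0.75
print(mean_reciprocal_rank([['a', 'b'], ['c', 'a']], ['a', 'a']))
```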
If you find this work useful, please cite:

```bibtex
@article{XIAO2025103618,
  title = {Generative knowledge-guided review system for construction disclosure documents},
  journal = {Advanced Engineering Informatics},
  volume = {68},
  pages = {103618},
  year = {2025},
  issn = {1474-0346},
  doi = {10.1016/j.aei.2025.103618},
  url = {https://www.sciencedirect.com/science/article/pii/S1474034625005117},
  author = {Hongru Xiao and Jiankun Zhuang and Bin Yang and Jiale Han and Yantao Yu and Songning Lai},
  keywords = {Construction documents review, Large language model (LLM), Knowledge-guided retrieval, Natural Language Processing (NLP)}
}
```