KI-QFS

This repository contains the code and the KI-QFS dataset described in the paper "Tackling Query-Focused Summarization as A Knowledge-Intensive Task: A Pilot Study". The paper was accepted at the GenIR@SIGIR 2023 workshop.

Updates

data description
relevance annotation
code

Data Description

The dataset is built on the DUC 2005-2007 datasets from NIST. Please request access to the DUC data from NIST before using our dataset. The dataset is located in dataset/. Please refer to the paper for more details.

Data Structure

We repurpose the DUC datasets for a knowledge-intensive task, splitting them into input-output pairs and a knowledge corpus.
We further divide the pairs into train, validation, and test splits, stored as kiqfs_pairs_train/val/test.jsonl. Each line of a *.jsonl file has the following format:

{
    'id': 'D301I', # original id of each cluster on the DUC Datasets
    'query': 'Nobel prizes are awarded each year for achievement...',
    'summaries': ['s1', 's2', ..., 'sn'] # a list of summaries
}
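As a minimal sketch of reading one of the pair files, the helper below parses one record per line. It assumes each line is valid JSON with the `id`, `query`, and `summaries` fields shown above; the function name `load_pairs` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def load_pairs(path):
    """Read one query/summaries record per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


# Usage (the path is an assumption; point it at one of the split files in dataset/):
# pairs = load_pairs("path/to/split.jsonl")
# print(pairs[0]["id"], len(pairs[0]["summaries"]))
```

Each returned record is a plain dictionary, so the reference summaries for a query are available as `record["summaries"]`.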

For knowledge corpora, we consider three alternatives:

Internal corpus
External corpus
Augmented corpus

The internal corpus is kiqfs_internal_knowledge.json, which contains only documents from the DUC datasets. The data format is:

{
    'D301I': [{'title': 'FT 02 NOV 94...', 'text': 'CRIME WITHOUT FRONTIERS By...'}, ...], # a list of documents in the cluster D301I
    ... # all clusters
}
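The structure above can be traversed as in the following sketch, which assumes the file is a single JSON object mapping cluster ids to lists of documents with `title` and `text` keys. The helper names `load_internal_corpus` and `cluster_documents` are hypothetical.

```python
import json


def load_internal_corpus(path):
    """Load the cluster-id -> list-of-documents mapping from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def cluster_documents(corpus, cluster_id):
    """Return (title, text) tuples for one cluster, or an empty list if absent."""
    return [(doc["title"], doc["text"]) for doc in corpus.get(cluster_id, [])]


# Usage (the path is an assumption based on the file name given above):
# corpus = load_internal_corpus("dataset/kiqfs_internal_knowledge.json")
# for title, text in cluster_documents(corpus, "D301I"):
#     print(title, len(text))
```

Because the cluster ids match the `id` field of the pair files, the same id can be used to look up the source documents for a given query.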

For the external corpus, we use the Wikipedia dump kilt_w100_title.tsv from the KILT benchmark. Please follow their instructions to download the data.
We also provide a processed version of the internal corpus, kiqfs_internal_w100_title.tsv, which has the same data format as kilt_w100_title.tsv.
The augmented corpus is simply the combination of the previous two corpora.
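One way to combine the two TSV files is plain concatenation, sketched below. This is not necessarily how the authors built the augmented corpus: it assumes both files start with the same single header row, and it does not renumber passage ids, so ids may collide between the two sources. The function name `build_augmented_corpus` is hypothetical.

```python
def build_augmented_corpus(external_path, internal_path, output_path):
    """Write the external passages followed by the internal ones.

    Assumes each input has exactly one header row and that both headers
    are identical. Returns the number of passage rows written.
    """
    rows_written = 0
    header = None
    with open(output_path, "w", encoding="utf-8") as out:
        for path in (external_path, internal_path):
            with open(path, encoding="utf-8") as src:
                first = src.readline().rstrip("\n")
                if header is None:
                    header = first
                    out.write(header + "\n")  # keep the header only once
                elif first != header:
                    raise ValueError(f"header mismatch in {path}")
                for line in src:
                    # Guard against a missing newline at the end of a file.
                    out.write(line if line.endswith("\n") else line + "\n")
                    rows_written += 1
    return rows_written


# Usage (paths are assumptions based on the file names mentioned above):
# build_augmented_corpus("kilt_w100_title.tsv",
#                        "dataset/kiqfs_internal_w100_title.tsv",
#                        "augmented_w100_title.tsv")
```

Streaming line by line keeps memory use flat, which matters because the Wikipedia dump is large.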

Relevance Annotation

TODO

License

KI-QFS is MIT licensed. See the LICENSE file for details.
