📑 Paper: https://arxiv.org/abs/2501.00888
🌏 Chinese Web Demo: https://modelscope.cn/studios/vickywu1022/CHRONOS
- We propose CHRONOS, a novel retrieval-based approach to Timeline Summarization (TLS) by iteratively posing questions about the topic and the retrieved documents to generate chronological summaries.
- We construct an up-to-date dataset for open-domain TLS, which surpasses existing public datasets in terms of both size and the duration of timelines.
- Experiments demonstrate that our method is effective on open-domain TLS and achieves results comparable to state-of-the-art closed-domain TLS methods, with significant improvements in efficiency and scalability.
We release our Open-TLS dataset for open-domain Timeline Summarization.
The target news queries are given in news_keywords.py, and the ground-truth timelines are stored in data/open/{NEWS_KEYWORD}/timelines.jsonl in the following format:

[["YYYY-MM-DDT00:00:00", ["", "", ""]]]

The first step is to install the dependencies:

pip install -r requirements.txt

The second step is to construct a topic-questions example pool for the datasets in data/.

python question_exampler.py

Or, you can use our provided data/question_examples.json, which contains examples for the Crisis, T17 and Open-TLS datasets.
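As a rough illustration of the timeline format described above, the following sketch parses one timeline into dated entries. It assumes that each line of timelines.jsonl holds one complete timeline as a JSON array of [date, sentences] pairs; the helper names `parse_timeline` and `load_timelines` and the sample line are hypothetical and not part of the released code.

```python
import json
from datetime import datetime
from pathlib import Path


def parse_timeline(line):
    """Parse one JSONL line into a list of (datetime, [sentences]) pairs."""
    entries = json.loads(line)
    return [(datetime.fromisoformat(date), list(sentences)) for date, sentences in entries]


def load_timelines(path):
    """Read every non-empty line of a timelines.jsonl file (assumed: one timeline per line)."""
    with Path(path).open(encoding="utf-8") as f:
        return [parse_timeline(line) for line in f if line.strip()]


# Made-up sample line in the documented shape; it is not taken from the dataset.
sample = (
    '[["2024-01-05T00:00:00", ["Event A happens.", "Officials respond."]], '
    '["2024-01-09T00:00:00", ["Event B follows."]]]'
)
timeline = parse_timeline(sample)
for date, sentences in timeline:
    print(date.date(), len(sentences))
```

For a real file, `load_timelines("data/open/{NEWS_KEYWORD}/timelines.jsonl")` with the placeholder replaced by an actual keyword directory would return one parsed timeline per line, provided the one-timeline-per-line assumption holds.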
We have released the CHRONOS code for the open-domain Timeline Summarization task. You may also refer to our ModelScope repo to build an app with Streamlit.
Before running, replace the placeholders in src/model.py with your own API keys to call either Qwen or GPT models.

DASHSCOPE_API_KEY = "YOUR_API_KEY"
OPENAI_API_KEY = "YOUR_API_KEY"

Also replace the placeholder in src/searcher.py with your own Bing Web Search API key to search for news on the Internet.

BING_SEARCH_KEY = "YOUR_API_KEY"

If you want CHRONOS to use the full page instead of only the snippet, replace the placeholder in src/reader.py with your own Jina key.

JINA_API_KEY = "YOUR_API_KEY"

To experiment with the Open-TLS dataset, run:
python main.py \
--model_name "$model" \
--max_round "$round" \
--dataset open \
--output "$output_dir" \
--question_exs

where "$round" is the maximum number of self-questioning rounds and "$output_dir" sets the output directory, which will contain (1) the retrieved news, (2) the generated timelines and (3) the evaluation scores.
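To compare several values of the maximum round, the command above could be assembled programmatically. The sketch below only builds and prints the argument lists; it uses the flags exactly as shown above and assumes `--question_exs` is passed without a value, as in the command. The helper `build_command`, the model name `MODEL_NAME`, and the output paths are placeholders, not values from the repository.

```python
import shlex


def build_command(model_name, max_round, output_dir, dataset="open"):
    """Assemble the argv list for main.py using only the flags shown in the README."""
    return [
        "python", "main.py",
        "--model_name", model_name,
        "--max_round", str(max_round),
        "--dataset", dataset,
        "--output", output_dir,
        "--question_exs",  # assumed to be a bare flag, as written in the command above
    ]


# "MODEL_NAME" and the output directories are placeholders for illustration.
commands = [
    build_command("MODEL_NAME", rounds, f"output/round_{rounds}")
    for rounds in (1, 2, 3)
]
for cmd in commands:
    print(shlex.join(cmd))
    # To launch a run for real: subprocess.run(cmd, check=True)
```

Writing each setting to its own output directory keeps the retrieved news, timelines and scores of different runs from overwriting one another.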
@article{wu2025unfoldingheadlineiterativeselfquestioning,
title={Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization},
author={Weiqi Wu and Shen Huang and Yong Jiang and Pengjun Xie and Fei Huang and Hai Zhao},
year={2025},
eprint={2501.00888},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.00888},
}