- Navigate to the root directory of the project
    cd CS399-NLP/
- Create a virtual environment
    pip install virtualenv
    virtualenv venv
- Activate the virtual environment
    source venv/bin/activate
- Install the requirements
    pip install -r requirements.txt

This project collects datasets for the task of generating descriptions of charts and graphs. The datasets come from the following sources:
- Wikipedia (Concadia)
- Statista
- Pew
- Accessibility Journals (HCI)

Each dataset is prepared as follows:
- Concadia
  - Download wiki_split.json and resized.zip
  - Run concadia.py
  - Filter manually via visual inspection
- HCI
  - Download images.zip and hci.jsonl
  - Run hci.py
  - Filter manually via visual inspection
- Conceptual Captions
  - Download the Train_GCC-training.tsv file
  - Run conceptual_captions.py
  - Filter manually via visual inspection
- Chart-to-Text
  - Download chart-to-text-dataset.zip from https://github.com/vis-nlp/Chart-to-text
    - Statista
    - Pew
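
The manual visual inspection step can be made lighter by pre-selecting candidates from the caption text first. The following is only a sketch of that idea, not the logic of conceptual_captions.py or any other script in this repository: the keyword list and the function names `looks_like_chart` and `filter_tsv_rows` are hypothetical, and it assumes each row of the TSV is `caption<TAB>url` with no header row.

```python
import csv
import io
import re

# Hypothetical keyword list; not taken from the project's scripts.
CHART_PATTERN = re.compile(
    r"\b(chart|graph|plot|histogram|diagram)s?\b", re.IGNORECASE
)


def looks_like_chart(caption: str) -> bool:
    """Return True if the caption mentions a chart-related keyword."""
    return CHART_PATTERN.search(caption) is not None


def filter_tsv_rows(handle):
    """Yield (caption, url) pairs whose caption mentions a chart keyword.

    Assumes each row is `caption<TAB>url` with no header row.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip malformed or empty rows
        caption, url = row[0], row[1]
        if looks_like_chart(caption):
            yield caption, url


# Small in-memory sample standing in for the real file.
sample = io.StringIO(
    "a bar chart of annual rainfall\thttps://example.com/1.jpg\n"
    "a dog running on the beach\thttps://example.com/2.jpg\n"
    "line graphs comparing sales\thttps://example.com/3.jpg\n"
)
candidates = list(filter_tsv_rows(sample))
print(candidates)
```

To try this on the downloaded data, replace `sample` with a handle from `open("Train_GCC-training.tsv", newline="", encoding="utf-8")`, adjusting the path to wherever the file was saved. A keyword match only narrows the pool; the remaining images still need the manual filtering described above.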