cdQA


An end-to-end closed-domain question answering system with BERT and classic IR methods 📚

Installation

With pip

pip install cdqa

From source

git clone https://github.com/fmikaelian/cdQA.git
cd cdQA
pip install -e .

Hardware Requirements

Experiments were run on an AWS EC2 p3.2xlarge instance (Deep Learning AMI (Ubuntu) Version 22.0) with a single Tesla V100 16GB GPU and 16-bit training enabled (to accelerate training and prediction). To enable this feature, you will need to install apex:

git clone https://github.com/NVIDIA/apex.git
cd apex/
python setup.py install --cuda_ext --cpp_ext

Getting started

How it works
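cdQA follows a retriever–reader architecture: a classic IR retriever first selects the paragraphs most similar to the question, and a BERT reader then extracts the answer span from them. A minimal sketch of the retrieval step using scikit-learn's TF-IDF vectorizer (illustrative only, not cdQA's actual implementation):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy corpus of paragraphs the retriever will search over.
paragraphs = [
    'BERT is a language representation model developed by Google.',
    'The Eiffel Tower is located in Paris.',
    'TF-IDF weighs terms by frequency and rarity across documents.',
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(paragraphs)

def retrieve(question, top_k=1):
    """Return the indices of the top_k paragraphs most similar to the question."""
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_matrix).ravel()
    return scores.argsort()[::-1][:top_k]

retrieve('Where is the Eiffel Tower?')  # highest-scoring paragraph is index 1
```

In the full pipeline, the retrieved paragraphs are handed to the BERT reader, which scores candidate answer spans within each one.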

Preparing your data

To use cdqa you need a .csv corpus file with the following columns:

| date | title | category | link | abstract | paragraphs | content |
|------|-------|----------|------|----------|------------|---------|
| DD/MM/YY | The Article Title | The Article Category | https://the-article-link.com | The Article Abstract | [Paragraph 1 of Article, Paragraph N of Article] | Paragraph 1 of Article Paragraph N of Article |
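The expected layout can be sketched with pandas (the column names follow the schema above; the values are placeholders):

```python
import pandas as pd

# Build a one-row corpus matching the expected schema.
# 'paragraphs' holds a list of the article's paragraphs, which
# the retriever indexes individually; 'content' is the full text.
df = pd.DataFrame([{
    'date': '01/01/19',
    'title': 'The Article Title',
    'category': 'The Article Category',
    'link': 'https://the-article-link.com',
    'abstract': 'The Article Abstract',
    'paragraphs': ['Paragraph 1 of Article', 'Paragraph N of Article'],
    'content': 'Paragraph 1 of Article Paragraph N of Article',
}])

df.to_csv('corpus.csv', index=False)
```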

You can use the converters to create this file, for example from a folder of PDFs:

# create a corpus dataframe from a directory containing .pdf files
from cdqa.utils.converters import pdf_converter

df = pdf_converter(directory_path='path/to/pdf/folder')

Training models

Read your corpus in .csv format:

import pandas as pd
from cdqa.pipeline.cdqa_sklearn import QAPipeline

df = pd.read_csv('your-custom-corpus-here.csv')

Fit the pipeline on your corpus using the pre-trained reader:

cdqa_pipeline = QAPipeline(model='bert_qa_squad_v1.1_sklearn.joblib')
cdqa_pipeline.fit(X=df)

If you want to fine-tune the reader on your custom data:

cdqa_pipeline = QAPipeline()
cdqa_pipeline.fit(X=df, fit_reader=True)

Making predictions

To get the best prediction given an input query:

query = 'your custom question here'

cdqa_pipeline.predict(X=query)

Evaluating models

To evaluate models on your custom dataset, you will first need to annotate it. The annotation process can be done in three steps:

  1. Convert your pandas DataFrame into a json file with SQuAD format:

    from cdqa.utils.converter import df2squad
    
    json_data = df2squad(df=df, squad_version='v2.0', output_dir='../data', filename='bnpp_newsroom-v1.1')
  2. Use an annotator to add ground truth question-answer pairs:

    Please refer to cdQA-annotator, a web-based annotator for closed-domain question answering datasets with SQuAD format.

  3. Evaluate your model:

    from cdqa.utils.metrics import evaluate, evaluate_from_files
    
    evaluate(dataset, predictions) # as json objects
    
    evaluate_from_files(dataset_file='dev-v1.1.json', prediction_file='predictions.json') # as json files
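The SQuAD metrics used above compare predicted and ground-truth answer strings after normalization. A minimal, self-contained sketch of exact match and token-level F1, simplified from the official SQuAD evaluation script (not cdQA's own code):

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())

def exact_match(prediction, ground_truth):
    """True if both answers are identical after normalization."""
    return normalize(prediction) == normalize(ground_truth)

def f1_score(prediction, ground_truth):
    """Token-overlap F1 between predicted and ground-truth answers."""
    pred_tokens = normalize(prediction).split()
    gt_tokens = normalize(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

exact_match('The Eiffel Tower', 'eiffel tower')  # True after normalization
```

The full evaluation averages these scores over the dataset, taking the maximum over the annotated ground-truth answers for each question.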

Practical examples

A complete workflow is described in our examples notebook.

Deployment

Manual

You can deploy a cdQA REST API by executing:

FLASK_APP=api.py flask run -h 0.0.0.0

To try it, send a request with HTTPie:

http localhost:5000/api query=='your question here'

If you wish to serve a user interface, follow the instructions for cdQA-ui, a web interface developed for cdQA.

With docker

You can use the Dockerfile to deploy the full cdQA app.

Contributing

Read our Contributing Guidelines.

References

| Type | Title | Author | Year |
|------|-------|--------|------|
| 📹 Video | Stanford CS224N: NLP with Deep Learning Lecture 10 – Question Answering | Christopher Manning | 2019 |
| 📰 Paper | End-to-End Open-Domain Question Answering with BERTserini | Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin | 2019 |
| 📰 Paper | Contextual Word Representations: A Contextual Introduction | Noah A. Smith | 2019 |
| 📰 Paper | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | 2018 |
| 📰 Paper | Neural Reading Comprehension and Beyond | Danqi Chen | 2018 |
| 📰 Paper | Reading Wikipedia to Answer Open-Domain Questions | Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes | 2017 |