SkimLit

An NLP model that classifies each sentence of an abstract by the role it plays (e.g. objective, methods, results) so that researchers can skim the literature and dive deeper when necessary.

More specifically, I am going to replicate the deep learning model behind the 2017 paper PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts.

Dataset Used

PubMed 200k RCT dataset

The PubMed 200k RCT dataset is described in Franck Dernoncourt, Ji Young Lee. PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts. International Joint Conference on Natural Language Processing (IJCNLP). 2017.

Some miscellaneous information:

PubMed 20k is a subset of PubMed 200k, i.e. any abstract present in PubMed 20k is also present in PubMed 200k.
PubMed_200k_RCT is the same as PubMed_200k_RCT_numbers_replaced_with_at_sign, except that in the latter all numbers have been replaced by @ (the same holds for PubMed_20k_RCT vs. PubMed_20k_RCT_numbers_replaced_with_at_sign).
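
The relationship between the two dataset variants can be illustrated with a small sketch. The exact replacement rule used by the dataset authors is not stated here, so the regular expression below (runs of digits, optionally with decimal points or thousands separators) and the function name `replace_numbers_with_at_sign` are assumptions for illustration only.

```python
import re

# Assumed pattern: a run of digits, optionally continued by "." or "," plus more
# digits (e.g. "125", "12.5", "1,000"). The real dataset may use a different rule.
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")


def replace_numbers_with_at_sign(sentence: str) -> str:
    """Return the sentence with every number replaced by a single '@'."""
    return NUMBER_PATTERN.sub("@", sentence)


example = "A total of 125 patients received 12.5 mg daily for 6 weeks."
masked = replace_numbers_with_at_sign(example)
print(masked)
```

Masking numbers this way keeps the vocabulary small, since every numeric value maps to the same token.
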
Count Plot

Models Tried

All the notebooks are available here

Naive Bayes Model -> 72% Accuracy
Conv1D Model -> 78% Accuracy
Model using pretrained token embedding (Universal Sentence Encoder) -> 75% Accuracy
Conv1D Model using character level embedding -> 73% Accuracy
Model with both token and character level embedding -> 76% Accuracy
Model with token, character and position level embedding ( https://arxiv.org/pdf/1612.05251.pdf ) -> 81% Accuracy
Model described in the paper above with BERT embedding -> 88% Accuracy
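
The position-level input of the token, character and position model can be sketched in plain Python: each sentence is paired with its index within the abstract and the abstract's total number of sentences, and both integers are one-hot encoded. The helper names, the depths (15 and 20) and the choice to clip larger values into the last slot are assumptions for illustration, not values taken from the notebooks.

```python
from typing import Dict, List


def one_hot(index: int, depth: int) -> List[int]:
    """One-hot encode index into a vector of length depth, clipping into range."""
    clipped = min(max(index, 0), depth - 1)
    return [1 if i == clipped else 0 for i in range(depth)]


def positional_features(
    sentences: List[str], line_depth: int = 15, total_depth: int = 20
) -> List[Dict[str, object]]:
    """Attach one-hot line-number and total-lines features to each sentence."""
    total_lines = len(sentences)
    features = []
    for line_number, text in enumerate(sentences):
        features.append({
            "text": text,
            "line_number": one_hot(line_number, line_depth),
            # Subtract 1 so the sentence count becomes a zero-based index.
            "total_lines": one_hot(total_lines - 1, total_depth),
        })
    return features


abstract = [
    "To assess the effect of the treatment.",
    "Patients were randomised into two groups.",
    "Symptoms improved in the treatment group.",
]
for row in positional_features(abstract):
    print(row["line_number"].index(1), row["total_lines"].index(1), row["text"])
```

In a TensorFlow model these vectors would be passed as separate inputs alongside the token and character inputs. `tf.one_hot` produces a comparable encoding, although it returns an all-zero vector for out-of-range indices instead of clipping.
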

Final Results

Results of all Models

Best Performing Model

Final Outputs

Packages Used

TensorFlow
tensorflow_text
tensorflow_hub
sklearn
Matplotlib
numpy
pandas
spaCy
