Interpret textual data generated from medical vocal memos
In the Library of Celsus in Ephesus, built in the 2nd century, there are four statues depicting wisdom (Sophia), knowledge (Episteme), intelligence (Ennoia) and excellence (Arete). Our project is named after this city and the goddess Sophia.
After visiting a patient, nurses and doctors need to quickly and easily send information
So they record a vocal memo after each visit
Today these memos are read by humans and the information is manually entered into the database
We want to ease their work by automatically extracting information from the vocal memos and pre-filling the fields to be entered in the database
4000 vocal memo recordings (4000 sentences)
14 targets to predict (up to 14 different pieces of information per memo)
Here is an example of a memo
And here is the corresponding information we need to extract
We clean the data by removing stop words and punctuation
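As an illustration, this cleaning step can be sketched in plain Python. The stop-word list below is a small hypothetical sample; the project itself may rely on a fuller list such as nltk's French stop-word corpus:

```python
import string

# Small hypothetical sample of French stop words; the real project
# may use a fuller list (e.g. nltk's French stop-word corpus)
STOP_WORDS = {"le", "la", "les", "de", "du", "des", "un", "une", "à", "au", "et", "chez"}

def clean_memo(text):
    """Lowercase, strip punctuation, and drop stop words."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return [word for word in text.split() if word not in STOP_WORDS]

print(clean_memo("Prise de sang chez le patient, à domicile."))
# → ['prise', 'sang', 'patient', 'domicile']
```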
We identify which part of the memo (which group of words) corresponds to which information
For this, we build a Named Entity Recognition (NER) model using the spaCy library
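To give an idea of how such a model is used, here is a minimal sketch. Since the trained model is not shared, it stands in a blank French pipeline with a rule-based EntityRuler; the labels and patterns are hypothetical, not the project's actual ones:

```python
import spacy

# In the project, the trained pipeline would be loaded instead, e.g.:
# nlp = spacy.load("models/model_v2/model-best")
# Here we simulate it with a blank French pipeline and an EntityRuler,
# using hypothetical labels and patterns.
nlp = spacy.blank("fr")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "TREATMENT", "pattern": "prise de sang"},
    {"label": "LOCATION", "pattern": "à domicile"},
])

doc = nlp("prise de sang à domicile demain matin")
print([(ent.text, ent.label_) for ent in doc.ents])
# → [('prise de sang', 'TREATMENT'), ('à domicile', 'LOCATION')]
```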
We build models to convert each piece of information into the target classes using the nltk library
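A sketch of what such a conversion could look like, using nltk's French Snowball stemmer; the keyword-to-class mapping below is hypothetical, the real project derives its classes from the training data:

```python
from nltk.stem.snowball import FrenchStemmer

stemmer = FrenchStemmer()

# Hypothetical mapping from keywords to target classes;
# the real project derives its classes from the training data.
KEYWORD_CLASSES = {
    "pansement": "Pansement",
    "injection": "Injection",
    "sang": "PriseDeSang",
}

def classify_treatment(span):
    """Map an extracted treatment span to a target class via stem matching."""
    stems = {stemmer.stem(token) for token in span.lower().split()}
    for keyword, label in KEYWORD_CLASSES.items():
        if stemmer.stem(keyword) in stems:
            return label
    return "Autre"

print(classify_treatment("réfection du pansement"))     # → Pansement
print(classify_treatment("pose de bas de contention"))  # → Autre
```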
You can play around with our demo here
In this demo, we let you try your own sentences and see the results from our models
We show our success percentage
We give our feedback on possible improvement points and share the hypotheses we used to build our models
Clone the project:

```bash
git clone git@github.com:GeoffroyGit/ephesus.git
```

We recommend you create a fresh virtual environment.

Create a python3 virtualenv and activate it:

```bash
cd ephesus
pyenv virtualenv ephesus
pyenv local ephesus
```

Upgrade pip if needed:

```bash
pip install --upgrade pip
```

Install the package:

```bash
pip install -r requirements.txt
pip install -e .
```

Run the API on your machine:

```bash
make run_api
```

Build the docker image:

```bash
make docker_build
```

Run a container on your machine:

```bash
make docker_run
```

Stop the container running on your machine:

```bash
docker ps
docker stop <container id>
```

Push the image to Google Cloud Platform (GCP):

```bash
make docker_push
```

Run a container on GCP:

```bash
make docker_deploy
```

You'll need similar training data in order to train the models.
We're sorry we can't share our data
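To give an idea of the expected format: spaCy training sets are serialized DocBin (`.spacy`) files. A minimal sketch of building one, with a made-up sentence and a hypothetical entity label:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("fr")

# Made-up example: (text, [(start_char, end_char, label), ...])
TRAIN_DATA = [
    ("prise de sang à domicile", [(0, 13, "TREATMENT")]),
]

db = DocBin()
for text, annotations in TRAIN_DATA:
    doc = nlp(text)
    doc.ents = [doc.char_span(start, end, label=label)
                for start, end, label in annotations]
    db.add(doc)

db.to_disk("train_set_v2.spacy")  # file name as used in ephesus/sentence.py
```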
Create folders for the models:

```bash
mkdir models
mkdir models/config
```

Download a base config from https://spacy.io/usage/training (select only French and NER) and save it to models/config/base_config.cfg

Fill the config file with default values:

```bash
cd models/config/
python -m spacy init fill-config base_config.cfg config.cfg
```

Create the train set and test set for the model:

```bash
cd ephesus/
python sentence.py
```

Create variables to hold the training data file names (use the same names as in ephesus/sentence.py):

```bash
export EPHESUS_TRAINING_DATA="train_set_v2.spacy"
export EPHESUS_TEST_DATA="test_set_v2.spacy"
```

Train the model:
```bash
cd models/
mkdir model_v2
cd models/config/
python -m spacy train config.cfg --output ../model_v2 --paths.train ../../raw_data/$EPHESUS_TRAINING_DATA --paths.dev ../../raw_data/$EPHESUS_TRAINING_DATA
```

Evaluate the model:
```bash
cd models/model_v2/
mkdir eval
cd models/config/
python -m spacy evaluate ../model_v2/model-best ../../raw_data/$EPHESUS_TEST_DATA -dp ../model_v2/eval -o ../model_v2/eval/model_v2_scores.json
```

Train and evaluate the models for treatment and location:
```bash
cd ephesus/
python nlp.py
```

Check the classes for date and time:

```bash
cd ephesus/
python timedate.py
```

Congratulations!
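For the curious, the kind of date/time classes checked by ephesus/timedate.py could be sketched like this; the class names and rules below are hypothetical, not the project's actual ones:

```python
import re

def time_class(text):
    """Map a French time expression to a coarse class (hypothetical classes)."""
    text = text.lower()
    if "matin" in text:
        return "Morning"
    if "soir" in text:
        return "Evening"
    if re.search(r"\d{1,2}\s*h", text):
        return "FixedTime"
    return "Unspecified"

print(time_class("tous les matins"))  # → Morning
print(time_class("passage à 18h"))    # → FixedTime
```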