Merged
2 changes: 2 additions & 0 deletions README.md
@@ -56,6 +56,8 @@ Articles predicted to be relevant will then be submitted to the Data Extraction

To run the Docker image for the article relevance prediction pipeline, please refer to the instructions [here](docker/article-relevance/README.md).

The model can be retrained using reviewed article data. See the [retraining instructions](docker/article-relevance-retrain/README.md).

### **Data Extraction Pipeline**

The full text is provided by the xDD team for the articles that are deemed to be relevant, and a custom-trained **Named Entity Recognition (NER)** model is used to extract entities of interest from each article.
6 changes: 4 additions & 2 deletions docker/article-relevance-retrain/README.md
@@ -15,6 +15,8 @@ Running this docker image will:
3. Train a new logistic regression model using the expanded data.
4. Write key model evaluation results to the specified results directory.

**Data Requirement**
The original training data is required so that the model is retrained on both old and new examples. To run the retrain pipeline, download [metadata_embeded_specter2.csv](https://drive.google.com/file/d/1vIiTryi-BDoLYSQlCWrgoKlV3t9joiTO/view?usp=drive_link) and save it at the path given in the environment variable TRAIN_DATA_PATH (see below for a sample docker compose).

## Additional Options Enabled by Environment Variables

@@ -37,10 +39,10 @@ services:
     image: metaextractor-article-relevance-retrain:v0.0.1
     environment:
       - USE_REVIEWED_DATA=True
-      - TRAIN_DATA_PATH=data/article-relevance/processed/metadata_processed_embedded.csv
+      - TRAIN_DATA_PATH=data/article-relevance/processed/metadata_embeded_specter2.csv
       - MODEL_FOLDER=/outputs/model/
       - RESULT_DIR=/outputs/model_eval/
-      - REVIEWED_FOLDER_PATH=data/data-review-tool/processed/
+      - REVIEWED_FOLDER_PATH=data/data-review-tool/

     volumes:
       - ./data/article-relevance/retrain-outputs:/outputs/
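
Because the retrain pipeline depends on the original training data being present at TRAIN_DATA_PATH, it may help to check the environment before starting a run. The following is a minimal sketch and not part of the repository: the function name `check_retrain_env` is hypothetical, and treating all four path variables from the compose fragment as required is an assumption.

```python
import os
from pathlib import Path

# Variable names come from the compose fragment in the README.
# Assumption: all four are required for a retraining run.
REQUIRED_VARS = (
    "TRAIN_DATA_PATH",
    "MODEL_FOLDER",
    "RESULT_DIR",
    "REVIEWED_FOLDER_PATH",
)


def check_retrain_env(env):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Report every variable that is unset or empty.
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    # The original training data must exist as a file.
    train_path = env.get("TRAIN_DATA_PATH")
    if train_path and not Path(train_path).is_file():
        problems.append(f"training data not found at {train_path}")

    # The reviewed data is expected to live in a folder.
    reviewed = env.get("REVIEWED_FOLDER_PATH")
    if reviewed and not Path(reviewed).is_dir():
        problems.append(f"reviewed data folder not found at {reviewed}")

    return problems


if __name__ == "__main__":
    for problem in check_retrain_env(os.environ):
        print(problem)
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to exercise with a plain dictionary.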