@brabbit61
Collaborator

  • Added documentation for training and retraining spaCy models
  • Added scripts for the above tasks

@tieandrews tieandrews left a comment

Looking good; just need to rearrange some things and reuse our existing code for some parts. Nice work.

if os.path.exists(val_path):
    shutil.rmtree(val_path)
    logger.info(f"The folder '{val_path}' has been deleted.")
if os.path.exists(test_path):

issue: this duplicates the process already done in labelling_data_split.py. Instead, run that script first in bash with these args, then have this function run on the processed folders it produces. That way we remove code duplication and make each part of the process more reusable.
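A minimal sketch of the suggested refactor, assuming labelling_data_split.py writes one `train`/`val`/`test` subfolder of labelled JSON files under a common root (that layout and the name `load_presplit_folders` are hypothetical, not taken from the PR). The pre-processing side only reads the already-split folders and fails loudly if they are missing, instead of deleting and re-creating them itself.

```python
from __future__ import annotations

from pathlib import Path


def load_presplit_folders(
    split_root: str | Path,
    splits: tuple[str, ...] = ("train", "val", "test"),
    pattern: str = "*.json",
) -> dict[str, list[Path]]:
    """Collect labelled files from folders produced by an earlier split step.

    Hypothetical layout: <split_root>/<split>/*.json, one subfolder per split.
    This function only reads; it never deletes or re-splits, so the split
    logic lives in exactly one place (the split script).
    """
    root = Path(split_root)
    files: dict[str, list[Path]] = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            # Fail loudly rather than silently re-splitting here.
            raise FileNotFoundError(
                f"Missing split folder {split_dir}; run the split script first."
            )
        # Sort so the file order is deterministic across runs.
        files[split] = sorted(split_dir.glob(pattern))
    return files
```

Keeping this function read-only means re-running pre-processing can never clobber a split that was made deliberately.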

test_gdd_ids
)

def get_article_gdd_ids(labelled_file_path: str):

Delete this duplicate function; see the comment above.


# TODO: Else If the data_path consists of parquet files, load JSON files from all parquet files in the directory

def split_train_val_test(

Delete this duplicate function; see the comment above.


rm -f spacy_transformer_$VERSION.cfg

python3 spacy_preprocess.py \

See the comments on this file below: we should remove the duplicate code, run labelling_data_split.py here, and then call pre-processing.
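The ordering suggested here (split first, then pre-process) could be driven by a small runner like the one below. This is only a sketch: `run_pipeline` is a hypothetical helper, and the real arguments for each script are not shown in this thread.

```python
import subprocess
import sys
from typing import Sequence


def run_pipeline(steps: Sequence[Sequence[str]]) -> None:
    """Run each command in order, stopping at the first failure.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so a failed split step never lets pre-processing run on
    stale or missing folders.
    """
    for cmd in steps:
        # Log each step to stderr so the order is visible in CI output.
        print("Running:", " ".join(cmd), file=sys.stderr)
        subprocess.run(list(cmd), check=True)
```

For example, `run_pipeline([["python3", "labelling_data_split.py", ...], ["python3", "spacy_preprocess.py", ...]])`, with each `...` replaced by that script's actual arguments. In a bash script, `set -e` gives the same stop-on-first-failure behaviour.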

@tieandrews tieandrews left a comment

Nice work, merging.

@tieandrews tieandrews merged commit d6cb366 into dev Jun 22, 2023