Last active
April 6, 2025 12:10
Revisions
codez0mb1e revised this gist
Sep 11, 2024. 1 changed file with 42 additions and 34 deletions.

# Template for Data Science Project

## Main Principles

- **Reproducibility**:
  - code files: under version control, code review
  - data: data pipeline or snapshots
  - environment: venv/conda/docker
  - models: training pipeline or pickled models, saved hyper-parameters and metrics
  - *experiment*: tracking, report
- **Maintainability**:
  - code: modularity, code review, documentation, logging
  - data: data quality checks, format docs, metadata, data versioning
  - environment: venv/conda, requirements.txt
  - models: hyper-parameters as configuration, model versioning
  - *experiment*: comparison of code/data/env/models using their artifacts, changelog
- **Security and Privacy**:
  - No data outside DMZ.

## Repository Structure

Directories:

```md
|-- src/
|   |-- core/                          <- Core functions and utils
|       |-- abstracts.(py|R)
|       |-- configuration.(py|R)
|       |-- experiment.(py|R)
|       |-- logging.(py|R)
|       |-- ...
|       |-- utils.(py|R)
|   |-- training/
|       |-- model.(py|R)               <- Model definition
|       |-- preprocessing.(py|R)       <- Preprocessing functions
|       |-- ...
|       |-- utils.(py|R)
|   |-- __init__.(py|R)
|   |-- 1_load_data.(py|R)             <- Data loading pipeline
|   |-- 2_preprocessing.(py|R)         <- Data preprocessing pipeline
|   |-- 2.1_hypothesis_1.ipynb         <- Hypothesis testing and data exploration notebook
|   |-- 2.2_hypothesis_2.ipynb
|   |-- 3_feature_engineering.(py|R)   <- Feature engineering pipeline
|   |-- 4_model_training.(py|R)        <- Model training pipeline, e.g. hyper-params optimization
|   |-- 5_model_evaluation.(py|R)      <- Model evaluation pipeline
|   |-- ...
|   |-- config.yml
|   |-- config-(dev|release).yml
|   |-- secrets.yml
|   |-- secrets-(dev|release).yml
|-- data/                              <- Data directory (not under version control, in S3)
|   |-- {data_version}/                <- Raw data
|-- experiments/                       <- Experiments artifacts, outputs and temp files
|   |-- {experiment_version}/
|       |-- cache/                     <- Cache for different experiment stages
|       |-- output/                    <- Validation dataset, test dataset, hyper-opt artifacts, plots
|       |-- models/ or model.pkl       <- Final model (or model ensemble)
|       |-- report.md                  <- Manual report
|       |-- changelog                  <- Automated report
|-- logs/
|   |-- {experiment_name}_{stage_name}_{timestamp}.log
|-- tests/
|   |-- unit/
|   |-- integration/
|   |-- e2e/
|-- docs/
|-- labs/                              <- Jupyter notebooks and other experiments
|-- requirements.txt
|-- requirements-dev.txt
|-- Dockerfile
|-- Dockerfile.release
|-- .dockerignore
|-- .gitignore
|-- .github/workflows/
|   |-- build.yml
|   |-- release.yml
|-- run.(sh|ps)
|-- README.md
|-- LICENSE
|-- CHANGELOG
```

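
The `experiments/` and `logs/` layout above can be sketched in Python as follows. This is a minimal illustration, not part of the template itself: the function names, the `run_metadata.json` file name, and the timestamp format are all assumptions chosen for the example.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path


def experiment_dirs(root: Path, experiment_version: str) -> dict[str, Path]:
    """Create experiments/{experiment_version}/ with cache/, output/ and models/."""
    base = root / "experiments" / experiment_version
    dirs = {name: base / name for name in ("cache", "output", "models")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    dirs["base"] = base
    return dirs


def log_path(root: Path, experiment_name: str, stage_name: str,
             now: datetime | None = None) -> Path:
    """Build logs/{experiment_name}_{stage_name}_{timestamp}.log."""
    now = now or datetime.now(timezone.utc)
    # Assumption: the template does not prescribe a timestamp format.
    timestamp = now.strftime("%Y%m%dT%H%M%S")
    return root / "logs" / f"{experiment_name}_{stage_name}_{timestamp}.log"


def save_run_metadata(output_dir: Path, hyper_params: dict, metrics: dict) -> Path:
    """Save hyper-parameters and metrics next to the other experiment outputs."""
    # Hypothetical file name; the template only says these values are saved.
    target = output_dir / "run_metadata.json"
    payload = {"hyper_params": hyper_params, "metrics": metrics}
    target.write_text(json.dumps(payload, indent=2, sort_keys=True))
    return target
```

A training stage might call `experiment_dirs(Path("."), "v1")` once at start-up, write its log to `log_path(Path("."), "churn", "training")`, and finish with `save_run_metadata(...)`, so that each experiment version carries the artifacts needed to compare it with another.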
codez0mb1e revised this gist
Jan 28, 2024. 1 changed file with 10 additions and 10 deletions.

    @@ -7,16 +7,16 @@ Directories:
    ```
    |-- src/
    |   |-- core/                  <- Core functions and utils
    |       |-- configuration.(py|R)
    |       |-- experiment.(py|R)
    |       |-- logging.(py|R)
    |       |-- ...
    |       |-- utils.(py|R)
    |   |-- training/
    |       |-- model.(py|R)
    |       |-- preprocessing.(py|R)
    |       |-- ...
    |       |-- utils.(py|R)
    |   |-- __init.(py|R)
    |   |-- 1_load_data.(py|R)
    |   |-- 2_preprocessing.(py|R)
    @@ -50,7 +50,7 @@ Directories:
    |-- Dockerfile.release
    |-- .dockerignore
    |-- .gitignore
    |-- .github/workflows/
    |   |-- build.yml
    |   |-- release.yml
    |-- run.(sh|ps)

codez0mb1e revised this gist
Jan 28, 2024. 1 changed file with 20 additions and 11 deletions.

    @@ -22,7 +22,7 @@ Directories:
    |   |-- 2_preprocessing.(py|R)
    |   |-- 3_feature_engineering.(py|R)
    |   |-- 4_model_training.(py|R)
    |   |-- 5_model_evaluation.(py|R)
    |   |-- ...
    |   |-- config.yml
    |   |-- config.(dev|release).yml
    @@ -32,33 +32,42 @@ Directories:
    |   |-- {data_version}/
    |-- experiments/
    |   |-- {experiment_version}/
    |       |-- cache/
    |       |-- output/
    |       |-- report.md
    |-- logs/
    |   |-- {experiment_name}_{stage_name}_{timestamp}.log
    |-- tests/
    |   |-- unit/
    |   |-- integration/
    |   |-- e2e/
    |-- docs/
    |-- notebooks/
    |-- requirements.txt
    |-- requirements-dev.txt
    |-- conda_env.yml
    |-- Dockerfile
    |-- Dockerfile.release
    |-- .dockerignore
    |-- .gitignore
    |-- .github/workflows
    |   |-- build.yml
    |   |-- release.yml
    |-- run.(sh|ps)
    |-- README.md
    |-- LICENSE
    |-- CHANGELOG
    ```

    #### Branches

    ```
    main
    dev
    features/<name>
    fix/<bug#>
    ```

    #### `README.md` Template

    TBD

codez0mb1e revised this gist
Dec 5, 2021. 1 changed file with 49 additions and 41 deletions.

    @@ -1,56 +1,64 @@
    ### Template for Data Science Project

    #### Repository Structure

    Directories:

    ```
    |-- src/
    |   |-- core/                  <- Core functions and utils
    |   |   |-- configuration.(py|R)
    |   |   |-- experiment.(py|R)
    |   |   |-- logging.(py|R)
    |   |   |-- ...
    |   |   |-- utils.(py|R)
    |   |-- training/
    |   |   |-- model.(py|R)
    |   |   |-- preprocessing.(py|R)
    |   |   |-- ...
    |   |   |-- utils.(py|R)
    |   |-- __init.(py|R)
    |   |-- 1_load_data.(py|R)
    |   |-- 2_preprocessing.(py|R)
    |   |-- 3_feature_engineering.(py|R)
    |   |-- 4_model_training.(py|R)
    |   |-- 5_evaluate_training.(py|R)
    |   |-- ...
    |   |-- config.yml
    |   |-- config.(dev|release).yml
    |   |-- secrets.yml
    |   |-- secrets.(dev|release).yml
    |-- data/
    |   |-- {data_version}/
    |-- experiments/
    |   |-- {experiment_version}/
    |       |-- cache/
    |       |-- output/
    |       |-- report.md
    |-- logs/
    |   |-- {experiment_name}_{stage_name}_{timestamp}.log
    |-- tests/
    |-- docs
    |-- notebooks
    |-- README.md
    |-- LICENSE
    |-- requirements.txt
    |-- conda_env.yml
    |-- Dockerfile
    |-- Dockerfile.release
    |-- .dockerignore
    |-- .gitignore
    ```

    #### Branches

    ```
    main
    dev
    features/<name_issue#>
    fix/<name_bug#>
    ```

    #### `README.md` Template

    TODO

codez0mb1e revised this gist
Nov 27, 2021. 1 changed file with 15 additions and 2 deletions.

    @@ -1,6 +1,8 @@
    ### Data Science Project

    #### Repository Structure:

    Directories:

    ```
    src/
    @@ -41,3 +43,14 @@
    LICENSE
    requirements.txt
    Dockerfile
    ```

    Branches:

    ```
    main
    dev
    features/<name_issue#>
    fixes/<name_bug#>
    ```

codez0mb1e renamed this gist
Nov 27, 2021. 1 changed file with 0 additions and 0 deletions.
File renamed without changes.
codez0mb1e created this gist
Nov 27, 2021

    @@ -0,0 +1,43 @@
    # Data Science Project

    ## Repository structure

    ```
    src/
    |   core/
    |   |   configuration.(py|R)
    |   |   experiment.(py|R)
    |   |   logging.(py|R)
    |   |   ...
    |   |   helpers.py
    |   training/
    |   |   model.(py|R)
    |   |   preprocessing.(py|R)
    |   |   ...
    |   |   helpers.py
    |   __init.(py|R)
    |   1_load_data.(py|R)
    |   2_preprocessing.(py|R)
    |   3_feature_engineering.(py|R)
    |   4_model_training.(py|R)
    |   5_evaluate_training.(py|R)
    |   ...
    |   config.release.yml
    |   config.debug.yml
    |   secrets.release.yml
    |   secrets.debug.yml
    data/
    |   |   {data_version}/
    experiments/
    |   {experiment_version}/
    |   |   cache/
    |   |   output/
    |   |   report.md
    logs/
    |   {experiment_name}_{stage_name}_{timestamp}.log
    tests/
    README.md
    LICENSE
    requirements.txt
    Dockerfile
    ```