@codez0mb1e
Last active April 6, 2025 12:10

Revisions

  1. codez0mb1e revised this gist Sep 11, 2024. 1 changed file with 42 additions and 34 deletions.
    76 changes: 42 additions & 34 deletions ds_project_template.md
    Original file line number Diff line number Diff line change
    @@ -1,51 +1,72 @@
    ### Template for Data Science Project
    # Template for Data Science Project

    #### Repository Structure
    ## Main Principles

    - **Reproducibility**:
      - code files: under version control, code review
      - data: data pipeline or snapshots
      - environment: venv/conda/docker
      - models: training pipeline or pickled models, saved hyper-parameters and metrics
      - *experiment*: tracking, report
    - **Maintainability**:
      - code: modularity, code review, documentation, logging
      - data: data quality checks, format docs, metadata, data versioning
      - environment: venv/conda, requirements.txt
      - models: hyper-parameters as configuration, model versioning
      - *experiment*: code/data/env/models comparison using its artifacts, changelog
    - **Security and Privacy**:
      - No data outside DMZ.
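
The "saved hyper-parameters and metrics" principle above could be sketched roughly as follows. This is a minimal illustration, not the template author's implementation: the file name `run_record.json`, the function names, and the example keys are all assumptions made for the sketch.

```python
import json
import tempfile
from pathlib import Path


def save_run_record(experiment_dir, hyper_params, metrics):
    """Write hyper-parameters and metrics next to the other experiment artifacts."""
    experiment_dir = Path(experiment_dir)
    experiment_dir.mkdir(parents=True, exist_ok=True)
    record_path = experiment_dir / "run_record.json"  # hypothetical file name
    record = {"hyper_params": hyper_params, "metrics": metrics}
    # sort_keys keeps the file stable across runs, so two experiments diff cleanly
    record_path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return record_path


def load_run_record(record_path):
    """Read a record back, e.g. to compare two experiment versions."""
    return json.loads(Path(record_path).read_text(encoding="utf-8"))


# Demo in a temporary directory standing in for experiments/{experiment_version}/output
with tempfile.TemporaryDirectory() as tmp:
    path = save_run_record(Path(tmp) / "v1" / "output", {"max_depth": 6}, {"auc": 0.91})
    restored = load_run_record(path)
```

Keeping the record as plain JSON beside the model file means an experiment can be compared with another by diffing two small text files, without loading either model.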

    ## Repository Structure

    Directories:

    ```
    ```md
    |-- src/
    | |-- core/ <- Core functions and utils
    | |-- core/ <- Core functions and utils
    | |-- abstracts.(py|R)
    | |-- configuration.(py|R)
    | |-- experiment.(py|R)
    | |-- logging.(py|R)
    | |-- ...
    | |-- utils.(py|R)
    | |-- training/
    | |-- model.(py|R)
    | |-- preprocessing.(py|R)
    | |-- model.(py|R) <- Model definition
    | |-- preprocessing.(py|R) <- Preprocessing functions
    | |-- ...
    | |-- utils.(py|R)
    | |-- __init.(py|R)
    | |-- 1_load_data.(py|R)
    | |-- 2_preprocessing.(py|R)
    | |-- 3_feature_engineering.(py|R)
    | |-- 4_model_training.(py|R)
    | |-- 5_model_evaluation.(py|R)
    | |-- 1_load_data.(py|R) <- Data loading pipeline
    | |-- 2_preprocessing.(py|R) <- Data preprocessing pipeline
    | |-- 2.1_hypothesis_1.ipynb <- Hypothesis testing and data exploration notebook
    | |-- 2.2_hypothesis_2.ipynb
    | |-- 3_feature_engineering.(py|R) <- Feature engineering pipeline
    | |-- 4_model_training.(py|R) <- Model training pipeline, e.g. hyper-params optimization
    | |-- 5_model_evaluation.(py|R) <- Model evaluation pipeline
    | |-- ...
    | |-- config.yml
    | |-- config.(dev|release).yml
    | |-- config-(dev|release).yml
    | |-- secrets.yml
    | |-- secrets.(dev|release).yml
    |-- data/
    | |-- {data_version}/
    |-- experiments/
    | |-- secrets-(dev|release).yml
    |-- data/ <- Data directory (not under version control, in S3)
    | |-- {data_version}/ <- Raw data
    |-- experiments/ <- Experiment artifacts, outputs and temp files
    | |-- {experiment_version}/
    | |-- cache/
    | |-- output/
    | |-- report.md
    | |-- cache/ <- Cache for different experiment stages
    | |-- output/ <- Validation dataset, test dataset, hyper-opt artifacts, plots
    | |-- models/ or model.pkl <- Final model (or model ensemble)
    | |-- report.md <- Manual report
    | |-- changelog <- Automated report
    |-- logs/
    | |-- {experiment_name}_{stage_name}_{timestamp}.log
    |-- tests/
    | |-- unit/
    | |-- integration/
    | |-- e2e/
    |-- docs/
    |-- notebooks/
    |-- labs/ <- Jupyter notebooks and other experiments
    |-- requirements.txt
    |-- requirements-dev.txt
    |-- conda_env.yml
    |-- Dockerfile
    |-- Dockerfile.release
    |-- .dockerignore
    @@ -58,16 +79,3 @@ Directories:
    |-- LICENSE
    |-- CHANGELOG
    ```
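
The tree lists a base `config.yml` beside `config-(dev|release).yml`, but the template does not say how the two relate. One plausible reading, assumed here, is that the environment file overrides the base file key by key. The sketch below shows only that merge step on plain dictionaries; parsing the YAML files is left out, and the keys (`model`, `n_estimators`, `data_version`) are hypothetical.

```python
from copy import deepcopy


def merge_config(base, override):
    """Return a new dict: values in `override` win, nested dicts are merged key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Stand-ins for the parsed contents of config.yml and config-dev.yml (keys are hypothetical)
base_config = {"model": {"n_estimators": 500, "max_depth": 6}, "data_version": "v1"}
dev_config = {"model": {"n_estimators": 10}}

config = merge_config(base_config, dev_config)
```

With this approach the dev file only needs to list the values that differ from the base, and the base dictionary is left unmodified.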

    #### Branches

    ```
    main
    dev
    features/<name>
    fix/<bug#>
    ```

    #### `README.md` Template

    TBD
  2. codez0mb1e revised this gist Jan 28, 2024. 1 changed file with 10 additions and 10 deletions.
    20 changes: 10 additions & 10 deletions ds_project_template.md
    @@ -7,16 +7,16 @@ Directories:
    ```
    |-- src/
    | |-- core/ <- Core functions and utils
    | | |-- configuration.(py|R)
    | | |-- experiment.(py|R)
    | | |-- logging.(py|R)
    | | |-- ...
    | | |-- utils.(py|R)
    | |-- configuration.(py|R)
    | |-- experiment.(py|R)
    | |-- logging.(py|R)
    | |-- ...
    | |-- utils.(py|R)
    | |-- training/
    | | |-- model.(py|R)
    | | |-- preprocessing.(py|R)
    | | |-- ...
    | | |-- utils.(py|R)
    | |-- model.(py|R)
    | |-- preprocessing.(py|R)
    | |-- ...
    | |-- utils.(py|R)
    | |-- __init.(py|R)
    | |-- 1_load_data.(py|R)
    | |-- 2_preprocessing.(py|R)
    @@ -50,7 +50,7 @@ Directories:
    |-- Dockerfile.release
    |-- .dockerignore
    |-- .gitignore
    |-- .github/workflows
    |-- .github/workflows/
    | |-- build.yml
    | |-- release.yml
    |-- run.(sh|ps)
  3. codez0mb1e revised this gist Jan 28, 2024. 1 changed file with 20 additions and 11 deletions.
    31 changes: 20 additions & 11 deletions ds_project_template.md
    @@ -22,7 +22,7 @@ Directories:
    | |-- 2_preprocessing.(py|R)
    | |-- 3_feature_engineering.(py|R)
    | |-- 4_model_training.(py|R)
    | |-- 5_evaluate_training.(py|R)
    | |-- 5_model_evaluation.(py|R)
    | |-- ...
    | |-- config.yml
    | |-- config.(dev|release).yml
    @@ -32,33 +32,42 @@ Directories:
    | |-- {data_version}/
    |-- experiments/
    | |-- {experiment_version}/
    | |-- cache/
    | |-- output/
    | |-- report.md
    | |-- cache/
    | |-- output/
    | |-- report.md
    |-- logs/
    | |-- {experiment_name}_{stage_name}_{timestamp}.log
    |-- tests/
    |-- docs
    |-- notebooks
    |-- README.md
    |-- LICENSE
    | |-- unit/
    | |-- integration/
    | |-- e2e/
    |-- docs/
    |-- notebooks/
    |-- requirements.txt
    |-- requirements-dev.txt
    |-- conda_env.yml
    |-- Dockerfile
    |-- Dockerfile.release
    |-- .dockerignore
    |-- .gitignore
    |-- .github/workflows
    | |-- build.yml
    | |-- release.yml
    |-- run.(sh|ps)
    |-- README.md
    |-- LICENSE
    |-- CHANGELOG
    ```

    #### Branches

    ```
    main
    dev
    features/<name_issue#>
    fix/<name_bug#>
    features/<name>
    fix/<bug#>
    ```

    #### `README.md` Template

    TODO
    TBD
  4. codez0mb1e revised this gist Dec 5, 2021. 1 changed file with 49 additions and 41 deletions.
    90 changes: 49 additions & 41 deletions ds_project_template.md
    @@ -1,56 +1,64 @@
    ### Data Science Project
    ### Template for Data Science Project

    #### Repository Structure:
    #### Repository Structure

    Directories:

    ```
    src/
    | core/
    | | configuration.(py|R)
    | | experiment.(py|R)
    | | logging.(py|R)
    | | ...
    | | helpers.py
    | training/
    | | model.(py|R)
    | | preprocessing.(py|R)
    | | ...
    | | helpers.py
    | __init.(py|R)
    | 1_load_data.(py|R)
    | 2_preprocessing.(py|R)
    | 3_feature_engineering.(py|R)
    | 4_model_training.(py|R)
    | 5_evaluate_training.(py|R)
    | ...
    | config.release.yml
    | config.debug.yml
    | secrets.release.yml
    | secrets.debug.yml
    data/
    | | {data_version}/
    experiments/
    | {experiment_version}/
    | | cache/
    | | output/
    | | report.md
    logs/
    | {experiment_name}_{stage_name}_{timestamp}.log
    tests/
    README.md
    LICENSE
    requirements.txt
    Dockerfile
    |-- src/
    | |-- core/ <- Core functions and utils
    | | |-- configuration.(py|R)
    | | |-- experiment.(py|R)
    | | |-- logging.(py|R)
    | | |-- ...
    | | |-- utils.(py|R)
    | |-- training/
    | | |-- model.(py|R)
    | | |-- preprocessing.(py|R)
    | | |-- ...
    | | |-- utils.(py|R)
    | |-- __init.(py|R)
    | |-- 1_load_data.(py|R)
    | |-- 2_preprocessing.(py|R)
    | |-- 3_feature_engineering.(py|R)
    | |-- 4_model_training.(py|R)
    | |-- 5_evaluate_training.(py|R)
    | |-- ...
    | |-- config.yml
    | |-- config.(dev|release).yml
    | |-- secrets.yml
    | |-- secrets.(dev|release).yml
    |-- data/
    | |-- {data_version}/
    |-- experiments/
    | |-- {experiment_version}/
    | |-- cache/
    | |-- output/
    | |-- report.md
    |-- logs/
    | |-- {experiment_name}_{stage_name}_{timestamp}.log
    |-- tests/
    |-- docs
    |-- notebooks
    |-- README.md
    |-- LICENSE
    |-- requirements.txt
    |-- conda_env.yml
    |-- Dockerfile
    |-- Dockerfile.release
    |-- .dockerignore
    |-- .gitignore
    ```

    Branches:
    #### Branches

    ```
    main
    dev
    features/<name_issue#>
    fixes/<name_bug#>
    fix/<name_bug#>
    ```

    #### `README.md` Template

    TODO
  5. codez0mb1e revised this gist Nov 27, 2021. 1 changed file with 15 additions and 2 deletions.
    17 changes: 15 additions & 2 deletions ds_project_template.md
    @@ -1,6 +1,8 @@
    # Data Science Project
    ### Data Science Project

    ## Repository structure
    #### Repository Structure:

    Directories:

    ```
    src/
    @@ -41,3 +43,14 @@ LICENSE
    requirements.txt
    Dockerfile
    ```

    Branches:

    ```
    main
    dev
    features/<name_issue#>
    fixes/<name_bug#>
    ```


  6. codez0mb1e renamed this gist Nov 27, 2021. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  7. codez0mb1e created this gist Nov 27, 2021.
    43 changes: 43 additions & 0 deletions gistfile1.txt
    @@ -0,0 +1,43 @@
    # Data Science Project

    ## Repository structure

    ```
    src/
    | core/
    | | configuration.(py|R)
    | | experiment.(py|R)
    | | logging.(py|R)
    | | ...
    | | helpers.py
    | training/
    | | model.(py|R)
    | | preprocessing.(py|R)
    | | ...
    | | helpers.py
    | __init.(py|R)
    | 1_load_data.(py|R)
    | 2_preprocessing.(py|R)
    | 3_feature_engineering.(py|R)
    | 4_model_training.(py|R)
    | 5_evaluate_training.(py|R)
    | ...
    | config.release.yml
    | config.debug.yml
    | secrets.release.yml
    | secrets.debug.yml
    data/
    | | {data_version}/
    experiments/
    | {experiment_version}/
    | | cache/
    | | output/
    | | report.md
    logs/
    | {experiment_name}_{stage_name}_{timestamp}.log
    tests/
    README.md
    LICENSE
    requirements.txt
    Dockerfile
    ```
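
The log naming pattern in the tree, `{experiment_name}_{stage_name}_{timestamp}.log`, could be produced with a small helper like the one below. This is a sketch under assumptions: the template does not specify a timestamp format, so the compact UTC form used here is a choice made for illustration, as are the function name and the example experiment and stage names.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_file_path(experiment_name, stage_name, now=None, logs_dir="logs"):
    """Build logs/{experiment_name}_{stage_name}_{timestamp}.log."""
    # Normalise to UTC so the trailing "Z" in the timestamp is accurate
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = now.strftime("%Y%m%dT%H%M%SZ")  # assumed format; sorts chronologically
    return Path(logs_dir) / f"{experiment_name}_{stage_name}_{timestamp}.log"


# Fixed timestamp so the example is deterministic
fixed = datetime(2021, 11, 27, 12, 0, 0, tzinfo=timezone.utc)
example = log_file_path("churn", "preprocessing", now=fixed)
```

A timestamp without separators such as `:` keeps the file name valid on Windows as well as on Unix-like systems.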