
@paulinek13
Contributor

@paulinek13 paulinek13 commented Mar 13, 2025

Description

Related issue: #775

This PR is about refactoring the dataset fetching functions to improve their organization and maintainability as the codebase grows and new datasets are introduced.

πŸ› οΈ The main changes:

  1. Extracting individual dataset fetching functions from fetch_example_datasets.py into separate files (similar to how converters are handled)
  2. Moving dataset tests to a dedicated directory (tests/unit/datasets)
  3. Improving docs by sorting dataset functions alphabetically in both implementation (__init__.py) and docs (api.rst)
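The description doesn't say how the alphabetical order is kept from drifting as new datasets are added. One hypothetical option is a small unit test over the package's `__all__`. The sketch below assumes the datasets package defines `__all__` (not confirmed here), uses a stand-in module instead of importing PyRIT, and compares names case-insensitively because some of them, such as `fetch_librAI_do_not_answer_dataset`, mix cases. The helper names `unsorted_pairs` and `check_all_sorted` are illustrative only.

```python
import types


def unsorted_pairs(names):
    """Return adjacent (earlier, later) pairs that break case-insensitive alphabetical order."""
    return [(a, b) for a, b in zip(names, names[1:]) if a.casefold() > b.casefold()]


def check_all_sorted(module: types.ModuleType) -> None:
    """Raise AssertionError if the module's __all__ is not alphabetically sorted."""
    names = list(getattr(module, "__all__", []))
    bad = unsorted_pairs(names)
    if bad:
        raise AssertionError(f"{module.__name__}.__all__ is not sorted: {bad}")


# Stand-in module for illustration only; a real test would import the
# datasets package instead (assuming it defines __all__).
demo = types.ModuleType("demo_datasets")
demo.__all__ = [
    "fetch_babelscape_alert_dataset",
    "fetch_librAI_do_not_answer_dataset",
]
check_all_sorted(demo)  # passes: "fetch_b..." sorts before "fetch_l..."
```

The check reports every out-of-order neighbour rather than stopping at the first one, so a single failing run shows all the entries that need to move.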

✏️ Other modifications:

  • Added two missing functions related to datasets to the API reference: fetch_babelscape_alert_dataset and fetch_librAI_do_not_answer_dataset
  • Updated references and imports to maintain functionality after refactoring (and to fix broken tests)
  • Renamed fetch_example_datasets.py to fetch_examples.py
  • Updated .pre-commit-config.yaml
  • Updated /doc files: doc/code/datasets/0_dataset.md, doc/code/datasets/2_fetch_dataset.ipynb, doc/code/datasets/2_fetch_dataset.py

Close #775

@paulinek13 paulinek13 changed the title [DRAFT] REFACTOR: improve organization and maintainability of dataset fetch functions [DRAFT] MAINT: improve organization and maintainability of dataset fetch functions Mar 15, 2025
@paulinek13 paulinek13 changed the title [DRAFT] MAINT: improve organization and maintainability of dataset fetch functions [DRAFT] MAINT: improve organization of dataset fetch functions (refactoring) Mar 15, 2025
@paulinek13
Contributor Author

paulinek13 commented Mar 15, 2025

This is almost ready to be reviewed. I just have a question:

Should I update the blog post about Datasets and Seed Prompts since, after the changes I've made in this PR, it will no longer be up-to-date? It's about the following paragraph specifically: 2025_02_11.md#loading-datasets-with-seed-prompts

I'll absolutely update the User guide for Datasets. Just wondering whether I should also modify the blog post 😄

@romanlutz
Contributor

Awesome! We usually don't update blog posts substantially, but this is easy enough of a fix that I'm inclined to make the change. CC @eugeniavkim

I would replace

They are in the fetch_example_datasets.py file.

with

They are in the pyrit.datasets module.

@paulinek13 paulinek13 changed the title [DRAFT] MAINT: improve organization of dataset fetch functions (refactoring) MAINT: improve organization of dataset fetch functions (refactoring) Mar 15, 2025
@paulinek13 paulinek13 marked this pull request as ready for review March 15, 2025 11:35
Contributor

@romanlutz romanlutz left a comment

Thank you! This is perfect!

@paulinek13
Contributor Author

paulinek13 commented Mar 15, 2025

I see the checks are failing. I ran pytest tests/unit && pre-commit run --all-files before marking this PR as ready for review and everything passed 🤔, but I used Python 3.11.
Now I've tried it locally with Python 3.10, and it fails the same way as in the checks.

I'll try to fix the problem tomorrow 😃

@romanlutz
Contributor

There might be a naming collision since fetch_examples is both the file and function name. But that's just a guess.

@paulinek13
Contributor Author

That's right, renaming did the trick! Thank you so much!
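The thread doesn't pin down exactly which tool tripped over the collision, but the general mechanism can be sketched. The example below builds a throwaway package (the name `demo_pkg` is hypothetical) whose submodule and re-exported function are both called `fetch_examples`, and shows that attribute lookup on the package finds the function while the import system still resolves the dotted name to the module.

```python
import importlib
import inspect
import pathlib
import sys
import tempfile

# Build a throwaway package (the name "demo_pkg" is hypothetical) whose
# submodule and function are both called "fetch_examples".
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "fetch_examples.py").write_text(
    "def fetch_examples():\n    return 'data'\n"
)
# Re-export the function from the package, as a datasets __init__.py might.
(pkg_dir / "__init__.py").write_text(
    "from .fetch_examples import fetch_examples\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

# Attribute lookup on the package now finds the function, not the submodule...
via_attribute = demo_pkg.fetch_examples
# ...while the import system still resolves the dotted name to the module.
via_import = importlib.import_module("demo_pkg.fetch_examples")

print(inspect.isfunction(via_attribute))  # True
print(inspect.ismodule(via_import))  # True
```

Tools that resolve a dotted path by walking attributes end up at the function, while those that go through the import system reach the module. That is one plausible way such a collision shows up in only some environments. Renaming one of the two, as the thread reports, removes the ambiguity.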

@romanlutz romanlutz merged commit 3779df9 into Azure:main Mar 17, 2025
18 checks passed
@romanlutz
Contributor

Fantastic @paulinek13 !!! Thanks once again for a great contribution.

@paulinek13 paulinek13 deleted the refactor/775/improve_datasets_organization branch March 17, 2025 06:34

Development

Successfully merging this pull request may close these issues.

REFACTOR/DOCS: sort datasets fetch functions alphabetically

2 participants