feat: Adds update_dataset_from_dir #430
This pull request has been linked to Shortcut Story #833032: Ingest into dataset from directory.
8916a0f to 398a17d
Nice work!! Small comments regarding the usage of the functions.
nucleus/__init__.py
Outdated
@@ -1285,15 +1296,88 @@ def create_dataset_from_dir(

    if len(items) == 0:
        print(f"Did not find any items in {dirname}")
        return None
        return existing_dataset
I'm wondering whether it would actually be better to create the dataset in this case as well. Even though there were no items in the directory, we wanted to create the dataset in the first place.
I guess that's a question for @jean-lucas, since the original behaviour is not to create a dataset if the directory is empty or nothing was found.
That's a good point. I think the dataset should still be created; the original behaviour was misleading.
Addressed
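For readers following this thread, here is a minimal sketch of the behaviour agreed on above: the dataset is created first and is still returned when the directory yields no items, instead of returning None. `FakeDataset`, `find_images`, `create_dataset_from_dir_sketch`, and the extension set are hypothetical names used only for illustration; they are not the nucleus API and not the code in this PR.

```python
import os
from dataclasses import dataclass, field
from typing import List

# Assumed set of image extensions, for illustration only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


@dataclass
class FakeDataset:
    """Hypothetical stand-in for a dataset; not the real nucleus class."""

    name: str
    items: List[str] = field(default_factory=list)


def find_images(dirname: str) -> List[str]:
    """Recursively collect paths whose extension looks like an image."""
    found = []
    for root, _dirs, files in os.walk(dirname):
        for filename in files:
            if os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS:
                found.append(os.path.join(root, filename))
    return sorted(found)


def create_dataset_from_dir_sketch(dirname: str) -> FakeDataset:
    # Create the dataset up front, so an empty directory still yields a dataset.
    dataset = FakeDataset(name=os.path.basename(os.path.normpath(dirname)))
    items = find_images(dirname)
    if len(items) == 0:
        print(f"Did not find any items in {dirname}")
        return dataset  # previously this path returned None
    dataset.items.extend(items)
    return dataset
```

With this shape the caller always gets a dataset back and can check whether it is empty, rather than having to handle a None return.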
nucleus/__init__.py
Outdated
        skip_size_warning=skip_size_warning,
    )

def update_dataset_from_dir(
Maybe this one would be better placed on the dataset class itself. I think it's a better workflow to create or fetch a dataset explicitly and then ingest items into it.
Agreed, I think that's better encapsulation.
I think that's a good suggestion. It would save the user from having to copy the dataset ID.
I think we can even rename it to:
dataset.add_images_from_dir(dirname)
Addressed
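The workflow suggested in this thread (fetch or create a dataset explicitly, then ingest a directory through a method on the dataset object) could look roughly like the sketch below. `DatasetSketch`, its constructor, the `allowed_extensions` parameter, and the integer return value are assumptions made for illustration; only the method name `add_images_from_dir(dirname)` comes from the comment above, and the real signature may differ.

```python
import os
from typing import List, Sequence


class DatasetSketch:
    """Hypothetical stand-in for the SDK's dataset class."""

    def __init__(self, dataset_id: str) -> None:
        self.id = dataset_id
        self.items: List[str] = []

    def add_images_from_dir(
        self,
        dirname: str,
        allowed_extensions: Sequence[str] = (".jpg", ".jpeg", ".png"),
    ) -> int:
        """Append image paths found under dirname; return how many were added."""
        suffixes = tuple(ext.lower() for ext in allowed_extensions)
        added = 0
        for root, _dirs, files in os.walk(dirname):
            for filename in sorted(files):
                if filename.lower().endswith(suffixes):
                    self.items.append(os.path.join(root, filename))
                    added += 1
        return added
```

Because the method lives on the dataset object, the caller never passes a dataset ID around: they hold a dataset and call `dataset.add_images_from_dir(dirname)` on it.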
Looking good - can you also add a method to the CLI? See cli/* - this is a perfect use case for a CLI tool. LMK if you need an intro to click and rich.
Overall LGTM. Tried it out, worked well.
I agree with @ntamas92's comments.
Before merging though, let's update the changelog and version.
db6b48a to 54e7972
54e7972 to 5c9519e
Yes, but I think it's better to do it in a different PR.
6731c6e to ef147e0
LGTM 👍
No description provided.