Fix length in custom datasets #160
Walkthrough
The pull request adds a new method, …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant BaseDataset
    participant FilterMethod as _get_lf_idx_list
    note over BaseDataset: Dataset initialization
    Client->>BaseDataset: Instantiate dataset
    BaseDataset->>FilterMethod: Call _get_lf_idx_list()
    FilterMethod-->>BaseDataset: Return filtered index list (lf_idx_list)
    BaseDataset->>BaseDataset: Set lf_idx_list attribute
    note over BaseDataset: Cache population
    Client->>BaseDataset: Call _fill_cache()
    BaseDataset->>BaseDataset: Iterate over lf_idx_list
    BaseDataset->>Client: Process each valid labeled frame
```
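The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the sleap-nn source: MiniDataset and LabeledFrame are hypothetical stand-ins, labels are a plain list, and the filter mirrors the user-instance condition discussed in this review. The index list is computed once in the constructor, the cache is keyed by loop position, and the length comes from the index list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LabeledFrame:
    # Hypothetical stand-in for a labeled frame; only the fields used here.
    instances: List[str] = field(default_factory=list)
    user_instances: Optional[List[str]] = None


class MiniDataset:
    """Hypothetical sketch of the init -> lf_idx_list -> _fill_cache flow."""

    def __init__(self, labels: List[LabeledFrame], user_instances_only: bool = True):
        self.labels = labels
        self.user_instances_only = user_instances_only
        # Computed once at construction, independent of any cache.
        self.lf_idx_list = self._get_lf_idx_list()
        self.cache: Dict[int, List[str]] = {}

    def _get_lf_idx_list(self) -> List[int]:
        lf_idx_list = []
        for lf_idx, lf in enumerate(self.labels):
            # Filter to user instances when configured and available.
            if (
                self.user_instances_only
                and lf.user_instances is not None
                and len(lf.user_instances) > 0
            ):
                lf.instances = lf.user_instances
            # Skip frames that have no instances.
            if len(lf.instances) == 0:
                continue
            lf_idx_list.append(lf_idx)
        return lf_idx_list

    def _fill_cache(self) -> None:
        # Visit only valid frames; key the cache by loop position.
        for sample_idx, lf_idx in enumerate(self.lf_idx_list):
            self.cache[sample_idx] = self.labels[lf_idx].instances

    def __len__(self) -> int:
        return len(self.lf_idx_list)


labels = [
    LabeledFrame(instances=["a"], user_instances=["a"]),
    LabeledFrame(instances=[], user_instances=None),  # empty: skipped
    LabeledFrame(instances=["b", "c"], user_instances=["c"]),
]
ds = MiniDataset(labels)
ds._fill_cache()
print(ds.lf_idx_list, len(ds), ds.cache)  # [0, 2] 2 {0: ['a'], 1: ['c']}
```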
Actionable comments posted: 0
Nitpick comments (1)
sleap_nn/data/custom_datasets.py (1)
91-105: Excellent implementation of the new method for determining valid labeled frames
This is the core fix for the issue described in the PR. The method correctly filters labeled frames based on user instances when configured and skips frames with no instances, creating a reliable index list for the dataset.
Consider simplifying the nested if statements for better readability:
```diff
-        # Filter to user instances
-        if self.data_config.user_instances_only:
-            if lf.user_instances is not None and len(lf.user_instances) > 0:
-                lf.instances = lf.user_instances
+        # Filter to user instances
+        if self.data_config.user_instances_only and lf.user_instances is not None and len(lf.user_instances) > 0:
+            lf.instances = lf.user_instances
```

Tools
Ruff (0.8.2)
96-97: Use a single if statement instead of nested if statements (SIM102)
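Because `and` short-circuits, the flattened condition never calls len() on None, so the SIM102 suggestion should preserve behavior. A quick exhaustive check over the flag and the three shapes of user_instances (None, empty, non-empty), using a hypothetical SimpleNamespace frame in place of a real labeled frame:

```python
from itertools import product
from types import SimpleNamespace


def nested(user_instances_only, lf):
    # Original form flagged by Ruff SIM102.
    if user_instances_only:
        if lf.user_instances is not None and len(lf.user_instances) > 0:
            lf.instances = lf.user_instances
    return lf.instances


def flattened(user_instances_only, lf):
    # Suggested single-if form; `and` short-circuits before len(None).
    if user_instances_only and lf.user_instances is not None and len(lf.user_instances) > 0:
        lf.instances = lf.user_instances
    return lf.instances


def make_frame(user_instances):
    # Hypothetical minimal frame: predicted instances plus optional user instances.
    return SimpleNamespace(instances=["pred"], user_instances=user_instances)


for flag, user in product([True, False], [None, [], ["user"]]):
    assert nested(flag, make_frame(user)) == flattened(flag, make_frame(user))
print("equivalent for all cases")
```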
Review details
Files selected for processing (1)
sleap_nn/data/custom_datasets.py (7 hunks)
Additional context used
Code Definitions (1)
sleap_nn/data/custom_datasets.py (1)
sleap_nn/inference/predictors.py (4): data_config (213-214), data_config (526-534), data_config (965-970), data_config (1339-1344)
Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Tests (macos-14, Python 3.9)
- GitHub Check: Tests (windows-latest, Python 3.9)
- GitHub Check: Tests (ubuntu-latest, Python 3.9)
- GitHub Check: Lint
Additional comments (8)
sleap_nn/data/custom_datasets.py (8)
73-73: LGTM: Added initialization of lf_idx_list
This change initializes lf_idx_list in the BaseDataset constructor, which will be used to track valid labeled frames.
122-124: Good modification to use the filtered index list
This change uses lf_idx_list to iterate over valid labeled frames instead of all frames, which avoids visiting frames that would be skipped and ensures that only valid frames are processed.
167-170: Fixed cache key management for numpy chunks
Using the loop index rather than the frame index for filenames ensures sequential file numbering and proper access during loading, which is important for consistent behavior.
172-172: Consistent cache key handling for in-memory caching
This change ensures that the in-memory cache also uses a sequential index for keys, maintaining consistency with the file-based approach.
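To illustrate why loop-index keys matter, here is a sketch assuming a hypothetical sample_{i}.npz naming scheme and a hypothetical lf_idx_list of [0, 2, 5]. Files are written by enumerate position, so every index in range(len(lf_idx_list)) maps to exactly one file; keying by frame index would instead leave gaps (there would be no sample_1.npz to load).

```python
import os
import tempfile

import numpy as np

lf_idx_list = [0, 2, 5]  # hypothetical: valid frames after filtering

with tempfile.TemporaryDirectory() as cache_dir:
    # Key files by loop position, not by labeled-frame index.
    for sample_idx, lf_idx in enumerate(lf_idx_list):
        np.savez(os.path.join(cache_dir, f"sample_{sample_idx}.npz"), lf_idx=lf_idx)

    def load_sample(idx):
        # Each idx in range(len(lf_idx_list)) maps to exactly one file.
        with np.load(os.path.join(cache_dir, f"sample_{idx}.npz")) as data:
            return int(data["lf_idx"])

    names = sorted(os.listdir(cache_dir))
    loaded = [load_sample(i) for i in range(len(lf_idx_list))]

print(names)   # ['sample_0.npz', 'sample_1.npz', 'sample_2.npz']
print(loaded)  # [0, 2, 5]
```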
183-183: Key fix for the dataset length issue
This change addresses the core problem identified in the PR description. Now __len__ returns the actual number of valid labeled frames rather than depending on the cache, which solves the issue where the length would be 0 when numpy chunks are reused.
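A sketch contrasting the two __len__ strategies, assuming a hypothetical reuse_np_chunks flag that skips cache population; the class names are illustrative, not sleap-nn's. With reused chunks the cache-based length collapses to 0, while the index-list length stays correct.

```python
class OldLenDataset:
    """Hypothetical pre-fix behavior: length derived from the in-memory cache."""

    def __init__(self, lf_idx_list, reuse_np_chunks):
        self.lf_idx_list = lf_idx_list
        self.cache = {}
        if not reuse_np_chunks:
            self._fill_cache()  # skipped when existing chunks are reused

    def _fill_cache(self):
        for sample_idx, lf_idx in enumerate(self.lf_idx_list):
            self.cache[sample_idx] = lf_idx

    def __len__(self):
        return len(self.cache)


class NewLenDataset(OldLenDataset):
    """Hypothetical post-fix behavior: length derived from the filtered index list."""

    def __len__(self):
        return len(self.lf_idx_list)


lf_idx_list = [0, 2, 5]
print(len(OldLenDataset(lf_idx_list, reuse_np_chunks=True)))   # 0
print(len(NewLenDataset(lf_idx_list, reuse_np_chunks=True)))   # 3
print(len(OldLenDataset(lf_idx_list, reuse_np_chunks=False)))  # 3
```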
597-599: Correctly applying the same pattern in the CentroidDataset class
The same approach for iterating through valid labeled frames is now applied in the CentroidDataset class, ensuring consistency across the codebase.
649-652: Consistent cache management in CentroidDataset
The filename and cache key generation has been updated to match the changes in the BaseDataset class, maintaining consistency across classes.
654-654: In-memory cache consistency in CentroidDataset
This ensures that in-memory caching in CentroidDataset follows the same pattern as BaseDataset.
Codecov Report
All modified and coverable lines are covered by tests.
Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #160   +/-   ##
=======================================
  Coverage   97.00%   97.01%
=======================================
  Files          46       46
  Lines        4945     4961   +16
=======================================
+ Hits         4797     4813   +16
  Misses        148      148
```
This PR fixes a bug in computing the length of custom datasets in the sleap_nn.data.custom_datasets.BaseDataset class. The class computed the length of the dataset from self.cache. However, the cache is not populated when existing numpy chunks are reused, so len(dataset) evaluated to 0. This PR instead builds lf_idx_list, the list of user-labeled frames in the labels object, and computes the length of the dataset from this index list.