philippjfr
Member

This PR proposes an API change (or rather an additional signature) for Dataset.select.

Currently we support passing per-dimension select specifications as keyword arguments. This is generally quite convenient because in most cases dimension names are valid identifiers, so the keyword syntax, e.g. .select(x=(0, 10)), is the shortest and most convenient option. However, when the dimension name is not a valid identifier, e.g. it is a digit string or contains other characters that are not allowed in identifiers, you have to write it out using dictionary unpacking:

import holoviews as hv

x = hv.Dimension('0')

curve = hv.Curve([], kdims=[x])

# '0' is not a valid identifier, so the key has to be unpacked from a dict
curve.select(**{'0': (1, 3)})

This is not even particularly contrived, because this is what happens when you construct an element from a pandas DataFrame with default column names. While a little cumbersome, this use case at least works.
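The workaround above relies on ordinary Python call semantics rather than anything specific to HoloViews. As a minimal sketch, using a hypothetical stand-in function named `select` that only echoes its keyword arguments (it is not the library method):

```python
def select(**selection):
    """Hypothetical stand-in that returns the keyword arguments it received."""
    return selection


# A valid identifier can be passed with ordinary keyword syntax.
by_keyword = select(x=(0, 10))

# "0" is not a valid identifier, so it cannot be written as a keyword directly,
# but unpacking a dict with that string key is accepted.
assert not "0".isidentifier()
by_unpacking = select(**{"0": (1, 3)})

print(by_keyword)
print(by_unpacking)
```

Any string works as a key when unpacked this way, which is why the workaround succeeds for string names but offers no route for non-string keys.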

However, we are currently in the process of creating a new data interface for anndata, and because of its complex data model we have to create special dimension objects, which are not easily mapped onto simple string names. This is where the current select approach completely breaks down: since keyword arguments must be strings, we cannot perform a select operation on elements backed by an anndata dataset, e.g.:

x = hv.Dimension('0')

curve = hv.Curve([], kdims=[x])

curve.select(**{x: 1})

This will error because x is not a string. Therefore I propose we overload the selection_expr argument of .select, making it possible to instead write select operations as:

x = hv.Dimension('0')

curve = hv.Curve([], kdims=[x])

curve.select({x: 1})

This is fully backward compatible since it would previously just error. Before I do any more work on this I'd love to hear feedback.
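To make the proposal concrete, here is a rough sketch of how such an overload could dispatch on its first argument. It uses hypothetical stand-ins (`Dimension`, `select_kwargs_only`, `select`) rather than the actual HoloViews classes, mirrors the `isinstance(selection_expr, dict)` check visible in the review diff, and assumes that explicit keyword arguments take precedence over dict entries, which the PR description does not specify:

```python
class Dimension:
    """Hypothetical stand-in for a dimension object (not the HoloViews class)."""

    def __init__(self, name):
        self.name = name


def select_kwargs_only(**selection):
    """Stand-in for the existing keyword-only style of selection."""
    return selection


def select(selection_expr=None, **selection):
    """Sketch of the proposed overload: a dict in the first slot is a selection."""
    if isinstance(selection_expr, dict):
        # Assumption: explicit keyword arguments win over entries in the dict.
        selection = {**selection_expr, **selection}
        selection_expr = None
    return selection_expr, selection


x = Dimension("0")

# Unpacking a dict with a non-string key fails before the function body runs.
try:
    select_kwargs_only(**{x: 1})
    unpacking_failed = False
except TypeError:
    unpacking_failed = True

# Passing the same dict positionally keeps the Dimension object as the key.
expr, selection = select({x: 1})
```

Because the dict travels as a single positional argument, its keys never have to pass through Python's keyword machinery, so they can be arbitrary hashable objects.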

@philippjfr changed the title from "Support passing selection dictionary to Dataset.select" to "enh: Support passing selection dictionary to Dataset.select" on Jun 12, 2025
@hoxbro
Member

hoxbro commented Jun 12, 2025

I don't see any problem with this approach.

codecov bot commented Jun 12, 2025

Codecov Report

Attention: Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.

Project coverage is 88.83%. Comparing base (d936f70) to head (9dc1d8b).
Report is 2 commits behind head on main.

Files with missing lines            Patch %    Lines
holoviews/core/data/__init__.py     80.00%     1 Missing ⚠️
holoviews/element/raster.py         83.33%     1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6617   +/-   ##
=======================================
  Coverage   88.83%   88.83%           
=======================================
  Files         327      328    +1     
  Lines       69708    69743   +35     
=======================================
+ Hits        61922    61954   +32     
- Misses       7786     7789    +3     

@philippjfr
Member Author

Found out that Graph.select and Image.select were not aligned with the Dataset.select signature.

@philippjfr added this to the 1.21.0 milestone on Jun 13, 2025
@philippjfr added the labels "tag: API", "type: enhancement" (minor feature or improvement to an existing feature), and "tag: component: data" on Jun 13, 2025
@philippjfr merged commit 9f9d599 into main on Jun 13, 2025
14 checks passed
@philippjfr deleted the select_dictionary branch on June 13, 2025 09:40
@droumis added this to NIH-NCI on Jun 16, 2025

@@ -415,6 +415,13 @@ def select(self, selection_specs=None, **selection):
specs match the selected object.

"""
if isinstance(selection_expr, dict):
Contributor

@flying-sheep flying-sheep Jun 16, 2025

This should be Mapping. Never check isinstance(..., list) or isinstance(..., dict) unless you know exactly why you want to treat these differently from other Sequences and Mappings.
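The difference between the two checks can be illustrated with the standard library alone. In this sketch the helper names `is_dict` and `is_mapping` are hypothetical; `collections.abc.Mapping` and `types.MappingProxyType` are standard:

```python
from collections.abc import Mapping
from types import MappingProxyType


def is_dict(obj):
    """The narrow check: only dict and its subclasses pass."""
    return isinstance(obj, dict)


def is_mapping(obj):
    """The broad check: any object implementing the Mapping interface passes."""
    return isinstance(obj, Mapping)


plain = {"x": (0, 10)}
read_only = MappingProxyType(plain)  # a read-only view, not a dict subclass

print(is_dict(plain), is_mapping(plain))
print(is_dict(read_only), is_mapping(read_only))
```

A selection passed as a read-only or custom mapping would be rejected by the narrow check even though it supports the same `.items()` access that the selection code needs.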

@@ -629,6 +635,13 @@ def select(self, selection_expr=None, selection_specs=None, **selection):
or a scalar if a single value was selected
"""
from ...util.transform import dim
if isinstance(selection_expr, dict):
Contributor

ditto

@flying-sheep
Contributor

This doesn't work at all when actually passing dimensions, see https://github.com/holoviz-topics/hv-anndata/blob/586e34beabeed853be172b9982e3efa7737834cd/tests/test_interface.py#L97-L121

  1. when passing a dict[Dimension, Any], like select({A.obs["type"]: 0}), it breaks here

    data = self.interface.select(self, **selection)

    TypeError: keywords must be strings
    
  2. when passing strings that can be parsed as Dimension, it returns early here

    selection = {dim: sel for dim, sel in selection.items() if dim in sel_dims}
    if (selection_specs and not any(self.matches(sp) for sp in selection_specs)
            or (not selection and not selection_expr)):
        return self

@philippjfr
Member Author

Thanks, rather than changing all the Interface.select implementations I'll probably just map the dimension to its index inside Dataset.select.
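One possible reading of that plan, sketched with hypothetical stand-ins (the `Dimension` class and the `resolve_selection` helper below are illustrative and are not taken from the HoloViews codebase): each selection key, whether a dimension object or a plain name, is resolved to the position of the matching dimension before anything is handed on. Skipping unknown keys is an assumption that mirrors the filtering step in the snippet quoted earlier in the thread.

```python
class Dimension:
    """Hypothetical stand-in for a dimension object (not the HoloViews class)."""

    def __init__(self, name):
        self.name = name


def resolve_selection(dimensions, selection):
    """Map each selection key (Dimension object or name) to a dimension index.

    Assumption: keys that match no known dimension are silently skipped.
    """
    resolved = {}
    for key, spec in selection.items():
        for index, dim in enumerate(dimensions):
            if key is dim or key == dim.name:
                resolved[index] = spec
                break
    return resolved


dims = [Dimension("0"), Dimension("y")]

# Keys may mix dimension objects and plain names; "z" matches nothing.
resolved = resolve_selection(dims, {dims[0]: 1, "y": (0, 5), "z": 3})
print(resolved)
```

Resolving keys in one place would keep the individual interface implementations unaware of how the caller spelled each dimension.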

Labels
tag: API tag: component: data type: enhancement Minor feature or improvement to an existing feature

3 participants