Member

hoxbro commented Apr 24, 2025

Still very much a draft. A lot of the logic is currently copied and pasted from the PandasInterface.

import pandas as pd
import polars as pl
import pyarrow as pa

import holoviews as hv

hv.extension("bokeh", "matplotlib", "plotly")

data = {"a": [1, 2, 3], "b": [4, 5, 6]}
df_pandas = pd.DataFrame(data)
df_polars = pl.DataFrame(data)
table_pa = pa.table(data)

layout = hv.Curve(df_polars) + hv.Scatter(table_pa)
layout

codecov bot commented Apr 28, 2025

Codecov Report

❌ Patch coverage is 90.75713% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.06%. Comparing base (2dfe271) to head (3351438).
⚠️ Report is 3 commits behind head on main.

Files with missing lines                               Patch %   Lines
holoviews/core/data/narwhals.py                        84.57%    62 Missing ⚠️
...oloviews/tests/core/data/test_narwhalsinterface.py  92.92%    15 Missing ⚠️
holoviews/tests/core/data/test_cudfinterface.py        30.00%    7 Missing ⚠️
holoviews/util/transform.py                            72.72%    3 Missing ⚠️
holoviews/core/data/cudf.py                            50.00%    1 Missing ⚠️
holoviews/core/data/dictionary.py                      50.00%    1 Missing ⚠️
holoviews/core/util/__init__.py                        96.96%    1 Missing ⚠️
holoviews/operation/datashader.py                      93.75%    1 Missing ⚠️
holoviews/operation/element.py                         87.50%    1 Missing ⚠️
holoviews/plotting/bokeh/graphs.py                     88.88%    1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6567      +/-   ##
==========================================
+ Coverage   88.97%   89.06%   +0.09%     
==========================================
  Files         328      331       +3     
  Lines       70320    71147     +827     
==========================================
+ Hits        62570    63370     +800     
- Misses       7750     7777      +27     


@@ -645,7 +645,7 @@ def select(self, selection_expr=None, selection_specs=None, **selection):
             return self

         # Handle selection dim expression
-        if selection_expr is not None:
+        if selection_expr is not None and selection_expr.ops:
Member Author

Would also make something like hv.Dataset(df_pandas).select(selection_expr=hv.dim('a')) fail

Comment on lines 41 to 43
 datatypes = ['dataframe', 'dictionary', 'grid', 'xarray', 'multitabular',
-             'spatialpandas', 'dask_spatialpandas', 'dask', 'cuDF', 'array',
+             'spatialpandas', 'dask_spatialpandas', 'dask', 'cuDF', 'array', 'narwhals',
              'ibis']
Member Author

Open to changing the position here.

Member

Position seems fine, though the question is at what point we deprecate the dask and cuDF interfaces (if ever).

@hoxbro hoxbro marked this pull request as ready for review September 3, 2025 16:33
Member Author

hoxbro commented Sep 3, 2025

I think this PR is in a good state. There are likely some rough edges, but I don't think that should stop a review/merge.

@hoxbro hoxbro requested a review from philippjfr September 3, 2025 17:56
if isinstance(df, (nw.DataFrame, nw.LazyFrame)):
    df = df.select(list(map(str, kdims + vdims)))
if isinstance(df, nw.LazyFrame):
    df = df.collect()
Member

Which LazyFrame types does narwhals support these days? Can we check if it's backed by a dask dataframe and avoid collect for that case?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup you can do if df.implementation.is_dask():

Member Author

Have done this in 2c6388e

return False


class NarwhalsDtype:
Member

Would implement type as well if possible.

Member Author

Implemented like this:

    @property
    def type(self):
        return type(self.dtype)
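To illustrate what such a property gives callers, here is a small self-contained sketch. `DtypeWrapper`, `Numeric`, and `Int64` are hypothetical stand-ins, not the PR's `NarwhalsDtype` or real narwhals dtypes; only the body of `type` is taken from the snippet above.

```python
class DtypeWrapper:
    """Hypothetical stand-in for a wrapper that holds a dtype instance."""

    def __init__(self, dtype):
        self.dtype = dtype

    @property
    def type(self):
        # Same body as the property quoted above: the class of the wrapped dtype.
        return type(self.dtype)


# Stand-in dtype classes; real code would wrap an actual dtype instance.
class Numeric:
    pass


class Int64(Numeric):
    pass


wrapped = DtypeWrapper(Int64())
print(wrapped.type is Int64)              # True
print(issubclass(wrapped.type, Numeric))  # True
```

Exposing the class rather than the instance means callers can run `issubclass` checks against a dtype hierarchy without unwrapping the object first.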

    col = nw.col(name)
else:
    col = nw.col(name).drop_nulls()
# NOTE: Some narwhals backends (duckdb) will return nan as
Member

Hmm, there's no nanmin/nanmax?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polars defaults to ignoring NaN values, so Polars' nan_min actually propagates nan instead of ignoring them

In general, NaN values are rare outside of pandas: other libraries have proper null support (and uniformly use null to indicate missing data), so NaN only results from undefined mathematical operations like 0/0 or log(-1). My general advice is to just deal with null values (as you're already doing here with drop_nulls), let each backend deal with its own definition of null, and treat NaN as a user error. Alternatively, you can call fill_nan, but note that it's only supported on float columns.
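The null versus NaN distinction can be shown with a plain-Python analogy, where `None` plays the role of null. This is only a sketch of the idea, not narwhals code: the list comprehensions stand in for `drop_nulls` and for the `fill_nan` route mentioned above.

```python
import math

# One float column holding both kinds of "missing": None (null) and NaN.
values = [2.0, None, float("nan"), 5.0]

# Dropping nulls alone leaves NaN in place, because NaN is a valid float value.
non_null = [v for v in values if v is not None]
print(any(math.isnan(v) for v in non_null))  # True

# Optional extra step for float columns: turn NaN into null, then drop nulls.
nan_as_null = [None if isinstance(v, float) and math.isnan(v) else v for v in values]
clean = [v for v in nan_as_null if v is not None]

print(min(clean), max(clean))  # 2.0 5.0
```

If the NaN is left in, Python's built-in `min` and `max` give order-dependent results, which is one reason to decide up front whether NaN is treated as missing data or as an error.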

Member Author

There was some previous discussion in #6567 (comment) and #6567 (comment).

if isinstance(selection_mask, np.ndarray):
    # Boolean ndarray does not work, so we convert it to list
    # If the dtype is not boolean, we let narwhals error in filter
    selection_mask = selection_mask.tolist()
Member

Hmm, that sucks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey - I just checked and Polars allows this:

In [4]: df = pl.DataFrame({'a': [3,3,2,1], 'b': [3,2,2,1], 'c': [1,2,3,4]})

In [5]: df.filter((df['a']>1).to_numpy())
Out[5]:
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 3   ┆ 3   ┆ 1   │
│ 3   ┆ 2   ┆ 2   │
│ 2   ┆ 2   ┆ 3   │
└─────┴─────┴─────┘

whereas Narwhals errors

I think we should allow for this in Narwhals (at least, for the eager case). Would it be OK for you to have this supported just for the eager case?

Member

Yes, I don't think it can be supported properly for the lazy case, e.g. for our dask interface we don't support it either.

@FBruzzesi Sep 4, 2025

Hey, sorry to chime in. Without waiting for a new narwhals release or having to pin narwhals to the latest one, it is still possible to avoid casting to a list by converting the numpy array to a narwhals Series backed by the same backend:

import narwhals as nw
import numpy as np
import pyarrow as pa

frame = nw.from_native(pa.table({"a": [1,2,3]}), eager_only=True)
mask_np = np.array([True, False, True])
mask_nw = nw.new_series(
    name="mask",
    values=mask_np,
    backend=frame.implementation,
)

# or in more recent versions, even better
# mask_nw = nw.Series.from_numpy("mask", mask_np, backend=frame.implementation)

frame.filter(mask_nw)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  a: int64        |
|  ----            |
|  a: [[1,3]]      |
└──────────────────┘

Member Author

That seems more elegant, thanks @FBruzzesi.

And you are always welcome to review the code 👍

        return data.collect()[dim.name]
    else:
        return data  # Cannot slice LazyFrame
return data[dim.name]
Member

What's the return type here? E.g. for pandas this would be a pd.Series if keep_index else np.ndarray.

Member Author

It will return a nw.Series, except when compute is False, in which case a LazyFrame input returns a single-column LazyFrame.


"""
if issubclass(dataset.interface, NarwhalsInterface):
    return dataset.data
Member

Definitely a considerable broadening of our definition of DataFrame.

Member Author

Do you want to force it to pandas? Or should we try this out for now and enable it if we see too many problems with it?

Member

maximlt commented Sep 4, 2025

What is the plan for documenting this?

Member Author

hoxbro commented Sep 9, 2025

> What is the plan for documenting this?

I don't think there should be many documentation updates for this, other than the updates I pushed in 4c80fdb.

We should definitely highlight that this is now supported in the release notes and other announcements.

5 participants