feat: Support Narwhals #6567
Codecov Report
❌ Patch coverage is
@@            Coverage Diff             @@
##             main    #6567      +/-   ##
==========================================
+ Coverage   89.03%   89.08%   +0.04%
==========================================
  Files         329      331       +2
  Lines       70455    71213     +758
==========================================
+ Hits        62728    63437     +709
- Misses       7727     7776      +49
  # Handle selection dim expression
- if selection_expr is not None:
+ if selection_expr is not None and selection_expr.ops:
This would also make something like hv.Dataset(df_pandas).select(selection_expr=hv.dim('a')) fail.
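To make the effect of the new guard concrete, here is a minimal sketch. It assumes that a bare hv.dim('a') carries an empty .ops list; FakeDim and selection_is_applied are hypothetical stand-ins written for this illustration, not HoloViews code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDim:
    """Hypothetical stand-in for hv.dim: a dimension name plus queued operations."""
    name: str
    ops: list = field(default_factory=list)


def selection_is_applied(selection_expr):
    # Mirrors the updated condition from the diff: the expression must exist
    # and must have at least one operation attached.
    return selection_expr is not None and bool(selection_expr.ops)


bare = FakeDim("a")                                   # like hv.dim('a')
compared = FakeDim("a", ops=[{"fn": "gt", "args": (1,)}])  # like hv.dim('a') > 1

print(selection_is_applied(None))      # no expression given
print(selection_is_applied(bare))      # bare dim, no ops: branch is skipped
print(selection_is_applied(compared))  # dim with an op: branch runs
```

Under that assumption, a bare dim expression no longer enters the selection branch at all, so whatever happens to it is decided by the code after the guard.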
  datatypes = ['dataframe', 'dictionary', 'grid', 'xarray', 'multitabular',
-              'spatialpandas', 'dask_spatialpandas', 'dask', 'cuDF', 'array',
+              'spatialpandas', 'dask_spatialpandas', 'dask', 'cuDF', 'array', 'narwhals',
               'ibis']
I'm open to changing the position here.
The position seems fine, though the question is at what point we deprecate the dask and cuDF interfaces (if ever).
I think this PR is in a good state. There are likely some rough edges, but I don't think that should stop a review/merge.
if isinstance(df, (nw.DataFrame, nw.LazyFrame)):
    df = df.select(list(map(str, kdims + vdims)))
    if isinstance(df, nw.LazyFrame):
        df = df.collect()
Which LazyFrame types does narwhals support these days? Can we check if it's backed by a dask dataframe and avoid collect for that case?
Yup, you can do: if df.implementation.is_dask():
Have done this in 2c6388e
        return data.collect()[dim.name]
    else:
        return data  # Cannot slice LazyFrame
return data[dim.name]
What's the return type here? E.g. for pandas this would be a pd.Series if keep_index else np.ndarray.
It will return a nw.Series, except if compute is False, in which case a LazyFrame input will return a single-column LazyFrame.
"""
if issubclass(dataset.interface, NarwhalsInterface):
    return dataset.data
Definitely a considerable broadening of our definition of DataFrame.
Do you want to force it to pandas? Or should we try this out for now and enable it if we see too many problems with it?
What is the plan for documenting this?
I don't think there should be many documentation updates for this, other than the updates I pushed in 4c80fdb. We should definitely highlight that this is now supported in the release notes and other announcements.
Still very much a draft... A lot of the logic is currently copied and pasted from the PandasInterface.