test: Add unit testing using local polars-based engine #1007
ee86291 to 5df15b3
noxfile.py (Outdated)
UNIT_TEST_EXTRAS: List[str] = ["polars"]
UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {}
Let's make sure we run some unit tests without polars to prevent accidental hard dependencies on it.
Suggested change:
- UNIT_TEST_EXTRAS: List[str] = ["polars"]
- UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {}
+ UNIT_TEST_EXTRAS: List[str] = []
+ UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {"3.12": ["polars"]}
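A minimal sketch of how the suggested settings could take effect, assuming the noxfile follows the common googleapis template in which a non-empty per-Python mapping takes precedence over the flat list. `resolve_extras` and `install_target` are hypothetical helpers, not the noxfile's actual code. Under that assumption only the Python 3.12 session installs polars, so every other version runs the unit tests without it.

```python
from typing import Dict, List

UNIT_TEST_EXTRAS: List[str] = []
UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {"3.12": ["polars"]}


def resolve_extras(python_version: str) -> List[str]:
    """Hypothetical helper: choose the extras for one unit-test session.

    Assumes the per-Python mapping, when non-empty, overrides the flat list.
    """
    if UNIT_TEST_EXTRAS_BY_PYTHON:
        return UNIT_TEST_EXTRAS_BY_PYTHON.get(python_version, [])
    return list(UNIT_TEST_EXTRAS)


def install_target(python_version: str) -> str:
    """Hypothetical helper: build the editable-install target, e.g. '.[polars]' or '.'."""
    extras = resolve_extras(python_version)
    return f".[{','.join(extras)}]" if extras else "."
```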
Should be extended to dispatch based on bigframes schema types.
"""

@functools.singledispatchmethod
Sounds like this is causing a hard-ish dependency on polars if this module is ever imported. Might be able to exclude the method definition with an if block.
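One way to follow this suggestion, sketched under assumptions: probe for polars once at import time and define the polars-specific dispatch methods only inside an `if` block, so importing the module never requires polars and the error only surfaces when the polars engine is actually requested. `POLARS_INSTALLED`, `PolarsExpressionCompiler`, and `get_compiler` are hypothetical names; only the error message is borrowed from the diff.

```python
import functools
import importlib.util
from typing import Any

# True only if polars can be imported in this environment.
POLARS_INSTALLED = importlib.util.find_spec("polars") is not None

if POLARS_INSTALLED:
    import polars as pl

    class PolarsExpressionCompiler:  # hypothetical name
        """Defined only when polars is importable, so pl.* references never fail."""

        @functools.singledispatchmethod
        def compile_expression(self, expression: Any):
            raise NotImplementedError(f"Cannot compile {type(expression).__name__}")

        @compile_expression.register
        def _(self, expression: int):
            # Plain integers become polars literal expressions.
            return pl.lit(expression)


def get_compiler():
    """Hypothetical entry point: fail only when the polars engine is requested."""
    if not POLARS_INSTALLED:
        raise ImportError("Polars is not installed, cannot compile to polars engine.")
    return PolarsExpressionCompiler()
```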
"Groupby rolling windows not yet implemented in polars engine" | ||
) | ||
# polars is columnar, so this is efficient | ||
# TODO: why can't just add collumns? |
Suggested change:
- # TODO: why can't just add collumns?
+ # TODO: why can't just add columns?
def get_args(
    self,
    agg: ex.Aggregation,
) -> Sequence[pl.Expr]:
Could we add a docstring here explaining why this is needed (projection-y stuff, IIRC)?
"Polars is not installed, cannot compile to polars engine." | ||
) | ||
|
||
# Polars has incomplete slice support in lazy mode |
Maybe add a TODO to generalize BFET -> BFET rewrites.
added comment

@compile_node.register
def compile_rowcount(self, node: nodes.RowCountNode):
    rows = self.compile_node(node.child).count()[0]
Maybe we can make this lazy?
made lazy with pl.len()

@compile_node.register
def compile_slice(self, node: nodes.SliceNode):
    return self.compile_node(node.child)[node.start : node.stop : node.step]
Dead code with the rewrites?
removed this
    return self.compile_node(node.child)[node.start : node.stop : node.step]

@compile_node.register
def compile_join(self, node: nodes.JoinNode):
Maybe add a comment saying we always do ordered mode locally since it's easy with polars?
added
@@ -31,12 +31,15 @@ def to_forward_offsets(
     elif start < 0:
         start = max(0, input_rows + start)
     else:
-        start = min(start, input_rows)
+        start = min(start, input_rows - 1) if step < 0 else start
Maybe add a comment about why the behavior differs depending on step.
added
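The asymmetry matches Python's own slice normalization: `slice.indices()` clamps a non-negative start to `len - 1` when stepping backwards (the first row visited must exist) and to `len` when stepping forwards (starting at or past the end simply yields nothing). The sketch below covers only the non-negative `start` branch shown in the diff. `clamp_start` is a hypothetical helper; unlike the diff, it also clamps forward starts to `input_rows` so its results can be compared one-to-one with `slice.indices()`.

```python
def clamp_start(start: int, step: int, input_rows: int) -> int:
    """Hypothetical helper: clamp a non-negative slice start to a valid offset."""
    if step < 0:
        # Walking backwards, the first row visited must exist, so the
        # largest usable start is the last row index (input_rows - 1).
        return min(start, input_rows - 1)
    # Walking forwards, a start at or past the end just yields an empty
    # slice, so clamping to input_rows (one past the last row) suffices.
    return min(start, input_rows)


# Cross-check against Python's built-in slice normalization.
for input_rows in range(6):
    for start in range(8):
        for step in (1, 2, -1, -3):
            expected = slice(start, None, step).indices(input_rows)[0]
            assert clamp_start(start, step, input_rows) == expected
```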
from typing import Mapping, Optional, Union
import weakref

import polars
Should use polars = pytest.importorskip("polars").
Or maybe that isn't necessary, since this isn't a _test.py file.
Seems to be running fine for now, since polars is only imported in fixtures used by the polars tests.
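A minimal sketch of the pattern described in the reply, assuming pytest: calling `pytest.importorskip` inside the fixture means importing the module never needs polars, and any test that requests the fixture is skipped rather than failed when polars is missing. `polars_frame`, its data, and the example test are hypothetical.

```python
import pytest


@pytest.fixture(scope="module")
def polars_frame():
    # Import polars only when a test actually requests this fixture;
    # importorskip returns the module, or skips the requesting test if absent.
    polars = pytest.importorskip("polars")
    return polars.DataFrame({"int64_col": [1, 2, 3]})


def test_polars_frame_has_three_rows(polars_frame):
    assert polars_frame.height == 3
```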
tests/unit/test_local_engine.py (Outdated)

@pytest.fixture(scope="module")
def test_frame() -> pd.DataFrame:
Could we rename this? The leading test_ threw me for a loop before I realized it was a fixture.
renamed
tests/unit/test_local_engine.py (Outdated)

bf_result = (bf_df_1 + bf_df_2).to_pandas()
pd_result = pd_df_1 + pd_df_2
# Sort by index because ordering logic isn't quite consistent yet
Maybe fixed now?
Eh, I made it consistent with the SQL engine now, but still not with pandas. I think pandas' ordering is a bit of a runtime decision.
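A minimal sketch of the sort-then-compare approach used in this test, assuming pandas: sorting both frames by index before `pd.testing.assert_frame_equal` makes the comparison insensitive to row order while the engines still disagree on ordering. `assert_frame_equal_ignoring_row_order` and the frames are hypothetical.

```python
import pandas as pd


def assert_frame_equal_ignoring_row_order(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Hypothetical test helper: compare two frames after sorting both by index."""
    pd.testing.assert_frame_equal(left.sort_index(), right.sort_index())


# Same rows in a different order; a plain assert_frame_equal would fail here.
expected = pd.DataFrame({"x": [10, 20, 30]}, index=[0, 1, 2])
actual = pd.DataFrame({"x": [30, 10, 20]}, index=[2, 0, 1])
assert_frame_equal_ignoring_row_order(actual, expected)
```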
tests/unit/test_local_engine.py (Outdated)
(1, None, None),
(None, 4, None),
(None, None, 2),
(None, 50000000000, 1),
Suggested change:
- (None, 50000000000, 1),
+ (None, 50_000_000_000, 1),
fixed
tests/unit/test_local_engine.py (Outdated)
(5, 4, None),
(3, None, 2),
(1, 7, 2),
(1, 7, 50000000000),
Suggested change:
- (1, 7, 50000000000),
+ (1, 7, 50_000_000_000),
fixed
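These edge cases are worth keeping because out-of-range bounds and steps are legal in Python slicing and are simply clamped. A sketch, assuming pandas is the reference implementation as in the test: `frame.iloc[start:stop:step]` should agree with plain list slicing for every case, including the `50_000_000_000` stop and step. `SLICE_CASES` reproduces only the tuples visible in this review; the test's full parameter list may contain more.

```python
import pandas as pd

# (start, stop, step) tuples visible in the review comments above.
SLICE_CASES = [
    (1, None, None),
    (None, 4, None),
    (None, None, 2),
    (None, 50_000_000_000, 1),
    (5, 4, None),
    (3, None, 2),
    (1, 7, 2),
    (1, 7, 50_000_000_000),
]

values = list(range(10))
frame = pd.DataFrame({"value": values})

for start, stop, step in SLICE_CASES:
    # pandas positional slicing (the reference used by the test) ...
    from_pandas = frame.iloc[start:stop:step]["value"].tolist()
    # ... clamps out-of-range bounds exactly like list slicing does.
    assert from_pandas == values[start:stop:step], (start, stop, step)
```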
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕