
test: Add unit testing using local polars-based engine #1007

Merged
TrevorBergeron merged 29 commits into main from local_executor
Nov 16, 2024

TrevorBergeron
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Sep 20, 2024
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Sep 21, 2024
@TrevorBergeron TrevorBergeron changed the title from "Prototype polars local engine" to "test: Add unit testing using local polars-based engine" Oct 2, 2024
@TrevorBergeron TrevorBergeron marked this pull request as ready for review October 18, 2024 00:18
@TrevorBergeron TrevorBergeron requested review from a team as code owners October 18, 2024 00:18
@TrevorBergeron TrevorBergeron requested a review from shobsi October 18, 2024 00:18
@TrevorBergeron TrevorBergeron requested a review from tswast October 18, 2024 00:18
noxfile.py Outdated
Comment on lines 63 to 64
UNIT_TEST_EXTRAS: List[str] = ["polars"]
UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {}
Collaborator

Let's make sure we run some unit tests without polars to prevent accidental hard dependencies on it.

Suggested change
UNIT_TEST_EXTRAS: List[str] = ["polars"]
UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {}
UNIT_TEST_EXTRAS: List[str] = []
UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {"3.12": ["polars"]}
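The suggested change only works if the noxfile picks one of the two lists per session Python version. A minimal sketch of that selection, assuming a version-specific entry replaces the default list rather than extending it; `extras_for` and `install_target` are hypothetical helpers for illustration, not functions from the noxfile:

```python
from typing import Dict, List

# Values from the suggested change: polars is installed only for the 3.12
# unit-test session, so every other Python version runs without it.
UNIT_TEST_EXTRAS: List[str] = []
UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {"3.12": ["polars"]}


def extras_for(python_version: str) -> List[str]:
    """Hypothetical helper: extras to install for one session's Python version.

    Assumes a version-specific entry replaces the default list entirely.
    """
    return UNIT_TEST_EXTRAS_BY_PYTHON.get(python_version, UNIT_TEST_EXTRAS)


def install_target(python_version: str) -> str:
    """Hypothetical helper: the editable-install target handed to pip."""
    extras = extras_for(python_version)
    # ".[polars]" installs the package plus the polars extra; "." installs it alone.
    return f".[{','.join(extras)}]" if extras else "."


for version in ("3.9", "3.12"):
    print(version, install_target(version))
```

With this split, a stray top-level `import polars` in library code breaks the non-3.12 unit-test sessions, which is the accidental hard dependency the comment is meant to catch.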

Should be extended to dispatch based on bigframes schema types.
"""

@functools.singledispatchmethod
Collaborator

Sounds like this is causing a hard-ish dependency on polars if this module is ever imported. Might be able to exclude the method definition with an if block.
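One way to follow this suggestion is to define the polars-specific compiler only when the import succeeds, so that importing the module never requires polars. A minimal sketch, assuming a module-level flag; `POLARS_INSTALLED` and `PolarsCompiler` are hypothetical names, and only the error message is quoted from the PR:

```python
import functools

try:
    import polars as pl  # optional dependency; may be absent
except ImportError:
    pl = None

POLARS_INSTALLED = pl is not None

if POLARS_INSTALLED:

    class PolarsCompiler:
        """Hypothetical compiler; only defined when polars can be imported."""

        @functools.singledispatchmethod
        def compile_node(self, node):
            # Fallback for node types that have no registered handler.
            raise NotImplementedError(f"Cannot compile {type(node).__name__}")

else:

    class PolarsCompiler:  # type: ignore[no-redef]
        """Placeholder that fails only when someone actually uses it."""

        def __init__(self, *args, **kwargs):
            raise ImportError(
                "Polars is not installed, cannot compile to polars engine."
            )
```

Keeping the `@compile_node.register` handlers inside the guarded branch also means any polars types in their annotations are only evaluated when polars is importable.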

"Groupby rolling windows not yet implemented in polars engine"
)
# polars is columnar, so this is efficient
# TODO: why can't just add collumns?
Collaborator

Suggested change
# TODO: why can't just add collumns?
# TODO: why can't just add columns?

def get_args(
self,
agg: ex.Aggregation,
) -> Sequence[pl.Expr]:
Collaborator

Could we add a docstring here explaining why it's needed (the projection-related stuff, IIRC)?

"Polars is not installed, cannot compile to polars engine."
)

# Polars has incomplete slice support in lazy mode
Collaborator

Maybe add a TODO to generalize BFET -> BFET rewrites.

Contributor Author

added comment


@compile_node.register
def compile_rowcount(self, node: nodes.RowCountNode):
rows = self.compile_node(node.child).count()[0]
Collaborator

Maybe we can make this lazy?

Contributor Author

made lazy with pl.len()


@compile_node.register
def compile_slice(self, node: nodes.SliceNode):
return self.compile_node(node.child)[node.start : node.stop : node.step]
Collaborator

Dead code with the rewrites?

Contributor Author

removed this

@compile_node.register
def compile_join(self, node: nodes.JoinNode):
Collaborator

Maybe add a comment saying we always do ordered mode locally since it's easy with polars?

Contributor Author

added
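One way to read "ordered mode" for a join is that the output is sorted by left row position first and right row position second, so results are deterministic without a separate sort. A minimal pure-Python sketch of an inner join with that ordering, assuming this definition; `ordered_inner_join` is a hypothetical helper, not the polars engine's code:

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def ordered_inner_join(
    left_keys: Sequence[Hashable], right_keys: Sequence[Hashable]
) -> List[Tuple[int, int]]:
    """Return (left_offset, right_offset) pairs ordered by left, then right."""
    # Hash index on the right side; offsets are appended in ascending order.
    right_index: Dict[Hashable, List[int]] = defaultdict(list)
    for j, key in enumerate(right_keys):
        right_index[key].append(j)

    pairs: List[Tuple[int, int]] = []
    # Probe in left row order, so output order follows the left input first.
    for i, key in enumerate(left_keys):
        for j in right_index.get(key, []):
            pairs.append((i, j))
    return pairs


print(ordered_inner_join(["b", "a", "b"], ["a", "b", "b", "c"]))
# → [(0, 1), (0, 2), (1, 0), (2, 1), (2, 2)]
```

In a dataframe engine the same effect is usually achieved by carrying a row-offset column for each side and sorting on (left offset, right offset) after the join.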

@@ -31,12 +31,15 @@ def to_forward_offsets(
elif start < 0:
start = max(0, input_rows + start)
else:
start = min(start, input_rows)
start = min(start, input_rows - 1) if step < 0 else start
Collaborator

Maybe add a comment about why the behavior differs depending on step.

Contributor Author

added
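The clamp depends on the sign of `step` because the two directions have different valid ranges. Walking forward, `start == n` simply selects nothing and the exclusive `stop` may be `n`. Walking backward, the first row visited must be a real row (at most `n - 1`), and the exclusive `stop` may be `-1`, meaning "continue past row 0". A minimal sketch of that normalization, following the clamping rules of Python's built-in `slice.indices`; `normalize_slice` is a hypothetical name, not the PR's `to_forward_offsets`:

```python
from typing import Optional, Tuple


def normalize_slice(
    n: int, start: Optional[int], stop: Optional[int], step: Optional[int]
) -> Tuple[int, int, int]:
    """Clamp slice bounds for a sequence of n rows (hypothetical helper).

    Returns (start, stop, step) such that range(start, stop, step) lists
    exactly the row offsets selected by rows[start:stop:step].
    """
    step = 1 if step is None else step
    if step == 0:
        raise ValueError("slice step cannot be zero")
    if step > 0:
        # Forward: bounds live in [0, n]; stop == n means "through the last row".
        lower, upper = 0, n
    else:
        # Backward: the first visited row is at most n - 1, and stop == -1
        # means "continue past row 0".
        lower, upper = -1, n - 1

    def clamp(index: Optional[int], default: int) -> int:
        if index is None:
            return default
        if index < 0:
            # Negative indices count from the end; anything before row 0 clamps.
            return max(index + n, lower)
        # Indices past the end (e.g. 50_000_000_000) clamp to the upper bound.
        return min(index, upper)

    start_default, stop_default = (lower, upper) if step > 0 else (upper, lower)
    return clamp(start, start_default), clamp(stop, stop_default), step


# Slice cases in the style of the PR's parametrized tests, on 10 rows.
for case in [(1, None, None), (None, 4, None), (None, 50_000_000_000, 1), (1, 7, 2)]:
    bounds = normalize_slice(10, *case)
    print(case, bounds, list(range(*bounds)))
```

Checking a helper like this against `list(range(n))[start:stop:step]` over a grid of values, including the oversized bounds from the test cases, is a cheap way to catch off-by-one errors at the row boundaries.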

from typing import Mapping, Optional, Union
import weakref

import polars
Collaborator

Should use polars = pytest.importorskip("polars")

Or maybe not necessary since this isn't a _test.py file.

Contributor Author

seems to be running fine for now, since polars is only imported in fixtures used by the polars tests



@pytest.fixture(scope="module")
def test_frame() -> pd.DataFrame:
Collaborator

Could we rename this? The leading test_ threw me for a loop before I realized it was a fixture.

Contributor Author

renamed


bf_result = (bf_df_1 + bf_df_2).to_pandas()
pd_result = pd_df_1 + pd_df_2
# Sort by index because ordering logic isn't quite consistent yet
Collaborator

Maybe fixed now?

Contributor Author

eh, I made it consistent with the SQL engine now, but still not with pandas. Pandas ordering is, I think, a bit of a runtime decision.

(1, None, None),
(None, 4, None),
(None, None, 2),
(None, 50000000000, 1),
Collaborator

Suggested change
(None, 50000000000, 1),
(None, 50_000_000_000, 1),

Contributor Author

fixed

(5, 4, None),
(3, None, 2),
(1, 7, 2),
(1, 7, 50000000000),
Collaborator

Suggested change
(1, 7, 50000000000),
(1, 7, 50_000_000_000),

Contributor Author

fixed

@TrevorBergeron TrevorBergeron enabled auto-merge (squash) November 16, 2024 00:25
@TrevorBergeron TrevorBergeron merged commit 0a5f4ee into main Nov 16, 2024
22 of 23 checks passed
@TrevorBergeron TrevorBergeron deleted the local_executor branch November 16, 2024 01:37
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
3 participants