test: Add unit testing using local polars-based engine #1007

Merged

TrevorBergeron merged 29 commits into main from local_executor

Nov 16, 2024

Contributor

TrevorBergeron commented Sep 20, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

product-auto-label bot added size: m api: bigquery labels


          Prototype polars local engine

5df15b3

TrevorBergeron force-pushed the local_executor branch from ee86291 to 5df15b3

September 20, 2024 22:35


          support more node types in polars

1c00699

product-auto-label bot added size: l and removed size: m labels

TrevorBergeron added 5 commits

September 22, 2024 01:13


          add explode support

951b289


          further fixes

90fec84


          Merge remote-tracking branch 'github/main' into local_executor

612ef7a


          define test session

ec5a123


          define test session

5f073fd

TrevorBergeron changed the title ~~Prototype polars local engine~~ test: Add unit testing using local polars-based engine

TrevorBergeron added 8 commits

October 2, 2024 21:52


          setup optional polars dependecy for unit tests

2c7593a


          fix mypy issues

cacb773


          workaround large pyarrow string casting to pandas

998379f


          Merge remote-tracking branch 'github/main' into local_executor

0a8bf42


          fix test execution

3c2679d


          Merge remote-tracking branch 'github/main' into local_executor

ffccdf4


          Merge remote-tracking branch 'github/main' into local_executor

e890012


          fix unwanted polars import

7d06ca2

TrevorBergeron marked this pull request as ready for review

October 18, 2024 00:18

TrevorBergeron requested review from a team as code owners

October 18, 2024 00:18

TrevorBergeron requested a review from shobsi

October 18, 2024 00:18

blunderbuss-gcf bot assigned mattyopl

TrevorBergeron requested a review from tswast

October 18, 2024 00:18

TrevorBergeron added 3 commits

October 18, 2024 17:43


          Merge remote-tracking branch 'github/main' into local_executor

c276be6


          guard against polars non-existence

e7f6758


          Merge remote-tracking branch 'github/main' into local_executor

309b9af

TrevorBergeron added 8 commits

October 24, 2024 23:47


          Merge remote-tracking branch 'github/main' into local_executor

b82622b


          force test session to always use inline path

9cc6925


          fix mypy issue

922c92b


          fix doctest by exlucding polars module

ec406ce


          Merge remote-tracking branch 'github/main' into local_executor

04f7750


          Merge remote-tracking branch 'github/main' into local_executor

becba3a


          amend optional import pattern

3392c1d


          rearrange polars compile module

23489f5

tswast reviewed

noxfile.py Outdated

Comment on lines 63 to 64

		UNIT_TEST_EXTRAS: List[str] = ["polars"]
		UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {}

Collaborator

tswast Nov 14, 2024

Let's make sure we run some unit tests without polars to prevent accidental hard dependencies on it.

Suggested change

      
-            UNIT_TEST_EXTRAS: List[str] = ["polars"]
-            UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {}
+            UNIT_TEST_EXTRAS: List[str] = []
+            UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {"3.12": ["polars"]}
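A minimal sketch of what the suggested change would achieve, assuming a per-version entry overrides the default list. The thread does not show how the noxfile consumes these two constants, so `resolve_extras` and `install_target` are hypothetical helpers for illustration only.

```python
from typing import Dict, List

# Values from the suggested change: no extras by default, polars only on 3.12.
UNIT_TEST_EXTRAS: List[str] = []
UNIT_TEST_EXTRAS_BY_PYTHON: Dict[str, List[str]] = {"3.12": ["polars"]}


def resolve_extras(python_version: str) -> List[str]:
    """Hypothetical helper: a per-version entry wins over the default list."""
    return UNIT_TEST_EXTRAS_BY_PYTHON.get(python_version, UNIT_TEST_EXTRAS)


def install_target(python_version: str) -> str:
    """Build a pip-style install target such as '.[polars]' or '.'."""
    extras = resolve_extras(python_version)
    return f".[{','.join(extras)}]" if extras else "."


for version in ("3.10", "3.12"):
    print(version, install_target(version))
```

Under this assumption, only the 3.12 session installs polars, so every other session would fail on an accidental hard import of it.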

bigframes/core/compile/polars/compiler.py Outdated

+                  Should be extended to dispatch based on bigframes schema types.
+                  """
+                  @functools.singledispatchmethod

Collaborator

tswast Nov 14, 2024

Sounds like this is causing a hard-ish dependency on polars if this module is ever imported. Might be able to exclude the method definition with an if block.
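One way the "if block" idea could look, as a sketch rather than the code that was merged. The class name `LiteralCompiler`, the flag `polars_installed`, and the helper `get_compiler` are hypothetical; the point is only that nothing touching `pl` runs at import time when polars is absent.

```python
import functools

try:
    import polars as pl

    polars_installed = True
except ImportError:  # polars is optional, so importing this module must still succeed
    polars_installed = False

if polars_installed:

    class LiteralCompiler:  # hypothetical class, defined only when polars exists
        @functools.singledispatchmethod
        def compile(self, value):
            raise NotImplementedError(f"Unsupported type: {type(value)}")

        @compile.register(int)
        def _(self, value):
            # Wrap a Python int as a polars literal expression.
            return pl.lit(value)


def get_compiler():
    """Fail at call time with a clear message instead of at import time."""
    if not polars_installed:
        raise ImportError("Polars is not installed, cannot compile to polars engine.")
    return LiteralCompiler()
```

Callers that never request the polars engine never hit the `ImportError`.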

bigframes/core/compile/polars/compiler.py Outdated

+                                  "Groupby rolling windows not yet implemented in polars engine"
+                              )
+                          # polars is columnar, so this is efficient
+                          # TODO: why can't just add collumns?

Collaborator

tswast Nov 14, 2024

Suggested change

      
-                        # TODO: why can't just add collumns?
+                        # TODO: why can't just add columns?

TrevorBergeron added 2 commits

November 14, 2024 23:02


          Merge remote-tracking branch 'github/main' into local_executor

48bf099


          fix various issues

c5ede3f

tswast reviewed

bigframes/core/compile/polars/compiler.py

+                      def get_args(
+                          self,
+                          agg: ex.Aggregation,
+                      ) -> Sequence[pl.Expr]:

Collaborator

tswast Nov 15, 2024

Could we add a doc string here explaining the need for it (projection-y stuff, IIRC?)

tswast approved these changes

bigframes/core/compile/polars/compiler.py

+                              "Polars is not installed, cannot compile to polars engine."
+                          )
+                      # Polars has incomplete slice support in lazy mode

Collaborator

tswast Nov 15, 2024

Maybe TODO to generalize BFET -> BFET rewrites

Contributor Author

TrevorBergeron Nov 16, 2024

added comment

bigframes/core/compile/polars/compiler.py Outdated

+                  @compile_node.register
+                  def compile_rowcount(self, node: nodes.RowCountNode):
+                      rows = self.compile_node(node.child).count()[0]

Collaborator

tswast Nov 15, 2024

Maybe we can make this lazy?

Contributor Author

TrevorBergeron Nov 16, 2024

made lazy with pl.len()
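For readers unfamiliar with the difference, here is a small sketch of a lazy row count using `pl.len()`. It is not the merged implementation; `lazy_row_count` is a hypothetical helper, and the import guard is there only so the snippet still loads when polars is absent.

```python
try:
    import polars as pl
except ImportError:  # optional dependency; skip the demo if it is missing
    pl = None


def lazy_row_count(frame):
    """Return a one-row LazyFrame holding the row count; nothing executes yet."""
    return frame.select(pl.len().alias("count"))


if pl is not None:
    lf = pl.LazyFrame({"a": [1, 2, 3]})
    counted = lazy_row_count(lf)  # still a LazyFrame, no work done so far
    n = counted.collect().item()  # execution happens here
    print(n)
```

The count stays part of the query plan until `collect()` is called, which is what makes it lazy.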

bigframes/core/compile/polars/compiler.py Outdated

+                  @compile_node.register
+                  def compile_slice(self, node: nodes.SliceNode):
+                      return self.compile_node(node.child)[node.start : node.stop : node.step]

Collaborator

tswast Nov 15, 2024

Dead code with the rewrites?

Contributor Author

TrevorBergeron Nov 16, 2024

removed this

bigframes/core/compile/polars/compiler.py

+                      return self.compile_node(node.child)[node.start : node.stop : node.step]
+                  @compile_node.register
+                  def compile_join(self, node: nodes.JoinNode):

Collaborator

tswast Nov 15, 2024

Maybe add a comment saying we always do ordered mode locally since it's easy with polars?

Contributor Author

TrevorBergeron Nov 16, 2024

added

bigframes/core/slices.py

@@ -31,12 +31,15 @@ def to_forward_offsets(
                   elif start < 0:
                       start = max(0, input_rows + start)
                   else:
-                      start = min(start, input_rows)
+                      start = min(start, input_rows - 1) if step < 0 else start

Collaborator

tswast Nov 15, 2024

Maybe comment about why different behavior depending on step.

Contributor Author

TrevorBergeron Nov 16, 2024

added
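As background on why the step direction matters, Python's built-in `slice.indices` also clamps an out-of-range start differently for forward and reverse slices. This illustrates the general slicing rule only; it is not the `to_forward_offsets` implementation.

```python
rows = list(range(5))  # stand-in for a 5-row input

# Forward step: an oversized start clamps to len(rows), giving an empty slice.
forward = slice(10, None, 1).indices(len(rows))

# Reverse step: an oversized start clamps to len(rows) - 1, the last valid row.
reverse = slice(10, None, -1).indices(len(rows))

print(forward, rows[10::1])
print(reverse, rows[10::-1])
```

With a negative step the slice must begin at the last existing row, so clamping to `input_rows - 1` mirrors what plain Python sequences do.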

tests/unit/polars_session.py

+              from typing import Mapping, Optional, Union
+              import weakref
+              import polars

Collaborator

tswast Nov 15, 2024

Should use polars = pytest.importorskip("polars")

Or maybe not necessary since this isn't a _test.py file.

Contributor Author

TrevorBergeron Nov 16, 2024

seems to be running fine for now as importing only in fixtures used for polars tests

tests/unit/test_local_engine.py Outdated



		@pytest.fixture(scope="module")
		def test_frame() -> pd.DataFrame:

Collaborator

tswast Nov 15, 2024

Could we rename this? The leading test_ threw me for a loop before I realized it was a fixture.

Contributor Author

TrevorBergeron Nov 16, 2024

renamed

tests/unit/test_local_engine.py Outdated

+                  bf_result = (bf_df_1 + bf_df_2).to_pandas()
+                  pd_result = pd_df_1 + pd_df_2
+                  # Sort by index because ordering logic isn't quite consistent yet

Collaborator

tswast Nov 15, 2024

Maybe fixed now?

Contributor Author

TrevorBergeron Nov 16, 2024

eh, I made it consistent with the SQL engine now, but still not with pandas. Pandas order, I think, is a bit of a runtime decision

tests/unit/test_local_engine.py Outdated

+                      (1, None, None),
+                      (None, 4, None),
+                      (None, None, 2),
+                      (None, 50000000000, 1),

Collaborator

tswast Nov 15, 2024

Suggested change

      
-                    (None, 50000000000, 1),
+                    (None, 50_000_000_000, 1),

Contributor Author

TrevorBergeron Nov 16, 2024

fixed

tests/unit/test_local_engine.py Outdated

+                      (5, 4, None),
+                      (3, None, 2),
+                      (1, 7, 2),
+                      (1, 7, 50000000000),

Collaborator

tswast Nov 15, 2024

Suggested change

      
-                    (1, 7, 50000000000),
+                    (1, 7, 50_000_000_000),

Contributor Author

TrevorBergeron Nov 16, 2024

fixed
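A quick check that the underscore form is purely cosmetic. The range comparison is an aside: the thread does not say the value was chosen to exceed 32 bits, so treat that reading as an assumption.

```python
big = 50_000_000_000  # underscores are ignored by the parser (PEP 515)

same_value = big == 50000000000

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

# Assumption: a value this size is useful because it only fits in 64-bit integers.
fits_int32 = big <= INT32_MAX
fits_int64 = big <= INT64_MAX

print(same_value, fits_int32, fits_int64)
```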


          pr comments

bc3d0c7

TrevorBergeron enabled auto-merge (squash)

November 16, 2024 00:25

TrevorBergeron disabled auto-merge

November 16, 2024 00:25

TrevorBergeron merged commit 0a5f4ee into main

22 of 23 checks passed

TrevorBergeron deleted the local_executor branch

November 16, 2024 01:37

Labels

api: bigquery size: l