feat: Allow iloc to support lists of negative indices #1497
tests/system/small/test_dataframe.py
Outdated
@@ -4409,7 +4409,7 @@ def test_loc_list_multiindex(scalars_dfs_maybe_ordered):

 def test_iloc_list(scalars_df_index, scalars_pandas_df_index):
-    index_list = [0, 0, 0, 5, 4, 7]
+    index_list = [0, 0, 0, 5, 4, 7, -2, -5, 3]
Do we support the base case of iloc[neg_number]?
It seems to work.
bigframes/core/indexers.py
Outdated
if not is_key_unisigned or key[0] < 0:
    neg_block, _ = block.apply_window_op(
        offsets_id,
        ops.aggregations.ReverseRowNumberOp(),
Do we need a new op? Or could we have just used existing ops?
Updated, using SizeUnaryOp and SubOp instead.
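The reply above swaps a dedicated reverse-row-number op for a size and a subtraction. A minimal sketch of that idea in plain Python, not the bigframes implementation: if a row sits at offset i in a table of n rows, its negative offset is i - n, so both can be derived from the ordinary row offset. The name `select_by_offsets` and the sample data are hypothetical.

```python
def select_by_offsets(rows, keys):
    """Select rows by position, accepting a mix of non-negative and negative keys.

    Illustrative only: `len(rows)` stands in for a size aggregation and
    `offset - n` stands in for a subtraction applied to the row offset.
    """
    n = len(rows)
    lookup = {}
    for offset, row in enumerate(rows):
        lookup[offset] = row      # non-negative offset: 0 .. n-1
        lookup[offset - n] = row  # negative offset: -n .. -1
    try:
        # Looking up each key in turn preserves key order and duplicates.
        return [lookup[k] for k in keys]
    except KeyError as exc:
        raise IndexError(f"position {exc.args[0]} is out of bounds") from None


rows = list("abcdefghi")  # 9 sample rows
print(select_by_offsets(rows, [0, 0, 0, 5, 4, 7, -2, -5, 3]))
# → ['a', 'a', 'a', 'f', 'e', 'h', 'h', 'e', 'd']
```

The key list is the one used in test_iloc_list; with nine rows, -2 and 7 resolve to the same row, as do -5 and 4.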
bigframes/core/indexers.py
Outdated
elif "shape" in series_or_dataframe._block.__dict__:
    # If there is a cache, we convert all indices to positive.
    row_count = series_or_dataframe._block.shape[0]
    key = [k if k >= 0 else row_count + k for k in key]
    is_key_unisigned = True
This seems a bit fragile. We can use block.expr.node.row_count, but going through shape depends on some implementation details that might change. I don't know if we necessarily need this optimization at all?
Removed
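For reference, the optimization discussed in this thread amounts to rewriting negative positions as non-negative ones once the row count is known. A hedged sketch in plain Python follows; the helper name `normalize_positions` and the bounds check are assumptions added for illustration and do not appear in the PR.

```python
def normalize_positions(keys, row_count):
    """Map each position to its non-negative equivalent for a known row count."""
    normalized = []
    for k in keys:
        # Same rule as the removed snippet: negative k counts from the end.
        pos = k if k >= 0 else row_count + k
        # Bounds check is an addition for this sketch, not part of the diff.
        if not 0 <= pos < row_count:
            raise IndexError(f"position {k} is out of bounds for {row_count} rows")
        normalized.append(pos)
    return normalized


print(normalize_positions([0, 0, 0, 5, 4, 7, -2, -5, 3], row_count=9))
# → [0, 0, 0, 5, 4, 7, 7, 4, 3]
```

This only works when the row count is available without extra work, which is why the thread questions whether the shortcut is worth keeping.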
bigframes/core/indexers.py
Outdated
@@ -477,6 +478,19 @@ def _iloc_getitem_series_or_dataframe(
        Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
        series_or_dataframe.iloc[0:0],
    )

    # Check if both positive index and negative index are necessary
    if isinstance(key, bigframes.series.Series):
We might need to check for bigframes.Index as well. Or maybe we should have a helper that identifies a "remote" or "large" object we don't want to iterate over.
Updated to also check the index type.
For large objects it may not be necessary: on Cloudtop, I tried 1 million keys (all with the same sign), and this check took 0.03 s.
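The check being timed here decides whether a local key list mixes signs, so that the negative-offset path can be skipped when it is not needed. A rough sketch of such a check in plain Python; `keys_share_sign` is a hypothetical name, and the timing printed will vary by machine, so no figure is claimed.

```python
import time


def keys_share_sign(keys):
    """Return True if every key is non-negative or every key is negative."""
    has_negative = any(k < 0 for k in keys)
    has_non_negative = any(k >= 0 for k in keys)
    return not (has_negative and has_non_negative)


# One million local keys, all with the same sign, as in the comment above.
keys = list(range(1_000_000))
start = time.perf_counter()
same_sign = keys_share_sign(keys)
elapsed = time.perf_counter() - start
print(f"same sign: {same_sign}, elapsed: {elapsed:.3f}s")
```

A scan like this only makes sense for in-memory lists; for a bigframes Series or Index key, iterating would trigger remote work, which is the concern raised in the review comment.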
* feat: support iloc with negative indices
* update partial ordering test
* update naming
* update logic
* update comment
* update logic and tests
* update filter