Remove DataFrame.take #347

Closed
MarcoGorelli opened this issue Jan 25, 2024 · 6 comments

Comments

@MarcoGorelli
Contributor

MarcoGorelli commented Jan 25, 2024

@kkraus14 #344 (comment)

dataframe with arbitrary / undefined order

If we only have one DataFrame class, and its row order is undefined, then DataFrame.take isn't a well-defined operation.

Alternatives

Accept some level of redesign, even if it means extra work. But with the current design, DataFrame.take is undefined, so I suggest we remove it first.

@jorisvandenbossche
Member

jorisvandenbossche commented Jan 25, 2024

But then how would you use the methods we have that return indices (sorted_indices, unique_indices)?
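To make the question concrete, here is a minimal sketch of how an index-returning method and `take` are meant to compose. It uses a hypothetical toy `ToyFrame` class, not the standard's API or any library's; its `sorted_indices`, `unique_indices`, and `take` only mimic the names discussed in this thread, and the real signatures may differ. The point it illustrates is that the returned indices are row positions, so they only mean something if the frame's row order stays fixed between the two calls.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyFrame:
    """Hypothetical column-oriented frame; rows are identified only by position."""
    columns: Dict[str, List[object]]

    def sorted_indices(self, key: str) -> List[int]:
        # Row positions that would sort the frame by `key` (Python's sort is stable).
        col = self.columns[key]
        return sorted(range(len(col)), key=lambda i: col[i])

    def unique_indices(self, key: str) -> List[int]:
        # Position of the first occurrence of each distinct value in `key`
        # (picking the first occurrence is a choice made by this toy only).
        seen = set()
        out = []
        for i, value in enumerate(self.columns[key]):
            if value not in seen:
                seen.add(value)
                out.append(i)
        return out

    def take(self, indices: List[int]) -> "ToyFrame":
        # Positional gather: row k of the result is row indices[k] of self.
        return ToyFrame({name: [col[i] for i in indices] for name, col in self.columns.items()})


df = ToyFrame({"a": [3, 1, 2, 1], "b": ["x", "y", "z", "w"]})
print(df.take(df.sorted_indices("a")).columns)  # {'a': [1, 1, 2, 3], 'b': ['y', 'w', 'z', 'x']}
print(df.take(df.unique_indices("a")).columns)  # {'a': [3, 1, 2], 'b': ['x', 'y', 'z']}
```

If anything reordered `df` between computing the indices and calling `take`, the same positions would pick out different rows, which is exactly the concern raised in the issue.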

@MarcoGorelli
Contributor Author

Exactly, you don't.

Unless we accept some level of redesign, starting with #346.

@kkraus14
Collaborator

A DataFrame can have an arbitrary or an undefined order, but that doesn't mean it has to. If it has a defined or an arbitrary order, i.e. someone ran a sort operation against it or all operations run so far are defined to be order-preserving, then take is well defined. If someone ran something that makes no ordering guarantees, then the order could be undefined, in which case calling take against it should be allowed to return an undefined order as well.

The only situation where take is arguably undefined is when the input order is undefined, and that feels like perfectly reasonable behavior to me.

Let's continue the discussion of Expressions in #346, but I don't think take is a problematic operation.
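The distinction drawn in this comment can be illustrated with plain Python lists standing in for rows. The hypothetical `no_order_guarantee` function simulates an operation that makes no ordering promise by shuffling its output; real engines would reorder for other reasons (hashing, parallelism), so this is only a stand-in. The sketch shows that `take` is deterministic when its input order is defined, and simply inherits the undefinedness when it isn't: the rows it returns still come from the input, but which ones land at a given position can vary.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take(rows: Sequence[T], indices: Sequence[int]) -> List[T]:
    # Positional gather: result[k] = rows[indices[k]].
    return [rows[i] for i in indices]


def no_order_guarantee(rows: Sequence[T], rng: random.Random) -> List[T]:
    # Hypothetical stand-in for an operation whose output order is undefined:
    # the same rows come back, in an arbitrary order on each call.
    out = list(rows)
    rng.shuffle(out)
    return out


rows = ["r0", "r1", "r2", "r3"]

# Defined order (e.g. after a sort): take is deterministic.
ordered = sorted(rows)
assert take(ordered, [0, 2]) == take(ordered, [0, 2]) == ["r0", "r2"]

# Undefined order: the same index may select a different row on each run,
# but the selected row is always one of the input rows.
rng = random.Random(0)
picks = {tuple(take(no_order_guarantee(rows, rng), [0])) for _ in range(20)}
print(sorted(picks))
```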

@MarcoGorelli
Contributor Author

How does a user know if a dataframe has input order defined or not?

@shwina
Contributor

shwina commented Jan 25, 2024

How does a user know if a dataframe has input order defined or not?

Some examples:

  • They read it in from a CSV file whose records are in a known order
  • They previously sorted it

@kkraus14
Collaborator

I think we could also specify, as a general rule, that operations maintain the input order of the DataFrame unless otherwise noted. I believe we've made sure to add that to the docstrings where appropriate, e.g. joins, group-bys, and getting unique values are documented as not guaranteeing a specific output order.
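One way to read this proposal, and to answer the earlier question of how a user would know whether the order is defined, is to treat "order is defined" as a property that each operation either establishes, preserves, or drops. The sketch below is purely hypothetical: the `OrderedState` class, the `order_after` helper, and the operation sets are illustrative assumptions, not part of the standard or any library. Following the examples in this thread, reading a file and sorting establish an order, joins, group-bys, and unique-value operations drop it, and everything else (including filtering, assumed here to be order-preserving) keeps whatever order its input had.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical classification, following the convention
# "operations maintain input order unless the docstring says otherwise".
ESTABLISHES_ORDER = {"read_csv", "sort"}
DROPS_ORDER = {"join", "group_by", "unique"}


@dataclass(frozen=True)
class OrderedState:
    order_defined: bool

    def apply(self, op: str) -> "OrderedState":
        if op in ESTABLISHES_ORDER:
            return OrderedState(True)
        if op in DROPS_ORDER:
            return OrderedState(False)
        # Default: the operation maintains whatever order its input had.
        return OrderedState(self.order_defined)


def order_after(ops: List[str]) -> bool:
    # Start with no known order, then fold each operation over the state.
    state = OrderedState(order_defined=False)
    for op in ops:
        state = state.apply(op)
    return state.order_defined


print(order_after(["read_csv", "filter", "take"]))  # True
print(order_after(["read_csv", "join", "filter"]))  # False
print(order_after(["read_csv", "join", "sort"]))    # True
```

Under this reading, take is well defined exactly when `order_after` reports True for the operations that produced its input, and a sort after an order-dropping operation restores that guarantee.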
