Add statistical methods #33
A note on
@rgommers Another alternative that I have seen is
Some comments on Koalas: Koalas can support the following, so we can include them in the proposal:
As for Thanks. Updates:
Re cuDF:
As discussed, closing this PR given that we already have the most common reductions and the rest doesn't really fit with the direction we've taken over the last 6 months.
This PR
Notes
Statistical methods are widely implemented across dataframe libraries and are used by downstream libraries.
Series not included, based on previous consortium discussions in which a dataframe/series distinction was not considered necessary. See Avoiding the "pandas trap" #4 and Separate object for a dataframe colum? (is Series needed?) #6.
vaex and ibis have considerably different APIs than pandas, Dask, Modin, cuDF, and Koalas, so they influenced API inclusion only through whether they provide a particular method name (or equivalent), not through keyword arguments.
Comments for each proposed method:
skipna. Both cuDF and Koalas support skipna, but not axis (pandas, Dask, Modin).
cummax.
cummax.
cummax.
axis. pandas, Modin, and Koalas support numeric_only, but others do not.
max.
max.
nlargest.
max. Koalas can only support positive numbers due to implementation algorithm. pandas, Dask, Modin, and cuDF support a min_count keyword argument, but Koalas does not.
max. Koalas does not support a correction factor. Similar to the array API specification, renamed ddof to correction, as this is a historical "bug" carried over from NumPy.
max. pandas, Dask, Modin, and cuDF support a min_count keyword argument, but Koalas does not.
std.
Methods excluded from this initial proposal: mode, median, nunique, and quantile, due to either lack of universal availability, divergent behavior, increased complexity, or lack of downstream usage. These can be considered in a future PR.
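To make the keyword arguments discussed in the method comments more concrete, here is a minimal pure-Python sketch of how skipna, min_count, and correction (the renamed ddof) might behave on a single column. The function names column_sum and column_std are hypothetical and are not part of this proposal or of any library; the semantics are an assumption modeled on pandas-style behavior, with None and NaN both treated as missing.

```python
import math


def _drop_missing(values):
    """Return the values with None and NaN entries removed."""
    return [v for v in values if v is not None and not math.isnan(v)]


def column_sum(values, skipna=True, min_count=0):
    """Hypothetical sum reduction with `skipna` and `min_count` keywords."""
    vals = list(values)
    valid = _drop_missing(vals)
    if not skipna and len(valid) != len(vals):
        return math.nan  # a missing value propagates when skipna=False
    if len(valid) < min_count:
        return math.nan  # fewer valid values than min_count requires
    return math.fsum(valid)


def column_std(values, correction=1.0, skipna=True):
    """Hypothetical std reduction; `correction` plays the role of `ddof`."""
    vals = list(values)
    valid = _drop_missing(vals)
    if not skipna and len(valid) != len(vals):
        return math.nan
    n = len(valid)
    if n - correction <= 0:
        return math.nan  # the divisor n - correction must be positive
    mean = math.fsum(valid) / n
    squared_dev = math.fsum((v - mean) ** 2 for v in valid)
    return math.sqrt(squared_dev / (n - correction))
```

In this sketch, correction=0.0 divides by n (population standard deviation) and correction=1.0 divides by n - 1 (sample standard deviation), which is the same role ddof plays in NumPy.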