Study on the pandas API: What is the most commonly used? #3
Comments
This is really cool, thanks a lot for sharing!
This is awesome! I'm somewhat surprised at the common use of |
I use |
At some point a pandas dev told me to use |
I agree with both points (I think they meant |
We have done the same thing over 1 million I do not think the bias of Kaggle diminishes the value of this data (not that you are implying this). |
Here are the top 40ish pandas methods by pageviews on pandas' docs
I view the number of page views as some |
I love this list. Below are some decisions we made in riptide to accommodate existing pandas users while trying to eliminate duplicate methods and cases where too much is packed into a single method (like sort and merge). na: fill_na and drop_na. We broke out fill_na to fill_forward, fill_backward
To add to this discussion, we've done some analysis of downstream library usage of pandas APIs, which can be found here.
I have spent a lot of time trying to understand users and their behaviors in order to optimize for them. As a part of this work, I have done numerous studies on what gets used in pandas.
This will be extremely useful when it comes to defining a dataframe standard, because what people are using can help inform us on what behaviors to support.
For this study, we scraped the 6000 most-upvoted notebooks on Kaggle.
Repo here, reproduction script included: https://github.com/modin-project/study_kaggle_usage
Results here: results.csv
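For anyone curious how this kind of counting could work in principle, here is a minimal sketch. It is not the script from the repo above, just an illustration under some assumptions: each notebook's code cells are already available as plain strings, and every attribute-style call (for example `df.head()`) is counted by name, whether or not the object is actually a pandas one. The helper name `count_method_calls` and the sample cells are made up for the example.

```python
import ast
from collections import Counter


def count_method_calls(source: str) -> Counter:
    """Count attribute-style calls such as df.groupby(...) in one code string.

    Assumption: any `obj.name(...)` call is counted by `name`; this sketch
    does not check that `obj` is a pandas object.
    """
    counts = Counter()
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Notebook cells often hold IPython magics or broken code; skip them.
        return counts
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            counts[node.func.attr] += 1
    return counts


# Hypothetical code cells standing in for scraped notebook contents.
cells = [
    "import pandas as pd\ndf = pd.read_csv('train.csv')\ndf.head()",
    "df.groupby('a').mean()\ndf.head()",
    "%matplotlib inline",  # not valid Python, so it is skipped
]

total = Counter()
for cell in cells:
    total += count_method_calls(cell)

# Most frequently called method names across all cells.
print(total.most_common(3))
```

Counting by name alone will overcount methods that other libraries share with pandas, so a real study would likely need some way to filter to pandas objects.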