More refactor: data frame split apply combine #198
I think this is ready for review. This PR lays the groundwork for further simplification, but I think it already stands alone as a nice useful chunk, so it makes sense to review and merge before doing anything else.
Super, thanks for this @vincentarelbundock. I'm currently on vacation without a laptop, but will aim to do a proper review once I'm back (and have had a chance to look at #197 too). Qq: Does going through the data.frame intermediary affect performance (snappiness) at all?
Cool cool. No rush at all.
I have not benchmarked, but I wouldn't expect this to have any effect at all. We're just storing equal length vectors into a data.frame and calling
@vincentarelbundock do you mind merging in the recent changes that we pushed to the main branch? I know there aren't any file conflicts, but I edited some tests and wanted to make sure that everything is passing against the up-to-date test suite. Cheers. (I'm aiming to review this PR properly by the end of the week.)
done
Thanks again @vincentarelbundock. I really appreciate the continued efforts at further modularization.
I must admit that some of this feels like over-engineering to me at the moment (e.g., all the datapoints <-> dp internal shuffling that happens within the individual types). But I'm willing to take it on faith that this will enable the functionality that you highlight in your concept outline. (And I'm really excited about this possibility.)
The requested changes here are mostly minor.
I think that all issues are resolved.
Fair enough. I removed all assignments to intermediary
I'm optimistic about the long term vision. Think it's going to be very cool. I'm still going to do one more run of simplification to fully exploit the
But note that we are already benefiting. The old code to prep data for histograms was roughly 96 lines long; the new one is roughly 51 lines. Not a massive deal, but it illustrates some of the simplifications possible, and there are some more.
Merci, Vincent!
Thanks again for actioning these changes and for the continued improvements to the tinyplot codebase. Again, I'm very excited by some of the proposed features that this should unlock.
As a heads-up, I would like to submit a patch release to CRAN in the next day or two, since we've fixed quite a few bugs in the last month. (Hopefully, I'll be able to include a fix for #206 too, which is proving a little more finicky than I originally thought.) But then we can work on a bigger release after that, which includes these new features.
This is a refactor of the histogram, boxplot, area, and other helper functions to kickstart a larger refactoring strategy that I could implement.
Concepts:
- `by`, `facet`, `x`, `y` in a single data frame named `datapoints` so we can use split-apply-combine for facet and groups, instead of the complicated use of `interactions()`.
- `datapoint` is converted directly to the current `split_data`.
- `type_*()`, which must accept a `datapoints` data.frame with known characteristics and return another data frame with the same characteristics.
Benefits:
- `by` and `facet` indices when we split-apply-combine on a dataframe.
- `x` and friends, in case we need to check what the original user input was.
- `type_*()` functions.
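For readers unfamiliar with the pattern, here is a minimal base R sketch of the split-apply-combine idea described in the concepts above. It is not the PR's code: the input values and the function `type_mean_sketch()` are hypothetical, chosen only to illustrate a function that accepts a `datapoints` data frame and returns a data frame with the same columns.

```r
# Hypothetical inputs; in practice these would come from the user's call.
x     <- c(1, 2, 3, 4, 5, 6, 7, 8)
y     <- c(2, 4, 6, 8, 1, 3, 5, 7)
by    <- rep(c("a", "b"), each = 4)
facet <- rep(c("f1", "f2"), times = 4)

# Store everything in one data frame, so that group and facet membership
# travel with each row and no separate indices need to be tracked.
datapoints <- data.frame(x = x, y = y, by = by, facet = facet)

# Hypothetical stand-in for a type-specific data function (an assumption,
# not tinyplot's API): takes a datapoints data frame for one group and
# returns a data frame with the same columns.
type_mean_sketch <- function(dp) {
  data.frame(
    x     = mean(dp$x),
    y     = mean(dp$y),
    by    = dp$by[1],
    facet = dp$facet[1]
  )
}

# Split by group and facet, apply the function to each piece, then combine.
pieces <- split(datapoints, list(datapoints$by, datapoints$facet), drop = TRUE)
result <- do.call(rbind, lapply(pieces, type_mean_sketch))
rownames(result) <- NULL
```

Because the original `x`, `y`, `by`, and `facet` vectors are never overwritten, the user's input remains available for later checks, and `result` keeps the same column layout as `datapoints`.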