More refactor: data frame split apply combine #198

vincentarelbundock · 2024-08-12T00:01:43Z

This is a refactor of the histogram, boxplot, area, and other helper functions to kickstart a larger refactoring strategy that I could implement.

Concepts:

Store by, facet, x, y in a single data frame named datapoints so we can use split-apply-combine for facet and groups, instead of the complicated use of interaction().
In a future PR, datapoints would be converted directly to the current split_data.
This lays the groundwork for user-supplied type_*(), which must accept a datapoints data.frame with known characteristics and return another data frame with the same characteristics.
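
For readers unfamiliar with the pattern, here is a minimal base R sketch of split-apply-combine over a single data frame. The column names follow the description above (x, y, by, facet), but the data and the `summarise_piece()` helper are made up for illustration and are not taken from the tinyplot source.

```r
# Hypothetical `datapoints` frame: one row per observation, with the
# grouping (`by`) and faceting (`facet`) variables stored next to x and y.
datapoints <- data.frame(
  x     = c(1, 2, 3, 4, 5, 6, 7, 8),
  y     = c(2, 4, 6, 8, 1, 3, 5, 7),
  by    = rep(c("a", "b"), each = 4),
  facet = rep(c("f1", "f2"), times = 4)
)

# Split: a single call, keyed on both grouping columns.
pieces <- split(datapoints, list(datapoints$by, datapoints$facet), drop = TRUE)

# Apply: summarise each piece, carrying the group labels along.
# (`summarise_piece` is an illustrative helper, not a tinyplot function.)
summarise_piece <- function(d) {
  data.frame(by = d$by[1], facet = d$facet[1], n = nrow(d), mean_y = mean(d$y))
}

# Combine: stack the per-group results back into one data frame.
combined <- do.call(rbind, lapply(pieces, summarise_piece))
rownames(combined) <- NULL
```

Because the group labels travel with each piece, there is no need to track by and facet indices separately.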

Benefits:

Code simplification: it's easier to keep track of by and facet indices when we split-apply-combine on a dataframe.
Avoid overwriting x and friends, in case we need to check what the original user input was.
Pathways to user-supplied type_*() functions.
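
To make the last point concrete, the sketch below shows one way such a contract might look: a function takes a datapoints data frame and returns a data frame with the same columns. Both `type_center()` and `apply_type()` are hypothetical names invented for this illustration; they are not part of the tinyplot API, and the actual interface may differ.

```r
# Hypothetical user-supplied type: receives a `datapoints` data frame and
# returns a data frame with the same columns (y is centred within each group).
type_center <- function(datapoints) {
  pieces <- split(datapoints, list(datapoints$by, datapoints$facet), drop = TRUE)
  pieces <- lapply(pieces, function(d) {
    d$y <- d$y - mean(d$y)
    d
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Hypothetical wrapper that checks the "same characteristics" contract
# on the way in and on the way out.
apply_type <- function(datapoints, type_fun) {
  required <- c("x", "y", "by", "facet")
  stopifnot(is.data.frame(datapoints), all(required %in% names(datapoints)))
  out <- type_fun(datapoints)
  stopifnot(is.data.frame(out), all(required %in% names(out)))
  out
}

datapoints <- data.frame(
  x = 1:6, y = c(1, 3, 10, 20, 5, 7),
  by = rep(c("a", "b", "c"), each = 2), facet = "all"
)
centered <- apply_type(datapoints, type_center)
```

Checking the columns at the boundary keeps the plotting code downstream independent of what any particular type function does internally.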

vincentarelbundock · 2024-08-12T01:59:58Z

I think this is ready for review.

This PR lays the groundwork for further simplification, but I think it already stands alone as a nice useful chunk, so it makes sense to review and merge before doing anything else.

grantmcdermott · 2024-08-12T19:43:37Z

Super, thanks for this @vincentarelbundock. I'm currently on vacation without a laptop, but will aim to do a proper review once I'm back (and have had a chance to look at #197 too).

Qq: Does going through the data.frame intermediary affect performance (snappiness) at all?

vincentarelbundock · 2024-08-12T20:01:30Z

> Super, thanks for this @vincentarelbundock. I'm currently on vacation without a laptop, but will aim to do a proper review once I'm back (and have had a chance to look at #197 too).

Cool cool. No rush at all.

> Qq: Does going through the data.frame intermediary affect performance (snappiness) at all?

I have not benchmarked, but I wouldn't expect this to have any effect at all. We're just storing equal length vectors into a data.frame and calling split() only once, instead of holding them as separate variables, putting them in a list, and calling Map()+split(). If anything, this simplifies things and reduces the number of calls.
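
A rough illustration of the two routes, using made-up data rather than tinyplot internals: the first keeps the vectors separate and splits each one through Map(), the second stores them in one data frame and calls split() once. This only shows that the two routes recover the same groups; it makes no timing claim, and wrapping either route in system.time() would be one way to check the performance question directly.

```r
set.seed(1)
n  <- 1000
x  <- rnorm(n)
y  <- rnorm(n)
by <- sample(c("a", "b", "c"), n, replace = TRUE)

# Route 1 (roughly the older pattern described above): keep the vectors
# separate, collect them in a list, and split each one via Map().
split_vectors <- Map(function(v) split(v, by), list(x = x, y = y))

# Route 2: store the equal-length vectors in one data frame, split once.
datapoints  <- data.frame(x = x, y = y, by = by)
split_frame <- split(datapoints, datapoints$by)

# Both routes recover the same per-group values.
same_x <- identical(split_vectors$x$a, split_frame$a$x)
same_y <- identical(split_vectors$y$b, split_frame$b$y)
```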

grantmcdermott · 2024-08-20T04:42:41Z

@vincentarelbundock do you mind merging in the recent changes that we pushed to the main branch? I know there aren't any file conflicts, but I edited some tests and wanted to make sure that everything is passing against the up-to-date test suite. Cheers.

(I'm aiming to review this PR properly by the end of the week.)

vincentarelbundock · 2024-08-20T21:05:36Z

done

grantmcdermott

Thanks again @vincentarelbundock. I really appreciate the continued efforts at further modularization.

I must admit that some of this feels like over-engineering to me at the moment (e.g., all the datapoints <-> dp internal shuffling that happens within the individual types). But I'm willing to take it on faith that this will enable the functionality that you highlight in your concept outline. (And I'm really excited about this possibility.)

The changes requested here are mostly minor.

R/zzz.R

R/type_histogram.R

R/type_jitter.R

R/type_pointrange.R

R/type_boxplot.R

R/type_ribbon.R

R/tinyplot.R

vincentarelbundock · 2024-08-25T04:39:56Z

I think that all issues are resolved.

> I must admit that some of this feels like over-engineering to me at the moment (e.g., all the datapoints <-> dp internal shuffling

Fair enough.

I removed all assignments to intermediary dp.

I'm optimistic about the long term vision. Think it's going to be very cool. I'm still going to do one more run of simplification to fully exploit the datapoints.

But note that we are already benefiting. The old code to prep data for histograms was roughly 96 lines long; the new one is roughly 51 lines. Not a massive deal, but it illustrates some of the simplifications possible, and there are some more.
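
As an illustration of why this kind of data prep can shrink, here is a short sketch of per-group histogram binning written as split-apply-combine. It is not the code from R/type_histogram.R; the `bin_group()` helper and the output columns (xmin, xmax, ymin, ymax) are assumptions chosen for the example.

```r
datapoints <- data.frame(
  x  = c(1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
  by = rep(c("a", "b"), each = 5)
)

# Shared breaks, computed on the pooled data, so bins line up across groups.
breaks <- hist(datapoints$x, plot = FALSE)$breaks

# Apply step: bin one group and return rectangle coordinates.
# (`bin_group` is an illustrative helper, not a tinyplot function.)
bin_group <- function(d) {
  h <- hist(d$x, breaks = breaks, plot = FALSE)
  data.frame(
    by   = d$by[1],
    xmin = h$breaks[-length(h$breaks)],
    xmax = h$breaks[-1],
    ymin = 0,
    ymax = h$counts
  )
}

# Split by group, bin each piece, and stack the results.
hist_data <- do.call(rbind, lapply(split(datapoints, datapoints$by), bin_group))
rownames(hist_data) <- NULL
```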

grantmcdermott

Merci, Vincent!

Thanks again for actioning these changes and for the continued improvements to the tinyplot codebase. Again, I'm very excited by some of the proposed features that this should unlock.

As a heads-up, I would like to submit a patch release to CRAN in the next day or two, since we've fixed quite a few bugs in the last month. (Hopefully, I'll be able to include a fix for #206 too, which is proving a little more finicky than I originally thought.) But then we can work toward a bigger release after that, which includes these new features.

vincentarelbundock added 3 commits August 11, 2024 18:14

histogram simplification

d2de312

histogram refactor: fix legend

74be4f8

rename and reorganize

38741ac

vincentarelbundock marked this pull request as draft August 12, 2024 00:28

vincentarelbundock added 6 commits August 11, 2024 21:20

type_boxplot

bb6ccdb

type_ribbon

a543ded

simplification

2a0ca66

type_pointrange

1d864c8

rename: _args -> type_

f228845

global variables

b516c7d

vincentarelbundock marked this pull request as ready for review August 12, 2024 01:59

vincentarelbundock added 3 commits August 11, 2024 23:45

simplify lim_args

bb54bf3

lim args simplify more

38f4152

datapoints -> split_data

e3e4e71

vincentarelbundock changed the title from "Histogram refactor" to "More refactor: data frame split apply combine" Aug 12, 2024

superfluous code thanks to datapoints

f6bbd9c

This was referenced Aug 12, 2024

type = function() #168

Closed

Split data at the top #172

Closed

vincentarelbundock mentioned this pull request Aug 12, 2024

Avoid type aliases #199

Closed

Merge branch 'main' into histogram_refactor

1ad61f3

grantmcdermott requested changes Aug 24, 2024

vincentarelbundock added 2 commits August 24, 2024 23:23

code review

fe4372f

dp -> datapoints

38f088d

vincentarelbundock requested a review from grantmcdermott August 25, 2024 16:08

grantmcdermott approved these changes Aug 25, 2024

grantmcdermott merged commit 3917b16 into grantmcdermott:main Aug 25, 2024

vincentarelbundock deleted the histogram_refactor branch September 9, 2024 00:55
