Restructure as R Package #25
This problem has been around for over a decade. Tidyverse packages like ggplot2 and dplyr use non-standard evaluation to map data frame columns into variables inside their functions. Those variables are not visible during R CMD check and show up as undeclared global variables, typically as a NOTE such as: `server: no visible binding for global variable ‘latitude’`. These are fixed by importing `.data` from ggplot2 or rlang, which then makes the undeclared variables accessible as `.data$variable`. See https://stackoverflow.com/a/57496617 for background. Also fixed a single undeclared reference to the `tags` variable.
@DarianGill Overall this package structure looked great. I am not sure which linting issues you are talking about, but I did see some standard problems with using tidyverse non-standard evaluation approaches inside an R package that produced NOTEs during R CMD check. This problem has been around for over a decade. When using tidyverse packages like ggplot2 and dplyr that use non-standard evaluation to map data frame columns into variables in the functions, those variables are not visible during R CMD check and show up as undeclared global variables, typically showing up as a NOTE such as `server: no visible binding for global variable ‘latitude’`. These are fixed by importing `.data` from ggplot2 or rlang, which then makes the undeclared variables accessible as `.data$variable`. I pushed changes to the branch to fix these, and I also fixed a single undeclared reference to the `tags` variable.
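As a minimal sketch of the `.data` fix described above: the helper name `count_by_location` and the sample data are hypothetical, not taken from this PR, and the example assumes dplyr is installed and R is 4.1 or later (for the native pipe). The roxygen `@importFrom rlang .data` tag is what declares the pronoun to R CMD check inside a package.

```r
#' Count observations per coordinate pair (hypothetical helper)
#'
#' @param data A data frame with `latitude` and `longitude` columns.
#' @importFrom rlang .data
count_by_location <- function(data) {
  data |>
    # .data$latitude refers to the column explicitly, so R CMD check
    # no longer sees `latitude` as an undeclared global variable.
    dplyr::group_by(.data$latitude, .data$longitude) |>
    dplyr::summarize(n = dplyr::n(), .groups = "drop")
}

# Illustrative data only
obs <- data.frame(
  latitude = c(34.4, 34.4, 36.6),
  longitude = c(-119.7, -119.7, -121.9)
)
result <- count_by_location(obs)
```

Here `result` has one row per unique latitude/longitude pair, with `n` holding the number of input rows in each group.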
@DarianGill At some point we should do a more complete review of your package (e.g., the package is missing tests), but it's looking good as a starting point, and a fuller review is probably not needed right now. Let us know when you want a deeper code review in this branch or in develop. We don't want to hold you up at all, and we want to keep the momentum going.
…gnize importFrom ggplot2 .data
… to launch via IDE exension
regetz
left a comment
Good stuff @DarianGill, thanks! I made a bunch of comments. Nothing critical. They are almost entirely either minor editing changes for text docs or code suggestions that simplify the code or make it more readable (IMHO). Happy to chat more about any of them.
I'm marking this as Approved to leave it in your hands to change what you think makes sense, follow up more on anything as you wish, and then be unblocked to merge to develop when you see fit.
| Tests are automatically run via GitHub Actions. Check the root `README.md` file | ||
| for this GitHub Actions status badge and make sure it says "Passing": | ||
| Tests are automatically run via GitHub Actions. Check the root | ||
| `README.md` file for e GitHub Actions status badge and make sure it |
Should be "the" GitHub Actions status badge?
| the `develop` branch can be fast-forwarded to sync with `main` to | ||
| start work on the next release. | ||
| 3. Releases can be downloaded from the [GitHub releases | ||
| page](https://github.com/NCEAS/vegbankr/releases). |
|
|
||
| - add an [issue](https://github.com/DataONEorg/REPONAME/issues) describing your planned changes, or add a comment to an existing issue; | ||
| - on GitHub, fork the [repository](https://github.com/DataONEorg/REPONAME) | ||
| - add an [issue](https://github.com/NCEAS/vegbankr/issues) describing |
| - on GitHub, fork the [repository](https://github.com/DataONEorg/REPONAME) | ||
| - add an [issue](https://github.com/NCEAS/vegbankr/issues) describing | ||
| your planned changes, or add a comment to an existing issue; | ||
| - on GitHub, fork the [repository](https://github.com/NCEAS/vegbankr) |
Should be https://github.com/NCEAS/vegbank-web
| the same “printed page” as the copyright notice for easier identification within | ||
| third-party archives. | ||
|
|
||
| Copyright [yyyy] [name of copyright owner] |
Copyright [2025] [Regents of the University of California]
| dates_df <- dates_df[!is.na(dates_df$parsed), ] | ||
| dates_df <- dates_df[!duplicated(dates_df$parsed), ] | ||
| top_dates <- head(dates_df[order(dates_df$parsed, decreasing = TRUE), ], n) | ||
| top_dates <- utils::head(dates_df[order(dates_df$parsed, decreasing = TRUE), ], n) |
Just for completeness, here's an equivalent way to get the top dates using chained dplyr verbs, following a similar pattern to what I suggested in other comments. You could make an argument that it's better to stick with base R functionality, especially when you can do so with readable code, though we already depend on dplyr, and these are very standard usages of core, stable dplyr functions for tabular data manipulation. Toss-up in my mind.
date_formats <- c("a, d b Y H:M:S z", "d b Y H:M:S", "Y-m-d H:M:S")
top_dates <- data |>
  dplyr::select(original = date_field) |>
  dplyr::mutate(parsed = lubridate::parse_date_time(
    data[[date_field]], orders = date_formats)) |>
  dplyr::filter(!is.na(parsed), !duplicated(parsed)) |>
  dplyr::arrange(dplyr::desc(parsed)) |>
  utils::head(n)
I find this more readable than the previous two suggestions, but am warming up to all of them, thanks.
@regetz Thanks for these suggestions, I think ultimately we may want this summarization to occur on the backend where it can access the totality of the data or whatever we've queried. The overviews are currently not functional after I've started using the paginated data in the table, but it's valuable seeing the other ways to phrase the same functionality using dplyr anyway.
| aes(x = long, y = lat, group = group), # nolint: object_usage_linter. | ||
| fill = "white", color = "gray70", size = 0.3 | ||
| ggplot2::aes( | ||
| x = .data$long, |
Similar comment to elsewhere, you should be able to drop the .data$ accessor throughout the ggplot expressions here. But then I think you'll need to use longitude and latitude, assuming those are the full column names in the data. The $ in R does partial matching, I guess for convenience, but in my experience it's just a source of bugs!
Probably shouldn't drop .data$, again as per above... See https://dplyr.tidyverse.org/articles/in-packages.html#data-masking-and-tidy-selection-notes
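The partial matching of `$` mentioned in this thread can be illustrated with a small base R sketch. The `coords` data frame is made up for the example, and it covers only plain data frames; it says nothing about how the `.data` pronoun resolves names.

```r
# Illustrative data frame with full column names
coords <- data.frame(
  longitude = c(-119.7, -121.9),
  latitude = c(34.4, 36.6)
)

# `$` on a data frame falls back to partial matching, so the
# abbreviated name silently resolves to the `longitude` column.
partial <- coords$long

# `[[` uses exact matching by default, so the abbreviated name
# finds nothing and NULL is returned.
exact <- coords[["long"]]
```

Setting `options(warnPartialMatchDollar = TRUE)` makes R emit a warning whenever `$` resolves a name by partial matching, which can help surface this kind of bug during development.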
| ggplot2::aes( | ||
| x = stats::reorder(.data$name, .data$count), | ||
| y = .data$count | ||
| ) |
I'm not 100% sure the linter won't balk at this, but I'd move this closing parenthesis to the end of the previous line, then indent the parenthesis on the next line by two spaces. That way the entire ggplot expression is easier to see because all of the continuation lines are indented relative to the initiating ggplot2::ggplot(...) call.
Linter doesn't like it
Kinda surprised me to see this file in the root directory, but I do see precedent for it (like here). Seems like another option is maybe in inst/? Just wondering out loud what is best. No need to change anything now. Important thing is that wherever it is, (a) the package validates fine, and (b) we're able to easily deploy and start the Shiny app in both development and production modes.
same. I don't know where it should go, but it seems like the root isn't the best default location. It would be a shiny-specific recommendation is my guess. Mastering Shiny says to put in the R directory. Maybe let's discuss on slack. https://mastering-shiny.org/scaling-packaging.html#converting-an-existing-app
| } else { | ||
| data_grouped <- data %>% | ||
| dplyr::group_by(latitude, longitude) %>% # nolint: object_usage_linter. | ||
| data_grouped <- data |> |
Not totally sure what's happening here, but the embedded mapply seems atypical in this context. Seems like neither that nor the sprintf is needed? I think this simplified code yields the same thing:
data_grouped <- data |>
  dplyr::group_by(.data$latitude, .data$longitude) |>
  dplyr::mutate(
    authorobscode_label =
      paste0(
        "<a href=\"#\" onclick=\"Shiny.setInputValue('label_link_click', '",
        obsaccessioncode, "', {priority:'event'})\">", authorobscode,
        "</a>", collapse = "<br>")
  ) |>
  dplyr::ungroup()
And a second question/comment: With the combination of group_by and mutate, the code here is producing one output record per input record. So for records with the same lat and lon, it's producing a "grouped" HTML snippet (per the collapse = "<br>"), but repeatedly for every member of that group. Is that the intent? If the goal is to produce one single HTML snippet for each unique lat x lon grouping, then we should use dplyr::summarize rather than dplyr::mutate.
Yup, this is what I was referring to when I said I realized after cutting the PR that I was making too many labels and misusing the grouping. I've since changed it to summarize, but I appreciate the validation that that was the right move.
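The difference between grouped `mutate` and `summarize` discussed in this thread can be sketched on a tiny made-up data set. The column values and the simplified `label` column are illustrative only (the real code builds HTML links), and the example assumes dplyr is installed.

```r
# Illustrative observations: two share a location, one does not
obs <- data.frame(
  latitude = c(34.4, 34.4, 36.6),
  longitude = c(-119.7, -119.7, -121.9),
  authorobscode = c("A1", "A2", "B1")
)

# mutate keeps one row per input row, so the collapsed label
# is repeated for every member of a group.
labels_mutate <- obs |>
  dplyr::group_by(.data$latitude, .data$longitude) |>
  dplyr::mutate(label = paste0(.data$authorobscode, collapse = "<br>")) |>
  dplyr::ungroup()

# summarize returns one row per group, so each location
# gets exactly one collapsed label.
labels_summarize <- obs |>
  dplyr::group_by(.data$latitude, .data$longitude) |>
  dplyr::summarize(
    label = paste0(.data$authorobscode, collapse = "<br>"),
    .groups = "drop"
  )
```

With this input, `labels_mutate` has three rows (the shared location appears twice with the same label), while `labels_summarize` has two rows, one per unique coordinate pair.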
Thanks @regetz! I'll implement these in my new branch on Monday. Have a great weekend 🤙
mbjones
left a comment
Everything built and ran for me this time. I made a few more comments, but it all looks like a great package start.
This moves the code from the MVP, which was only set up to run on my local environment, into a reproducible R package. It also contains a new `app.R` file following the convention set forth in Shiny App Packages for easy launching during development (which causes a "Non-standard file/directory found at top level" NOTE when running `devtools::check()`). I stubbed out some simple tests for the functions in server.R and ui.R as well, and I am open to feedback about any other changes required to make this "officially" a package before I get back to work on the functionality of the app.