Updates to ATTAINS fxns + new USGS continuous data fxns #589
Co-authored-by: Matt Brousil <[email protected]>
Functions for NWIS continuous data
big updates
build getATTAINS
dplyr::group_by(site_no) %>%
dplyr::summarize(
  parameters = paste(unique(parameter), collapse = "; "),
  parameter_codes = paste(unique(parameter_code), collapse = "; ")
When we meet this week, let's chat more about TADA comparable data IDs, USGS p codes, and USGS observed properties in this context of bringing in the USGS continuous data & integrating it with the WQP data (compatibility).
I'd be happy to help work on WQP/USGS continuous data compatibility issues.
hillarymarler
left a comment
The efficiency increase in TADA_GetATTAINS is great, as are the options to include only the nearest true feature and to use ResultIdentifier to track instances of duplicate rows. The continuous data functions worked well for the variety of test queries I tried. I'm looking forward to discussing how to further integrate them with the EPATADA workflow as a future effort.
There are some global variable notes. We can address these by removing the variable using rm() once the variable is no longer needed.
TADA_GetATTAINS: no visible binding for global variable 'count'
TADA_GetATTAINS : find_distances: no visible binding for global
variable 'TADA.DistanceAway.Meters'
TADA_GetATTAINS: no visible binding for global variable
'TADA.DistanceAway.Meters'
TADA_GetATTAINS: no visible global function definition for
'st_drop_geometry'
TADA_getNWIS: no visible binding for global variable 'dec_long_va'
TADA_getNWIS: no visible binding for global variable 'dec_lat_va'
TADA_getNWIS: no visible binding for global variable 'site_no'
TADA_getNWIS: no visible binding for global variable 'agency_cd'
TADA_getNWIS: no visible binding for global variable 'Date'
TADA_getNWIS: no visible binding for global variable 'NWIS.parameter'
TADA_getNWIS: no visible binding for global variable 'NWIS.value'
TADA_getNWIS: no visible binding for global variable 'NWIS.status'
TADA_listNWIS : pcodes: no visible binding for global variable
'parameter_code'
TADA_listNWIS: no visible binding for global variable 'dec_long_va'
TADA_listNWIS: no visible binding for global variable 'dec_lat_va'
TADA_listNWIS: no visible binding for global variable 'site_no'
TADA_listNWIS: no visible binding for global variable 'station_nm'
TADA_listNWIS: no visible binding for global variable 'site_type'
TADA_listNWIS: no visible binding for global variable 'site_tp_cd'
TADA_listNWIS: no visible binding for global variable 'data_type'
TADA_listNWIS: no visible binding for global variable 'data_type_cd'
TADA_listNWIS: no visible binding for global variable
'parameter_name_description'
TADA_listNWIS: no visible binding for global variable 'parm_cd'
TADA_listNWIS: no visible binding for global variable 'count_nu'
TADA_listNWIS: no visible binding for global variable 'begin_date'
TADA_listNWIS: no visible binding for global variable 'end_date'
fetchATTAINS : perform_iterative_clustering : bbox_area: no visible
binding for global variable 'cluster'
fetchATTAINS : perform_iterative_clustering : split_clusters_by_area:
no visible binding for global variable 'cluster'
fetchATTAINS: no visible binding for global variable 'cluster'
Undefined global functions or variables:
Date NWIS.parameter NWIS.status NWIS.value TADA.DistanceAway.Meters
agency_cd begin_date cluster count count_nu data_type data_type_cd
dec_lat_va dec_long_va end_date parameter_code
parameter_name_description parm_cd site_no site_tp_cd site_type
st_drop_geometry station_nm
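Notes like the ones listed above are commonly handled in R packages either by declaring the names with `utils::globalVariables()` or by referring to columns as strings rather than bare symbols. The following is a minimal sketch of both options, not the approach taken in this PR; the file name `R/globals.R` and the helper `count_by_site()` are hypothetical.

```r
# Option 1 (assumed to live in a package file such as R/globals.R):
# declare column names that are used through non-standard evaluation,
# so the static check no longer reports them as undefined globals.
utils::globalVariables(c("site_no", "parameter_code", "cluster"))

# Option 2: avoid bare column symbols by indexing columns with strings.
# count_by_site() is a hypothetical helper used only for illustration.
count_by_site <- function(df) {
  as.data.frame(
    table(site_no = df[["site_no"]]),
    responseName = "n",
    stringsAsFactors = FALSE
  )
}

# Toy usage: two rows for site "a", one for site "b"
count_by_site(data.frame(site_no = c("a", "a", "b")))
```

For the 'no visible global function definition' note, calling the function with its namespace prefix (for example `sf::st_drop_geometry()`) is the usual remedy.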
@kathryn-willi A few of these look the same except for the total N. Do you know if this is mostly duplicate data? Curious if this is a common issue we would want to handle before analysis (making sure we don't have dups)
# Example 2: Query by specific site numbers
site_nums <- c("11530500", "11532500")
sites_specific <- TADA_listNWIS(sites = site_nums)
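One way to check whether rows differ only in their counts is to flag rows that share the same site and parameter code. This is a rough sketch on toy data; it assumes the listing has `site_no`, `parameter_code`, and `count_nu` columns (names taken from the check notes in this thread), and the values are made up for illustration.

```r
# Toy stand-in for the output of a site listing (values are illustrative only)
sites <- data.frame(
  site_no = c("11530500", "11530500", "11532500"),
  parameter_code = c("00060", "00060", "00010"),
  count_nu = c(100, 250, 80),
  stringsAsFactors = FALSE
)

# Rows are treated as "the same" if they share these key columns
key_cols <- c("site_no", "parameter_code")

# Flag every member of a repeated group, not just the second occurrence
is_repeat <- duplicated(sites[key_cols]) |
  duplicated(sites[key_cols], fromLast = TRUE)

repeats <- sites[is_repeat, ]
print(repeats)
```

Inspecting the remaining columns of `repeats` should show whether the rows are true duplicates or differ in some other attribute.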
R/ContinuousDataFunctions.R
#' features (or "rows" in the sf data frame) must be under 118,078 square miles
#' (roughly the area of Nevada).
#' @param statecode Character vector of two-letter state codes (e.g., c("CA", "OR")).
#' @param sites Character vector of USGS site numbers.
I suggest changing the argument name from sites to siteid for consistency with other TADA data retrieval functions. I am working on this now.
sites to siteid
Ooof, I think this is exposing my ignorance of non-flow USGS data. I bet this has to do with different statistics being collected (e.g., both the max and the mean). I will make the necessary changes to the functions to ensure we are capturing the statistical information related to both listNWIS and getNWIS! Please stay tuned for another commit from me...
@cristinamullin - it was in fact that multiple statistics were being published for the same site/parameter combos. (The "mean" value is the default and by far the most common statistic for flow data, which is why I didn't catch this earlier. My apologies!) I have incorporated code to 1) list all available statistics in TADA_listNWIS(), and 2) allow the user to select which statistic(s) to download in TADA_getNWIS().
On another topic - I notice that many of the checks fail when I submit PRs, and I'm not sure how to keep this from happening; sorry for the inconvenience! For future PRs I'd love to make sure that doesn't continue to happen. If it would be easier to discuss over a call how to ensure the tests pass, please let me know.
Did you consider using the USGS DV service vs. the statistics service? It sounds like the main difference is that the DV service provides provisional results (most recent values) while the statistics service only provides final results. We may want to chat with them to confirm which is most appropriate for TADA use cases.
"Please note that most recent data are marked provisional, so these data should be interpreted with caution as it is possible (although unlikely) to be incorrect. See the USGS Provisional Data Disclaimer page for more information."
https://waterservices.usgs.gov/docs/ Daily Values Statistics
Yes! We could definitely swap to a statistics call... However, as-is, TADA_getNWIS()'s NWIS.status column contains the information about whether the data is provisional or approved. So, the user could filter to only approved data if they wanted. In a future PR, I could modify the codes to make them more clear (A=approved, P=provisional, etc.) and add some deeper clarifying information about what that column is?
Sounds good. I like the idea of keeping the most recent data
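Filtering on the status column could look roughly like the sketch below. It runs on a toy data frame rather than real TADA_getNWIS() output, and it assumes the status strings begin with "A" for approved and "P" for provisional, as described in the comment above; the values are invented for illustration.

```r
# Toy stand-in for downloaded daily values (all values are illustrative)
daily <- data.frame(
  site_no = "11530500",
  Date = as.Date("2024-01-01") + 0:3,
  NWIS.value = c(10.2, 11.0, 9.8, 10.5),
  NWIS.status = c("A", "A", "P", "P"),
  stringsAsFactors = FALSE
)

# Keep only approved records; startsWith() also tolerates codes that carry
# an extra qualifier after the leading letter (an assumption about the codes)
approved <- daily[startsWith(daily$NWIS.status, "A"), ]

# The provisional (most recent) records remain available in `daily`
provisional <- daily[startsWith(daily$NWIS.status, "P"), ]
```

Keeping the provisional rows in the download and leaving the filter to the user preserves the most recent data while still allowing an approved-only analysis.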
This PR combines two major updates to the TADA package:
1. ATTAINS Function Enhancements
fetchATTAINS() Iterative Clustering for Large Spatial Areas: Implemented a new {dbscan}-based clustering approach that partitions large spatial areas into smaller clusters so that ATTAINS features can be fetched more efficiently. This improves performance when loading features over large extents.
TADA_GetATTAINS() Enhancements: Added a distance column that reports the distance (in meters) between each WQP observation and intersecting ATTAINS features within its catchment.
Introduced a new argument, return_nearest. When set to TRUE, the function returns only the nearest ATTAINS feature to each WQP observation. When FALSE, it returns all features in the same catchment (i.e., the previous behavior).
Replaced the index column with ResultIdentifier to more clearly and consistently handle cases where multiple ATTAINS features relate to a single WQP observation, per Cristina’s feedback (closes TADA_GetATTAINS index usability #539).
Testing & Tidying: Added new {testthat} tests to validate the updated geospatial behavior, plus general code tidying and improvements to documentation/examples across the affected functions.
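The effect of return_nearest can be illustrated with a small base R sketch. This is not the package's implementation; it only shows the idea of keeping, for each ResultIdentifier, the matched feature with the smallest distance. The `feature_id` column and the `keep_nearest()` helper are hypothetical, and the distances are toy values.

```r
# Toy table: each WQP result may match several ATTAINS features in its catchment
matches <- data.frame(
  ResultIdentifier = c("r1", "r1", "r2", "r3", "r3", "r3"),
  feature_id = c("A", "B", "C", "D", "E", "F"),  # hypothetical column name
  TADA.DistanceAway.Meters = c(120, 35, 0, 410, 95, 250),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the closest feature per result
keep_nearest <- function(df) {
  # Sort by result, then by increasing distance
  ord <- df[order(df$ResultIdentifier, df$TADA.DistanceAway.Meters), ]
  # The first row of each result group is now its nearest feature
  nearest <- ord[!duplicated(ord$ResultIdentifier), ]
  rownames(nearest) <- NULL
  nearest
}

nearest <- keep_nearest(matches)  # analogous to return_nearest = TRUE
all_matches <- matches            # analogous to return_nearest = FALSE
```

Because ResultIdentifier is carried through both outputs, rows in the full table that relate to the same observation remain easy to trace.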
2. New NWIS Continuous Data Functions (partial fulfilment of #222)
These NWIS functions fulfill the continuous data access functionality TADA was aiming for and provide a foundation for future enhancements.
Let me know if anything is unclear or if you'd like changes to the implementation, documentation, or examples. Looking forward to your feedback. 🙂
Thanks!
Katie