@cristinamullin
Collaborator

@cristinamullin cristinamullin commented Mar 13, 2025

  • Updates the DOMAINS link for ATTAINS: Hillary is looking into options for making that link a stable URL on drupal so we can leverage that in the future as that would benefit both ATTAINS and TADA, but this updated link is correct for now.
  • Fixes a bug in the harmonization reference file: now assumes "NITRATE + NITRITE" speciation is "as N" for all known combinations if left blank.
  • Incorporates USGS dataRetrieval updates: we are now leveraging their "develop" branch for EPATADA instead of CRAN.
  • Updates the citation: now includes new collaborators.
  • Updates TADA_FindNearbySites: the function now uses an adjacency matrix approach. Its major disadvantage is that finding groups of related sites takes longer than simply identifying sites within a buffer distance of each other. Its main improvement is that the function no longer assigns the same site to multiple groups. We may need to discuss that tradeoff more as a group.
  • Creates TADA.MonitoringLocationIdentifier in TADA_AutoClean
  • Includes significant updates to TADA_DataRetrieval:
  1. adds sf option
  2. adds tribal options
  3. updates big data options

Overview

  • Addition of {sf} methods to allow users to query WQP data using {sf} objects
  • Addition of options to allow tribal lands to be more directly queried using TADA_DataRetrieval
  • New function, TADA_TribalOptions, to assist users with identifying and querying tribal lands
  • Folding the processes in TADA_BigDataRetrieval into TADA_DataRetrieval and removing TADA_BigDataRetrieval to avoid confusion
  • Adding progress bar to large data pulls, user prompt to confirm download, silencing {dataRetrieval} messages + error handling for HTTP errors, vignette update

Additional info

  • {sf} methods use the aoi_sf argument. The query first checks what data are available for the bbox of the {sf} object provided, then uses only the MonitoringLocationIdentifiers inside the {sf} object when running the full query
  • Tribal land queries use the tribal_area_type and tribe_name_parcel args and are handled alongside {sf} because they use this EPA spatial data. Both tribal_area_type and tribe_name_parcel are required. {sf} and tribal args can't be used at the same time (error), and if geographic info such as statecode is provided in addition to either {sf} or tribal args, a warning is returned
  • tribal_area_type refers to one of the EMEF/Tribal MapServer layers. tribe_name_parcel refers to either TRIBE_NAME or PARCEL_NO entries from that layer. The TADA_TribalOptions function is included to help users see TRIBE_NAME/PARCEL_NO options available to them and check punctuation, etc.
  • TADA_BigDataHelper is now used to handle "big" data requests within TADA_DataRetrieval. By default this is triggered with maxrecs = 250000 & maxsites = 300.
  • Two progress bars are included inside TADA_BigDataHelper
  • The ask_user function is used to confirm that the user wants to download the dataset after the number of records is determined
  • In general the messages from {dataRetrieval} are now silenced because they were returning a lot of information that was hiding (what we considered) more useful information from TADA_DataRetrieval. But we've made sure to include checks for HTTP errors, which will then be communicated back to the user
  • Additional info now in vignette 1 to explain the new {sf}, tribal, and big data functionality
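
The argument rules listed above can be sketched in base R as follows. This is a minimal illustration, not the code in this PR: `check_spatial_args()` and its return values are hypothetical, and the error and warning messages are placeholders. Only the argument names (`aoi_sf`, `tribal_area_type`, `tribe_name_parcel`, `statecode`) come from the description.

```r
# Hypothetical helper: check_spatial_args() is NOT part of EPATADA. It only
# mirrors the argument rules described in this PR.
check_spatial_args <- function(aoi_sf = NULL,
                               tribal_area_type = NULL,
                               tribe_name_parcel = NULL,
                               statecode = NULL) {
  has_sf <- !is.null(aoi_sf)
  has_type <- !is.null(tribal_area_type)
  has_parcel <- !is.null(tribe_name_parcel)

  # Both tribal arguments are required together.
  if (xor(has_type, has_parcel)) {
    stop("tribal_area_type and tribe_name_parcel must both be provided.")
  }
  has_tribal <- has_type && has_parcel

  # {sf} and tribal arguments cannot be used at the same time.
  if (has_sf && has_tribal) {
    stop("aoi_sf cannot be combined with tribal arguments.")
  }

  # Other geographic filters alongside a spatial query only raise a warning.
  if ((has_sf || has_tribal) && !is.null(statecode)) {
    warning("statecode was provided together with a spatial argument.")
  }

  # Report which kind of query the arguments describe (illustrative labels).
  if (has_sf) "sf" else if (has_tribal) "tribal" else "standard"
}
```

Calling `check_spatial_args(tribal_area_type = "layer")` alone stops with an error, while adding `statecode` to a valid tribal query still returns `"tribal"` but raises a warning.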

A few notes:

  • I left NULL as the default for the aoi_sf argument instead of "null" because the character version didn't work properly
  • I had hoped to work on issues related to character length limits in queries, as discussed with Cristina, but ran out of time
  • From my tests it didn't seem like the way that data are indexed by calendar date affected query speed
  • Please let me know if I can provide any other info on any of this! For example I didn't include any info from speed tests to avoid overwhelming amounts of info here. Thanks for your help.

Closes #361, closes #427, closes #345, closes #159

hillarymarler and others added 30 commits September 18, 2024 13:26
Potential fix for Field table bug
I included a few spots where TADA.MonitoringLocationIdentifier or TADA.MonitoringLocationType may be appropriate, but if they don't seem appropriate for change, please feel free to revert.
Updates to nearby sites
@hillarymarler
Collaborator

hillarymarler commented Mar 14, 2025

For TADA_FindNearbySites, the change that is causing the much longer run time is the inclusion of fetchNHD to only group sites that are within the same catchment. I have not tried this yet, but it would be possible to use the adjacency matrix approach with the buffer distance alone, without grouping by catchment.

We could offer users the option as to whether they want to group by catchment or not and explain that grouping by catchment will result in a longer run time. I can make the updates for whatever route we choose.

That is a potential option to prevent returning multiple groups for a single monitoring location and reduce the current run time. So I think the three options are really: 1) incorporate catchments (longer run time), 2) rely only on buffer distance for grouping (shorter run time), or 3) let users decide whether to incorporate catchments when finding nearby sites.
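
For discussion, option 2 (buffer distance only) could look roughly like the base R sketch below. It is not the TADA_FindNearbySites implementation: `group_nearby_sites()`, its arguments, and the 100 m default are hypothetical, and it uses haversine distances rather than {sf} geometry. It builds an adjacency matrix from pairwise distances and labels connected components, so each site lands in exactly one group.

```r
# Hypothetical helper; NOT the EPATADA implementation. Groups sites by
# buffer distance only (no catchment lookup).
group_nearby_sites <- function(lat, lon, buffer_m = 100) {
  n <- length(lat)
  to_rad <- pi / 180
  earth_radius_m <- 6371000

  # Pairwise haversine distances in metres.
  dlat <- outer(lat, lat, "-") * to_rad
  dlon <- outer(lon, lon, "-") * to_rad
  a <- sin(dlat / 2)^2 +
    outer(cos(lat * to_rad), cos(lat * to_rad)) * sin(dlon / 2)^2
  dist_m <- 2 * earth_radius_m * asin(pmin(sqrt(a), 1))

  # Adjacency matrix: TRUE where two sites fall within the buffer distance.
  adj <- dist_m <= buffer_m

  # Label connected components so every site receives exactly one group id.
  group <- rep(NA_integer_, n)
  next_id <- 0L
  for (start in seq_len(n)) {
    if (!is.na(group[start])) next
    next_id <- next_id + 1L
    group[start] <- next_id
    queue <- start
    while (length(queue) > 0) {
      current <- queue[1]
      queue <- queue[-1]
      neighbors <- which(adj[current, ] & is.na(group))
      group[neighbors] <- next_id
      queue <- c(queue, neighbors)
    }
  }
  group
}

# Sites 1-3 form a chain (1 near 2, 2 near 3, but 1 not near 3); site 4 is far away.
groups <- group_nearby_sites(
  lat = c(0, 0, 0, 1),
  lon = c(0, 0.0005, 0.001, 1)
)
```

In this example a plain "sites within the buffer of each site" approach would list site 2 in two overlapping groups, whereas the component labelling places sites 1 to 3 in a single group and site 4 in its own.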

@hillarymarler
Collaborator

hillarymarler commented Mar 14, 2025

There are occasional test failures of ('test-ResultFlagsIndependent.R:137:3'): QC results are not flagged as Continuous ──
unique(cont_QC$TADA.ActivityType.Flag) == "Non_QC" is not TRUE

I am working on troubleshooting this one.

Update: I updated the test to be more inclusive of allowable values in TADA.ActivityType.Flag (anything that is not one of the QC variations is acceptable)

Updated test to check for all QC options and fail if it finds them; the previous test was too restrictive because it expected only "Non_QC" as a valid TADA.ActivityType.Flag
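
The idea behind the updated check can be sketched in base R as below. This is an illustration only, not the actual test in test-ResultFlagsIndependent.R: `has_qc_flags()` is a hypothetical helper, and both the assumption that QC variations start with "QC" and the example value "QC_replicate" should be checked against the package's reference table.

```r
# Hypothetical helper; NOT the actual EPATADA test code.
# Assumption: every QC variation of TADA.ActivityType.Flag starts with "QC",
# so "Non_QC" and any other non-QC value pass.
has_qc_flags <- function(flags) {
  any(startsWith(as.character(flags), "QC"), na.rm = TRUE)
}

# Passes: no QC variation present, even though the values are not all "Non_QC".
continuous_flags <- c("Non_QC", "Non_QC", NA)
stopifnot(!has_qc_flags(continuous_flags))
```

Compared with `unique(flags) == "Non_QC"`, this form only fails when a QC value is actually present.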
Collaborator

@hillarymarler hillarymarler left a comment

The changes look good to me. Are there any specific functions you'd like me to test?

EPA304A.PollutantName, ATTAINS.ParameterName
) %>%
dplyr::arrange(organization_identifier, TADA.CharacteristicName) %>%
dplyr::arrange(organization_identifier) %>%
Collaborator

Would it make sense to also arrange by TADA.ComparableDataIdentifier so it is easier for users to find a specific row?
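
A small base R sketch of that ordering, using made-up rows (the identifiers below are placeholders, not real reference table values). In the package code the equivalent would presumably be `dplyr::arrange(organization_identifier, TADA.ComparableDataIdentifier)`.

```r
# Placeholder data; the values are illustrative only.
rows <- data.frame(
  organization_identifier = c("ORG_B", "ORG_A", "ORG_A"),
  TADA.ComparableDataIdentifier = c("ID_2", "ID_3", "ID_1"),
  stringsAsFactors = FALSE
)

# Sort by organization first, then by comparable data identifier within it.
sorted <- rows[order(rows$organization_identifier,
                     rows$TADA.ComparableDataIdentifier), ]
rownames(sorted) <- NULL
```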

@hillarymarler
Collaborator

I am still seeing occasional failures of the continuous/QC results test. They seem to be related to TADA_RandomTestingData returning empty data frames (which it should not do), so I'm taking a look at that now.

@cristinamullin cristinamullin merged commit e090929 into develop Mar 14, 2025
6 of 7 checks passed
@cristinamullin cristinamullin deleted the prerelease branch March 14, 2025 18:53
7 participants