Attains parameter to wqx characteristic alias table update and create the cst internal ref file #698
wokenny13
commented
Oct 17, 2025
- Changed the name of the reference file to ATTAINSParamToWQXCharRef.
- Updated the ATTAINSParamToWQXCharRef file from the WQX Characteristic Alias Table.
- Created the CST internal ref file. This is in a clean format for use in future TADA functions.
- Created an optional review table of potential additional ATTAINS to WQX aliases that could be added as rows to the WQX Characteristic Alias Table (these will need to be reviewed). See TADA_AdditionalCharAliasForReview(). This does not create an internal ref file of any kind. Should we consider moving this function to utilities? The function uses exact/like match logic to help identify potential additional matches between ATTAINS Parameter Names and WQX Characteristic Names.
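As a rough illustration of the exact/like matching idea described above, here is a minimal base R sketch. It is not the code in TADA_AdditionalCharAliasForReview(); the function name `find_alias_candidates` and the sample names are hypothetical, and "like" is assumed here to mean that one upper-cased name contains the other.

```r
# Hypothetical helper: compare every ATTAINS name with every WQX name and
# label each pair as an "exact" or "like" (substring) match.
find_alias_candidates <- function(attains_names, wqx_names) {
  a <- toupper(trimws(attains_names))
  w <- toupper(trimws(wqx_names))

  # All ATTAINS x WQX combinations (ATTAINS varies fastest)
  pairs <- expand.grid(ATTAINS = a, WQX = w, stringsAsFactors = FALSE)

  exact <- pairs$ATTAINS == pairs$WQX
  like <- mapply(
    function(x, y) grepl(x, y, fixed = TRUE) || grepl(y, x, fixed = TRUE),
    pairs$ATTAINS, pairs$WQX,
    USE.NAMES = FALSE
  )

  pairs$MatchType <- ifelse(exact, "exact", ifelse(like, "like", NA_character_))

  # Keep only the candidate pairs
  out <- pairs[!is.na(pairs$MatchType), ]
  rownames(out) <- NULL
  out
}

# Made-up example names, for illustration only
candidates <- find_alias_candidates(
  attains_names = c("Ammonia", "pH", "Mercury"),
  wqx_names = c("AMMONIA", "Ammonia and ammonium", "Lead")
)
print(candidates)
```

Substring matching is deliberately loose: a short name such as "PH" would also match "PHOSPHORUS", which is one reason the candidate rows need manual review before being added to the alias table.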
When auto_assign = TRUE is used to create the crosswalk between ATTAINS.ParameterName and TADA.CharacteristicName/TADA.ComparableDataIdentifier in TADA_CreateParamRef and TADA_DefineCriteriaMethodology, the TADA alias table used to be one-to-one, but there will likely be one-to-many matches now. We will need to think through how this affects the Mod 3 criteria and methods workflow and how to handle it appropriately.
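One possible way to surface the one-to-many cases is sketched below in base R. The helper name `flag_one_to_many`, the `MultipleMatches` column, and the sample rows are all hypothetical and are not part of the package; the sketch only shows how such matches could be detected and reported with a warning.

```r
# Made-up crosswalk rows, for illustration only
crosswalk <- data.frame(
  ATTAINS.ParameterName = c("AMMONIA", "AMMONIA", "PH"),
  TADA.CharacteristicName = c("AMMONIA", "AMMONIA AND AMMONIUM", "PH"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: warn when a key value maps to more than one row and
# add a logical column marking the affected rows.
flag_one_to_many <- function(ref, key = "ATTAINS.ParameterName") {
  ref <- unique(ref) # identical rows are not real one-to-many matches
  rownames(ref) <- NULL

  counts <- table(ref[[key]])
  multi <- names(counts)[counts > 1]

  if (length(multi) > 0) {
    warning(
      "One-to-many matches found for: ", paste(multi, collapse = ", "),
      ". Review these rows before using the crosswalk."
    )
  }

  ref$MultipleMatches <- ref[[key]] %in% multi
  ref
}

flagged <- suppressWarnings(flag_one_to_many(crosswalk))
print(flagged)
```

Flagging rather than dropping rows leaves the decision about which match to keep with the reviewer.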
…lias-Table-Update-and-Create-the-CST-Internal-Ref-File
See examples for the output data frame, which shows potential additional alias matches. Added @export. Added a draft testthat test to ensure the ATTAINS param domain and CST pollutant name domain are up to date in the ref file.
@wokenny13 - would you like any help with the checks? Or just review the new changes?
It is fine to just review the changes. I can work on the checks.
Removed intermediate variables and added to global variables in utilities. Some minor code comment cleanups.
hillarymarler
left a comment
I left comments on a few minor things - warning messages for many-to-many relationships, intermediate objects, etc.
I'm excited to see all of this incorporated into the mod 3 workflow!
return(ATTAINSParameterWQPCharRef_Cached)
TADA_GetATTAINSParamToWQPCharRef <- function(charAliasType = c("All", "ATTAINS")) {

  charAliasType <- match.arg(charAliasType)
ATTAINSParamRef <- ATTAINS.raw[, "name", drop = FALSE]

# Create the initial ATTAINS param to WQX char crosswalk
if(charAliasType == "ATTAINS") {
The "the condition has length > 1" error appears in both the "ATTAINS" and "All" sections too.
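For context on the error noted above: when an argument defaults to a character vector such as c("All", "ATTAINS") and is compared directly inside if(), the condition has length 2, which is an error in R 4.2.0 and later. The sketch below uses a hypothetical stand-in function, `describe_alias_type`, to show how match.arg() collapses the argument to a single validated value before the comparison.

```r
# Hypothetical stand-in for a function with a multi-choice argument.
describe_alias_type <- function(charAliasType = c("All", "ATTAINS")) {
  # match.arg() returns the first choice when the argument is left at its
  # default, and errors if the supplied value is not one of the choices.
  charAliasType <- match.arg(charAliasType)

  # charAliasType now has length 1, so this condition is safe.
  if (charAliasType == "ATTAINS") {
    "ATTAINS aliases only"
  } else {
    "all alias types"
  }
}

describe_alias_type()          # uses the default, "All"
describe_alias_type("ATTAINS") # explicit choice
```

Without the match.arg() call, calling the function with its default would evaluate `c("All", "ATTAINS") == "ATTAINS"` inside if(), which is the length > 1 condition.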
return(ATTAINSWQX2.0_non_matched3)
}
@wokenny13 - this is a great function! The different steps you use to match/compare are logical and well thought out. I think some intermediate objects can be removed from this one too during/at the end of the workflow. Also, a few more comments on what is happening in some of the intermediate tables generated might be helpful for future maintenance and review.
# Find the first row that has all values populated. This will indicate the column names of the CST data frame.
# Note: Why not use a static row number? The CST may get new entries that may change the start of the data frame's.
first_filled_row_index <- which(rowSums(is.na(CST.raw)) == 0)[1]
Would it make sense to write any of this as an internal function so it can be used in both the function and this test?
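A minimal sketch of what such a shared internal helper might look like, assuming the raw table has been read with no header so that title rows contain NA cells. The names `find_header_row` and `promote_header_row` and the sample data are hypothetical; only the `which(rowSums(is.na(...)) == 0)[1]` step comes from the diff above.

```r
# Hypothetical helper: index of the first row with no missing values.
find_header_row <- function(df) {
  idx <- which(rowSums(is.na(df)) == 0)[1]
  if (is.na(idx)) {
    stop("No fully populated row found; cannot locate the header row.")
  }
  idx
}

# Hypothetical helper: use that row as column names and drop everything
# up to and including it.
promote_header_row <- function(df) {
  idx <- find_header_row(df)
  out <- df[-seq_len(idx), , drop = FALSE]
  names(out) <- as.character(unlist(df[idx, ]))
  rownames(out) <- NULL
  out
}

# Made-up raw table with two title rows above the real header
raw <- data.frame(
  V1 = c("Criteria export", NA, "Pollutant", "Ammonia"),
  V2 = c(NA, NA, "Value", "1.9"),
  stringsAsFactors = FALSE
)

clean <- promote_header_row(raw)
print(clean)
```

Both the function and the test could then call the same helper, so a change in how the header row is located only has to be made once.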
add quiet to rExpertQuery function used in test
…lias-Table-Update-and-Create-the-CST-Internal-Ref-File
…lias-Table-Update-and-Create-the-CST-Internal-Ref-File
…lias-Table-Update-and-Create-the-CST-Internal-Ref-File
The current placeholder for EPA304a criteria and methods using the CST. Changed the extdata name to EPACST to clarify this. +minor text fixes
Condensed and cleaned up code and updated documentation (examples and params) in TADA_AdditionalCharAliasForReview. Moved TADA CriteriaSearchTool to CriteriaRefTables.R.
…lias-Table-Update-and-Create-the-CST-Internal-Ref-File
hillarymarler
left a comment
Overall, the updates look good to me. I was able to run TADA_CreateParamRef successfully on random data sets from a few different states and the output made sense.
I think we could take a look as a team at some of the documentation related to the Mod 3 functions and work to make it a little more concise and clear, especially some of the text regarding review by the TADA team or future development. But that could be done as part of a future PR.
I pushed a couple of small commits - a minor grammar change or two in documentation and the addition of spsUtil::quiet. I suggest applying that when we use rExpertQuery functions as part of larger TADA functions as the printed rExpertQuery messages are probably less useful in the greater TADA context.
#' drop down list of all ATTAINS parameters that have been listed as a cause in
#' prior ATTAINS cycle for the organization selected in the function input 'org_id'.
#' It also highlights the cells in which users should input information. The excel
#' spreadsheet will be automatically downloaded to a user's downloads folder path.
Should users have the option to specify a path for the download? (With the download folder set as the default). I'm thinking about situations where users might be using this and other TADA functions as part of their own assessment package or tool.
At some point in the past, it was decided to keep the path fixed to the downloads folder. A custom path would require an extra argument that would then need to be passed through all of the other TADA functions, adding an extra step in the process if a user chooses a different path.
I am thinking it could still be useful to allow users to change the path and this is something we can discuss more in future meetings!
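To make the trade-off concrete, here is a minimal sketch of an optional path argument that defaults to the downloads folder. Nothing here is existing TADA code: `default_download_dir`, `resolve_output_path`, and the file name are hypothetical, and the downloads location is assumed to be "~/Downloads".

```r
# Hypothetical default: the user's downloads folder (assumed to be ~/Downloads)
default_download_dir <- function() {
  file.path(path.expand("~"), "Downloads")
}

# Hypothetical helper: build the full output path, using the downloads
# folder unless the caller supplies a different directory.
resolve_output_path <- function(file_name, path = default_download_dir()) {
  if (!dir.exists(path)) {
    stop("Directory does not exist: ", path)
  }
  file.path(path, file_name)
}

# A caller embedding this in their own tool could redirect the output:
out_file <- resolve_output_path("ParamRef.xlsx", path = tempdir())
print(out_file)
```

With a default in place, existing calls would behave as before, and only functions that read or write the spreadsheet would need to accept and forward the extra argument.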
At lines 782-784: "Future development efforts may allow users to pull in magnitude values
#' for an ATTAINS parameter through the Criteria Search Tool depending on a
#' users quality control and review of these metrics."
Does this mean that future efforts will allow users to pull in magnitude values (but those values will require review)? Or that we will go through a quality control/review process as part of the function development?
Users would be the ones required to do the review if we do pull in the magnitude values. This would depend on crosswalking CST pollutant names with ATTAINS.ParameterName. The WQX Characteristic alias table in the ATTAINSParamToWQPCharRef could also contain this crosswalk if we decide to take this route in the future (we would need to source from "all" to show which matches have been found for WQX Characteristics and CST pollutant names, and to see whether an ATTAINS.Parameter is also listed as an alias for the same source).
At line 789 - should we have a link to a doc listing the TADA priority characteristics?
I edited line 793 to include this:
'TADAPriorityChar <- utils::read.csv(system.file("extdata", "TADAPriorityCharUnitRef.csv", package = "EPATADA"))'.
At line 817 - could users also run TADA_GetATTAINSOrgIDsRef() to see the list?
#' so subsequent calls will be faster.
#'
#' @return Updated sysdata.rda with updated ATTAINSParameterWQPCharRef object
#' @param charAliasType A string value to indicate the WQX data source to use
Should there be an example of a use case where charAliasType is something other than "ATTAINS"? I'm having trouble picturing how this would look.
Good question. @wokenny13 are any of the other alias types useful/logical for use in this function? I bolded a few to consider.
Here is the full list of alias types from WQX:
ATTAINS.PARAMETER
CAS NUMBER
CST.POLLUTANT
CST.STD.POLLUTANT**
EPA ID (SUBSTANCE REGISTRY #)
ITIS TAXON SERIAL NUMBER
MOLECULAR WEIGHT
NOAA - National Center for Environm
NWIS PARM CODE
ONTOLOGY - HYDRO.GEODAB.EU
RETIRED NAME
STANDARDIZE NAME (Normalized)
STORET CHARACTERISTIC NAME
STORET PARM CODE
SYSTEMATIC NAME
TAXON COMMON NAME
WQP COMPARABLE NAME
WQX SYNONYM REGISTRY (validation)
Note: WQP COMPARABLE NAME can be populated by the TADA team to support any additional matches that would otherwise be added manually.
No longer requires a one to one match for WQX Characteristics to ATTAINS Parameter.
Changed autofill in TADA_DefineCriteriaMethodology to "Org" for the ATTAINS ParameterName to WQX Char crosswalk. We should discuss how to handle this in a future TADA team meeting. Minor updates to the EPACST.csv (R8 seems to be working on a more up-to-date and reviewed EPA304a criteria table; this table has the information crosswalk from the CST for the time being). Shortened TADA Mod 3 example data frame names in Mod3Vignette - AtlOptions
…lias-Table-Update-and-Create-the-CST-Internal-Ref-File