
@wokenny13
Collaborator

It seems that TADA_FindNearbySites() creates TADA.MonitoringLocationIdentifier as a logical NA,

whereas

the dupsdat dataframe in this function holds the character value "NA" for TADA.MonitoringLocationIdentifier.

Joining the two creates additional rows in the return value, because the logical NA and the character "NA" do not match.

Fix: convert the logical NA to the character "NA".

If TADA_FindNearbySites() returns a non-NA TADA.MonitoringLocationIdentifier, that value is still returned unchanged.
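
The mismatch described above can be reproduced on a toy data frame. This is only a minimal sketch, assuming dplyr is installed; the column names come from this thread, but the two-row data, the `keys` vector, and the join itself are made up for illustration and are not the actual TADA code.

```r
library(dplyr)

# Toy stand-in for the input; values are invented for illustration.
original <- data.frame(
  ResultIdentifier = c("r1", "r2"),
  TADA.MonitoringLocationIdentifier = NA # logical NA, recycled to both rows
)

# paste() turns a missing value into the literal string "NA".
pasted <- paste(NA, sep = ",")
is.na(pasted) # FALSE: it is now ordinary text, not a missing value

# Mimic the dupsdat step: the identifier column passes through paste().
dupsdat <- original %>%
  mutate(
    TADA.MonitoringLocationIdentifier = paste(TADA.MonitoringLocationIdentifier, sep = ","),
    TADA.ResultSelectedMultipleOrgs = "N"
  )

# Assumed join keys, chosen only for this example.
keys <- c("ResultIdentifier", "TADA.MonitoringLocationIdentifier")

# Missing NA on one side and the string "NA" on the other never match,
# so the full join keeps the unmatched rows from both sides.
rows_before_fix <- nrow(full_join(original, dupsdat, by = keys))

# Convert the logical NA to the character "NA" before joining.
fixed <- original %>%
  mutate(
    TADA.MonitoringLocationIdentifier = ifelse(
      is.na(TADA.MonitoringLocationIdentifier),
      "NA",
      as.character(TADA.MonitoringLocationIdentifier)
    )
  )
rows_after_fix <- nrow(full_join(fixed, dupsdat, by = keys))

print(c(before = rows_before_fix, after = rows_after_fix))
```

Comparing the two row counts shows whether the join grew the dataset; after the conversion the keys line up and the row count should equal that of the input.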

@wokenny13 changed the title from "Update ResultFlagsIndependent.R" to "513-occasional-test-failure-tada_findpotentialduplicates-does-not-grow-dataset" on Aug 23, 2024
@cristinamullin
Collaborator

Good catch Kenny. We've run into similar issues before and applied a similar approach to resolve. We changed NA to "NA - Not Available" (character).

Should we convert to "NA - Not Available" here as well for consistency with other TADA functions?

@wokenny13
Collaborator Author

Changing NA values to "NA - Not Available" for consistency sounds good.

In general, does storing a value as a logical/missing NA versus a character NA make any difference in speed/performance/memory?

Where is the best spot to handle this change?

I think this chunk of TADA_FindPotentialDuplicatesMultipleOrgs is where dupsdat's logical NA for TADA.MonitoringLocationIdentifier gets converted to a character (via paste()). Since a full join is used at the end of the code, I believe we would then need to change the "NA" found in dupsdat to "NA - Not Available":

dupsdat <- dupsdat %>%
  dplyr::rename(SingleNearbyGroup = TADA.MonitoringLocationIdentifier) %>%
  dplyr::mutate(
    TADA.MonitoringLocationIdentifier = paste(SingleNearbyGroup, sep = ","),
    TADA.ResultSelectedMultipleOrgs = ifelse(ResultIdentifier %in% duppicks$ResultIdentifier, "Y", "N")
  ) %>%
  dplyr::select(-SingleNearbyGroup)

Or would we want to edit the NA value on line 841, in TADA_FindNearbySites(), where:

if (!"TADA.MonitoringLocationIdentifier" %in% colnames(.data)) {
  .data$TADA.MonitoringLocationIdentifier <- NA
}
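
Whichever spot is chosen, the recoding itself could look roughly like the sketch below. `na_to_label()` is a hypothetical helper written for this comment, not an existing TADA function, and treating the literal string "NA" as missing is an assumption that only holds if "NA" is never a real identifier.

```r
# Hypothetical helper (not part of TADA): map missing values, and the
# literal string "NA" that paste() produces, to one shared label.
na_to_label <- function(x, label = "NA - Not Available") {
  x <- as.character(x)
  # is.na() catches true missing values; x == "NA" catches pasted ones.
  x[is.na(x) | x == "NA"] <- label
  x
}

# A missing value, a pasted "NA", and a made-up identifier.
recoded <- na_to_label(c(NA, "NA", "USGS-01"))
print(recoded)
```

Applying the same helper to TADA.MonitoringLocationIdentifier on both sides of the full join would give both data frames identical character keys, regardless of which function created the column.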

@hillarymarler
Collaborator

@wokenny13 - is this ready for review?

@wokenny13
Collaborator Author

@hillarymarler

> Good catch Kenny. We've run into similar issues before and applied a similar approach to resolve. We changed NA to "NA - Not Available" (character).
>
> Should we convert to "NA - Not Available" here as well for consistency with other TADA functions?

I will work on converting it to "NA - Not Available" for consistency, and I will let you know when it is finished and ready for review.

@hillarymarler
Collaborator

Sounds great - thank you!

@wokenny13
Collaborator Author

@hillarymarler this is ready for review

@hillarymarler hillarymarler merged commit 436a98f into develop Aug 26, 2024
@hillarymarler hillarymarler deleted the 513-occasional-test-failure-tada_findpotentialduplicates-does-not-grow-dataset branch August 26, 2024 20:17


Development

Successfully merging this pull request may close these issues.

Occasional test failure - TADA_FindPotentialDuplicatesMultipleOrgs does not grow dataset

4 participants