@hillarymarler
Collaborator

New approach to flagging continuous data. Instead of relying on a loop, this version groups the data (by TADA.ComparableDataIdentifier, MonitoringLocation, and various depth fields), sorts each group chronologically by ActivityStartDateTime, and then calculates the time difference between each result and the results immediately before and after it within its group.
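
A minimal sketch of the group, sort, and lag/lead idea described above, for readers who want to see the shape of the logic. It is not the code from this PR: the column names (`id`, `site`, `depth`, `time`), the helper name `flag_continuous_sketch`, the flag labels, and the use of `<=` for the comparison are all assumptions made for illustration, and the toy data is invented.

```r
library(dplyr)

# Invented toy data; columns are simplified stand-ins, not the real TADA schema.
toy <- data.frame(
  id    = c("pH", "pH", "pH", "pH", "DO"),
  site  = c("A", "A", "A", "A", "A"),
  depth = c(1, 1, 1, 1, 1),
  time  = as.POSIXct(
    c("2024-07-01 00:00:00", "2024-07-01 01:00:00",
      "2024-07-01 02:00:00", "2024-07-03 12:00:00",
      "2024-07-01 00:00:00"),
    tz = "UTC"
  )
)

# Hypothetical helper: flags a row when its nearest neighbour in time,
# within the same group, is no more than `time_difference` hours away.
flag_continuous_sketch <- function(df, time_difference = 4) {
  df %>%
    group_by(id, site, depth) %>%
    arrange(time, .by_group = TRUE) %>%
    mutate(
      # Hours since the previous result and until the next result in the group;
      # both are NA for a result that is alone in its group.
      hrs_prev = as.numeric(difftime(time, lag(time), units = "hours")),
      hrs_next = as.numeric(difftime(lead(time), time, units = "hours")),
      flag = if_else(
        (!is.na(hrs_prev) & hrs_prev <= time_difference) |
          (!is.na(hrs_next) & hrs_next <= time_difference),
        "Continuous", "Discrete"
      )
    ) %>%
    ungroup()
}

flagged <- flag_continuous_sketch(toy)
print(flagged)
```

In this sketch a result that is the only member of its group has no neighbour to compare against, so it falls through to "Discrete".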

There are some slight differences between the results of this function and the original: this version identifies additional results as "discrete". I think this is correct. When I reviewed the data set, in each of these cases the result labeled "discrete" was the only one in its group. I am not sure why the previous function identified them as continuous.

I have an example data set with results from the old function. It was too large to attach here, but I can send it if you would like to compare its results to results from the updated function.

@hillarymarler hillarymarler requested a review from renaemyers July 18, 2024 14:58
@wokenny13
Collaborator

Ran through the examples with Data_Nutrients_UT and tried time_difference values other than the default of 4. Also ran my own example dataset with just Florida data, and the flagging of continuous data seems to have worked as expected.

Run time was quick, and I think the options to either flag or clean the dataset for continuous data are nice. I tested fractional time values, and the function runs and seems to work, but perhaps fractional values are not of much interest.


#' in hours between measurements of the same TADA.ComparableDataIdentifier taken at the same
#' latitude, longitude, and depth. This is used to search for
#' continuous time series data (i.e., if there are multiple measurements within the selected
#' time_difference, then the row will be flagged as continuous). The default time window is 4 hours.
Collaborator

Would decimal values for the time window work? I tried examples with 0.25 and even 0.20, and they seem to have worked, but I didn't experiment much with fractional times.
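
A small sketch of why a fractional window is meaningful, assuming the comparison is made on gaps expressed as numeric hours (an assumption, not something confirmed from the function's source). With that assumption, a window of 0.25 is simply 15 minutes. The timestamps below are invented.

```r
# Invented timestamps: gaps of 10 minutes and 30 minutes.
times <- as.POSIXct(
  c("2024-07-01 00:00:00", "2024-07-01 00:10:00", "2024-07-01 00:40:00"),
  tz = "UTC"
)

# Gaps between consecutive timestamps, converted to numeric hours.
gaps_hours <- as.numeric(diff(times), units = "hours")

# A fractional window of 0.25 hours (15 minutes) compares like any other number.
within_window <- gaps_hours <= 0.25
print(data.frame(gaps_hours, within_window))
```

Here only the 10-minute gap falls inside the 0.25-hour window.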

Contributor

@renaemyers renaemyers left a comment

The filtering approach is much better than the loop that the function previously used. Run times ranged from a couple of seconds with the Data_Nutrients_UT example data to 2.5 minutes with a large custom dataset of about 207,000 rows.

Collaborator

@cefergus cefergus left a comment

The revised function looks good from what I could see. I ran it on Fond du Lac data and a random TADA test data set. I looked for observations with the same location, depth, comparable data identifier, and organization, and the function appears to flag continuous vs. discrete observations correctly. Some observations labeled "Field Msr/Obs" and flagged as "Continuous" look like duplicated result values, but maybe a different function can flag those instances.

@hillarymarler
Collaborator Author

Thanks for the reviews and comments, @renaemyers and @cefergus.

I know we need to revisit the find QA/QC and paired replicates functions at some point, so I will copy @cefergus's comment to the appropriate paired replicates issue, or create a new one, so that we can consider it in future updates to the QC or continuous flagging functions.

@hillarymarler hillarymarler mentioned this pull request Aug 6, 2024
6 tasks
@hillarymarler hillarymarler merged commit 6ac2171 into develop Aug 6, 2024
@hillarymarler hillarymarler deleted the 494-epatada-continuous-data-flag-error branch August 6, 2024 17:31
Development

Successfully merging this pull request may close these issues.

EPATADA Continuous Data Flag Error
TADA_FlagContinuousData takes too long to run

5 participants