494 epatada continuous data flag error #501

hillarymarler · 2024-07-17T12:39:41Z

New approach to flagging continuous data. Instead of relying on a loop, this version groups the data (by TADA.ComparableDataIdentifier, MonitoringLocation, and the various depth fields), arranges each group chronologically by ActivityStartDateTime, and then calculates the time difference between each result and the results immediately before and after it within the group.
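
For readers who want to see the shape of the grouped approach, here is a minimal sketch with dplyr. It is not the code from this PR: the toy data frame, the simplified grouping columns (no depth fields), and the `hrs_since_prev`, `hrs_until_next`, and `flag` column names are all assumptions made for illustration.

```r
library(dplyr)

# Toy data. Column names are simplified stand-ins, not the full TADA schema.
results <- data.frame(
  TADA.ComparableDataIdentifier = c("PH", "PH", "PH", "PH", "DO"),
  MonitoringLocationIdentifier = c("A", "A", "A", "A", "A"),
  ActivityStartDateTime = as.POSIXct(
    c(
      "2024-07-01 08:00:00", "2024-07-01 09:00:00", "2024-07-01 10:30:00",
      "2024-07-03 12:00:00", "2024-07-01 08:00:00"
    ),
    tz = "UTC"
  )
)

time_difference <- 4 # window in hours (hypothetical stand-in for the argument)

flagged <- results %>%
  group_by(TADA.ComparableDataIdentifier, MonitoringLocationIdentifier) %>%
  # Sort chronologically within each group
  arrange(ActivityStartDateTime, .by_group = TRUE) %>%
  mutate(
    # Gap to the previous and next result in the same group, in hours.
    # The first/last row of a group (and any single-row group) gets NA.
    hrs_since_prev = as.numeric(
      difftime(ActivityStartDateTime, lag(ActivityStartDateTime), units = "hours")
    ),
    hrs_until_next = as.numeric(
      difftime(lead(ActivityStartDateTime), ActivityStartDateTime, units = "hours")
    ),
    # A row counts as continuous if either neighbor is within the window;
    # NA gaps are treated as "no neighbor".
    flag = if_else(
      coalesce(hrs_since_prev <= time_difference, FALSE) |
        coalesce(hrs_until_next <= time_difference, FALSE),
      "Continuous", "Discrete"
    )
  ) %>%
  ungroup()

print(flagged)
```

In this sketch, a result that is the only member of its group has no neighbors to compare against, so it falls through to "Discrete". Because the comparison is vectorized within groups, there is no row-by-row loop.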

There are some slight differences between the results of this function and the original: some additional results are identified as "discrete" in this version. I think this is correct. When I reviewed the data set, in each of these cases the result labeled "discrete" was the only one in its group. I am not sure why they were identified as continuous by the previous function.

I have an example data set with results from the old function. It was too large to attach here, but I can send it if you would like to compare its results to results from the updated function.

Updated function to use grouping, not a loop to find matches. This update is taking about 1.3 min on a ~100,000 row data set, while the previous version took ~45 min.

Corrected issue with counting rows to check how many cont results there are

Moved location of flag.data to address check issue

Styler updates, checked and corrected {} placement in function

wokenny13 · 2024-07-18T20:32:43Z

Ran through the examples with Data_Nutrients_UT and tried different time_difference values rather than the default of 4. Also ran through my own example dataset with just Florida data, and the flagging of continuous data seems to have worked as expected.

Run time was quick, and I think the options to flag or clean the dataset for continuous data are nice. I tested fractional time values, and the function does run and seems to work, but perhaps fractional values are not of much interest.

wokenny13 · 2024-07-18T20:22:50Z

R/ResultFlagsIndependent.R

+#' in hours between measurements of the same TADA.ComparableDataIdentifier taken at the same
+#' latitude, longitude, and depth. This is used to search for
+#' continuous time series data (i.e., if there are multiple measurements within the selected
+#' time_difference, then the row will be flagged as continuous). The default time window is 4 hours.


Would decimal values for the time window work? I tried examples with 0.25 and even 0.20 and they seemed to work, but I didn't play around much with fractional times.
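
As a rough illustration of why fractional windows are plausible, the sketch below compares gaps expressed in hours against a non-integer threshold in base R. It does not call TADA_FlagContinuousData; it assumes only that the function compares hour differences numerically, and the `timestamps` and `gaps_hours` names are made up for the example.

```r
# Three readings: 10 minutes apart, then 30 minutes apart
timestamps <- as.POSIXct(
  c("2024-07-01 08:00:00", "2024-07-01 08:10:00", "2024-07-01 08:40:00"),
  tz = "UTC"
)

# Gaps between consecutive readings, converted to hours
gaps_hours <- as.numeric(diff(timestamps), units = "hours")

# A 0.25 hour window is 15 minutes, so only the first gap falls inside it
within_window <- gaps_hours <= 0.25
print(within_window)
```

A numeric comparison like this treats 0.25 the same way it treats 4, which is consistent with the fractional tests above running without error.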

Update examples, remove extraneous column

R/ResultFlagsIndependent.R

renaemyers

The filtering approach is much better than the loop that the function previously used. Run times ranged from a couple of seconds with the Data_Nutrients_UT example data to 2.5 minutes with a large custom dataset of about 207,000 rows.

cefergus

The revised function looks good from what I could see. I ran it on Fond du Lac data and a random TADA test data set. I looked for observations with the same location, depth, comparable data identifier, and organization, and the function appears to be correctly flagging continuous vs. discrete observations. Some observations labeled "Field Msr/Obs" and flagged as "Continuous" look like duplicated result values, but maybe a different function can flag those instances.

hillarymarler · 2024-08-06T16:26:08Z

Thanks for the reviews and comments, @renaemyers and @cefergus.

I know we need to revisit the find QA/QC and paired replicates functions at some point, so I will copy @cefergus's comment to the appropriate paired replicates issue (or create a new one) so that we can consider it in future updates to the QC or continuous flagging functions.

hillarymarler added 13 commits July 10, 2024 09:20

Testing new approach to finding matches

788a811

Testing new draft of find continuous

ce3a7f0

Update ResultFlagsIndependent.R

0d9dd16

Update ResultFlagsIndependent.R

5f3fb4d

Merge branch 'develop' into 494-epatada-continuous-data-flag-error

8ec987b

Update ResultFlagsIndependent.R

17da9ed

Update ResultFlagsIndependent.R

05e23b6

Update ResultFlagsIndependent.R

4ba75db

Update ResultFlagsIndependent.R

763325f

Documentation update for TADA_FlagContinuousData

3a6717d

Update TADA_FlagContinuousData

476a759

Updated function to use grouping, not a loop to find matches. This update is taking about 1.3 min on a ~100,000 row data set, while the previous version took ~45 min.

Update TADA_FlagContinuousData

a26e01e

Corrected issue with counting rows to check how many cont results there are

Update TADA_FlagContinuousData.Rd

785d161

hillarymarler assigned cefergus and wokenny13 Jul 17, 2024

This was linked to issues Jul 17, 2024

EPATADA Continuous Data Flag Error #494

Closed

TADA_FlagContinuousData takes too long to run #466

Closed

Merge branch 'develop' into 494-epatada-continuous-data-flag-error

e6dc512

hillarymarler unassigned cefergus and wokenny13 Jul 17, 2024

hillarymarler requested review from cefergus and wokenny13 July 17, 2024 12:44

hillarymarler added 5 commits July 17, 2024 12:45

Update ResultFlagsIndependent.R

4a469d0

Moved location of flag.data to address check issue

Update ResultFlagsIndependent.R

f1b9a12

Update ResultFlagsIndependent.R

eb4c987

Update ResultFlagsIndependent.R

964b5b5

Update ResultFlagsIndependent.R

1d2e008

Styler updates, checked and corrected {} placement in function

hillarymarler requested a review from renaemyers July 18, 2024 14:58

wokenny13 approved these changes Jul 18, 2024

hillarymarler and others added 4 commits July 29, 2024 12:30

Merge branch 'develop' into 494-epatada-continuous-data-flag-error

6f5e48c

Update ResultFlagsIndependent.R

14acbc8

Update examples, remove extraneous column

Update ResultFlagsIndependent.R

16cb06f

Ref and example updates

ca2dc83

renaemyers reviewed Aug 5, 2024

renaemyers approved these changes Aug 5, 2024

cefergus approved these changes Aug 6, 2024

hillarymarler mentioned this pull request Aug 6, 2024

Replicate QC flag #393

Open

6 tasks

hillarymarler merged commit 6ac2171 into develop Aug 6, 2024

hillarymarler deleted the 494-epatada-continuous-data-flag-error branch August 6, 2024 17:31
