Inputs spec chars eh #205

ehinman · 2023-02-09T22:26:49Z

This branch consists of:

- Consistent date format (YYYY-MM-DD) in all functions and examples
- TADAdataRetrieval input names changed to match the WQP query inputs (project and organization)
- A new special characters function that acts on a user-specified column rather than the hard-coded ResultMeasureValue and DetectionQuantitationTypeLimit. MeasureValue columns
- The new function also recognizes more special character situations (% and #,####) and flags them for the user
- The new function does not flag scientific notation as text
- Added lines that replace the current special characters function with the new function, if desired/approved
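For reviewers who want to see the behavior described above in one place, here is a minimal sketch of a column-based special characters check. It is not the code in this branch: the function name `flag_special_chars`, the flag labels, and the `.Numeric`/`.Flag` column suffixes are all hypothetical and only illustrate the idea of taking the column name as an input, recognizing `%` and comma-grouped numbers, and treating scientific notation as numeric.

```r
# Hypothetical helper (not the TADA implementation): classify the values in one
# character column and append a numeric column and a flag column.
flag_special_chars <- function(.data, col) {
  if (!col %in% names(.data)) {
    stop("Column '", col, "' not found in the data frame.")
  }
  raw <- trimws(as.character(.data[[col]]))

  # Plain or scientific-notation numbers, e.g. "12", "-0.5", "1.2E-3"
  num_pattern <- "^[-+]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?$"

  is_missing <- is.na(raw) | raw == ""
  is_num <- grepl(num_pattern, raw)
  # A number followed by a percent sign, e.g. "45%"
  is_pct <- grepl("%$", raw) & grepl(num_pattern, trimws(sub("%$", "", raw)))
  # A number containing thousands separators, e.g. "1,234"
  is_comma <- grepl(",", raw, fixed = TRUE) &
    grepl(num_pattern, gsub(",", "", raw, fixed = TRUE))

  flag <- rep("Text", length(raw))
  flag[is_num] <- "Numeric"
  flag[is_pct] <- "Percent"
  flag[is_comma] <- "Comma"
  flag[is_missing] <- "Missing"

  # Strip the special characters only where the remainder is a valid number
  cleaned <- raw
  cleaned[is_pct] <- trimws(sub("%$", "", raw[is_pct]))
  cleaned[is_comma] <- gsub(",", "", raw[is_comma], fixed = TRUE)

  value <- rep(NA_real_, length(raw))
  ok <- is_num | is_pct | is_comma
  value[ok] <- as.numeric(cleaned[ok])

  .data[[paste0(col, ".Numeric")]] <- value
  .data[[paste0(col, ".Flag")]] <- flag
  .data
}

# Illustrative data only
example <- data.frame(
  ResultMeasureValue = c("12.5", "1.2E-3", "45%", "1,234", "<0.5", NA),
  stringsAsFactors = FALSE
)
checked <- flag_special_chars(example, "ResultMeasureValue")
```

Because the column name is an argument, the same call could be pointed at a detection limit column without duplicating the logic. Values such as `<0.5` are left as text in this sketch.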

ResultIdentifiers are unique to samples. There may be multiple detection limit types related to a result identifier, but result identifiers connect to one observation and one activity. This change in code streamlines the process so that duplicates are only checked on result identifiers represented in two or more rows.
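A rough sketch of that idea, for discussion: test the identifier column first, and only run the whole-row comparison on rows whose identifier appears more than once. The function name `remove_exact_duplicates` and the example data are hypothetical, and this is not necessarily how the branch implements it.

```r
# Hypothetical helper (not the code in this PR): drop exact duplicate rows,
# comparing full rows only where the identifier is repeated.
remove_exact_duplicates <- function(.data, id_col = "ResultIdentifier") {
  ids <- .data[[id_col]]

  # Cheap test first: if every identifier is unique, no row can be duplicated.
  if (!anyDuplicated(ids)) {
    return(.data)
  }

  # Rows whose identifier appears in two or more rows
  repeated_ids <- unique(ids[duplicated(ids)])
  candidate <- ids %in% repeated_ids

  # duplicated() on a data frame compares entire rows; run it on the subset only
  drop <- rep(FALSE, nrow(.data))
  drop[candidate] <- duplicated(.data[candidate, , drop = FALSE])

  .data[!drop, , drop = FALSE]
}

# Illustrative data: R2 is an exact duplicate, R3 has two detection limit types
results <- data.frame(
  ResultIdentifier = c("R1", "R2", "R2", "R3", "R3"),
  DetectionQuantitationLimitTypeName = c("MDL", "MDL", "MDL", "MDL", "PQL"),
  ResultMeasureValue = c(1.0, 2.0, 2.0, 3.0, 3.0),
  stringsAsFactors = FALSE
)
deduped <- remove_exact_duplicates(results)
```

In this example the repeated R2 row is dropped, while both R3 rows are kept because their detection limit types differ.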

cristinamullin

Overall these changes look great! I added a few comments/suggestions for edits & will try to dig up some examples for you with exact duplicates and potential duplicates.

R/DataDiscoveryRetrieval.R

R/Utilities.R

cristinamullin · 2023-02-13T14:33:29Z

R/Utilities.R

+  # Remove duplicate rows - turned into a test because duplicated() takes a long
+  # time acting on all columns in a large dataset.
+  if(!length(unique(.data$ResultIdentifier))==dim(.data)[1]){
+    print("Duplicate records may be present. Filtering to unique records. This may take a while on large datasets.")


Change to "Duplicate records are present"?

I wrote "may be present" because I could potentially see someone using autoclean on a dataset where they joined detection limit data to result data and thus have unique rows with the same result identifier and different detection limits. In this case, the function will check to make sure these cases are truly unique.

- updated data harmonization vignette to include information on other TADAdataRetrieval inputs (project and organization), and information on the discrepancy between WQP and dataRetrieval regarding date format
- added more commentary on input specification origins and date format discrepancy in TADAdataRetrieval function
- added date format check to help user ensure dates are in format YYYY-MM-DD
- updated documentation for ConvertSpecialChars
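As an illustration of what a YYYY-MM-DD check might look like, here is a small sketch. The helper names `is_ymd` and `check_date_input` are hypothetical and are not taken from the branch. The check combines a shape test with a parse, because `as.Date()` with an explicit format ignores trailing characters and so cannot be relied on alone to reject a value like 02-09-2023.

```r
# Hypothetical helpers (not the code in this PR) for validating date inputs.

# TRUE where a value is a real calendar date written as YYYY-MM-DD
is_ymd <- function(x) {
  x <- as.character(x)
  # Shape test: four digits, two digits, two digits, separated by hyphens
  shaped <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  # Parse test: impossible dates such as 2023-02-30 come back as NA
  parsed <- as.Date(x, format = "%Y-%m-%d")
  shaped & !is.na(parsed)
}

# Stop with a readable message if any value is not in YYYY-MM-DD format
check_date_input <- function(x, arg_name) {
  if (!all(is_ymd(x))) {
    stop(arg_name, " must be in the format YYYY-MM-DD.")
  }
  invisible(x)
}

date_check <- is_ymd(c("2023-02-09", "02-09-2023", "2023-02-30", "2023/02/09"))
```

A retrieval function could call something like `check_date_input(startDate, "startDate")` before building the query, so the user sees the format problem before any request is sent.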

ehinman added 3 commits February 9, 2023 12:17

consistent dates, consistent inputs

d660d08

- all dates are YYYY-MM-DD in functions and examples
- changed TADAdataRetrieval inputs to "project" and "organization" to match WQP queries

new special chars function

1d68901

- could replace MeasureValueSpecialCharacters: more generalized, not column specific, handles % and flags numbers with commas as well
- fixed the issue of scientific notation being flagged as text

Update Utilities.R

227b0b9

- added new functions to try

ehinman requested a review from cristinamullin February 9, 2023 22:26

This was linked to issues Feb 10, 2023

date format #202

Closed

TADAdataRetrieval function input name consistency with WQP UI, WQP Profiles, and dataRetrieval #196

Closed

Merge branch 'develop' into inputs_spec_chars_eh

07454ea

cristinamullin requested changes Feb 13, 2023

ehinman requested a review from cristinamullin February 13, 2023 16:35

cristinamullin approved these changes Feb 13, 2023

cristinamullin merged commit d3f13e7 into develop Feb 13, 2023

cristinamullin deleted the inputs_spec_chars_eh branch February 13, 2023 20:13
