From f0b6b1c30d0bd5456f983592cae59dc75108980e Mon Sep 17 00:00:00 2001
From: cristinamullin <46969696+cristinamullin@users.noreply.github.com>
Date: Fri, 14 Mar 2025 14:44:53 -0400
Subject: [PATCH 1/7] Create GeospatialDataIntegration.Rmd

---
 vignettes/GeospatialDataIntegration.Rmd | 647 ++++++++++++++++++++++++
 1 file changed, 647 insertions(+)
 create mode 100644 vignettes/GeospatialDataIntegration.Rmd

diff --git a/vignettes/GeospatialDataIntegration.Rmd b/vignettes/GeospatialDataIntegration.Rmd
new file mode 100644
index 000000000..3223735b7
--- /dev/null
+++ b/vignettes/GeospatialDataIntegration.Rmd
@@ -0,0 +1,647 @@
+---
+title: "Geospatial Data Integration"
+format: html
+editor: visual
+date: "`r Sys.Date()`"
+output: 
+  rmarkdown::html_vignette:
+    toc: true
+    fig_caption: yes
+    fig_height: 8
+    fig_width: 8
+vignette: >
+  %\VignetteEncoding{UTF-8}
+  %\VignetteIndexEntry{Geospatial Data Integration}
+  %\VignetteEngine{knitr::rmarkdown}
+editor_options:
+  chunk_output_type: console
+  markdown:
+    wrap: 72
+authors: 
+  - name: "Cristina Mullin"
+    orcid: 0000-0002-0615-6087
+    affiliation: EPA
+  - name: "Hillary Marler"
+    affiliation: EPA
+  - name: "Marc Weber"
+    affiliation: EPA
+  - name: "Dave Blodgett"
+    affiliation: USGS
+  - name: "Michael Dumelle"
+    affiliation: EPA
+  - name: "Shelly Thawley"
+    affiliation: EPA
+  - name: "Kenny Wong"
+    affiliation: ORISE at EPA
+---
+
+```{r knitsetup, include = FALSE}
+library(knitr)
+
+knitr::opts_chunk$set(
+  echo = TRUE,
+  warning = FALSE,
+  message = FALSE
+)
+```
+
+# Overview
+
+Let's walk through how to create an efficient and reproducible workflow
+that integrates several R Packages developed by the U.S. Environmental
+Protection Agency (EPA) and the U.S. Geological Survey (USGS) to support
+water quality programs (such as the Clean Water Act) and geospatial
+(watershed or waterbody) level analyses.
+
+This workflow demonstrates potential uses (beyond their original
+collection purpose) for publicly available water quality data from WQP.
+To start, participants will learn how to use EPA's Tools for Automated
+Data Analysis (TADA) R Package to retrieve, wrangle, harmonize, quality
+check, visualize and analyze WQP data from multiple organizations.
+
+Next, we will showcase how to bring in other web services and libraries
+for easy integration of additional hydrologic and geospatial data. We
+then plan to touch briefly on packages that can assist with building
+statistical models. Finally, we will demonstrate an example for
+analyzing water quality by Assessment Units (AUs), which are state or
+tribal nation defined watershed or waterbody areas used for CWA
+assessments and reporting water quality conditions to EPA and the
+public.
+
+**Intended Audience**
+
+Water Quality eXchange (WQX) and Water Quality Portal (WQP) community,
+Clean Water Act (CWA) community (EPA, States and Tribal Nations), water
+quality and geospatial data analysts/researchers, EPA/USGS and other
+federal agencies.
+
+**Leveraged R Packages**
+
+-   EPA: EPATADA, StreamCatTools, spsurvey, spmodel, SSN2
+
+-   USGS: dataRetrieval, nhdplusTools, hydroloom
+
+-   Fundamental geospatial packages: sf, prism, terra, leaflet and tmap
+
+# Install and load packages
+
+We will be leveraging the EPATADA R Package for WQP data retrieval,
+cleaning, visualization and other steps needed to prepare for analysis.
+Let's dive into Green Bay, WI!
+
+First, install and load the remotes package specifying the repo. This is
+needed before installing EPATADA because it is only available on GitHub
+(not CRAN).
+
+```{r install_remotes, results = 'hide'}
+install.packages("remotes",
+  repos = "http://cran.us.r-project.org"
+)
+library(remotes)
+```
+
+Next, install and load EPATADA using the remotes package. All other
+dependencies for this workflow will be downloaded automatically.
+
+```{r devsetup, include = F, eval = T}
+remotes::install_github("USEPA/EPATADA", 
+                        ref = "prerelease",
+                        dependencies = TRUE,
+                        force = TRUE)
+library(EPATADA)
+```
+
+```{r usersetup, eval = F, results = 'hide'}
+remotes::install_github("USEPA/EPATADA", 
+                        ref = "develop",
+                        dependencies = TRUE,
+                        force = TRUE)
+library(EPATADA)
+```
+
+It's go time! Let's time our process.
+
+```{r startime}
+# Record start time
+start.time <- Sys.time()
+```
+
+# WQP data discovery and cleaning
+
+This is an abbreviated introduction to key TADA Module 1 WQP Data
+Discovery and Cleaning functions. Additional functions and a more
+detailed example workflow is available
+[here](https://usepa.github.io/EPATADA/articles/TADAModule1.html).
+
+**Retrieve data from the WQP**
+
+In this example, we will first use EPA's How's My Waterway (HMW)
+application to find an applicable Hydrologic Unit Code (HUC) for our
+area of interest - the [Fox River, Green Bay,
+WI](https://mywaterway.epa.gov/community/040302040405/monitoring). Next,
+let's query the WQP using the identified HUC, state abbreviation, and a
+date range. In this example, we'll start by pulling all data available
+in the WQP for this HUC 12 in Wisconsin for the last 5 years.
+
+WATERSHED: City of Green Bay-Fox River (040302040405)
+
+```{r getdata}
+# # Uncomment to query the WQP
+# GreenBay_FoxRiver <- TADA_DataRetrieval(
+#   statecode = "WI",
+#   startDate = "2015-01-01",
+#   endDate = "2024-12-30",
+#   huc = c("040302040405"),
+#   applyautoclean = TRUE
+# )
+
+GreenBay_FoxRiver <- NMCWorkshopData::GreenBay_FoxRiver
+```
+
+**Wrangle**
+
+Now, let's use EPATADA functions to review, visualize, and whittle the
+returned WQP data down to include only results that are applicable to
+our water quality analysis and area of interest.
+
+**Flag and remove duplicate results from a single organization**
+
+```{r single-org-dups}
+GreenBay_FoxRiver <- TADA_FindPotentialDuplicatesSingleOrg(GreenBay_FoxRiver)
+
+GreenBay_FoxRiver <- dplyr::filter(GreenBay_FoxRiver, TADA.SingleOrgDup.Flag == "Unique")
+```
+
+**Autoclean**
+
+```{r autoclean}
+GreenBay_FoxRiver <- TADA_AutoClean(GreenBay_FoxRiver)
+```
+
+**Handle censored results**
+
+```{r censored}
+GreenBay_FoxRiver <- TADA_SimpleCensoredMethods(GreenBay_FoxRiver, nd_method = "multiplier", nd_multiplier = 0.5, od_method = "as-is", od_multiplier = "null")
+```
+
+**Flag and remove duplicates from multiple organizations**
+
+Two organizations sometimes submit the same exact data to WQP.
+
+```{r multi-org-dups, eval = F}
+GreenBay_FoxRiver <- TADA_FindPotentialDuplicatesMultipleOrgs(GreenBay_FoxRiver)
+
+GreenBay_FoxRiver <- dplyr::filter(GreenBay_FoxRiver, TADA.ResultSelectedMultipleOrgs == "Y")
+```
+
+**Filter out any remaining irrelevant data, NA's and empty columns**
+
+```{r autofilter}
+unique(GreenBay_FoxRiver$TADA.ResultMeasureValueDataTypes.Flag)
+
+sum(is.na(GreenBay_FoxRiver$TADA.ResultMeasureValue))
+
+GreenBay_FoxRiver <- TADA_AutoFilter(GreenBay_FoxRiver)
+# Topic for discussion: Would users like results with NA units to be dealt with in TADA_AutoFilter or in the TADA_ConvertResultUnits()? These have not yet been addressed.
+
+unique(GreenBay_FoxRiver$TADA.ResultMeasureValueDataTypes.Flag)
+
+sum(is.na(GreenBay_FoxRiver$TADA.ResultMeasureValue))
+```
+
+**Flag and remove QAQC samples and suspect results**
+
+```{r flag}
+GreenBay_FoxRiver <- TADA_RunKeyFlagFunctions(GreenBay_FoxRiver, clean = TRUE)
+```
+
+**Flag results above and below threshold, but do not remove them**
+
+```{r thresholds}
+GreenBay_FoxRiver <- TADA_FlagAboveThreshold(GreenBay_FoxRiver, clean = FALSE, flaggedonly = FALSE)
+
+GreenBay_FoxRiver <- TADA_FlagBelowThreshold(GreenBay_FoxRiver, clean = FALSE, flaggedonly = FALSE)
+```
+
+**Harmonize synonyms across characteristic, fraction, and speciation**
+
+```{r harmonize}
+GreenBay_FoxRiver <- TADA_HarmonizeSynonyms(GreenBay_FoxRiver)
+```
+
+**Calculate Total N and Total P from various species and fractions**
+
+```{r TN-TP}
+GreenBay_FoxRiver <- TADA_CalculateTotalNP(GreenBay_FoxRiver, daily_agg = "max")
+```
+
+**Review unique characteristic, fraction, and species combinations**
+
+```{r table}
+GreenBay_FoxRiver_Counts <- TADA_FieldValuesTable(GreenBay_FoxRiver, field = "TADA.ComparableDataIdentifier")
+
+DT::datatable(GreenBay_FoxRiver_Counts, fillContainer = TRUE)
+```
+
+**Filter to focus on frequently monitored characteristics in example
+data**
+
+```{r subset}
+GreenBay_FoxRiver_Subset <- GreenBay_FoxRiver %>%
+  dplyr::filter(TADA.ComparableDataIdentifier %in%
+      c("SPECIFIC CONDUCTANCE_NA_NA_US/CM", 
+      "PH_NA_NA_NA", 
+      "TOTAL NITROGEN, MIXED FORMS_UNFILTERED_AS N_MG/L",
+      "TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L",
+      "DISSOLVED OXYGEN (DO)_NA_NA_MG/L"))
+```
+
+**Review organizations for subset**
+
+```{r org-review}
+# Create pie of results by organization
+TADA_FieldValuesPie(GreenBay_FoxRiver_Subset, field = "OrganizationFormalName")
+```
+
+# Exploratory visualizations
+
+**Generate stats table**
+
+```{r stats}
+GreenBay_FoxRiver_Subset_Stats <- TADA_Stats(GreenBay_FoxRiver_Subset)
+
+DT::datatable(GreenBay_FoxRiver_Subset_Stats, fillContainer = TRUE)
+```
+
+**Generate scatterplot**
+
+```{r scatterplot}
+TADA_TwoCharacteristicScatterplot(GreenBay_FoxRiver_Subset, id_cols = "TADA.ComparableDataIdentifier",  groups = c("TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L", "TOTAL NITROGEN, MIXED FORMS_UNFILTERED_AS N_MG/L"))
+```
+
+**Generate map**
+
+```{r map}
+TADA_OverviewMap(GreenBay_FoxRiver_Subset)
+
+GreenBay_FoxRiver = TADA_FlagCoordinates(GreenBay_FoxRiver_Subset, clean_outsideUSA = "change sign", clean_imprecise = FALSE)
+
+# CM note for TADA team discussion: Should results with NA lat/long be addressed within TADA_FlagCoordinates? For example, this df has NA lons from USGS that must be addressed before TADA_MakeSpatial can be run... 
+# sum(is.na(GreenBay_FoxRiver$LongitudeMeasure))
+GreenBay_FoxRiver_Subset <- GreenBay_FoxRiver_Subset[!is.na(GreenBay_FoxRiver_Subset$LongitudeMeasure),]
+```
+
+# Geospatial data integration
+
+## Make spatial
+
+Leverage TADA_MakeSpatial to transform a WQP dataframe into a geospatial
+sf object.
+
+```{r}
+GreenBay_FoxRiver_sf = TADA_MakeSpatial(GreenBay_FoxRiver_Subset)
+```
+
+Then create a unique identifier based on shared lat long values and
+filter to just the 25 unique locations.
+
+```{r}
+GreenBay_FoxRiver_sf$latlon <- paste0(GreenBay_FoxRiver_sf$TADA.LongitudeMeasure, GreenBay_FoxRiver_sf$TADA.LatitudeMeasure)
+
+GreenBay_FoxRiver_sf <- GreenBay_FoxRiver_sf |> 
+  dplyr::group_by(latlon) |> 
+  dplyr::mutate(loc_id = dplyr::cur_group_id())
+
+GreenBay_FoxRiver_sf_locs <- GreenBay_FoxRiver_sf |>
+  dplyr::filter(!duplicated(loc_id))
+```
+
+## Access NHDPlus COMIDs for sites
+
+We use `StreamCatTools` function `sc_get_comid` (which uses an
+`nhdplusTools` web service client) to get the comid for each location.
+
+```{r}
+GreenBay_FoxRiver_sf_locs$COMID <- as.integer(strsplit(StreamCatTools::sc_get_comid(GreenBay_FoxRiver_sf_locs), split = ",")[[1]])
+
+nhdplus_data <- nhdplusTools::subset_nhdplus(GreenBay_FoxRiver_sf_locs$COMID, nhdplus_data = "download")
+
+outlet <- dplyr::filter(nhdplus_data$NHDFlowline_Network, hydroseq == min(hydroseq)) 
+
+nhdplusTools::plot_nhdplus(bbox = sf::st_bbox(outlet))
+plot(sf::st_transform(sf::st_geometry(GreenBay_FoxRiver_sf_locs), 3857), add = TRUE)
+```
+
+## dataRetrieval/NLDI, nhdplusTools, hydroloom
+
+Do a network navigation and get NHDPlus for our data. Note that the
+network navigation only includes flowline geometry. `nhdplusTools`
+subsets all of the NHDPlus.
+
+```{r}
+all_network <- dataRetrieval::findNLDI(comid = outlet$comid, nav = "UT", distance_km = 500)
+
+# we could select only comids on network
+if(FALSE) # don't run this one
+nhdplus_data <- nhdplusTools::subset_nhdplus(comids = as.integer(all_network$UT_flowlines$nhdplus_comid), nhdplus_data = "download", flowline_only = FALSE)
+
+# or we could just get everything in the bbox to be sure we get non-network stuff too!
+nhdplus_data <- nhdplusTools::subset_nhdplus(
+  bbox = sf::st_bbox(all_network$UT_flowlines), 
+  nhdplus_data = "download", 
+  flowline_only = FALSE)
+
+# see ?nhdplusTools::subset_nhdplus for lots more options!
+
+sapply(nhdplus_data, nrow)
+
+sapply(nhdplus_data, names)
+```
+
+## Addressing sites to the network
+
+There are two forms of hydrographic addresses: catchment indexing and
+linear referencing. The former is established with a point in polygon
+analysis. The latter is more nuanced. The following block shows how to
+establish both with the data we just retrieved.
+
+Note that hydroloom is compatible with nhdplus and other attribute
+systems. See [hydroloom documentation for
+more!](https://doi-usgs.github.io/hydroloom/articles/hydroloom.html)
+
+```{r}
+GreenBay_FoxRiver_sf_locs <- sf::st_join(
+  GreenBay_FoxRiver_sf_locs, 
+  hydroloom::st_compatibalize(dplyr::select(nhdplus_data$CatchmentSP, featureid),
+                              GreenBay_FoxRiver_sf_locs))
+
+# NOTE that featureid and comid are the same!!
+all(GreenBay_FoxRiver_sf_locs$COMID == GreenBay_FoxRiver_sf_locs$featureid)
+
+(linear_references <- hydroloom::index_points_to_lines(
+  nhdplus_data$NHDFlowline_Network,
+  GreenBay_FoxRiver_sf_locs))
+
+GreenBay_FoxRiver_sf_locs <- dplyr::bind_cols(GreenBay_FoxRiver_sf_locs, linear_references)
+```
+
+We can take this one step further by indexing points to waterbodies! The
+return here tells us what waterbody our locations are near or within.
+For on-network waterbodies, it will also include the outlet flowling for
+each waterbody.
+
+```{r}
+all_wb <- dplyr::bind_rows(dplyr::select(nhdplus_data$NHDWaterbody, wbid = comid),
+                           dplyr::select(nhdplus_data$NHDArea, wbid = comid))
+
+(waterbody_indexes <- hydroloom::index_points_to_waterbodies(
+  sf::st_transform(all_wb, 5070), 
+  GreenBay_FoxRiver_sf_locs, 
+  flines = nhdplus_data$NHDFlowline_Network, 
+  search_radius = units::as_units(1000, "m")))
+```
+
+```{r}
+par(mar=c(0,0,0,0))
+nhdplusTools::plot_nhdplus(bbox = sf::st_bbox(GreenBay_FoxRiver_sf), 
+                           cache_data = tempfile(fileext = ".rds"))
+plot(sf::st_transform(all_wb[all_wb$wbid %in% waterbody_indexes$near_wbid,], 
+                      3857),
+     add = TRUE, 
+     col = "darkblue", border = NA)
+plot(sf::st_transform(sf::st_geometry(GreenBay_FoxRiver_sf_locs), 3857), add = TRUE, col = "white")
+```
+
+There's much much more where that came from. See the pkgdown sites for
+[nhdplusTools](https://doi-usgs.github.io/nhdplusTools/) and
+[hydroloom](https://doi-usgs.github.io/hydroloom/index.html) for more!
+
+## Accessing watershed information for sites
+
+### Discover what StreamCat metrics we might want to use
+
+More to come here...
+
+### Discover land cover of waterhsheds for sites
+
+We'll pull in all the NLCD categories at the local catchment level for
+each location
+
+```{r}
+library(ggplot2)
+GB_FR_NLCD <- sc_nlcd(year='2019', aoi='cat', comid=GreenBay_FoxRiver_sf_locs$COMID)
+
+
+GB_FR_Urb <- GB_FR_NLCD |> 
+  dplyr::mutate(Pct_Urbanized = pcturbop2019cat+pcturbmd2019cat+pcturblo2019cat+pcturbhi2019cat) |> 
+  dplyr::select(comid,Pct_Urbanized)
+GB_FR_Urb
+```
+
+### Visualize urbanization for local catchment for each location
+
+```{r}
+ggplot(GB_FR_Urb, aes(x=Pct_Urbanized)) + 
+  geom_density()
+```
+
+### Pull in data for modeling using `StreamCatTools`
+
+Now we'll just demonstrate pulling in watershed data that we might use
+in a modeling exercise as spatial covariates
+
+```{r}
+ws_data <- sc_get_data(metric='fert,nsurp,nani,manure,IWI', aoi='cat,ws', comid=GreenBay_FoxRiver_sf_locs$COMID)
+```
+
+# Example Use Case 1: Building Statistical Models
+
+## Spatial Dependence
+
+-   For spatial data, nearby observations tend to be more similar than
+    distant observations
+
+-   This phenomena is called *spatial dependence* and can be built into
+    statistical models
+
+-   The benefits of incorporating spatial dependence are *significant*
+    and include:
+
+    -   More realistic characterization of ecological drivers
+    -   More precise predictions at unobserved locations
+
+## The `spmodel` R package
+
+-   The `spmodel` R package makes spatial models accessible via
+    straightforward extensions to common modeling functions like `lm()`
+    and `glm()`
+-   Spatial dependence is based on Euclidean (straight-line) distance
+-   Learn more at <https://usepa.github.io/spmodel/>
+
+## The `SSN2` R package
+
+-   Like `spmodel`, `SSN2` extends common modeling functions like `lm()`
+    and `glm()`
+-   Spatial dependence is based on stream network distance
+    (flow-connected, flow-unconnected)
+-   `SSN2` is an updated version of `SSN` (`SSN` has been archived)
+-   Learn more at <https://usepa.github.io/SSN2/>
+
+# Example Use Case 2: Clean Water Act (CWA) Section 303(d) Assessments Part A
+
+TADA_MakeSpatial(), TADA_GetATTAINS(), TADA_ViewATTAINS()
+
+## TADA Module 2: Geospatial Functions
+
+Additional functions and a more detailed example workflow is available
+here: <https://usepa.github.io/EPATADA/articles/TADAModule2.html>
+
+## CWA Assessment Process
+
+We do not have time to cover the full process today. Let's focus on
+geospatial aspects!
+
+[Integrated Reporting Memoranda under CWA Sections 303(d), 305(b) and
+314](https://www.epa.gov/tmdl/Integrated%20Reporting%20Guidance%20under%20CWA%20Sections%20303%28d%29%2C%20305%28b%29%20and%20314).
+
+## What are Assessment Units?
+
+Geospatial areas for analysis. Let's assign data to those units!
+
+CWA assessment determinations are made by assessment unit, meaning the
+entire assessment unit is assessed as either meeting or not meeting
+water quality standards (i.e., thresholds or criteria) for all
+designated uses.
+
+## How are assessment units delineated?
+
+Assessment units are typically delineated by using watershed-oriented
+collections of stream reaches, often broken down by physical features
+like waterfalls, bridge crossings, or changes in land use, to analyze
+water quality impairments within a specific area, ensuring data
+homogeneity and spatial clarity within the assessment unit.
+
+-   Existing Assessment Units are available from ATTAINS geospatial
+    services
+
+## Associating ATTAINS Assessment Units with WQP Monitoring Locations
+
+One of the first steps in the CWA assessment process is to define
+Assessment Units and associate data with them. A major source for water
+quality data is the WQP.
+
+## Associating ATTAINS Assessment Units with WQP Monitoring Locations
+
+-   Assessment Units: state or tribal waterbody geospatial features
+    -   These may be lines, areas or points
+-   Water Quality Portal Monitoring Locations
+    -   These are points
+
+## TADA_GetATTAINS()
+
+-   Automates matching of WQP monitoring locations with ATTAINS
+    assessment units that fall within (intersect) the same NHDPlus
+    catchment
+    ([details](https://usepa.github.io/EPATADA/articles/TADAModule2.html))
+-   The function uses high resolution NHDPlus catchments by default
+    because 80% of state submitted assessment units in ATTAINS were
+    developed based on high res NHD; users can select med-res if
+    applicable to their use case
+
+```{r}
+WQP_with_ATTAINSonly <- TADA_GetATTAINS(GreenBay_FoxRiver_Subset, fill_catchments = FALSE, return_sf = TRUE)
+```
+
+## TADA_ViewATTAINS()
+
+-   Allows for viewing the
+
+## Challenges with Automated Approach
+
+-   Certain NHDPlus high res catchments overlap multiple ATTAINS
+    assessment units (state submitted hydrography) which means the sites
+    are assigned to both AUs in the current functions. Another challenge
+    is that the WQP sites are not always accurate (imprecise
+    coordinates). WQP location metadata may also be helpful for
+    matching/QAQC'ing waterbody names with ATTAINS waterbody names
+    instead of relying solely on the lat/long and geospatial/mapping
+    information. Users must manually review associations for accuracy.
+
+## Using all available data from the WQP
+
+Finally, some waterbodies have data available in the WQP or from other
+sources, but there are no existing Assessment Units in ATTAINS for them.
+In the next section, we will share a way to create AU's using NHDPlus
+high resolution catchments.
+
+# Example Use Case 2: Clean Water Act (CWA) Section 303(d) Assessments Part B
+
+nhdplusTools, TADA::fetchNHD(), TADA_GetATTAINS(), TADA_ViewATTAINS()
+
+## Creating new AUs to assess additional areas (leveraging USGS's nhdplusTools and TADA geospatial functions)
+
+TADA has included a way to do this using TADA_GetATTAINS()
+fill_catchments function input. This is included for exploratory
+purposes only. In theory, states and tribal nations could use the high
+res catchments as new assessment unit polygons to assess additional
+areas where there is WQP data but no Assessment Unit yet in ATTAINS, but
+that process is outside of TADA.
+
+## Creating new AUs to assess additional areas
+
+For WQP monitoring sites that DO NOT overlap an existing ATTAINS feature
+(neither ATTAINS NHDPlus high res catchment snap shot or state submitted
+points/lines/polys), there is nothing to use from ATTAINS because these
+are areas where there is WQP data but no ATTAINS Assessment Unit yet.
+
+## Creating new AUs to assess additional areas
+
+For these, we implemented a solution using NHDPlusTools to pull in
+either NHDPlus high res or med res catchments (user can choose, but high
+res is the default) and match those with the WQP sites & create new IDs
+(essentially creating new AUs that are the catchments that intersect
+these WQP sites).
+
+```{r}
+WQP_withATTAINSandNHDPluscatchments <- TADA_GetATTAINS(GreenBay_FoxRiver_Subset, fill_catchments = TRUE, return_sf = TRUE)
+```
+
+# Visualizing Water Quality Issues
+
+-   Placeholder: Now that we have data assigned to watershed/waterbodies
+    (Assessment Units). Let's showcase how to visualize water quality
+    issues for a few characteristics of interest on a map and in figures
+    using available packages.
+
+# Conclusion
+
+```{r}
+end.time <- Sys.time()
+
+end.time - start.time
+```
+
+# Contribute
+
+Note: TADA is still under development. New functionality is added
+weekly, and sometimes we need to make bug fixes in response to tester
+and user feedback. We appreciate your feedback, patience, and interest
+in these helpful tools.
+
+If you are interested in contributing to TADA development, more
+information is available at:
+
+<https://usepa.github.io/EPATADA/articles/CONTRIBUTING.html>
+
+We welcome collaboration with external partners.
+
+Contribute to EPATADA in a way that helps elevate work you have already
+done, broadens the user base of the package, or improves the resource
+for all!
+
+**Thank you to our workshop contributors!**
+
+-   EPA: Cristina Mullin (mullin.cristina\@epa.gov), Marc Weber, Hillary
+    Marler, Kenny Wong, Michael Dumelle, Shelly Thawley
+
+-   USGS: Dave Blodgett

From d8f5a3cade056b59bd0921075756ad47ce3ecd4d Mon Sep 17 00:00:00 2001
From: cristinamullin <46969696+cristinamullin@users.noreply.github.com>
Date: Fri, 14 Mar 2025 16:27:24 -0400
Subject: [PATCH 2/7] add depends

- StreamCatTools
---
 DESCRIPTION                             |  3 ++-
 vignettes/GeospatialDataIntegration.Rmd | 10 +++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 3a64720ce..9544d1c3c 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -99,7 +99,8 @@ Suggests:
     yaml, 
     remotes
 Remotes:
-    DOI-USGS/dataRetrieval@develop
+    DOI-USGS/dataRetrieval@develop,
+    USEPA/StreamCatTools
 VignetteBuilder: knitr, rmarkdown
 Language: en-US
 Config/testthat/edition: 3
diff --git a/vignettes/GeospatialDataIntegration.Rmd b/vignettes/GeospatialDataIntegration.Rmd
index 3223735b7..1b2dc7246 100644
--- a/vignettes/GeospatialDataIntegration.Rmd
+++ b/vignettes/GeospatialDataIntegration.Rmd
@@ -105,7 +105,7 @@ dependencies for this workflow will be downloaded automatically.
 
 ```{r devsetup, include = F, eval = T}
 remotes::install_github("USEPA/EPATADA", 
-                        ref = "prerelease",
+                        ref = "geospatial-vignette",
                         dependencies = TRUE,
                         force = TRUE)
 library(EPATADA)
@@ -428,8 +428,7 @@ We'll pull in all the NLCD categories at the local catchment level for
 each location
 
 ```{r}
-library(ggplot2)
-GB_FR_NLCD <- sc_nlcd(year='2019', aoi='cat', comid=GreenBay_FoxRiver_sf_locs$COMID)
+GB_FR_NLCD <- StreamCatTools::sc_nlcd(year='2019', aoi='cat', comid=GreenBay_FoxRiver_sf_locs$COMID)
 
 
 GB_FR_Urb <- GB_FR_NLCD |> 
@@ -441,7 +440,7 @@ GB_FR_Urb
 ### Visualize urbanization for local catchment for each location
 
 ```{r}
-ggplot(GB_FR_Urb, aes(x=Pct_Urbanized)) + 
+ggplot2::ggplot(GB_FR_Urb, aes(x=Pct_Urbanized)) + 
   geom_density()
 ```
 
@@ -603,7 +602,8 @@ res is the default) and match those with the WQP sites & create new IDs
 these WQP sites).
 
 ```{r}
-WQP_withATTAINSandNHDPluscatchments <- TADA_GetATTAINS(GreenBay_FoxRiver_Subset, fill_catchments = TRUE, return_sf = TRUE)
+# # need to troubleshoot why this is not working
+# WQP_withATTAINSandNHDPluscatchments <- TADA_GetATTAINS(GreenBay_FoxRiver_Subset, fill_catchments = TRUE, return_sf = TRUE)
 ```
 
 # Visualizing Water Quality Issues

From 8da4e079e86090cf81d9dc708596f091a7e20072 Mon Sep 17 00:00:00 2001
From: Weber <Weber.Marc@epa.gov>
Date: Fri, 21 Mar 2025 15:07:49 -0700
Subject: [PATCH 3/7] A couple suggested updates

---
 vignettes/GeospatialDataIntegration.Rmd | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/vignettes/GeospatialDataIntegration.Rmd b/vignettes/GeospatialDataIntegration.Rmd
index 1b2dc7246..2b6d07176 100644
--- a/vignettes/GeospatialDataIntegration.Rmd
+++ b/vignettes/GeospatialDataIntegration.Rmd
@@ -146,16 +146,16 @@ in the WQP for this HUC 12 in Wisconsin for the last 5 years.
 WATERSHED: City of Green Bay-Fox River (040302040405)
 
 ```{r getdata}
-# # Uncomment to query the WQP
-# GreenBay_FoxRiver <- TADA_DataRetrieval(
-#   statecode = "WI",
-#   startDate = "2015-01-01",
-#   endDate = "2024-12-30",
-#   huc = c("040302040405"),
-#   applyautoclean = TRUE
-# )
+# Uncomment to query the WQP
+GreenBay_FoxRiver <- TADA_DataRetrieval(
+  statecode = "WI",
+  startDate = "2015-01-01",
+  endDate = "2024-12-30",
+  huc = c("040302040405"),
+  applyautoclean = TRUE
+)
 
-GreenBay_FoxRiver <- NMCWorkshopData::GreenBay_FoxRiver
+# GreenBay_FoxRiver <- NMCWorkshopData::GreenBay_FoxRiver
 ```
 
 **Wrangle**
@@ -417,10 +417,15 @@ There's much much more where that came from. See the pkgdown sites for
 [hydroloom](https://doi-usgs.github.io/hydroloom/index.html) for more!
 
 ## Accessing watershed information for sites
+We can access watershed information for each unique site location, getting both the landscape data for local catchment or the full upstream watershed for each particular site using [StreamCatTools](https://usepa.github.io/StreamCatTools/index.html)
 
 ### Discover what StreamCat metrics we might want to use
 
-More to come here...
+```{r}
+metrics <- sc_get_params(param='name')
+print(paste0('A selection of available StreamCat metrics include: ',paste(metrics[1:10],collapse = ', ')))
+```
+
 
 ### Discover land cover of waterhsheds for sites
 

From 0bd74bed289d1b97accbadc5abaa4327d593891e Mon Sep 17 00:00:00 2001
From: hillarymarler <Marler.Hillary@epa.gov>
Date: Mon, 31 Mar 2025 09:26:10 -0400
Subject: [PATCH 4/7] Update GeospatialDataIntegration.Rmd

---
 vignettes/GeospatialDataIntegration.Rmd | 51 ++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/vignettes/GeospatialDataIntegration.Rmd b/vignettes/GeospatialDataIntegration.Rmd
index 2b6d07176..4c0d1d1e2 100644
--- a/vignettes/GeospatialDataIntegration.Rmd
+++ b/vignettes/GeospatialDataIntegration.Rmd
@@ -152,7 +152,8 @@ GreenBay_FoxRiver <- TADA_DataRetrieval(
   startDate = "2015-01-01",
   endDate = "2024-12-30",
   huc = c("040302040405"),
-  applyautoclean = TRUE
+  applyautoclean = TRUE,
+  ask = FALSE
 )
 
 # GreenBay_FoxRiver <- NMCWorkshopData::GreenBay_FoxRiver
@@ -164,33 +165,60 @@ Now, let's use EPATADA functions to review, visualize, and whittle the
 returned WQP data down to include only results that are applicable to
 our water quality analysis and area of interest.
 
+**Autoclean**
+
+In the default arguments for TADA_DataRetrieval, applyautoclean = TRUE.
+This runs TADA_AutoClean on the newly retrieved data frame.
+TADA_AutoClean is a powerful function which performs a variety of tasks
+including: (1) creating new "TADA" prefixed columns and and capitalizing
+their contents to reduce case sensitivity issues, (2) converts special
+characters in value columns, (3) converts latitude and longitude values
+to numeric, (4) replaces "meters" with "m", (5) replaces deprecated
+characteristic names with current WQX names, (6) harmonizes result and
+detection limit units to WQX, TADA or user supplied target units, (7)
+converts depths to meters, and (8) creates the column TADA.ComparableID
+by concatenating characteristic name, result sample fraction, method
+speciation, and result measure unit.
+
 **Flag and remove duplicate results from a single organization**
 
+We can identify data records uploaded by the same organization with the
+same date, time, monitoring location, activity type, characteristic
+name, fraction, taxonomic name, depth columns, and result value and
+flags them as potential duplicates. The data user must determine if the
+data records are unique or represent overlap that could cause issues in
+analysis. For this example, we will retain only results flagged as
+"Unique".
+
 ```{r single-org-dups}
+# find duplicate results submitted by single org
 GreenBay_FoxRiver <- TADA_FindPotentialDuplicatesSingleOrg(GreenBay_FoxRiver)
 
+# retain unique flagged results
 GreenBay_FoxRiver <- dplyr::filter(GreenBay_FoxRiver, TADA.SingleOrgDup.Flag == "Unique")
 ```
 
-**Autoclean**
+**Censored results**
 
-```{r autoclean}
-GreenBay_FoxRiver <- TADA_AutoClean(GreenBay_FoxRiver)
-```
-
-**Handle censored results**
+TADA provides some simple methods for dealing with censored results,
+such as using multiplying the detection limit by a user supplied value
+or leaving the result as is.
 
 ```{r censored}
+# substitude nondetects with 0.5 detection limit, leave overdetects as is
 GreenBay_FoxRiver <- TADA_SimpleCensoredMethods(GreenBay_FoxRiver, nd_method = "multiplier", nd_multiplier = 0.5, od_method = "as-is", od_multiplier = "null")
 ```
 
 **Flag and remove duplicates from multiple organizations**
 
-Two organizations sometimes submit the same exact data to WQP.
+Two organizations sometimes submit the same exact data to WQP. Filtering
+out these duplicates can prevent issues in analysis.
 
 ```{r multi-org-dups, eval = F}
+# find potential dups multiple orgs
 GreenBay_FoxRiver <- TADA_FindPotentialDuplicatesMultipleOrgs(GreenBay_FoxRiver)
 
+# filter out 
 GreenBay_FoxRiver <- dplyr::filter(GreenBay_FoxRiver, TADA.ResultSelectedMultipleOrgs == "Y")
 ```
 
@@ -417,7 +445,11 @@ There's much much more where that came from. See the pkgdown sites for
 [hydroloom](https://doi-usgs.github.io/hydroloom/index.html) for more!
 
 ## Accessing watershed information for sites
-We can access watershed information for each unique site location, getting both the landscape data for local catchment or the full upstream watershed for each particular site using [StreamCatTools](https://usepa.github.io/StreamCatTools/index.html)
+
+We can access watershed information for each unique site location,
+getting both the landscape data for local catchment or the full upstream
+watershed for each particular site using
+[StreamCatTools](https://usepa.github.io/StreamCatTools/index.html)
 
 ### Discover what StreamCat metrics we might want to use
 
@@ -426,7 +458,6 @@ metrics <- sc_get_params(param='name')
 print(paste0('A selection of available StreamCat metrics include: ',paste(metrics[1:10],collapse = ', ')))
 ```
 
-
 ### Discover land cover of waterhsheds for sites
 
 We'll pull in all the NLCD categories at the local catchment level for

From f842ddaeb44cd5b6634a2012a8695c53fd9bbad1 Mon Sep 17 00:00:00 2001
From: hillarymarler <Marler.Hillary@epa.gov>
Date: Mon, 7 Apr 2025 10:52:29 -0400
Subject: [PATCH 5/7] Update GeospatialDataIntegration.Rmd

add install streamcatools to vignette
---
 vignettes/GeospatialDataIntegration.Rmd | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/vignettes/GeospatialDataIntegration.Rmd b/vignettes/GeospatialDataIntegration.Rmd
index 4c0d1d1e2..e09140c28 100644
--- a/vignettes/GeospatialDataIntegration.Rmd
+++ b/vignettes/GeospatialDataIntegration.Rmd
@@ -101,7 +101,9 @@ library(remotes)
 ```
 
 Next, install and load EPATADA using the remotes package. All other
-dependencies for this workflow will be downloaded automatically.
+dependencies for this workflow will be downloaded automatically. We will
+also need to install StreamCatTools for some steps later in the
+workflow.
 
 ```{r devsetup, include = F, eval = T}
 remotes::install_github("USEPA/EPATADA", 
@@ -116,7 +118,16 @@ remotes::install_github("USEPA/EPATADA",
                         ref = "develop",
                         dependencies = TRUE,
                         force = TRUE)
+
+remotes::install_github("USEPA/StreamCatTools", 
+                        ref = "master",
+                        dependencies = TRUE,
+                        force = TRUE)
+
 library(EPATADA)
+library(StreamCatTools)
+
+
 ```
 
 It's go time! Let's time our process.

From 223ca6ffee8643f37d4f668449118227c975478e Mon Sep 17 00:00:00 2001
From: cristinamullin <46969696+cristinamullin@users.noreply.github.com>
Date: Mon, 5 May 2025 17:08:11 -0400
Subject: [PATCH 6/7] finish draft

---
 DESCRIPTION                             |  4 +-
 vignettes/GeospatialDataIntegration.Rmd | 75 ++++++++++++-------------
 2 files changed, 37 insertions(+), 42 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 0b55c722b..b05039471 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -80,7 +80,7 @@ Imports:
     DT,
     dataRetrieval,
     dbscan,
-    janitor,
+    janitor
 Depends: 
     R (>= 3.5.0)
 Suggests:
@@ -103,7 +103,7 @@ Suggests:
     remotes
 Remotes:
     DOI-USGS/dataRetrieval@develop,
-    USEPA/StreamCatTools
+    USEPA/StreamCatTools@master
 VignetteBuilder: knitr, rmarkdown
 Language: en-US
 Config/testthat/edition: 3
diff --git a/vignettes/GeospatialDataIntegration.Rmd b/vignettes/GeospatialDataIntegration.Rmd
index e09140c28..7d178969f 100644
--- a/vignettes/GeospatialDataIntegration.Rmd
+++ b/vignettes/GeospatialDataIntegration.Rmd
@@ -126,8 +126,6 @@ remotes::install_github("USEPA/StreamCatTools",
 
 library(EPATADA)
 library(StreamCatTools)
-
-
 ```
 
 It's go time! Let's time our process.
@@ -241,8 +239,11 @@ unique(GreenBay_FoxRiver$TADA.ResultMeasureValueDataTypes.Flag)
 sum(is.na(GreenBay_FoxRiver$TADA.ResultMeasureValue))
 
 GreenBay_FoxRiver <- TADA_AutoFilter(GreenBay_FoxRiver)
-# Topic for discussion: Would users like results with NA units to be dealt with in TADA_AutoFilter or in the TADA_ConvertResultUnits()? These have not yet been addressed.
+```
+
+Check to make sure there are no more NA's in TADA.ResultMeasureValue.
 
+```{r recheck}
 unique(GreenBay_FoxRiver$TADA.ResultMeasureValueDataTypes.Flag)
 
 sum(is.na(GreenBay_FoxRiver$TADA.ResultMeasureValue))
@@ -282,8 +283,7 @@ GreenBay_FoxRiver_Counts <- TADA_FieldValuesTable(GreenBay_FoxRiver, field = "TA
 DT::datatable(GreenBay_FoxRiver_Counts, fillContainer = TRUE)
 ```
 
-**Filter to focus on frequently monitored characteristics in example
-data**
+Filter to focus on frequently monitored characteristics in example data
 
 ```{r subset}
 GreenBay_FoxRiver_Subset <- GreenBay_FoxRiver %>%
@@ -320,14 +320,24 @@ TADA_TwoCharacteristicScatterplot(GreenBay_FoxRiver_Subset, id_cols = "TADA.Comp
 
 **Generate map**
 
-```{r map}
+```{r TADA Overview Map}
 TADA_OverviewMap(GreenBay_FoxRiver_Subset)
+```
+
+**Clean up coordinate issues**
 
+```{r Coordinate issues}
+# Change coordinate sign if appropriate
 GreenBay_FoxRiver = TADA_FlagCoordinates(GreenBay_FoxRiver_Subset, clean_outsideUSA = "change sign", clean_imprecise = FALSE)
 
-# CM note for TADA team discussion: Should results with NA lat/long be addressed within TADA_FlagCoordinates? For example, this df has NA lons from USGS that must be addressed before TADA_MakeSpatial can be run... 
-# sum(is.na(GreenBay_FoxRiver$LongitudeMeasure))
+# This df has NA lons from USGS that must be addressed before TADA_MakeSpatial can be run... 
+sum(is.na(GreenBay_FoxRiver_Subset$LongitudeMeasure))
+
+# Remove rows with NA lons from df
 GreenBay_FoxRiver_Subset <- GreenBay_FoxRiver_Subset[!is.na(GreenBay_FoxRiver_Subset$LongitudeMeasure),]
+
+# Recheck
+sum(is.na(GreenBay_FoxRiver_Subset$LongitudeMeasure))
 ```
 
 # Geospatial data integration
@@ -341,8 +351,7 @@ sf object.
 GreenBay_FoxRiver_sf = TADA_MakeSpatial(GreenBay_FoxRiver_Subset)
 ```
 
-Then create a unique identifier based on shared lat long values and
-filter to just the 25 unique locations.
+Then create a unique identifier based on shared lat long values and filter to just the 25 unique locations.
 
 ```{r}
 GreenBay_FoxRiver_sf$latlon <- paste0(GreenBay_FoxRiver_sf$TADA.LongitudeMeasure, GreenBay_FoxRiver_sf$TADA.LatitudeMeasure)
@@ -465,11 +474,11 @@ watershed for each particular site using
 ### Discover what StreamCat metrics we might want to use
 
 ```{r}
-metrics <- sc_get_params(param='name')
+metrics <- StreamCatTools::sc_get_params(param = 'metric_names')
 print(paste0('A selection of available StreamCat metrics include: ',paste(metrics[1:10],collapse = ', ')))
 ```
 
-### Discover land cover of waterhsheds for sites
+### Discover land cover of watersheds for sites
 
 We'll pull in all the NLCD categories at the local catchment level for
 each location
@@ -487,8 +496,8 @@ GB_FR_Urb
 ### Visualize urbanization for local catchment for each location
 
 ```{r}
-ggplot2::ggplot(GB_FR_Urb, aes(x=Pct_Urbanized)) + 
-  geom_density()
+ggplot2::ggplot(GB_FR_Urb, ggplot2::aes(x=Pct_Urbanized)) + 
+  ggplot2::geom_density()
 ```
 
 ### Pull in data for modeling using `StreamCatTools`
@@ -497,10 +506,10 @@ Now we'll just demonstrate pulling in watershed data that we might use
 in a modeling exercise as spatial covariates
 
 ```{r}
-ws_data <- sc_get_data(metric='fert,nsurp,nani,manure,IWI', aoi='cat,ws', comid=GreenBay_FoxRiver_sf_locs$COMID)
+ws_data <- StreamCatTools::sc_get_data(metric='fert,nsurp,nani,manure,IWI', aoi='cat,ws', comid=GreenBay_FoxRiver_sf_locs$COMID)
 ```
 
-# Example Use Case 1: Building Statistical Models
+# Building Statistical Models
 
 ## Spatial Dependence
 
@@ -533,10 +542,6 @@ ws_data <- sc_get_data(metric='fert,nsurp,nani,manure,IWI', aoi='cat,ws', comid=
 -   `SSN2` is an updated version of `SSN` (`SSN` has been archived)
 -   Learn more at <https://usepa.github.io/SSN2/>
 
-# Example Use Case 2: Clean Water Act (CWA) Section 303(d) Assessments Part A
-
-TADA_MakeSpatial(), TADA_GetATTAINS(), TADA_ViewATTAINS()
-
 ## TADA Module 2: Geospatial Functions
 
 Additional functions and a more detailed example workflow is available
@@ -583,7 +588,7 @@ quality data is the WQP.
 -   Water Quality Portal Monitoring Locations
     -   These are points
 
-## TADA_GetATTAINS()
+## TADA_GetATTAINS() Part A
 
 -   Automates matching of WQP monitoring locations with ATTAINS
     assessment units that fall within (intersect) the same NHDPlus
@@ -596,11 +601,9 @@ quality data is the WQP.
 
 ```{r}
 WQP_with_ATTAINSonly <- TADA_GetATTAINS(GreenBay_FoxRiver_Subset, fill_catchments = FALSE, return_sf = TRUE)
-```
-
-## TADA_ViewATTAINS()
 
--   Allows for viewing the
+TADA_ViewATTAINS(WQP_with_ATTAINSonly)
+```
 
 ## Challenges with Automated Approach
 
@@ -617,16 +620,14 @@ WQP_with_ATTAINSonly <- TADA_GetATTAINS(GreenBay_FoxRiver_Subset, fill_catchment
 
 Finally, some waterbodies have data available in the WQP or from other
 sources, but there are no existing Assessment Units in ATTAINS for them.
-In the next section, we will share a way to create AU's using NHDPlus
-high resolution catchments.
 
-# Example Use Case 2: Clean Water Act (CWA) Section 303(d) Assessments Part B
+# TADA_GetATTAINS() Part B
 
-nhdplusTools, TADA::fetchNHD(), TADA_GetATTAINS(), TADA_ViewATTAINS()
+See: nhdplusTools, TADA::fetchNHD(), TADA_GetATTAINS(), TADA_ViewATTAINS()
 
 ## Creating new AUs to assess additional areas (leveraging USGS's nhdplusTools and TADA geospatial functions)
 
-TADA has included a way to do this using TADA_GetATTAINS()
+TADA has included a way to explore this using TADA_GetATTAINS()
 fill_catchments function input. This is included for exploratory
 purposes only. In theory, states and tribal nations could use the high
 res catchments as new assessment unit polygons to assess additional
@@ -649,18 +650,12 @@ res is the default) and match those with the WQP sites & create new IDs
 these WQP sites).
 
 ```{r}
-# # need to troubleshoot why this is not working
-# WQP_withATTAINSandNHDPluscatchments <- TADA_GetATTAINS(GreenBay_FoxRiver_Subset, fill_catchments = TRUE, return_sf = TRUE)
-```
+WQP_withATTAINSandNHDPluscatchments <- TADA_GetATTAINS(GreenBay_FoxRiver_Subset, fill_catchments = TRUE, return_sf = TRUE)
 
-# Visualizing Water Quality Issues
-
--   Placeholder: Now that we have data assigned to watershed/waterbodies
-    (Assessment Units). Let's showcase how to visualize water quality
-    issues for a few characteristics of interest on a map and in figures
-    using available packages.
+TADA_ViewATTAINS(WQP_withATTAINSandNHDPluscatchments)
+```
 
-# Conclusion
+# The end!
 
 ```{r}
 end.time <- Sys.time()

From 9ecc1ed282ce136e051ac9afc12c25ebe2bdfdf2 Mon Sep 17 00:00:00 2001
From: cristinamullin <46969696+cristinamullin@users.noreply.github.com>
Date: Mon, 5 May 2025 17:20:18 -0400
Subject: [PATCH 7/7] add StreamCatTools to suggests

---
 DESCRIPTION | 1 +
 1 file changed, 1 insertion(+)

diff --git a/DESCRIPTION b/DESCRIPTION
index b05039471..5084ad873 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -84,6 +84,7 @@ Imports:
 Depends: 
     R (>= 3.5.0)
 Suggests:
+    StreamCatTools, 
     lwgeom,
     RColorBrewer,
     rmarkdown,