Ecological Informatics: Vicky Anand, Bakimchandra Oinam, Silke Wieprecht
Ecological Informatics
journal homepage: www.elsevier.com/locate/ecolinf
ARTICLE INFO

Keywords: Sentinel; ResourceSat; Support vector machine; Random forest; Regression; Machine learning

ABSTRACT

Water quality analysis is a vital component of water resources management. It has to be undertaken promptly to ensure that environmental regulations are being followed and to eliminate any pollution that could harm the ecosystem. The main objective of this study is to retrieve and map water quality parameters from Sentinel-2 and ResourceSat-2 [Linear Imaging Self-Scanning Sensor (LISS)-IV] multispectral satellite data using Support Vector Machine (SVM), Random Forest (RF), and Multi-Linear Regression (MLR) models. This study is the first attempt to demonstrate the applicability and performance of the high-spatial-resolution LISS-IV sensor aboard the ResourceSat-2 remote sensing satellite for predicting water quality. This sensor operates in three spectral bands in the Visible and Near Infrared (VNIR) region. The spectral bands of each satellite were used as independent parameters to generate the algorithms for pH, Dissolved Oxygen (DO), Total Suspended Solids (TSS) and Total Dissolved Solids (TDS). Model performance was evaluated with four statistical indices: the coefficient of determination (R²), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE). The results indicate that SVM yielded the highest accuracy, followed by RF and MLR. For all four SVM models across both sensors, the indices ranged as follows: R² from 0.78 to 0.99, MAE from 0.049 to 0.24, MAPE from 0.01 to 10.9 %, and RMSE from 0.05 to 0.28. Based on the spatial trend, Sentinel-2 was found to be slightly superior to ResourceSat-2 (LISS-IV) for estimating water quality parameters, owing to its better spectral and radiometric resolution. Nevertheless, ResourceSat-2 (LISS-IV) has its own advantage in terms of high spatial resolution. The results highlight the high potential of machine learning models, used in conjunction with multispectral satellite images, for managing water quality.
* Corresponding author at: Department of Hydraulic Engineering and Water Resources Management, Institute for Modeling Hydraulic and Environmental Systems,
University of Stuttgart, Stuttgart, Germany.
https://doi.org/10.1016/j.ecoinf.2024.102868
Received 7 August 2024; Received in revised form 22 October 2024; Accepted 24 October 2024
Available online 28 October 2024
1574-9541/© 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
V. Anand et al. Ecological Informatics 84 (2024) 102868
extended retention duration, nutrient enrichment in lakes can be lethal, arising from both natural and man-made causes. Numerous factors influence the amount of nutrients in lakes. Changes in the levels of dissolved oxygen, nitrogen, phosphorus, and other chemicals can lead to eutrophication (Akinnawo, 2023). Eutrophication has several major potential effects in lakes (Costa et al., 2018):
increased phytoplankton biomass;
growth of benthic and epiphytic algae;
loss of fisheries;
reduced habitat diversity;
odour problems caused by increased mineral content;
depletion of dissolved oxygen.
As a result, water quality monitoring is a complex problem that requires analysing many associated variables (Gholizadeh et al., 2016). Increasing anthropogenic activities, together with natural influences, cause high variability among these variables (Mwanake et al., 2023). A thorough assessment of water quality at regional and national levels relies heavily on water pollution monitoring data (Swain and Sahoo, 2017). Assessing the water environment efficiently and precisely is therefore vital.

In-situ measurements have historically provided exact water quality parameters at specific sampling points. However, the time, labour, and high costs associated with this method limit large-scale monitoring (Schaeffer et al., 2013). Furthermore, sampling techniques and laboratory processes may affect the quality of field-based data (Lloyd et al., 2022). Remote sensing technology has captured the spatial distribution and dynamic variations of water quality components because of its advantages in both temporal and spatial coverage (Malahlela, 2019; Shi et al., 2018). A variety of spaceborne sensors operating at visible, infrared, and microwave wavelengths can be used to monitor water quality, owing to their high-frequency data collection and large-scale coverage. Since remote sensing data depend on weather conditions, empirical remote sensing algorithms must be precisely calibrated for the study region (Jepsen et al., 2021). Remote sensing can provide very cost-effective spatial and multitemporal information about the uppermost layers of water bodies, so it has been employed to monitor water quality parameters in various lakes and reservoirs. Because remote sensing data are publicly available, integrative methods that combine conventional field-based data collection with remote sensing are viewed as appropriate for water resource surveillance programmes (Muhoyi et al., 2022). Satellite remote sensing offers extended time series, large-scale simultaneous observation, and historical backtracking. These can compensate for the drawbacks of in-situ surveys and allow spatiotemporal fluctuations in water quality to be detected.

Process-based hydrodynamic-water quality models have been widely applied to enable swift detection of changes and trends in water quality indicators (Mardani et al., 2020; Mohammed et al., 2019). However, approaches based on recent remote sensing sensors and on prediction with artificial intelligence, deep learning and machine learning algorithms are currently popular (Bui et al., 2020; Chen et al., 2020; Kong et al., 2019; Tang et al., 2023; Yin et al., 2021; Yu et al., 2024). Process-based hydrodynamic and water quality models have been used extensively to simulate and forecast temperature dynamics and constituent movement in surface water bodies such as lakes, rivers, and estuaries (Mohammed et al., 2021). These simulations are hard to calibrate and frequently need a high level of expertise to implement, due to their complex model structures and high computational requirements (Baracchini et al., 2020; Chen et al., 2016). This limits their use in water resource management. Furthermore, process-based models are rarely used for near-real-time simulations in water source reservoirs. Nevertheless, process-based hydrodynamic and water quality models are undeniably valuable for understanding the physical mechanisms and variables that affect the thermal dynamics and water quality properties of surface waterbodies. Empirical analysis and bio-optical analysis are two distinct types of techniques for assessing water quality indicators using data from satellite remote sensing (Dong et al., 2023; Sagan et al., 2020).

Aquatic remote sensing requires consideration of atmospheric correction. Bio-optical modelling has advanced significantly in recent years, but it is severely limited by data restrictions and by challenges with atmospheric correction (Li et al., 2021). Part of the scientific community has therefore been exploring direct modelling techniques with satellite reflectance data. It is widely recognised that there can be a substantial correlation between remote sensing data and optically active parameters (Zhao et al., 2022; Zhu et al., 2022). Owing to the intricate interactions between the different components in water, non-optically active parameters can also be quantitatively retrieved by remote sensing (Lan et al., 2023; Zhao et al., 2022). Machine learning algorithms have been used increasingly in water quality assessment in recent years, following advances in artificial intelligence (Peterson et al., 2020). Machine learning models can reveal underlying complex nonlinear interactions and so provide a generic and optimal method for identifying water quality parameters (Shams et al., 2024). Their use in modelling and detecting water quality is steadily increasing (Arias-Rodriguez et al., 2020). Support Vector Machine Regression (SVR) and Random Forest Regression (RF) are two popular machine learning techniques for evaluating water quality (Peterson et al., 2020).

The machine learning models most frequently used for optical water quality parameters are Random Forest (RF), Artificial Neural Network (ANN), Multiple Linear Regression (MLR), Ridge Regression (RR), Adaptive Boosting (AdaBoost), Support Vector Regression (SVR), and eXtreme Gradient Boosting tree (XGBoost). Adusei et al. (2021) forecasted the pH, turbidity, dissolved oxygen (DO), alkalinity, and total dissolved solids (TDS) in the Owabi Dam reservoir using MLR, RF, SVM, and Sentinel satellite imagery. They found that RF performed better than MLR and SVM. Kim et al. (2014) used random forest and support vector regression (SVR) to estimate the concentrations of suspended particulate matter (SPM) and chlorophyll-a (Chl-a), two optical water quality indicators, in coastal environments. They found that SVR outperformed the other machine learning techniques they tested. Muhoyi et al. (2022) developed multitemporal spatialized maps of the concentrations of total suspended solids (TSS), total nitrogen (TN), total phosphorus (TP), and chemical oxygen demand (COD) using empirical equations and Sentinel-2 satellite data. Zheng et al. (2024) used the XGBoost machine learning algorithm to demonstrate an enhanced water quality index for assessing water quality at the Yopurga landfill, China. For the prediction of water quality in Mirpurkhas, Sind, gradient boosting outperformed random forest and support vector machines (Abbas et al., 2024).

Lakes and wetlands are among the most important water resources on Earth's surface and are regarded as important ecohydrological units. Loktak Lake is rich in biodiversity and is home to a wide range of wetland plants and animals, some of which are internationally threatened. The lake is well known for its phumdis, or floating islands, which are covered in flora and vary in size and thickness. According to Anand et al. (2021), phumdis are essentially heterogeneous masses of soil, plants, and organic matter at different stages of decomposition. The lake's natural processes depend heavily on them. As a biological sink of essential nutrients, phumdis play a crucial role in preventing eutrophication and algal blooms, and therefore regulate the lake's nutrient dynamics and water quality (LDA and WISA, 2002). Research has identified significant pollution problems in Loktak Lake due to high nitrite and nitrate concentrations, with the water quality index ranging from 'very poor' to 'poor' (Laishram et al., 2022; Talukdar et al., 2024). The area of herbaceous wetlands around Loktak Lake has been reduced and is expected to decline further in the near future due to anthropogenic factors (Anand et al., 2021). The damming effect of the barrage on the south-eastern periphery of the lake has caused an ecohydrological deficit in the Loktak Lake sub-catchment (Mahato et al., 2023). Over time, a number of
studies have evaluated the water quality of Loktak Lake and have reported on a range of water quality indicators (Kangabam et al., 2017; Laishram et al., 2022; Mayanglambam and Neelam, 2020; Roy and Majumder, 2019). These studies have been restricted to in-situ measurements at selected sampling points. The size of large lakes such as Loktak Lake makes continuous field monitoring of multiple water quality parameters impractical and expensive. No research has yet predicted water quality with multispectral satellite imagery and machine learning in the water bodies of North-Eastern India's Inner Himalayan Ranges. Loktak Lake, a Ramsar site in the Inner Himalayan range, is partially gauged or ungauged in terms of both water quality and quantity. The size and ecological dynamics of such a large lake pose serious challenges for continuous spatio-temporal gauging. Machine learning algorithms coupled with satellite remote sensing can uncover underlying complex relationships. This provides a comprehensive and efficient approach for evaluating water quality parameters over large areas, especially in data-scarce regions.

The main objective of this research is to provide a dynamic method for predicting water quality by employing machine learning algorithms and multispectral ResourceSat-2 and Sentinel-2 satellite data. The research also examines the advantages and limitations of satellite resolution. It compares Sentinel-2, which has better spectral and radiometric resolution, against ResourceSat-2 (LISS-IV), which has better spatial resolution. The current study is the first attempt to demonstrate the applicability of the high-resolution Linear Imaging Self Scanner (LISS-IV) camera aboard the ResourceSat-2 remote sensing satellite for predicting water quality. This camera operates in three spectral bands in the Visible and Near Infrared Region (VNIR). The research focuses on dynamic, large-scale monitoring of water quality parameters through high-resolution satellite sensors. The study was implemented in Loktak Lake, located in the north-eastern part of India. It offers a scientific basis for enhancing lake pollution control, promoting water pollution reduction, and establishing effective operational methods for monitoring water quality.

2. Materials and methodology

2.1. Study area

Loktak Lake, a freshwater wetland system, is situated in North-Eastern India's Inner Himalayan Ranges. The lake lies between 24°42′ N, 93°55′ E and 24°25′ N, 93°46′ E. Its international significance has led to its designation under the Ramsar Convention (NWA, 2009). Loktak Lake is also one of the 48 wetland sites in the world listed on the Montreux Record. This record lists wetland sites where changes in ecological character have occurred, are occurring, or are likely to occur as a result of technological developments, pollution or other human interference. The main water body of the lake covers 287 km², while the catchment as a whole covers 5020 km² (Anand et al., 2020). The basin receives 1350 mm of rain on average. The relative humidity varies from 51 to 81 %, and the annual temperature ranges from 12 °C to 31 °C (Directorate of Environment, Government of Manipur, 2013).

Hydrologically, the Loktak Lake catchment consists of nine sub-catchments: Khuga, Western, Nambul, Imphal, Kongba, Iril, Thoubal, Heirok, and Sekmai. The Nambul and Western sub-catchments feed the lake directly. Over time, the feeding rivers, which flow past large towns and continuously growing farms, have introduced untreated sewage, fertilizers, and plastic pollutants into the lake. The effects of pollution on fish health and mortality pose a growing danger to the ecosystem of Loktak Lake. Removing the ecological stressors might be the most effective method to restore the wetland and prevent further loss and degradation (Thongam and Meitei, 2021). The Nambul River, the primary tributary, has been identified as one of the major environmental stressors contributing to the wetland's contamination. It is therefore essential to monitor the lake regularly and understand it in all three dimensions. Loktak Lake and the sampling locations are shown in Fig. 1.

2.2. Field sampling and satellite data

Field visits for water quality sampling were carried out with a portable multi-parameter water quality probe, on clear-sky days approximately synchronous with the satellite overpasses. The time gap between satellite image acquisition and in-situ measurements affects the reflectance comparison. However, the comparison remains reasonable when environmental and water conditions do not change rapidly (Martins et al., 2017). The sampling locations were randomly chosen to cover the maximum extent of the lake, including both its centre and its shores. A total of 38 sites were monitored during 2022–2023. Certain parts of the lake are inaccessible because of restrictions imposed by the environment and forest department to maintain biodiversity.

Cloud cover and large storm events would have a substantial impact on the results, so field work was planned to synchronise the data collection dates with satellite overpasses. Surveys were conducted either on the date of the satellite pass or within one week of it. Minimal cloud cover over the lake was also considered when selecting survey dates. The field surveys were carried out on 9 March 2022, 8 April 2022, 14 November 2022 and 31 January 2023. The Sentinel datasets were downloaded from the official portal of the European Space Agency (Copernicus Open Access Hub). The ResourceSat-2 (LISS-IV) data were purchased from the Indian Space Research Organisation (ISRO). Table 1 gives the image acquisition dates, resolutions and bands of the satellites. Preliminary pre-processing of the Sentinel data was done in the ERDAS Imagine environment. The LISS-IV data were pre-processed in the Natural Color Image Generator for IRS Satellite Datasets environment.

During the field surveys, a Hanna HI98194 portable multi-parameter meter was used for instantaneous on-site measurements of dissolved oxygen (DO), total dissolved solids (TDS) and pH. For total suspended solids, water samples were collected in clean 500 ml bottles and analysed gravimetrically in the laboratory. Water sampling followed the guidelines of the Central Pollution Control Board (CPCB), Govt. of India.

Table 1
Metadata related to multispectral satellite imagery.
Columns: Satellite/Mission; Spatial resolution; Radiometric resolution; Bands used; Central wavelength; Acquisition date.

2.3. Normalised difference water index (NDWI)

The extent of this lake varies over time and reaches its maximum after the rainy season. Phumdis, also known as floating islands, are heterogeneous masses of soil, vegetation, and organic material in varying stages of decomposition. They float on the water's surface, and their positions change over time with the intensity and direction of the wind. As phumdis proliferate and move, the area of open water fluctuates continuously. For reflectance-based research, it is therefore crucial to extract the open water surface from the satellite imagery. The current study employed the Normalised Difference Water Index (NDWI) for this purpose.

Target objects can be highlighted, and background objects suppressed, by exploiting differences in their reflectance between distinct bands. On this basis, McFeeters built the NDWI from the green and NIR bands. The index efficiently suppresses soil and vegetation information while enhancing water information (McFeeters, 1996). The NDWI is given by Eq. (1):

$$\mathrm{NDWI} = \frac{\lambda_{\mathrm{green}} - \lambda_{\mathrm{NIR}}}{\lambda_{\mathrm{green}} + \lambda_{\mathrm{NIR}}} \qquad (1)$$

where $\lambda_{\mathrm{NIR}}$ is the reflectance of the near-infrared band (Band 4 of ResourceSat-2 and Band 8 of Sentinel-2), and $\lambda_{\mathrm{green}}$ is the reflectance of the green band (Band 2 of ResourceSat-2 and Band 3 of Sentinel-2).

2.4. Atmospheric correction for flat terrain (ATCOR)

Reflectance retrieval in ATCOR remains consistent across sun positions and acquisition times. This enables coherent reflectance and temperature retrievals and hence meaningful time-series analysis. ATCOR uses absolute reflectance to enable quantitative index analysis. Sensor support is the most crucial component of atmospheric correction algorithms. Their preliminary requirement is the relative spectral response file of the particular sensor or satellite. Several atmospheric correction algorithms exist, but their compatibility with the LISS-IV sensor aboard ResourceSat-2 is limited. This matters because LISS-IV is one of the most important sensors in the Indian subcontinent, providing multispectral information at very high spatial resolution. ATCOR supports LISS-IV and also allows new sensors to be defined and added.

ATCOR correction is intended to remove atmospheric effects in order to restore the physical characteristics of the Earth's surface, such as surface temperature, soil visibility, and surface reflectance. Its three primary features are pre-classification of the scene, retrieval of atmospheric parameters, and retrieval of surface reflectance (Marcello et al., 2016; Richter, 1996). Eq. (2) is used to determine the surface reflectance ($\lambda_{\mathrm{sup}}$):

$$\lambda_{\mathrm{sup}} = \frac{1}{a_1}\left[\frac{\pi d^{2} L_{\mathrm{TOA}}}{E_{\mathrm{TOA}} \cos\theta_i} - a_0\right] \qquad (2)$$
where $\theta_i$ is the solar zenith angle and $d$ is the Earth–Sun distance. $L_{\mathrm{TOA}}$ is the at-sensor spectral radiance. $E_{\mathrm{TOA}}$ is the solar spectral irradiance on a surface perpendicular to the sun's rays outside the atmosphere. Standard atmospheric parameters are required to obtain the coefficients $a_0$ and $a_1$. The adjacency effect is corrected by computing the mean reflectance of the surrounding area, $\lambda_{\mathrm{sup},i}$. The reflectance of the surface, free of adjacency effects, is then given by Eq. (3):

$$\lambda'_{\mathrm{sup}} = \lambda_{\mathrm{sup}} + \left[\int_{\alpha_1}^{\alpha_2} \frac{\rho^{0}_{\mathrm{dif}}}{\rho^{0}_{\mathrm{dir}}}\, P \, d\alpha\right] \left[\lambda_{\mathrm{sup}} - \sum_{i=1}^{n_P} \lambda_{\mathrm{sup},i}\, w_i\right] \qquad (3)$$

where $w_i$ are the distance-dependent weighting coefficients and $P$ is the sensor-specific spectral response curve. $\rho^{0}_{\mathrm{dif}}$ and $\rho^{0}_{\mathrm{dir}}$ are the diffuse and direct transmittance, respectively.

2.5. Multiple linear regression (MLR)

MLR is a statistical method that uses a single functional formula to describe the linear relationship between several independent variables and a dependent variable. For most natural phenomena, the dependent variable is influenced by more than one independent variable. MLR follows the same premise as simple linear regression, which describes the relationship between one independent variable and a dependent variable. In this study, the MLR model relating one dependent variable ($y$) to multiple independent variables ($x_i$) was generated using Eq. (4):

$$y = a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots + a_n x_n + c \qquad (4)$$

where $a_i$ ($i = 1, 2, 3, \dots, n$) are the regression coefficients of the independent variables $x_i$, $c$ is the constant of the linear equation, and $y$ is the dependent variable.

subject to

$$\begin{cases} d_i - w^{\mathsf{T}}\phi(x_i) - b \le \varepsilon + \xi_i, & i = 1, 2, 3, \dots, N \\ w^{\mathsf{T}}\phi(x_i) + b - d_i \le \varepsilon + \xi_i^{*}, & i = 1, 2, 3, \dots, N \\ \xi_i,\ \xi_i^{*} \ge 0, & i = 1, 2, 3, \dots, N \end{cases}$$

where $d_i$ is the target value. $C$ is the regularisation constant that controls the trade-off between the regularisation term and the empirical error. $\varepsilon$ is the size of the SVM tube (the $\varepsilon$-insensitive zone).

The decision function of the SVR model for the nonlinear problem follows from the final solution, in accordance with the Karush–Kuhn–Tucker (KKT) conditions:

$$y(x) = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^{*}\right) K(x, x_i) + b \qquad (9)$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers. Only samples with non-zero multipliers (the support vectors) contribute to Eq. (9). The input samples are mapped into a high-dimensional feature space. Based on Lagrange duality, the nonlinear problem is then transformed into the following convex quadratic programming problem:

$$\min_{(\alpha, \alpha^{*}) \in \mathbb{R}^{2N}} \ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j) + \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{N} d_i (\alpha_i^{*} - \alpha_i)$$

$$\text{subject to} \quad \sum_{i=1}^{N} (\alpha_i^{*} - \alpha_i) = 0, \qquad 0 \le \alpha_i, \alpha_i^{*} \le C, \quad i = 1, 2, 3, \dots, N$$

where $K(x_i, x_j)$ is the kernel function. In this study, the Gaussian radial basis function (RBF) kernel and the linear kernel were used:

$$\text{Linear kernel:} \quad K(x_i, x_j) = x_i^{\mathsf{T}} x_j \qquad (10)$$

$$\text{Radial basis kernel:} \quad K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right), \quad \gamma > 0 \qquad (11)$$

where $\gamma$ is the kernel parameter.
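The NDWI in Eq. (1) and the open-water extraction of Section 2.3 can be sketched with NumPy as below. This is a minimal illustration, not the authors' processing chain. The function names and the zero threshold that separates water from non-water pixels are assumptions, because the threshold actually used is not stated. The toy reflectance values are invented.

```python
import numpy as np


def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """NDWI (McFeeters, 1996): (green - NIR) / (green + NIR), Eq. (1).

    `green` and `nir` are surface-reflectance arrays of equal shape, e.g.
    Sentinel-2 Band 3 / Band 8 or ResourceSat-2 LISS-IV Band 2 / Band 4.
    Pixels where green + NIR == 0 are returned as NaN instead of dividing by zero.
    """
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = green + nir
    out = np.full(denom.shape, np.nan)
    np.divide(green - nir, denom, out=out, where=denom != 0)
    return out


def open_water_mask(green, nir, threshold: float = 0.0) -> np.ndarray:
    """Boolean open-water mask; the 0.0 threshold is an assumed default (hypothetical)."""
    index = ndwi(green, nir)
    # NaN pixels (no valid reflectance) are treated as non-water.
    return np.nan_to_num(index, nan=-1.0) > threshold


# Toy 2 x 2 scene: top row water-like (green > NIR), bottom row vegetation/phumdi-like.
green = np.array([[0.08, 0.07], [0.05, 0.06]])
nir = np.array([[0.02, 0.01], [0.30, 0.25]])
print(ndwi(green, nir).round(3))
print(open_water_mask(green, nir))
```

Pixels covered by phumdis behave like vegetation, with NIR reflectance exceeding green. They therefore fall below the threshold and drop out of the open-water mask.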
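Eq. (4) can be fitted by ordinary least squares. The sketch below assumes the independent variables are band reflectances arranged one column per band. The data are synthetic, and the helper names `fit_mlr` and `predict_mlr` are hypothetical.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray):
    """Ordinary least-squares fit of y = a1*x1 + ... + an*xn + c (Eq. 4).

    X has shape (samples, n), e.g. one column per spectral band; returns (a, c).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the last fitted coefficient is the constant c.
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]


def predict_mlr(X, a, c):
    """Evaluate Eq. (4) for each row of X."""
    return np.asarray(X, dtype=float) @ a + c


# Synthetic example: three hypothetical band reflectances and a target that is an
# exact linear combination of them, so least squares should recover the coefficients.
rng = np.random.default_rng(0)
bands = rng.uniform(0.0, 0.3, size=(20, 3))
target = 2.0 * bands[:, 0] - 1.0 * bands[:, 1] + 0.5 * bands[:, 2] + 7.0
a, c = fit_mlr(bands, target)
print(a.round(3), round(c, 3))
```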
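To make Eqs. (9)–(11) concrete, the sketch below evaluates the SVR decision function for given Lagrange multipliers, using either kernel. It does not solve the quadratic programme. The support vectors, multipliers and bias are hypothetical values, chosen only so that they satisfy the dual constraints. In the study, these quantities would come from training, and the solver used is not named in this section.

```python
import numpy as np


def linear_kernel(xi, xj):
    """Eq. (10): K(xi, xj) = xi^T xj."""
    return float(np.dot(xi, xj))


def rbf_kernel(xi, xj, gamma: float):
    """Eq. (11): K(xi, xj) = exp(-gamma * ||xi - xj||^2), gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def svr_predict(x, support_vectors, alpha, alpha_star, b, kernel):
    """Eq. (9): y(x) = sum_i (alpha_i - alpha_i*) K(x, x_i) + b.

    Samples with alpha_i == alpha_i* add nothing to the sum, so passing only
    the support vectors gives the same prediction as passing all N samples.
    """
    coeffs = np.asarray(alpha, dtype=float) - np.asarray(alpha_star, dtype=float)
    return float(sum(c * kernel(x, sv) for c, sv in zip(coeffs, support_vectors)) + b)


# Hypothetical two support vectors in a 2-band feature space. The multipliers satisfy
# sum(alpha - alpha*) = 0 and 0 <= alpha, alpha* <= C for any C >= 0.5.
svs = np.array([[0.0, 0.0], [1.0, 0.0]])
alpha = [0.5, 0.0]
alpha_star = [0.0, 0.5]
x_new = np.array([1.0, 0.0])
print(svr_predict(x_new, svs, alpha, alpha_star, b=0.1, kernel=linear_kernel))
print(svr_predict(x_new, svs, alpha, alpha_star, b=0.1,
                  kernel=lambda u, v: rbf_kernel(u, v, gamma=1.0)))
```

Passing the kernel as a function keeps Eq. (9) independent of the kernel choice. This mirrors how the study compares the linear and RBF kernels within the same SVR formulation.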
where $y_i$ is the average prediction for the $i$th observation over all trees for which that observation is out-of-bag (OOB).

2.8. Hierarchy of machine learning model development

Precise correction of atmospheric effects is essential for accurate analysis of inland water colour from remote sensing data. The satellite data were therefore atmospherically corrected using the ATCOR algorithm, and NDWI was employed to extract the open water surface of the lake. The water quality models were developed by importing the atmospherically corrected and in-situ measured datasets into the Python environment. The datasets were first normalised using the "MinMaxScaler" transformation, which scales and translates each feature individually. Each feature is mapped to a given range on the training set, here between zero and one (Amorim et al., 2023). The data were then divided into training and testing sets in a 60:40 ratio before the machine learning models were developed. Previous studies have also used 50:50, 70:30, 80:20 and 75:25 splits (Joseph, 2022; Nazarkar et al., 2023). After splitting, the ML algorithms were trained and tested on the respective subsets.

The regression models MLR, SVR and RFR were employed in this research because of their proven validity and versatility in numerous water quality modelling studies (Masood et al., 2023; Nouraki et al., 2021; Parveen et al., 2017). Random forest regression (RFR) is widely used for high-dimensional data analysis (Cheng et al., 2023). RFR reduces overfitting in decision trees, although such models can exhibit notable fluctuations in response to even slight modifications in the data (Shanmugasundar et al., 2021). SVR uses a kernel function to project the initial data onto a high-dimensional, linearly separable space. This gives it an immense advantage when dealing with nonlinear processes (Pilario et al., 2020).

The hyper-parameters of the ML techniques were tuned to achieve higher model accuracy (Yang and Shami, 2020). Grid search and random search are the most commonly used hyper-parameter optimisation techniques (Bischl et al., 2023). This study used grid search. The models were validated with a k-fold cross-validation approach. Model performance was evaluated with four statistical indices: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and the coefficient of determination (R²) (Eqs. 15–18).

Several approaches have been used in the past to determine the uncertainty associated with model predictions. These include Monte-Carlo simulation (Pakyuz-Charrier et al., 2019), the percent of relative error index (PREI) (Bui et al., 2020) and the 95 % Prediction Probability Unit (95 PPU) (Anand et al., 2024; Cha et al., 2023). This study employed the 95 PPU plot. A 95 PPU plot is commonly used in modelling to illustrate the uncertainty in model predictions and to assess how well a prediction model performs. The 95 PPU is calculated at the 2.5 % and 97.5 % levels of the output variable, so the weakest 5 % of forecasts are rejected. The 95 % prediction uncertainties are computed by Eq. (19). Fig. 2 shows the methodological hierarchy.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left| y - y_i \right|^{2}} \qquad (15)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y - y_i \right| \qquad (16)$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{\left| y - y_i \right|}{y} \times 100\,\% \qquad (17)$$
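Eqs. (15)–(17) translate directly into NumPy. This sketch reads $y$ as the observed value and $y_i$ as the predicted value, an interpretation taken from the MAPE denominator, which is presumably the observation. The pH values in the example are invented.

```python
import numpy as np


def rmse(observed, predicted) -> float:
    """Eq. (15): square root of the mean squared difference."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((o - p) ** 2)))


def mae(observed, predicted) -> float:
    """Eq. (16): mean absolute difference."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(o - p)))


def mape(observed, predicted) -> float:
    """Eq. (17): mean of |observed - predicted| / observed, in percent.

    Undefined when an observed value is zero, so that case raises an error.
    """
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    if np.any(o == 0):
        raise ValueError("MAPE is undefined for zero observed values")
    return float(np.mean(np.abs(o - p) / o) * 100.0)


# Invented pH observations and predictions, for illustration only.
obs = np.array([7.0, 7.5, 8.0])
pred = np.array([7.1, 7.4, 8.2])
print(rmse(obs, pred), mae(obs, pred), mape(obs, pred))
```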
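The workflow of Section 2.8 combines min-max scaling, a 60:40 train/test split, grid-search hyper-parameter tuning with k-fold cross-validation, and SVR. It can be sketched with scikit-learn as below. The text mentions "MinMaxScaler" in a Python environment, which suggests scikit-learn, but the library is not confirmed. The synthetic data, the choice of 5 folds and the parameter grid are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic stand-in for the band-reflectance / in-situ table (not the study's data):
# 38 samples, 4 bands, one water-quality target.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 0.3, size=(38, 4))
y = 5.0 + 10.0 * X[:, 1] - 4.0 * X[:, 3] + rng.normal(0.0, 0.05, size=38)

# 60:40 train/test split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# With MinMaxScaler inside the pipeline, it is fitted on the training folds only,
# mapping each feature to [0, 1] using training-set minima and maxima.
pipe = make_pipeline(MinMaxScaler(), SVR())

# Hypothetical search grid; the grid actually used by the authors is not given here.
param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [1.0, 10.0, 100.0],
    "svr__gamma": ["scale", 0.1, 1.0],
    "svr__epsilon": [0.01, 0.1],
}
search = GridSearchCV(
    pipe,
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
print(search.best_params_)
print("test RMSE:", float(np.sqrt(np.mean((y_test - y_pred) ** 2))))
```

This sketch departs from a literal reading of the text in one respect. Here the scaler is refitted on each training fold rather than on the whole dataset before splitting. This keeps test-set minima and maxima out of the scaling.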
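The 95 PPU described above can be sketched as follows. The text does not show Eq. (19) or say how the prediction ensemble is generated. This sketch therefore assumes an ensemble of predictions per sample, for example the per-tree outputs of a random forest. It reports the two summary statistics commonly paired with the 95 PPU in SUFI-2-style uncertainty analysis:
- the P-factor, the share of observations inside the band;
- the R-factor, the mean band thickness relative to the standard deviation of the observations.

The ensemble, the helper names and these two statistics are assumptions, not the authors' stated procedure.

```python
import numpy as np


def ppu95(ensemble: np.ndarray):
    """Lower/upper 95 PPU bounds from an ensemble of predictions.

    `ensemble` has shape (members, samples). The bounds are the 2.5th and 97.5th
    percentiles across members for each sample, so the outer 5 % is discarded.
    """
    lower, upper = np.percentile(np.asarray(ensemble, float), [2.5, 97.5], axis=0)
    return lower, upper


def p_factor(observed, lower, upper) -> float:
    """Fraction of observations bracketed by the 95 PPU band."""
    o = np.asarray(observed, float)
    return float(np.mean((o >= lower) & (o <= upper)))


def r_factor(observed, lower, upper) -> float:
    """Mean band thickness divided by the standard deviation of the observations."""
    return float(np.mean(upper - lower) / np.std(np.asarray(observed, float)))


# Hypothetical ensemble, e.g. per-tree random forest predictions or predictions
# from models refitted on bootstrap resamples.
rng = np.random.default_rng(1)
observed = np.array([6.8, 7.1, 7.4, 7.9, 8.2])
ensemble = observed + rng.normal(0.0, 0.2, size=(500, observed.size))
lo, hi = ppu95(ensemble)
print("P-factor:", p_factor(observed, lo, hi))
print("R-factor:", round(r_factor(observed, lo, hi), 2))
```

A P-factor close to 1 with a small R-factor indicates a narrow band that still brackets most observations. This corresponds to the text's observation that most data fell within the 95 PPU band.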
Table 3
Machine learning model performance indices for Sentinel.
Columns: Regression technique; Parameters; Statistical indices.
total suspended solids in the range of 1.5 to 5 mg/l with an average Regression and Multi-Linear Regression. The water quality parameters
variation of ±2.5 mg/l. The changes in the pH, dissolved oxygen, total model was designed to forecast the quality of the water by using the
dissolved solids and total suspended solids are mainly due to drastic reflectance of the water that the sensors have detected. The optical
fluctuations in the temperature, precipitation and streamflow contrib- bands were used in this study as independent parameters for model
uting to the lake during pre-monsoon and post-monsoon seasons. development as they highly sensitive to changes in the water quality
Anthropogenic activities, such as agricultural practices and the amount of effluent brought in by the intersecting streams, can influence the pH and TDS of the lake and hence affect the dissolved oxygen. Similar trends were observed in Loktak Lake in earlier studies (Laishram et al., 2022; Mayanglambam and Neelam, 2020; Tuboi et al., 2018).

Fig. 4. Observed versus predicted values using Support Vector Machine and Random Forest models generated from Sentinel multi-spectral satellite imagery. Support Vector Machine models: (a) Dissolved oxygen, (b) pH, (c) Total Suspended Solids, (d) Total Dissolved Solids. Random Forest models: (e) Dissolved oxygen, (f) pH, (g) Total Suspended Solids, (h) Total Dissolved Solids.

In this study, satellite imagery from two different sensors was tested using three different techniques. Of the two sensors, the Sentinel multispectral images had comparatively low spatial resolution and high radiometric resolution, while ResourceSat had high spatial resolution and low radiometric resolution. The accuracy of the two sensors for each of the three models (MLR, RFR, and SVR) is displayed in Tables 2, 3, and 4. Irrespective of the sensor, Support Vector Regression yielded the highest accuracy, followed by the Random Forest

(Leggesse et al., 2023; Villota-González et al., 2023; Zhu et al., 2022). Four Sentinel bands and three ResourceSat bands were used as input variables in the model simulations to estimate pH, total suspended solids, total dissolved solids, and dissolved oxygen. For Sentinel-2, Support Vector Regression had the lowest root mean square error (RMSE), ranging from 0.05 to 0.28. Random Forest Regression came close behind at 0.18 to 0.24, with total dissolved solids being the lone exception. Multi-Linear Regression had the highest RMSE, ranging from 0.13 to 0.52. Support Vector Regression also produced the best result for ResourceSat-2, with RMSE ranging between 0.08 and 0.13. Random Forest Regression followed with values between 0.16 and 0.22 (again with total dissolved solids as the only exception), and Multi-Linear Regression produced poor results (0.39–2.65). However, to quantify the
robustness of model performance, two more indices were added for the RF and SVR interpretation (Tables 3 and 4). The better performance of the SVM model compared with the RF model may be explained by its capacity to maximise the margin between the data points and the decision boundary, which makes it especially well suited to high-dimensional spaces. By using the kernel trick, it can also generate intricate decision boundaries. SVM can be more effective when working with sparse data because its predictions depend only on a subset of the training samples (the support vectors), which lowers complexity. Tables 3 and 4 list the model performance indices for both sensors under various conditions. The linear regression models for pH, total suspended solids, total dissolved solids, and dissolved oxygen created using Sentinel-2 and ResourceSat-2 are presented in Table 2.

For Sentinel, the blue and red spectral bands were the most sensitive independent variables, followed by the green band in the visible spectrum, while the near-infrared band showed the weakest correlation. Identical patterns were noted for ResourceSat, except for the blue band, which ResourceSat lacks. Figs. 4 and 5 display the correlation between the predicted and observed values of each water quality metric determined in this study, for Sentinel and ResourceSat respectively. Compared with the other water quality metrics, total suspended solids showed the highest variance, regardless of which of the three methods was used to estimate the parameters from reflectance. Both machine learning methods, Support Vector Regression and Random Forest Regression, outperformed Multi-Linear Regression. The water quality indicators in Loktak Lake were mapped from Sentinel and ResourceSat imagery using the Support Vector Regression models, which were chosen on the basis of model performance.

The uncertainty in the models was analysed using the 95 PPU. The model uncertainty analysis shows that the support vector machine performed better than the random forest, and the majority of the data fell within the 95 PPU band. If 95 % of the observed data points fall within the band, this indicates that the model adequately captures the variability in the data. The thickness of the band reflects the uncertainty in the predictions. The band was narrower for SVM than for RF. Narrow intervals indicate less uncertainty in the predictions, and when the majority of data points fall within narrow intervals, the model can be considered highly reliable. Figs. 6 and 7 represent the 95 PPU plots for the RF and SVM models for Sentinel multispectral imagery,
respectively. The uncertainty of the RF and SVM models for ResourceSat (LISS-IV) multispectral imagery is shown in Figs. 8 and 9, respectively.

Fig. 5. Observed versus predicted values using Support Vector Machine and Random Forest models generated from ResourceSat multi-spectral satellite imagery. Support Vector Machine models: (a) Dissolved oxygen, (b) pH, (c) Total Suspended Solids, (d) Total Dissolved Solids. Random Forest models: (e) Dissolved oxygen, (f) pH, (g) Total Suspended Solids, (h) Total Dissolved Solids.

3.2. Spatio-temporal variability of water quality parameters

Spatial variation of water quality in April (dry season) and November (wet season) for Sentinel is shown in Figs. 10 and 11, respectively. Fig. 12 shows the spatial distribution of water quality parameters for ResourceSat in March. One objective of this research was to investigate the viability of using Sentinel and ResourceSat to retrieve water quality indicators for the data-scarce Loktak Lake. The developed models showed that water quality characteristics can be derived from spectral reflectance data with high accuracy. Using machine learning models, specifically Support Vector Machine and Random Forest, this study achieved higher accuracy than it did with conventional linear regression models. Two comparisons were made from the results: (1) seasonal variations in the water quality parameters; and (2) mean variation in the water quality parameters between the two sensors.

To check the seasonal variations in the water quality parameters, the models generated from Sentinel were chosen. Water quality extracted from Sentinel multispectral imagery was compared for April (dry season) and November (wet season). In April, the mean values of pH, dissolved oxygen, total suspended solids, and total dissolved solids were 7.51, 5.72 mg/l, 2.41 mg/l and 148.58 mg/l, respectively. In November, they were 7.78, 8.96 mg/l, 5.10 mg/l and 127.19 mg/l, respectively. Similar quantitative trends in the water quality parameters were reported by Mayanglambam and Neelam (2020) and Laishram et al. (2022). These spatio-temporal variations in Loktak Lake are caused by the drastic shift in heat and wind fluxes during the South-West Indian monsoon season. During summer, the heat fluxes cause excessive heating of the lake water, and the resulting rise in water temperature decreases the pH and dissolved oxygen in the lake. The total suspended solids during the post
monsoon season in November were much higher than in the pre-monsoon season. This is owing to the sediments carried by the streams that enter the lake from the upstream mountainous reaches of the Himalayas. The predictions were also compared with the gauged data using zonal average statistics. This zonal comparison showed that, for both sensors, the deviation of the machine learning predictions from the gauged data was higher for on-shore predictions than for off-shore predictions. An on-shore and off-shore comparison of both sensors is shown in Fig. 13.

3.3. Inter-comparison of sensors/satellites and their effects of resolutions

To evaluate the mean variation in the water quality parameters between the two sensors, the multispectral images of 9 March 2022 were chosen. This was the only date on which the in-situ water quality sampling coincided with overpasses of both Sentinel and ResourceSat over Loktak Lake. The water quality models generated from Sentinel imagery returned mean values of 7.08, 4.95 mg/l, 1.81 mg/l and 150.85 mg/l for pH, dissolved oxygen, total suspended solids, and total dissolved solids, respectively. The models generated from ResourceSat returned 7.15, 4.46 mg/l, 1.74 mg/l and 150.41 mg/l. The mean values obtained from the two satellites were very similar.

Although ResourceSat (LISS-IV) has finer spatial resolution than Sentinel-2, the latter has more spectral bands. In this study, three visible bands and one NIR band were used from Sentinel-2, whereas two visible bands and one NIR band were used from ResourceSat. As depicted in Figs. 11 and 12, sharper variations in water quality can be observed in the parameters extracted from Sentinel than in those extracted from ResourceSat. The main reason may be Sentinel's higher spectral resolution (in terms of the blue band) and its higher radiometric resolution compared with ResourceSat (LISS-IV). ResourceSat (LISS-IV) achieves its higher spatial resolution at the cost of lower radiometric and spectral resolution. Because the radiometric resolution of ResourceSat (LISS-IV) is lower, its capacity to detect minor fluctuations in the amount of energy is reduced. The instantaneous field of view (IFOV) of a sensor must be small to achieve high spatial resolution. But when the ground resolution cell within the IFOV gets smaller, this decreases the amount of
energy that can be detected. This results in a decrease in radiometric resolution, that is, in the capacity to identify minute variations in energy. Broadening the wavelength range detected by a band is essential to increase the radiometric resolution without reducing the spatial resolution. However, this lowers the sensor's spectral resolution. Conversely, better spectral and radiometric resolution would be possible with coarser spatial resolution. Nevertheless, a higher spatial resolution has its own benefits. Compared with coarser-resolution satellites, ResourceSat's higher spatial resolution may make it possible to remotely sense water quality in smaller ponds, reservoirs, and narrow streams. The Loktak Lake catchment is dominated by many narrow streams and man-made fishing ponds. Their small size makes them challenging to monitor with Landsat-8 or Sentinel-2, or even by ground-based sampling. For instance, Landsat-8 and Sentinel-2 have spatial resolutions of 30 m and 10 m, respectively, so a stream 30 m wide would be covered by either one Landsat-8 pixel or three Sentinel-2 pixels. A ResourceSat (LISS-IV) image would contain about 6 pixels across the same stream. ResourceSat can therefore offer superior spatial information.

Fig. 6. 95 PPU uncertainty plot of Random Forest model generated from Sentinel.
Fig. 7. 95 PPU uncertainty plot of Support Vector Machine model generated from Sentinel.

The outcome of this study clearly shows the superiority of machine learning techniques over conventional techniques. The results of this study, which employed machine learning approaches, are consistent with the research conducted by Prasad et al. (2020). They achieved 94 % accuracy when mapping groundwater prospects in India using random forests and other machine learning models. Abdi (2020) proposed that machine learning models perform better overall than parametric models with remote sensing data because they are more likely to uncover higher-level and non-linear statistical relations. However, a well-formulated parametric model can also yield sufficient accuracy for water quality retrieval (Abdelmalik, 2018). The results of this study clearly indicate that the SVM model performed better than the other two models, which is similar to the findings of several other studies (Kibtia et al., 2020; İskenderoğlu et al., 2020). Among satellite remote sensing studies carried out elsewhere, Zhu et al. (2022) obtained errors of 1.5 %, 0.02 % and 1.7 % for turbidity, dissolved oxygen and chlorophyll,
respectively, using Sentinel satellite imagery. Using Landsat imagery to retrieve water quality parameters, Jakovljevic et al. (2024) obtained R-squared values of 0.998, 0.999 and 0.990 for chlorophyll, dissolved oxygen and total suspended solids, respectively. Consider a machine learning model applied to an optically inactive water quality parameter. A high R-squared value then indicates that the model can adequately explain the fluctuation in that parameter using the input features. This implies that the selected features capture underlying patterns or relationships that contribute to the variance in the optical properties of water, even though the parameter may not directly affect those properties. The possible ramifications for water quality monitoring must be carefully considered when interpreting these outcomes.

Fig. 8. 95 PPU uncertainty plot of Random Forest model generated from ResourceSat (LISS-IV).
Fig. 9. 95 PPU uncertainty plot of Support Vector Machine model generated from ResourceSat (LISS-IV).

Other approaches, including deep learning, have been found to deliver more accurate water quality retrieval in the United States (Sagan et al., 2020). This study has several limitations. First, when working with remote sensing satellite datasets, trade-offs among spectral, spatial, and radiometric resolutions limit the spatial resolution of space-borne images to medium or coarse resolutions (Gege and Dekker, 2020). Even so, multispectral remote sensing is a better choice for studying the spatio-temporal variation of water quality parameters over large inland areas. Previous research has also shown that a wide bandwidth makes it difficult to separate the optical characteristics of the various components of a water body with complex optical properties (Chen and Shen, 2016). The development and application of hyperspectral sensors is an important approach to resolving this problem. Second, although model performance was found to be good, the model uncertainty was higher in the on-shore zones than in the off-shore zones. This may be because the on-shore zones are more vulnerable and sensitive to anthropogenic changes. Hence, to enhance model performance, in-situ data need to be collected at a higher frequency and across different seasons. This would better capture the variability of the water quality parameters both spatially and temporally.
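The stream-width example given above (30 m Landsat-8 pixels, 10 m Sentinel-2 pixels, a 30 m wide stream) can be reproduced in a few lines of Python. This is an illustrative sketch, not part of the authors' workflow. The 5 m LISS-IV pixel is an assumption that reproduces the "about 6 pixels" figure; at the nominal 5.8 m nadir resolution the count is closer to 5. The hypothetical `worst_case_pure_pixels` helper also assumes an arbitrary grid offset, in which the pixels straddling the two banks mix water and land.

```python
import math

# Pixel sizes in metres. Landsat-8 (30 m) and Sentinel-2 (10 m) follow the text;
# the 5 m LISS-IV value is an assumption (resampled product), 5.8 m is nadir.
PIXEL_SIZE_M = {
    "Landsat-8": 30.0,
    "Sentinel-2": 10.0,
    "LISS-IV (5 m)": 5.0,
    "LISS-IV (5.8 m)": 5.8,
}


def pixels_across(width_m: float, pixel_m: float) -> float:
    """Pixels spanning a channel of the given width on a perfectly aligned grid."""
    return width_m / pixel_m


def worst_case_pure_pixels(width_m: float, pixel_m: float) -> int:
    """Hypothetical helper: pixels lying wholly inside the channel for the
    least favourable grid offset.

    With an unlucky offset, the pixels straddling both banks are mixed, which
    leaves floor(width / pixel) - 1 pure-water pixels (never fewer than zero).
    """
    return max(math.floor(width_m / pixel_m) - 1, 0)


if __name__ == "__main__":
    stream_width_m = 30.0  # stream width used in the text's example
    for sensor, size in PIXEL_SIZE_M.items():
        print(
            f"{sensor:16s} {pixels_across(stream_width_m, size):5.2f} pixels across, "
            f"{worst_case_pure_pixels(stream_width_m, size)} pure-water pixels (worst case)"
        )
```

With a 30 m stream, a single Landsat-8 pixel may contain no pure-water pixel at all. The finer grids still leave a few pixels wholly inside the channel, which is the practical advantage the text attributes to higher spatial resolution.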
Fig. 10. Spatial distribution of water quality parameters derived from Sentinel multi-spectral satellite imagery for April 2022.
Fig. 11. Spatial distribution of water quality parameters derived from Sentinel multi-spectral satellite imagery for November 2022.
Fig. 12. Spatial distribution of water quality parameters derived from ResourceSat multi-spectral satellite imagery for March 2022.
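The two comparisons described in Sections 3.2 and 3.3 can be quantified as relative differences between the reported mean values. The Python sketch below does this for the April and November Sentinel-2 means, and for the 9 March 2022 Sentinel-2 and ResourceSat-2 means. The numbers are copied from the text; the `percent_change` helper and the dictionary names are illustrative choices, not part of the authors' code.

```python
# Mean values reported in Sections 3.2 and 3.3 (pH unitless, others in mg/l).
PARAMS = ("pH", "DO", "TSS", "TDS")

SENTINEL_APRIL = dict(zip(PARAMS, (7.51, 5.72, 2.41, 148.58)))
SENTINEL_NOVEMBER = dict(zip(PARAMS, (7.78, 8.96, 5.10, 127.19)))
SENTINEL_9_MARCH = dict(zip(PARAMS, (7.08, 4.95, 1.81, 150.85)))
RESOURCESAT_9_MARCH = dict(zip(PARAMS, (7.15, 4.46, 1.74, 150.41)))


def percent_change(reference: dict, other: dict) -> dict:
    """Relative difference of `other` with respect to `reference`, in percent."""
    return {k: 100.0 * (other[k] - reference[k]) / reference[k] for k in reference}


if __name__ == "__main__":
    comparisons = (
        ("Seasonal: November vs April (Sentinel-2)", SENTINEL_APRIL, SENTINEL_NOVEMBER),
        ("Sensors: ResourceSat-2 vs Sentinel-2 (9 March 2022)", SENTINEL_9_MARCH, RESOURCESAT_9_MARCH),
    )
    for label, reference, other in comparisons:
        print(label)
        for name, change in percent_change(reference, other).items():
            print(f"  {name:3s} {change:+8.2f} %")
```

Run as is, it shows that the November dissolved oxygen mean is about 57 % above the April value. It also shows that the two sensors' March means differ by about 10 % for dissolved oxygen and by less than 4 % for the other three parameters.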
Fig. 13. Comparison between predicted and actual values for different water quality parameters from both sensors.
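The 95 PPU interpretation used for Figs. 6 to 9 rests on two quantities: the share of observations inside the band, and the band's thickness. These correspond to the P-factor and R-factor commonly reported with 95 PPU in SUFI-2/SWAT-CUP studies. The text does not state how the band was generated for the RF and SVM models. The Python sketch below therefore assumes an ensemble of predictions per sample (for example, from bootstrap-refitted models) and takes the band from the 2.5th to the 97.5th percentile of each ensemble. The function names and the example values are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile() of an empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def ppu95(
    ensemble: Sequence[Sequence[float]], observed: Sequence[float]
) -> Tuple[List[float], List[float], float, float]:
    """Return (lower, upper, p_factor, r_factor) of a 95 PPU band.

    ensemble[i] holds the ensemble predictions for sample i (an assumption:
    e.g. bootstrap-refitted models); observed[i] is the measured value.
    """
    if len(ensemble) != len(observed) or len(observed) < 2:
        raise ValueError("need one ensemble per observation and at least two samples")
    lower = [percentile(preds, 2.5) for preds in ensemble]
    upper = [percentile(preds, 97.5) for preds in ensemble]
    # P-factor: fraction of observations bracketed by the band.
    inside = sum(lo <= obs <= up for lo, up, obs in zip(lower, upper, observed))
    p_factor = inside / len(observed)
    # R-factor: mean band thickness divided by the standard deviation of the observations.
    mean_width = statistics.mean([up - lo for lo, up in zip(lower, upper)])
    r_factor = mean_width / statistics.stdev(observed)
    return lower, upper, p_factor, r_factor


if __name__ == "__main__":
    # Hypothetical pH observations and ensemble predictions, for illustration only.
    observed = [6.8, 7.1, 7.4, 7.9]
    ensemble = [
        [6.6, 6.7, 6.9, 7.0, 7.2],
        [6.9, 7.0, 7.1, 7.3, 7.4],
        [7.0, 7.1, 7.2, 7.3, 7.5],
        [7.2, 7.3, 7.4, 7.5, 7.6],  # last observation falls outside its band
    ]
    _, _, p, r = ppu95(ensemble, observed)
    print(f"P-factor = {p:.2f}, R-factor = {r:.2f}")
```

A P-factor close to 1 combined with a small R-factor describes the narrow, well-populated band that the text associates with the SVM models.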
The performance of the water quality algorithms developed using the data from both sensors was highly compelling. These results suggest that, together with a machine learning approach, Sentinel and ResourceSat should be used in tandem for monitoring water quality, as their designs complement each other. ResourceSat is more appropriate for assessing water quality parameters near water bodies

Data availability

Sentinel datasets used in this study are available in the public domain through the official portal (https://dataspace.copernicus.eu/). ResourceSat datasets used in this study are available in the public domain through the official portal (https://bhoonidhi.nrsc.gov.in/bhoonidhi/home.html). The field water quality data used in this
study are proprietary but can be shared on request. A sample of the field data, along with the codes used in this study, is accessible in the Zenodo repository (https://doi.org/10.5281/zenodo.13972855).

Acknowledgement

The authors express their heartfelt gratitude to the European Space Agency (ESA) and the National Remote Sensing Centre (NRSC)/Indian Space Research Organization (ISRO) for providing the valuable database. We would like to thank Laishram Kishan Singh, Kh Digbijoy Singh, Rajkumari Neetu Sana and Th Lenin Singh for supporting us during the extensive field campaigns. Last but not least, we thank the editor and two anonymous reviewers for their very constructive and helpful suggestions. This research was supported by the SERB-sponsored project (CRG/2021/006665) and the National Institute of Technology Manipur. The first author would like to thank the Ministry of Education, Govt. of India, and the Deutscher Akademischer Austauschdienst for providing the prestigious DAAD Doctoral Fellowship to carry out this study. Open Access funding enabled and organized by Projekt DEAL.

References

Abbas, F., Cai, Z., Shoaib, M., Iqbal, J., Ismail, M., Arifullah Alrefaei, A.F., Albeshr, M.F., 2024. Machine learning models for water quality prediction: a comprehensive analysis and uncertainty assessment in Mirpurkhas, Sindh, Pakistan. Water 16, 941. https://doi.org/10.3390/w16070941.
Abbaspour, K.C., Yang, J., Maximov, I., Siber, R., Bogner, K., Mieleitner, J., Zobrist, J., Srinivasan, R., 2007. Modelling hydrology and water quality in the pre-alpine/alpine Thur watershed using SWAT. J. Hydrol. 333, 413–430.
Abdelmalik, K.W., 2018. Role of statistical remote sensing for inland water quality parameters prediction. Egypt. J. Remote Sens. Space Sci. 21, 193–200. https://doi.org/10.1016/j.ejrs.2016.12.002.
Abdi, A.M., 2020. Land cover and land use classification performance of machine learning algorithms in a boreal landscape using Sentinel-2 data. GISci. Remote Sens. 57, 1–20. https://doi.org/10.1080/15481603.2019.1650447.
Adusei, Y.Y., Quaye-Ballard, J., Adjaottor, A.A., Mensah, A.A., 2021. Spatial prediction and mapping of water quality of Owabi reservoir from satellite imageries and machine learning models. Egypt. J. Remote Sens. Space Sci. 24 (3(2)), 825–833. https://doi.org/10.1016/j.ejrs.2021.06.006.
Ahmed, N.S., Hikmat Sadiq, M., 2018. Clarify of the random forest algorithm in an educational field. In: Proceedings of the 2018 International Conference on Advanced Science and Engineering (ICOASE), Duhok, Iraq, 9–11 October 2018. IEEE, Piscataway, NJ, USA, pp. 179–184.
Akinnawo, S.O., 2023. Eutrophication: causes, consequences, physical, chemical and biological techniques for mitigation strategies. Environ. Challen. 12, 100733. https://doi.org/10.1016/j.envc.2023.100733.
Amorim, L.B.V., George, D.C.C., Cruz, R.M.O., 2023. The choice of scaling technique matters for classification performance. Appl. Soft Comput. 133, 109924. https://doi.org/10.1016/j.asoc.2022.109924.
Anand, V., Oinam, B., Parida, B.R., 2020. Uncertainty in hydrological analysis using multi-GCM predictions and multi-parameters under RCP 2.6 and 8.5 scenarios in Manipur River basin, India. J. Earth Syst. Sci. 129, 223.
Anand, V., Oinam, B., Singh, I.H., 2021. Predicting the current and future potential spatial distribution of endangered Rucervus eldii eldii (Sangai) using MaxEnt model. Environ. Monit. Assess. 193, 147. https://doi.org/10.1007/s10661-021-08950-1.
Anand, V., Oinam, B., Wieprecht, S., Singh, S.K., Srinivasan, R., 2024. Enhancing hydrological model calibration through hybrid strategies in data-scarce regions. Hydrol. Process. 38 (2), e15084. https://doi.org/10.1002/hyp.15084.
Arias-Rodriguez, L.F., Duan, Z., Sepúlveda, R., Martinez-Martinez, S.I., Disse, M., 2020. Monitoring water quality of Valle de Bravo reservoir, Mexico, using entire lifespan of MERIS data and machine learning approaches. Remote Sens. 12, 1586.
Baracchini, T., Hummel, S., Verlaan, M., Cimatoribus, A., Wüest, A., Bouffard, D., 2020. An automated calibration framework and open source tools for 3D lake hydrodynamic models. Environ. Model. Softw. 134, 104787.
Beggel, S., Pander, J., Geist, J., 2022. Ecological indicators for surface water quality - methodological approaches to fish community assessments in China and Germany. In: Dohmann, M., Grambow, M., Song, Y., Wermter, P. (Eds.), Chinese Water Systems. Terrestrial Environmental Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-80234-9_2.
Bischl, B., Binder, M., Lang, M., Pielok, T., Richter, J., Coors, S., Thomas, J., Ullmann, T., Becker, M., Boulesteix, A.-L., Deng, D., Lindauer, M., 2023. Hyperparameter optimization: foundations, algorithms, best practices, and open challenges. WIREs Data Min. Knowledge Disc. 13 (2), e1484. https://doi.org/10.1002/widm.1484.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Bui, D.T., Khosravi, K., Tiefenbacher, J., Nguyen, H., Kazakis, N., 2020. Improving prediction of water quality indices using novel hybrid machine-learning algorithms. Sci. Total Environ. 721, 137612.
Campos, L.C., Olago, D., Osborn, D., 2022. Water and the UN sustainable development goals. UCL Open Environ. 4, e029.
Cha, G.W., Choi, S.H., Hong, W.H., Park, C.W., 2023. Development of machine learning model for prediction of demolition waste generation rate of buildings in redevelopment areas. Int. J. Environ. Res. Public Health 20, 107. https://doi.org/10.3390/ijerph20010107.
Chen, Y., Shen, F., 2016. Influence of suspended particulate matter on chlorophyll-a retrieval algorithms in Yangtze River estuary and adjacent turbid waters. Remote Sens. Technol. Appl. 31, 126–133.
Chen, Y., Li, J., Xu, H., 2016. Improving flood forecasting capability of physically based distributed hydrological models by parameter optimization. Hydrol. Earth Syst. Sci. 20, 375–392.
Chen, K., Chen, H., Zhou, C., Huang, Y., Qi, X., Shen, R., 2020. Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Res. 171, 115454.
Cheng, Q., Chunhong, Z., Qianglin, L., 2023. Development and application of random forest regression soft sensor model for treating domestic wastewater in a sequencing batch reactor. Sci. Rep. 13 (1), 9149. https://doi.org/10.1038/s41598-023-36333-8.
Costa, J.A.D., Souza, J.P.D., Teixeira, A.P., Nabout, J.C., Carneiro, F.M., 2018. Eutrophication in aquatic ecosystems: a scientometric study. Acta Limnol. Bras. 30, e2.
Directorate of Environment (Government of Manipur), 2013. Manipur.
Dong, L., Gong, C., Huai, H., Wu, E., Lu, Z., Hu, Y., Li, L., Yang, Z., 2023. Retrieval of water quality parameters in Dianshan Lake based on Sentinel-2 MSI imagery and machine learning: algorithm evaluation and spatiotemporal change research. Remote Sens. 15, 5001. https://doi.org/10.3390/rs15205001.
Ebtehaj, I., Bonakdari, H., Safari, M.J.S., Gharabaghi, B., Zaji, A.H., Madavar, H.R., Khozani, Z.S., Es-haghi, M.S., Shoshegaran, A., Mehr, A.D., 2020. Combination of sensitivity and uncertainty analyses for sediment transport modeling in sewer pipes. Int. J. Sediment. Res. 35, 157–170.
Ferreira, V., Bini, L.M., González Sagrario, M.D., Katya, E.K., Luigi, N., Andre, A.P., Judit, P., 2023. Aquatic ecosystem services: an overview of the special issue. Hydrobiologia 850, 2473–2483. https://doi.org/10.1007/s10750-023-05235-1.
Gege, P., Dekker, A.G., 2020. Spectral and radiometric measurement requirements for inland, coastal and reef waters. Remote Sens. 12, 2247.
Gholizadeh, M.H., Melesse, A.M., Reddi, L.A., 2016. Comprehensive review on water quality parameters estimation using remote sensing techniques. Sensors 16, 1298. https://doi.org/10.3390/s16081298.
İskenderoğlu, F.C., Baltacioğlu, M.K., Demir, M.H., Baldinelli, A., Barelli, L., Bidini, G., 2020. Comparison of support vector regression and random forest algorithms for estimating the SOFC output voltage by considering hydrogen flow rates. Int. J. Hydrog. Energy 45 (60), 35023–35038. https://doi.org/10.1016/j.ijhydene.2020.07.265.
Jakovljevic, G., Alvarez-Taboada, F., Govedarica, M., 2024. Long-term monitoring of inland water quality parameters using Landsat time-series and back-propagated ANN: assessment and usability in a real-case scenario. Remote Sens. 16, 68. https://doi.org/10.3390/rs16010068.
Janicka, E., Kanclerz, J., Wiatrowska, K., Budka, A., 2022. Variability of Nitrogen and Phosphorus content and their forms in waters of a river-lake system. Front. Environ. Sci. 10, 874754. https://doi.org/10.3389/fenvs.2022.874754.
Jepsen, S.M., Harmon, T.C., Guan, B., 2021. Analyzing the suitability of remotely sensed ET for calibrating a watershed model of a Mediterranean Montane Forest. Remote Sens. 13, 1258. https://doi.org/10.3390/rs13071258.
Joseph, V.R., 2022. Optimal ratio for data splitting. Stat. Anal. Data Min.: ASA Data Sci. J. 15, 531–538. https://doi.org/10.1002/sam.11583.
Kangabam, R.D., Bhoominathan, S.D., Kanagaraj, S., Govindaraju, M., 2017. Development of a water quality index (WQI) for the Loktak Lake in India. Appl. Wat. Sci. 7, 2907–2918. https://doi.org/10.1007/s13201-017-0579-4.
Kibtia, H., Abdullah, S., Bustamam, A., 2020. Comparison of random forest and support vector machine for prediction of cognitive impairment in Parkinson's disease. AIP Conf. Proc. 2296 (1), 020093. https://doi.org/10.1063/5.0030332.
Kim, Y.H., Im, J., Ha, H.K., Choi, J.K., Ha, S., 2014. Machine learning approaches to coastal water quality monitoring using GOCI satellite data. GISci. Remote Sens. 51, 158–174.
Kong, X., Zhan, Q., Boehrer, B., Rinke, K., 2019. High frequency data provide new insights into evaluating and modeling nitrogen retention in reservoirs. Water Res. 166, 115017.
Laishram, R.J., Yumnam, G., Alam, W., 2022. Assessment of ecohydrogeochemical status of freshwater Loktak Lake of Manipur, India. Environ. Monit. Assess. 194, 659. https://doi.org/10.1007/s10661-022-10336-w.
Lan, L., Gu, M., Gong, C., Hu, Y., Wang, X., Yang, Z., He, Z., 2023. An advanced remote sensing retrieval method for urban non-optically active water quality parameters: an example from Shanghai. Sci. Total Environ. 880, 163389. https://doi.org/10.1016/j.scitotenv.2023.163389.
LDA, WISA, 2002. Loktak, Phumdis management. Loktak newsletter vol 2. Loktak Development Authority, Imphal and Wetland International-South Asia, New Delhi.
Leggesse, E.S., Zimale, F.A., Sultan, D., Enku, T., Srinivasan, R., Tilahun, S.A., 2023. Predicting optical water quality indicators from remote sensing using machine learning algorithms in tropical highlands of Ethiopia. Hydrology 10, 110.
Li, S., Song, K., Wang, S., Liu, G., Wen, Z., Shang, Y., Lyu, L., Chen, F., Xu, S., Tao, H., et al., 2021. Quantification of chlorophyll-a in typical lakes across China using Sentinel-2 MSI imagery with machine learning algorithm. Sci. Total Environ. 778, 146271.
Lin, L., Yang, H., Xu, X., 2022. Effects of water pollution on human health and disease heterogeneity: a review. Front. Environ. Sci. 10, 880246. https://doi.org/10.3389/fenvs.2022.880246.
Lloyd, C.E.M., Johnes, P.J., Pemberton, J.A., Yates, C.A., Jones, D., Evershed, R.P., 2022. Sagan, V., Peterson, K.T., Maimaitijiang, M., Sidike, P., Sloan, J., Greeling, B.A.,
Sampling, storage and laboratory approaches for dissolved organic matter Maalouf, S., Adams, C., 2020. Monitoring inland water quality using remote sensing:
characterisation in freshwaters: moving from nutrient fraction to molecular-scale potential and limitations of spectral indices, bio-optical simulations, machine
characterization. Sci. Total Environ. 827, 154105. https://doi.org/10.1016/j. learning, and cloud computing. Earth Sci. Rev. 205. https://doi.org/10.1016/j.
scitotenv.2022.154105. earscirev.2020.103187 103187.
Mahato, S., Pukhrambam, G., Joshi, P.K., 2023. Damming effects on hydrological Schaeffer, B.A., Schaeffer, K.G., Keith, D., Lunetta, R.S., Conmy, R., Gould, R.W., 2013.
abundance and eco-hydrological alteration in upstream wetlands of eastern Barriers to adopting satellite remote sensing for water quality management. Int. J.
Himalaya. J. Clean. Prod. 418, 138089. Remote Sens. 34, 7534–7544.
Malahlela, O.E., 2019. Spatio-temporal assessment of inland surface water quality using Seo, D.K., Kim, Y.H., Eo, Y.D., Park, W.Y., Park, H.C., 2017. Generation of radiometric,
remote sensing data in the wake of changing climate. In: In Proceedings of the IOP phenological normalized image based on random forest regression for change
Conference Series: Earth and Environmental Science, West Java, Indonesia, Vol. 227, detection. Remote Sens. 2017 (9), 1163.
p. 062012. Shams, M.Y., Elshewey, A.M., El-kenawy, E.S.M., et al., 2024. Water quality prediction
Marcello, J., Eugenio, F., Perdomo, U., Medina, A., 2016. Assessment of atmospheric using machine learning models based on grid search method. Multimed. Tools Appl.
algorithms to retrieve vegetation in natural protected areas using multispectral high 83, 35307–35334. https://doi.org/10.1007/s11042-023-16737-4.
resolution imagery. Sensors 16, 1624. Shanmugasundar, G., Vanitha, M., Čep, R., Kumar, V., Kalita, K., Ramachandran, M.A.,
Mardani, N., Suara, K., Fairweather, H., Brown, R., McCallum, A., Sidle, R.C., 2020. 2021. Comparative study of linear, random forest and adaboost regressions for
Improving the accuracy of hydrodynamic model predictions using Lagrangian modeling non-traditional machining. Processes 9, 2015. https://doi.org/10.3390/
calibration. Water 12 (2), 575. pr9112015.
Martins, V., Barbosa, C., de Carvalho, L., Jorge, D., Lobo, F., Novo, E., 2017. Assessment Shi, K., Zhang, Y., Zhu, G., Qin, B., Pan, D., 2018. Deteriorating water clarity in shallow
of atmospheric correction methods for Sentinel-2 MSI images applied to Amazon waters: evidence from long term MODIS and in-situ observations. Int. J. Appl. Earth
Floodplain Lakes. Remote Sens. 9 (4), 322. Obs. Geoinf. 68, 287–297 (CrossRef).
Masood, A., Niazkar, M., Zakwan, M., Piraei, R.A., 2023. Machine learning-based framework for water quality index estimation in the Southern Bug River. Water 15, 3543. https://doi.org/10.3390/w15203543.
Mayanglambam, B., Neelam, S.S., 2020. Physicochemistry and water quality of Loktak Lake water, Manipur, India. Int. J. Environ. Anal. Chem. https://doi.org/10.1080/03067319.2020.1742888.
McFeeters, S.K., 1996. The use of the normalized difference water index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 17, 1425–1432.
Mohammed, H., Longva, A., Seidu, R., 2019. Impact of climate forecasts on the microbial quality of a drinking water source in Norway using hydrodynamic modeling. Water 11 (3), 527.
Mohammed, H., Tornyeviadzi, H.M., Seidu, R., 2021. Modelling the impact of weather parameters on the microbial quality of water in distribution systems. J. Environ. Manag. 284, 111997. https://doi.org/10.1016/j.jenvman.2021.111997.
Muhoyi, H., Gumindoga, W., Mhizha, A., Misi, S.N., Nondo, N., 2022. Water quality monitoring using remote sensing, Lower Manyame Sub-catchment, Zimbabwe. Water Pract. Technol. 17 (6), 1347–1357. https://doi.org/10.2166/wpt.2022.061.
Mwanake, R.M., Gettel, G.M., Wangari, E.G., Glaser, C., Houska, T., Breuer, L., Butterbach-Bahl, K., Kiese, R., 2023. Anthropogenic activities significantly increase annual greenhouse gas (GHG) fluxes from temperate headwater streams in Germany. EGU 2023. https://doi.org/10.5194/egusphere-2023-683.
Nazarkar, A.A., Harish, K.A., Paidimarry, A.C.S., Kulkarni, A.S., 2023. Impact of various data splitting ratios on the performance of machine learning models in the classification of lung cancer. In: Proceedings of the Second International Conference on Emerging Trends in Engineering (ICETE 2023), pp. 96–104. ISSN 2352-5401. https://doi.org/10.2991/978-94-6463-252-1_12.
Nouraki, A., Alavi, M., Golabi, M., Albaji, M., 2021. Prediction of water quality parameters using machine learning models: a case study of the Karun River, Iran. Environ. Sci. Pollut. Res. Int. 28 (40), 57060–57072. https://doi.org/10.1007/s11356-021-14560-8. Epub 2021 Jun 3. PMID: 34081285.
NWA, 2009. National Wetland Atlas: Manipur. SAC/RESA/AFEG/NWIA/ATLAS/03/2009, Space Applications Centre, ISRO, Ahmedabad, India, p. 96.
Pakyuz-Charrier, E., Jessell, M., Giraud, J., Lindsay, M., Ogarko, V., 2019. Topological analysis in Monte Carlo simulation for uncertainty propagation. Solid Earth 10, 1663–1684. https://doi.org/10.5194/se-10-1663-2019.
Parveen, N., Zaidi, S., Danish, M., 2017. Development of SVR-based model and comparative analysis with MLR and ANN models for predicting the sorption capacity of Cr(VI). Process Saf. Environ. Prot. 107, 428–437. https://doi.org/10.1016/j.psep.2017.03.007.
Peterson, K.T., Sagan, V., Sloan, J.J., 2020. Deep learning-based water quality estimation and anomaly detection using Landsat-8/Sentinel-2 virtual constellation and cloud computing. GISci. Remote Sens. 57 (4), 510–525. https://doi.org/10.1080/15481603.2020.1738061.
Pilario, K.E., Shafiee, M., Cao, Y., Lao, L., Yang, S.H., 2020. A review of kernel methods for feature extraction in nonlinear process monitoring. Processes 8, 24. https://doi.org/10.3390/pr8010024.
Prasad, P., Loveson, V.J., Kotha, M., Yadav, R., 2020. Application of machine learning techniques in groundwater potential mapping along the west coast of India. GISci. Remote Sens. 1–18. https://doi.org/10.1080/15481603.2020.1794104.
Richter, R.A., 1996. Spatially adaptive fast atmospheric correction algorithm. Int. J. Remote Sens. 17, 1201–1214.
Roy, R., Majumder, M., 2019. Assessment of water quality trends in Loktak Lake, Manipur, India. Environ. Earth Sci. 78, 383. https://doi.org/10.1007/s12665-019-8383-0.
Swain, R., Sahoo, B., 2017. Improving river water quality monitoring using satellite data products and a genetic algorithm processing approach. Sustain. Water Qual. Ecol. 9–10, 88–114.
Talukdar, S., Bera, S., Naikoo, M.W., Ramana, G.V., Mallik, S., Kumar, P.A., Rahman, A., 2024. Optimisation and interpretation of machine and deep learning models for improved water quality management in Lake Loktak. J. Environ. Manag. 351, 119866.
Tang, W., Pei, Y., Zheng, H., Zhao, Y., Shu, L., Zhang, H., 2022. Twenty years of China’s water pollution control: experiences and challenges. Chemosphere 295, 133875. ISSN 0045-6535. https://doi.org/10.1016/j.chemosphere.2022.133875.
Tang, Y., Sun, Y., Han, Z., Soomro, S., Wu, Q., Tan, B., Hu, C., 2023. Flood forecasting based on machine learning pattern recognition and dynamic migration of parameters. J. Hydrol.: Reg. Stud. 47, 101406. https://doi.org/10.1016/j.ejrh.2023.101406.
Thongam, N., Meitei, M.D., 2021. Role of dominant macrophytes to treat Nambul river, the main polluter of Loktak – a dying Ramsar site in the indo Burma hot spot (Manipur, India). Int. J. Phytoremediat. 23 (11), 1132–1144. https://doi.org/10.1080/15226514.2021.1880367.
Tuboi, C., Irengbam, M., Hussain, A.S., 2018. Seasonal variations in the water quality of a tropical wetland dominated by floating meadows and its implication for conservation of Ramsar wetlands. Phys. Chem. Earth, Parts A/B/C 103, 107–114. ISSN 1474-7065. https://doi.org/10.1016/j.pce.2017.09.001.
Tzanakakis, V.A., Paranychianakis, N.V., Angelakis, A.N., 2020. Water supply and water scarcity. Water 12 (9), 2347. https://doi.org/10.3390/w12092347.
Usharani, K., Umarani, K., Ayyasamy, P., Shanthi, K., Lakshmanaperumalsamy, P., 2010. Physico-chemical and bacteriological characteristics of Noyyal River and ground water quality of Perur, India. J. Appl. Sci. Environ. Manag. 14.
Villota-González, F.H., Sulbarán-Rangel, B., Zurita-Martínez, F., Gurubel-Tun, K.J., Zúñiga-Grajeda, V., 2023. Assessment of machine learning models for remote sensing of water quality in lakes Cajititlán and Zapotlán, Jalisco—Mexico. Remote Sens. 15, 5505. https://doi.org/10.3390/rs15235505.
Yang, L., Shami, A., 2020. On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415, 295–316. ISSN 0925-2312. https://doi.org/10.1016/j.neucom.2020.07.061.
Yin, J., Medellin-Azuara, J., Escriva-Bou, A., Liu, Z., 2021. Bayesian machine learning ensemble approach to quantify model uncertainty in predicting groundwater storage change. Sci. Total Environ. 769, 144715.
Yu, Q., Shi, C., Bai, Y., Zhang, J., Lu, Z., Xu, Y., Li, W., Liu, C., Soomro, S., Tian, L., Hu, C., 2024. Interpretable baseflow segmentation and prediction based on numerical experiments and deep learning. J. Environ. Manag. 360, 121089. https://doi.org/10.1016/j.jenvman.2024.121089.
Zhao, Y., Yu, T., Hu, B., Zhang, Z., Liu, Y., Liu, X., Liu, H., Liu, J., Wang, X., Song, S., 2022. Retrieval of water quality parameters based on near-surface remote sensing and machine learning algorithm. Remote Sens. 14, 5305. https://doi.org/10.3390/rs14215305.
Zheng, H., Hou, S., Liu, J., Xiong, Y., Wang, Y., 2024. Advanced machine learning and water quality index (WQI) assessment: evaluating groundwater quality at the Yopurga landfill. Water 16, 1666.
Zhu, X., Guo, H., Huang, J.J., Tian, S., Xu, W., Mai, Y., 2022. An ensemble machine learning model for water quality estimation in coastal area based on remote sensing imagery. J. Environ. Manag. 323, 116187. https://doi.org/10.1016/j.jenvman.2022.116187.