Remotesensing 16 01224 v2
Article
Evaluation and Selection of Multi-Spectral Indices to Classify
Vegetation Using Multivariate Functional Principal
Component Analysis
Simone Pesaresi 1, * , Adriano Mancini 2 , Giacomo Quattrini 1 and Simona Casavecchia 1
1 Department of Agricultural, Food, and Environmental Sciences, D3A, Università Politecnica delle Marche,
Via Brecce Bianche 12, 60131 Ancona, Italy; [email protected] (G.Q.); [email protected] (S.C.)
2 Department of Information Engineering, DII, Università Politecnica delle Marche, Via Brecce Bianche 12,
60131 Ancona, Italy; [email protected]
* Correspondence: [email protected]
Abstract: The identification, classification and mapping of different plant communities and habitats is
of fundamental importance for defining biodiversity monitoring and conservation strategies. Today,
the availability of high temporal, spatial and spectral data from remote sensing platforms provides
dense time series over different spectral bands. In the case of supervised mapping, time series based
on classical vegetation indices (e.g., NDVI, GNDVI, . . .) are usually input characteristics, but the
selection of the best index or set of indices (which guarantees the best performance) is still based
on human experience and is also influenced by the study area. In this work, several different time
series, based on Sentinel-2 images, were created by exploring new combinations of bands that extend
classic formulas such as the normalized difference index. Multivariate Functional Principal
Component Analysis (MFPCA) was used to decompose the multiple time series simultaneously. The
principal multivariate seasonal spectral variations identified (MFPCA scores) were classified by using
a Random Forest (RF) model. The MFPCA and RF classifications were nested into a forward selection
strategy to identify the proper and minimum set of indices’ (dense) time series that produced the
most accurate supervised classification of plant communities and habitat. The results we obtained
can be summarized as follows: (i) the selection of the best set of time series is specific to the study
area and the habitats involved; (ii) well-known and widely used indices such as the NDVI are not
selected as the indices with the best performance; instead, time series based on original indices (in
terms of formula or combination of bands) or underused indices (such as those derivable with the
visible bands) are selected; (iii) MFPCA efficiently reduces the dimensionality of the data (multiple
dense time series) providing ecologically interpretable results representing an important tool for
habitat modelling outperforming conventional approaches that consider only discrete time series.
Citation: Pesaresi, S.; Mancini, A.; Quattrini, G.; Casavecchia, S. Evaluation and Selection of
Multi-Spectral Indices to Classify Vegetation Using Multivariate Functional Principal Component
Analysis. Remote Sens. 2024, 16, 1224. https://doi.org/10.3390/rs16071224
according to Durell et al. [19], usually use time series of individual spectral bands or
classic vegetation indices, such as the popular NDVI [20], consisting of a limited number of
scenes within a single year. However, these models rely on human experience and prior
knowledge of the best data acquisition time points and the most suitable set of indices to
capture habitats during their optimal phenological stages. Therefore, these models face
challenges in terms of transferability [21]. It is clear that recommending universal optimal
time points and indices for all habitats across diverse study areas with varying vegetation
and ecological characteristics is not feasible, despite the availability of indices tailored
for specific applications [22,23]. In this context it is necessary to develop adaptable and
transferable models that can autonomously select suitable indices and determine the ideal
times for data acquisition based on the specific vegetation and ecological characteristics of a
study area. A carefully selected set of area-specific indices offers significant advantages for
land management organisations in compliance with national and international guidelines,
such as the Habitats Directive [1,24,25]. These models should handle dense time series of
remotely sensed data. Such data, which, in a specific time window, provide a richer wealth
of information than multi-temporal data, are optimal for analysing seasonal changes in
vegetation and improving classification accuracy [26,27].
Recently, promising methods known as ‘Hybrid statistical-functional Machine
Learning’ [19], which combine machine learning with Functional Data Analysis (FDA) [28],
have been employed to classify and map vegetation and habitats in two Natura 2000 sites [29,30].
Exploring such hybrid models is essential because they are capable of efficiently analysing
dense time series of remote sensing data. The results are not only accurate but also facilitate
interpretations and provide support to phytosociologists and ecologists in understanding
the temporal spectral behaviour of plant associations (plant communities) [31–33]. The
efficiency of analysing dense time series by FDA lies in its fundamental philosophy, which
considers observed data functions as single entities, rather than merely as a sequence of
individual observations [34]. In practice, if the entire time series of a pixel is expressed
as a time function and considered as a single statistical unit, then a stack of remotely
sensed images (a cube with x, y and t axes) is considered as a single temporal archive [35],
essentially composed of as many functions as there are pixels in the area under test. The
pixel-based functions (time series) of remotely sensed data can be thought of as points (or
pixels) within a functional space [34]. The functional space can be univariate or multivari-
ate, depending on the number of metrics (band or indices) used to describe and track the
spectral variations within it (Figure 1).
Functional Principal Component Analysis (FPCA) is one of the most popular tech-
niques in FDA for reducing the amount of functional data [36,37]. FPCA adapts traditional
Principal Component Analysis (PCA) concepts to functions, allowing it to identify the
main modes of variation among observations (functions) within a univariate functional
space. It is evident that multivariate functional spaces are more natural and effective than
univariate ones when describing spectral variations in vegetation (Figure 1). This is because
seasonal patterns manifest differently across various spectral bands and vegetation indices,
depending on the phenological stages of vegetation [26]. Multivariate Functional Principal
Component Analysis (MFPCA) is well-suited for analysing multivariate functional spaces.
MFPCA decomposes the multivariate functional space into a set of orthogonal multivariate
functional principal components or modes of variation of functions (multivariate eigenfunc-
tions), together with corresponding functional principal component scores (FPC scores).
These FPC scores summarize the similarities between observations (functions), providing a
compact representation of the data (one score value per multivariate principal component
and per observation). In addition, these scores are uncorrelated by construction [38]. They
can then serve as a building block for further statistical analyses such as unsupervised
clustering, supervised classification methods or functional principal component regression
with multiple covariates [39].
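To make the decomposition described above more concrete, the following is a minimal Python sketch; it is not the R implementation used in this work [40]. It assumes that every index curve has already been sampled on a common time grid, concatenates the centred curves of all indices for each pixel, and extracts the leading components by power iteration with deflation. This discretised concatenation is a deliberate simplification of MFPCA (no weighting between indices and no basis expansion), and all function names and toy values are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the column means so that each time point has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpairs(cov, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy; deflation modifies it
    pairs = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in cov]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
        pairs.append((lam, v))
        for i in range(d):  # remove the component that was just found
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return pairs


def mfpca_scores(curves_per_index, k=2):
    """Return per-pixel scores and eigenvalues for the stacked index curves.

    curves_per_index[m][i] is the sampled curve of index m for pixel i.
    """
    n = len(curves_per_index[0])
    centred = [center(block) for block in curves_per_index]
    stacked = [sum((block[i] for block in centred), []) for i in range(n)]
    pairs = leading_eigenpairs(covariance(stacked), k)
    scores = [[sum(x * e for x, e in zip(row, vec)) for _, vec in pairs]
              for row in stacked]
    return scores, [lam for lam, _ in pairs]


# Toy data: six pixels, two indices, eight time points (hypothetical values).
T = 8
amp_a = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]  # two groups differ on the first index
amp_b = [0.5, 0.3, 0.7, 0.4, 0.6, 0.5]
index_1 = [[a * math.sin(2 * math.pi * t / T) for t in range(T)] for a in amp_a]
index_2 = [[b * math.cos(2 * math.pi * t / T) for t in range(T)] for b in amp_b]

scores, eigenvalues = mfpca_scores([index_1, index_2], k=2)
```

Each pixel ends up with one score per component, and the score columns are uncorrelated, which is the property that makes them convenient inputs for a classifier.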
Figure 1. Spectral variations in remotely sensed images over time. (a) Finite discrete time series:
this panel shows a typical representation of remotely sensed data captured at discrete points in time
(raw data). Each point on the graph represents data from a specific moment. (b,c) Spectral variations
in pixels as functions of time (smoothed representation of variations). These two panels show how
individual pixel spectral characteristics evolve over time, simplifying trend observation. In detail,
(b) defines a univariate functional space that describes the spectral variations in pixels characterized
by a single band or index, such as NDVI. This helps us to understand how one specific aspect of
vegetation changes over time, while (c) shows spectral variations in pixels characterized by multiple
bands or indices, such as NDVI, GNDVI and NDWI, defining a multivariate functional space (this
allows us to study how different aspects of vegetation change together over time).
In this study, we develop new hybrid models that combine machine learning with
MFPCA. MFPCA, to the best of our knowledge, has not been previously used for supervised
classification of habitats and vegetation. We believe that these models are valuable for
analysing multivariate satellite dense time series, simultaneously considering seasonal
spectral variations from different bands or vegetation indices, and for evaluating new
vegetation indices through combinatorial calculations using different formulas to identify
distinctive features for classification. To further improve classification performance and create
interpretable models, we include a selection strategy to retain only relevant index time
series and exclude unnecessary ones. Our study was conducted in two Natura 2000 sites in
central Italy, characterized by different environmental conditions and vegetation types. We
configured three distinct hybrid models by varying input data types and feature selection
strategies and compared the results.
This study aims to address the following questions:
1. Do supervised hybrid classification approaches based on FDA produce a higher accuracy
compared to machine learning methods directly applied to raw multi-temporal
data in both test sites?
2. Among the examined hybrid approaches, is there one that consistently achieves the
highest accuracy in both test sites?
3. Among the explored formulas, is there one that consistently produces the highest
accuracy in both test sites?
4. Can an appropriate set of indices be identified for each study site?
This work is structured as follows: in Section 2 we introduce the materials and methods,
focusing on the study area and the ‘hybrid statistical–functional–machine learning’ models
to analyse and classify dense remotely sensed time series. In Section 3 we present the results
of our methodology applied to two different case studies. In Section 4 we discuss the results
and the impact of the developed approach, and in Section 5 we provide conclusions and
outline future work.
2. Materials and Methods
In this section we present two distinct approaches for classifying remotely sensed
data (see Figure 2). We begin by collecting Sentinel-2 satellite time series data, which
can be directly classified using Random Forest (first approach: ‘Pure Machine Learning’).
Alternatively, spectral bands and indices created through combinatorial methods were
transformed into continuous functions using Generalized Additive Models (GAM) and
analysed with FDA (including FPCA and MFPCA). Random Forest can then be used to
classify the FPCA-MFPCA scores (second approach: ‘Hybrid statistical-functional-Machine
Learning’). Further details are provided in the following sub-section. The developed R code
is available in [40].
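The step that turns a discrete series into a continuous seasonal function can be illustrated with a short Python sketch. It does not reproduce the GAM with cyclic cubic splines used in this work: as a simpler periodic stand-in it averages observations by week of the year and then fits a low-order harmonic (Fourier) regression by least squares. The 52-week period, the number of harmonics, the function names and the toy values are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def weekly_means(observations, n_weeks=52):
    """Average values that fall in the same week of the year, across years."""
    buckets = defaultdict(list)
    for day, value in observations:
        week = min((day.timetuple().tm_yday - 1) // 7, n_weeks - 1)
        buckets[week].append(value)
    weeks = sorted(buckets)
    return weeks, [sum(buckets[w]) / len(buckets[w]) for w in weeks]


def fit_harmonics(weeks, values, n_harmonics=2, period=52.0):
    """Least-squares fit of a periodic curve; returns a callable f(week)."""
    def basis(t):
        row = [1.0]
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * math.pi * k * t / period
            row += [math.cos(angle), math.sin(angle)]
        return row

    design = [basis(t) for t in weeks]
    m = len(design[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in design) for j in range(m)]
           + [sum(r[i] * y for r, y in zip(design, values))]
           for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    beta = [aug[i][m] / aug[i][i] for i in range(m)]
    return lambda t: sum(b * f for b, f in zip(beta, basis(t)))


# Toy index observations (hypothetical values) every 10 days, 2017-2020.
observations = []
for year in range(2017, 2021):
    for offset in range(0, 365, 10):
        day = date(year, 1, 1) + timedelta(days=offset)
        week = min(offset // 7, 51)
        value = 0.5 + 0.3 * math.sin(2.0 * math.pi * week / 52.0)
        observations.append((day, value))

weeks, means = weekly_means(observations)
curve = fit_harmonics(weeks, means)  # continuous, periodic function of the week
```

The fitted `curve` can be evaluated at any point of the year, including weeks with no observation, and it takes the same value at the start and end of the period, which is the behaviour a cyclic smoother is meant to guarantee.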
Figure 2. Starting from a set of Sentinel-2 images, we trigger a processing pipeline that extracts the
most relevant vegetation indices that could be used to characterize the study area.
2.1. Study Area
This study focuses on two distinct areas of central Italy, specifically in the Marche
region, which are part of the Natura 2000 network (Figure 3). The first area of interest is
Mount Conero, situated in the coastal area of central Marche (43°33′00″N, 13°36′00″E). It
is a Special Area of Conservation (SAC) known as ‘Monte Conero’ (code IT5320007) and
covers an area of 650 hectares. Mount Conero has an elevation of 572 m above sea level,
with an average annual precipitation of 710 mm and a mean annual temperature of 14.9 °C.
The second study area is the ‘Gola di Frasassi’ (code IT5320003), also referred to as the
Frasassi Gorge, located in the mountainous region of central Marche’s Apennines
(43°23′23″N, 12°57′36″E). This SAC spans an area of 728 hectares and reaches an altitude
of 935 m above sea level. The average annual precipitation in this area is 1115 mm, while
the mean annual temperature is 12.7 °C. According to the bioclimatic classification of
Rivas-Martinez [41], both study areas belong to the temperate sub-Mediterranean
macrobioclimate. The first area is characterised by a strong sub-Mediterranean level with
pronounced summer aridity, while the second area is characterised by a weak
sub-Mediterranean level indicating lower summer aridity [42].
Figure 3. The two study areas: (a) national and (b) regional overview of the two study areas; S1 is
the Frasassi Gorge, and S2 is Mount Conero. (c) Panoramic image of the Frasassi Gorge area.
(d) Panoramic image of the Mount Conero area. (e) Reference data on the Digital Elevation Model
with the boundary of the Frasassi Gorge Special Area of Conservation (SAC IT5320003). (f) Reference
data on the Digital Elevation Model with the boundary of the Mount Conero area of interest.
Table 1. Reference data for the study areas. Target classes for the supervised classification are listed.
For plant associations, we report the syntaxa name and the corresponding habitat code (Annex 1 of
the European Union Habitats Directive). The * denotes a priority habitat.
The collected reference data, distributed over the two study areas, are presented in Figure 3.
Table 2. List of formulas for different types of indices. We analyse formulas with 2–4 operands and
constraints on band order. We considered the following Sentinel-2 bands: B2, B3, B4, B5, B6, B7, B8*,
B11, B12; * corresponds to B8–NIR (832.8 nm). More information on Sentinel-2 bands can be found in [56].
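As an illustration of how indices can be generated combinatorially from the bands listed above, the following Python sketch enumerates every two-operand normalized difference. It covers only the classic form (a − b)/(a + b); the three- and four-operand formulas of Table 2 are not reproduced here, and the specific ordering rule, the function names and the reflectance values are assumptions made for the example.

```python
from itertools import combinations

# Sentinel-2 bands listed in the Table 2 caption (B8 is the NIR band).
BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B11", "B12"]


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return None if total == 0 else (a - b) / total


def band_pairs(bands=BANDS):
    """Enumerate band pairs once, with the later-listed band first.

    Fixing the order avoids generating both ND(a, b) and ND(b, a), which
    differ only in sign and carry the same information.
    """
    return [(hi, lo) for lo, hi in combinations(bands, 2)]


def all_nd_indices(reflectance):
    """Compute every pairwise normalized difference for one observation."""
    return {
        f"ND_{hi}_{lo}": normalized_difference(reflectance[hi], reflectance[lo])
        for hi, lo in band_pairs()
    }


# Hypothetical surface reflectances for a single vegetated pixel.
pixel = {"B2": 0.03, "B3": 0.06, "B4": 0.04, "B5": 0.10, "B6": 0.25,
         "B7": 0.30, "B8": 0.36, "B11": 0.18, "B12": 0.09}

indices = all_nd_indices(pixel)
ndvi = indices["ND_B8_B4"]   # NDVI  = (B8 - B4) / (B8 + B4)
gndvi = indices["ND_B8_B3"]  # GNDVI = (B8 - B3) / (B8 + B3)
```

With nine bands this yields 36 candidate indices, of which NDVI and GNDVI are just two; applying the same function to each date of a Sentinel-2 series gives one time series per candidate index.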
Figure 4. Example of derived time series considering mean weekly annual Sentinel-2 GNDVI
variations (2017–2020) of the 172 plots of the Mount Conero study area. On the left (a) the discrete
mean weekly time series, while on the right (b) the weekly functional cyclic cubic spline representation
of the spectral plot variations. The letters at the top correspond to the initials of the months of the
year.
3. Results
3.1. Models Performance and Comparison
The OA of the models is presented for both study areas, categorized into Pure Machine
Learning and Hybrid Machine Learning approaches. Within the Hybrid Machine Learning
category, the results are further detailed based on the different modelling strategies and
indices formula ids. See Table 3 and Figure 5 for a summary of the results.
In the Mount Conero area, the baseline B model achieved an OA of 81.8%. Among
the hybrid models, mF models exhibited an average OA of 84.3%, with the highest OA of
86% achieved using formula id #11 and the lowest at 81.6% with formula id #1. The M
models had an average OA of 78.6%, with the highest OA of 85.6% obtained with formula
id #18 and the lowest at 66.3% with formula id #8. The Ms models achieved an average OA
of 84.4%, with the highest OA of 87.2% linked to formula id #15 and the lowest at 77.9%
with formula id #4.
For the Frasassi Gorge area, the B model achieved an OA of 76.9%. Among the hybrid
models, the mF models showed an average OA of 80.9%, with the highest OA of 82.9%
achieved using formula id #3 and the lowest at 77.3% with formula ids #0 and #1. The
M models had an average OA of 74.2%, with the highest OA of 82.3% using formula id
#7 and the lowest at 63.4% with formula id #17. Additionally, the Ms models obtained an
average OA of 83.1%, with the highest OA of 86.5% linked to formula id #15 and the lowest
at 81.1% with formula id #19.
Table 3. Comparison of model and formula performances in the two study areas based on Overall
Accuracy. B—baseline model (Pure Machine Learning approach). mF, M, Ms—RF models based on
Functional Data Analysis (Hybrid statistical—functional–Machine Learning approach). Formula id
represents the different formulas used to generate indices detailed in Table 2. CO—Mount Conero
area. VM—Frasassi Gorge area. In grey if the accuracy exceeds that of B. In bold, the best performance
for each distinct hybrid approach.
In both study areas, the Ms and mF models consistently outperformed the M and
B models, with Overall Accuracy gains over the baseline of up to 9.6 percentage points in the
Frasassi Gorge area and up to 5.4 percentage points in the Mount Conero area (see Figure 5 and Table 3).
Furthermore, using indices (formula ids #1–#20 in Table 3) in the Ms and mF models
demonstrated superior performance compared to using individual bands (formula id #0 in
Table 3). In both study areas, the highest OA was achieved by the Ms models applied to
vegetation indices with formula id #15 (see Tables 2 and 3 for its definition).
Figure 5. Comparison of Overall Accuracy (OA) among different model strategies for the two study
areas. The dashed line represents the OA achieved by the baseline B model using a Pure Machine
Learning approach. M, mF and Ms are three hybrid model strategies combining Random Forest with
Functional Data Analysis (Hybrid statistical–functional–Machine Learning approach). (a) Mount
Conero area. (b) Frasassi Gorge area.
Tables A2 and A3 offer a comprehensive overview of all models for both the Mount
Conero and Frasassi Gorge areas providing accuracy (OA and PA), and complexity metrics
(number of selected predictors, pr, and the final mtry of the RF). PCA of these tables
(Figure 6) allows for a visual representation that facilitates model comparison based on
their multivariate (inter- and intra-group) variability. Similar models are close together, and
dissimilar models are further apart. The properties of the models are indicated by black
arrows. The B model is represented by a red triangle, while the mF, M and Ms models
applied to different formulas are represented in spider plots with distinct colours. The first
principal component (PC1) axis, accounting for 49.5% and 43.8% of the total variation in the
Mount Conero and Frasassi Gorge areas, respectively, indicates an increasing gradient of
accuracy among the models. It clearly shows that the Ms and mF models outperform the B
and M models in both OA (as shown in Table 3 and Figure 5) and PA. The second principal
component (PC2) axis, which accounts for 22.5% and 17.0% of the total variation in the
Mount Conero and Frasassi Gorge areas, respectively, is directly related to the increasing
number of predictors used as input data (pr) and the mtry value.
PCA analysis reveals that the Ms models are the most parsimonious, achieving the
highest OA and PA accuracy while using the fewest predictors and mtry (Figure 6).
Tables S2 and S3 provide details from the forward selection procedure used by Ms
models. These tables outline the selected bands and indices that constitute the minimal
set needed to optimize model performance in each formula and study area. The number
of time series (bands or indices) selected ranged from 1 to 9 (1 to 7 for the Mount Conero
area and 2 to 9 for the Frasassi Gorge area). The most frequently involved bands in the
selected indices (in descending order) for the Frasassi Gorge area were B7, B5, B11, B4, B3,
B12, while band B8 was the least utilized. For the Mount Conero area, the most utilized
bands were B7, B6, B11, while bands B8 and B5 were less utilized.
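The logic of a forward selection over candidate time series can be sketched in a few lines of Python. This is a generic greedy procedure written for illustration, not the authors' R code: the `score` callback stands in for the cross-validated Overall Accuracy of a Random Forest trained on the MFPCA scores of the chosen series, and the candidate names, the toy scoring function and the stopping rule are hypothetical.

```python
def forward_selection(candidates, score, min_gain=0.0):
    """Greedily add the series that most improves the score.

    Stops when no remaining candidate improves the current best score by
    more than min_gain. Returns the selected subset and its score.
    """
    selected = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Evaluate every one-series extension of the current subset.
        trial_scores = {c: score(selected + [c]) for c in remaining}
        winner = max(trial_scores, key=trial_scores.get)
        if trial_scores[winner] - best <= min_gain:
            break
        selected.append(winner)
        best = trial_scores[winner]
        remaining.remove(winner)
    return selected, best


def toy_score(subset):
    """Hypothetical accuracy: informative series help, every series costs."""
    gains = {"idx_a": 0.60, "idx_b": 0.15, "idx_c": 0.05, "idx_d": 0.00}
    return sum(gains[s] for s in subset) - 0.02 * len(subset)


chosen, chosen_score = forward_selection(
    ["idx_a", "idx_b", "idx_c", "idx_d"], toy_score)
```

In this toy run the uninformative series `idx_d` is never added, which mirrors the aim of keeping only the minimal set of time series needed by each model.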
Figure 6. Principal Component biplot relating properties of accuracy and model complexity (black
arrows) to the different supervised classification models (B, mF, M, Ms) applied to all distinct formulas.
(a) Mount Conero Area. PCA axis 1 accounts for 49.5% of the multivariate variation and axis 2 for
22.5%. (b) Frasassi Gorge Area. PCA axis 1 accounts for 43.8% of the multivariate variation and axis
2 for 17.0%. Labels: OA–Overall Accuracy; sd–standard deviation; pr–number of input variables
selected; mtry–final Random Forest mtry parameter; v1–v8 and c1–c4 are Producer Accuracy of
vegetation types (listed in Table 1) for Frasassi Gorge and Mount Conero areas, respectively.
3.2. Best Models
The Ms models applied to formula id #15 (see Tables 2 and 3) achieved the highest OA in
both study areas. Below, we summarise the accuracy results of these models and compare them
to the B models by showing the error matrices (Tables 4 and 5). In the Supplementary Materials,
detailed graphical representations of the two Ms models are provided (Figures S1 and S2),
illustrating the selected time series and functional decomposition via MFPCA with the
most discriminating components (seasonal variation) for the different vegetation types.
Table 4. Cross-validated confusion matrix (10-fold, repeated five times) for predicted target classes
in the Mount Conero area. The table includes Overall Accuracy, Producer Accuracy, User Accuracy
(expressed in percentage) and the κ statistic. The rows and columns (c1–c4) represent the plant
associations and habitats listed in Table 1. B—baseline model (Pure Machine Learning approach).
Ms-F15 (Ms model with the Formula id #15) is the top-performing model in terms of Overall Accuracy
among the RF models based on Functional Data Analysis (Hybrid statistical–functional–Machine
Learning approach). Pred stands for prediction.
B
Pred \ Reference   c1     c2     c3     c4     UA
c1                 16.2   3.2    0.0    2.1    75.5
c2                 4.0    36.2   3.9    3.0    76.9
c3                 0.0    0.3    3.5    0.0    91.2
c4                 0.9    0.8    0.0    25.8   93.8
PA                 76.8   89.3   47.7   83.7
OA                 81.79 (±9.50)
κ                  0.72 (±0.14)
Ms-Formula id #15
Pred \ Reference   c1     c2     c3     c4     UA
c1                 39.2   3.7    3.1    3.4    79.4
c2                 1.3    16.9   0.0    0.7    89.7
c3                 0.0    0.0    4.3    0.0    100.0
c4                 0.1    0.6    0.0    26.7   97.5
PA                 96.6   80.0   58.5   86.7
OA                 87.18 (±7.82)
κ                  0.80 (±0.11)
Remote Sens. 2024, 16, 1224 13 of 26
Table 5. Cross-validated confusion matrix (10-fold, repeated five times) for predicted target classes
in the Frasassi Gorge area. The table includes Overall Accuracy, Producer Accuracy, User Accuracy
(expressed in percentage) and the κ statistic. The rows and columns (v1–v8) represent the plant
associations and habitats listed in Table 1. B—baseline model (Pure Machine Learning approach).
Ms-F15 (Ms model with the Formula id #15) is the top-performing model in terms of Overall Accuracy
among the RF models based on Functional Data Analysis (Hybrid statistical–functional–Machine
Learning approach). Pred stands for prediction.
B
Pred/Reference  v1    v2    v3    v4    v5    v6    v7    v8    UA
v1              11.7  0     1.32  0.74  0     0     0     0     84.9
v2              0     5.87  1.49  0     1.07  0     0.17  0     68.3
v3              0.58  4.96  18.6  0.41  1.16  0     0     0     72.3
v4              1.4   0     0.33  11.7  0     0.83  0     0     82.0
v5              0.17  0.74  0.17  0     2.07  0     0.25  0.41  54.3
v6              0     0     0     0     0.66  4.38  0     0.83  74.6
v7              0     0     0.41  0     0.33  0     5.37  0.25  84.4
v8              0.25  0     0.83  0     1.32  0.99  0.83  17.5  80.6
PA              82.9  50.7  80.4  91.0  31.3  70.7  81.3  92.2
OA 76.99 (±7.07)
κ 0.72 (±0.08)
Ms-Formula id #15
Pred/Reference  v1    v2    v3    v4    v5    v6    v7    v8    UA
v1              13.4  0.0   0.6   0.1   0.0   0.0   0.0   0.0   95.3
v2              0.0   6.8   0.9   0.3   0.6   0.0   0.0   0.0   78.8
v3              0.4   4.3   21.3  0.4   0.2   0.4   0.0   0.0   78.9
v4              0.2   0.0   0.3   12.0  0.0   0.0   0.0   0.0   95.4
v5              0.0   0.2   0.0   0.0   3.8   0.0   0.4   0.0   85.2
v6              0.0   0.0   0.0   0.0   0.2   4.8   0.0   0.0   95.1
v7              0.0   0.0   0.0   0.0   0.1   0.4   5.9   0.4   86.6
v8              0.0   0.2   0.0   0.0   1.7   0.6   0.3   18.6  86.5
PA              95.3  58.6  92.1  93.5  57.5  77.3  88.8  97.8
OA 86.51 (±6.99)
κ 0.83 (±0.08)
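The accuracy measures reported in Tables 4 and 5 follow directly from a confusion matrix. As a minimal sketch (not the authors' R code), the Python snippet below recomputes OA, PA, UA and Cohen's κ from the Ms-F15 matrix of Table 4, with rows as predictions and columns as reference classes. Because the published figures are summaries over the cross-validation resamples, values recomputed from the pooled, rounded matrix can differ slightly from those in the table. The function name `accuracy_metrics` is hypothetical.

```python
def accuracy_metrics(matrix):
    """Return OA (%), PA (%), UA (%) and Cohen's kappa.

    matrix[i][j] = share of samples predicted as class i whose reference class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]                              # predicted totals
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]   # reference totals
    diagonal = [matrix[i][i] for i in range(n)]

    oa = 100.0 * sum(diagonal) / total
    # Producer's Accuracy: correct / reference total (column-wise).
    pa = [100.0 * diagonal[j] / col_sums[j] for j in range(n)]
    # User's Accuracy: correct / predicted total (row-wise).
    ua = [100.0 * diagonal[i] / row_sums[i] for i in range(n)]
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = sum(diagonal) / total
    p_e = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (p_o - p_e) / (1.0 - p_e)
    return oa, pa, ua, kappa


# Ms-F15 confusion matrix for Mount Conero (Table 4), in percent.
ms_f15 = [
    [39.2, 3.7, 3.1, 3.4],
    [1.3, 16.9, 0.0, 0.7],
    [0.0, 0.0, 4.3, 0.0],
    [0.1, 0.6, 0.0, 26.7],
]

oa, pa, ua, kappa = accuracy_metrics(ms_f15)
print(f"OA = {oa:.1f}%, kappa = {kappa:.2f}")
print("PA:", [round(v, 1) for v in pa])
print("UA:", [round(v, 1) for v in ua])
```

PA is computed column-wise (share of each reference class correctly predicted) and UA row-wise (share of each predicted class that is correct), matching the layout of the tables.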
4. Discussion
4.1. Main Results
This study highlights the effectiveness of the ‘Hybrid statistical–functional–Machine
Learning’ approach, which combines RF with an FDA of dense multispectral time series.
The approach outperforms conventional methods that directly use RF on raw satellite
multi-temporal images. Dense time series, when properly analysed and compressed, offer
crucial information for characterizing seasonal spectral changes in vegetation, improving
classification accuracy [26,27]. Ms models, which were the most accurate in both study
areas, could be suitable tools for the accurate classification, mapping and monitoring of
vegetation and habitats included in Annex I of the 92/43/EEC Directive. Indeed, these
models not only process dense time series (increasingly accessible through web platforms
like Google Earth Engine [76,77]) effectively with FDA, but also independently identify
sets of indices specific to the study area (through the forward selection strategy). The
selection of location-specific indices plays a key role in optimizing land management [24,25].
These models therefore capture vegetation and habitats during their optimal phenological
stages without requiring prior knowledge of the best times for data acquisition or the most
appropriate index sets, which makes them more transferable than conventional models [21].
In addition, the results
of these models are graphically interpretable, contributing to a better understanding of
critical seasonal multispectral variations among different plant communities and habitats
(Figures S1 and S2).
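The forward selection strategy can be illustrated with a short, generic sketch. It is a plain greedy loop, not the authors' R implementation: `score_fn` is a hypothetical stand-in for the cross-validated Overall Accuracy of an RF trained on the MFPCA scores of a candidate set of index time series, and the toy scoring function and index names are invented purely so that the example runs.

```python
def forward_selection(candidates, score_fn, min_gain=1e-9):
    """Greedy forward selection: repeatedly add the candidate that most improves the score."""
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-step extension of the current set.
        trials = [(score_fn(selected + [c]), c) for c in remaining]
        trial_score, trial_best = max(trials, key=lambda t: t[0])
        if trial_score - best_score <= min_gain:
            break  # no candidate improves the score, so stop
        selected.append(trial_best)
        remaining.remove(trial_best)
        best_score = trial_score
    return selected, best_score


# Toy stand-in for cross-validated accuracy (all numbers are made up):
# each index has an individual usefulness, one pair is largely redundant,
# and every added index carries a small complexity cost.
USEFULNESS = {"idx_a": 0.60, "idx_b": 0.55, "idx_c": 0.20, "idx_d": 0.02}
REDUNDANT = {frozenset(("idx_a", "idx_b")): 0.52}


def toy_score(subset):
    score = sum(USEFULNESS[name] for name in subset)
    for pair, overlap in REDUNDANT.items():
        if pair <= set(subset):
            score -= overlap
    return score - 0.05 * len(subset)


chosen, final_score = forward_selection(list(USEFULNESS), toy_score)
print(chosen, round(final_score, 2))
```

In this toy run the second-best single index (`idx_b`) is never selected, because it is redundant with the first pick; the loop prefers a weaker but complementary index instead.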
Furthermore, the Ms models allowed us to employ new vegetation indices derived
from a combinatorial approach and evaluate their effect on classification accuracy. The
results revealed two aspects of particular interest. In both study areas, the most accurate
models were the Ms models based on the formula id #15, an original index. In addition,
rarely used indices based only on visible spectral bands played a significant role, confirming
that relying only on well-known indices such as NDVI may not always be the most effective
choice for classification [20,78] or for characterizing plant communities. These results are
consistent with the view that specific plant communities and vegetation types have their
own multispectral profiles [24,26,79].
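As an illustration of what a combinatorial approach to index construction involves, the sketch below enumerates every unordered pair of bands and computes the classic normalized difference, (a − b)/(a + b), for one pixel. This is only the simplest case: the formulas actually used in the study, including formula id #15, are defined earlier in the paper and are not reproduced here, and the band subset and reflectance values are hypothetical.

```python
from itertools import combinations

# Hypothetical surface reflectances for one pixel on one date.
reflectance = {"B3": 0.06, "B4": 0.04, "B5": 0.10, "B8": 0.40, "B11": 0.20}


def normalized_difference(a, b):
    """Classic normalized difference, bounded in [-1, 1] for positive inputs."""
    return (a - b) / (a + b)


# One candidate index per unordered band pair: C(5, 2) = 10 indices.
indices = {
    f"ND({x},{y})": normalized_difference(reflectance[x], reflectance[y])
    for x, y in combinations(reflectance, 2)
}

for name, value in indices.items():
    print(f"{name}: {value:+.3f}")
```

Swapping the two bands of a pair only flips the sign of the index, so `ND(B4,B8)` here is simply the negative of the familiar NDVI, (B8 − B4)/(B8 + B4).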
analysed and offering richer information within a specific time window [32] than the
B models for the classification stage. Unlike B models, hybrid models can be called
‘image-independent’ [81]. In these models, it is the quality of the functional data, which
must adequately represent seasonal spectral variations in vegetation (e.g., Figure 4), that
significantly influences the accuracy of the classification, rather than the timing and quality
of the individual images used to create it. When transforming the raw data into functional
data using the GAM approach, it is essential to perform pre-processing steps to identify and
remove outliers and reduce noise [83]. Another advantage over B models is that pixel-based
functions can exploit all the information available for each pixel. Thus, even images with
only small cloud-free areas, or even a single cloud-free pixel, can be used. In other words,
cloud cover over part of an image does not prevent the use of the cloud-free part, which is
usually not the case for B models. We can assert that, if using dense time series data is an
ideal choice for analysing
seasonal variations in vegetation and achieving more accurate classifications [26,27,58],
then FDA serves as an ideal tool for compressing and analysing dense time series data.
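The point about partially cloudy images can be made concrete with a small sketch. For a single pixel, every cloud-free observation is kept, whatever the state of the rest of the image, and a smooth seasonal curve is then estimated on a regular day-of-year grid. The study fits a GAM for this step; the snippet deliberately swaps in a simple Gaussian kernel smoother so that it runs with the Python standard library only. The observations, the bandwidth and the function name `kernel_smooth` are all hypothetical.

```python
import math

# (day of year, index value, cloud-free flag for THIS pixel); made-up observations.
observations = [
    (15, 0.31, True), (46, 0.33, True), (81, 0.45, False), (106, 0.58, True),
    (151, 0.80, True), (181, 0.84, True), (216, 0.79, False), (241, 0.74, True),
    (281, 0.55, True), (311, 0.40, True), (346, 0.32, True),
]

# Keep every cloud-free observation of the pixel, regardless of the rest of the scene.
clear = [(doy, value) for doy, value, is_clear in observations if is_clear]


def kernel_smooth(points, grid, bandwidth=25.0):
    """Nadaraya-Watson estimate with a Gaussian kernel (a stand-in for the GAM)."""
    curve = []
    for t in grid:
        weights = [math.exp(-0.5 * ((t - doy) / bandwidth) ** 2) for doy, _ in points]
        weighted_sum = sum(w * value for w, (_, value) in zip(weights, points))
        curve.append(weighted_sum / sum(weights))
    return curve


grid = list(range(1, 366, 7))  # weekly grid over the year
curve = kernel_smooth(clear, grid)
peak_doy = grid[curve.index(max(curve))]
print(f"{len(clear)} clear observations, seasonal peak near day {peak_doy}")
```

The resulting curve depends on the density and quality of the per-pixel observations rather than on any single image, which is the sense in which such models can be called image-independent.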
Ms, mF and M models have different characteristics and levels of accuracy. The Ms
models are consistently better than the others in terms of Overall Accuracy for both study
areas (Figure 5, Table 3). Their superior performance, especially when applied to indices
generated with formula id #15, is particularly evident in the more complex Frasassi Gorge
study area, which has a higher number of target classes (Table 3). These models also
outperformed those of previous studies. In the Mount Conero area,
they achieved an 87.2% accuracy, exceeding the 83.2% accuracy in [30], which used only
NDVI seasonal variation data. In the Frasassi Gorge area, these models achieved an 86.5%
accuracy, exceeding the 82.1% accuracy in [29], obtained with mF models based on six time
series of preselected indices (see Table 3). It is important to note that the Ms models are
parsimonious. They achieved such a high accuracy with the smallest number of predictors
and mtry (Figure 6, Tables A2 and A3), and this means that they can select a tailored and
mutually complementary set of indices that best align with area-specific characteristics
by capturing crucial seasonal multispectral variations. The key to this capability lies
in the incorporation of two wrapper methods within Ms models, operating at distinct
levels. Forward selection works on the entire index time series, while Recursive Feature
Elimination focuses on individual MFPCA components extracted from the progressively
selected time series. In summary, Ms models improve the characterization and distinction of
various plant communities and habitats, enabling more accurate and detailed classifications.
Their parsimonious nature makes them interpretable, contributing to a better understanding
of critical seasonal multispectral variation among different plant communities and habitats
(Figures S1 and S2). These hybrid models can complement species-based approaches
in plant community ecology [30,32,33,38,84]. Despite their strengths, Ms models have
some limitations. Forward selection does not guarantee the identification of the best model,
since the final set of selected indices is highly dependent on the first index chosen [85].
Moreover, Ms models may require long computation times, especially when dealing with
many time series, such as those generated by formula id #15 (126 time series of indices).
To improve efficiency and reduce the number of models to be evaluated, a preliminary
filtering method could be implemented in future analyses. Such a filter would identify and
remove strongly correlated time series, allowing Ms models to process a smaller and more
focused set of candidate time series.
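A preliminary filter of this kind could take many forms; one simple possibility is a greedy correlation screen, sketched here. The threshold of 0.95, the use of Pearson correlation on the raw series and the toy series themselves are assumptions for illustration only; the paper does not specify how such a filter would be implemented.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(series, threshold=0.95):
    """Keep a series only if |r| with every already kept series is below the threshold."""
    kept = []
    for name, values in series.items():
        if all(abs(pearson(values, series[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy index time series on a common time grid (made-up values).
series = {
    "idx_a": [0.30, 0.45, 0.70, 0.82, 0.60, 0.35],
    "idx_b": [0.31, 0.44, 0.72, 0.80, 0.61, 0.36],  # near-duplicate of idx_a
    "idx_c": [0.50, 0.20, 0.40, 0.10, 0.45, 0.15],  # different seasonal pattern
}

print(drop_correlated(series))
```

Because the screen is greedy, the outcome depends on the order in which the series are visited; ranking candidates beforehand (for example by individual accuracy) would be one way to make that order meaningful.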
The mF models, in line with prior research [29], demonstrated their effectiveness by
achieving high accuracies. However, they were also complex and lacked parsimony because
they use many predictors (see Figure 6, Tables A2 and A3). This complexity arises because
multiple separate FPCAs cannot adequately address joint variations among different time
series, which results in the extraction of numerous correlated and redundant components.
This redundancy complicates the interpretation of results [38]. Each vegetation plot has
multiple scores associated with different univariate FPCA analyses, which cannot be
synthesized into a single functional reduced-ordination
space [29]. Consequently, while effective, these models are not very efficient and do not
facilitate the understanding of crucial seasonal multispectral variation among different
plant communities and habitats.
Finally, among the hybrid models, the M models proved to be the least accurate. Their
accuracies were modest and highly variable, consistently lower than those of the mF and Ms
models, and often lower than those of the B models as well (Table 3, Figures 5 and 6). The
M models compress all the time series of vegetation indices associated with a specific formula
using a single MFPCA, and the corresponding scores serve as input data for RF. It is likely
that the fixed number of extracted components (k = 36) was too low, probably discarding
seasonal variations useful for RF. One way to increase the accuracy of the model would be to
increase the number of MFPCA components. However, this approach, as in mF models, hinders
the identification of the minimum set of time series and indices specific to the vegetation of
the study area. This limitation prevents us from fully capturing the crucial seasonal
multispectral variations among different plant communities and habitats. By contrast, this
method is suitable when the time series and indices specific to the study area are few and
known.
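One common alternative to fixing the number of components in advance is to choose k from the cumulative share of variance explained. The sketch below illustrates that rule only; it is not the procedure used in the study, which fixed k = 36 for the M models, and the eigenvalues and the thresholds are hypothetical.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest k whose leading eigenvalues explain at least `target` of the variance."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(eigenvalues)


# Made-up, decreasing eigenvalues standing in for an MFPCA spectrum.
eigenvalues = [40.0, 25.0, 15.0, 8.0, 5.0, 3.0, 2.0, 1.0, 0.5, 0.5]

for target in (0.80, 0.90, 0.95):
    print(target, components_for_variance(eigenvalues, target))
```

A higher threshold retains more components and therefore more predictors for RF, which is the trade-off between accuracy and parsimony discussed above.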
5. Conclusions
In this paper, we studied different hybrid models to support the classification of
vegetation. These models combine machine learning, using RF, with the application of FDA
to dense satellite time series. Our main goal was to improve the accuracy of vegetation
and habitat classification in two different study areas. We achieved this by comparing
the performance of these models to that of the most common classification methods,
which apply machine learning directly to raw multi-temporal satellite data. Furthermore,
we analysed the effect of different formulas for calculating vegetation indices, using a
combinatorial approach. The goal was to identify the approach and formula that
consistently generated the best classification accuracies in both study areas. Returning to
the research questions formulated at the beginning of this work, we draw the following
conclusions:
1. Hybrid supervised classification approaches based on FDA produce higher accuracy
than common machine learning methods applied directly to raw multi-temporal
data in both test areas.
2. Among the hybrid approaches examined, the Ms models achieve the highest accuracy
in both test sites. These models effectively combine FDA, by exploiting MFPCA that
compresses multiple time series based on different vegetation indices, with the use
of RF. Using a forward selection strategy, we identified a limited set of indices that
Supplementary Materials: The following supporting information can be downloaded at:
https://www.mdpi.com/article/10.3390/rs16071224/s1.
Author Contributions: Conceptualization, S.P., A.M., G.Q. and S.C.; Data curation, S.P. and A.M.;
Formal analysis, S.P. and A.M.; Investigation, S.P., G.Q. and S.C.; Methodology, S.P., A.M. and G.Q.;
Software, S.P. and A.M.; Supervision, S.P. and S.C.; Writing—original draft, S.P., A.M., G.Q. and
S.C.; Writing—review and editing, S.P., A.M., G.Q. and S.C. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The R code for these models is available at
https://github.com/geobotany/habitatmapmfpca (accessed on 27 March 2024).
Acknowledgments: The authors thank Lorenzo Deplano, Riccardo Forconi and Cristian
Colavito at the Department of Information Engineering (DII) of Università Politecnica delle Marche
for their support in optimizing the R code.
Conflicts of Interest: The authors declare no conflicts of interest.
Appendix A
Table A1. Selection of Sentinel-2 images. All images were used to represent spectral seasonal
variations as pixel-based functions, which were then used in the Hybrid statistical–functional–Machine
Learning models (RF models based on Functional Data Analysis). The 2019 scenes marked with
* and ** were used for the baseline model (Pure Machine Learning approach, with Random Forest
applied directly to the raw time series) for the Mount Conero and Frasassi Gorge areas, respectively.
Num Date Doy Week Month Num Date Doy Week Month
1 21 April 2017 111 16 4 48 13 October 2018 286 41 10
2 1 May 2017 121 18 5 49 12 November 2018 316 46 11
3 31 May 2017 151 22 5 50 7 December 2018 341 49 12
4 20 June 2017 171 25 6 51 12 December 2018 346 50 12
5 10 July 2017 191 28 7 52 27 December 2018 361 52 12
6 20 July 2017 201 29 7 53 31 January 2019 31 5 1
7 30 July 2017 211 31 7 54 26 January 2019 26 4 1
8 9 August 2017 221 32 8 55 5 February 2019 36 6 2
9 19 August 2017 231 33 8 56 15 February 2019 ** 46 7 2
10 29 August 2017 241 35 8 57 20 February 2019 * 51 8 2
11 18 September 2017 261 38 9 58 25 February 2019 56 8 2
12 8 October 2017 281 41 10 59 2 March 2019 ** 61 9 3
13 18 October 2017 291 42 10 60 12 March 2019 71 11 3
14 28 October 2017 301 43 10 61 17 March 2019 76 11 3
15 27 November 2017 331 48 11 62 22 March 2019 *, ** 81 12 3
16 7 December 2017 341 49 12 63 1 April 2019 ** 91 13 4
17 22 December 2017 356 51 12 64 16 April 2019 * 106 16 4
18 6 January 2018 6 1 1 65 31 May 2019 151 22 5
19 15 February 2018 46 7 2 66 5 June 2019 *, ** 156 23 6
20 6 April 2018 96 14 4 67 15 June 2019 166 24 6
21 16 April 2018 106 16 4 68 25 June 2019 176 26 6
22 21 April 2018 111 16 4 69 30 June 2019 * 181 26 6
23 26 April 2018 116 17 4 70 5 July 2019 186 27 7
24 11 May 2018 131 19 5 71 20 July 2019 * 201 29 7
25 16 May 2018 136 20 5 72 25 July 2019 ** 206 30 7
26 21 May 2018 141 21 5 73 30 July 2019 211 31 7
27 31 May 2018 151 22 5 74 4 August 2019 * 216 31 8
28 10 June 2018 161 23 6 75 9 August 2019 221 32 8
29 15 June 2018 166 24 6 76 14 August 2019 226 33 8
30 20 June 2018 171 25 6 77 19 August 2019 ** 231 33 8
31 30 June 2018 181 26 6 78 24 August 2019 236 34 8
32 10 July 2018 191 28 7 79 29 August 2019 * 241 35 8
33 15 July 2018 196 28 7 80 8 September 2019 251 36 9
34 20 July 2018 201 29 7 81 13 September 2019 256 37 9
35 25 July 2018 206 30 7 82 18 September 2019 ** 261 38 9
36 30 July 2018 211 31 7 83 8 October 2019 * 281 41 10
37 4 August 2018 216 31 8 84 23 October 2019 ** 296 43 10
38 9 August 2018 221 32 8 85 7 November 2019 311 45 11
39 19 August 2018 231 33 8 86 1 January 2020 1 1 1
40 24 August 2018 236 34 8 87 6 January 2020 6 1 1
41 29 August 2018 241 35 8 88 5 February 2020 36 6 2
42 3 September 2018 246 36 9 89 15 February 2020 46 7 2
43 8 September 2018 251 36 9 90 20 February 2020 51 8 2
44 18 September 2018 261 38 9 91 11 March 2020 71 11 3
45 23 September 2018 266 38 9 92 16 March 2020 76 11 3
46 28 September 2018 271 39 9 93 21 March 2020 81 12 3
47 3 October 2018 276 40 10
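The Doy and Week columns of Table A1 can be reproduced from the acquisition dates. The values in the table are consistent with week = ⌊(doy − 1)/7⌋ + 1, that is, the number of complete seven-day periods since 1 January plus one, rather than with ISO 8601 week numbers. This convention is inferred from the table values and is not stated by the authors. A minimal Python check, with the hypothetical helper `doy_and_week`:

```python
from datetime import date


def doy_and_week(d):
    """Day of year, and week counted in 7-day blocks starting on 1 January."""
    doy = d.timetuple().tm_yday
    week = (doy - 1) // 7 + 1
    return doy, week


# A few acquisition dates taken from Table A1.
for d in (date(2017, 4, 21), date(2017, 7, 30), date(2019, 3, 22), date(2020, 3, 11)):
    doy, week = doy_and_week(d)
    print(d.isoformat(), doy, week, "ISO week:", d.isocalendar()[1])
```

The date 30 July 2017 distinguishes the two conventions: the table lists week 31, whereas its ISO week number is 30.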
Table A2. List of models for the Mount Conero area, displaying their accuracy (OA—Overall Accuracy
and sd—standard deviation; for c1–c4 vegetation types Producer’s Accuracy was reported) and model
complexity (pr—number of input predictors and Random Forest’s mtry value for tree splits).
Table A3. List of models for the Frasassi Gorge area, displaying their accuracy (OA—Overall Accuracy
and sd—standard deviation; for v1–v8 vegetation types Producer’s Accuracy was reported) and
model complexity (pr—number of input predictors and Random Forest’s mtry value for tree splits).
References
1. The Habitats Directive. Council Directive 92/43/EEC of 21 May 1992 on the Conservation of Natural Habitats and of Wild Fauna
and Flora. Off. J. L 1992, 206, 7–50.
2. Evans, D. The Habitats of the European Union Habitats Directive. Biol. Environ. Proc. R. Irish Acad. 2006, 106B, 167–173. [CrossRef]
3. Corbane, C.; Lang, S.; Pipkins, K.; Alleaume, S.; Deshayes, M.; García Millán, V.E.; Strasser, T.; Vanden Borre, J.; Toon, S.; Michael, F.
Remote Sensing for Mapping Natural Habitats and Their Conservation Status—New Opportunities and Challenges. Int. J. Appl.
Earth Obs. Geoinf. 2015, 37, 7–16. [CrossRef]
4. Vanden Borre, J.; Paelinckx, D.; Mücher, C.A.; Kooistra, L.; Haest, B.; De Blust, G.; Schmidt, A.M. Integrating Remote Sensing in
Natura 2000 Habitat Monitoring: Prospects on the Way Forward. J. Nat. Conserv. 2011, 19, 116–125. [CrossRef]
5. Schmidt, T.; Schuster, C.; Kleinschmit, B.; Förster, M. Evaluating an Intra-Annual Time Series for Grassland Classification—How
Many Acquisitions and What Seasonal Origin Are Optimal? IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 3428–3439.
[CrossRef]
6. Rapinel, S.; Rozo, C.; Delbosc, P.; Bioret, F.; Bouzillé, J.B.; Hubert-Moy, L. Contribution of Free Satellite Time-Series Images to
Mapping Plant Communities in the Mediterranean Natura 2000 Site: The Example of Biguglia Pond in Corse (France). Mediterr.
Bot. 2020, 41, 181–191. [CrossRef]
7. Marzialetti, F.; Giulio, S.; Malavasi, M.; Sperandii, M.G.; Acosta, A.T.R.; Carranza, M.L. Capturing Coastal Dune Natural
Vegetation Types Using a Phenology-Based Mapping Approach: The Potential of Sentinel-2. Remote Sens. 2019, 11, 1506.
[CrossRef]
8. Bajocco, S.; Ferrara, C.; Alivernini, A.; Bascietto, M.; Ricotta, C. Remotely-Sensed Phenology of Italian Forests: Going beyond the
Species. Int. J. Appl. Earth Obs. Geoinf. 2019, 74, 314–321. [CrossRef]
9. Grignetti, A.; Salvatori, R.; Casacchia, R.; Manes, F. Mediterranean Vegetation Analysis by Multi-Temporal Satellite Sensor Data.
Int. J. Remote Sens. 1997, 18, 1307–1318. [CrossRef]
10. Marzialetti, F.; Di Febbraro, M.; Malavasi, M.; Giulio, S.; Acosta, A.T.R.; Carranza, M.L. Mapping Coastal Dune Landscape
through Spectral Rao’s Q Temporal Diversity. Remote Sens. 2020, 12, 2315. [CrossRef]
11. Sittaro, F.; Hutengs, C.; Semella, S.; Vohland, M. A Machine Learning Framework for the Classification of Natura 2000 Habitat
Types at Large Spatial Scales Using MODIS Surface Reflectance Data. Remote Sens. 2022, 14, 823. [CrossRef]
12. Mahmud, S.; Redowan, M.; Ahmed, R.; Khan, A.A.; Rahman, M.M. Phenology-Based Classification of Sentinel-2 Data to Detect
Coastal Mangroves. Geocarto Int. 2022, 37, 14335–14354. [CrossRef]
13. Raab, C.; Stroh, H.G.; Tonn, B.; Meißner, M.; Rohwer, N.; Balkenhol, N.; Isselstein, J. Mapping Semi-Natural Grassland
Communities Using Multi-Temporal RapidEye Remote Sensing Data. Int. J. Remote Sens. 2018, 39, 5638–5659. [CrossRef]
14. Hubert-Moy, L.; Fabre, E.; Rapinel, S. Contribution of SPOT-7 Multi-Temporal Imagery for Mapping Wetland Vegetation. Eur.
J. Remote Sens. 2020, 53, 201–210. [CrossRef]
15. Jarocińska, A.; Kopeć, D.; Niedzielko, J.; Wylazłowska, J.; Halladin-Dąbrowska, A.; Charyton, J.; Piernik, A.; Kamiński, D.
The Utility of Airborne Hyperspectral and Satellite Multispectral Images in Identifying Natura 2000 Non-Forest Habitats for
Conservation Purposes. Sci. Rep. 2023, 13, 4549. [CrossRef] [PubMed]
16. Tarantino, C.; Forte, L.; Blonda, P.; Vicario, S.; Tomaselli, V.; Beierkuhnlein, C.; Adamo, M. Intra-Annual Sentinel-2 Time-Series
Supporting Grassland Habitat Discrimination. Remote Sens. 2021, 13, 277. [CrossRef]
17. Buck, O.; Millán, V.E.G.; Klink, A.; Pakzad, K. Using Information Layers for Mapping Grassland Habitat Distribution at Local to
Regional Scales. Int. J. Appl. Earth Obs. Geoinf. 2015, 37, 83–89. [CrossRef]
18. Rapinel, S.; Mony, C.; Lecoq, L.; Clément, B.; Thomas, A.; Hubert-Moy, L. Evaluation of Sentinel-2 Time-Series for Mapping
Floodplain Grassland Plant Communities. Remote Sens. Environ. 2019, 223, 115–129. [CrossRef]
19. Durell, L.; Scott, J.T.; Hering, A.S. Hybrid Forecasting for Functional Time Series of Dissolved Oxygen Profiles. Data Sci. Sci. 2023,
2, 2152401. [CrossRef]
20. Huang, S.; Tang, L.; Hupy, J.P.; Wang, Y.; Shao, G. A Commentary Review on the Use of Normalized Difference Vegetation Index
(NDVI) in the Era of Popular Remote Sensing. J. For. Res. 2021, 32, 1–6. [CrossRef]
21. Vanden Borre, J.; Spanhove, T.; Haest, B. Towards a Mature Age of Remote Sensing for Natura 2000 Habitat Conservation:
Poor Method Transferability as a Prime Obstacle. In The Roles of Remote Sensing in Nature Conservation; Springer International
Publishing: Cham, Switzerland, 2017; pp. 11–37.
22. Xue, J.; Su, B. Significant Remote Sensing Vegetation Indices: A Review of Developments and Applications. J. Sens. 2017, 2017,
1353691. [CrossRef]
23. Fatima, N.; Javed, A. Assessment of Land Use Land Cover Change Detection Using Geospatial Techniques in Southeast Rajasthan.
J. Geosci. Environ. Prot. 2021, 9, 299–319. [CrossRef]
24. Barrett, B.; Raab, C.; Cawkwell, F.; Green, S. Upland Vegetation Mapping Using Random Forests with Optical and Radar Satellite
Data. Remote Sens. Ecol. Conserv. 2016, 2, 212–231. [CrossRef] [PubMed]
25. Nagendra, H.; Lucas, R.; Honrado, J.P.; Jongman, R.H.G.; Tarantino, C.; Adamo, M.; Mairota, P. Remote Sensing for Conservation
Monitoring: Assessing Protected Areas, Habitat Extent, Habitat Condition, Species Diversity, and Threats. Ecol. Indic. 2013,
33, 45–59. [CrossRef]
26. Pasquarella, V.J.; Holden, C.E.; Kaufman, L.; Woodcock, C.E. From Imagery to Ecology: Leveraging Time Series of All Available
Landsat Observations to Map and Monitor Ecosystem State and Dynamics. Remote Sens. Ecol. Conserv. 2016, 2, 152–170. [CrossRef]
27. Gillanders, S.N.; Coops, N.C.; Wulder, M.A.; Gergel, S.E.; Nelson, T. Multitemporal Remote Sensing of Landscape Dynamics and
Pattern Change: Describing Natural and Anthropogenic Trends. Prog. Phys. Geogr. Earth Environ. 2008, 32, 503–528. [CrossRef]
28. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Ramsay, R., Silverman, B., Eds.; Springer Series in Statistics; Springer:
New York, NY, USA, 2005; ISBN 978-0-387-40080-8.
29. Pesaresi, S.; Mancini, A.; Quattrini, G.; Casavecchia, S. Functional Analysis for Habitat Mapping in a Special Area of Conservation
Using Sentinel-2 Time-Series Data. Remote Sens. 2022, 14, 1179. [CrossRef]
30. Pesaresi, S.; Mancini, A.; Quattrini, G.; Casavecchia, S. Mapping Mediterranean Forest Plant Associations and Habitats with
Functional Principal Component Analysis Using Landsat 8 NDVI Time Series. Remote Sens. 2020, 12, 1132. [CrossRef]
31. Coviello, L.; Martini, F.M.; Cesaretti, L.; Pesaresi, S.; Solfanelli, F.; Mancini, A. Clustering of Remotely Sensed Time Series Using
Functional Principal Component Analysis to Monitor Crops. In Proceedings of the 2022 IEEE Workshop on Metrology for
Agriculture and Forestry (MetroAgriFor), Perugia, Italy, 3–5 November 2022; pp. 141–145.
32. Hurley, M.A.; Hebblewhite, M.; Gaillard, J.; Dray, S.; Taylor, K.A.; Smith, W.K.; Zager, P.; Bonenfant, C. Functional Analysis of
Normalized Difference Vegetation Index Curves Reveals Overwinter Mule Deer Survival Is Driven by Both Spring and Autumn
Phenology. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2014, 369, 20130196. [CrossRef]
33. Pesaresi, S.; Mancini, A.; Casavecchia, S. Recognition and Characterization of Forest Plant Communities through Remote-Sensing
NDVI Time Series. Diversity 2020, 12, 313. [CrossRef]
34. Ramsay, J.O. When the Data Are Functions. Psychometrika 1982, 47, 379–396. [CrossRef]
35. Kennedy, R.E.; Andréfouët, S.; Cohen, W.B.; Gómez, C.; Griffiths, P.; Hais, M.; Healey, S.P.; Helmer, E.H.; Hostert, P.; Lyons, M.B.; et al.
Bringing an Ecological View of Change to Landsat-Based Remote Sensing. Front. Ecol. Environ. 2014, 12, 339–346. [CrossRef]
[PubMed]
36. Levitin, D.J.; Nuzzo, R.L.; Vines, B.; Ramsay, J.O. Introduction to Functional Data Analysis. Can. Psychol. 2007, 48, 135–155.
[CrossRef]
37. Ramsay, J.O.; Dalzell, C.J. Some Tools for Functional Data Analysis. J. R. Stat. Soc. Ser. B 1991, 53, 539–572. [CrossRef]
38. Happ, C.; Greven, S. Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional)
Domains. J. Am. Stat. Assoc. 2018, 113, 649–659. [CrossRef]
39. Wang, J.-L.; Chiou, J.-M.; Müller, H.-G. Functional Data Analysis. Annu. Rev. Stat. Its Appl. 2016, 3, 257–295. [CrossRef]
40. Geobotanic Group at Università Politecnica delle Marche. Dataset and R Code Related to the Habitat Mapping with Functional
Hybrid Machine Learning. Available online: https://github.com/geobotany (accessed on 15 January 2024).
41. Rivas-Martínez, S.; Sáenz, S.R.; Penas, A. Worldwide Bioclimatic Classification System. Glob. Geobot. 2011, 1, 1–634.
42. Pesaresi, S.; Biondi, E.; Casavecchia, S. Bioclimates of Italy. J. Maps 2017, 13, 955–960. [CrossRef]
43. Biondi, E.; Casavecchia, S.; Gigante, D. Contribution to the Syntaxonomic Knowledge of the Quercus Ilex L. Woods of the Central
European Mediterranean Basin. Fitosociologia 2003, 40, 129–156.
44. Biondi, E.; Gubellini, L.; Pinzi, M.; Casavecchia, S. The Vascular Flora of Conero Regional Nature Park (Marche, Central Italy).
Flora Mediterr. 2012, 22, 67–167. [CrossRef]
45. Biondi, E. L’ostrya Carpinifolia Scop. Sul Litorale Delle Marche (Italia Centrale). Stud. Geobot. 1982, 2, 141–147.
46. Baiocco, M.; Casavecchia, S.; Biondi, E.; Pietracapina, A. Indagini Geobotaniche per Il Recupero Del Rimboschimento Del Monte
Conero (Italia Centrale). Doc. Phytosociol. 1996, 16, 387–425.
47. Blasi, C.; Di Pietro, R.; Filesi, L. Syntaxonomical Revision of Quercetalia Pubescenti-Petraeae in the Italian Peninsula. Fitosociologia
2004, 41, 87–164.
48. Blasi, C.; Feoli, E.; Avena, G.C. Due Nuove Associazioni Dei Quercetalia Pubescentis Dell’Appennino Centrale. Stud. Geobot. 1982,
2, 155–167.
49. Pedrotti, F.; Ballelli, S.; Biondi, E.; Cortini Pedrotti, C.; Orsomando, E. Resoconto Dell’escursione Della Società Italiana Di
Fitosociologia Nelle Marche Ed in Umbria (11–14 Giugno 1979). Not. Fitosociologico 1980, 16, 73–75.
50. Allegrezza, M.; Pesaresi, S.; Ballelli, S.; Tesei, G.; Ottaviani, C. Influences of Mature Pinus Nigra Plantations on the Floristic-
Vegetational Composition along an Altitudinal Gradient in the Central Apennines, Italy. iForest 2020, 13, 279–285. [CrossRef]
51. Biondi, E.; Casavecchia, S. Inquadramento Fitosociologico Della Vegetazione Arbustiva Di Un Settore Dell’Appennino
Settentrionale. Fitosociologia 2002, 39, 65–73.
52. Biondi, E.; Allegrezza, M.; Zuccarello, V. Syntaxonomic Revision of the Apennine Grasslands Belonging to Brometalia Erecti, and
an Analysis of Their Relationships with the Xerophilous Vegetation of Rosmarinetea Officinalis (Italy). Phytocoenologia 2005, 35,
129–164. [CrossRef]
53. Allegrezza, M.; Biondi, E.; Ballelli, S.; Formica, E. La Vegetazione Dei Settori Rupestri Calcarei Dell’Italia Centrale. Fitosociologia
1997, 32, 91–120.
54. Ranghetti, L.; Boschetti, M.; Nutini, F.; Busetto, L. “Sen2r”: An R Toolbox for Automatically Downloading and Preprocessing
Sentinel-2 Satellite Data. Comput. Geosci. 2020, 139, 104473. [CrossRef]
55. Zeng, Y.; Hao, D.; Huete, A.; Dechant, B.; Berry, J.; Chen, J.M.; Joiner, J.; Frankenberg, C.; Bond-Lamberty, B.; Ryu, Y.; et al. Optical
Vegetation Indices for Monitoring Terrestrial Ecosystems Globally. Nat. Rev. Earth Environ. 2022, 3, 477–493. [CrossRef]
56. ESA. Sentinel-2 User Handbook. Available online:
https://sentinel.esa.int/documents/247904/685211/sentinel-2_user_handbook (accessed on 15 January 2024).
57. Fisher, J.I.; Mustard, J.F.; Vadeboncoeur, M.A. Green Leaf Phenology at Landsat Resolution: Scaling from the Field to the Satellite.
Remote Sens. Environ. 2006, 100, 265–279. [CrossRef]
58. Schuster, C.; Schmidt, T.; Conrad, C.; Kleinschmit, B.; Förster, M. Grassland habitat mapping by intra-annual time series
analysis—Comparison of RapidEye and TerraSAR-X satellite data. Int. J. Appl. Earth Obs. Geoinf. 2015, 34, 25–34. [CrossRef]
59. Lambert, J.; Drenou, C.; Denux, J.; Balent, G.; Cheret, V. Monitoring Forest Decline through Remote Sensing Time Series Analysis.
GISci. Remote Sens. 2013, 50, 437–457. [CrossRef]
60. Hyndman, R.; Athanasopoulos, G.; Bergmeir, C.; Caceres, G.; Chhay, L.; O’Hara-Wild, M.; Petropoulos, F.; Razbash, S.; Wang, E.;
Yasmeen, F. Forecast: Forecasting Functions for Time Series and Linear Models. R Package Version 8.6. Available online:
https://cran.r-project.org/package=forecast (accessed on 3 August 2020).
61. Hyndman, R.J.; Khandakar, Y. Automatic Time Series Forecasting: The Forecast Package for R. J. Stat. Softw. 2008, 27, 1–22.
[CrossRef]
62. Wood, S.N. Generalized Additive Models: An Introduction with R; Chapman and Hall/CRC: New York, NY, USA, 2017;
ISBN 9781315370279.
63. Younes, N.; Joyce, K.E.; Maier, S.W. All Models of Satellite-Derived Phenology Are Wrong, but Some Are Useful: A Case Study
from Northern Australia. Int. J. Appl. Earth Obs. Geoinf. 2021, 97, 102285. [CrossRef]
64. Di Salvo, F.; Ruggieri, M.; Plaia, A. Functional Principal Component Analysis for Multivariate Multidimensional Environmental
Data. Environ. Ecol. Stat. 2015, 22, 739–757. [CrossRef]
65. Dai, X.; Hadjipantelis, P.Z.; Han, K.; Ji, H. Fdapace: Functional Data Analysis and Empirical Dynamics. R Package Version 0.5.5.
Available online: https://cran.r-project.org/package=fdapace (accessed on 3 August 2020).
66. Happ-Kurz, C. MFPCA: Multivariate Functional Principal Component Analysis for Data Observed on Different Dimensional
Domains. R Package Version 1.3-6. Available online: https://cran.r-project.org/web/packages/MFPCA/index.html (accessed
on 22 March 2022).
67. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
68. Belgiu, M.; Drăguţ, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm.
Remote Sens. 2016, 114, 24–31. [CrossRef]
69. Evans, J.S.; Cushman, S.A. Gradient Modeling of Conifer Species Using Random Forests. Landsc. Ecol. 2009, 24, 673–683. [CrossRef]
70. Le Dez, M.; Robin, M.; Launeau, P. Contribution of Sentinel-2 Satellite Images for Habitat Mapping of the Natura 2000 Site
‘Estuaire de La Loire’ (France). Remote Sens. Appl. Soc. Environ. 2021, 24, 100637. [CrossRef]
71. Marcinkowska-Ochtyra, A.; Ochtyra, A.; Raczko, E.; Kopeć, D. Natura 2000 Grassland Habitats Mapping Based on Spectro-
Temporal Dimension of Sentinel-2 Images with Machine Learning. Remote Sens. 2023, 15, 1388. [CrossRef]
72. Wakulińska, M.; Marcinkowska-Ochtyra, A. Multi-Temporal Sentinel-2 Data in Classification of Mountain Vegetation. Remote
Sens. 2020, 12, 2696. [CrossRef]
73. Congalton, R.G. A Review of Assessing the Accuracy of Classifications of Remotely Sensed Data. Remote Sens. Environ. 1991,
37, 35–46. [CrossRef]
74. Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46. [CrossRef]
75. Kuhn, M. Building Predictive Models in R Using the Caret Package. J. Stat. Softw. 2008, 28, 1–26. [CrossRef]
76. Pham-Duc, B.; Nguyen, H.; Phan, H.; Tran-Anh, Q. Trends and Applications of Google Earth Engine in Remote Sensing and Earth
Science Research: A Bibliometric Analysis Using Scopus Database. Earth Sci. Inform. 2023, 16, 2355–2371. [CrossRef]
77. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-Scale Geospatial
Analysis for Everyone. Remote Sens. Environ. 2017, 202, 18–27. [CrossRef]
78. Pettorelli, N.; Vik, J.O.; Mysterud, A.; Gaillard, J.-M.; Tucker, C.J.; Stenseth, N.C. Using the Satellite-Derived NDVI to Assess
Ecological Responses to Environmental Change. Trends Ecol. Evol. 2005, 20, 503–510. [CrossRef]
79. Grabska, E.; Hostert, P.; Pflugmacher, D.; Ostapowicz, K. Forest Stand Species Mapping Using the Sentinel-2 Time Series. Remote
Sens. 2019, 11, 1197. [CrossRef]
80. Vrieling, A.; Meroni, M.; Darvishzadeh, R.; Skidmore, A.K.; Wang, T.; Zurita-Milla, R.; Oosterbeek, K.; O’Connor, B.; Paganini, M.
Vegetation Phenology from Sentinel-2 and Field Cameras for a Dutch Barrier Island. Remote Sens. Environ. 2018, 215, 517–529.
[CrossRef]
81. Pasquarella, V.J.; Holden, C.E.; Woodcock, C.E. Improved Mapping of Forest Type Using Spectral-Temporal Landsat Features.
Remote Sens. Environ. 2018, 210, 193–207. [CrossRef]
82. Alvera-Azcárate, A.; Sirjacobs, D.; Barth, A.; Beckers, J.-M. Outlier Detection in Satellite Data Using Spatial Coherence. Remote
Sens. Environ. 2012, 119, 84–91. [CrossRef]
83. Balestra, M.; Pierdicca, R.; Cesaretti, L.; Quattrini, G.; Mancini, A.; Galli, A.; Malinverni, E.S.; Casavecchia, S.; Pesaresi, S.
A Comparison of Pre-Processing Approaches for Remotely Sensed Time Series Classification Based on Functional Analysis. ISPRS
Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023. [CrossRef]
84. Liu, C.; Ray, S.; Hooker, G.; Friedl, M. Functional Factor Analysis for Periodic Remote Sensing Data. Ann. Appl. Stat. 2012,
6, 601–624. [CrossRef]
85. Fassnacht, F.E.; Neumann, C.; Forster, M.; Buddenbaum, H.; Ghosh, A.; Clasen, A.; Joshi, P.K.; Koch, B. Comparison of Feature
Reduction Algorithms for Classifying Tree Species with Hyperspectral Data on Three Central European Test Sites. IEEE J. Sel. Top.
Appl. Earth Obs. Remote Sens. 2014, 7, 2547–2561. [CrossRef]
86. Saini, R.; Ghosh, S.K. Analyzing the Impact of Red-Edge Band on Land Use Land Cover Classification Using Multispectral
RapidEye Imagery and Machine Learning Techniques. J. Appl. Remote Sens. 2019, 13, 044511. [CrossRef]
87. Schuster, C.; Förster, M.; Kleinschmit, B. Testing the Red Edge Channel for Improving Land-Use Classifications Based on
High-Resolution Multi-Spectral Satellite Data. Int. J. Remote Sens. 2012, 33, 5583–5599. [CrossRef]
88. Immitzer, M.; Vuolo, F.; Atzberger, C. First Experience with Sentinel-2 Data for Crop and Tree Species Classifications in Central
Europe. Remote Sens. 2016, 8, 166. [CrossRef]
89. Meyer, G.E.; Neto, J.C. Verification of Color Vegetation Indices for Automated Crop Imaging Applications. Comput. Electron.
Agric. 2008, 63, 282–293. [CrossRef]
90. Alcaraz-Segura, D.; Cabello, J.; Paruelo, J. Baseline Characterization of Major Iberian Vegetation Types Based on the NDVI
Dynamics. Plant Ecol. 2009, 202, 13–29. [CrossRef]
91. Saini, R. Integrating Vegetation Indices and Spectral Features for Vegetation Mapping from Multispectral Satellite Imagery Using
AdaBoost and Random Forest Machine Learning Classifiers. Geomat. Environ. Eng. 2022, 17, 57–74. [CrossRef]
92. Illarionova, S.; Shadrin, D.; Trekin, A.; Ignatiev, V.; Oseledets, I. Generation of the NIR Spectral Band for Satellite Images with
Convolutional Neural Networks. Sensors 2021, 21, 5646. [CrossRef] [PubMed]
93. Chen, J.; Jönsson, P.; Tamura, M.; Gu, Z.; Matsushita, B.; Eklundh, L. A Simple Method for Reconstructing a High-Quality NDVI
Time-Series Data Set Based on the Savitzky–Golay Filter. Remote Sens. Environ. 2004, 91, 332–344. [CrossRef]
94. Li, S.; Xu, L.; Jing, Y.; Yin, H.; Li, X.; Guan, X. High-Quality Vegetation Index Product Generation: A Review of NDVI Time Series
Reconstruction Techniques. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102640. [CrossRef]
95. Marcinkowska-Ochtyra, A.; Gryguc, K.; Ochtyra, A.; Kopeć, D.; Jarocińska, A.; Sławik, Ł. Multitemporal Hyperspectral Data
Fusion with Topographic Indices—Improving Classification of Natura 2000 Grassland Habitats. Remote Sens. 2019, 11, 2264.
[CrossRef]
96. Tuia, D.; Persello, C.; Bruzzone, L. Domain Adaptation for the Classification of Remote Sensing Data: An Overview of Recent
Advances. IEEE Geosci. Remote Sens. Mag. 2016, 4, 41–57. [CrossRef]
97. Piel, A.K.; Crunchant, A.; Knot, I.E.; Chalmers, C.; Fergus, P.; Mulero-Pázmány, M.; Wich, S.A. Noninvasive Technologies for
Primate Conservation in the 21st Century. Int. J. Primatol. 2022, 43, 133–167. [CrossRef]
98. Suir, G.; Saltus, C.; Sasser, C.; Harris, J.; Reif, M.; Diaz, R.; Giffin, G. Evaluating Drone Truthing as an Alternative to Ground Truthing:
An Example with Wetland Plant Identification; Engineer Research and Development Center (U.S.): Vicksburg, MS, USA, 2021.
99. Szantoi, Z.; Smith, S.E.; Strona, G.; Koh, L.P.; Wich, S.A. Mapping Orangutan Habitat and Agricultural Areas Using Landsat
OLI Imagery Augmented with Unmanned Aircraft System Aerial Photography. Int. J. Remote Sens. 2017, 38, 2231–2245. [CrossRef]
100. Wich, S.A.; Koh, L.P. Conservation Drones: Mapping and Monitoring Biodiversity; Oxford University Press: Oxford, UK, 2018;
pp. 51–54.
101. Onishi, M.; Ise, T. Explainable Identification and Mapping of Trees Using UAV RGB Image and Deep Learning. Sci. Rep. 2021,
11, 903. [CrossRef]
102. Gigante, D.; Attorre, F.; Venanzoni, R.; Acosta, A.T.R.; Agrillo, E.; Aleffi, M.; Alessi, N.; Allegrezza, M.; Angelini, P.; Angiolini, C.; et al.
A Methodological Protocol for Annex I Habitats Monitoring: The Contribution of Vegetation Science. Plant Sociol. 2016, 53, 77–87.
[CrossRef]
103. Correll, M.D.; Hantson, W.; Hodgman, T.P.; Cline, B.B.; Elphick, C.S.; Gregory Shriver, W.; Tymkiw, E.L.; Olsen, B.J. Fine-Scale
Mapping of Coastal Plant Communities in the Northeastern USA. Wetlands 2019, 39, 17–28. [CrossRef]
104. Epifanio, I.; Ventura-Campos, N. Hippocampal Shape Analysis in Alzheimer’s Disease Using Functional Data Analysis. Stat.
Med. 2014, 33, 867–880. [CrossRef] [PubMed]
105. Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis: Methods and Case Studies; Ramsay, J.O., Silverman, B.W., Eds.;
Springer Series in Statistics; Springer: New York, NY, USA, 2002; Volume 45, ISBN 978-0-387-95414-1.