Functional data provide information about curves varying over a continuum, such as time. Data of this type typically arise when a quantity is measured repeatedly at various time points. The measurements along such a curve are usually interdependent: the value at a point \(t_{i+1}\) typically depends on some of the values at the earlier points \(t_1, \dots, t_i\), \(i \in \mathbb{N}\).

Traditional machine learning techniques usually do not account for interdependence between features. They are therefore often ill-suited to such tasks, which can lead to poor performance. Functional data analysis, on the other hand, addresses this either by using algorithms tailored specifically to functional data or by transforming the functional covariates into a feature space that no longer depends on time. For a more in-depth introduction to functional data analysis, see e.g. When the data are functions (Ramsay, J.O., 1982).

Each observation of a functional covariate in the data is a set of evaluations of a function, i.e. measurements of a scalar value at various time points. A single observation might then look like this:
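To make this concrete, the following base R sketch simulates a small set of curves. The random-walk data are made up purely for illustration and do not come from any real data set. Each row of the matrix is one observation, i.e. the evaluations of one curve on a common grid of time points. Because each value builds on the previous one, neighbouring time points are strongly correlated, which is exactly the interdependence described above.

```r
set.seed(1)
grid = seq(0, 1, length.out = 50)   # 50 equally spaced time points
n.curves = 200

# Each row is one curve: a random walk, so every value builds on the previous ones
X = t(replicate(n.curves, cumsum(rnorm(length(grid), sd = 0.1))))
dim(X)              # 200 curves x 50 time points

one.obs = X[1, ]    # a single observation: 50 evaluations of one curve
# plot(grid, one.obs, type = "l") would draw it

# Neighbouring time points are strongly correlated, distant ones much less so
cor(X[, 10], X[, 11])
cor(X[, 1], X[, 50])
```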

How to model functional data?

There are two commonly used approaches for analyzing functional data.

  • Directly analyze the functional data by applying a learner that supports functional data to a task. Such learners have the prefixes classif.fda and regr.fda.

For more information on these learners, see fda learners. For this purpose, the functional data have to be stored as a matrix column in the data.frame used for constructing the task. For more information on functional tasks, see the following section.

  • Transform the task into a format suitable for standard classification or regression learners.

This is done by extracting non-temporal (non-functional) features from the curves. Like features in traditional machine learning, these features are not assumed to depend on each other through an ordering in time. This is explained in more detail below.

Creating a task that contains functional features

The first step is to get the data into the right format. mlr expects as input a base::data.frame that consists of the functional features and the target variable. In contrast to numeric features, functional features have to be stored as matrix columns in the data.frame. Afterwards, a task containing the data in a well-defined format is created. Tasks come in different flavours, such as makeClassifTask() and makeRegrTask(), which are chosen according to the type of the target variable.
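As a minimal base R illustration, using toy numbers that are not part of the tutorial's data, a matrix can be attached to a data.frame as a single column. The data.frame still has one row per curve, while that one column holds the whole curve.

```r
set.seed(1)
curves = matrix(rnorm(5 * 4), nrow = 5)   # 5 curves, each evaluated at 4 time points
df = data.frame(target = rnorm(5))
df$curve = curves                         # stored as one matrix column

ncol(df)        # 2: the matrix counts as a single column
dim(df$curve)   # 5 rows (curves) x 4 columns (time points)
```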

In the following example, the fuelSubset data from package FDboost are first stored as matrix columns using the helper function makeFunctionalData().

The data is provided in the following structure:

  • heatan is the target variable, in this case a numeric value.
  • h2o is an additional scalar variable.
  • NIR and UVVIS are matrices containing the curve data. Each column corresponds to a single time point at which the data were sampled, and each row corresponds to a single curve. NIR was measured at \(231\) time points, while UVVIS was measured at \(134\) time points.
  • nir.lambda and uvvis.lambda are numeric vectors of length \(231\) and \(134\) that indicate the time points at which the data were measured. Each entry corresponds to one column of NIR and UVVIS, respectively. For now, we ignore this additional information in mlr.
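One way to check this structure yourself, assuming the FDboost package is installed, is to load the data and compare the dimensions of the curve matrices with the lengths of the lambda vectors:

```r
# Requires the FDboost package, which ships the fuelSubset data
data("fuelSubset", package = "FDboost")

str(fuelSubset, max.level = 1)   # overview of the list elements
dim(fuelSubset$NIR)              # rows = curves, columns = time points
dim(fuelSubset$UVVIS)

# Each lambda entry belongs to one column of the corresponding matrix
length(fuelSubset$nir.lambda) == ncol(fuelSubset$NIR)
length(fuelSubset$uvvis.lambda) == ncol(fuelSubset$UVVIS)
```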

Our data already contain the functional features as matrices in a list. To show how such a matrix can be created from arbitrary numeric columns, we transform the list into a data.frame with a set of numeric columns for each matrix. These columns refer to the matrix columns in the list, i.e. UVVIS.1 is the first column of the UVVIS matrix.
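The column naming follows from how base R's data.frame() expands matrices. The toy list below uses made-up values and stands in for fuelSubset. For the real data, we assume the analogous call is data.frame(fuelSubset[c("heatan", "h2o", "UVVIS", "NIR")]).

```r
# A toy list with one scalar variable and one functional variable (3 curves, 2 time points)
lst = list(heat = c(1.2, 3.4, 5.6),
           M = matrix(1:6, nrow = 3))

# data.frame() splits the matrix into one numeric column per matrix column
df = data.frame(lst)
names(df)   # "heat" "M.1" "M.2"
df$M.2      # second column of M: 4 5 6
```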

Before constructing the task, the data are reformatted again so that they contain matrix columns. This is done by providing a list, fd.features, that identifies the functional covariates. All columns not mentioned in the list are kept as-is. In our case, the column indices 3:136 correspond to the columns of the UVVIS matrix. Alternatively, we could also specify the respective column names.
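Conceptually, this regrouping step collects the listed columns back into one matrix column and leaves everything else untouched. The base R sketch below uses a hypothetical helper, collapse_to_matrix(), that is not part of mlr and is not how makeFunctionalData() is implemented. It only mirrors the idea for a single functional feature given by numeric column indices.

```r
# Hypothetical helper (not part of mlr): gather numbered columns into one matrix column
collapse_to_matrix = function(df, cols, name) {
  mat = as.matrix(df[, cols, drop = FALSE])
  colnames(mat) = NULL
  out = df[, -cols, drop = FALSE]   # all other columns are kept as-is (cols must be numeric)
  out[[name]] = mat
  out
}

df = data.frame(heat = c(1.2, 3.4, 5.6), M.1 = 1:3, M.2 = 4:6)
fdf = collapse_to_matrix(df, cols = 2:3, name = "M")
names(fdf)         # "heat" "M"
is.matrix(fdf$M)   # TRUE
```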

makeFunctionalData() returns a data.frame in which the functional features are stored as matrix columns.

Now with a data.frame containing the functionals as matrices, a task can be created:
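Putting the steps of this section together, a sketch with mlr could look as follows. It assumes that mlr and FDboost are installed and that UVVIS occupies columns 3:136 of the expanded data.frame, as stated above. The NIR range 137:367 is our inference from the column order (two scalar columns, then 134 UVVIS columns, then 231 NIR columns), and the task id "fuelSubset" is an arbitrary choice.

```r
library(mlr)
data("fuelSubset", package = "FDboost")

# Expand the list into plain numeric columns: heatan, h2o, UVVIS.1, ..., NIR.1, ...
df = data.frame(fuelSubset[c("heatan", "h2o", "UVVIS", "NIR")])

# Regroup the numbered columns into matrix columns (NIR indices inferred, see above)
fd.features = list("UVVIS" = 3:136, "NIR" = 137:367)
fdf = makeFunctionalData(df, fd.features = fd.features)

# heatan is numeric, so this is a regression task
task = makeRegrTask(id = "fuelSubset", data = fdf, target = "heatan")
task
```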

Constructing a learner

For functional data, learners are constructed using makeLearner("classif.<R_method_name>") or makeLearner("regr.<R_method_name>") depending on the target variable.

Applying learners to a task works in two ways:

Either use a learner suitable for functional data:
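For example, a functional regression learner could be constructed as below. We assume here that the learner id "regr.FDboost" (boosting for functional regression, backed by the FDboost package) is available in your mlr version; listLearners() shows which learners are actually installed.

```r
library(mlr)

# Functional regression learner; the id "regr.FDboost" is an assumption
fdalrn = makeLearner("regr.FDboost")
fdalrn
```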

or use a standard learner:
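A standard learner is created in the same way. Here we pick regression trees from rpart purely as an example; any regression learner would do.

```r
library(mlr)

# A standard regression learner without special support for functional features
lrn = makeLearner("regr.rpart")
lrn
```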

In this case the temporal structure is disregarded, and the functional data are treated as simple numeric features.

Alternatively, the functional data can be transformed into a non-temporal (non-functional) space by extracting features before training. In this case, a normal regression or classification learner can be applied.

This is explained in more detail in the feature extraction section below.

Train the learner

The resulting learner can now be trained on the task created in the section Creating a task that contains functional features above.

Alternatively, learners that do not specifically treat functional covariates can be applied. In this case the temporal structure is completely disregarded, and all columns are treated as independent.
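A sketch of training and predicting with a standard learner, assuming mlr, FDboost and rpart are installed. The task is rebuilt so the block runs on its own; the NIR column range 137:367 is our inference from the column order, not stated in the text. Training a functional learner works the same way, by passing that learner to train() instead.

```r
library(mlr)
data("fuelSubset", package = "FDboost")
df = data.frame(fuelSubset[c("heatan", "h2o", "UVVIS", "NIR")])
fdf = makeFunctionalData(df, fd.features = list("UVVIS" = 3:136, "NIR" = 137:367))
task = makeRegrTask(id = "fuelSubset", data = fdf, target = "heatan")

# The tree treats every entry of the matrix columns as an independent numeric feature
mod = train(makeLearner("regr.rpart"), task)

# In-sample predictions, only to show the API; use resample() for honest estimates
pred = predict(mod, task = task)
performance(pred, measures = mse)
```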

Feature extraction

Instead of applying a learner to a task containing functional features, the task can also be converted into a normal task. This works by transforming the functional features into a non-functional domain, e.g. by extracting wavelet coefficients.

The currently supported preprocessing functions are:

  • discrete wavelet transform
  • fast Fourier transform
  • functional principal component analysis
  • multi-resolution feature extraction

To do this, we specify an extraction method for each functional feature of the task in a list. In this case, we simply want to extract the Fourier transform from each UVVIS functional and the functional PCA scores from each NIR functional. Variable names can be specified multiple times with different extractors. Additional arguments supplied to the extract functions are passed on.
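A sketch of this list with mlr, assuming mlr, FDboost and the packages behind the individual extractors are installed. extractFDAFourier() and extractFDAFPCA() are called with their default settings, and the NIR column range 137:367 is our inference from the column order.

```r
library(mlr)
data("fuelSubset", package = "FDboost")
df = data.frame(fuelSubset[c("heatan", "h2o", "UVVIS", "NIR")])
fdf = makeFunctionalData(df, fd.features = list("UVVIS" = 3:136, "NIR" = 137:367))
task = makeRegrTask(id = "fuelSubset", data = fdf, target = "heatan")

# One extractor per functional feature: Fourier for UVVIS, functional PCA for NIR
feat.methods = list("UVVIS" = extractFDAFourier(), "NIR" = extractFDAFPCA())
extracted = extractFDAFeatures(task, feat.methods = feat.methods)

# The result holds the converted task plus a description of the extraction
task.fe = extracted$task
head(getTaskFeatureNames(task.fe))
```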

Wavelets

In this example, the discrete wavelet feature transformation is applied to the data using the function extractFDAWavelets. The discrete wavelet transform decomposes the functional into several wavelets. This essentially transforms the time signal into a time-scale representation, where every wavelet captures the data at a different resolution. Additional parameters, i.e. the filter (type of wavelet) and the boundary treatment, can be specified in the pars argument. Because the input is a regression task, the result is again a task of type regr; unlike the raw data, however, its features no longer carry any temporal structure. For more information on wavelets, see the documentation of wavelets. A more comprehensive guide is, for example, given here.
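To illustrate what such a decomposition computes, here is a hand-written multi-level Haar transform in base R. It is a simplified stand-in, not the code mlr uses internally: Haar is only the simplest choice of filter, and the signal length is assumed to be divisible by 2^levels, so no boundary treatment is needed.

```r
# One level of the Haar wavelet transform: pairwise scaled sums and differences
haar_step = function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd = x[seq(1, length(x), by = 2)]
  even = x[seq(2, length(x), by = 2)]
  list(approx = (odd + even) / sqrt(2),   # coarser version of the signal
       detail = (odd - even) / sqrt(2))   # what is lost at this resolution
}

# Repeatedly transform the approximation; each level halves the resolution
haar_dwt = function(x, levels) {
  details = list()
  approx = x
  for (j in seq_len(levels)) {
    step = haar_step(approx)
    details[[j]] = step$detail
    approx = step$approx
  }
  list(details = details, approx = approx)
}

x = c(4, 2, 5, 5, 1, 3, 0, 2)
w = haar_dwt(x, levels = 3)
features = c(unlist(w$details), w$approx)   # 8 coefficients replace 8 time points
length(features)
sum(x^2) - sum(features^2)   # ~0: the orthonormal transform preserves energy
```

The coefficients at each level describe the curve at one resolution, so they can be used as ordinary, non-temporal features.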

Fourier transformation

Next, we use the Fourier feature transformation. The Fourier transform takes a functional and transforms it into the frequency domain by splitting the signal into its different frequency components. A more detailed tutorial on the Fourier transform can be found here. Either the amplitude or the phase of the complex Fourier coefficients can be used for analysis. This can be specified in the additional trafo.coeff argument:
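Independently of mlr, the two options can be illustrated with base R's fft(). The helper fourier_features() below is hypothetical; its trafo.coeff argument only mirrors the name used above. For a real-valued curve of length n, the coefficients beyond index floor(n/2) + 1 are redundant, so the sketch drops them.

```r
# Hypothetical helper: amplitude or phase of the Fourier coefficients of one curve
fourier_features = function(x, trafo.coeff = c("amplitude", "phase")) {
  trafo.coeff = match.arg(trafo.coeff)
  coefs = fft(x)
  # For real input the second half mirrors the first (complex conjugates)
  coefs = coefs[seq_len(floor(length(x) / 2) + 1)]
  if (trafo.coeff == "amplitude") Mod(coefs) else Arg(coefs)
}

# A pure sine with 3 cycles over 64 equally spaced points
idx = 0:63
x = sin(2 * pi * 3 * idx / 64)

amp = fourier_features(x, "amplitude")
which.max(amp)   # 4: entry 1 is frequency 0, so entry 4 is frequency 3
amp[4]           # 32, i.e. n / 2

phase = fourier_features(x, "phase")
phase[4]         # -pi / 2, the phase of a sine relative to a cosine
```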