# Contributing to the sits R package

We welcome all contributions to the `sits` package! Please submit questions, bug reports, and requests in the [issues tracker](https://github.com/e-sensing/sits/issues). If you plan to contribute code, go ahead! Fork the repo and submit a pull request. A few notes:

-   This package is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms. 
-   If you plan large changes, please open an issue first to discuss them.
-   We will include contributors as authors in the DESCRIPTION file (with
their permission) for contributions that go beyond small typos in code or documentation.
-   This package generally uses the [rOpenSci packaging guidelines](https://github.com/ropensci/onboarding/blob/master/packaging_guide.md) for style and structure.
-   Documentation is generated by **roxygen2**. Please write documentation as roxygen comments in the R source files and let **roxygen2** generate the documentation files; do not edit the generated files by hand.
-  For more substantial contributions, consider adding a new section to one of the chapters of the `sits` book (https://e-sensing.github.io/sitsbook/), which is written in R Markdown and whose source is available in the [sitsbook repository](https://github.com/e-sensing/sitsbook).
-  We aim for robust tests with high coverage. Please include tests with any major code contribution.
- We particularly welcome additions in two areas: new STAC-based image repositories and new raster machine learning/deep learning algorithms. Please see more details below. 
   

## General structure of `sits` code

New functions that build on the `sits` API should follow the general principles below.

### API design 

- The target audience for `sits` is the community of remote sensing experts with an Earth Sciences background who want to use state-of-the-art data analysis methods with minimal investment in programming skills. The design of the `sits` API follows the typical workflow for land classification using satellite image time series and thus provides a clear and direct set of functions that are easy to learn and master.

- For this reason, we welcome contributions that provide useful additions to the existing API, such as new ML/DL classification algorithms. If you plan to add a new API function, please open an issue explaining your rationale before making a pull request.

### R programming models 

- Most functions in `sits` use the S3 programming model, with a strong emphasis on generic methods that are specialized depending on the input data type. See, for example, the implementation of the `sits_bands()` function.

- Please do not contribute code that uses the S4 programming model. Doing so would break the structure and logic of the existing code. Convert your code from S4 to S3.

- Use generic functions as much as possible, as they improve modularity and maintainability. If your code has decision points using `if-else` clauses, such as `if A, do X; else do Y`, consider replacing them with a generic function that has one method per case.

- Functions that use the `torch` package follow the R6 model to be compatible with that package. See, for example, the code in `sits_tempcnn.R` and `api_torch.R`. Converting PyTorch code to R and including it in `sits` is straightforward; please see the [Technical Annex](https://e-sensing.github.io/sitsbook/technical-annex.html) of the `sits` online book.
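The advice to prefer generic functions over `if-else` branches can be sketched in base R as follows. The generic `describe_cube()` and its methods are hypothetical and exist only for illustration; the class names mimic those listed under the `sits` data types, but this is not `sits` code. `UseMethod()` selects the method matching the first applicable entry of the object's class vector, falling back to the `default` method.

```r
# Hypothetical generic: one method per input class replaces an if-else chain
describe_cube <- function(cube, ...) {
  UseMethod("describe_cube")
}

# Method for objects whose class vector contains "raster_cube"
describe_cube.raster_cube <- function(cube, ...) {
  "raster cube: regularised ARD images"
}

# Method for objects whose class vector contains "class_cube"
describe_cube.class_cube <- function(cube, ...) {
  "class cube: labelled map"
}

# Fallback for unsupported inputs
describe_cube.default <- function(cube, ...) {
  stop("describe_cube: unsupported input of class ", class(cube)[[1]])
}

# Toy objects; real sits cubes are tibbles with many more columns
raster <- structure(list(), class = c("raster_cube", "list"))
# Toy class vector with two entries: the first matching class wins
labelled <- structure(list(), class = c("class_cube", "raster_cube", "list"))

describe_cube(raster)    # dispatches to describe_cube.raster_cube
describe_cube(labelled)  # dispatches to describe_cube.class_cube
```

Supporting a new input type then means adding one more method, without touching the existing ones.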

### Adherence to the `tidyverse`, `sf` and `terra`

The `sits` code relies on the `tidyverse` packages to work with tables and lists. We use `dplyr` and `tidyr` for data selection and wrangling, `purrr` and `slider` for loops over lists and tables, and `lubridate` to handle dates and times. Vector data is handled with `sf` and raster data with `terra`.

### Adherence to the `sits` data types

- The `sits` package is built on top of three data types: time series tibbles, data cubes, and models. Most `sits` functions take one or more of these types as input and return one of them.

- The time series tibble contains data and metadata. The first six columns contain the metadata: spatial and temporal information, the label assigned to the sample, and the data cube from which the data were extracted. The `time_series` column contains the time series data for each spatiotemporal location. All time series tibbles are objects of class `sits`.

- The `cube` data type is designed to store metadata about image files. In principle, images that are part of a data cube share the same geographical region, have the same bands, and have been regularized to fit into a pre-defined temporal interval. Data cubes in `sits` are organized by tiles. A tile is an element of a satellite mission's reference system, for example MGRS for Sentinel-2 and WRS-2 for Landsat. A `cube` is a tibble where each row contains information about the data covering one tile. Each row of the cube tibble contains a column named `file_info`; this column contains a list that stores a tibble describing the image files of that tile.

- The `cube` data type is specialised into `raster_cube` (ARD images), `vector_cube` (ARD cubes with segmentation vectors), `probs_cube` (probabilities produced by classification algorithms on raster data), `probs_vector_cube` (probabilities generated by vector classification of segments), `uncertainty_cube` (cubes with uncertainty information), and `class_cube` (labelled maps). See the code in `sits_plot.R` for an example of how `plot` is specialised to handle different classes of raster data.

- All ML/DL models in `sits` produced by `sits_train()` belong to the `ml_model` class. Each model is also assigned a second class, which is specific to each ML algorithm (e.g., `rfor_model`, `svm_model`) and shared by all `torch`-based DL models (`torch_model`). This class information is used for plotting models and for determining whether a model can run on GPUs.
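To make the layout of a time series tibble concrete, the sketch below builds a one-row stand-in with base R. It is hypothetical: `sits` itself uses tibbles, and the column names (`longitude`, `latitude`, `start_date`, `end_date`, `label`, `cube`, `time_series`) and the `Index` column of the nested series are assumptions to be checked against real `sits` objects. The values are toy numbers for illustration only.

```r
# One nested time series: dates plus one column per band (here a single NDVI band)
series <- data.frame(
  Index = as.Date(c("2020-01-01", "2020-01-17", "2020-02-02")),
  NDVI  = c(0.61, 0.65, 0.72)
)

# Six metadata columns: location, time span, label, and source cube
samples <- data.frame(
  longitude  = -55.2,
  latitude   = -11.4,
  start_date = as.Date("2020-01-01"),
  end_date   = as.Date("2020-02-02"),
  label      = "Forest",
  cube       = "example_cube",
  stringsAsFactors = FALSE
)

# Seventh column: a list column holding one data frame per sample
samples$time_series <- list(series)

# Prepend the "sits" class so that S3 methods for "sits" objects apply
class(samples) <- c("sits", class(samples))

str(samples$time_series[[1]])
```

Keeping the series in a list column means each row stays one sample, whatever the length of its time series.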

### Literal values, error messages and colors

- The internal `sits` code contains no literal values; they are all stored in the YAML configuration file `./inst/extdata/config_internals.yml`. These values are accessible using the `.conf` function. For example, the default size for leaflet objects (64 MB) is accessed with `.conf("leaflet_megabytes")`. See the internal configuration file for a complete list.

- Error messages are also stored outside the code, in the YAML configuration file `./inst/extdata/config_messages.yml`, and are also accessible using the `.conf` function. For example, the error message associated with an invalid NA value for an input parameter is accessed with `.conf("messages", ".check_na_parameter")`.

- Color handling in `sits` is described in the Technical Annex section ["How colors work in sits"](https://e-sensing.github.io/sitsbook/technical-annex.html#how-colors-work-in-sits). The legends and colors available by default are described in the YAML file `./inst/extdata/config_colors.yml`. 
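The idea of keeping literals out of the code can be sketched as a key lookup over a nested list. This is a hypothetical stand-in, not the real `.conf` function: in `sits` the values come from the YAML files listed above, whereas here a hand-written list replaces the parsed YAML, and both `conf_lookup()` and the message text are placeholders.

```r
# Hypothetical stand-in for the parsed YAML configuration files
config <- list(
  leaflet_megabytes = 64,
  messages = list(
    .check_na_parameter = "placeholder message for an invalid NA value"
  )
)

# Walk the nested list one key at a time, failing loudly on unknown keys
conf_lookup <- function(...) {
  value <- config
  for (key in c(...)) {
    if (!is.list(value) || is.null(value[[key]])) {
      stop("configuration key not found: ", key)
    }
    value <- value[[key]]
  }
  value
}

conf_lookup("leaflet_megabytes")
conf_lookup("messages", ".check_na_parameter")
```

Code then reads `conf_lookup("leaflet_megabytes")` instead of a hard-coded `64`, so changing a default only touches the configuration.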

### Supporting new STAC-based image catalogues 

- If you want to include a STAC-based catalogue not yet supported by `sits`, we encourage you to look at existing implementations of catalogues such as Microsoft Planetary Computer (MPC), Digital Earth Africa (DEA) and AWS. 

- STAC-based catalogues in `sits` are associated with YAML description files, which are available in the directory `./inst/extdata/sources`. For example, the YAML file `config_source_mpc.yml` describes the contents of the MPC collections supported by `sits`. Please first provide a YAML file that lists the detailed contents of the new catalogue you wish to include, following the existing examples.

- After writing the YAML file, consider how to access and query the new catalogue. The entry point for access to all catalogues is the `sits_cube.stac_cube()` function, which in turn calls a sequence of functions defined in the generic interface `api_source.R`. Most calls of this API are handled by the functions in `api_source_stac.R`, which provides an interface to the `rstac` package and handles STAC queries.

- Each STAC catalogue is different. The STAC specification allows providers to implement their data descriptions with specific information. For this reason, the generic API described in `api_source.R` needs to be specialized for each provider. Whenever a provider needs specific implementations of parts of the STAC protocol, we include them in separate files. For example, `api_source_mpc.R` implements specific quirks of the MPC platform. Similarly, specific support for CDSE (Copernicus Data Space Ecosystem) is available in `api_source_cdse.R`.

### Supporting new Machine Learning and Deep Learning algorithms

- In general terms, ML/DL algorithms in `sits` are encapsulated as closures which are the output of the `sits_train()` function. In line with the established practices in **R**, each closure  contains a function that classifies input values, as well as information on the samples used to train the model.

- Please read the [Technical Annex](https://e-sensing.github.io/sitsbook/technical-annex.html#including-new-methods-for-machine-learning) of the `sits` book. It describes how to include a new ML method, using the `lightGBM` algorithm as an example. Follow those guidelines to include a new ML algorithm.

- If you aim to include a `torch` based deep learning method, in addition to understanding the concepts presented in the Technical Annex please study carefully the implementation of `sits_tempcnn()` and `sits_lighttae()`. 

- Bear in mind that your only task is to provide a new function that meets the requirements of ML/DL methods in `sits`. Once the function has been correctly implemented, you will be able to use it with the rest of `sits`.
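The closure pattern described above can be illustrated with a toy nearest-centroid classifier in base R. Everything here is hypothetical (`train_centroid()`, the `centroid_model` class, the toy data), and it does not reproduce the interface that `sits_train()` expects. It only shows a training function that returns a classifying function, which keeps its training data in its enclosing environment and carries an algorithm-specific class followed by the shared `ml_model` class.

```r
# Hypothetical training function returning a classifier closure
train_centroid <- function(samples, labels) {
  samples <- as.matrix(samples)
  # Per-label feature sums divided by per-label counts give the centroids
  sums <- rowsum(samples, labels)
  counts <- as.vector(table(labels)[rownames(sums)])
  centroids <- sums / counts

  # The closure sees `samples`, `labels` and `centroids` from this environment
  predict_fun <- function(values) {
    values <- as.matrix(values)
    # Squared Euclidean distance of every row of `values` to every centroid
    dists <- apply(centroids, 1, function(ctr) colSums((t(values) - ctr)^2))
    dists <- matrix(dists, nrow = nrow(values))
    # Label of the closest centroid for each row
    rownames(centroids)[max.col(-dists, ties.method = "first")]
  }
  # Algorithm-specific class first, then the shared ml_model class
  class(predict_fun) <- c("centroid_model", "ml_model", class(predict_fun))
  predict_fun
}

# Toy training data: two features, two labels
x <- rbind(c(0.80, 0.30), c(0.70, 0.35), c(0.20, 0.10), c(0.25, 0.15))
y <- c("Forest", "Forest", "Pasture", "Pasture")
model <- train_centroid(x, y)

model(rbind(c(0.75, 0.30), c(0.20, 0.12)))  # returns c("Forest", "Pasture")
```

Because the training samples live in the closure's environment (`environment(model)$samples`), the model object carries both the classifier and the data it was trained on.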

## Roadmap

- The roadmap for `sits` is part of the [issues tracker](https://github.com/e-sensing/sits/issues). Issues created by the developers are assigned to milestones. Each milestone corresponds to an expected new version of `sits` to be released on CRAN.
