Variogram analysis on high dimensional data #198
In machine learning + geospatial projects, it is common to train a model on multiple input features, each representing a different observed aspect. Together, these features form a high-dimensional data space that drives the model's predictions. Instead of analyzing each dimension individually, it can be useful to perform variogram analysis on the high-dimensional data as a whole.
The current codebase can easily be extended to this high-dimensional scenario by rewriting the semivariance formula as:
$$\gamma(h)= \frac{1}{2N(h)}\sum_{i=1}^{N(h)}\lVert Z(x_i)-Z(x_i+h) \rVert_2^2 ,$$
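where:
- $N(h)$ is the number of point pairs separated by lag $h$;
- $Z(x_i)$ is the vector of features observed at location $x_i$;
- $\lVert \cdot \rVert_2$ is the Euclidean ($L_2$) norm.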
This is implemented in the code as:

```python
diffs = np.sqrt(np.sum(diffs**2, axis=0))
```

which calculates the $L_2$ norm of the differences between feature vectors.
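The following is a minimal, self-contained NumPy sketch of the multivariate Matheron estimator above. It is not the code from this PR. The function name `multivariate_semivariance`, the `(n_points, n_features)` array layout (features on the last axis, so the norm sums over `axis=1`, whereas the PR snippet sums over `axis=0`), and the simple `(lower, upper]` lag binning are all assumptions for illustration.

```python
import numpy as np


def multivariate_semivariance(coords, values, bin_edges):
    """Hypothetical sketch: Matheron-style semivariance for vector-valued data.

    coords:    (n_points, n_dims) spatial coordinates
    values:    (n_points, n_features) feature vectors Z(x_i);
               a 1-D array is treated as a single feature
    bin_edges: increasing lag-class edges; one gamma is returned per class
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    if values.ndim == 1:
        values = values[:, None]  # scalar field -> one feature column

    # All unique point pairs (i < j) and their spatial separation (lag)
    i, j = np.triu_indices(len(coords), k=1)
    lags = np.linalg.norm(coords[i] - coords[j], axis=1)

    # L2 norm of the difference between the two feature vectors of each pair
    diffs = np.sqrt(np.sum((values[i] - values[j]) ** 2, axis=1))

    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        # Pairs whose lag falls in (bin_edges[k], bin_edges[k + 1]]
        in_bin = (lags > bin_edges[k]) & (lags <= bin_edges[k + 1])
        n = np.count_nonzero(in_bin)
        if n > 0:
            # Squaring the norm recovers ||Z(x_i) - Z(x_i + h)||_2^2
            gamma[k] = np.sum(diffs[in_bin] ** 2) / (2 * n)
    return gamma


coords = [[0, 0], [1, 0], [2, 0]]
print(multivariate_semivariance(coords, [1, 2, 4], [0, 1.5, 2.5]))
```

With a single feature the norm reduces to $\lvert Z(x_i)-Z(x_i+h)\rvert$, so the estimator matches the ordinary scalar Matheron semivariance. Because the squared $L_2$ norm is a sum over features, the multivariate $\gamma(h)$ equals the sum of the per-feature semivariances, so features on larger scales dominate unless they are standardized first.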