Searches for the nearest neighbors of observations in a reference set or between two sets of observations.
Usage
search_neighbors(Xr, Xu = NULL,
                 diss_method = diss_pca(), Yr = NULL,
                 neighbors, spike = NULL,
                 return_dissimilarity = FALSE,
                 k, k_diss, k_range, pc_selection,
                 center, scale, documentation, ...
)
Arguments
- Xr
A numeric matrix of reference observations (rows) and variables (columns) where the neighbor search is conducted.
- Xu
Optional matrix of observations for which neighbors are to be searched in Xr.
- diss_method
A dissimilarity method object created by one of:
diss_pca(): Mahalanobis distance in PCA space
diss_pls(): Mahalanobis distance in PLS space
diss_correlation(): Correlation-based dissimilarity
diss_euclidean(): Euclidean distance
diss_mahalanobis(): Mahalanobis distance
diss_cosine(): Cosine dissimilarity
Default is diss_pca().
- Yr
Optional response matrix. Required for PLS methods and when using ncomp_by_opc().
- neighbors
A neighbor selection object created by:
neighbors_k(): Select k nearest neighbors
neighbors_diss(): Select neighbors by dissimilarity threshold
- spike
Optional integer vector indicating observations in Xr to force into (positive indices) or exclude from (negative indices) neighborhoods.
- return_dissimilarity
Logical indicating whether to return the dissimilarity matrix. Default is FALSE.
- k
Deprecated.
- k_diss
Deprecated.
- k_range
Deprecated.
- pc_selection
Deprecated.
- center
Deprecated.
- scale
Deprecated.
- documentation
Deprecated.
- ...
Additional arguments (currently unused).
Value
A list containing:
- neighbors
Matrix of Xr indices for each query observation's neighbors, sorted by dissimilarity (columns = query observations).
- neighbors_diss
Matrix of dissimilarity scores corresponding to neighbors.
- unique_neighbors
Vector of unique Xr indices that appear in any neighborhood.
- k_diss_info
If neighbors_diss() was used, a data.frame with columns for observation index, number of neighbors found, and final number after applying bounds.
- dissimilarity
If return_dissimilarity = TRUE, the full dissimilarity object.
- projection
If the dissimilarity method includes return_projection = TRUE, the projection object.
- gh
If the dissimilarity method includes gh = TRUE, the GH distances.
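The "number of neighbors found" and "final number after applying bounds" reported in k_diss_info can be pictured with a small base R sketch. It assumes that the bounds simply clamp the count of observations under the threshold to the range [k_min, k_max]; the helper count_with_bounds() and its column names are hypothetical and are not part of the package, whose actual rule may differ.

```r
# Toy dissimilarity matrix: rows = reference observations, columns = queries.
d <- matrix(
  c(0.1, 0.90,
    0.2, 0.80,
    0.3, 0.70,
    0.4, 0.60,
    0.6, 0.55,
    0.9, 0.10),
  ncol = 2, byrow = TRUE
)

# Hypothetical helper (not a package function): count the reference
# observations within the threshold, then clamp the count to [k_min, k_max].
count_with_bounds <- function(d, threshold, k_min, k_max) {
  n_found <- colSums(d <= threshold)
  n_final <- pmin(pmax(n_found, k_min), k_max)
  data.frame(
    observation = seq_len(ncol(d)),
    n_found = n_found,
    n_final = n_final
  )
}

info <- count_with_bounds(d, threshold = 0.5, k_min = 2, k_max = 3)

# Keep the n_final closest reference observations for each query.
selected <- lapply(seq_len(ncol(d)), function(j) {
  order(d[, j])[seq_len(info$n_final[j])]
})
```

In this toy case the first query has four observations under the threshold and is cut back to three, while the second has only one and is topped up to two.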
Details
This function is useful for reducing large reference sets by identifying
only relevant neighbors before running mbl.
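As a rough illustration of this reduction step, the base R sketch below derives the set of relevant reference rows from a toy neighbor index matrix and subsets the reference data with it. The objects here are made up for the example; in practice the indices would presumably come from the unique_neighbors element of the returned list, and sorting them is a choice made only for this sketch.

```r
# Toy reference set with 10 observations and 4 variables.
Xr_toy <- matrix(seq_len(40), nrow = 10)

# Toy neighbor index matrix: each column holds the reference row indices of
# one query observation's neighbors (closest first).
nn <- matrix(c(2, 5, 7,
               5, 2, 9), ncol = 2)

# Reference rows that appear in at least one neighborhood.
relevant <- sort(unique(as.vector(nn)))

# Reduced reference set containing only those rows.
Xr_reduced <- Xr_toy[relevant, , drop = FALSE]
```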
If Xu is not provided, the function searches for neighbors within
Xr itself (excluding self-matches). If Xu is provided,
neighbors of each observation in Xu are searched in Xr.
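The two search modes can be sketched with a brute-force Euclidean search in base R. This is only an illustration of the behavior described above, not the package's implementation; knn_sketch() is a hypothetical helper.

```r
# Hypothetical helper, not part of the package: brute-force k-nearest
# neighbor search with Euclidean distance.
knn_sketch <- function(Xr, Xu = NULL, k) {
  self_search <- is.null(Xu)
  if (self_search) Xu <- Xr
  # Pairwise distances: rows = reference observations, columns = queries.
  d <- sapply(seq_len(nrow(Xu)), function(j) {
    sqrt(rowSums(sweep(Xr, 2, Xu[j, ])^2))
  })
  # When searching within Xr, an observation must not be its own neighbor.
  if (self_search) diag(d) <- Inf
  # Column j holds the indices of the k closest reference rows for query j.
  matrix(apply(d, 2, function(x) order(x)[seq_len(k)]), nrow = k)
}

Xr_toy <- matrix(c(0, 0,
                   1, 0,
                   0, 3,
                   5, 5), ncol = 2, byrow = TRUE)
Xu_toy <- matrix(c(0.9, 0.1), nrow = 1)

within_xr <- knn_sketch(Xr_toy, k = 2)          # neighbors within Xr
between   <- knn_sketch(Xr_toy, Xu_toy, k = 2)  # neighbors of Xu in Xr
```

In the within-set search no column contains its own index, because the diagonal of the distance matrix is set to Inf before ranking.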
The spike argument allows forcing specific observations into or out
of all neighborhoods. Positive indices are always included; negative indices
are always excluded.
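One way to picture the effect of spike on a single neighborhood is the base R sketch below. It assumes that forced observations take the first slots and that the neighborhood size stays at k; both are assumptions made for illustration, and apply_spike() is a hypothetical helper rather than a package function.

```r
# Hypothetical helper, not the package implementation.
# ranked: reference indices ordered from most to least similar to a query.
# spike:  positive indices are forced in, negative indices are excluded.
apply_spike <- function(ranked, spike, k) {
  forced   <- spike[spike > 0]
  excluded <- abs(spike[spike < 0])
  # Drop excluded and already-forced indices from the ranked candidates.
  candidates <- setdiff(ranked, c(forced, excluded))
  # Forced observations come first; the closest candidates fill what is left.
  head(c(forced, candidates), k)
}

ranked <- c(4L, 2L, 7L, 1L, 9L)
neighborhood <- apply_spike(ranked, spike = c(9L, -2L), k = 3)
```

Here observation 9 enters the neighborhood although it ranks last, and observation 2 is left out although it ranks second.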
References
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J.A.M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
Examples
# \donttest{
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu)]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr)]
# Correlation-based neighbor search with k neighbors
ex1 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = diss_correlation(),
  neighbors = neighbors_k(40)
)
# PCA-based with OPC selection
ex2 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = diss_pca(
    ncomp = ncomp_by_opc(40),
    scale = TRUE,
    return_projection = TRUE
  ),
  Yr = Yr,
  neighbors = neighbors_k(50)
)
# Observations not in any neighborhood
setdiff(seq_len(nrow(Xr)), ex2$unique_neighbors)
#> [1] 3 16 20 23 26 27 42 44 59 61 86 96 103 111 123 149 231 267 279
#> [20] 298 310 311 326 328 330
# Dissimilarity threshold-based selection
ex3 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = diss_pls(
    ncomp = ncomp_by_opc(40),
    scale = TRUE
  ),
  Yr = Yr,
  neighbors = neighbors_diss(threshold = 0.5, k_min = 10, k_max = 100)
)
# }
