Adaptive k-Nearest Neighbour Classification using the StabMap joint embedding
Source: R/classifyEmbedding.R
classifyEmbedding.Rd
Performs adaptive k-nearest neighbour classification of discrete labels for a query set from a training set, leveraging the StabMap joint embedding. The training labels are defined in `labels`; all other rows of the embedding are treated as the testing set.
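As a rough illustration of the idea, the following base R sketch classifies unlabelled rows by majority vote among their k nearest labelled rows under Euclidean distance. The helper `knn_majority_sketch` and the toy data are hypothetical and are not part of StabMap; the package's own implementation, including its adaptive choice of k, may differ.

```r
# Hypothetical helper: fixed-k majority-vote KNN on Euclidean distances.
# Not the StabMap implementation; ties go to the alphabetically first label.
knn_majority_sketch <- function(coords, labels, k = 5) {
  stopifnot(!is.null(rownames(coords)), !is.null(names(labels)))
  # Rows named in `labels` form the training set; all others are the testing set
  train <- coords[names(labels), , drop = FALSE]
  test_names <- setdiff(rownames(coords), names(labels))
  test <- coords[test_names, , drop = FALSE]
  k <- min(k, nrow(train))
  pred <- vapply(seq_len(nrow(test)), function(i) {
    # Squared Euclidean distance from test cell i to every training cell
    query <- matrix(test[i, ], nrow(train), ncol(train), byrow = TRUE)
    d <- rowSums((train - query)^2)
    nn <- order(d)[seq_len(k)]
    votes <- table(labels[nn])
    names(votes)[which.max(votes)]
  }, character(1))
  names(pred) <- test_names
  pred
}

# Toy data: two well-separated groups in two dimensions
toy <- rbind(
  matrix(c(0, 0, 0.1, 0, 0, 0.1, 0.1, 0.1), ncol = 2, byrow = TRUE),
  matrix(c(5, 5, 5.1, 5, 5, 5.1, 5.1, 5.1), ncol = 2, byrow = TRUE)
)
rownames(toy) <- paste0("cell_", 1:8)

# cell_4 and cell_8 are left unlabelled and so act as the testing set
toy_labels <- c(cell_1 = "a", cell_2 = "a", cell_3 = "a",
                cell_5 = "b", cell_6 = "b", cell_7 = "b")
toy_pred <- knn_majority_sketch(toy, toy_labels, k = 3)
```

Here `toy_pred` is a named character vector with one entry per unlabelled row.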
Arguments
- coords
A cells (rows) x dimensions data matrix, on which Euclidean distances are calculated for KNN classification. Must have rownames. Typically, output from `stabMap()`.
- labels
A named character vector of labels for the training set.
- type
A character string giving the type of adaptive KNN classification to be used. Must be one of "adaptive_local", "adaptive_labels", "uniform_optimised", or "uniform_fixed". Default is "uniform_fixed".
- k_values
A numeric vector of potential k values. If type is "uniform_fixed", then the first value of k_values is used. Default is 5.
- error_measure
The error measure used to select the best k. Must be one of "simple_error" or "balanced_error". "simple_error" weights all cells equally. "balanced_error" weights error by the factor levels of `labels`. Only affects the error measure for type == "uniform_optimised".
- adaptive_nFold
The number of folds for adaptive selection cross-validation.
- adaptive_nRep
The number of repetitions of adaptive selection cross-validation.
- adaptive_local_nhood
The neighbourhood size for optimising locally.
- adaptive_local_smooth
The number of neighbours to use for smoothing locally.
- verbose
Logical; whether to print the repetition and fold number during adaptive selection cross-validation.
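The difference between the two `error_measure` options can be seen on a small imbalanced example. The sketch below assumes that "balanced_error" corresponds to the mean of the per-label error rates, which is one common reading of weighting error by label; the exact weighting used inside the package is not stated here and may differ. The vectors `truth` and `pred` are made up for illustration.

```r
# Made-up true and predicted labels: 8 cells of type "a", 2 cells of type "b"
truth <- c(rep("a", 8), rep("b", 2))
pred  <- c(rep("a", 8), "a", "b")

# Simple error: every cell counts equally (1 wrong out of 10)
simple_error_sketch <- mean(pred != truth)

# Balanced error (assumed definition): average the error rate within each label,
# so the rare label "b" contributes as much as the common label "a"
per_label_error <- tapply(pred != truth, truth, mean)
balanced_error_sketch <- mean(per_label_error)

simple_error_sketch    # 0.1
balanced_error_sketch  # 0.25
```

Under this reading, a classifier that neglects rare labels is penalised more heavily by the balanced measure than by the simple one.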
Value
A data frame with one row per row of `coords`, and the same rownames. Columns are:
- input_labels
The training labels that were provided in `labels` (NA is used for the testing set).
- resubstituted_labels
Predicted labels for all rows, including the training data.
- predicted_labels
Predicted labels for the testing set, and the true labels as provided in `labels` for the training set.
- k
The adaptive k value used for each row of the training set.
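The relationship between the label columns can be sketched with a hand-built data frame. The object `out_sketch` below is hypothetical: it only mimics the documented column names, and its values are invented rather than produced by `classifyEmbedding()`.

```r
# Hypothetical output with the documented columns (values are made up)
out_sketch <- data.frame(
  input_labels = c("a", "b", NA, NA),
  resubstituted_labels = c("a", "a", "b", "a"),
  k = c(5, 5, 5, 5),
  row.names = paste0("cell_", 1:4),
  stringsAsFactors = FALSE
)

# Training rows are those with a provided label
is_train <- !is.na(out_sketch$input_labels)

# As described above: true labels for training rows, predictions for testing rows
out_sketch$predicted_labels <- ifelse(
  is_train,
  out_sketch$input_labels,
  out_sketch$resubstituted_labels
)

# Resubstitution accuracy: agreement between predictions and truth on training rows
resub_accuracy <- mean(
  out_sketch$resubstituted_labels[is_train] == out_sketch$input_labels[is_train]
)
```

Comparing `resubstituted_labels` with `input_labels` on the training rows gives an optimistic check of the classifier, since those rows were also used for training.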
Examples
set.seed(100)
# Simulate coordinates
coords <- matrix(rnorm(1000), 100, 10)
rownames(coords) <- paste0("cell_", seq_len(nrow(coords)))
# Define labels of the first 50 cells
labels <- rep(paste0("type_", letters[1:5]), 10)
names(labels) <- rownames(coords)[seq_along(labels)]
# Uniform fixed KNN classification
knn_out <- classifyEmbedding(
coords, labels,
type = "uniform_fixed", k_values = 5
)
table(knn_out$predicted_labels)
#>
#> type_a type_b type_c type_d type_e
#> 23 18 24 19 16
# Adaptive KNN classification using local error
knn_out <- classifyEmbedding(
coords, labels,
type = "adaptive_local",
k_values = 2:3,
adaptive_nFold = 5,
adaptive_nRep = 10
)
#> Rep 1 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 2 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 3 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 4 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 5 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 6 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 7 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 8 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 9 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 10 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Warning: 'k' capped at the number of observations
table(knn_out$predicted_labels)
#>
#> type_a type_b type_c type_d type_e
#> 24 14 23 18 21
# Adaptive KNN classification with type = "adaptive_labels"
knn_out <- classifyEmbedding(
coords, labels,
type = "adaptive_labels",
k_values = 2:3,
adaptive_nFold = 5,
adaptive_nRep = 10
)
#> Rep 1 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 2 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 3 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 4 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 5 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 6 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 7 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 8 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 9 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 10 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
table(knn_out$predicted_labels)
#>
#> type_a type_b type_c type_d type_e
#> 23 16 22 19 20
# Adaptive KNN classification using uniform optimised with balanced error
knn_out <- classifyEmbedding(
coords, labels,
type = "uniform_optimised",
k_values = 2:3,
adaptive_nFold = 3,
adaptive_nRep = 10,
error_measure = "balanced_error"
)
#> Rep 1 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 2 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 3 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 4 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 5 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 6 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 7 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 8 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 9 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 10 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
table(knn_out$predicted_labels)
#>
#> type_a type_b type_c type_d type_e
#> 23 17 22 19 19
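The `Rep` and `Fold` messages printed in the examples above come from repeated cross-validation over the training set. The base R sketch below shows one generic way such a procedure can pick a single k from a set of candidates. It is an illustration under stated assumptions, not the package's code: the helper `knn_vote`, the toy data, and the variables `nFold` and `nRep` (standing in for `adaptive_nFold` and `adaptive_nRep`) are all hypothetical.

```r
set.seed(1)

# Toy training data: two separated groups of 10 cells each
x <- rbind(matrix(rnorm(20, mean = 0), 10, 2),
           matrix(rnorm(20, mean = 6), 10, 2))
y <- rep(c("a", "b"), each = 10)

# Hypothetical helper: majority-vote KNN prediction for each row of `test`
knn_vote <- function(train, train_y, test, k) {
  apply(test, 1, function(z) {
    d <- rowSums(sweep(train, 2, z)^2)  # squared Euclidean distances
    votes <- table(train_y[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

k_values <- c(1, 3)
nFold <- 5
nRep <- 2

# One row of errors per (repetition, fold), one column per candidate k
err <- matrix(NA_real_, nRep * nFold, length(k_values),
              dimnames = list(NULL, paste0("k_", k_values)))
row_i <- 0
for (rep_i in seq_len(nRep)) {
  # Randomly assign each training cell to a fold for this repetition
  fold <- sample(rep(seq_len(nFold), length.out = nrow(x)))
  for (f in seq_len(nFold)) {
    row_i <- row_i + 1
    held <- fold == f
    for (j in seq_along(k_values)) {
      p <- knn_vote(x[!held, , drop = FALSE], y[!held],
                    x[held, , drop = FALSE], k_values[j])
      err[row_i, j] <- mean(p != y[held])  # simple error on the held-out fold
    }
  }
}

# Choose the candidate k with the lowest mean held-out error
mean_err <- colMeans(err)
best_k <- k_values[which.min(mean_err)]
```

Increasing the number of repetitions averages over more random fold assignments, which makes the choice of k less dependent on any single split at the cost of more computation.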