Adaptive k-Nearest Neighbour Classification using the StabMap joint embedding

Performs adaptive k-nearest neighbour classification of discrete labels for a query set from a labelled training set, leveraging the StabMap joint embedding. The training labels are defined in `labels`, and all other rows of the embedding are treated as the testing (query) set.
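To make the training/testing split concrete, here is a minimal base R sketch of the simplest case: a fixed k with a majority vote over Euclidean nearest neighbours. The helper `knn_fixed()` and the toy coordinates are hypothetical illustrations, not StabMap's implementation. Ties in the vote go to the alphabetically first label.

```r
# Hypothetical helper: fixed-k KNN label transfer on an embedding (base R only).
knn_fixed <- function(coords, labels, k = 5) {
  train <- names(labels)                      # labelled rows form the training set
  test <- setdiff(rownames(coords), train)    # every other row is a query
  d <- as.matrix(dist(coords))                # Euclidean distances between all rows
  k <- min(k, length(train))                  # cannot use more neighbours than training cells
  vapply(test, function(cell) {
    nn <- train[order(d[cell, train])[seq_len(k)]]  # k nearest training cells
    votes <- table(labels[nn])                # count labels among the neighbours
    names(votes)[which.max(votes)]            # majority vote
  }, character(1))
}

# Toy embedding: two well-separated groups of 10 cells on a line
coords <- rbind(cbind(x = (0:9) / 10, y = 0),
                cbind(x = 10 + (0:9) / 10, y = 0))
rownames(coords) <- paste0("cell_", 1:20)
# Label half of each group; the remaining cells are queries
labels <- setNames(rep(c("A", "B"), each = 5), paste0("cell_", c(1:5, 11:15)))
pred <- knn_fixed(coords, labels, k = 3)
```

Each query cell inherits the majority label of its three nearest labelled cells, so cells 6 to 10 are assigned "A" and cells 16 to 20 are assigned "B".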

Usage

classifyEmbedding(
  coords,
  labels,
  type = c("uniform_fixed", "adaptive_labels", "adaptive_local", "uniform_optimised"),
  k_values = 5,
  error_measure = c("simple_error", "balanced_error"),
  adaptive_nFold = 2,
  adaptive_nRep = 5,
  adaptive_local_nhood = 100,
  adaptive_local_smooth = 10,
  verbose = TRUE
)

Arguments

coords: A cells (rows) x dimensions data matrix on which Euclidean distances are calculated for KNN classification. Must have rownames. Typically the output of `stabMap()`.
labels: A named character vector of labels for the training set, with names matching rownames of `coords`.
type: A character string specifying the type of (adaptive) KNN classification to use. Must be one of "adaptive_local", "adaptive_labels", "uniform_optimised", or "uniform_fixed". Default is "uniform_fixed".
k_values: A numeric vector of candidate k values. If type is "uniform_fixed", only the first value of k_values is used. Default is 5.
error_measure: The error measure used to select the best k. Must be one of "simple_error" or "balanced_error". "simple_error" weights all cells equally; "balanced_error" weights the error by the label classes in `labels`. Only used when type is "uniform_optimised".
adaptive_nFold: The number of folds for adaptive selection cross-validation. Default is 2.
adaptive_nRep: The number of repetitions of adaptive selection cross-validation. Default is 5.
adaptive_local_nhood: The neighbourhood size used when optimising k locally (type "adaptive_local"). Default is 100.
adaptive_local_smooth: The number of neighbours used for local smoothing (type "adaptive_local"). Default is 10.
verbose: Logical, whether to print the repetition and fold number during adaptive selection cross-validation. Default is TRUE.
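The difference between the two error measures can be sketched as follows, assuming "balanced_error" means the per-label error rate averaged over labels, so that each label counts equally regardless of how many cells carry it. The helpers `simple_error()` and `balanced_error()` are hypothetical and are not part of StabMap's exported API.

```r
# Hypothetical helpers comparing true and predicted labels.
simple_error <- function(truth, pred) {
  mean(truth != pred)                          # every cell weighted equally
}
balanced_error <- function(truth, pred) {
  per_label <- tapply(truth != pred, truth, mean)  # error rate within each true label
  mean(per_label)                              # each label weighted equally
}

# Imbalanced toy example: the rare label "B" is always misclassified
truth <- c(rep("A", 8), rep("B", 2))
pred  <- rep("A", 10)
simple_error(truth, pred)    # 0.2
balanced_error(truth, pred)  # 0.5
```

The simple error looks low because the abundant label dominates, while the balanced error exposes that one of the two labels is never predicted correctly.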

Value

A data frame with the same rows and rownames as `coords`, containing the columns:
input_labels: The training labels provided in `labels`, with NA for rows of the testing set.
resubstituted_labels: Predicted labels for all rows, including the training set.
predicted_labels: Predicted labels for the testing set, and the true labels provided in `labels` for the training set.
k: The adaptive k value used for each row of the training set.
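The relationship between the three label columns can be sketched with toy vectors standing in for the returned columns (the values are invented for illustration). Testing rows, which have NA in input_labels, take the KNN prediction, while training rows keep their input label.

```r
# Toy stand-ins for two columns of the returned data frame
input_labels <- c(cell_1 = "type_a", cell_2 = "type_b", cell_3 = NA, cell_4 = NA)
resubstituted_labels <- c(cell_1 = "type_a", cell_2 = "type_a",
                          cell_3 = "type_b", cell_4 = "type_a")

# Testing rows (NA input) use the prediction; training rows keep the input label
predicted_labels <- ifelse(is.na(input_labels), resubstituted_labels, input_labels)

# Names of the rows that belong to the testing set
test_rows <- names(input_labels)[is.na(input_labels)]
```

Note that cell_2 keeps its input label "type_b" in predicted_labels even though its resubstituted label differs. On a real result, the testing rows can be selected with `knn_out[is.na(knn_out$input_labels), ]`.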

Examples

set.seed(100)
# Simulate coordinates
coords <- matrix(rnorm(1000), 100, 10)
rownames(coords) <- paste0("cell_", seq_len(nrow(coords)))

# Define labels of the first 50 cells
labels <- rep(paste0("type_", letters[1:5]), 10)
names(labels) <- rownames(coords)[seq_along(labels)]

# Uniform fixed KNN classification
knn_out <- classifyEmbedding(
  coords, labels,
  type = "uniform_fixed", k_values = 5
)
table(knn_out$predicted_labels)
#> 
#> type_a type_b type_c type_d type_e 
#>     23     18     24     19     16 

# Adaptive KNN classification using local error
knn_out <- classifyEmbedding(
  coords, labels,
  type = "adaptive_local",
  k_values = 2:3,
  adaptive_nFold = 5,
  adaptive_nRep = 10
)
#> Rep 1 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 2 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 3 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 4 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 5 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 6 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 7 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 8 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 9 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 10 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Warning: 'k' capped at the number of observations
table(knn_out$predicted_labels)
#> 
#> type_a type_b type_c type_d type_e 
#>     24     14     23     18     21 
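One way to picture local adaptive selection is sketched below, assuming cross-validation has already produced a logical matrix `correct` (training cells x candidate k) recording whether each cell was classified correctly with each k. Each cell then takes the k with the highest accuracy among its `nhood` nearest training cells, with `nhood` playing the role of `adaptive_local_nhood`. This is a hypothetical illustration, not StabMap's code, and it omits the smoothing controlled by `adaptive_local_smooth`.

```r
# Hypothetical helper: per-cell k chosen from accuracy in a local neighbourhood.
# correct: logical matrix, training cells (rows) x candidate k (columns).
local_best_k <- function(coords, correct, k_values, nhood = 100) {
  d <- as.matrix(dist(coords[rownames(correct), , drop = FALSE]))
  nhood <- min(nhood, nrow(correct))
  vapply(rownames(correct), function(cell) {
    nb <- order(d[cell, ])[seq_len(nhood)]           # nearest cells, including itself
    acc <- colMeans(correct[nb, , drop = FALSE])      # local accuracy for each k
    k_values[which.max(acc)]                          # best k in this neighbourhood
  }, numeric(1))
}

# Toy data: two groups of three cells where different k values work best
coords <- cbind(x = c(0, 0.1, 0.2, 10, 10.1, 10.2), y = 0)
rownames(coords) <- paste0("cell_", 1:6)
k_values <- c(2, 3)
correct <- matrix(c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE,   # k = 2
                    FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),  # k = 3
                  nrow = 6, dimnames = list(rownames(coords), as.character(k_values)))
best_k <- local_best_k(coords, correct, k_values, nhood = 3)
```

Cells in the first group receive k = 2 and cells in the second group receive k = 3, because each group's neighbourhood favours a different k.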

# Adaptive KNN classification using per-label error
knn_out <- classifyEmbedding(
  coords, labels,
  type = "adaptive_labels",
  k_values = 2:3,
  adaptive_nFold = 5,
  adaptive_nRep = 10
)
#> Rep 1 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 2 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 3 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 4 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 5 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 6 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 7 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 8 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 9 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
#> Rep 10 of 10
#> Fold 1 of 5
#> Fold 2 of 5
#> Fold 3 of 5
#> Fold 4 of 5
#> Fold 5 of 5
table(knn_out$predicted_labels)
#> 
#> type_a type_b type_c type_d type_e 
#>     23     16     22     19     20 
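A per-label variant can be sketched by choosing, for each label, the k with the highest cross-validated accuracy among training cells of that label. The `correct` matrix and the `label_best_k()` helper are hypothetical; this reflects one plausible reading of "adaptive_labels" rather than StabMap's documented algorithm.

```r
# Hypothetical helper: one k per label from a cross-validation accuracy matrix.
# correct: logical matrix, training cells (rows) x candidate k (columns).
label_best_k <- function(correct, labels, k_values) {
  labs <- labels[rownames(correct)]               # label of each training cell
  groups <- split(rownames(correct), labs)        # training cells grouped by label
  vapply(groups, function(cells) {
    acc <- colMeans(correct[cells, , drop = FALSE])  # accuracy of each k within this label
    k_values[which.max(acc)]
  }, numeric(1))
}

k_values <- c(2, 3)
labels <- c(cell_1 = "A", cell_2 = "A", cell_3 = "B", cell_4 = "B")
correct <- matrix(c(TRUE, TRUE, FALSE, TRUE,    # k = 2
                    FALSE, TRUE, TRUE, TRUE),   # k = 3
                  nrow = 4, dimnames = list(names(labels), as.character(k_values)))
best_k <- label_best_k(correct, labels, k_values)
```

Here label "A" is best served by k = 2 and label "B" by k = 3.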

# Uniform optimised KNN classification using balanced error
knn_out <- classifyEmbedding(
  coords, labels,
  type = "uniform_optimised",
  k_values = 2:3,
  adaptive_nFold = 3,
  adaptive_nRep = 10,
  error_measure = "balanced_error"
)
#> Rep 1 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 2 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 3 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 4 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 5 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 6 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 7 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 8 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 9 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
#> Rep 10 of 10
#> Fold 1 of 3
#> Fold 2 of 3
#> Fold 3 of 3
table(knn_out$predicted_labels)
#> 
#> type_a type_b type_c type_d type_e 
#>     23     17     22     19     19
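The "uniform_optimised" idea, selecting a single k for all cells by repeated n-fold cross-validation on the training set, can be sketched in base R as below. `cv_uniform_k()` is a hypothetical helper whose `nFold` and `nRep` arguments play the roles of `adaptive_nFold` and `adaptive_nRep`; `error_fun` defaults to the simple error and could be swapped for a balanced error. It is a sketch under these assumptions, not StabMap's implementation.

```r
# Hypothetical helper: choose one k by repeated n-fold cross-validation.
cv_uniform_k <- function(coords, labels, k_values, nFold = 2, nRep = 5,
                         error_fun = function(truth, pred) mean(truth != pred)) {
  stopifnot(nFold >= 2)
  lab <- unname(labels)
  n <- length(lab)
  d <- as.matrix(dist(coords[names(labels), , drop = FALSE]))  # training cells only
  err <- matrix(NA_real_, nRep, length(k_values))
  for (r in seq_len(nRep)) {
    fold <- sample(rep_len(seq_len(nFold), n))     # random, balanced fold assignment
    pred <- matrix(NA_character_, n, length(k_values))
    for (f in seq_len(nFold)) {
      kept <- which(fold != f)                     # cells used as the reference
      for (i in which(fold == f)) {                # held-out cells to predict
        ord <- kept[order(d[i, kept])]             # reference cells, nearest first
        for (j in seq_along(k_values)) {
          nn <- ord[seq_len(min(k_values[j], length(ord)))]
          votes <- table(lab[nn])
          pred[i, j] <- names(votes)[which.max(votes)]
        }
      }
    }
    err[r, ] <- apply(pred, 2, function(p) error_fun(lab, p))  # error per k
  }
  k_values[which.min(colMeans(err))]               # k with lowest mean error
}

set.seed(1)
coords <- rbind(cbind(x = (0:9) / 10, y = 0),
                cbind(x = 10 + (0:9) / 10, y = 0))
rownames(coords) <- paste0("cell_", 1:20)
labels <- setNames(rep(c("A", "B"), each = 10), rownames(coords))
best_k <- cv_uniform_k(coords, labels, k_values = c(1, 3, 5), nFold = 2, nRep = 3)
```

Because fold assignment is random, the selected k can vary between seeds on harder data; averaging over repetitions reduces that variability.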