Adaptive k selection for KNN classification

Given an error matrix, identify the k that maximises accuracy (i.e. minimises error) for the cells in each group of a provided labelling. If no labels are given, a cell-cell similarity network is expected instead, and k is chosen to maximise accuracy among the cells in each cell's neighbourhood. If neither is given, all cells are treated as belonging to the same group.
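The group-based selection rule can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `select_k_by_group()` is hypothetical, and it assumes that "maximising accuracy" means choosing, within each group, the column of `E` with the lowest mean error, with ties going to the first candidate. It also assumes `E` has column names. The `per_cell` argument loosely mirrors `outputPerCell`.

```r
# Hypothetical helper illustrating the rule described above; not the package's
# own implementation. Assumes E has column names and that the best k for a
# group is the column with the lowest mean error (highest mean accuracy).
select_k_by_group <- function(E, labels = NULL, per_cell = TRUE) {
  # With no labels, treat every cell as a member of one shared group
  if (is.null(labels)) labels <- rep("all", nrow(E))
  labels <- as.character(labels)
  # Per-group column sums of E (rows = groups, columns = candidate k)
  sums <- rowsum(E, labels)
  # Divide each group's row by its cell count to get mean error per k
  n <- as.vector(table(labels)[rownames(sums)])
  mean_err <- sums / n
  # Lowest-error column per group; which.min takes the first of any ties
  best <- apply(mean_err, 1, which.min)
  best_k <- colnames(E)[best]
  names(best_k) <- rownames(mean_err)
  # Optionally expand from one value per group to one value per cell
  if (per_cell) best_k <- unname(best_k[labels])
  best_k
}
```

With `per_cell = TRUE` the result has one column name per cell; with `per_cell = FALSE` it has one value per group, named by group label. Leaving `labels` as `NULL` reproduces the "single shared group" fallback.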

Usage

getAdaptiveK(E, labels = NULL, local = NULL, outputPerCell = TRUE, ...)

Arguments

E: An error matrix with rows corresponding to cells and columns corresponding to candidate k values. Entries are error values: binary for a single classification, or continuous when derived from multiple classifications.
labels: Group labels for the cells, one per row of E.
local: A neighbourhood index representation, typically the index matrix returned by BiocNeighbors::findKNN() (its $index component).
outputPerCell: Logical; whether to return the adaptive k for each cell rather than only for each label (used only when labels is given).
...: Additional arguments, including return_colnames: whether to return the column names of E for the selected k values (default TRUE) or just their column indices.
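The neighbourhood-based mode (`local`) can be sketched in the same spirit. In this hypothetical helper, each cell takes the candidate k with the lowest mean error over the neighbours listed in its row of `local`. Whether the real function also counts the cell itself as part of its neighbourhood is an assumption left open here; the sketch simply uses the neighbours as given (for `findKNN()` output, the cell itself is excluded). The `return_colnames` argument mirrors the documented option.

```r
# Hypothetical helper; not the package's own implementation. Assumes `local`
# is a matrix whose row i lists the neighbour indices of cell i (the layout of
# BiocNeighbors::findKNN()$index) and that each cell takes the candidate k with
# the lowest mean error over those neighbours.
select_k_by_neighbourhood <- function(E, local, return_colnames = TRUE) {
  best <- vapply(seq_len(nrow(E)), function(i) {
    # Mean error of each candidate k across cell i's neighbours
    nb_err <- colMeans(E[local[i, ], , drop = FALSE])
    # Column index of the lowest mean error (first one if tied)
    as.integer(which.min(nb_err))
  }, integer(1))
  # Mirror the documented return_colnames option: names or column indices
  if (return_colnames) colnames(E)[best] else best
}
```

Because each cell gets its own neighbourhood, this mode always yields one value per cell, unlike the group-based mode where cells sharing a label share a k.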

Value

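Vector of adaptive k values: one per cell, or one per label when labels is given and outputPerCell = FALSE. Entries are column names of E, or column indices if return_colnames = FALSE.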

Examples

set.seed(1) # make the random example reproducible
E <- matrix(runif(100), 20, 5) # error values: 20 cells x 5 candidate k
colnames(E) <- paste0("K_", 1:5)

# generate cell labels
labels <- factor(rep(letters[1:2], each = 10))

# generate nearest neighbourhood index representation
data <- matrix(rpois(10 * 20, 10), 10, 20) # 10 genes, 20 cells
local <- BiocNeighbors::findKNN(t(data), k = 5, get.distance = FALSE)$index
#> Warning: detected tied distances to neighbors, see ?'BiocNeighbors-ties'

best_k_labels <- getAdaptiveK(E,
  labels = labels
)
best_k_local <- getAdaptiveK(E,
  local = local
)
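To see the structure that `local` is expected to have without depending on BiocNeighbors, the following brute-force sketch builds an index matrix in the same layout as `findKNN()$index`: one row per cell, one column per neighbour, nearest first, with the cell itself excluded. The helper `brute_knn_index()` is hypothetical and computes all pairwise Euclidean distances, so it is only practical for small data; its tie-breaking may differ from BiocNeighbors.

```r
# Hypothetical brute-force stand-in for BiocNeighbors::findKNN(X, k)$index,
# for illustration on small data only. Row i holds the indices of the k cells
# nearest to cell i by Euclidean distance, nearest first, excluding cell i.
brute_knn_index <- function(X, k) {
  # X has cells in rows and features in columns; D is the full distance matrix
  D <- as.matrix(dist(X))
  # Give each cell an infinite distance to itself so it is never selected
  diag(D) <- Inf
  # For each cell, take the k smallest distances (order() keeps ties in index order)
  idx <- apply(D, 1, function(d) order(d)[seq_len(k)])
  # apply() returns neighbours column-wise; refill so each row is one cell
  matrix(idx, nrow = nrow(D), ncol = k, byrow = TRUE)
}
```

With the example data above, `brute_knn_index(t(data), k = 5)` gives a 20 x 5 matrix in the same layout as `local`; assuming getAdaptiveK relies only on that layout, it should be usable in its place.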