Weighted k-nearest neighbor with confidence estimate

Usage

weighted_knn_predict_with_conf(
  train_coords,
  train_labels,
  test_coords,
  k,
  epsilon = 0.1,
  conf_threshold = NULL,
  other_class = "Other",
  verbose = FALSE,
  use_weights = TRUE,
  max_neighbors = 500,
  ignore_self = TRUE,
  track_neighbors = TRUE,
  separate_other = TRUE
)

Arguments

train_coords: Data frame of coordinates for the labeled (training) samples, one row per sample; columns are features (typically the UMAP coordinates V1 and V2).
train_labels: Character or factor vector of training labels, one per row of train_coords.
test_coords: Data frame of coordinates for the samples to classify, with the same columns (and embedding space) as train_coords.
k: Integer; number of neighbors to consider.
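epsilon: Numeric; small constant added to each distance before it is inverted (only when use_weights = TRUE), which keeps the weight of a zero-distance neighbor finite. Default: 0.1.
conf_threshold: Optional numeric; minimum confidence required to assign a class. If set and a sample's confidence is below it, the sample is assigned other_class. Default: NULL (no threshold).
other_class: Label of the outgroup class. Neighbors with this label are handled separately when separate_other = TRUE, and low-confidence samples are assigned this label when conf_threshold is set. Default: "Other".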
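verbose: Logical; print progress and diagnostic messages. Default: FALSE.
use_weights: Logical; if TRUE, each neighbor is weighted by the inverse distance 1 / (d + epsilon), where d is its distance to the test sample. If FALSE, neighbors contribute equally. Default: TRUE.
max_neighbors: Integer; maximum number of neighbors to retrieve from the search index before trimming to k. Default: 500.
ignore_self: Logical; drop a neighbor at distance zero (typically the sample itself, when test samples are also in the training set). Default: TRUE.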
track_neighbors: Logical; append neighbor diagnostics to output. Default: TRUE.
separate_other: Logical; when TRUE, exclude neighbors labeled other_class from the main weighted vote and report their influence separately (as other_* columns). Default: TRUE.
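The arguments above combine into a per-sample neighbor pipeline. The following is a minimal sketch for one test sample, not the package's implementation: prepare_neighbors() is a hypothetical helper, Euclidean distance with a brute-force search stands in for the (unspecified) search index, and applying the steps in the order retrieve, drop self, set aside other_class, trim to k is an assumption.

```r
# Hypothetical helper (not part of the package): build the weighted in-group
# neighbor set for one test point x. Euclidean distance and a brute-force
# search stand in for the package's search index (an assumption).
prepare_neighbors <- function(train_coords, train_labels, x, k,
                              epsilon = 0.1, use_weights = TRUE,
                              max_neighbors = 500, ignore_self = TRUE,
                              separate_other = TRUE, other_class = "Other") {
  train_labels <- as.character(train_labels)
  # Euclidean distance from x to every training row
  d <- sqrt(rowSums(sweep(as.matrix(train_coords), 2, x)^2))
  # Candidate neighbors, nearest first, capped at max_neighbors
  idx <- head(order(d), max_neighbors)
  # Drop one zero-distance neighbor (the test point itself)
  if (ignore_self && length(idx) > 0 && d[idx[1]] == 0) idx <- idx[-1]
  # Set outgroup neighbors aside so they do not vote
  if (separate_other) idx <- idx[train_labels[idx] != other_class]
  # Keep the k nearest remaining neighbors
  idx <- head(idx, k)
  # Inverse-distance weights, or equal weights when use_weights = FALSE
  w <- if (use_weights) 1 / (d[idx] + epsilon) else rep(1, length(idx))
  data.frame(neighbor = idx, distance = d[idx], label = train_labels[idx],
             weight = w, stringsAsFactors = FALSE)
}

train <- data.frame(V1 = c(0, 1, 0, 5, 0.5), V2 = c(0, 0, 1, 5, 0.5))
labs <- c("A", "A", "B", "B", "Other")
nb <- prepare_neighbors(train, labs, x = c(0, 0), k = 3)
```

Here row 1 is dropped as a self-match and row 5 is set aside as "Other", so the vote uses rows 2, 3 and 4. Retrieving max_neighbors candidates before trimming is what lets k in-group neighbors survive even when many nearby samples carry other_class.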

Value

A data frame with one row per test sample and the following columns:

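predicted_label: the predicted class (other_class if conf_threshold is set and not met)
confidence: weight of the predicted class divided by the total weight of the in-group neighbors (pred_w / total_w)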
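The confidence column follows directly from the weighted vote. A minimal sketch, assuming each label's vote is the sum of its neighbors' weights; weighted_vote() is a hypothetical helper, and breaking ties in favor of the alphabetically first label is a choice of this sketch, not documented behavior.

```r
# Hypothetical helper (not part of the package): weighted vote for one test
# sample, given its in-group neighbor labels and weights.
weighted_vote <- function(labels, weights, conf_threshold = NULL,
                          other_class = "Other") {
  # Total weight per label; split() orders labels alphabetically
  votes <- vapply(split(weights, labels), sum, numeric(1))
  total_w <- sum(votes)
  i <- which.max(votes)          # ties go to the first label in that order
  pred <- names(votes)[i]
  pred_w <- unname(votes[i])
  confidence <- pred_w / total_w
  # Below the threshold, report the outgroup label instead
  if (!is.null(conf_threshold) && confidence < conf_threshold) pred <- other_class
  list(predicted_label = pred, confidence = confidence,
       pred_w = pred_w, total_w = total_w)
}

res <- weighted_vote(c("A", "B", "B"), c(2, 1, 0.5))
res$predicted_label  # "A"
res$confidence       # 2 / 3.5, about 0.571
weighted_vote(c("A", "B", "B"), c(2, 1, 0.5), conf_threshold = 0.6)$predicted_label  # "Other"
```

Note that a single close neighbor can outvote two farther ones: "A" wins with weight 2 against 1.5 for "B", yet its confidence of about 0.57 still falls below a threshold of 0.6.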

If track_neighbors = TRUE, the following columns are added:

other_score: weight of the outgroup relative to the weight of the predicted class
neighbor_id: comma-separated neighbor sample IDs
neighbor: comma-separated neighbor row indices into train_coords
distance: comma-separated neighbor distances
label: comma-separated neighbor labels (in-group only if separate_other=TRUE)
vote_labels: comma-separated distinct labels that received weight in the vote
weighted_votes: comma-separated summed weights, one per entry of vote_labels
neighbors_other: count of outgroup neighbors closer than the farthest in-group neighbor
other_weighted_votes: sum of outgroup weights closer than the farthest in-group neighbor
total_w: sum of weights for in-group neighbors used
pred_w: weight supporting the predicted class
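The outgroup columns count only those other_class neighbors that fall inside the radius of the farthest in-group neighbor. A minimal sketch of that rule, assuming outgroup neighbors are weighted with the same 1 / (d + epsilon) rule as the main vote and that diagnostic strings are joined with a bare comma; outgroup_diagnostics() is hypothetical, and other_score is omitted because its exact formula is not stated here.

```r
# Hypothetical helper (not part of the package): outgroup diagnostics for one
# test sample. in_dist are the distances of the in-group neighbors used in the
# vote; out_dist are the distances of retrieved other_class neighbors.
outgroup_diagnostics <- function(in_dist, out_dist, epsilon = 0.1,
                                 use_weights = TRUE) {
  cutoff <- max(in_dist)                  # farthest in-group neighbor
  closer <- out_dist[out_dist < cutoff]   # outgroup neighbors inside that radius
  # Same weighting as the main vote (assumed)
  w <- if (use_weights) 1 / (closer + epsilon) else rep(1, length(closer))
  list(neighbors_other = length(closer),
       other_weighted_votes = sum(w),
       # Comma-separated diagnostic string, as in the distance column
       distance = paste(signif(in_dist, 3), collapse = ","))
}

dx <- outgroup_diagnostics(in_dist = c(0.4, 0.9), out_dist = c(0.2, 0.5, 1.5))
```

With the farthest in-group neighbor at 0.9, the outgroup neighbors at 0.2 and 0.5 are counted (neighbors_other = 2, other_weighted_votes = 1/0.3 + 1/0.6 = 5), while the one at 1.5 is ignored.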