Weighted k-nearest neighbor with confidence estimate

Usage

weighted_knn_predict_with_conf(
  train_coords,
  train_labels,
  test_coords,
  k,
  epsilon = 0.1,
  conf_threshold = NULL,
  other_class = "Other",
  verbose = FALSE,
  use_weights = TRUE,
  max_neighbors = 500
)

Arguments

train_coords

Data frame of coordinates for labeled (training) samples. One row per sample, columns are features (typically UMAP V1, V2).

train_labels

Character/factor vector of labels for training samples.

test_coords

Data frame of coordinates for samples to classify (same columns/space as train_coords).

k

Integer; number of neighbors to consider.

epsilon

Numeric; small value added to distances before weighting (when use_weights = TRUE). Default: 0.1.

conf_threshold

Optional numeric; minimum confidence required to assign a class. If provided, any sample whose confidence is below the threshold is assigned other_class. Default: NULL (no thresholding).

other_class

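Character; name of the outgroup class. It is the label assigned to samples whose confidence is below conf_threshold, and the class treated specially when separate_other = TRUE. Default: "Other".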

verbose

Logical; if TRUE, print diagnostic messages while running. Default: FALSE.

use_weights

Logical; if TRUE, each neighbor is weighted by its inverse distance, 1 / (d + epsilon), where d is the distance to the test sample. If FALSE, neighbors contribute equally. Default: TRUE.

max_neighbors

Integer; maximum neighbors to retrieve from the search index before trimming to k. Default: 500.

ignore_self

Logical; if TRUE, drop a neighbor at zero distance that is the test sample itself. Default: TRUE.

track_neighbors

Logical; append neighbor diagnostics to output. Default: TRUE.

separate_other

Logical; when TRUE, exclude neighbors labeled other_class from the main weighted vote and report their influence separately (as other_* columns). Default: TRUE.
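
As an illustration of how use_weights, epsilon, conf_threshold, and other_class interact, the following base R sketch runs the vote for a single test sample. It is not the package's implementation: the neighbor distances and labels are made-up toy values, and the neighbor search itself is skipped.

```r
# Toy neighbors for one test sample (hypothetical values, already trimmed to k = 4).
neighbor_dist   <- c(0.2, 0.5, 0.9, 1.4)
neighbor_labels <- c("A", "B", "A", "B")

epsilon        <- 0.1
use_weights    <- TRUE
conf_threshold <- 0.6
other_class    <- "Other"

# Inverse-distance weights, or equal weights when use_weights is FALSE.
w <- if (use_weights) {
  1 / (neighbor_dist + epsilon)
} else {
  rep(1, length(neighbor_dist))
}

# Sum the weights per label and pick the label with the largest total.
votes           <- tapply(w, neighbor_labels, sum)
predicted_label <- names(votes)[which.max(votes)]

# Confidence: weight of the winning label divided by the total weight.
confidence <- max(votes) / sum(votes)

# Fall back to other_class when a threshold is set and the vote is too weak.
if (!is.null(conf_threshold) && confidence < conf_threshold) {
  predicted_label <- other_class
}
```

With these toy values, label "A" collects 13/20 of the total weight, so the confidence of 0.65 clears the threshold of 0.6 and the prediction stays "A". Raising conf_threshold above 0.65 would turn the prediction into "Other".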

Value

Data frame with rows = test samples and columns:

predicted_label

the predicted class

confidence

weight supporting the predicted class divided by the total weight of the neighbors used in the vote

If track_neighbors = TRUE, additional columns:

other_score

relative weight of outgroup vs predicted class

neighbor_id

comma-separated neighbor sample IDs

neighbor

comma-separated neighbor indices (in train order)

distance

comma-separated neighbor distances

label

comma-separated neighbor labels (in-group only if separate_other = TRUE)

vote_labels

comma-separated unique labels contributing to weights

weighted_votes

comma-separated summed weights, one per entry in vote_labels

neighbors_other

count of outgroup neighbors closer than the farthest in-group neighbor

other_weighted_votes

sum of outgroup weights closer than the farthest in-group neighbor

total_w

sum of weights for in-group neighbors used

pred_w

weight supporting the predicted class
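
The outgroup-related columns can be read as in the base R sketch below, which follows the descriptions above for a single test sample. It assumes inverse-distance weights and uses made-up toy neighbors. How the function itself selects the k in-group neighbors and how it derives other_score are not reproduced here.

```r
# Toy neighbors for one test sample, sorted by distance (hypothetical values).
neighbor_dist   <- c(0.1, 0.3, 0.4, 0.8, 1.2)
neighbor_labels <- c("A", "Other", "A", "B", "Other")

epsilon     <- 0.1
other_class <- "Other"

w        <- 1 / (neighbor_dist + epsilon)
is_other <- neighbor_labels == other_class

# Main vote: in-group neighbors only.
votes           <- tapply(w[!is_other], neighbor_labels[!is_other], sum)
predicted_label <- names(votes)[which.max(votes)]
total_w         <- sum(votes)   # sum of weights for in-group neighbors used
pred_w          <- max(votes)   # weight supporting the predicted class

# Outgroup neighbors that lie closer than the farthest in-group neighbor.
farthest_in_group    <- max(neighbor_dist[!is_other])
close_other          <- is_other & neighbor_dist < farthest_in_group
neighbors_other      <- sum(close_other)
other_weighted_votes <- sum(w[close_other])
```

In this toy case the outgroup neighbor at distance 0.3 is counted because it lies inside the in-group radius of 0.8, while the one at distance 1.2 is ignored.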