Weighted k-nearest neighbor with confidence estimate
weighted_knn_predict_with_conf.Rd
Usage
weighted_knn_predict_with_conf(
train_coords,
train_labels,
test_coords,
k,
epsilon = 0.1,
conf_threshold = NULL,
other_class = "Other",
verbose = FALSE,
use_weights = TRUE,
max_neighbors = 500
)
Arguments
- train_coords
Data frame of coordinates for labeled (training) samples. One row per sample, columns are features (typically UMAP V1, V2).
- train_labels
Character/factor vector of labels for training samples.
- test_coords
Data frame of coordinates for samples to classify (same columns/space as train_coords).
- k
Integer; number of neighbors to consider.
- epsilon
Numeric; small value added to distances before weighting (when use_weights = TRUE). Default: 0.1.
- conf_threshold
Optional numeric; minimum confidence for assigning a class. If provided and confidence < threshold, the sample is assigned other_class.
- other_class
Name of the outgroup class to treat specially when separate_other = TRUE. Default: "Other".
- verbose
Logical; print verbose info. Default: FALSE.
- use_weights
Logical; inverse-distance weights (1 / (d + epsilon)). If FALSE, neighbors contribute equally. Default: TRUE.
- max_neighbors
Integer; maximum neighbors to retrieve from the search index before trimming to k. Default: 500.
- ignore_self
Logical; drop a zero-distance self-neighbor. Default: TRUE.
- track_neighbors
Logical; append neighbor diagnostics to output. Default: TRUE.
- separate_other
Logical; when TRUE, exclude neighbors labeled other_class from the main weighted vote and report their influence separately (as other_* columns). Default: TRUE.
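The interaction of k, epsilon, use_weights and conf_threshold can be illustrated with a minimal base-R sketch for a single test point. This is not the package's implementation: weighted_vote_sketch() is a hypothetical helper, it assumes Euclidean distance and a brute-force search, and it ignores max_neighbors, ignore_self and separate_other.

```r
# Hypothetical helper illustrating an inverse-distance weighted k-NN vote.
# Assumptions: Euclidean distance, brute-force search, one test point.
weighted_vote_sketch <- function(train_coords, train_labels, test_point, k,
                                 epsilon = 0.1, conf_threshold = NULL,
                                 other_class = "Other", use_weights = TRUE) {
  train_mat <- as.matrix(train_coords)
  target <- as.numeric(unlist(test_point))

  # Distance from the test point to every training sample
  d <- sqrt(rowSums(sweep(train_mat, 2, target)^2))

  # Indices of the k nearest training samples
  nn <- order(d)[seq_len(min(k, length(d)))]

  # Inverse-distance weights 1 / (d + epsilon), or equal weights
  w <- if (use_weights) 1 / (d[nn] + epsilon) else rep(1, length(nn))

  # Sum the weights per label and pick the heaviest class
  votes <- tapply(w, as.character(train_labels[nn]), sum)
  best <- names(votes)[which.max(votes)]

  # Confidence = predicted class weight / total weight
  conf <- as.numeric(votes[best]) / sum(votes)

  # Below the threshold, fall back to other_class
  if (!is.null(conf_threshold) && conf < conf_threshold) best <- other_class

  list(predicted_label = best, confidence = conf)
}

train <- data.frame(V1 = c(0, 0, 3), V2 = c(0, 1, 0))
labs <- c("A", "A", "B")
weighted_vote_sketch(train, labs, c(0, 0.5), k = 3)
```

With use_weights = FALSE the same call reduces to a plain majority vote, so the confidence becomes the fraction of the k neighbors that carry the winning label.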
Value
Data frame with rows = test samples and columns:
- predicted_label
the predicted class
- confidence
predicted class weight / total weight
If track_neighbors = TRUE, additional columns:
- other_score
relative weight of outgroup vs predicted class
- neighbor_id
comma-separated neighbor sample IDs
- neighbor
comma-separated neighbor indices (in train order)
- distance
comma-separated neighbor distances
- label
comma-separated neighbor labels (in-group only if separate_other = TRUE)
- vote_labels
comma-separated unique labels contributing to weights
- weighted_votes
comma-separated weights per vote_labels
- neighbors_other
count of outgroup neighbors closer than the farthest in-group neighbor
- other_weighted_votes
sum of outgroup weights closer than the farthest in-group neighbor
- total_w
sum of weights for in-group neighbors used
- pred_w
weight supporting the predicted class
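Because several diagnostic columns are stored as comma-separated strings, they need to be split before further analysis. The sketch below shows one way to do that, assuming plain "," separators. The one-row data frame is made up to mimic the documented columns, and parse_votes() is a hypothetical helper rather than part of the package.

```r
# Made-up one-row result that mimics the documented output columns.
res <- data.frame(
  predicted_label = "A",
  vote_labels = "A,B",
  weighted_votes = "3.2,0.8",
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn the paired strings into a named numeric vector.
parse_votes <- function(labels_str, weights_str) {
  labs <- trimws(strsplit(labels_str, ",", fixed = TRUE)[[1]])
  w <- as.numeric(strsplit(weights_str, ",", fixed = TRUE)[[1]])
  stopifnot(length(labs) == length(w))
  stats::setNames(w, labs)
}

votes <- parse_votes(res$vote_labels[1], res$weighted_votes[1])

# Recompute confidence as predicted class weight / total weight
recomputed_conf <- votes[[res$predicted_label[1]]] / sum(votes)
```

If the recomputed value matches the documented definition, it should agree with pred_w / total_w for the same row; whether that holds exactly for a given package version is an assumption to verify against real output.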