Process KNN Vote Strings and Scores for Classification
process_votes.Rd
This function processes the raw neighbor label strings and weighted vote scores from k-nearest neighbor (KNN) classification results. It computes per-class neighbor counts and weighted scores, and identifies the top group by count and by score for each sample. The function also supports custom logic for handling the "Other" class, including vote multipliers and purity requirements. NOTE: This is a helper function and is not intended to be called directly by the user.
Usage
process_votes(
df,
raw_col = "label",
group_labels = c("EZB", "MCD", "ST2", "BN2", "N1", "Other"),
vote_labels_col = "vote_labels",
weighted_votes_col = "weighted_votes",
k,
other_vote_multiplier = 2,
score_purity_requirement = 1,
other_class = "Other",
optimize_for_other = TRUE,
debug = FALSE
)
Arguments
- df
Data frame containing kNN results, including columns with neighbor labels and weighted votes.
- raw_col
Name of the column containing the comma-separated neighbor labels (default: "label").
- group_labels
Character vector of all possible class labels to consider (default: c("EZB", "MCD", "ST2", "BN2", "N1", "Other")).
- vote_labels_col
Name of the column containing the comma-separated neighbor labels for weighted votes (default: "vote_labels").
- weighted_votes_col
Name of the column containing the comma-separated weighted votes (default: "weighted_votes").
- k
Number of neighbors used in kNN (required).
- other_vote_multiplier
Multiplier for the "Other" class when determining if a sample should be reclassified as "Other" (default: 2).
- score_purity_requirement
Minimum ratio of top group score to "Other" score to assign a sample to the top group (default: 1).
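The purity requirement described for score_purity_requirement can be read as a simple ratio test. The sketch below is one possible interpretation, not the package's actual code: apply_purity_rule() is a hypothetical helper, and the choice to fall back to the "Other" class when the ratio is too low is an assumption. other_vote_multiplier is left out because the description does not say whether it scales counts or scores.

```r
# Hypothetical helper illustrating a score-purity check; not part of the package.
# Assumption: a sample keeps its top group only when
#   top_score / other_score >= score_purity_requirement,
# and is otherwise assigned to `other_class`.
apply_purity_rule <- function(top_group, top_score, other_score,
                              score_purity_requirement = 1,
                              other_class = "Other") {
  # Ratio is Inf when other_score is 0 (no "Other" votes at all);
  # it is NaN when both scores are 0, in which case ifelse() returns NA.
  ratio <- top_score / other_score
  ifelse(ratio >= score_purity_requirement, top_group, other_class)
}

# Made-up scores for three samples
top_group   <- c("EZB", "MCD", "BN2")
top_score   <- c(2.4, 1.7, 0.6)
other_score <- c(0.2, 1.5, 0.9)

default_calls <- apply_purity_rule(top_group, top_score, other_score)
strict_calls  <- apply_purity_rule(top_group, top_score, other_score,
                                   score_purity_requirement = 1.5)
print(default_calls)  # "EZB" "MCD" "Other"
print(strict_calls)   # "EZB" "Other" "Other"
```

Raising the requirement from 1 to 1.5 moves the second sample to "Other", because its ratio (1.7 / 1.5) no longer clears the threshold.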
Value
Data frame with additional columns for per-class neighbor counts, scores, top group assignments, and summary statistics for each sample.
Details
Computes the number of neighbors for each class and the sum of weighted votes per class.
Identifies the top group by count and by weighted score, and applies custom logic for the "Other" class if present.
Adds columns for counts, scores, top group, top group score, score ratios, and optimized group assignments.
Designed for downstream use in DLBCLone and similar kNN-based classification workflows.
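The counting and scoring steps above can be illustrated with a minimal base R sketch. It is not the function's implementation: the toy data frame, the helper names (split_csv, count_labels, sum_scores) and the tie-breaking rule (first class in group_labels wins) are all assumptions made for illustration. Only the input column names and the default group_labels are taken from the usage shown above.

```r
group_labels <- c("EZB", "MCD", "ST2", "BN2", "N1", "Other")

# Toy stand-in for kNN results with k = 5 (values are made up)
df <- data.frame(
  label          = c("EZB,EZB,MCD,Other,EZB", "Other,Other,MCD,Other,MCD"),
  vote_labels    = c("EZB,EZB,MCD,Other,EZB", "Other,Other,MCD,Other,MCD"),
  weighted_votes = c("0.9,0.8,0.4,0.2,0.7",   "0.5,0.6,0.9,0.4,0.8"),
  stringsAsFactors = FALSE
)

# Split one comma-separated string into trimmed tokens
split_csv <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])

# Number of neighbors per class for one sample
count_labels <- function(label_string, levels) {
  labs <- split_csv(label_string)
  vapply(levels, function(lv) sum(labs == lv), integer(1))
}

# Sum of weighted votes per class for one sample
sum_scores <- function(label_string, vote_string, levels) {
  labs <- split_csv(label_string)
  w <- as.numeric(split_csv(vote_string))
  stopifnot(length(labs) == length(w))  # labels and votes must align
  vapply(levels, function(lv) sum(w[labs == lv]), numeric(1))
}

# One row per sample, one column per class
counts <- do.call(rbind, lapply(df$label, count_labels, levels = group_labels))
scores <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  sum_scores(df$vote_labels[i], df$weighted_votes[i], group_labels)
}))

# Top group by neighbor count and by weighted score (ties: first class wins)
top_by_count <- colnames(counts)[max.col(counts, ties.method = "first")]
top_by_score <- colnames(scores)[max.col(scores, ties.method = "first")]

print(top_by_count)  # "EZB" "Other"
print(top_by_score)  # "EZB" "MCD"
```

The second sample shows why both summaries are kept: "Other" has the most neighbors (3 of 5), but "MCD" has the larger weighted score (1.7 versus 1.5), so the two rankings disagree.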