Process KNN Vote Strings and Scores for Classification — process

This function processes the raw neighbor label strings and weighted vote scores from k-nearest neighbor (kNN) classification results. It computes per-class neighbor counts and weighted scores, and identifies the top group by count and by score for each sample. It also supports custom logic for handling the "Other" class, including vote multipliers and purity requirements. Note: This is a helper function and is not intended to be called directly by the user.

Usage

process_votes(
  df,
  raw_col = "label",
  group_labels = c("EZB", "MCD", "ST2", "BN2", "N1", "Other"),
  vote_labels_col = "vote_labels",
  weighted_votes_col = "weighted_votes",
  k,
  other_vote_multiplier = 2,
  score_purity_requirement = 1,
  other_class = "Other",
  optimize_for_other = TRUE,
  debug = FALSE
)

Arguments

df: Data frame containing kNN results, including columns with neighbor labels and weighted votes.
raw_col: Name of the column containing the comma-separated neighbor labels (default: "label").
group_labels: Character vector of all possible class labels to consider (default: c("EZB", "MCD", "ST2", "BN2", "N1", "Other")).
vote_labels_col: Name of the column containing the comma-separated neighbor labels for weighted votes (default: "vote_labels").
weighted_votes_col: Name of the column containing the comma-separated weighted votes (default: "weighted_votes").
k: Number of neighbors used in kNN (required).
other_vote_multiplier: Multiplier for the "Other" class when determining if a sample should be reclassified as "Other" (default: 2).
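score_purity_requirement: Minimum ratio of top group score to "Other" score to assign a sample to the top group (default: 1).
other_class: Label of the class treated as "Other" (default: "Other").
optimize_for_other: Logical; whether to apply the "Other"-specific optimization when assigning groups (default: TRUE).
debug: Logical; whether to print diagnostic output (default: FALSE).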
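
The ratio test described for score_purity_requirement can be illustrated with a minimal sketch. It reflects one reading of the argument description and is not taken from the package source. The helper `assign_by_purity` and the toy scores are hypothetical, and "top group" is assumed to mean the highest-scoring class other than "Other". The role of other_vote_multiplier is left out on purpose, because the description does not state whether it scales counts or scores.

```r
# Hypothetical helper (not part of the package): keep the top non-"Other"
# group only when its score is at least `score_purity_requirement` times
# the score of the "Other" class.
assign_by_purity <- function(scores, other_class = "Other",
                             score_purity_requirement = 1) {
  other_score <- if (other_class %in% names(scores)) scores[[other_class]] else 0
  non_other <- scores[names(scores) != other_class]
  top_group <- names(non_other)[which.max(non_other)]
  top_score <- non_other[[top_group]]
  # Treat the ratio as infinite when "Other" received no weighted votes
  ratio <- if (other_score > 0) top_score / other_score else Inf
  if (ratio >= score_purity_requirement) top_group else other_class
}

# Toy per-class scores for one sample (made-up values)
scores <- c(EZB = 2.4, MCD = 0.4, Other = 1.5)
assign_by_purity(scores)                                # ratio 1.6 >= 1, returns "EZB"
assign_by_purity(scores, score_purity_requirement = 2)  # ratio 1.6 < 2, returns "Other"
```

With the default requirement of 1, the top group only has to match the "Other" score. Raising the requirement makes the assignment stricter and sends more borderline samples to "Other".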

Value

Data frame with additional columns for per-class neighbor counts, scores, top group assignments, and summary statistics for each sample.

Details

Computes the number of neighbors for each class and the sum of weighted votes per class.
Identifies the top group by count and by weighted score, and applies custom logic for the "Other" class if present.
Adds columns for counts, scores, top group, top group score, score ratios, and optimized group assignments.
Designed for downstream use in DLBCLone and similar kNN-based classification workflows.
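
The counting and scoring steps listed above can be sketched for a single sample in base R. This is an illustration under stated assumptions, not the function's actual implementation: the input strings are made-up values, and the variable names (`labels_f`, `counts`, `scores`, `top_by_count`, `top_by_score`) are hypothetical rather than the output column names.

```r
# Toy inputs for one sample (made-up values, k = 5)
group_labels   <- c("EZB", "MCD", "ST2", "BN2", "N1", "Other")
vote_labels    <- "EZB,EZB,Other,MCD,EZB"
weighted_votes <- "0.9,0.8,0.5,0.4,0.7"

# Split the comma-separated strings into two aligned vectors
labels  <- trimws(strsplit(vote_labels, ",", fixed = TRUE)[[1]])
weights <- as.numeric(strsplit(weighted_votes, ",", fixed = TRUE)[[1]])
stopifnot(length(labels) == length(weights))

# A factor with fixed levels keeps classes that have no neighbors,
# so they appear with a count and score of zero
labels_f <- factor(labels, levels = group_labels)
counts <- vapply(split(labels, labels_f), length, integer(1))
scores <- vapply(split(weights, labels_f), sum, numeric(1))

# Top group by neighbor count and by summed weighted vote
# (which.max returns the first maximum, so ties go to the earlier label)
top_by_count <- names(counts)[which.max(counts)]
top_by_score <- names(scores)[which.max(scores)]
```

Fixing the factor levels to group_labels means every sample yields the same set of per-class values, which is what makes it possible to store them as regular data frame columns.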

Examples

# Example usage:
# result <- process_votes(knn_output_df, k = 7)
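
As a rough guide to the input layout the Arguments section describes, the sketch below builds a toy data frame and checks that every row carries k labels and k weighted votes. All values are made up, `sample_id` is a hypothetical identifier column, and the assumptions that each row lists exactly k neighbors and that the "label" column mirrors "vote_labels" are not confirmed by this page.

```r
k <- 5

# Toy kNN output (made-up values); column names follow the documented defaults
knn_output_df <- data.frame(
  sample_id      = c("S1", "S2"),  # hypothetical identifier column
  label          = c("EZB,EZB,Other,MCD,EZB", "Other,Other,N1,Other,BN2"),
  vote_labels    = c("EZB,EZB,Other,MCD,EZB", "Other,Other,N1,Other,BN2"),
  weighted_votes = c("0.9,0.8,0.5,0.4,0.7", "0.6,0.6,0.3,0.5,0.2"),
  stringsAsFactors = FALSE
)

# Count the comma-separated entries per row
n_labels <- lengths(strsplit(knn_output_df$vote_labels, ",", fixed = TRUE))
n_votes  <- lengths(strsplit(knn_output_df$weighted_votes, ",", fixed = TRUE))

# Pre-flight check: labels and votes must line up, k entries each
stopifnot(all(n_labels == k), all(n_votes == k))
```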