
This function processes the raw neighbor label strings and weighted vote scores from k-nearest neighbor (kNN) classification results. It computes per-class neighbor counts and weighted scores, and identifies the top group by count and by score for each sample. It also supports custom handling of the "Other" class, including vote multipliers and purity requirements. NOTE: This is a helper function and is not intended to be called directly by the user.

Usage

process_votes(
  df,
  raw_col = "label",
  group_labels = c("EZB", "MCD", "ST2", "BN2", "N1", "Other"),
  vote_labels_col = "vote_labels",
  weighted_votes_col = "weighted_votes",
  k,
  other_vote_multiplier = 2,
  score_purity_requirement = 1,
  other_class = "Other",
  optimize_for_other = TRUE,
  debug = FALSE
)

Arguments

df

Data frame containing kNN results, including columns with neighbor labels and weighted votes.

raw_col

Name of the column containing the comma-separated neighbor labels (default: "label").

group_labels

Character vector of all possible class labels to consider (default: c("EZB", "MCD", "ST2", "BN2", "N1", "Other")).

vote_labels_col

Name of the column containing the comma-separated neighbor labels for weighted votes (default: "vote_labels").

weighted_votes_col

Name of the column containing the comma-separated weighted votes (default: "weighted_votes").

k

Number of neighbors used in kNN (required).

other_vote_multiplier

Multiplier for the "Other" class when determining if a sample should be reclassified as "Other" (default: 2).

score_purity_requirement

Minimum ratio of top group score to "Other" score to assign a sample to the top group (default: 1).
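
As a rough illustration of the ratio described for score_purity_requirement, the sketch below keeps a sample in its top group only when the top group's score divided by the "Other" score meets the threshold. The vectors top_score, other_score, and top_group are made-up inputs, and the use of >= (rather than >) is an assumption; the function's actual rule, including how other_vote_multiplier and optimize_for_other interact with it, is not specified here.

```r
# Hypothetical per-sample summaries (not produced by process_votes itself).
top_group   <- c("EZB", "MCD", "ST2")
top_score   <- c(1.7, 0.9, 0.5)
other_score <- c(0.2, 0.7, 1.0)

score_purity_requirement <- 1
other_class <- "Other"

# Ratio of the top group's weighted score to the "Other" score.
score_ratio <- top_score / other_score

# Assumed rule: keep the top group when the ratio meets the requirement,
# otherwise fall back to the "Other" class.
assigned <- ifelse(score_ratio >= score_purity_requirement,
                   top_group, other_class)
print(data.frame(top_group, score_ratio, assigned))
```

With the default of 1, a sample falls back to "Other" under this assumed rule only when the "Other" score exceeds the top group's score.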

Value

Data frame with additional columns for per-class neighbor counts, scores, top group assignments, and summary statistics for each sample.

Details

  • Computes the number of neighbors for each class and the sum of weighted votes per class.

  • Identifies the top group by count and by weighted score, and applies custom logic for the "Other" class if present.

  • Adds columns for counts, scores, top group, top group score, score ratios, and optimized group assignments.

  • Designed for downstream use in DLBCLone and similar kNN-based classification workflows.
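
The first two steps above can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy data frame uses the documented default column names, while count_labels() and sum_scores() are hypothetical helpers introduced only for this example.

```r
# Toy kNN output using the documented default column names.
knn_df <- data.frame(
  label          = c("EZB,EZB,Other", "MCD,Other,Other"),
  vote_labels    = c("EZB,EZB,Other", "MCD,Other,Other"),
  weighted_votes = c("0.9,0.8,0.2", "0.9,0.4,0.3"),
  stringsAsFactors = FALSE
)
group_labels <- c("EZB", "MCD", "ST2", "BN2", "N1", "Other")

# Hypothetical helper: neighbor count per class for one label string.
count_labels <- function(label_string, classes) {
  labels <- trimws(strsplit(label_string, ",", fixed = TRUE)[[1]])
  as.integer(table(factor(labels, levels = classes)))
}

# Hypothetical helper: sum of weighted votes per class for one sample.
sum_scores <- function(label_string, vote_string, classes) {
  labels <- trimws(strsplit(label_string, ",", fixed = TRUE)[[1]])
  votes  <- as.numeric(strsplit(vote_string, ",", fixed = TRUE)[[1]])
  vapply(classes, function(cl) sum(votes[labels == cl]), numeric(1))
}

# One row per sample, one column per class.
counts <- t(vapply(knn_df$label, count_labels,
                   integer(length(group_labels)),
                   classes = group_labels, USE.NAMES = FALSE))
colnames(counts) <- group_labels

scores <- t(mapply(sum_scores, knn_df$vote_labels, knn_df$weighted_votes,
                   MoreArgs = list(classes = group_labels),
                   USE.NAMES = FALSE))

# Top group by neighbor count and by weighted score (first class wins ties).
top_by_count <- group_labels[max.col(counts, ties.method = "first")]
top_by_score <- group_labels[max.col(scores, ties.method = "first")]
```

In the second toy sample the two criteria disagree: "Other" has more neighbors, but "MCD" has the larger summed weight, which is the kind of case the "Other" handling options are meant to resolve.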

Examples

# Example usage:
# result <- process_votes(knn_output_df, k = 7)