
Searches for the purity threshold that maximizes accuracy when assigning samples either to their predicted class or to "Other", based on the score ratio in processed kNN vote results. The function tests a range of purity thresholds, keeping the predicted class when the score ratio meets or exceeds the threshold and assigning "Other" otherwise, and computes the accuracy at each threshold. It returns the best accuracy achieved and the corresponding purity threshold. NOTE: This is a helper function and is not intended to be called directly by the user.

Usage

optimize_purity(
  optimized_model_object,
  vote_df,
  mode,
  optimize_by = "balanced_accuracy",
  truth_column,
  all_classes = c("MCD", "EZB", "BN2", "N1", "ST2", "Other"),
  k,
  cap_classification_rate = 1,
  exclude_other_for_accuracy = FALSE,
  other_class = "Other",
  optimize_for_other = TRUE
)

Arguments

truth_column

Name of the column in processed_votes containing the true class labels.

processed_votes

Data frame output from process_votes, containing at least the columns for score ratio, by_score_opt, and the relevant prediction and truth columns.

prediction_column

Name of the column in processed_votes to update with the optimized prediction.

Value

A list with two elements: best_accuracy (numeric, the highest accuracy achieved) and best_purity_threshold (numeric, the threshold at which this accuracy was achieved).

Details

  • For each threshold from 0.1 to 0.95 (in steps of 0.05), the function updates the prediction column: a sample receives the class from by_score_opt if its score ratio meets or exceeds the threshold, and "Other" otherwise.

  • Accuracy is computed as the proportion of correct assignments, that is, the sum of the diagonal of the confusion matrix divided by the total number of samples.

  • The function is intended for use in optimizing classification purity in kNN-based workflows, especially when distinguishing between confident class assignments and ambiguous ("Other") cases.
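
The sweep described above can be illustrated with a minimal sketch. This is not the package's implementation: sweep_purity is a hypothetical helper, the toy data frame and its column names (score_ratio, by_score_opt, true_label) are assumptions made for illustration, and keeping the lowest threshold on ties is likewise an assumption.

```r
# Toy stand-in for processed kNN vote results (hypothetical data and column names).
votes <- data.frame(
  score_ratio  = c(0.92, 0.81, 0.52, 0.33, 0.22, 0.88),
  by_score_opt = c("MCD", "EZB", "BN2", "EZB", "MCD", "ST2"),
  true_label   = c("MCD", "EZB", "Other", "Other", "Other", "ST2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: try each threshold, relabel low-purity samples as
# "Other", and keep the threshold with the highest accuracy.
sweep_purity <- function(df,
                         thresholds = seq(0.1, 0.95, by = 0.05),
                         other_class = "Other") {
  # Shared factor levels keep the confusion matrix square.
  classes <- union(unique(c(df$by_score_opt, df$true_label)), other_class)
  best <- list(best_accuracy = -Inf, best_purity_threshold = NA_real_)

  for (thr in thresholds) {
    # Keep the predicted class only when the score ratio meets the threshold.
    pred <- ifelse(df$score_ratio >= thr, df$by_score_opt, other_class)

    # Accuracy = sum of the confusion-matrix diagonal / number of samples.
    cm <- table(
      factor(pred, levels = classes),
      factor(df$true_label, levels = classes)
    )
    acc <- sum(diag(cm)) / sum(cm)

    # Strict ">" keeps the lowest threshold on ties (an assumption).
    if (acc > best$best_accuracy) {
      best <- list(best_accuracy = acc, best_purity_threshold = thr)
    }
  }
  best
}

result <- sweep_purity(votes)
str(result)
```

The returned list mirrors the two elements described under Value, so the selected threshold can be read from result$best_purity_threshold.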

Examples

# Example usage:
if (FALSE) { # \dontrun{
result <- optimize_purity(processed_votes,
  prediction_column = "pred_label",
  truth_column = "true_label")
} # }