Optimize Purity Threshold for Classification Assignment
This function searches for the optimal purity threshold to assign samples to their predicted class or to "Other" based on the score ratio in processed kNN vote results. It iteratively tests a range of purity thresholds, updating the predicted class if the score ratio meets or exceeds the threshold, and computes the accuracy for each threshold. The function returns the best accuracy achieved and the corresponding purity threshold. NOTE: This is a helper function and is not intended to be called directly by the user.
Usage
optimize_purity(
optimized_model_object,
vote_df,
mode,
optimize_by = "balanced_accuracy",
truth_column,
all_classes = c("MCD", "EZB", "BN2", "N1", "ST2", "Other"),
k,
cap_classification_rate = 1,
exclude_other_for_accuracy = FALSE,
other_class = "Other",
optimize_for_other = TRUE
)
Arguments
- truth_column
Name of the column in processed_votes containing the true class labels.
- processed_votes
Data frame output from process_votes, containing at least the columns for score ratio, by_score_opt, and the relevant prediction and truth columns.
- prediction_column
Name of the column in processed_votes to update with the optimized prediction.
Value
A list with two elements: best_accuracy (numeric, the
highest accuracy achieved) and best_purity_threshold (numeric,
the threshold at which this accuracy was achieved).
Details
For each threshold in the range 0.1 to 0.95 (step 0.05), the function updates the prediction column to assign the class from by_score_opt if the score ratio meets or exceeds the threshold, and otherwise assigns "Other".
Accuracy is computed as the proportion of correct assignments (diagonal of the confusion matrix).
The function is intended for use in optimizing classification purity in kNN-based workflows, especially when distinguishing between confident class assignments and ambiguous ("Other") cases.
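The threshold sweep described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name sweep_purity, the column names truth and score_ratio, and the toy data are all assumptions made for illustration, and the sketch uses plain accuracy only, ignoring options such as optimize_by and cap_classification_rate.

```r
# Hypothetical processed votes; column names and values are illustrative only.
processed_votes <- data.frame(
  truth        = c("MCD", "EZB", "Other", "BN2", "Other", "EZB"),
  by_score_opt = c("MCD", "EZB", "MCD",   "BN2", "EZB",   "BN2"),
  score_ratio  = c(0.92,  0.72,  0.31,    0.87,  0.42,    0.57),
  stringsAsFactors = FALSE
)

# Hypothetical helper: try each threshold, relabel low-purity samples as
# `other_class`, and keep the threshold with the highest plain accuracy.
sweep_purity <- function(df,
                         thresholds = seq(0.1, 0.95, by = 0.05),
                         other_class = "Other") {
  accuracy <- vapply(thresholds, function(th) {
    # Keep the kNN call only when the score ratio meets or exceeds the threshold.
    predicted <- ifelse(df$score_ratio >= th, df$by_score_opt, other_class)
    # Proportion of correct assignments (diagonal of the confusion matrix / total).
    mean(predicted == df$truth)
  }, numeric(1))

  best <- which.max(accuracy)  # first threshold reaching the maximum
  list(
    best_accuracy = accuracy[best],
    best_purity_threshold = thresholds[best]
  )
}

result <- sweep_purity(processed_votes)
print(result)
```

In this toy data the low-ratio samples whose true label is "Other" are only classified correctly once the threshold rises above their score ratio, while raising it too far sends confident, correct calls to "Other" as well. The sweep picks the first threshold at which accuracy peaks.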