
Performs a post-hoc evaluation of classifying each sample as one of the main classes versus the outgroup (unclassified) label "Other". Using the ground truth in the true_labels vector, it returns the optimal threshold on other_score for classifying a sample as "Other". It evaluates classifier performance across a range of thresholds and returns the threshold that maximizes the specified metric (balanced accuracy or accuracy).
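The threshold search described above can be pictured with the following minimal sketch. It is a hypothetical helper, not the package's implementation. It assumes that a sample is relabelled as "Other" when its other_score is at or above the candidate threshold, and that candidate thresholds are the observed scores plus `Inf` (which means "never relabel"). The names `sweep_other_threshold` and `overall_accuracy` are illustrative only.

```r
# Hypothetical sketch, not the package's code: sweep candidate thresholds on
# other_score and keep the one that maximizes a metric.

# Overall accuracy: fraction of samples whose label matches the truth.
overall_accuracy <- function(pred, truth) {
  mean(pred == truth)
}

sweep_other_threshold <- function(predicted_labels, true_labels, other_score,
                                  metric = overall_accuracy,
                                  other_class = "Other",
                                  thresholds = c(sort(unique(other_score)), Inf)) {
  # Coerce to character so ifelse() does not return factor codes.
  predicted_labels <- as.character(predicted_labels)
  true_labels <- as.character(true_labels)
  scores <- vapply(thresholds, function(t) {
    # Assumption: samples scoring at or above t are relabelled as other_class.
    relabelled <- ifelse(other_score >= t, other_class, predicted_labels)
    metric(relabelled, true_labels)
  }, numeric(1))
  best <- which.max(scores)  # first threshold reaching the maximum
  list(threshold = thresholds[best], score = scores[best])
}

pred  <- c("MCD", "EZB", "BN2", "MCD")
truth <- c("MCD", "Other", "BN2", "Other")
score <- c(0.1, 0.8, 0.2, 0.6)
res <- sweep_other_threshold(pred, truth, score)
res$threshold  # 0.6
res$score      # 1
```

Passing the metric as a function keeps the sweep independent of whether accuracy or balanced accuracy is being maximized.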

Usage

optimize_outgroup(
  predicted_labels,
  true_labels,
  other_score,
  all_classes = c("MCD", "EZB", "BN2", "N1", "ST2", "Other"),
  maximize = "balanced_accuracy",
  exclude_other_for_accuracy = FALSE,
  cap_classification_rate = 1,
  verbose = FALSE,
  other_class = "Other"
)

Arguments

predicted_labels

Vector of predicted labels for the samples

true_labels

Vector of true labels for the samples

other_score

Vector of scores for the "Other" class for each sample

all_classes

Vector of classes to use for training and testing. Default: c("MCD", "EZB", "BN2", "N1", "ST2", "Other")

maximize

Metric to optimize. Either "accuracy" (overall accuracy across all samples) or "balanced_accuracy" (the mean of the per-class balanced accuracy values). Default: "balanced_accuracy"
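As a rough illustration of the two metrics, the sketch below computes overall accuracy and a mean of one-vs-rest balanced accuracies, where each class's balanced accuracy is (sensitivity + specificity) / 2. This follows the common definition of per-class balanced accuracy. It is an assumption that the package computes it the same way, and the function names are hypothetical.

```r
# Hypothetical sketch of the two metrics; not the package's implementation.

# "accuracy": fraction of all samples labelled correctly.
overall_accuracy <- function(pred, truth) mean(pred == truth)

# "balanced_accuracy": mean over classes of one-vs-rest balanced accuracy.
mean_balanced_accuracy <- function(pred, truth, classes = sort(unique(truth))) {
  per_class <- vapply(classes, function(cl) {
    is_member <- truth == cl
    # Sensitivity: share of true members of cl that were predicted as cl.
    sens <- if (any(is_member)) mean(pred[is_member] == cl) else NA_real_
    # Specificity: share of non-members that were not predicted as cl.
    spec <- if (any(!is_member)) mean(pred[!is_member] != cl) else NA_real_
    (sens + spec) / 2
  }, numeric(1))
  mean(per_class, na.rm = TRUE)
}

pred  <- c("A", "A", "B", "B")
truth <- c("A", "B", "B", "B")
overall_accuracy(pred, truth)        # 0.75
mean_balanced_accuracy(pred, truth)  # 5/6, about 0.833
```

Because each class contributes equally, balanced accuracy penalizes errors on small classes more heavily than overall accuracy does.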

exclude_other_for_accuracy

Set to TRUE to exclude the "Other" class from the 'lymphgen' column when calculating accuracy metrics (passed to DLBCLone_optimize_params). Default: FALSE

Value

a list of data frames with the predictions and the UMAP input

Details

NOTE: This function is generally not meant to be called directly; it is a helper function used by DLBCLone_optimize_params.