Fits a supervised Gaussian mixture model (GMM) to UMAP-projected data using a user-provided taxonomy of genetic subtypes, excluding samples labeled "Other". Assigns a class prediction to each sample and optionally reclassifies samples as "Other" based on probability and density thresholds. NOTE: This function is not part of the core KNN-based DLBCLone approach and is included mostly out of curiosity.

Usage

DLBCLone_train_mixture_model(
  umap_out,
  probability_threshold = 0.5,
  density_max_threshold = 0.05,
  truth_column = "lymphgen",
  cohort = NULL,
  truth_classes = c("EZB", "MCD", "ST2", "N1", "BN2", "Other")
)

Arguments

umap_out

List. Output from make_and_annotate_umap, containing a data frame with UMAP coordinates and truth labels.

probability_threshold

Numeric. Minimum posterior probability required to assign a class (default: 0.5).

density_max_threshold

Numeric. Threshold on the maximum per-class density: a sample is assigned a class only if its highest per-class density is at least this value (default: 0.05).

truth_column

Character. Name of the column holding the truth labels (default: "lymphgen").

cohort

Optional character. Cohort label to annotate predictions.

Value

A list with:

gaussian_mixture_model

Fitted MclustDA model object

predictions

Data frame with sample IDs, UMAP coordinates, true labels, predicted classes, and thresholded assignments

probability_threshold

Probability threshold used for "Other" assignment
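
As a rough illustration of how the predictions data frame might be summarised, the sketch below builds a small mock table and compares the raw and thresholded assignments. The values are invented, and the sample_id column name is an assumption; only lymphgen (the default truth column), DLBCLone_g, and DLBCLone_go are names taken from this page.

```r
# Mock stand-in for result$predictions; all values are invented.
# The sample_id column name is an assumption, not confirmed by the package.
predictions <- data.frame(
  sample_id   = paste0("S", 1:6),
  lymphgen    = c("EZB", "EZB", "MCD", "MCD", "ST2", "BN2"),
  DLBCLone_g  = c("EZB", "EZB", "MCD", "EZB", "ST2", "BN2"),
  DLBCLone_go = c("EZB", "Other", "MCD", "EZB", "ST2", "Other"),
  stringsAsFactors = FALSE
)

# Samples whose raw call was replaced by "Other" after thresholding
reclassified <- predictions$DLBCLone_g != predictions$DLBCLone_go
fraction_other <- mean(reclassified)

# Agreement with the truth labels among samples that kept their raw call
kept <- !reclassified
concordance <- mean(predictions$lymphgen[kept] == predictions$DLBCLone_go[kept])

# Cross-tabulate truth labels against thresholded assignments
confusion <- table(truth = predictions$lymphgen,
                   predicted = predictions$DLBCLone_go)
print(confusion)
```

With a real result, the same expressions could be applied to result$predictions, replacing lymphgen with whatever was passed as truth_column.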

Details

  • Uses MclustDA to fit a supervised mixture model to the UMAP coordinates (V1, V2) and class labels.

  • Predicts class membership and computes per-class densities for each sample.

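  • Samples whose maximum posterior probability falls below probability_threshold, or whose maximum per-class density falls below density_max_threshold, are reclassified as "Other".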

  • Returns both raw and thresholded class assignments, respectively under the columns DLBCLone_g and DLBCLone_go.
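
The steps above can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the package's implementation: threshold_to_other is a hypothetical helper, the coordinates are simulated rather than taken from a UMAP, and computing per-class densities with mclust::dens from each class-specific model is one plausible approach that may differ from what the function does internally. The mclust part only runs if the package is installed.

```r
# Hypothetical helper (not part of DLBCLone): apply both thresholds.
# post: n x K matrix of posterior probabilities, columns named by class
# dens: n x K matrix of per-class densities, same layout
threshold_to_other <- function(post, dens,
                               probability_threshold = 0.5,
                               density_max_threshold = 0.05) {
  raw <- colnames(post)[max.col(post, ties.method = "first")]
  max_prob <- apply(post, 1, max)
  max_dens <- apply(dens, 1, max)
  low <- max_prob < probability_threshold | max_dens < density_max_threshold
  data.frame(
    raw = raw,
    thresholded = ifelse(low, "Other", raw),
    stringsAsFactors = FALSE
  )
}

# Toy data: two separated 2-D clusters standing in for UMAP coordinates
set.seed(1)
coords <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2)
)
colnames(coords) <- c("V1", "V2")
labels <- rep(c("EZB", "MCD"), each = 50)

if (requireNamespace("mclust", quietly = TRUE)) {
  library(mclust)
  # Supervised mixture model: one Gaussian mixture per class
  fit <- MclustDA(coords, class = labels, verbose = FALSE)
  pred <- predict(fit, newdata = coords)
  post <- pred$z
  colnames(post) <- names(fit$models)
  # Assumption: evaluate each class-specific mixture density at every sample
  dens_mat <- sapply(fit$models, function(m) {
    dens(data = coords, modelName = m$modelName, parameters = m$parameters)
  })
  calls <- threshold_to_other(post, dens_mat)
  print(table(truth = labels, thresholded = calls$thresholded))
}
```

Keeping the thresholding in a separate helper makes it straightforward to see how different values of probability_threshold and density_max_threshold change the number of samples moved to "Other" without refitting the model.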

Examples

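if (FALSE) { # \dontrun{
  # umap_out is the list returned by make_and_annotate_umap
  result <- DLBCLone_train_mixture_model(umap_out)
  head(result$predictions)
  # Tabulate thresholded assignments, including samples moved to "Other"
  table(result$predictions$DLBCLone_go)
} # }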