Train a Gaussian Mixture Model for DLBCLone Classification — DLBCLone_train_mixture

Fits a supervised Gaussian mixture model (GMM) to UMAP-projected data using a user-provided taxonomy of genetic subtypes, excluding samples labeled "Other". Assigns class predictions and optionally reclassifies samples as "Other" based on probability and density thresholds. NOTE: This function is unrelated to the core KNN-based DLBCLone approach and is included mainly out of curiosity.

Usage

DLBCLone_train_mixture_model(
  umap_out,
  probability_threshold = 0.5,
  density_max_threshold = 0.05,
  truth_column = "lymphgen",
  cohort = NULL,
  truth_classes = c("EZB", "MCD", "ST2", "N1", "BN2", "Other")
)

Arguments

umap_out: List. Output from make_and_annotate_umap, containing a data frame with UMAP coordinates and truth labels.
probability_threshold: Numeric. Minimum posterior probability required to assign a class (default: 0.5).
density_max_threshold: Numeric. Minimum value that a sample's highest per-class density must reach for a class to be assigned (default: 0.05).
truth_column: Character. Name of the column that holds the truth labels (default: "lymphgen").
cohort: Optional character. Cohort label to annotate predictions.
truth_classes: Character vector. Class labels in the truth taxonomy (default: c("EZB", "MCD", "ST2", "N1", "BN2", "Other")).

Value

A list with:

gaussian_mixture_model: Fitted MclustDA model object
predictions: Data frame with sample IDs, UMAP coordinates, true labels, predicted classes, and thresholded assignments
probability_threshold: Probability threshold used for "Other" assignment

Details

Uses MclustDA to fit a supervised mixture model to the UMAP coordinates (V1, V2) and class labels.
Predicts class membership and computes per-class densities for each sample.
Samples whose maximum posterior probability or maximum density falls below the corresponding threshold are reclassified as "Other".
Returns both raw and thresholded class assignments in the columns DLBCLone_g and DLBCLone_go, respectively.
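
The thresholding step described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the posterior and density matrices are made-up values, the sample_id column name is hypothetical, and the use of strict "less than" comparisons is an assumption. Only the column names DLBCLone_g and DLBCLone_go and the two default thresholds come from this page.

```r
# Hypothetical posterior probabilities for four samples over three classes.
classes <- c("EZB", "MCD", "ST2")
posterior <- matrix(
  c(0.90, 0.05, 0.05,
    0.40, 0.35, 0.25,
    0.10, 0.80, 0.10,
    0.20, 0.20, 0.60),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("S", 1:4), classes)
)

# Hypothetical per-class densities for the same samples.
density <- matrix(
  c(0.30, 0.01, 0.01,
    0.08, 0.07, 0.05,
    0.01, 0.02, 0.01,
    0.02, 0.02, 0.10),
  ncol = 3, byrow = TRUE,
  dimnames = dimnames(posterior)
)

probability_threshold <- 0.5   # default documented above
density_max_threshold <- 0.05  # default documented above

# Highest posterior probability and highest density per sample.
max_prob <- unname(apply(posterior, 1, max))
max_dens <- unname(apply(density, 1, max))

# Raw call: the class with the highest posterior probability.
raw <- classes[max.col(posterior, ties.method = "first")]

# Thresholded call: fall back to "Other" when either value is too low
# (strict comparison is an assumption of this sketch).
low_confidence <- max_prob < probability_threshold |
  max_dens < density_max_threshold
thresholded <- ifelse(low_confidence, "Other", raw)

predictions <- data.frame(
  sample_id = rownames(posterior),  # hypothetical column name
  DLBCLone_g = raw,
  DLBCLone_go = thresholded,
  stringsAsFactors = FALSE
)
print(predictions)
```

In this toy data, S2 falls back to "Other" because its best posterior probability is below 0.5, and S3 falls back because its best density is below 0.05 even though its posterior probability is high.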

Examples

if (FALSE) { # \dontrun{
 result <- DLBCLone_train_mixture_model(umap_out)
 head(result$predictions)
} # }
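
Once predictions are available, the raw and thresholded calls can be compared against the truth labels. The sketch below uses a small mock data frame rather than real output, so it runs without the package. The sample_id column name and all values are hypothetical, and the truth column is assumed to be named after the default truth_column ("lymphgen").

```r
# Mock stand-in for result$predictions (all values are invented).
predictions <- data.frame(
  sample_id = paste0("S", 1:6),  # hypothetical column name
  lymphgen = c("EZB", "EZB", "MCD", "MCD", "ST2", "BN2"),
  DLBCLone_g = c("EZB", "MCD", "MCD", "MCD", "ST2", "BN2"),
  DLBCLone_go = c("EZB", "Other", "MCD", "MCD", "ST2", "Other"),
  stringsAsFactors = FALSE
)

# Cross-tabulate truth labels against the thresholded assignments.
confusion <- table(
  truth = predictions$lymphgen,
  predicted = predictions$DLBCLone_go
)
print(confusion)

# Fraction of samples reclassified as "Other" by the thresholds.
assigned <- predictions$DLBCLone_go != "Other"
other_rate <- mean(!assigned)

# Agreement with the truth labels among samples that kept a class.
accuracy_assigned <- mean(
  predictions$lymphgen[assigned] == predictions$DLBCLone_go[assigned]
)
```

Reporting the "Other" rate alongside the accuracy among assigned samples makes the effect of probability_threshold and density_max_threshold visible, since stricter thresholds typically trade coverage for agreement.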