Train a Gaussian Mixture Model for DLBCLone Classification
Fits a supervised Gaussian mixture model (GMM) to UMAP-projected data using a user-provided taxonomy of genetic subtypes, excluding samples labeled "Other". Assigns class predictions and optionally reclassifies samples as "Other" based on probability and density thresholds. NOTE: This function is unrelated to the core KNN-based DLBCLone approach and is included mainly as a curiosity.
Usage
DLBCLone_train_mixture_model(
  umap_out,
  probability_threshold = 0.5,
  density_max_threshold = 0.05,
  truth_column = "lymphgen",
  cohort = NULL,
  truth_classes = c("EZB", "MCD", "ST2", "N1", "BN2", "Other")
)
Arguments
- umap_out
List. Output from make_and_annotate_umap, containing a data frame with UMAP coordinates and truth labels.
- probability_threshold
Numeric. Minimum posterior probability required to assign a class (default: 0.5).
- density_max_threshold
Numeric. Minimum value that a sample's maximum per-class density must reach for a class to be assigned (default: 0.05).
- cohort
Optional character. Cohort label to annotate predictions.
Value
A list with:
- gaussian_mixture_model
Fitted MclustDA model object
- predictions
Data frame with sample IDs, UMAP coordinates, true labels, predicted classes, and thresholded assignments
- probability_threshold
Probability threshold used for "Other" assignment
Details
Uses MclustDA to fit a supervised mixture model to the UMAP coordinates (V1, V2) and class labels.
Predicts class membership and computes per-class densities for each sample.
Samples with low maximum probability or density are reclassified as "Other".
Returns both raw and thresholded class assignments, in the columns DLBCLone_g and DLBCLone_go respectively.
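The reclassification step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the posterior matrix `post`, the vector `max_density`, and the `sample_id` column are made-up stand-ins, and the use of strict `<` comparisons is an assumption. Only the two threshold arguments and the output column names DLBCLone_g and DLBCLone_go come from this page.

```r
# Hypothetical posterior probabilities: one row per sample, one column per class.
post <- matrix(
  c(0.90, 0.05, 0.05,
    0.25, 0.40, 0.35,
    0.05, 0.15, 0.80),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("EZB", "MCD", "ST2"))
)

# Hypothetical maximum per-class density for each sample.
max_density <- c(s1 = 0.20, s2 = 0.30, s3 = 0.01)

# Defaults documented for DLBCLone_train_mixture_model().
probability_threshold <- 0.5
density_max_threshold <- 0.05

# Raw assignment: the class with the highest posterior probability.
raw_class <- colnames(post)[max.col(post, ties.method = "first")]
max_prob <- apply(post, 1, max)

# Thresholded assignment: fall back to "Other" when either the top
# probability or the top density is below its threshold (strict "<" assumed).
low_confidence <- max_prob < probability_threshold |
  max_density < density_max_threshold
thresholded <- unname(ifelse(low_confidence, "Other", raw_class))

result <- data.frame(
  sample_id = rownames(post),
  DLBCLone_g = raw_class,
  DLBCLone_go = thresholded,
  stringsAsFactors = FALSE
)
print(result)
```

In this toy input, s2 falls back to "Other" because its top probability (0.40) is below 0.5, and s3 because its density (0.01) is below 0.05, while s1 keeps its raw class.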
Examples
if (FALSE) { # \dontrun{
result <- DLBCLone_train_mixture_model(umap_out)
head(result$predictions)
} # }
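Once predictions are available, the thresholded calls can be compared against the truth labels. The sketch below uses a small mock data frame in place of `result$predictions`; the `sample_id` column name and the assumption that the truth labels sit in a column named after `truth_column` (here "lymphgen") are illustrative guesses, not confirmed by this page.

```r
# Mock stand-in for result$predictions (values are invented for illustration).
predictions <- data.frame(
  sample_id   = paste0("s", 1:6),
  lymphgen    = c("EZB", "EZB", "MCD", "MCD", "ST2", "ST2"),
  DLBCLone_g  = c("EZB", "EZB", "MCD", "EZB", "ST2", "ST2"),
  DLBCLone_go = c("EZB", "Other", "MCD", "Other", "ST2", "ST2"),
  stringsAsFactors = FALSE
)

# Cross-tabulate truth labels against thresholded assignments.
confusion <- table(
  truth = predictions$lymphgen,
  predicted = predictions$DLBCLone_go
)
print(confusion)

# Accuracy among samples that kept a class after thresholding,
# and the fraction of samples reassigned to "Other".
assigned <- predictions$DLBCLone_go != "Other"
accuracy_assigned <- mean(
  predictions$DLBCLone_go[assigned] == predictions$lymphgen[assigned]
)
fraction_other <- mean(!assigned)
```

Reporting the fraction sent to "Other" alongside accuracy is useful because raising either threshold tends to trade coverage for accuracy on the samples that remain classified.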