Package index • GAMBLR.predict

DLBCLone: Data pre-processing

assemble_genetic_features(): Assemble genetic features for UMAP input

DLBCLone: UMAP

Functions for generating and annotating UMAP visualizations.

make_and_annotate_umap(): Run UMAP and attach result to metadata

make_umap_scatterplot(): Make UMAP scatterplot

basic_umap_scatterplot(): Basic UMAP Scatterplot

DLBCLone: Optimizing parameters

Tools for feature selection, model optimization and QC.

DLBCLone_optimize_params(): Optimize parameters for classifying samples using UMAP and k-nearest neighbor

posthoc_feature_enrichment(): Determine feature enrichment per class using truth or predicted labels

make_alluvial(): Create an Alluvial Plot Comparing Original and Predicted Classifications

DLBCLone_summarize_model(): Summarize and Export DLBCLone Model Results

DLBCLone_ensemble_postprocess(): Post-process KNN results across K to score consistency, optionally refine per-class classified/Other cutoffs, and optionally assign composite classes

DLBCLone: Model persistence

DLBCLone_save_optimized(): Save a DLBCLone model (and optionally integrity test embeddings)

DLBCLone_load_optimized(): Load a previously saved DLBCLone model (including UMAP state)

DLBCLone_activate(): Activate a DLBCLone model by embedding its training set once

DLBCLone: Classifying samples

Functions for applying trained DLBCLone models to new samples and generating visual summaries of prediction confidence and neighborhood relationships.

DLBCLone_predict(): Predict DLBCL genetic subgroup for one or more samples using a pre-trained DLBCLone model

make_neighborhood_plot(): Make Neighborhood Plot

nearest_neighbor_heatmap(): Heatmap visualization of mutations in nearest neighbors for a sample

DLBCLone: K-Nearest Neighbors

Functions to train and predict using KNN in high-dimensional space instead of UMAP. Not part of the core DLBCLone functionality.

DLBCLone_KNN(): Run DLBCLone KNN Classification

DLBCLone_KNN_predict(): Predict DLBCLone Classes for New Samples Using a Trained KNN Model

DLBCLone: Gaussian Mixture Model

Functions to train and predict using Gaussian mixture models in UMAP space.

DLBCLone_train_mixture_model(): Train a Gaussian Mixture Model for DLBCLone Classification

DLBCLone_predict_mixture_model(): Predict DLBCLone Class Membership Using a Trained Gaussian Mixture Model

Other

All remaining GAMBLR.predict functions not listed above.

DLBCLone_shiny(): Run the DLBCLone Shiny App

DLBCLone_train_test_plot(): Plot the result of a DLBCLone classification

RFmodel_BL: BL Classifier model.

RFmodel_FL: FL Classifier model.

RFmodel_Lacy: DLBCL Classifier model.

chapuy_features: Features for DLBCL grouping by Chapuy method.

check_for_missing_features(): Check matrix against missing features.

classify_bl(): Classify BL samples into genetic subgroups.

classify_dlbcl(): Classify DLBCLs according to genetic subgroups.

classify_dlbcl_chapuy(): Classify DLBCLs according to genetic subgroups of Chapuy et al.

classify_dlbcl_lacy(): Classify DLBCLs according to genetic subgroups of Lacy et al.

classify_dlbcl_lymphgenerator(): Construct LymphGenerator matrix

classify_fl(): Classify FL samples into cFL/dFL subgroups.

complete_missing_from_matrix(): Complete samples missing from matrix.

construct_reduced_winning_version(): Construct reduced 21-dimension feature vector for DLBCLass

flatten_feature(): Flatten feature

handle_genome_build(): Harmonize different flavors of genome builds.

lacy_features: Features for DLBCL grouping by Lacy method.

lymphgenerator_features: Features for DLBCL grouping by unified method.

massage_matrix_for_clustering(): Prepare a binary feature matrix for use as NMF input. For genes with both SSM and CNV features, the two are merged into a single feature named GeneName-MUTorAMP or GeneName-MUTorLOSS, so CNV features in the input data frame are expected to be named GeneName_AMP or GeneName_LOSS. For genes with hotspot mutations, labelled in the input data as GeneNameHOTSPOT, the hotspot feature takes precedence: the SSM feature (with or without CNV) is set to 0 for that sample. This naming scheme matters because the function uses regular expressions to search for exactly these patterns. Finally, any features explicitly provided to be dropped are removed, followed by features that do not meet the specified minimum frequency and any samples with 0 features. Consistent with NMF input, each row of the input data frame is a feature and each column is a sample. The input is expected to be numeric 1/0 with row and column names.

optimize_outgroup(): Optimize the threshold for classifying samples as "Other"

optimize_purity(): Optimize Purity Threshold for Classification Assignment

process_votes(): Process KNN Vote Strings and Scores for Classification

report_accuracy(): Calculate Classification Accuracy and Per-Class Metrics based on Predictions

summarize_all_ssm_status(): Summarize SSM (Simple Somatic Mutation) Status Across Samples

tabulate_ssm_status(): Get Coding SSM Status.

weighted_knn_predict_with_conf(): Weighted k-nearest neighbor with confidence estimate
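
The massage_matrix_for_clustering() entry above describes a multi-step transformation, which may be easier to follow on a toy matrix. The base-R sketch below is not the GAMBLR.predict implementation and does not use the package's arguments. The helper name sketch_massage_matrix(), its parameters drop_features and min_freq, and the reading of "minimal frequency" as a fraction of samples are all assumptions made for illustration.

```r
# Hypothetical helper: a rough re-creation of the steps described for
# massage_matrix_for_clustering(). Not the package's own code.
# mat: numeric 0/1 matrix, rows = features, columns = samples.
sketch_massage_matrix <- function(mat, drop_features = character(0), min_freq = 0) {
  stopifnot(is.matrix(mat), !is.null(rownames(mat)), !is.null(colnames(mat)))

  # Step 1: merge each SSM row with its matching CNV row.
  # Simplification: a gene is merged with at most one CNV type (AMP first).
  for (suffix in c("AMP", "LOSS")) {
    cnv_rows <- grep(paste0("_", suffix, "$"), rownames(mat), value = TRUE)
    for (cnv in cnv_rows) {
      gene <- sub(paste0("_", suffix, "$"), "", cnv)
      if (gene %in% rownames(mat)) {
        merged <- pmax(mat[gene, ], mat[cnv, ])
        mat <- mat[!rownames(mat) %in% c(gene, cnv), , drop = FALSE]
        mat <- rbind(mat, merged)
        rownames(mat)[nrow(mat)] <- paste0(gene, "-MUTor", suffix)
      }
    }
  }

  # Step 2: where a hotspot is present, zero the gene's other SSM/CNV feature.
  for (hs in grep("HOTSPOT$", rownames(mat), value = TRUE)) {
    gene <- sub("HOTSPOT$", "", hs)
    related <- intersect(
      rownames(mat),
      c(gene, paste0(gene, "-MUTorAMP"), paste0(gene, "-MUTorLOSS"))
    )
    hit <- mat[hs, ] == 1
    mat[related, hit] <- 0
  }

  # Step 3: drop explicitly listed features.
  mat <- mat[!rownames(mat) %in% drop_features, , drop = FALSE]

  # Step 4: drop rare features (assumed here: fraction of samples with the feature).
  mat <- mat[rowMeans(mat) >= min_freq, , drop = FALSE]

  # Step 5: drop samples left with no features.
  mat[, colSums(mat) > 0, drop = FALSE]
}

# Toy input: 5 features x 4 samples, named following the documented scheme.
toy <- matrix(
  c(1, 0, 0, 0,   # MYC
    0, 1, 0, 0,   # MYC_AMP
    1, 1, 0, 0,   # EZH2
    1, 0, 0, 0,   # EZH2HOTSPOT
    0, 0, 1, 0),  # TP53
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("MYC", "MYC_AMP", "EZH2", "EZH2HOTSPOT", "TP53"),
    c("S1", "S2", "S3", "S4")
  )
)

result <- sketch_massage_matrix(toy, drop_features = "TP53", min_freq = 0.25)
print(result)
```

In this toy run, MYC and MYC_AMP collapse into MYC-MUTorAMP, the EZH2 call in S1 is zeroed in favour of EZH2HOTSPOT, TP53 is dropped on request, and S3 and S4 are removed because no features remain for them.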