DLBCLone: Data pre-processing

assemble_genetic_features()
Assemble genetic features for UMAP input

DLBCLone: UMAP

Functions for generating and annotating UMAP visualizations.

make_and_annotate_umap()
Run UMAP and attach result to metadata
make_umap_scatterplot()
Make UMAP scatterplot
basic_umap_scatterplot()
Basic UMAP Scatterplot

DLBCLone: Optimizing parameters

Tools for feature selection, model optimization and QC.

DLBCLone_optimize_params()
Optimize parameters for classifying samples using UMAP and k-nearest neighbor
posthoc_feature_enrichment()
Determine feature enrichment per class using truth or predicted labels
make_alluvial()
Create an Alluvial Plot Comparing Original and Predicted Classifications
DLBCLone_summarize_model()
Summarize and Export DLBCLone Model Results
DLBCLone_ensemble_postprocess()
Post-process KNN results across values of K to score consistency, optionally refine the classified/Other cutoffs per class, and optionally assign composite classes

DLBCLone: Model persistence

DLBCLone_save_optimized()
Save a DLBCLone model (and optionally integrity test embeddings)
DLBCLone_load_optimized()
Load a previously saved DLBCLone model (including UMAP state)
DLBCLone_activate()
Activate a DLBCLone model by embedding its training set once

DLBCLone: Classifying samples

Functions for applying trained DLBCLone models to new samples and generating visual summaries of prediction confidence and neighborhood relationships.

DLBCLone_predict()
Predict DLBCL genetic subgroup for one or more samples using a pre-trained DLBCLone model
make_neighborhood_plot()
Make Neighborhood Plot
nearest_neighbor_heatmap()
Heatmap visualization of mutations in nearest neighbors for a sample

DLBCLone: K-Nearest Neighbors

Functions to train and predict using KNN in high-dimensional space instead of UMAP. Not part of the core DLBCLone functionality.

DLBCLone_KNN()
Run DLBCLone KNN Classification
DLBCLone_KNN_predict()
Predict DLBCLone Classes for New Samples Using a Trained KNN Model

DLBCLone: Gaussian Mixture Model

Functions to train and predict using Gaussian mixture models in UMAP space.

DLBCLone_train_mixture_model()
Train a Gaussian Mixture Model for DLBCLone Classification
DLBCLone_predict_mixture_model()
Predict DLBCLone Class Membership Using a Trained Gaussian Mixture Model

Other

All remaining GAMBLR.predict functions not listed above.

DLBCLone_shiny()
Run the DLBCLone Shiny App
DLBCLone_train_test_plot()
Plot the result of a DLBCLone classification
RFmodel_BL
BL Classifier model.
RFmodel_FL
FL Classifier model.
RFmodel_Lacy
DLBCL Classifier model.
chapuy_features
Features for DLBCL grouping by Chapuy method.
check_for_missing_features()
Check matrix against missing features.
classify_bl()
Classify BL samples into genetic subgroups.
classify_dlbcl()
Classify DLBCLs according to genetic subgroups.
classify_dlbcl_chapuy()
Classify DLBCLs according to genetic subgroups of Chapuy et al.
classify_dlbcl_lacy()
Classify DLBCLs according to genetic subgroups of Lacy et al.
classify_dlbcl_lymphgenerator()
Construct LymphGenerator matrix
classify_fl()
Classify FL samples into cFL/dFL subgroups.
complete_missing_from_matrix()
Complete samples missing from matrix.
construct_reduced_winning_version()
Construct reduced 21-dimensional feature vector for DLBCLass
flatten_feature()
Flatten feature
handle_genome_build()
Harmonize different flavors of genome builds.
lacy_features
Features for DLBCL grouping by Lacy method.
lymphgenerator_features
Features for DLBCL grouping by unified method.
massage_matrix_for_clustering()
Prepare a binary feature matrix for use as NMF input. Genes that have both an SSM feature and a CNV feature are merged into a single feature named GeneName-MUTorAMP or GeneName-MUTorLOSS; for this to work, CNV features in the input data frame must be named GeneName_AMP or GeneName_LOSS. For genes with hotspot mutations, labelled in the input as GeneNameHOTSPOT, the hotspot feature takes precedence: in any sample carrying the hotspot, that gene's SSM feature (with or without CNV) is set to 0. Because the function locates these features with regular expressions, the naming scheme described here is required. Finally, any features explicitly listed for dropping are removed, then features below the specified minimum frequency are removed, and then any samples with no remaining features are removed. As is usual for NMF input, each row of the data frame is a feature and each column is a sample. Values must be numeric 1/0, with row and column names.
optimize_outgroup()
Optimize the threshold for classifying samples as "Other"
optimize_purity()
Optimize Purity Threshold for Classification Assignment
process_votes()
Process KNN Vote Strings and Scores for Classification
report_accuracy()
Calculate Classification Accuracy and Per-Class Metrics based on Predictions
summarize_all_ssm_status()
Summarize SSM (Simple Somatic Mutation) Status Across Samples
tabulate_ssm_status()
Get Coding SSM Status.
weighted_knn_predict_with_conf()
Weighted k-nearest neighbor with confidence estimate
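The description of massage_matrix_for_clustering() lists a fixed sequence of steps. The Python sketch below mirrors that sequence on a small matrix stored as nested dictionaries (rows are features, columns are samples). It is not the package's R implementation: the name massage_matrix_sketch, the dictionary input, and the reading of the minimum frequency as a fraction of samples are assumptions made for illustration.

```python
import re


def massage_matrix_sketch(matrix, drop_features=(), min_freq=0.0):
    """Hypothetical sketch. matrix: {feature: {sample: 0 or 1}}, every row
    covering the same samples. min_freq is ASSUMED to be a fraction of samples."""
    # Work on a copy so the caller's matrix is not modified
    m = {feat: dict(vals) for feat, vals in matrix.items()}
    samples = sorted({s for vals in matrix.values() for s in vals})

    # 1. Squish SSM + CNV: GENE with GENE_AMP -> GENE-MUTorAMP, GENE_LOSS -> GENE-MUTorLOSS
    merged_genes = set()
    for feat in list(m):
        match = re.fullmatch(r"(.+)_(AMP|LOSS)", feat)
        if match and match.group(1) in m:
            gene, cnv = match.groups()
            ssm = m[gene]
            cnv_row = m.pop(feat)
            m[f"{gene}-MUTor{cnv}"] = {s: int(ssm[s] or v) for s, v in cnv_row.items()}
            merged_genes.add(gene)
    for gene in merged_genes:
        del m[gene]  # the plain SSM row is replaced by the squished feature(s)

    # 2. Hotspot preference: where GENEHOTSPOT is 1, zero that gene's SSM-derived features
    for feat in list(m):
        match = re.fullmatch(r"(.+)HOTSPOT", feat)
        if not match:
            continue
        gene = match.group(1)
        for related in (gene, f"{gene}-MUTorAMP", f"{gene}-MUTorLOSS"):
            if related in m:
                for sample, hit in m[feat].items():
                    if hit:
                        m[related][sample] = 0

    # 3. Drop explicitly requested features
    for feat in drop_features:
        m.pop(feat, None)

    # 4. Drop features below the minimum frequency
    n = len(samples)
    m = {f: v for f, v in m.items() if n and sum(v.values()) / n >= min_freq}

    # 5. Drop samples that have no remaining features
    keep = [s for s in samples if any(v.get(s, 0) for v in m.values())]
    return {f: {s: v[s] for s in keep} for f, v in m.items()}


if __name__ == "__main__":
    example = {
        "EZH2":        {"s1": 1, "s2": 0, "s3": 0, "s4": 0},
        "EZH2HOTSPOT": {"s1": 1, "s2": 0, "s3": 0, "s4": 0},
        "REL":         {"s1": 0, "s2": 1, "s3": 0, "s4": 0},
        "REL_AMP":     {"s1": 0, "s2": 0, "s3": 1, "s4": 0},
        "TP53":        {"s1": 0, "s2": 0, "s3": 0, "s4": 0},
    }
    print(massage_matrix_sketch(example, min_freq=0.2))
```

In this example REL and REL_AMP become a single REL-MUTorAMP feature, the EZH2 SSM is zeroed in favour of EZH2HOTSPOT and then falls below the frequency cutoff along with TP53, and sample s4 is removed because no features remain for it. The order of the steps matters: applying the hotspot rule before frequency filtering is what lets a gene whose mutations are all hotspots disappear from the SSM rows.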