DLBCLone Models
Preamble
For future use and to facilitate reproducibility, you can save the output of DLBCLone_optimize_params (i.e. a DLBCLone model) and restore it in a subsequent session.
DLBCLone_save_optimized
To store your model, specify the directory and a name prefix that will be incorporated into the file names. A saved model has two separate components, a .rds file and a .uwot file.
DLBCLone_save_optimized( # saving the output of DLBCLone_optimize_params
  DLBCLone_model = best_opt_model,
  base_path = "../data/saved_tutorial_model",
  name_prefix = "best_opt_model"
)
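After saving, it can be useful to confirm that both components were written. The sketch below uses only base R. `saved_model_files` is a hypothetical helper, not part of GAMBLR.predict, and it assumes only that the name prefix appears somewhere in each file name, as described above.

```r
# Hypothetical helper (not part of GAMBLR.predict): list the files in
# `base_path` whose names contain `name_prefix` as a literal string.
saved_model_files <- function(base_path, name_prefix) {
  files <- list.files(base_path)
  files[grepl(name_prefix, files, fixed = TRUE)]
}

found <- saved_model_files("../data/saved_tutorial_model", "best_opt_model")

# Extensions of the matching files; a saved model should contribute
# one "rds" and one "uwot" entry.
tools::file_ext(found)
```

If the directory does not exist or holds no matching files, `found` is simply an empty character vector.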
DLBCLone_load_optimized
To load a stored model, supply the directory and the name prefix that were used when it was saved.
loaded_model <- DLBCLone_load_optimized( # loading DLBCLone_optimize_params
  path = "../data/saved_tutorial_model",
  name_prefix = "best_opt_model"
)

default_model <- "best_opt_w"
# Load your stored optimized model from DLBCLone_save_optimized()
all_features_optimized <- DLBCLone_load_optimized(
  dirname(
    system.file(
      "extdata/models",
      "best_opt_model.rds",
      package = "GAMBLR.predict"
    )
  ),
  default_model
)
Key components of a DLBCLone model
DLBCLone models are simply named lists that keep track of all the objects required by downstream functions such as DLBCLone_predict. Key components you should be aware of:
- features: the feature matrix for the training data, which is used to create the UMAP model and to determine the coordinates of the training samples in UMAP space. It contains all the original mutation features plus (where applicable) any meta-features. The latter are recognizable by their names, which all end in _feats.
- model: the actual UMAP model, generated by the uwot package.
- predictions: the original predicted labels for every training sample, as determined during DLBCLone_optimize_params.
Let’s take a peek at what’s in the features for this model. As you will see, this model used 5 meta-features, one for each of ST2, N1, EZB, MCD and BN2.
colnames(all_features_optimized$features)
 [1] "ACTB" "ACTG1" "BCL10" "BCL2" "BCL2L1"
[6] "BCL2_SV" "BCL6" "BCL6_SV" "BIRC3" "BRAF"
[11] "BTG1" "BTG2" "BTK" "CD19" "CD70"
[16] "CD79B" "CD83" "CDKN2A" "CREBBP" "DDX3X"
[21] "DTX1" "DUSP2" "EDRF1" "EIF4A2" "EP300"
[26] "ETS1" "ETV6" "EZH2" "FAS" "FCGR2B"
[31] "FOXC1" "FOXO1" "GNA13" "GRHPR" "HLA-A"
[36] "HLA-B" "HNRNPD" "IRF4" "IRF8" "ITPKB"
[41] "JUNB" "KLF2" "KLHL14" "KLHL6" "KMT2D"
[46] "MEF2B" "MPEG1" "MYD88" "MYD88HOTSPOT" "NFKBIA"
[51] "NFKBIE" "NFKBIZ" "NOL9" "NOTCH1" "NOTCH2"
[56] "OSBPL10" "PIM1" "PIM2" "PRDM1" "PRDM15"
[61] "PRKDC" "PRRC2C" "PTPN1" "RFTN1" "S1PR2"
[66] "SETD1B" "SGK1" "SOCS1" "SPEN" "STAT3"
[71] "STAT6" "TBCC" "TBL1XR1" "TET2" "TMEM30A"
[76] "TMSB4X" "TNFAIP3" "TNFRSF14" "TOX" "TP53"
[81] "TP73" "UBE2A" "WEE1" "XBP1" "ZFP36L1"
[86] "st2_feats" "n1_feats" "ezb_feats" "mcd_feats" "bn2_feats"
[1] "00-14595_tumorC" "00-15201_tumorA" "00-15201_tumorB"
[4] "00-17960_CLC01670" "FL1015T2" "00-22011_tumorB"
                  ACTB ACTG1 BCL10 BCL2 BCL2L1 BCL2_SV BCL6 BCL6_SV BIRC3 BRAF
00-14595_tumorC      0     0     0    2      0       2    2       2     0    0
00-15201_tumorA      0     0     0    0      0       0    2       0     0    0
00-15201_tumorB      0     0     0    0      0       0    0       0     0    0
00-17960_CLC01670    2     0     0    1      0       2    0       2     0    0
FL1015T2             0     0     0    0      0       2    0       2     0    0
00-22011_tumorB      0     0     0    0      0       0    0       0     0    0
00-23442_tumorB      0     0     0    0      0       2    0       0     0    0
00-26427_tumorA      0     0     0    0      0       2    0       0     0    0
00-26427_tumorC      0     0     0    0      0       2    0       0     0    0
FL1002T2             0     0     0    2      0       2    0       0     0    0
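Because meta-features are identified purely by their `_feats` suffix, they can be separated from the gene-level features with a pattern match on the column names. The sketch below works on a short vector of names copied from the column names listed above, so it runs without the package; the variable names are illustrative only.

```r
# A subset of the column names shown above, for illustration
feature_names <- c("ACTB", "BCL2", "TP53",
                   "st2_feats", "n1_feats", "ezb_feats",
                   "mcd_feats", "bn2_feats")

# Meta-features are the columns whose names end in "_feats"
is_meta <- grepl("_feats$", feature_names)
meta_features <- feature_names[is_meta]
gene_features <- feature_names[!is_meta]

meta_features
length(meta_features)  # 5
```

With a loaded model, the same pattern could be applied to `colnames(all_features_optimized$features)` instead of the hand-written vector.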
DLBCLone_predict is a versatile function: you can use it to cross-reference your training data, predict one test sample at a time, or predict all of your test samples at once. Here we will just be re-analyzing training samples. There are only two required arguments:
- mutation_status: a data frame with sample_id as row names and feature names as columns. This is the feature matrix you want to predict classes for. You can subset it to contain only the rows for the samples you want analyzed, but this isn’t necessary.
- optimized_model: the DLBCLone model you want to use, for example the output of DLBCLone_optimize_params or a model you’ve re-loaded from disk via DLBCLone_load_optimized.
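To make the expected shape of `mutation_status` concrete, here is a minimal sketch that builds a small data frame with sample IDs as row names and then subsets it to the samples of interest. The values are copied from the training-data preview earlier in this section and restricted to three columns for brevity. The commented-out call uses only the two argument names given above; it is presumably necessary for a real input to carry every feature column the model was trained on, which is an assumption here rather than something this sketch demonstrates.

```r
# Toy feature matrix: sample IDs as row names, features as columns.
# Values are taken from the preview of the training features above.
mutation_status <- data.frame(
  ACTB = c(0, 0, 2),
  BCL2 = c(2, 0, 1),
  BCL6 = c(2, 2, 0),
  row.names = c("00-14595_tumorC", "00-15201_tumorA", "00-17960_CLC01670"),
  check.names = FALSE
)

# Optional: keep only the rows for the samples you want analyzed.
# drop = FALSE keeps the result a data frame even for a single sample.
samples_of_interest <- c("00-14595_tumorC", "00-17960_CLC01670")
to_predict <- mutation_status[samples_of_interest, , drop = FALSE]

# With GAMBLR.predict loaded and a model in `all_features_optimized`,
# the call would pass the two required arguments, e.g.:
# predictions <- DLBCLone_predict(
#   mutation_status = to_predict,
#   optimized_model = all_features_optimized
# )
```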
Activating a model
If you plan on classifying more than one sample in a session, activate the model first so that repeated predictions run faster. Activation simply pre-computes the coordinates of all training samples using the model and stores them for re-use.
# Reactivates stored model setting the UMAP projection
all_features_optimized <- DLBCLone_activate(
  all_features_optimized,
  force = TRUE
)
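The general idea behind activation, computing something expensive once and re-using the stored result, can be illustrated with a small base R sketch. This is not how DLBCLone_activate is implemented. `project_training`, `activate` and `cached_coords` are hypothetical names, the "projection" is a trivial placeholder, and treating `force = TRUE` as "recompute even if a stored result exists" is an assumption.

```r
# Illustration only: count how often the "expensive" step runs.
n_projections <- 0
project_training <- function(features) {
  n_projections <<- n_projections + 1
  rowSums(features)  # placeholder standing in for real UMAP coordinates
}

# Hypothetical activation: store the projection inside the model list and
# recompute it only if it is missing or `force` is TRUE.
activate <- function(model, force = FALSE) {
  if (force || is.null(model$cached_coords)) {
    model$cached_coords <- project_training(model$features)
  }
  model
}

toy_model <- list(
  features = matrix(c(0, 2, 2, 0), nrow = 2,
                    dimnames = list(c("s1", "s2"), c("BCL2", "BCL6")))
)

toy_model <- activate(toy_model)                # computes and stores
toy_model <- activate(toy_model)                # re-uses the stored result
n_projections                                   # 1
toy_model <- activate(toy_model, force = TRUE)  # recomputes
n_projections                                   # 2
```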