Run UMAP and attach result to metadata
Source: make_and_annotate_umap.Rd
Usage
make_and_annotate_umap(
  df,
  metadata,
  umap_out,
  truth_column = "lymphgen",
  core_features = NULL,
  core_feature_multiplier = 1.5,
  hidden_features = NULL,
  n_neighbors = 55,
  min_dist = 0,
  metric = "cosine",
  n_epochs = 1500,
  init = "spca",
  ret_model = TRUE,
  na_vals = "drop",
  join_column = "sample_id",
  seed = 12345,
  target_column,
  target_metric = "euclidean",
  target_weight = 0.5,
  calc_dispersion = FALSE,
  algorithm = "tumap",
  individually = TRUE,
  make_plot = FALSE
)
Arguments
- df
Feature matrix with one row per sample and one column per mutation. The row names must be the sample IDs.
- metadata
Metadata data frame with one row per sample and a sample ID column (sample_id by default; see join_column) whose values match the row names of df. The UMAP output is joined to this data frame.
- umap_out
Optional UMAP output from a previous run. If provided, the function uses that model to project df into the existing embedding instead of re-running UMAP. This is useful for reproducibility and for placing different datasets in the same embedding.
- n_neighbors
Passed to UMAP2. The number of neighbors to consider when calculating the UMAP embedding.
- min_dist
Passed to UMAP2. The effective minimum distance between points in the UMAP embedding; smaller values pack similar samples more tightly.
- metric
Passed to UMAP2. The distance metric to use for calculating distances between points.
- n_epochs
Passed to UMAP2. The number of epochs to run the UMAP algorithm.
- init
Passed to UMAP2. The initialization method for the UMAP algorithm.
- ret_model
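Passed to UMAP2. If TRUE, the fitted UMAP model is kept alongside the embedding so that it can be reused later, for example through umap_out.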
- na_vals
How to handle NA values. Either "drop", which removes every column containing at least one NA, or "to_zero", which replaces every NA with zero and keeps the column.
- join_column
The column name in the metadata data frame that contains the sample IDs (default sample_id).
- seed
Passed to UMAP2. The random seed for reproducibility.
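The two na_vals strategies behave quite differently on a sparse mutation matrix: "drop" can discard whole features, while "to_zero" keeps them and treats missing calls as unmutated. The sketch below illustrates both on a tiny made-up status table. handle_na() is a hypothetical helper written for illustration only; it is not the package's internal code.

```r
# Hypothetical helper illustrating the two documented na_vals strategies.
# This is NOT make_and_annotate_umap()'s internal implementation.
handle_na <- function(df, na_vals = c("drop", "to_zero")) {
  na_vals <- match.arg(na_vals)
  if (na_vals == "drop") {
    # Keep only the columns that contain no NA at all
    df[, colSums(is.na(df)) == 0, drop = FALSE]
  } else {
    # Replace every NA with zero; every column is kept
    df[is.na(df)] <- 0
    df
  }
}

# Toy mutation status table (made-up values), samples as row names
status <- data.frame(
  MYD88 = c(1, 0, NA),
  EZH2  = c(0, 1, 1),
  BCL2  = c(NA, 1, 0),
  row.names = c("s1", "s2", "s3")
)

handle_na(status, "drop")     # only EZH2 survives
handle_na(status, "to_zero")  # all three columns, NAs set to 0
```

With "drop", a single missing call removes the entire feature from the embedding, so "to_zero" may be preferable when NAs are rare and most likely reflect absent mutations.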
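The umap_out and ret_model arguments rely on a general UMAP pattern: fit once while keeping the model, then project new samples into that fixed embedding. The sketch below shows this pattern with the uwot package's umap() and umap_transform() on the built-in iris data. It assumes that "UMAP2" refers to uwot; it does not call make_and_annotate_umap(), and the settings are chosen for a quick run rather than copied from the function's defaults.

```r
# Assumption: UMAP2 refers to the uwot package. This sketch uses uwot
# directly and does not reproduce make_and_annotate_umap() itself.
library(uwot)

x <- as.matrix(iris[, 1:4])
train <- x[1:100, ]
new_data <- x[101:150, ]

set.seed(12345)
model <- umap(
  train,
  n_neighbors = 15,
  metric = "cosine",
  init = "spca",
  n_epochs = 200,
  ret_model = TRUE  # keep the fitted model so new data can be projected
)

# With ret_model = TRUE the coordinates live in model$embedding
dim(model$embedding)

# Place unseen samples into the existing embedding without refitting
projected <- umap_transform(new_data, model)
dim(projected)
```

uwot::tumap() accepts the same ret_model argument, so the same fit-then-transform pattern applies when algorithm = "tumap".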
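Attaching the UMAP result to metadata amounts to matching the embedding's row names (the sample IDs of df) against join_column. The sketch below does this with base R's merge() on toy data. attach_umap() and the umap_x/umap_y column names are hypothetical, and the join type used by make_and_annotate_umap() is not stated here; this sketch uses an inner join, so samples missing from either table are dropped.

```r
# Hypothetical illustration of joining 2-D coordinates to metadata.
# Not the package's own code; column names umap_x/umap_y are made up.
attach_umap <- function(embedding, metadata, join_column = "sample_id") {
  coords <- data.frame(
    id = rownames(embedding),
    umap_x = embedding[, 1],
    umap_y = embedding[, 2],
    row.names = NULL
  )
  names(coords)[1] <- join_column
  # Inner join: keep only samples present in both tables
  merge(metadata, coords, by = join_column)
}

# Toy embedding with sample IDs as row names (made-up values)
embedding <- matrix(c(0.1, 1.2, -0.5, 2.0, 0.3, -1.1), ncol = 2,
                    dimnames = list(c("s1", "s2", "s3"), NULL))
meta <- data.frame(sample_id = c("s1", "s2", "s4"),
                   lymphgen = c("MCD", "EZB", "ST2"))

attach_umap(embedding, meta)  # rows for s1 and s2 only
```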
Examples
if (FALSE) { # \dontrun{
library(GAMBLR.predict)
# Load your mutation status data frame and
# move sample_id to the row names
# (the native pipe |> requires R >= 4.1)
all_full_status = readr::read_tsv(system.file("extdata/all_full_status.tsv",
  package = "GAMBLR.predict")) |>
  tibble::column_to_rownames("sample_id")
# Load sample metadata for training/labeling
dlbcl_meta = readr::read_tsv(
system.file("extdata/dlbcl_meta_with_dlbclass.tsv",
package = "GAMBLR.predict")
)
my_umap <- make_and_annotate_umap(
df=all_full_status,
metadata=dlbcl_meta
)
# Usually you'll immediately want to visualize it
# make_umap_scatterplot(my_umap$df)
} # }