Run UMAP and attach result to metadata

Usage

make_and_annotate_umap(
  df,
  metadata,
  umap_out,
  truth_column = "lymphgen",
  core_features = NULL,
  core_feature_multiplier = 1.5,
  hidden_features = NULL,
  n_neighbors = 55,
  min_dist = 0,
  metric = "cosine",
  n_epochs = 1500,
  init = "spca",
  ret_model = TRUE,
  na_vals = "drop",
  join_column = "sample_id",
  seed = 12345,
  target_column,
  target_metric = "euclidean",
  target_weight = 0.5,
  calc_dispersion = FALSE,
  algorithm = "tumap",
  individually = TRUE,
  make_plot = FALSE
)

Arguments

df

Feature matrix with one row per sample and one column per mutation. The row names must be the sample IDs.

metadata

Metadata data frame with one row per sample and a column (sample_id by default, see join_column) whose values match the row names of df. This data frame will be joined to the UMAP output.

umap_out

Optional UMAP output from a previous run. If provided, the function will use this model to project the data instead of re-running UMAP. This is useful for reproducibility and for using the same UMAP model on different datasets.

n_neighbors

Passed to UMAP2. The number of neighbors to consider when calculating the UMAP embedding.

min_dist

Passed to UMAP2. The minimum distance between points in the UMAP embedding.

metric

Passed to UMAP2. The distance metric to use for calculating distances between points.

n_epochs

Passed to UMAP2. The number of epochs to run the UMAP algorithm.

init

Passed to UMAP2. The initialization method for the UMAP algorithm.

ret_model

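Passed to UMAP2. Whether to return the fitted model along with the embedding so that it can be reused to project new data (default TRUE).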

na_vals

How to handle NA values. The two options are "drop", which removes every column containing at least one NA, and "to_zero", which sets all NA values to zero and leaves the columns intact.

join_column

The column name in the metadata data frame that contains the sample IDs (default sample_id).

seed

Passed to UMAP2. The random seed for reproducibility.
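
The two na_vals behaviours described above can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy matrix and the helper names drop_na_columns() and na_to_zero() are hypothetical and only mirror the documented semantics.

```r
# Toy feature matrix (hypothetical data): rows are samples, columns are mutations.
feat <- data.frame(
  MYD88  = c(1, 0, 1),
  EZH2   = c(0, NA, 1),
  CREBBP = c(NA, 1, 0),
  row.names = c("S1", "S2", "S3")
)

# "drop"-style handling: keep only the columns that contain no NA at all.
drop_na_columns <- function(x) {
  x[, colSums(is.na(x)) == 0, drop = FALSE]
}

# "to_zero"-style handling: replace every NA with 0 and keep all columns.
na_to_zero <- function(x) {
  x[is.na(x)] <- 0
  x
}

dropped <- drop_na_columns(feat)  # only the MYD88 column survives
zeroed  <- na_to_zero(feat)       # all three columns kept, NA replaced by 0
```

With "drop", a single missing value removes the whole feature for every sample, whereas "to_zero" keeps the feature but treats a missing call as unmutated, so the better choice depends on how the NA values arose.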

Examples

if (FALSE) { # \dontrun{
# library(GAMBLR.predict)
library(magrittr) # provides the %>% pipe used below

# Load your mutation status data frame and
# ensure sample_id is moved to row names
all_full_status <- readr::read_tsv(
  system.file("extdata/all_full_status.tsv", package = "GAMBLR.predict")
) %>%
  tibble::column_to_rownames("sample_id")

# Load sample metadata for training/labeling
dlbcl_meta <- readr::read_tsv(
  system.file("extdata/dlbcl_meta_with_dlbclass.tsv",
              package = "GAMBLR.predict")
)

# Run UMAP and join the embedding to the metadata
my_umap <- make_and_annotate_umap(
  df = all_full_status,
  metadata = dlbcl_meta
)

# Usually you'll immediately want to visualize it
# make_umap_scatterplot(my_umap$df)
} # }
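
To make the role of join_column concrete, the following base R sketch shows one way an embedding keyed by row names could be joined to a metadata data frame. It is an illustration under stated assumptions, not the function's actual implementation: the embedding coordinates, the class labels, and the choice of a left join are all hypothetical.

```r
# Hypothetical 2-D embedding with sample IDs as row names.
embedding <- matrix(
  c(0.1, 0.2,
    -1.3, 0.8,
    2.4, -0.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), c("V1", "V2"))
)

# Hypothetical metadata; S4 has no row in the embedding.
metadata <- data.frame(
  sample_id = c("S3", "S1", "S2", "S4"),
  lymphgen  = c("EZB", "MCD", "BN2", "ST2"),
  stringsAsFactors = FALSE
)

join_column <- "sample_id"

# Move the row names into a regular column named after join_column.
emb_df <- as.data.frame(embedding)
emb_df[[join_column]] <- rownames(embedding)
rownames(emb_df) <- NULL

# Left join (an assumption): keep every embedded sample, attach its metadata.
annotated <- merge(emb_df, metadata, by = join_column, all.x = TRUE)
```

In this sketch, samples that appear only in the metadata (S4) do not appear in the result, because only embedded samples have coordinates to annotate.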