Assemble genetic features for UMAP input
This function assembles a matrix of genetic features for each sample, including mutation status, aSHM counts, and structural variant status for BCL2, BCL6, and MYC. It supports both genome and capture sequencing (data) types.
Usage
assemble_genetic_features(
these_samples_metadata,
sv_from_metadata = c(BCL2 = "bcl2_ba", BCL6 = "bcl6_ba", MYC = "myc_ba"),
genes,
synon_genes,
maf_with_synon,
hotspot_genes,
genome_build = "grch37",
sv_value = 2,
synon_value = 1,
coding_value = 2,
include_ashm = FALSE,
annotated_sv,
include_GAMBL_sv = TRUE,
review_hotspots = TRUE,
verbose = FALSE
)
Arguments
- these_samples_metadata
Data frame with sample metadata, must include seq_type and sample_id.
- sv_from_metadata
A named vector specifying the metadata columns that contain the oncogene translocation status for any SV annotated in the metadata. Each name is the oncogene and each value is the corresponding column name in the metadata. The column created in the output is named after the oncogene with the suffix "_SV".
- genes
Vector of gene symbols to include.
- synon_genes
Vector of gene symbols for synonymous mutations (generally a subset of genes).
- maf_with_synon
MAF data frame including synonymous mutations.
- hotspot_genes
Vector specifying genes for which hotspot mutations should be separately annotated. Each such column is named for its gene with the suffix "HOTSPOT". For this to work, either specify review_hotspots = TRUE or, if you want full control over hotspot annotation, provide a MAF that includes a column "hot_spot" set to TRUE for every row corresponding to a hotspot mutation.
- sv_value
Value to assign for SV presence (default: 2).
- synon_value
Value to assign for synonymous mutations (default: 1).
- coding_value
Value to assign for coding mutations (default: 2).
- include_ashm
Logical; if TRUE, use GAMBLR.results::get_ssm_by_region to retrieve all non-coding mutations for each gene in synon_genes and use these to infer mutation status (default: FALSE). WARNING: This feature is experimental and is unlikely to give comparable results if you are using a mix of genome and capture data. It also relies on GAMBLR.results, which is not a core dependency of GAMBLR.predict.
- annotated_sv
Data frame in bedpe format with annotated SVs from GAMBLR.utils::annotate_sv(). If provided, the oncogene SV status will be based on the union of the SVs in this data frame and the metadata columns specified in sv_from_metadata.
- include_GAMBL_sv
Logical; if TRUE, SVs from GAMBLR.results will automatically be retrieved and annotated. WARNING: This feature is experimental and is unlikely to give comparable results if you are using a mix of genome and capture data. It also relies on GAMBLR.results, which is not a core dependency of GAMBLR.predict.
- review_hotspots
Logical; if TRUE, any gene in hotspot_genes that is compatible with review_hotspots will have its hotspots annotated. For more information, see GAMBLR.helpers::review_hotspots.
- verbose
Defaults to FALSE
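To make the roles of synon_value, coding_value, and sv_value concrete, the following is a minimal base R sketch of the kind of encoding the arguments describe. It is not the package's implementation. The toy data, the "POS" coding of the metadata column, the column names of toy_sv, the "BCL2_SV" output name, and the rule that a coding mutation overrides a synonymous one are all assumptions made for illustration.

```r
# Toy sketch only: mimics the documented encoding, not the package internals.
# All data below are made up for illustration.
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3"),
  Hugo_Symbol = c("EZH2", "PIM1", "PIM1", "SOCS1"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "Missense_Mutation", "Silent"),
  stringsAsFactors = FALSE
)
toy_meta <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  bcl2_ba = c("POS", "NEG", NA),  # assumed coding of the metadata column
  stringsAsFactors = FALSE
)
# Hypothetical stand-in for annotated SVs (not the real annotate_sv() schema)
toy_sv <- data.frame(sample_id = "S3", gene = "BCL2", stringsAsFactors = FALSE)

genes <- c("EZH2", "PIM1", "SOCS1")
synon_genes <- c("PIM1", "SOCS1")
synon_value <- 1
coding_value <- 2
sv_value <- 2

# Start from an all-zero sample x gene matrix
feat <- matrix(0, nrow = nrow(toy_meta), ncol = length(genes),
               dimnames = list(toy_meta$sample_id, genes))

is_silent <- toy_maf$Variant_Classification == "Silent"

# Synonymous mutations count only for genes listed in synon_genes
syn <- toy_maf[is_silent & toy_maf$Hugo_Symbol %in% synon_genes, ]
feat[cbind(syn$Tumor_Sample_Barcode, syn$Hugo_Symbol)] <- synon_value

# Assumption: coding mutations are written last so they override a synonymous hit
cod <- toy_maf[!is_silent & toy_maf$Hugo_Symbol %in% genes, ]
feat[cbind(cod$Tumor_Sample_Barcode, cod$Hugo_Symbol)] <- coding_value

# SV status as the union of the metadata column and the annotated SV table
from_meta <- !is.na(toy_meta$bcl2_ba) & toy_meta$bcl2_ba == "POS"
from_sv <- toy_meta$sample_id %in% toy_sv$sample_id[toy_sv$gene == "BCL2"]
feat <- cbind(feat, BCL2_SV = ifelse(from_meta | from_sv, sv_value, 0))

print(feat)
```

In this toy case, sample S3 has no SV call in the metadata but is still marked positive because it appears in the annotated SV table, which is the union behaviour described for annotated_sv.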
Examples
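if (FALSE) { # \dontrun{
# Assumes the GAMBLR packages that export these functions, and dplyr (for %>%), are attached.
# Metadata restricted to DLBCL genomes
all_meta = get_gambl_metadata() %>%
  dplyr::filter(pathology == "DLBCL", seq_type == "genome")
# Coding mutations, keeping silent (synonymous) variants
all_maf = get_all_coding_ssm(all_meta, include_silent = TRUE)
# Retrieve SVs and annotate them for oncogene status
sv_all = get_combined_sv(all_meta)
anno_sv = annotate_sv(sv_all)
# Assemble the feature matrix
feat_mat = assemble_genetic_features(
  all_meta,
  genes = c("EZH2", "SOCS1", "PIM1",
            "MYD88", "CREBBP", "SGK1",
            "NOTCH2", "NOTCH1"),
  maf_with_synon = all_maf,
  annotated_sv = anno_sv,
  synon_genes = c("PIM1", "SOCS1")
)
} # }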
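For full control over hotspot annotation, the hotspot_genes argument expects the MAF to carry a logical "hot_spot" column. The sketch below shows one way such a column might be added in base R before calling the function. The toy MAF, the use of the HGVSp_Short column, and the my_hotspots list are hypothetical placeholders, not a curated hotspot definition and not the logic used by GAMBLR.helpers::review_hotspots.

```r
# Toy MAF; a real MAF has many more columns and rows
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3"),
  Hugo_Symbol = c("EZH2", "EZH2", "MYD88"),
  HGVSp_Short = c("p.Y646F", "p.A692V", "p.L273P"),
  stringsAsFactors = FALSE
)

# Hypothetical, illustrative hotspot definition keyed by gene symbol
my_hotspots <- list(
  EZH2 = c("p.Y646F", "p.Y646N"),
  MYD88 = c("p.L273P")
)

# Flag each row whose protein change is listed for its gene;
# genes absent from my_hotspots yield FALSE
toy_maf$hot_spot <- mapply(
  function(gene, change) change %in% my_hotspots[[gene]],
  toy_maf$Hugo_Symbol, toy_maf$HGVSp_Short,
  USE.NAMES = FALSE
)

print(toy_maf)

# Not run: a MAF prepared this way would then be passed as maf_with_synon,
# together with hotspot_genes = c("EZH2", "MYD88") and, presumably,
# review_hotspots = FALSE so that the supplied column is used as-is.
```

Matching on protein change is only one option; any rule that ends with TRUE on the hotspot rows of "hot_spot" should satisfy the requirement stated for hotspot_genes.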