
This function assembles a matrix of genetic features for each sample, including mutation status, aSHM counts, and structural variant (SV) status for BCL2, BCL6, and MYC. It supports both genome and capture sequencing data (seq_type).

Usage

assemble_genetic_features(
  these_samples_metadata,
  sv_from_metadata = c(BCL2 = "bcl2_ba", BCL6 = "bcl6_ba", MYC = "myc_ba"),
  genes,
  synon_genes,
  maf_with_synon,
  hotspot_genes,
  genome_build = "grch37",
  sv_value = 2,
  synon_value = 1,
  coding_value = 2,
  include_ashm = FALSE,
  annotated_sv,
  include_GAMBL_sv = TRUE,
  review_hotspots = TRUE,
  verbose = FALSE
)

Arguments

these_samples_metadata

Data frame of sample metadata; must include the columns sample_id and seq_type.

sv_from_metadata

A named vector specifying the metadata columns that hold the oncogene translocation status for SVs annotated in the metadata. Each name is an oncogene and each value is the corresponding metadata column. The output column for each oncogene is named by appending "_SV" to the oncogene name (e.g. BCL2_SV).
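As a rough illustration of how such a named vector can be read, the sketch below loops over it and derives one output column per oncogene. It is not the package's code: the helper name sv_columns_from_metadata is hypothetical, and it assumes that the metadata columns hold "POS"/"NEG" calls, that a positive call is encoded as sv_value, and that output columns follow an <oncogene>_SV naming pattern.

```r
# Hypothetical helper, NOT part of GAMBLR.predict: derive one <oncogene>_SV
# column per entry of a named vector such as sv_from_metadata.
# Assumption: break-apart columns contain "POS" for a positive call.
sv_columns_from_metadata <- function(metadata, sv_from_metadata, sv_value = 2) {
  out <- data.frame(sample_id = metadata$sample_id, stringsAsFactors = FALSE)
  for (oncogene in names(sv_from_metadata)) {
    meta_col <- sv_from_metadata[[oncogene]]   # e.g. "bcl2_ba" for BCL2
    calls <- metadata[[meta_col]]
    # Positive calls get sv_value; anything else (including NA) gets 0
    out[[paste0(oncogene, "_SV")]] <- ifelse(!is.na(calls) & calls == "POS",
                                             sv_value, 0)
  }
  out
}

# Toy metadata with made-up sample IDs and calls
toy_meta <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  seq_type  = c("genome", "genome", "capture"),
  bcl2_ba   = c("POS", "NEG", NA),
  bcl6_ba   = c("NEG", "POS", "NEG"),
  myc_ba    = c("NEG", "NEG", "POS"),
  stringsAsFactors = FALSE
)

sv_cols <- sv_columns_from_metadata(
  toy_meta,
  c(BCL2 = "bcl2_ba", BCL6 = "bcl6_ba", MYC = "myc_ba")
)
print(sv_cols)
```

The names of the vector drive the output column names, while the values only tell the function where to look in the metadata, so renaming a metadata column requires changing just the value.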

genes

Vector of gene symbols to include.

synon_genes

Vector of gene symbols for which synonymous mutations are also counted (generally a subset of genes).

maf_with_synon

MAF data frame including synonymous mutations.

hotspot_genes

Vector specifying genes for which hotspot mutations should be annotated separately. The resulting columns are named with the gene symbol followed by "HOTSPOT". For this to work, either set review_hotspots = TRUE or, for full control over hotspot annotation, supply a MAF with a logical hot_spot column that is TRUE for every row corresponding to a hotspot mutation.
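If you want full control over hotspot annotation, the hot_spot column can be added to the MAF yourself before calling the function. The sketch below flags hotspot rows in a toy MAF using base R and standard MAF column names (Hugo_Symbol, HGVSp_Short). The sample IDs and variants are made up, and the two hotspot definitions (MYD88 L265P and any change at EZH2 Y646) are illustrative choices for this example, not necessarily the rules applied by GAMBLR.helpers::review_hotspots.

```r
# Toy MAF fragment using standard MAF column names
toy_maf <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S2", "S3"),
  Hugo_Symbol            = c("MYD88", "MYD88", "EZH2"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Missense_Mutation"),
  HGVSp_Short            = c("p.L265P", "p.S243N", "p.Y646N"),
  stringsAsFactors = FALSE
)

# Illustrative hotspot rules chosen for this example only:
# MYD88 L265P, and any protein change at EZH2 residue Y646
toy_maf$hot_spot <-
  (toy_maf$Hugo_Symbol == "MYD88" & toy_maf$HGVSp_Short == "p.L265P") |
  (toy_maf$Hugo_Symbol == "EZH2" & grepl("^p\\.Y646", toy_maf$HGVSp_Short))

print(toy_maf[, c("Tumor_Sample_Barcode", "Hugo_Symbol", "HGVSp_Short", "hot_spot")])
```

Only the MYD88 L265P and EZH2 Y646N rows are flagged; the second MYD88 mutation stays FALSE and would count as an ordinary coding mutation.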

genome_build

Genome build of the input data (default: "grch37").

sv_value

Value to assign for SV presence (default: 2).

synon_value

Value to assign for synonymous mutations (default: 1).

coding_value

Value to assign for coding mutations (default: 2).
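One plausible way to read sv_value, synon_value and coding_value together is as a sample-by-gene matrix in which each cell holds the strongest evidence observed for that gene. The base-R sketch below assumes that a coding mutation outranks a synonymous one and that synonymous (Silent) mutations only count for genes in synon_genes. The helper name encode_mutations and the hand-picked list of coding Variant_Classification values are hypothetical; this is an illustration, not the package's implementation.

```r
# Hypothetical helper, NOT part of GAMBLR.predict.
# Assumptions: a coding mutation outranks a synonymous one, and Silent
# mutations only count for genes listed in synon_genes.
encode_mutations <- function(maf, samples, genes, synon_genes,
                             synon_value = 1, coding_value = 2) {
  # Hand-picked subset of coding MAF Variant_Classification values
  coding_classes <- c("Missense_Mutation", "Nonsense_Mutation",
                      "Frame_Shift_Del", "Frame_Shift_Ins",
                      "In_Frame_Del", "In_Frame_Ins", "Splice_Site")
  mat <- matrix(0, nrow = length(samples), ncol = length(genes),
                dimnames = list(samples, genes))
  for (i in seq_len(nrow(maf))) {
    s <- maf$Tumor_Sample_Barcode[i]
    g <- maf$Hugo_Symbol[i]
    if (!(s %in% samples) || !(g %in% genes)) next
    cls <- maf$Variant_Classification[i]
    value <- if (cls %in% coding_classes) {
      coding_value
    } else if (cls == "Silent" && g %in% synon_genes) {
      synon_value
    } else {
      0
    }
    # Keep the highest value seen for this sample-gene pair
    mat[s, g] <- max(mat[s, g], value)
  }
  mat
}

# Toy MAF with made-up mutations
toy_maf <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S2"),
  Hugo_Symbol            = c("PIM1", "PIM1", "EZH2", "SOCS1"),
  Variant_Classification = c("Silent", "Missense_Mutation", "Silent", "Silent"),
  stringsAsFactors = FALSE
)

feat <- encode_mutations(toy_maf, samples = c("S1", "S2"),
                         genes = c("EZH2", "PIM1", "SOCS1"),
                         synon_genes = c("PIM1", "SOCS1"))
print(feat)
```

Under these assumptions, S1 gets coding_value for PIM1 because its missense mutation outranks the silent one, while the silent EZH2 mutation in S2 is ignored because EZH2 is not in synon_genes.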

include_ashm

Logical; if TRUE, use GAMBLR.results::get_ssm_by_region to retrieve all non-coding mutations for each gene in synon_genes and use them to infer mutation status (default: FALSE). WARNING: This feature is experimental and is unlikely to give comparable results when genome and capture data are mixed. It also relies on GAMBLR.results, which is not a core dependency of GAMBLR.predict.

annotated_sv

Data frame in BEDPE format with SVs annotated by GAMBLR.utils::annotate_sv(). If provided, the oncogene SV status is based on the union of the SVs in this data frame and the metadata columns specified in sv_from_metadata.
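The union described here can be pictured as a logical OR per sample and oncogene: a sample counts as SV-positive if either source reports the SV. The sketch uses toy data and assumes that the annotated SV table has a tumour_sample_id column and a gene column naming the oncogene, that the metadata call is encoded as "POS"/"NEG", and that the output column is named BCL2_SV. These names are assumptions for the example; check them against the actual output of GAMBLR.utils::annotate_sv() before relying on them.

```r
# Toy metadata call for BCL2 (assumed "POS"/"NEG" encoding)
toy_meta <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  bcl2_ba   = c("POS", "NEG", "NEG"),
  stringsAsFactors = FALSE
)

# Toy annotated SVs; tumour_sample_id and gene are assumed column names
toy_sv <- data.frame(
  tumour_sample_id = c("S2", "S3"),
  gene             = c("BCL2", "MYC"),
  stringsAsFactors = FALSE
)

sv_value <- 2
from_metadata <- toy_meta$bcl2_ba == "POS"
from_sv_table <- toy_meta$sample_id %in%
  toy_sv$tumour_sample_id[toy_sv$gene == "BCL2"]

# Union: positive if either source reports a BCL2 SV
# (the BCL2_SV column name is an assumption of this sketch)
toy_meta$BCL2_SV <- ifelse(from_metadata | from_sv_table, sv_value, 0)
print(toy_meta)
```

S1 is positive from the metadata alone, S2 from the SV table alone, and S3 stays negative because its SV involves MYC rather than BCL2.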

include_GAMBL_sv

Logical; if TRUE, SVs from GAMBLR.results are automatically retrieved and annotated. WARNING: This feature is experimental and is unlikely to give comparable results when genome and capture data are mixed. It also relies on GAMBLR.results, which is not a core dependency of GAMBLR.predict.

review_hotspots

Logical; if TRUE, every gene in hotspot_genes that is supported by GAMBLR.helpers::review_hotspots has its hotspots annotated. For more information, see GAMBLR.helpers::review_hotspots.

verbose

Logical; if TRUE, print additional messages while running (default: FALSE).

Value

Matrix of assembled features for each sample.

Examples

if (FALSE) { # \dontrun{
# Packages used in this example
library(dplyr)
library(GAMBLR.results)
library(GAMBLR.utils)
library(GAMBLR.predict)

all_meta = get_gambl_metadata() %>%
  dplyr::filter(pathology == "DLBCL", seq_type == "genome")

all_maf = get_all_coding_ssm(all_meta, include_silent = TRUE)

sv_all = get_combined_sv(all_meta)

anno_sv = annotate_sv(sv_all)

feat_mat = assemble_genetic_features(all_meta,
                                     genes = c("EZH2", "SOCS1", "PIM1",
                                               "MYD88", "CREBBP", "SGK1",
                                               "NOTCH2", "NOTCH1"),
                                     maf_with_synon = all_maf,
                                     annotated_sv = anno_sv,
                                     synon_genes = c("PIM1", "SOCS1"))
} # }