Tabulate mutation status (SSM) for a set of genes.

get_coding_ssm_status(
  gene_symbols,
  these_samples_metadata,
  from_flatfile = TRUE,
  augmented = TRUE,
  min_read_support = 3,
  maf_path = NULL,
  maf_data,
  include_hotspots = TRUE,
  keep_multihit_hotspot = FALSE,
  recurrence_min = 5,
  seq_type = "genome",
  projection = "grch37",
  review_hotspots = TRUE,
  genes_of_interest = c("FOXO1", "MYD88", "CREBBP"),
  genome_build = "hg19",
  include_silent = TRUE
)

Arguments

gene_symbols

A vector of gene symbols for which the mutation status will be tabulated. If not provided, all lymphoma genes from the bundled gene list are used.

these_samples_metadata

The metadata for the samples of interest to be included in the returned matrix. Only the column "sample_id" is required. If not provided, the matrix is tabulated for all available samples by default.

from_flatfile

Logical; whether to retrieve mutations from flat files (TRUE) or from the database (FALSE). Default is TRUE.

augmented

Logical. Set to FALSE to use the original MAF from each sample of multi-sample patients instead of the augmented MAF. Default is TRUE.

min_read_support

Only variants with at least this many reads supporting the alternate allele (t_alt_count) are returned; this is mainly used to clean up augmented MAFs. Default is 3.
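A minimal base-R sketch of what this read-support filter does, assuming a MAF-like data frame with an integer t_alt_count column. The helper filter_by_read_support and the toy data are hypothetical and only illustrate the idea; this is not the package's internal code.

```r
# Toy MAF-like data (hypothetical); only the columns used here are included.
maf <- data.frame(
  Hugo_Symbol = c("MYD88", "CREBBP", "EZH2", "FOXO1"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3"),
  t_alt_count = c(12L, 2L, 3L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows with at least `min_read_support` alt reads.
# which() also drops rows whose t_alt_count is NA.
filter_by_read_support <- function(maf, min_read_support = 3) {
  maf[which(maf$t_alt_count >= min_read_support), , drop = FALSE]
}

filtered <- filter_by_read_support(maf, min_read_support = 3)
filtered$Hugo_Symbol
#> [1] "MYD88" "EZH2"
```

The CREBBP call with only two supporting reads is removed, which is the kind of low-support variant this argument is meant to clean out of augmented MAFs.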

maf_path

Path to a custom MAF file from which to tabulate coding SSM status. Default is NULL.

maf_data

A MAF that is already loaded, either read from disk or retrieved from the database with one of the get_ssm functions.
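To make the difference between maf_path (a file on disk) and maf_data (an object in memory) concrete, the sketch below writes a tiny, made-up MAF-like file and loads it with base R's read.delim. The file contents and column subset are assumptions for illustration; a real MAF carries many more columns, and get_coding_ssm_status may read files differently internally.

```r
# Write a tiny MAF-like file to a temporary path (stand-in for a real MAF).
maf_path <- tempfile(fileext = ".maf")
writeLines(c(
  "#version 2.4",
  "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\tt_alt_count",
  "MYD88\tS1\tMissense_Mutation\t12",
  "CREBBP\tS2\tSilent\t5"
), maf_path)

# Load it into memory; comment.char = "#" skips the version header line.
maf_data <- read.delim(maf_path, comment.char = "#", stringsAsFactors = FALSE)
str(maf_data)
```

Here maf_path plays the role of the file-based input, while maf_data is the already loaded, tab-separated table.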

include_hotspots

Logical; whether hotspot mutations should also be tabulated. Default is TRUE.

keep_multihit_hotspot

Logical; whether to keep the gene annotated as mutated when it carries both hotspot and non-hotspot mutations. Default is FALSE. If TRUE, the number of non-hotspot mutations is reported instead of a simple presence/absence value.
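One reading of this argument, sketched in base R under stated assumptions: per sample, a gene has a count of hotspot and non-hotspot mutations; with FALSE, a hotspot hit clears the gene-level presence flag, and with TRUE, the gene-level value becomes the count of non-hotspot mutations. The helper tabulate_gene and the input layout are hypothetical and are not the package's implementation.

```r
# Hypothetical per-sample counts for one gene, split by hotspot status.
counts <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  n_hotspot = c(1L, 1L, 0L, 0L),
  n_non_hotspot = c(0L, 2L, 1L, 0L),
  stringsAsFactors = FALSE
)

tabulate_gene <- function(counts, keep_multihit_hotspot = FALSE) {
  hotspot <- as.integer(counts$n_hotspot > 0)
  gene <- if (keep_multihit_hotspot) {
    # report the number of non-hotspot mutations
    counts$n_non_hotspot
  } else {
    # presence only; a hotspot hit clears the gene-level annotation
    as.integer(counts$n_non_hotspot > 0 & hotspot == 0L)
  }
  data.frame(sample_id = counts$sample_id, gene = gene, hotspot = hotspot,
             stringsAsFactors = FALSE)
}

tabulate_gene(counts, keep_multihit_hotspot = FALSE)$gene
#> [1] 0 0 1 0
tabulate_gene(counts, keep_multihit_hotspot = TRUE)$gene
#> [1] 0 2 1 0
```

Sample S2, which has both a hotspot and two non-hotspot mutations, is the case this argument controls.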

recurrence_min

Integer value indicating the minimum recurrence level. Default is 5.

seq_type

The seq_type to return. Default is "genome".

projection

Specify projection (grch37 or hg38) of mutations. Default is grch37.

review_hotspots

Logical; whether the hotspot annotations should be reviewed to include functionally relevant mutations or rare lymphoma-related genes. Default is TRUE.

genes_of_interest

A vector of genes for hotspot review. Currently only FOXO1, MYD88, and CREBBP are supported.

genome_build

Reference genome build for the coordinates in the MAF file. Default is hg19.

include_silent

Logical; whether to include silent mutations among the coding mutations. Default is TRUE.
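A minimal sketch of how a silent-mutation switch can be applied to MAF Variant_Classification values. The set of coding classes below uses standard MAF terms but is an assumption; the exact list used by the package may differ, and select_coding is a hypothetical helper.

```r
# Assumed set of coding Variant_Classification values (standard MAF terms);
# the exact list used by the package may differ.
coding_classes <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
  "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
  "Splice_Site", "Translation_Start_Site"
)

# Hypothetical helper: keep coding variants, optionally adding silent ones.
select_coding <- function(maf, include_silent = TRUE) {
  keep <- coding_classes
  if (include_silent) keep <- c(keep, "Silent")
  maf[maf$Variant_Classification %in% keep, , drop = FALSE]
}

maf <- data.frame(
  Hugo_Symbol = c("MYD88", "CREBBP", "FOXO1", "BCL2"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Nonsense_Mutation", "Intron"),
  stringsAsFactors = FALSE
)

nrow(select_coding(maf, include_silent = TRUE))
#> [1] 3
nrow(select_coding(maf, include_silent = FALSE))
#> [1] 2
```

The intronic BCL2 variant is excluded either way; only the silent CREBBP variant depends on the switch.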

Value

A data frame with tabulated mutation status.

Details

This function takes a vector of gene symbols and subsets the incoming MAF to those genes. If no genes are provided, it defaults to all lymphoma genes.

The function can draw mutations from several sources. The user can supply these_samples_metadata (preferably a metadata table already subset to the sample IDs of interest); if this parameter is not provided, the function defaults to all samples available from get_gambl_metadata. The user can also provide a path to a MAF or MAF-like file with maf_path, or pass an already loaded MAF with maf_data. If both maf_path and maf_data are missing, the function falls back to get_coding_ssm.

The function also offers many filtering and convenience parameters that give the user full control over the returned table. For more information, refer to the argument descriptions and the examples.

Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_ssm_by_patients, get_ssm_by_sample, get_ssm_by_samples, get_ssm_by_region, get_ssm_by_regions.
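The core idea of tabulating mutation status (subset the MAF to the requested genes and samples, then record presence or absence per sample and gene) can be sketched in base R. This is a simplified illustration, not the package's implementation: tabulate_status is a hypothetical helper, the toy data are made up, and hotspots, read-support and silent-mutation filters are ignored.

```r
# Hypothetical inputs: a MAF-like data frame and a metadata table with sample_id.
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3"),
  Hugo_Symbol = c("MYD88", "MYD88", "CREBBP", "TP53"),
  stringsAsFactors = FALSE
)
these_samples_metadata <- data.frame(sample_id = c("S1", "S2", "S4"),
                                     stringsAsFactors = FALSE)

tabulate_status <- function(maf, gene_symbols, these_samples_metadata) {
  # restrict to requested genes and to samples present in the metadata
  keep <- maf$Hugo_Symbol %in% gene_symbols &
    maf$Tumor_Sample_Barcode %in% these_samples_metadata$sample_id
  maf <- maf[keep, , drop = FALSE]
  # factor levels ensure unmutated samples and genes still get a row/column
  counts <- table(
    factor(maf$Tumor_Sample_Barcode, levels = these_samples_metadata$sample_id),
    factor(maf$Hugo_Symbol, levels = gene_symbols)
  )
  # collapse counts to 1 (mutated) / 0 (not mutated)
  status <- matrix(as.integer(counts > 0), nrow = nrow(counts),
                   dimnames = list(NULL, colnames(counts)))
  data.frame(sample_id = rownames(counts), status,
             check.names = FALSE, stringsAsFactors = FALSE)
}

tabulate_status(maf, c("MYD88", "CREBBP"), these_samples_metadata)
```

In this toy run, S1 is flagged for MYD88 (its two MYD88 mutations collapse to 1), S2 for CREBBP, S4 gets a row of zeros because it has no mutations, and S3 and TP53 are dropped because they are not in the metadata or the gene list.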

Examples

coding_tabulated_df = get_coding_ssm_status(maf_data = grande_maf,
                                            gene_symbols = "EGFR")
#> Joining with `by = join_by(sample_id)`
#> Adding missing grouping variables: `SYMBOL`
#> Adding missing grouping variables: `SYMBOL`
#> annotating hotspots
#> Joining with `by = join_by(sample_id)`
#> FOXO1HOTSPOT

#all lymphoma genes from bundled NHL gene list
coding_tabulated_df = get_coding_ssm_status()
#> defaulting to all lymphoma genes
#> reading from: /projects/nhl_meta_analysis_scratch/gambl/results_local/all_the_things/slms_3-1.0_vcf2maf-1.3/genome--projection/deblacklisted/augmented_maf/all_slms-3--grch37.CDS.maf
#> mutations from 1652 samples
#> after linking with metadata, we have mutations from 1646 samples
#> Joining with `by = join_by(sample_id)`
#> Adding missing grouping variables: `SYMBOL`
#> Adding missing grouping variables: `SYMBOL`
#> annotating hotspots
#> Joining with `by = join_by(sample_id)`
#> FOXO1HOTSPOT
#> OK
#> CREBBPHOTSPOT
#> OK
#> MYD88HOTSPOT
#> OK