get_coding_ssm_status.Rd
Tabulate mutation status (SSM) for a set of genes.
get_coding_ssm_status(
gene_symbols,
these_samples_metadata,
from_flatfile = TRUE,
augmented = TRUE,
min_read_support = 3,
maf_path = NULL,
maf_data,
include_hotspots = TRUE,
keep_multihit_hotspot = FALSE,
recurrence_min = 5,
seq_type = "genome",
projection = "grch37",
review_hotspots = TRUE,
genes_of_interest = c("FOXO1", "MYD88", "CREBBP"),
genome_build = "hg19",
include_silent = TRUE
)
gene_symbols: A vector of gene symbols for which the mutation status will be tabulated. If not provided, all lymphoma genes are used by default.
these_samples_metadata: The metadata for samples of interest to be included in the returned matrix. Only the column "sample_id" is required. If not provided, the matrix is tabulated for all available samples by default.
from_flatfile: Logical parameter indicating whether to retrieve mutations from flat files rather than from the database. Default is TRUE.
augmented: Logical parameter. Default is TRUE. Set to FALSE to use the original MAF from each sample for multi-sample patients instead of the augmented MAF.
min_read_support: Only returns variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs). Default is 3.
maf_path: If the status of coding SSM should be tabulated from a custom MAF file, provide the path to the MAF in this argument. Default is NULL.
maf_data: An already loaded MAF, either read from disk or retrieved from the database with a get_ssm function.
include_hotspots: Logical parameter indicating whether hotspots should also be tabulated. Default is TRUE.
keep_multihit_hotspot: Logical parameter indicating whether to keep the gene annotated as mutated when the gene has both hotspot and non-hotspot mutations. Default is FALSE. If set to TRUE, the function reports the number of non-hotspot mutations instead of tabulating only mutation presence.
recurrence_min: Integer value indicating the minimum recurrence level. Default is 5.
seq_type: The seq_type you want back. Default is genome.
projection: The projection (grch37 or hg38) of the mutations. Default is grch37.
review_hotspots: Logical parameter indicating whether the hotspots should be reviewed to include functionally relevant mutations or rare lymphoma-related genes. Default is TRUE.
genes_of_interest: A vector of genes for hotspot review. Currently only FOXO1, MYD88, and CREBBP are supported.
genome_build: Reference genome build for the coordinates in the MAF file. Default is hg19.
include_silent: Logical parameter indicating whether to count silent mutations among the coding mutations. Default is TRUE.
A data frame with tabulated mutation status.
This function takes a vector of gene symbols and subsets the incoming MAF to specified genes. If no genes are provided, the function will default to all lymphoma genes.
The function can accept a wide range of incoming MAFs. For example, the user can call this function with these_samples_metadata (preferably a metadata table that has been subset to the sample IDs of interest). If this parameter is not provided, the function defaults to all samples available with get_gambl_metadata. The user can also provide a path to a MAF or MAF-like file with maf_path, or pass an already loaded MAF with the maf_data parameter. If both maf_path and maf_data are missing, the function defaults to running get_coding_ssm.
This function also has many filtering and convenience parameters that give the user full control over what is returned. For more information, refer to the parameter descriptions and examples.
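To make the idea of "tabulating mutation status" concrete, the following base R sketch builds a small presence/absence table from a toy MAF-like data frame. It is an illustration only and not the implementation used by get_coding_ssm_status: the helper tabulate_status, the toy data, and the assumed output layout (one row per sample, one 0/1 column per gene) are all hypothetical. Only the parameter names min_read_support and include_silent, and the t_alt_count column, are taken from this page; hotspot annotation is not covered.

```r
# Toy MAF-like table (made-up data); column names follow the usual MAF convention.
toy_maf <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S3", "S3"),
  Hugo_Symbol            = c("MYD88", "CREBBP", "MYD88", "EZH2", "CREBBP"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Missense_Mutation",
                             "Missense_Mutation", "Nonsense_Mutation"),
  t_alt_count            = c(12, 8, 2, 20, 5),
  stringsAsFactors       = FALSE
)

# Hypothetical helper (not part of the package): returns one row per sample
# and one 0/1 column per requested gene.
tabulate_status <- function(maf, gene_symbols, sample_ids,
                            min_read_support = 3, include_silent = TRUE) {
  # Keep only the requested genes and variants with enough supporting reads.
  keep <- maf$Hugo_Symbol %in% gene_symbols &
    maf$t_alt_count >= min_read_support
  # Optionally drop silent mutations.
  if (!include_silent) {
    keep <- keep & maf$Variant_Classification != "Silent"
  }
  maf <- maf[keep, , drop = FALSE]

  status <- data.frame(sample_id = sample_ids, stringsAsFactors = FALSE)
  for (gene in gene_symbols) {
    mutated <- unique(maf$Tumor_Sample_Barcode[maf$Hugo_Symbol == gene])
    status[[gene]] <- as.integer(sample_ids %in% mutated)
  }
  status
}

samples <- c("S1", "S2", "S3")
genes <- c("MYD88", "CREBBP")

toy_status <- tabulate_status(toy_maf, genes, samples)
toy_status_nonsilent <- tabulate_status(toy_maf, genes, samples,
                                        include_silent = FALSE)
print(toy_status)
print(toy_status_nonsilent)
```

In this toy data, S2 is reported as unmutated for MYD88 because its only MYD88 variant has two supporting reads, below the threshold of 3. With include_silent = FALSE, S1 loses its CREBBP call because that variant is silent.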
Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_ssm_by_patients, get_ssm_by_sample, get_ssm_by_samples, get_ssm_by_region, get_ssm_by_regions.
coding_tabulated_df = get_coding_ssm_status(maf_data = grande_maf,
gene_symbols = "EGFR")
#> Joining with `by = join_by(sample_id)`
#> Adding missing grouping variables: `SYMBOL`
#> Adding missing grouping variables: `SYMBOL`
#> annotating hotspots
#> Joining with `by = join_by(sample_id)`
#> FOXO1HOTSPOT
# all lymphoma genes from the bundled NHL gene list
coding_tabulated_df = get_coding_ssm_status()
#> defaulting to all lymphoma genes
#> reading from: /projects/nhl_meta_analysis_scratch/gambl/results_local/all_the_things/slms_3-1.0_vcf2maf-1.3/genome--projection/deblacklisted/augmented_maf/all_slms-3--grch37.CDS.maf
#> mutations from 1652 samples
#> after linking with metadata, we have mutations from 1646 samples
#> Joining with `by = join_by(sample_id)`
#> Adding missing grouping variables: `SYMBOL`
#> Adding missing grouping variables: `SYMBOL`
#> annotating hotspots
#> Joining with `by = join_by(sample_id)`
#> FOXO1HOTSPOT
#> OK
#> CREBBPHOTSPOT
#> OK
#> MYD88HOTSPOT
#> OK