Summarize SSM (Somatic Single Nucleotide Mutation) Status Across Samples
This function summarizes the mutation status for a set of genes
across multiple samples, separating mutations by class for genes specified
by separate_by_class_genes and, optionally, counting the number of hits
in each sample per mutation category.
Usage
summarize_all_ssm_status(
  maf_df,
  these_samples_metadata,
  genes_of_interest,
  synon_genes,
  silent_maf_df,
  separate_by_class_genes = NULL,
  count_hits = FALSE
)
Arguments
- maf_df
A data frame containing mutation annotation format (MAF) data, with at least the following columns: Hugo_Symbol, Variant_Classification, and Tumor_Sample_Barcode.
- these_samples_metadata
A data frame containing metadata for the samples, with at least a sample_id column. Any sample that does not have a matching sample_id in these_samples_metadata will be dropped.
- genes_of_interest
A character vector of gene symbols to include in the summary. If missing, defaults to all Tier 1 B-cell lymphoma genes.
- synon_genes
(Optional) A character vector of gene symbols for which synonymous mutations should be included.
- silent_maf_df
(Optional) A separate data frame containing silent mutations, for use when silent mutation status should not be taken from maf_df. This is useful when combining mutations from the output of get_coding_ssm with the output of get_ssm_by_region or get_ssm_by_gene.
- separate_by_class_genes
(Optional) A character vector of gene symbols for which mutations should be separated by class (e.g., "Nonsense_Mutation", "Missense_Mutation").
- count_hits
Logical; if TRUE, counts the number of mutations per gene per sample. If FALSE (default), only presence/absence is recorded.
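To make the input requirements concrete, the base R sketch below builds a toy MAF-like data frame with only the three required columns and shows the kind of filtering the arguments describe: rows whose Tumor_Sample_Barcode has no matching sample_id are dropped, and only genes in genes_of_interest are kept. The toy data and variable names are hypothetical, and this is an illustration of the documented behaviour, not the function's actual code.

```r
# Hypothetical toy inputs with only the minimum documented columns.
maf_df <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "BCL2", "MYC", "SGK1"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Silent", "Missense_Mutation", "Missense_Mutation"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2", "S3"),
  stringsAsFactors = FALSE
)
these_samples_metadata <- data.frame(sample_id = c("S1", "S2"))
genes_of_interest <- c("TP53", "SGK1", "BCL2")

# Keep rows whose sample appears in the metadata AND whose gene is of interest.
# S3 has no metadata row, so its SGK1 mutation is dropped; MYC is not of interest.
keep <- maf_df$Tumor_Sample_Barcode %in% these_samples_metadata$sample_id &
  maf_df$Hugo_Symbol %in% genes_of_interest
filtered_maf <- maf_df[keep, ]
filtered_maf
```

Note that SGK1 is a gene of interest but disappears here because its only mutation belongs to a sample without metadata.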
Value
A wide-format data frame (matrix) with samples as rows and
mutation types as columns. Each cell contains either the count of
mutations (if count_hits = TRUE) or a binary indicator (0/1)
for mutation presence.
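The difference between count_hits = TRUE and count_hits = FALSE can be sketched in base R with table(): cross-tabulating samples by feature gives counts, and collapsing every positive count to 1 gives the presence/absence matrix. The helper to_wide() and its sample_ids argument are hypothetical. Making sample_ids the factor levels gives unmutated samples a row of zeros; whether the real function keeps such samples is an assumption.

```r
# Hypothetical long-format hits: one row per mutation, already mapped to a feature.
hits <- data.frame(
  sample_id = c("S1", "S1", "S1", "S2"),
  feature   = c("TP53", "TP53", "BCL2", "BCL2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: samples as rows, features as columns.
to_wide <- function(hits, sample_ids, count_hits = FALSE) {
  # Using sample_ids as factor levels keeps samples that have no hits (all zeros).
  counts <- table(factor(hits$sample_id, levels = sample_ids), hits$feature)
  wide <- as.data.frame.matrix(counts)
  if (!count_hits) {
    # Collapse counts to a 0/1 presence indicator.
    wide[] <- lapply(wide, function(x) as.integer(x > 0))
  }
  wide
}

w_counts <- to_wide(hits, c("S1", "S2", "S3"), count_hits = TRUE)
w_bin <- to_wide(hits, c("S1", "S2", "S3"), count_hits = FALSE)
w_counts
w_bin
```

With count_hits = TRUE, S1 gets 2 in the TP53 column. With count_hits = FALSE, the same cell becomes 1.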
Details
Mutations are grouped by gene and, for genes specified in
separate_by_class_genes, further separated by mutation class. For genes specified in
synon_genes, synonymous (Silent) mutations are counted as an additional, separate feature. The function simplifies mutation annotations and pivots the data to a wide format suitable for downstream analysis.
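The grouping rules above can be illustrated with a small base R sketch that maps each mutation to a feature name and then pivots to a 0/1 wide table. The naming scheme ("TP53_Missense", "BCL2_synon") and the helper assign_feature() are hypothetical. They are chosen only to show the logic and are not the labels or annotation simplification the function actually uses.

```r
# Hypothetical feature-naming rules (assumed, not taken from the function's source):
#   * gene in separate_by_class_genes -> "<gene>_<class without _Mutation>"
#   * Silent mutation in synon_genes  -> "<gene>_synon"
#   * other non-Silent mutation       -> "<gene>"
#   * other Silent mutation           -> NA (dropped)
assign_feature <- function(gene, vclass, separate_by_class_genes, synon_genes) {
  short_class <- sub("_Mutation$", "", vclass)  # "Missense_Mutation" -> "Missense"
  feature <- ifelse(gene %in% separate_by_class_genes,
                    paste(gene, short_class, sep = "_"),
                    gene)
  is_silent <- vclass == "Silent"
  feature[is_silent] <- ifelse(gene[is_silent] %in% synon_genes,
                               paste(gene[is_silent], "synon", sep = "_"),
                               NA_character_)
  feature
}

maf <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "BCL2", "BCL2", "SGK1"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Silent", "Silent"),
  Tumor_Sample_Barcode = c("S1", "S2", "S1", "S2", "S2"),
  stringsAsFactors = FALSE
)
maf$feature <- assign_feature(maf$Hugo_Symbol, maf$Variant_Classification,
                              separate_by_class_genes = c("TP53", "SGK1"),
                              synon_genes = "BCL2")
maf <- maf[!is.na(maf$feature), ]  # SGK1 Silent is dropped: SGK1 is not in synon_genes

# Pivot to wide format: samples as rows, features as columns, 0/1 presence.
wide <- as.data.frame.matrix(table(maf$Tumor_Sample_Barcode, maf$feature))
wide[] <- lapply(wide, function(x) as.integer(x > 0))
wide
```

TP53 is split into one column per class, while BCL2 gets both a plain column and a separate synonymous column.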
Examples
# A basic example, using only the output of get_all_coding_ssm
# Since the only non-coding class this function handles is Silent,
# we will be missing most non-coding types such as Intron, UTR, Flank
if (FALSE) { # \dontrun{
library(dplyr) # provides %>% and filter()
sample_metadata <- get_sample_metadata() %>% filter(seq_type != "mrna")
maf_data <- get_all_coding_ssm(sample_metadata)
mutation_matrix <- summarize_all_ssm_status(
maf_df = maf_data,
these_samples_metadata = sample_metadata,
genes_of_interest = c("TP53", "SGK1", "BCL2"),
synon_genes = c("BCL2"),
separate_by_class_genes = c("TP53","SGK1"),
count_hits = FALSE
)
} # }