Summarize SSM (Somatic Single Nucleotide Mutation) Status Across Samples — summarize_all_ssm_status

This function summarizes the mutation status of a set of genes across multiple samples. Mutations are separated by class for the genes listed in separate_by_class_genes. When count_hits = TRUE, the number of hits per sample is counted for each mutation category; otherwise only presence or absence is recorded.

Usage

summarize_all_ssm_status(
  maf_df,
  these_samples_metadata,
  genes_of_interest,
  synon_genes,
  silent_maf_df,
  separate_by_class_genes = NULL,
  count_hits = FALSE
)

Arguments

maf_df: A data frame containing mutation annotation format (MAF) data, with at least the following columns: Hugo_Symbol, Variant_Classification, and Tumor_Sample_Barcode.
these_samples_metadata: A data frame containing metadata for the samples, with at least a sample_id column. Any sample in maf_df without a matching sample_id in these_samples_metadata will be dropped.
genes_of_interest: A character vector of gene symbols to include in the summary. If missing, defaults to all Tier 1 B-cell lymphoma genes.
synon_genes: (Optional) A character vector of gene symbols for which synonymous mutations should be included.
silent_maf_df: (Optional) A separate data frame containing silent mutation data, for cases where silent mutation status should not be pulled from maf_df. This argument is useful when you want to combine mutations from the output of get_coding_ssm with those from get_ssm_by_region or get_ssm_by_gene.
separate_by_class_genes: (Optional) A character vector of gene symbols for which mutations should be separated by class (e.g., "Nonsense_Mutation", "Missense_Mutation").
count_hits: Logical; if TRUE, counts the number of mutations per gene per sample. If FALSE (default), only presence/absence is recorded.

Value

A wide-format data frame (matrix) with samples as rows and mutation types as columns. Each cell contains either the count of mutations (if count_hits = TRUE) or a binary indicator (0/1) for mutation presence.

Details

Mutations are grouped per gene and, for genes specified in separate_by_class_genes, split into one feature per mutation class.
Synonymous mutations can be counted as an additional, separate feature for genes specified in synon_genes.
The function simplifies mutation annotations and pivots the data to a wide format suitable for downstream analysis.
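
The steps above can be illustrated on a toy data set. The following is a minimal sketch of the described behaviour, not the package's implementation: it assumes dplyr and tidyr are installed, the toy data are invented, and the feature names (for example TP53_Missense_Mutation and BCL2_Silent) are hypothetical and may differ from the column names the function actually produces.

```r
library(dplyr)
library(tidyr)

# Invented toy MAF with only the three required columns.
toy_maf <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "TP53", "TP53", "BCL2", "BCL2", "BCL2", "SGK1", "TP53"),
  Variant_Classification = c(
    "Missense_Mutation", "Missense_Mutation", "Nonsense_Mutation",
    "Missense_Mutation", "Missense_Mutation", "Silent", "Silent",
    "Silent", "Missense_Mutation"
  ),
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2", "S1", "S1", "S2", "S2", "S3"),
  stringsAsFactors = FALSE
)

# Invented metadata: S3 is absent, so its mutations are dropped.
toy_meta <- data.frame(sample_id = c("S1", "S2"), stringsAsFactors = FALSE)

separate_by_class_genes <- "TP53"
synon_genes <- "BCL2"

features <- toy_maf %>%
  # Drop samples without a matching sample_id in the metadata.
  filter(Tumor_Sample_Barcode %in% toy_meta$sample_id) %>%
  # Keep Silent calls only for genes listed in synon_genes.
  filter(Variant_Classification != "Silent" | Hugo_Symbol %in% synon_genes) %>%
  # Build one feature label per mutation (hypothetical naming scheme).
  mutate(feature = case_when(
    Variant_Classification == "Silent" ~ paste0(Hugo_Symbol, "_Silent"),
    Hugo_Symbol %in% separate_by_class_genes ~
      paste(Hugo_Symbol, Variant_Classification, sep = "_"),
    TRUE ~ Hugo_Symbol
  ))

# Analogue of count_hits = TRUE: number of mutations per sample and feature.
hit_counts <- features %>%
  count(Tumor_Sample_Barcode, feature, name = "n_hits") %>%
  pivot_wider(names_from = feature, values_from = n_hits, values_fill = 0L) %>%
  rename(sample_id = Tumor_Sample_Barcode)

# Analogue of count_hits = FALSE: collapse counts to a 0/1 indicator.
status <- hit_counts %>%
  mutate(across(-sample_id, ~ as.integer(.x > 0)))
```

In this sketch, hit_counts and status share the same shape (one row per sample, one column per feature) and differ only in whether a cell holds a count or a 0/1 flag, which mirrors the two output modes described under Value.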

Examples

# A basic example, using only the output of get_all_coding_ssm
# Since the only non-coding class this function handles is Silent,
# we will be missing most non-coding types such as Intron, UTR, Flank
if (FALSE) { # \dontrun{
library(dplyr) # provides %>% and filter()

sample_metadata <- get_sample_metadata() %>% filter(seq_type != "mrna")

maf_data <- get_all_coding_ssm(sample_metadata)

mutation_matrix <- summarize_all_ssm_status(
  maf_df = maf_data,
  these_samples_metadata = sample_metadata,
  genes_of_interest = c("TP53", "SGK1", "BCL2"),
  synon_genes = c("BCL2"),
  separate_by_class_genes = c("TP53", "SGK1"),
  count_hits = FALSE
)
} # }