
Obtain a long (tidy) data frame or a wide matrix of mutation counts across sliding windows for multiple regions.

Usage

calc_mutation_frequency_bin_regions(
  regions_list = NULL,
  regions_bed = NULL,
  these_samples_metadata = NULL,
  these_sample_ids = NULL,
  projection,
  region_padding = 1000,
  drop_unmutated = FALSE,
  skip_regions = NULL,
  only_regions = NULL,
  slide_by = 100,
  window_size = 500,
  return_format = "wide"
)

Arguments

regions_list

Named vector of regions in the format c(name1 = "chr:start-end", name2 = "chr:start-end"). If neither regions_list nor regions_bed is specified, the function will use the GAMBLR aSHM region information.

regions_bed

Data frame of regions with four columns (chrom, start, end, name).

these_samples_metadata

Metadata with at least a sample_id column. If not providing a maf data frame, a seq_type column is also required.

these_sample_ids

Vector of sample IDs. Metadata will be subset to sample IDs present in this vector.

projection

Genome build the function will operate in. Ensure this matches your provided regions and maf data for correct chr prefix handling. Default "grch37".

region_padding

Number of bases by which to pad the start and end coordinates of each region. Default 1000.

drop_unmutated

Whether to drop bins with 0 mutations. When returning the wide (matrix) format, only bins with no mutations in any sample are dropped.

skip_regions

Optional character vector of genes to exclude from the default aSHM regions.

only_regions

Optional character vector of genes to include from the default aSHM regions.

slide_by

Step size (in bases) by which the sliding window advances. Default 100.

window_size

Width of each sliding window (in bases). Default 500.

return_format

Return format of the mutation counts. Accepted values are "long" and "wide". "long" returns a data frame with one sample ID/window combination per row; "wide" returns a matrix with one sample ID per row and one window per column. The "wide" format retains all samples and windows regardless of the drop_unmutated or min_count_per_bin parameters. Default "wide".
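To make the difference between the two return formats concrete, here is a minimal base R sketch that pivots a long table of counts into a sample-by-window matrix. It is not GAMBLR's implementation: the column names sample_id, bin and mutation_count, the bin labels, and the helpers long_to_wide() and drop_empty_bins() are all hypothetical. drop_empty_bins() illustrates the wide-format behaviour described for drop_unmutated, removing only bins with no mutations in any sample.

```r
# Hypothetical long table: column names and bin labels are assumptions,
# not the exact schema returned by calc_mutation_frequency_bin_regions().
long_counts <- data.frame(
  sample_id = c("S1", "S1", "S2", "S2"),
  bin = c("chr1_100", "chr1_200", "chr1_100", "chr1_300"),
  mutation_count = c(2L, 1L, 3L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per sample, one column per window.
long_to_wide <- function(long_df) {
  wide <- tapply(long_df$mutation_count,
                 list(long_df$sample_id, long_df$bin),
                 sum)
  # Sample/window combinations absent from the long table had no mutations.
  wide[is.na(wide)] <- 0L
  wide
}

# Hypothetical helper: drop only windows that are unmutated in every sample.
drop_empty_bins <- function(wide) {
  wide[, colSums(wide) > 0, drop = FALSE]
}

wide <- long_to_wide(long_counts)
wide_mutated <- drop_empty_bins(wide)
```

Here wide has columns chr1_100, chr1_200 and chr1_300. drop_empty_bins() removes only chr1_300, which has zero counts in both samples. It keeps chr1_200, which is mutated in S1 but not in S2.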

Value

A table of mutation counts for sliding windows across one or more regions, in long or wide format depending on return_format.
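The following base R sketch shows how region_padding, slide_by and window_size can interact to produce overlapping windows and per-window counts. It rests on an assumption that GAMBLR may not share: windows start at the padded region start and advance by slide_by until a full window would pass the padded end. The helpers make_windows() and count_per_window() are hypothetical and not part of GAMBLR.

```r
# Hypothetical helper (not GAMBLR): build overlapping windows over a padded region.
# Assumes windows begin at the padded start and stop once a full window would
# extend past the padded end; the region must be at least window_size wide
# after padding, otherwise seq() errors.
make_windows <- function(start, end, region_padding = 1000,
                         slide_by = 100, window_size = 500) {
  padded_start <- start - region_padding
  padded_end <- end + region_padding
  window_starts <- seq(padded_start, padded_end - window_size + 1, by = slide_by)
  data.frame(window_start = window_starts,
             window_end = window_starts + window_size - 1)
}

# Hypothetical helper: count mutation positions inside each window (inclusive bounds).
count_per_window <- function(windows, positions) {
  vapply(seq_len(nrow(windows)), function(i) {
    sum(positions >= windows$window_start[i] & positions <= windows$window_end[i])
  }, integer(1))
}

windows <- make_windows(start = 10000, end = 10400)
counts <- count_per_window(windows, positions = c(9050, 9450, 10950))
```

Because slide_by is smaller than window_size, consecutive windows overlap, so a single mutation is counted in several windows. That is why the per-window counts can sum to more than the number of mutations.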

Details

This function takes a metadata table via the these_samples_metadata parameter and internally calls calc_mutation_frequency_bin_region (which in turn calls get_ssm_by_regions) to retrieve mutation counts for sliding windows across one or more regions. You may optionally provide any combination of a maf data frame, existing metadata, and a regions data frame or named vector. The heatmap plotting portion of this function has been moved to heatmap_mutation_frequency_bin.
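Regions can be supplied either as a named vector (regions_list) or as a data frame (regions_bed). The minimal base R sketch below converts the first format into the four-column layout described for regions_bed. The helper regions_list_to_bed() and the example coordinates are hypothetical and are not part of GAMBLR. The function may convert regions differently internally.

```r
# Hypothetical helper (not GAMBLR): turn c(name = "chr:start-end", ...) into a
# data frame with columns chrom, start, end, name.
regions_list_to_bed <- function(regions_list) {
  if (is.null(names(regions_list))) stop("regions_list must be a named vector")
  # Each successful match yields the full string plus three captured groups.
  parts <- regmatches(regions_list,
                      regexec("^([^:]+):([0-9]+)-([0-9]+)$", regions_list))
  if (any(lengths(parts) != 4)) stop("Regions must look like 'chr:start-end'")
  data.frame(
    chrom = vapply(parts, `[`, character(1), 2),
    start = as.integer(vapply(parts, `[`, character(1), 3)),
    end = as.integer(vapply(parts, `[`, character(1), 4)),
    name = names(regions_list),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up coordinates, with and without a "chr" prefix.
bed <- regions_list_to_bed(c(region_a = "chr3:1000-2000",
                             region_b = "8:5000-7500"))
```

The chromosome string is kept exactly as given. This is why the projection argument matters for consistent "chr" prefix handling between regions and mutation data.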

Examples

if (FALSE) { # \dontrun{
 #load metadata.
 metadata = suppressMessages(get_gambl_metadata())
 dlbcl_bl_meta = dplyr::filter(metadata, pathology %in% c("DLBCL", "BL"))

 # Get aSHM regions
 some_regions = create_bed_data(GAMBLR.data::grch37_ashm_regions,
                               fix_names = "concat",
                               concat_cols = c("gene", "region"), sep = "-")
 some_regions
 mut_count_matrix <- calc_mutation_frequency_bin_regions(
   these_samples_metadata = dlbcl_bl_meta,
   regions_bed = some_regions,
   projection = "grch37"
 )
} # }