
Count the number of mutations in a sliding window across a region for all samples.

Usage

calc_mutation_frequency_bin_region(
  region,
  chromosome,
  start_pos,
  end_pos,
  these_samples_metadata,
  these_sample_ids = NULL,
  maf_data = NULL,
  projection = "grch37",
  slide_by = 100,
  window_size = 1000,
  return_format = "long",
  min_count_per_bin = 0,
  return_count = TRUE,
  drop_unmutated = FALSE
)

Arguments

region

A string describing a genomic region in the "chrom:start-end" format. The region must be specified either in this format or as separate chromosome, start_pos, and end_pos arguments.

chromosome

Chromosome name in region.

start_pos

Start coordinate of region.

end_pos

End coordinate of region.

these_samples_metadata

Optional data frame containing a sample_id column. If maf_data is not provided, seq_type is also a required column.

these_sample_ids

Optional vector of sample IDs. Output will be subset to IDs present in this vector.

maf_data

Optional maf data frame. Will be subset to rows where Tumor_Sample_Barcode matches provided sample IDs or metadata table. If not provided, maf data will be obtained with get_ssm_by_regions().

projection

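Specify which genome build to use. Required. Default is "grch37". Chromosome names in the region must match the chosen build (for example, "chr11" with "hg38").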

slide_by

Slide size for sliding window. Default 100.

window_size

Size of sliding window. Default 1000.

return_format

Return format of mutations. Accepted inputs are "long" and "wide". "long" returns a data frame with one row per sample ID and window. "wide" returns a matrix with one sample ID per row and one window per column. The "wide" format retains all samples and windows regardless of the drop_unmutated or min_count_per_bin parameters.

min_count_per_bin

Minimum counts per bin, default is 0. Setting this greater than 0 will drop unmutated windows only when return_format is long.

return_count

Boolean statement to return mutation count per window (TRUE) or binary mutated/unmutated status (FALSE). Default is TRUE.

drop_unmutated

Boolean for whether to drop windows with 0 mutations. Only effective with "long" return format.

Value

Either a matrix or a long tidy table of counts per window.
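
To illustrate how the two return shapes relate, here is a minimal base R sketch that reshapes a long table into a sample-by-window matrix. The toy data, the zero-filling of missing sample/window pairs, and the use of tapply() are assumptions for illustration only; the function's internal reshaping may differ.

```r
# Toy long-format table with the same columns as the "long" output.
# The sample IDs and counts are made up for illustration.
long <- data.frame(
  sample_id = c("S1", "S1", "S2"),
  bin = c("11_69455000", "11_69455010", "11_69455000"),
  mutation_count = c(5L, 5L, 2L)
)

# Pivot to one row per sample and one column per window.
wide <- tapply(long$mutation_count, list(long$sample_id, long$bin), sum)

# Assumption: sample/window pairs absent from the long table count as 0.
wide[is.na(wide)] <- 0L

wide
```

Each row of the resulting matrix is one sample and each column is one window, which is the layout described for return_format = "wide".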

Details

This function is called to return the mutation frequency for a given region, either from a provided input maf data frame or from the GAMBL maf data.

Regions are specified with the region parameter.

Alternatively, the region of interest can also be specified by calling the function with chromosome, start_pos, and end_pos parameters.

This function operates on a single region. To return a matrix of sliding window counts over multiple regions, see calc_mutation_frequency_bin_regions.
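
As a rough illustration of the ideas above, the following base R sketch parses a "chrom:start-end" string and counts positions in sliding windows. The helpers parse_region() and count_in_windows() are hypothetical names that are not part of the package, the mutation positions are made up, and the handling of windows that extend past end_pos is an assumption; the actual implementation may differ.

```r
# Hypothetical helper: split a "chrom:start-end" string into its parts.
parse_region <- function(region) {
  parts <- strsplit(region, "[:-]")[[1]]
  list(
    chromosome = parts[1],
    start_pos = as.numeric(parts[2]),
    end_pos = as.numeric(parts[3])
  )
}

# Hypothetical helper: count positions that fall inside each sliding window.
# Windows start every `slide_by` bases and span `window_size` bases.
# Assumption: windows near the end of the region are not truncated.
count_in_windows <- function(positions, start_pos, end_pos,
                             slide_by = 100, window_size = 1000) {
  window_start <- seq(start_pos, end_pos, by = slide_by)
  window_end <- window_start + window_size - 1
  counts <- vapply(
    seq_along(window_start),
    function(i) sum(positions >= window_start[i] & positions <= window_end[i]),
    integer(1)
  )
  data.frame(window_start = window_start, mutation_count = counts)
}

reg <- parse_region("11:69455000-69459900")

# Made-up mutation positions for a single sample.
toy_positions <- c(69455100, 69455150, 69456000)

windows <- count_in_windows(toy_positions, reg$start_pos, reg$end_pos,
                            slide_by = 1000, window_size = 2000)
windows
```

Because window_size is larger than slide_by here, consecutive windows overlap, so a single mutation can be counted in more than one window.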

Examples

meta = suppressMessages(get_gambl_metadata()) %>% 
                        dplyr::filter(pathology=="MCL")

mut_freq = calc_mutation_frequency_bin_region(these_samples_metadata = meta,
                                              region = "11:69455000-69459900",
                                              slide_by = 10,
                                              window_size = 10000)
#> processing bins of size 10000 across 4900 bp region
#> Using GAMBLR.results::get_ssm_by_region...
#> Joining with `by = join_by(sample_id)`
#> Joining with `by = join_by(window_start)`
head(mut_freq)
#> # A tibble: 6 × 3
#>   sample_id bin         mutation_count
#>   <chr>     <chr>                <int>
#> 1 01-11817T 11_69455000              5
#> 2 01-11817T 11_69455010              5
#> 3 01-11817T 11_69455020              5
#> 4 01-11817T 11_69455030              5
#> 5 01-11817T 11_69455040              5
#> 6 01-11817T 11_69455050              5

if (FALSE) { # \dontrun{
# This will fail because the chromosome naming doesn't match the default projection
misguided_attempt = calc_mutation_frequency_bin_region(these_samples_metadata = meta,
                                                       region = "chr11:69455000-69459900",
                                                       slide_by = 10,
                                                       window_size = 10000)
# This will work!
mut_freq = calc_mutation_frequency_bin_region(these_samples_metadata = meta,
                                              region = "chr11:69455000-69459900",
                                              slide_by = 10,
                                              window_size = 10000,
                                              projection = "hg38")
head(mut_freq)
} # }