Calculate Mutation Frequency By Sliding Window.
calc_mutation_frequency_bin_region.Rd
Count the number of mutations in a sliding window across a region for all samples.
Usage
calc_mutation_frequency_bin_region(
region,
chromosome,
start_pos,
end_pos,
these_samples_metadata,
these_sample_ids = NULL,
maf_data = NULL,
projection = "grch37",
slide_by = 100,
window_size = 1000,
return_format = "long",
min_count_per_bin = 0,
return_count = TRUE,
drop_unmutated = FALSE
)
Arguments
- region
A string describing a genomic region in the "chrom:start-end" format. The region must be specified either in this format or as separate chromosome, start_pos, and end_pos arguments.
- chromosome
Chromosome name in region.
- start_pos
Start coordinate of region.
- end_pos
End coordinate of region.
- these_samples_metadata
Optional data frame containing a sample_id column. If not providing a maf file, seq_type is also a required column.
- these_sample_ids
Optional vector of sample IDs. Output will be subset to IDs present in this vector.
- maf_data
Optional maf data frame. Will be subset to rows where Tumor_Sample_Barcode matches provided sample IDs or metadata table. If not provided, maf data will be obtained with get_ssm_by_regions().
- projection
Specify which genome build to use. Default is "grch37". The chromosome naming in the region must match the chosen projection.
- slide_by
Step size, in bp, between the starts of consecutive windows. Default 100.
- window_size
Size of each sliding window, in bp. Default 1000.
- return_format
Return format of mutations. Accepted inputs are "long" and "wide". "long" returns a data frame with one sample ID and window combination per row. "wide" returns a matrix with one sample ID per row and one window per column. The "wide" format retains all samples and windows regardless of the drop_unmutated or min_count_per_bin parameters.
- min_count_per_bin
Minimum counts per bin, default is 0. Setting this greater than 0 will drop unmutated windows only when return_format is long.
- return_count
Boolean statement to return mutation count per window (TRUE) or binary mutated/unmutated status (FALSE). Default is TRUE.
- drop_unmutated
Boolean for whether to drop windows with 0 mutations. Only effective with "long" return format.
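The interaction of slide_by, window_size, return_count, and drop_unmutated can be illustrated with a minimal base R sketch on toy data. This is not the package's implementation: the mutation table, the sample IDs, and the assumption that a window starts every slide_by bases from start_pos through end_pos are all hypothetical and only meant to show the idea of overlapping windows.

```r
# Hypothetical toy inputs; none of these values come from GAMBL data.
chromosome  <- "11"
start_pos   <- 1000L
end_pos     <- 1500L
slide_by    <- 100L
window_size <- 200L

# Toy mutation table: one row per mutation.
muts <- data.frame(
  sample_id = c("S1", "S1", "S1", "S2"),
  position  = c(1050, 1120, 1490, 1210)
)

# Assumption: a window starts every `slide_by` bases from start_pos to end_pos
# and spans `window_size` bases, so consecutive windows overlap.
window_start <- seq(start_pos, end_pos, by = slide_by)
window_end   <- window_start + window_size - 1L

# Count the positions of one sample that fall inside each window.
count_windows <- function(positions) {
  vapply(
    seq_along(window_start),
    function(i) sum(positions >= window_start[i] & positions <= window_end[i]),
    integer(1)
  )
}

# "long" layout: one row per sample and window.
samples <- unique(muts$sample_id)
long <- do.call(rbind, lapply(samples, function(s) {
  data.frame(
    sample_id      = s,
    bin            = paste(chromosome, window_start, sep = "_"),
    mutation_count = count_windows(muts$position[muts$sample_id == s])
  )
}))

# Analogue of return_count = FALSE: binary mutated/unmutated status.
long$mutated <- as.integer(long$mutation_count > 0)

# Analogue of drop_unmutated = TRUE: keep only windows with mutations.
mutated_only <- long[long$mutation_count > 0, ]
```

Because window_size is larger than slide_by here, a single mutation can be counted in more than one window, which is why neighbouring bins often show the same count.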
Details
This function is called to return the mutation frequency for a given region, either from a provided input maf data frame or from the GAMBL maf data.
Regions are specified with the region parameter.
Alternatively, the region of interest can also be specified by calling the function with chromosome, start_pos, and end_pos parameters.
This function operates on a single region. To return a matrix of sliding window counts over multiple regions, see calc_mutation_frequency_bin_regions.
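The two ways of specifying a region carry the same information. As a rough illustration, the sketch below splits a "chrom:start-end" string into its three parts. parse_region is a hypothetical helper written for this example and is not a function exported by the package.

```r
# Hypothetical helper (not part of the package): split "chrom:start-end"
# into the equivalent chromosome, start_pos, and end_pos values.
parse_region <- function(region) {
  parts <- regmatches(
    region,
    regexec("^([^:]+):([0-9]+)-([0-9]+)$", region)
  )[[1]]
  if (length(parts) != 4) {
    stop("region must look like 'chrom:start-end'")
  }
  list(
    chromosome = parts[2],
    start_pos  = as.integer(parts[3]),
    end_pos    = as.integer(parts[4])
  )
}

r <- parse_region("11:69455000-69459900")
r$chromosome               # "11"
r$end_pos - r$start_pos    # 4900, the width of the region in bp

# The chromosome prefix is kept as written, so "chr11" and "11" differ.
parse_region("chr11:69455000-69459900")$chromosome   # "chr11"
```

Keeping the prefix untouched matters because the chromosome name has to agree with the naming used by the selected projection.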
Examples
meta = suppressMessages(get_gambl_metadata()) %>%
  dplyr::filter(pathology == "MCL")
mut_freq = calc_mutation_frequency_bin_region(these_samples_metadata = meta,
                                              region = "11:69455000-69459900",
                                              slide_by = 10,
                                              window_size = 10000)
#> processing bins of size 10000 across 4900 bp region
#> Using GAMBLR.results::get_ssm_by_region...
#> Joining with `by = join_by(sample_id)`
#> Joining with `by = join_by(window_start)`
head(mut_freq)
#> # A tibble: 6 × 3
#>   sample_id bin         mutation_count
#>   <chr>     <chr>                <int>
#> 1 01-11817T 11_69455000              5
#> 2 01-11817T 11_69455010              5
#> 3 01-11817T 11_69455020              5
#> 4 01-11817T 11_69455030              5
#> 5 01-11817T 11_69455040              5
#> 6 01-11817T 11_69455050              5
if (FALSE) { # \dontrun{
  # This will fail because the chromosome naming doesn't match the default projection
  misguided_attempt = calc_mutation_frequency_bin_region(these_samples_metadata = meta,
                                                         region = "chr11:69455000-69459900",
                                                         slide_by = 10,
                                                         window_size = 10000)
  # This will work!
  mut_freq = calc_mutation_frequency_bin_region(these_samples_metadata = meta,
                                                region = "chr11:69455000-69459900",
                                                slide_by = 10,
                                                window_size = 10000,
                                                projection = "hg38")
  head(mut_freq)
} # }
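The difference between the "long" and "wide" values of return_format can be sketched in base R with a toy table. The data below are hypothetical, and the reshaping via tapply() is only one possible way to build such a matrix, not necessarily how the package does it.

```r
# Hypothetical long-format result after unmutated windows were dropped:
# sample S2 has no row for bin 11_1000.
long <- data.frame(
  sample_id      = c("S1", "S1", "S2"),
  bin            = c("11_1000", "11_1100", "11_1100"),
  mutation_count = c(2L, 1L, 1L)
)

# Reshape to one row per sample and one column per window.
wide <- tapply(long$mutation_count, list(long$sample_id, long$bin), sum)

# Sample/window combinations missing from the long table become NA;
# fill them with 0 so every sample has a value for every window.
wide[is.na(wide)] <- 0L

wide["S1", "11_1000"]   # 2
wide["S2", "11_1000"]   # 0
```

The wide layout has a cell for every sample and window pair, which is consistent with the documented behaviour that "wide" output is not affected by drop_unmutated or min_count_per_bin.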