Mutation counts across sliding windows for multiple regions.
calc_mutation_frequency_bin_regions.Rd
Obtain mutation counts across sliding windows for multiple regions, returned as either a long (tidy) data frame or a wide matrix.
Usage
calc_mutation_frequency_bin_regions(
regions_list = NULL,
regions_bed = NULL,
these_samples_metadata = NULL,
these_sample_ids = NULL,
projection,
region_padding = 1000,
drop_unmutated = FALSE,
skip_regions = NULL,
only_regions = NULL,
slide_by = 100,
window_size = 500,
return_format = "wide"
)
Arguments
- regions_list
Named vector of regions in the format c(name1 = "chr:start-end", name2 = "chr:start-end"). If neither regions_list nor regions_bed is specified, the function will use GAMBLR aSHM region information.
- regions_bed
Data frame of regions with four columns (chrom, start, end, name).
- these_samples_metadata
Metadata with at least sample_id column. If not providing a maf data frame, seq_type is also required.
- these_sample_ids
Vector of sample IDs. Metadata will be subset to sample IDs present in this vector.
- projection
Genome build the function will operate in. Ensure this matches your provided regions and maf data for correct chr prefix handling. Default "grch37".
- region_padding
Amount to pad the start and end coordinates by. Default 1000.
- drop_unmutated
Whether to drop bins with 0 mutations. If returning a matrix format, this will only drop bins with no mutations in any samples.
- skip_regions
Optional character vector of genes to exclude from the default aSHM regions.
- only_regions
Optional character vector of genes to include from the default aSHM regions.
- slide_by
Distance by which the sliding window advances at each step. Default 100.
- window_size
Size of sliding window. Default 500.
- return_format
Return format of mutation counts. Accepted inputs are "long" and "wide". "long" returns a data frame with one sample ID/window combination per row. "wide" returns a matrix with one sample ID per row and one window per column. Using the "wide" format will retain all samples and windows regardless of the drop_unmutated or min_count_per_bin parameters. Default "wide".
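To make the region and window arguments more concrete, the following base R sketch parses a named vector in the regions_list format and enumerates sliding windows over one padded region. It is an illustration only, not the package's implementation: parse_regions and make_windows are hypothetical helpers, the coordinates are made up, and the boundary handling (inclusive coordinates, partial windows at the end dropped) is an assumption.

```r
# Hypothetical helper (not part of GAMBLR): convert a named vector of
# "chr:start-end" strings into a data frame with the four regions_bed columns.
parse_regions <- function(regions_list) {
  parts <- strsplit(unname(regions_list), "[:-]")
  data.frame(
    chrom = vapply(parts, `[`, character(1), 1),
    start = as.numeric(vapply(parts, `[`, character(1), 2)),
    end = as.numeric(vapply(parts, `[`, character(1), 3)),
    name = names(regions_list),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper (not part of GAMBLR): enumerate sliding windows
# over a region after padding both ends.
make_windows <- function(chrom, start, end,
                         region_padding = 1000,
                         slide_by = 100,
                         window_size = 500) {
  padded_start <- max(1, start - region_padding)
  padded_end <- end + region_padding
  # Assumption: only windows that fit entirely inside the padded region are kept.
  last_start <- padded_end - window_size + 1
  if (last_start < padded_start) {
    return(data.frame(chrom = character(0),
                      window_start = numeric(0),
                      window_end = numeric(0)))
  }
  window_start <- seq(padded_start, last_start, by = slide_by)
  data.frame(
    chrom = chrom,
    window_start = window_start,
    window_end = window_start + window_size - 1,
    stringsAsFactors = FALSE
  )
}

# Illustrative coordinates only.
regions <- c(MYC = "chr8:128748000-128749000")
bed <- parse_regions(regions)
windows <- make_windows(bed$chrom[1], bed$start[1], bed$end[1])
head(windows, 3)
```

With slide_by smaller than window_size, consecutive windows overlap, so a single mutation can contribute to several windows.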
Value
A table of mutation counts for sliding windows across one or more regions. May be long or wide.
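The relationship between the two return formats can be sketched with toy data in base R. The column names used here (sample_id, bin, mutation_count) and the bin labels are assumptions for illustration and may not match the columns the function actually returns.

```r
# Toy long-format counts: one sample ID/window combination per row.
# Column names and bin labels are hypothetical.
long_counts <- data.frame(
  sample_id = c("S1", "S1", "S2", "S3"),
  bin = c("MYC_1", "MYC_2", "MYC_1", "BCL2_1"),
  mutation_count = c(2, 1, 4, 3),
  stringsAsFactors = FALSE
)

# All windows that were evaluated, including one with no mutations.
all_bins <- c("MYC_1", "MYC_2", "BCL2_1", "BCL2_2")
samples <- sort(unique(long_counts$sample_id))

# Wide format: one sample per row, one window per column, zeros where
# a sample has no mutations in a window.
wide_counts <- matrix(
  0,
  nrow = length(samples),
  ncol = length(all_bins),
  dimnames = list(samples, all_bins)
)
wide_counts[cbind(long_counts$sample_id, long_counts$bin)] <-
  long_counts$mutation_count

# Dropping windows that are unmutated in every sample.
mutated_only <- wide_counts[, colSums(wide_counts) > 0, drop = FALSE]
```

The wide matrix is the shape typically needed for heatmaps or clustering, while the long data frame is easier to filter and join.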
Details
This function takes a metadata table via the these_samples_metadata parameter and internally calls calc_mutation_frequency_bin_region (which in turn calls get_ssm_by_regions) to retrieve mutation counts for sliding windows across one or more regions.
You may optionally provide any combination of a maf data frame, existing metadata, and regions as either a data frame or a named vector.
The heatmap plotting portion of this function has been moved to heatmap_mutation_frequency_bin.
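The per-window counting described in these details can be sketched as follows. This is a simplified base R illustration under stated assumptions, not the logic of calc_mutation_frequency_bin_region or get_ssm_by_regions: the mutation positions are toy data, the windows are defined by hand, and a mutation is counted in every window whose inclusive range contains it.

```r
# Toy mutation positions for two samples (hypothetical data).
mutations <- data.frame(
  sample_id = c("S1", "S1", "S2"),
  position = c(120, 450, 460),
  stringsAsFactors = FALSE
)

# Windows of size 500 sliding by 100 (assumed inclusive coordinates).
window_start <- seq(1, 401, by = 100)
window_end <- window_start + 499

# Count how many positions fall inside each window.
count_in_windows <- function(positions) {
  vapply(
    seq_along(window_start),
    function(i) sum(positions >= window_start[i] & positions <= window_end[i]),
    integer(1)
  )
}

# One row per sample, one column per window.
counts <- t(sapply(split(mutations$position, mutations$sample_id),
                   count_in_windows))
colnames(counts) <- paste0(window_start, "-", window_end)
counts
```

Because the windows overlap, the row sums of this matrix exceed the number of mutations per sample; the counts describe local mutation density rather than a partition of the mutations.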
Examples
if (FALSE) { # \dontrun{
  # Load metadata and keep DLBCL and BL samples.
  metadata = suppressMessages(get_gambl_metadata())
  dlbcl_bl_meta = dplyr::filter(metadata, pathology %in% c("DLBCL", "BL"))

  # Get aSHM regions as bed-format data, naming each region "gene-region".
  some_regions = create_bed_data(
    GAMBLR.data::grch37_ashm_regions,
    fix_names = "concat",
    concat_cols = c("gene", "region"),
    sep = "-"
  )
  some_regions

  # Count mutations per sliding window for each sample and region.
  mut_count_matrix <- calc_mutation_frequency_bin_regions(
    these_samples_metadata = dlbcl_bl_meta,
    regions_bed = some_regions
  )
} # }