Count the number of mutations in a sliding window across a region for all samples.

calc_mutation_frequency_sliding_windows(
  this_region,
  chromosome,
  start_pos,
  end_pos,
  metadata,
  seq_type,
  slide_by = 100,
  window_size = 1000,
  plot_type = "none",
  sortByColumns = "pathology",
  return_format = "long-simple",
  min_count_per_bin = 3,
  return_count = FALSE,
  drop_unmutated = FALSE,
  classification_column = "lymphgen",
  from_indexed_flatfile = FALSE,
  mode = "slms-3"
)

Arguments

this_region

Genomic region in bed format, for example "chr11:69455000-69459900".

chromosome

Chromosome name in region.

start_pos

Start coordinate of region.

end_pos

End coordinate of region.

metadata

Data frame containing sample IDs and a column with annotated data for the 2 groups of interest. All other columns are ignored. Currently, the function exits if asked to compare more than 2 groups.

seq_type

The seq_type you want back, default is genome.

slide_by

Slide size for sliding window, default is 100.

window_size

Size of sliding window, default is 1000.

plot_type

Set to TRUE for a plot of your bins. The default, "none", makes no plot.

sortByColumns

Metadata column(s) to sort on for the heatmap. Default is "pathology".

return_format

Return format of mutations. Accepted inputs are "long" and "long-simple". Default is "long-simple".

min_count_per_bin

Minimum counts per bin, default is 3.

return_count

Boolean. Set to TRUE to return counts. Default is FALSE.

drop_unmutated

This may not currently work properly. Default is FALSE.

classification_column

Only used for plotting. Default is "lymphgen".

from_indexed_flatfile

Set to TRUE to avoid using the database and instead rely on flat-files (only works for streamlined data, not full MAF details). Default is FALSE.

mode

Only works with indexed flat-files. Accepts two options, "slms-3" and "strelka2", to indicate which variant caller to use. Default is "slms-3".

Value

Count matrix.

Details

This function returns the mutation frequency for a given region, for all GAMBL samples. The region is specified with the this_region parameter. Alternatively, the region of interest can be specified with the chromosome, start_pos, and end_pos parameters. It is also possible to return a plot of the created bins by setting plot_type = TRUE. A collection of parameters is available for further customizing the return; for more information, refer to the parameter descriptions and examples. This function is unlikely to be used directly in most cases. See get_mutation_frequency_bin_matrix instead.
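To make the roles of window_size and slide_by concrete, here is a minimal base-R sketch of sliding-window counting for a single sample. It is an illustration only, not the GAMBLR implementation: the helpers make_windows and count_per_window, the column names, and the mutation positions are all hypothetical, and the way the real function treats windows that run past end_pos is an assumption here.

```r
# Hypothetical helper (not part of GAMBLR): one window starts every
# `slide_by` bases and spans `window_size` bases.
# Assumption: windows may extend past end_pos.
make_windows <- function(start_pos, end_pos, window_size = 1000, slide_by = 100) {
  starts <- seq(start_pos, end_pos, by = slide_by)
  data.frame(window_start = starts, window_end = starts + window_size - 1)
}

# Hypothetical helper: count the positions that fall inside each window.
count_per_window <- function(windows, positions) {
  windows$mutation_count <- vapply(
    seq_len(nrow(windows)),
    function(i) {
      sum(positions >= windows$window_start[i] &
            positions <= windows$window_end[i])
    },
    integer(1)
  )
  windows
}

# Small made-up region and mutation positions for one sample.
windows <- make_windows(1000, 1500, window_size = 200, slide_by = 100)
positions <- c(1050, 1120, 1190, 1420)
counts <- count_per_window(windows, positions)
print(counts)
```

Because slide_by is smaller than window_size, neighbouring windows overlap, so one mutation can contribute to several windows. In this toy input, positions 1120 and 1190 are counted in both the first and second windows.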

Examples

chr11_mut_freq = calc_mutation_frequency_sliding_windows(this_region = "chr11:69455000-69459900",
                                                         slide_by = 10,
                                                         window_size = 10000)
#> processing bins of size 10000 across 4900 bp region
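The same region can, per the argument descriptions, be given as separate chromosome, start_pos, and end_pos values. The base-R sketch below splits the region string from the example into those three parts. The helper parse_region is hypothetical and not part of GAMBLR, and it assumes the chromosome name contains no ":" or "-" characters.

```r
# Hypothetical helper (not part of GAMBLR): split "chr:start-end" into parts.
# Assumption: the chromosome name itself contains no ":" or "-".
parse_region <- function(region) {
  parts <- strsplit(region, "[:-]")[[1]]
  stopifnot(length(parts) == 3)
  list(
    chromosome = parts[1],
    start_pos = as.integer(parts[2]),
    end_pos = as.integer(parts[3])
  )
}

region <- parse_region("chr11:69455000-69459900")
region$chromosome                   # "chr11"
region$end_pos - region$start_pos   # 4900, the region width in bp
```

The width of 4900 bp matches the region size reported in the example output above.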