Count hypermutated bins and generate a heatmap, optionally clustering the data.

get_mutation_frequency_bin_matrix(
  regions,
  regions_df,
  these_samples_metadata,
  seq_type = "genome",
  region_padding = 1000,
  metadataColumns = c("pathology"),
  sortByColumns = c("pathology"),
  expressionColumns = c(),
  orientation = "sample_rows",
  skip_regions = c("MYC", "BCL2", "IGLL5"),
  customColour = NULL,
  slide_by = 100,
  window_size = 500,
  min_count_per_bin = 3,
  min_bin_recurrence = 5,
  min_bin_patient = 0,
  region_fontsize = 8,
  cluster_rows_heatmap = FALSE,
  cluster_cols_heatmap = FALSE,
  show_gene_colours = FALSE,
  legend_row = 3,
  legend_col = 3,
  legend_direction = "horizontal",
  legendFontSize = 10,
  from_indexed_flatfile = TRUE,
  mode = "slms-3"
)

Arguments

regions

Vector of regions in the format "chr:start-end".

regions_df

Data frame of regions with four columns (chrom,start,end,gene_name).

these_samples_metadata

GAMBL metadata subset to the cases you want to process (or full metadata).

seq_type

The seq_type you want back. Default is "genome".

region_padding

How many bases will be added on the left and right of the regions to ensure any small regions are sufficiently covered by bins. Default is 1000.
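
As a rough illustration of what padding does, the sketch below turns a toy regions_df into padded "chr:start-end" strings. The gene names and coordinates are made up, and the arithmetic (start minus padding, end plus padding) is an assumption about how the function applies region_padding, not a copy of its internals.

```r
# Toy regions data frame with the four expected columns (values are hypothetical).
regions_df <- data.frame(
  chrom = c("chr1", "chr2"),
  start = c(5000L, 20000L),
  end = c(5200L, 26000L),
  gene_name = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

region_padding <- 1000L

# Assumed behaviour: extend each region by region_padding on both sides,
# then format it as "chr:start-end".
padded_regions <- paste0(
  regions_df$chrom, ":",
  regions_df$start - region_padding, "-",
  regions_df$end + region_padding
)

print(padded_regions)
```

With the default padding, the 200 bp toy region grows to 2200 bp, which leaves room for several 500 bp windows.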

metadataColumns

What metadata will be shown in the visualization.

sortByColumns

Which of the metadata to sort on for the heatmap.

expressionColumns

Optional variable for retrieving expression values for one or more specific genes.

orientation

Specify the sample orientation, default is sample_rows.

skip_regions

Regions to be filtered out from the regions data frame. Only applies if regions_df is not provided. Default is MYC, BCL2 and IGLL5.

customColour

Optional named list of named vectors for specifying all colours for metadata. Can be generated with map_metadata_to_colours. Default is NULL.
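
A minimal sketch of the expected shape, assuming the outer list is keyed by metadata column name and each inner vector maps a value in that column to a colour. The hex codes here are arbitrary placeholders, not GAMBL's official palette.

```r
# Hypothetical colour mapping for the "pathology" metadata column.
# Outer names: metadata columns; inner names: values found in that column.
custom_colours <- list(
  pathology = c(DLBCL = "#479450", BL = "#926CAD")
)

str(custom_colours)
```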

slide_by

How far to shift before starting the next window.

window_size

The width of your sliding window.
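
To see how slide_by and window_size interact, the sketch below lays out windows across a single toy region. The exact boundary handling (where the last window stops, whether ends are inclusive) is an assumption made for illustration and may differ from what the function does internally.

```r
# Toy region coordinates (hypothetical).
region_start <- 1000L
region_end <- 2000L

window_size <- 500L  # width of each window
slide_by <- 100L     # distance between consecutive window starts

# Assumed layout: windows start every slide_by bases and must fit in the region.
window_starts <- seq(region_start, region_end - window_size, by = slide_by)
windows <- data.frame(
  start = window_starts,
  end = window_starts + window_size
)

print(windows)
```

Because slide_by is smaller than window_size, consecutive windows overlap, so a single mutation can be counted in several bins.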

min_count_per_bin

Minimum counts per bin, default is 3.

min_bin_recurrence

How many samples a bin must be mutated in to be retained in the visualization. Default is 5.

min_bin_patient

How many bins a patient must be mutated in to be retained in the visualization. Default is 0.
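
The three thresholds above can be pictured as successive filters on a sample-by-bin count matrix. The sketch below uses a toy matrix and assumes this order of operations (threshold counts, then drop rare bins, then drop sparse samples); it is an illustration of the idea, not the function's actual implementation. The recurrence and patient thresholds are lowered from their defaults so the toy data show an effect.

```r
# Toy mutation counts: rows are samples, columns are bins (values are made up).
counts <- matrix(
  c(4, 0, 3,
    5, 1, 0,
    3, 2, 0,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("sample_", 1:4), paste0("bin_", 1:3))
)

min_count_per_bin <- 3   # same as the function default
min_bin_recurrence <- 2  # default is 5; lowered for this toy example
min_bin_patient <- 1     # default is 0; raised for this toy example

# A bin counts as mutated in a sample when it reaches min_count_per_bin.
mutated <- counts >= min_count_per_bin

# Keep bins mutated in at least min_bin_recurrence samples.
keep_bins <- colSums(mutated) >= min_bin_recurrence

# Keep samples mutated in at least min_bin_patient of the retained bins.
keep_samples <- rowSums(mutated[, keep_bins, drop = FALSE]) >= min_bin_patient

filtered <- mutated[keep_samples, keep_bins, drop = FALSE]
print(filtered)
```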

region_fontsize

Font size of regions in plot, default is 8 pt.

cluster_rows_heatmap

Optional parameter to enable/disable clustering of the heatmap rows. Default is FALSE.

cluster_cols_heatmap

Optional parameter to enable/disable clustering of the heatmap columns. Default is FALSE.

show_gene_colours

Optional logical argument indicating whether regions should have associated colours plotted as an annotation track of the heatmap. Default is FALSE.

legend_row

Adjust this value, together with legend_col, to widen or narrow the legend. Default is 3.

legend_col

Adjust this value, together with legend_row, to widen or narrow the legend. Default is 3.

legend_direction

Accepts one of "horizontal" (default) or "vertical" to indicate in which direction the legend will be drawn.

legendFontSize

Font size of legend in plot, default is 10 pt.

from_indexed_flatfile

Set to TRUE to avoid using the database and instead rely on flat files (only works for streamlined data, not full MAF details).

mode

Only works with indexed flat files. Accepts either "slms-3" or "strelka2" to indicate which variant caller to use. Default is "slms-3".

Value

Nothing

Details

This function takes a metadata table via the these_samples_metadata parameter and internally calls calc_mutation_frequency_sliding_windows (which in turn calls get_ssm_by_regions) to retrieve mutations for plotting. It exposes a range of parameters for customizing the plot. For more details on how these parameters can be used, and for extended usage examples, refer to the SSM tutorial vignette, section 1.4.9.

Examples

# load metadata
metadata = get_gambl_metadata()
dlbcl_bl_meta = dplyr::filter(metadata, pathology %in% c("DLBCL", "BL"))

# bring together all derived sample-level results from many GAMBL pipelines
dlbcl_bl_meta = collate_results(join_with_full_metadata = TRUE,
                                these_samples_metadata = dlbcl_bl_meta)
#> /projects/nhl_meta_analysis_scratch/gambl/results_local/shared/gambl_genome_results.tsv
#> Joining with `by = join_by(patient_id, sample_id, biopsy_id)`

# get aSHM regions
some_regions = grch37_ashm_regions

get_mutation_frequency_bin_matrix(these_samples_metadata = dlbcl_bl_meta,
                                  regions_df = some_regions)