Get SSM By Regions.
get_ssm_by_regions.Rd
Efficiently retrieve all mutations across a range of genomic regions.
Usage
get_ssm_by_regions(
these_samples_metadata,
regions_list,
regions_bed,
this_seq_type = "genome",
streamlined = TRUE,
projection = "grch37",
verbose = FALSE,
use_name_column = FALSE,
tool_name = "slms-3",
...
)
Arguments
- these_samples_metadata
Optional, a metadata table (with sample IDs in a column) to subset the return to.
- regions_list
A vector of regions in the chr:start-end format to restrict the returned SSM calls to.
- regions_bed
A data frame in BED format with the coordinates you want to retrieve (recommended). This parameter can also accept an additional column with region names that will be added to the return if
use_name_column = TRUE
- this_seq_type
The this_seq_type you want back, default is genome.
- streamlined
If set to TRUE (default) only 3 columns will be kept in the returned data frame (start, sample_id and region_name).
- projection
Obtain variants projected to this reference (one of grch37 or hg38), default is grch37.
- verbose
Set to TRUE to maximize the output to console. Default is TRUE. This parameter also dictates the verbosity of any helper function internally called inside the main function.
- use_name_column
If your bed-format data frame has a name column (must be named "name") these can be used to name your regions.
- tool_name
Optionally specify which tool to report variant from. The default is slms-3, also supports "publication" to return the exact variants as reported in the original papers.
- ...
Any additional parameters.
Details
This function internally calls get_ssm_by_region to retrieve SSM calls for the specified regions.
Examples
#basic usage, adding custom names from bundled ashm data frame
regions_bed = create_bed_data( GAMBLR.data::grch37_ashm_regions,
fix_names = "concat",
concat_cols = c("gene","region"),
sep="-")
my_meta = get_gambl_metadata()
#> Using the bundled metadata in GAMBLR.data...
# get a full MAF-format data frame for all aSHM regions on grch37 coordinates
ashm_maf = get_ssm_by_regions(regions_bed = regions_bed,
these_samples_metadata = my_meta,
streamlined = FALSE)
#> Using the bundled SSM calls (.maf) calls in GAMBLR.data...
#> Using the bundled SSM calls (.maf) calls in GAMBLR.data...
#> id_ease: WARNING! 1838 samples in the provided metadata were removed because their seq types are not the same as in the `set_type` argument. Use `verbose = TRUE` to see their IDs.
#> Running in default mode of any...
one_region_maf = get_ssm_by_regions(regions_list = "2:136875000-136875097",
streamlined = FALSE,
projection = "grch37",
these_samples_metadata = my_meta)
#> Using the bundled SSM calls (.maf) calls in GAMBLR.data...
#> Using the bundled SSM calls (.maf) calls in GAMBLR.data...
#> id_ease: WARNING! 1838 samples in the provided metadata were removed because their seq types are not the same as in the `set_type` argument. Use `verbose = TRUE` to see their IDs.
#> Running in default mode of any...
if (FALSE) { # \dontrun{
# This example fails, as it should
#ashm_maf = get_ssm_by_regions(regions_bed = regions_bed,
# these_samples_metadata = my_meta,
# projection="hg38")
# Error in get_ssm_by_regions(regions_bed = regions_bed, these_samples_metadata = my_meta, :
# requested projection: hg38 and genome_build of regions_bed: grch37 don't match
} # }