Get SSM By Regions. — get_ssm_by_regions • GAMBLR.results

Efficiently retrieve all mutations across a range of genomic regions.

Usage

get_ssm_by_regions(
  regions_list,
  regions_bed,
  these_sample_ids = NULL,
  these_samples_metadata = NULL,
  streamlined = TRUE,
  maf_data = maf_data,
  use_name_column = FALSE,
  from_indexed_flatfile = TRUE,
  mode = "slms-3",
  augmented = TRUE,
  this_seq_type = "genome",
  projection = "grch37",
  min_read_support = 4,
  basic_columns = FALSE,
  verbose = FALSE
)

Arguments

regions_list: A vector of regions in chr:start-end format. Provide either this or regions_bed.
regions_bed: The preferred alternative to regions_list: a bed-format data frame with the coordinates you want to retrieve.
these_sample_ids: Optional, a vector of sample IDs (or a single sample ID as a string) that you want results for.
these_samples_metadata: Optional, a metadata table (with sample IDs in a column) used to subset the returned variants.
streamlined: If TRUE (default), only 3 columns are kept in the MAF (start, sample_id and region name). Set to FALSE to return more columns; see basic_columns for details. When streamlined is TRUE, anything specified with basic_columns is ignored.
maf_data: Use an already loaded MAF data frame.
use_name_column: Set to TRUE to name your regions using the name column of your bed-format data frame (the column must be named "name").
from_indexed_flatfile: Set to TRUE to avoid using the database and instead rely on flatfiles (only works for streamlined data, not full MAF details).
mode: Only works with indexed flatfiles. Accepts "slms-3" or "strelka2" to indicate which variant caller to use. Default is "slms-3".
augmented: Default is TRUE. Set to FALSE to get the original MAF from each sample for multi-sample patients instead of the augmented MAF.
this_seq_type: The seq_type you want back, default is genome.
projection: Obtain variants projected to this reference (one of grch37 or hg38).
min_read_support: Only return variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs).
basic_columns: Only used when streamlined is FALSE. Set to TRUE to return a MAF with the standard 45 columns; set to FALSE (default) to keep all 116 MAF columns in the returned object.
verbose: Boolean. Default is FALSE.
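
As a minimal sketch of the chr:start-end format expected by regions_list, the snippet below builds such a vector from a small bed-like data frame using base R only. The data frame, its column names (chrom, start, end) and the coordinates are invented for illustration and are not taken from the package.

```r
# Hypothetical bed-like data frame; coordinates are made up for illustration.
bed <- data.frame(
  chrom = c("1", "3", "8"),
  start = c(6661000, 187458000, 128748000),
  end   = c(6663000, 187464000, 128749000),
  stringsAsFactors = FALSE
)

# Build one "chr:start-end" string per row. Converting to integer first
# avoids scientific notation (e.g. "1e+05") in the resulting strings.
regions_list <- sprintf(
  "%s:%d-%d",
  bed$chrom,
  as.integer(bed$start),
  as.integer(bed$end)
)

print(regions_list)

# The vector could then be passed on (requires access to GAMBL data):
# ssm <- get_ssm_by_regions(regions_list = regions_list)
```

If your coordinates already live in a bed-format data frame, passing that object through regions_bed skips this conversion step.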

Value

Returns a data frame of variants in MAF-like format.
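
To make the effect of min_read_support on the returned data frame concrete, here is a base R sketch of the kind of filter that argument describes: keep rows whose t_alt_count is at least the threshold. The toy data frame is invented, and this is an illustration of the idea rather than the package's internal code.

```r
# Toy MAF-like data frame; all values are invented for illustration.
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3"),
  Start_Position = c(100L, 250L, 100L, 400L),
  t_alt_count = c(2L, 4L, 10L, 3L),
  stringsAsFactors = FALSE
)

# Same value as the documented default of min_read_support.
min_read_support <- 4

# Keep only variants supported by at least `min_read_support` alt reads.
filtered <- maf[maf$t_alt_count >= min_read_support, , drop = FALSE]

print(filtered)
```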

Details

This function internally calls get_ssm_by_region to retrieve SSM calls for the specified regions. See the parameter descriptions for get_ssm_by_region for more information on how the different parameters can be used.

Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_coding_ssm_status, get_ssm_by_sample, get_ssm_by_samples, get_ssm_by_region.
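
The idea of calling a single-region function once per region and combining the results can be sketched as follows. The helper query_one_region is a hypothetical stand-in written for this example (it is not get_ssm_by_region), the toy variants table is invented, and the inclusive treatment of the region boundaries is an assumption.

```r
# Toy variants table standing in for real SSM calls (values invented).
variants <- data.frame(
  Chromosome = c("1", "1", "3"),
  Start_Position = c(150L, 900L, 500L),
  Tumor_Sample_Barcode = c("S1", "S2", "S1"),
  stringsAsFactors = FALSE
)

# Hypothetical single-region query: parse "chr:start-end" and return the
# variants that fall inside the region, labelled with the region string.
query_one_region <- function(region, data) {
  parts <- strsplit(region, "[:-]")[[1]]
  keep <- data$Chromosome == parts[1] &
    data$Start_Position >= as.integer(parts[2]) &
    data$Start_Position <= as.integer(parts[3])
  hits <- data[keep, , drop = FALSE]
  hits$region_name <- rep(region, nrow(hits))
  hits
}

regions <- c("1:100-200", "3:400-600")

# Query each region separately, then stack the per-region results.
per_region <- lapply(regions, query_one_region, data = variants)
combined <- do.call(rbind, per_region)

print(combined)
```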

Examples


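library(GAMBLR.results)
library(dplyr)

# Build a bed-format data frame from the first 20 aSHM regions (grch37),
# naming each region by concatenating the gene and region columns.
regions_bed = GAMBLR.utils::create_bed_data(
  GAMBLR.data::grch37_ashm_regions,
  fix_names = "concat",
  concat_cols = c("gene", "region"),
  sep = "-"
) %>% head(20)

# Restrict the query to DLBCL samples.
DLBCL_meta = suppressMessages(get_gambl_metadata()) %>%
  dplyr::filter(pathology == "DLBCL")

# streamlined = FALSE returns MAF columns instead of the 3-column summary.
ashm_MAF = get_ssm_by_regions(
  regions_bed = regions_bed,
  these_samples_metadata = DLBCL_meta,
  streamlined = FALSE
)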
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
ashm_MAF %>% dplyr::arrange(Start_Position,Tumor_Sample_Barcode) %>%
              dplyr::select(Hugo_Symbol,
                    Tumor_Sample_Barcode,
                    Chromosome,Start_Position,
                    Reference_Allele,
                    Tumor_Seq_Allele2)
#> genomic_data Object
#> Genome Build: grch37 
#> Showing first 10 rows:
#>    Hugo_Symbol      Tumor_Sample_Barcode Chromosome Start_Position
#> 1       KLHL21           13-26835_tumorA          1        6661537
#> 2       KLHL21           13-26835_tumorB          1        6661537
#> 3       KLHL21           13-26835_tumorD          1        6661537
#> 4       KLHL21                  SP193546          1        6661538
#> 5       KLHL21 HTMCP-01-06-00497-01A-01D          1        6661563
#> 6       KLHL21           17-40409_tumorA          1        6661575
#> 7       KLHL21           17-40409_tumorB          1        6661575
#> 8       KLHL21 HTMCP-01-06-00136-01A-01D          1        6661604
#> 9       KLHL21                 15-26538T          1        6661607
#> 10      KLHL21                 10-18191T          1        6661655
#>    Reference_Allele Tumor_Seq_Allele2
#> 1                 A                 T
#> 2                 A                 T
#> 3                 A                 T
#> 4                 C                 G
#> 5                 G                 C
#> 6                 C                 T
#> 7                 C                 T
#> 8                 G                 C
#> 9                 G                 A
#> 10                A                 G