Efficiently retrieve all mutations across a range of genomic regions.

Usage

get_ssm_by_regions(
  regions_list,
  regions_bed,
  these_sample_ids = NULL,
  these_samples_metadata = NULL,
  streamlined = TRUE,
  maf_data = maf_data,
  use_name_column = FALSE,
  from_indexed_flatfile = TRUE,
  mode = "slms-3",
  augmented = TRUE,
  this_seq_type = "genome",
  projection = "grch37",
  min_read_support = 4,
  basic_columns = FALSE,
  verbose = FALSE
)

Arguments

regions_list

A vector of regions in chr:start-end format. Provide either this or regions_bed.

regions_bed

Preferably, provide a bed-format data frame with the coordinates you want to retrieve.

these_sample_ids

Optional, a vector of sample IDs (or a single sample ID as a string) that you want results for.

these_samples_metadata

Optional, a metadata table (with sample IDs in a column) to subset the return to.

streamlined

If TRUE (default), only three columns are kept in the returned MAF (start, sample_id and region name). To return more columns, set this parameter to FALSE; see basic_columns for details. Note that when this parameter is TRUE, the function ignores anything specified with basic_columns.

maf_data

Use an already loaded MAF data frame.

use_name_column

If your bed-format data frame has a name column (must be named "name") these can be used to name your regions.

from_indexed_flatfile

Set to TRUE to avoid using the database and instead rely on flatfiles (only works for streamlined data, not full MAF details).

mode

Only works with indexed flatfiles. Accepts 2 options of "slms-3" and "strelka2" to indicate which variant caller to use. Default is "slms-3".

augmented

Default is TRUE. Set to FALSE if you want the original MAF from each sample for multi-sample patients instead of the augmented MAF.

this_seq_type

The seq_type you want back, default is genome.

projection

Obtain variants projected to this reference (one of grch37 or hg38).

min_read_support

Only returns variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs).

basic_columns

Only used when streamlined is FALSE. Set to TRUE to return a MAF with the standard 45 columns, or leave as FALSE (default) to keep all 116 MAF columns in the returned object.

verbose

Boolean parameter, FALSE by default.
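
The relationship between regions_list and regions_bed can be illustrated with a minimal base R sketch that turns chr:start-end strings into a bed-style data frame. The helper regions_to_bed is hypothetical and not part of the package, the column names chrom, start and end are assumptions (only the "name" column is named in the argument descriptions above), and the coordinates are invented for illustration.

```r
# Hypothetical helper (not part of the package): parse "chr:start-end"
# strings into a bed-style data frame. Column names other than "name"
# are assumptions made for this sketch.
regions_to_bed <- function(regions) {
  # Split each string on ":" or "-" into chromosome, start and end
  parts <- strsplit(regions, "[:-]")
  ok <- lengths(parts) == 3
  if (!all(ok)) {
    stop("Malformed region(s): ", paste(regions[!ok], collapse = ", "))
  }
  data.frame(
    chrom = vapply(parts, `[`, character(1), 1),
    start = as.integer(vapply(parts, `[`, character(1), 2)),
    end   = as.integer(vapply(parts, `[`, character(1), 3)),
    name  = regions,  # reuse the original string as the region name
    stringsAsFactors = FALSE
  )
}

# Invented coordinates, for illustration only
my_regions <- c("1:100-200", "2:5000-6000")
bed <- regions_to_bed(my_regions)
print(bed)
```

This simple split would not handle chromosome names that themselves contain a hyphen or colon; a real implementation would likely need stricter parsing.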

Value

Returns a data frame of variants in MAF-like format.
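
Because the return value is a data frame, the effect of min_read_support can be pictured as a simple row filter on t_alt_count. The sketch below is an illustration on an invented toy table, not the package's internal code; only the column names Tumor_Sample_Barcode, Start_Position and t_alt_count come from this page.

```r
# Toy MAF-like data frame; every value here is invented for illustration
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3"),
  Start_Position       = c(101L, 150L, 101L, 175L),
  t_alt_count          = c(12L, 3L, 4L, 1L),
  stringsAsFactors = FALSE
)

# Keep variants supported by at least this many alternate-allele reads
min_read_support <- 4
filtered <- toy_maf[toy_maf$t_alt_count >= min_read_support, , drop = FALSE]
print(filtered)
```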

Details

This function internally calls get_ssm_by_region to retrieve SSM calls for the specified regions. See the parameter descriptions for get_ssm_by_region for more information on how the different parameters can be used. Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_coding_ssm_status, get_ssm_by_sample, get_ssm_by_samples, get_ssm_by_region.
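
The per-region pattern described above can be sketched in base R as a loop over the rows of a bed-style data frame, with the results bound into one table. This is an assumption-laden illustration, not the package's implementation: query_one_region is a hypothetical stand-in for a single-region lookup, the bed column names chrom, start and end are assumed, all data are invented, and the three output columns mirror the streamlined description (start, sample_id and region name).

```r
# Hypothetical stand-in for a single-region query (not the package's code):
# returns the streamlined-style columns for variants inside one region.
query_one_region <- function(maf, chrom, start, end, region_name) {
  hit <- maf$Chromosome == chrom &
    maf$Start_Position >= start &
    maf$Start_Position <= end
  data.frame(
    start       = maf$Start_Position[hit],
    sample_id   = maf$Tumor_Sample_Barcode[hit],
    region_name = rep(region_name, sum(hit)),
    stringsAsFactors = FALSE
  )
}

# Invented toy inputs
toy_maf <- data.frame(
  Chromosome           = c("1", "1", "2", "2"),
  Start_Position       = c(120L, 450L, 5100L, 9000L),
  Tumor_Sample_Barcode = c("S1", "S2", "S1", "S3"),
  stringsAsFactors = FALSE
)
toy_bed <- data.frame(
  chrom = c("1", "2"),
  start = c(100L, 5000L),
  end   = c(200L, 6000L),
  name  = c("geneA-region1", "geneB-region1"),
  stringsAsFactors = FALSE
)

# Query each region in turn, then stack the per-region results
per_region <- lapply(seq_len(nrow(toy_bed)), function(i) {
  query_one_region(toy_maf, toy_bed$chrom[i], toy_bed$start[i],
                   toy_bed$end[i], toy_bed$name[i])
})
all_regions <- do.call(rbind, per_region)
print(all_regions)
```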

Examples


regions_bed = GAMBLR.utils::create_bed_data(
   GAMBLR.data::grch37_ashm_regions,
   fix_names = "concat",
   concat_cols = c("gene","region"),sep="-"
) %>% head(20)

DLBCL_meta = suppressMessages(get_gambl_metadata()) %>% 
                dplyr::filter(pathology=="DLBCL")
ashm_MAF = get_ssm_by_regions(regions_bed = regions_bed,
                             these_samples_metadata = DLBCL_meta,
                             streamlined=FALSE)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
ashm_MAF %>% dplyr::arrange(Start_Position,Tumor_Sample_Barcode) %>%
              dplyr::select(Hugo_Symbol,
                    Tumor_Sample_Barcode,
                    Chromosome,Start_Position,
                    Reference_Allele,
                    Tumor_Seq_Allele2)
#> genomic_data Object
#> Genome Build: grch37 
#> Showing first 10 rows:
#>    Hugo_Symbol      Tumor_Sample_Barcode Chromosome Start_Position
#> 1       KLHL21           13-26835_tumorA          1        6661537
#> 2       KLHL21           13-26835_tumorB          1        6661537
#> 3       KLHL21           13-26835_tumorD          1        6661537
#> 4       KLHL21                  SP193546          1        6661538
#> 5       KLHL21 HTMCP-01-06-00497-01A-01D          1        6661563
#> 6       KLHL21           17-40409_tumorA          1        6661575
#> 7       KLHL21           17-40409_tumorB          1        6661575
#> 8       KLHL21 HTMCP-01-06-00136-01A-01D          1        6661604
#> 9       KLHL21                 15-26538T          1        6661607
#> 10      KLHL21                 10-18191T          1        6661655
#>    Reference_Allele Tumor_Seq_Allele2
#> 1                 A                 T
#> 2                 A                 T
#> 3                 A                 T
#> 4                 C                 G
#> 5                 G                 C
#> 6                 C                 T
#> 7                 C                 T
#> 8                 G                 C
#> 9                 G                 A
#> 10                A                 G