Get Coding SSM. — get_coding_ssm • GAMBLR.results

Retrieve all coding simple somatic mutations (SSMs) from a single seq_type in GAMBL, in MAF-like format.

Usage

get_coding_ssm(
  these_samples_metadata = NULL,
  force_unmatched_samples,
  projection = "grch37",
  this_seq_type = "genome",
  basic_columns = TRUE,
  maf_cols = NULL,
  augmented = TRUE,
  min_read_support = 3,
  groups = c("gambl", "icgc_dart"),
  include_silent = TRUE,
  engine,
  verbose = FALSE,
  limit_cohort,
  exclude_cohort,
  limit_pathology,
  limit_samples,
  from_flatfile,
  these_sample_ids
)

Arguments

these_samples_metadata: Optional (but highly recommended) metadata table that tells the function how to subset the data. Only samples (sample_id) in this table with a matching seq_type are included in the output. Samples that lack SSM results, or in which no mutations were detected, will not appear in the output even if they are listed in this table.
force_unmatched_samples: Optional argument for forcing unmatched samples, using get_ssm_by_samples.
projection: Reference genome build for the coordinates in the MAF file (e.g. "grch37" or "hg38"). The default is "grch37".
this_seq_type: The seq_type to retrieve SSMs from ("genome" or "capture"). The default is "genome".
basic_columns: If TRUE (the default), only the first 45 standard MAF columns are returned. Set this to FALSE to return all available columns instead.
maf_cols: If basic_columns is FALSE, specifies which columns to return. This can be either an integer vector of column indexes or a character vector of column names matching columns in the MAF.
augmented: Set to FALSE to return the original per-sample MAF for multi-sample patients instead of the augmented MAF.
min_read_support: Only return variants supported by at least this many reads in t_alt_count (used for cleaning up augmented MAFs). The default is 3.
groups: Deprecated. Use these_samples_metadata instead.
include_silent: Logical indicating whether to include silent mutations in coding regions (i.e. synonymous). Default is TRUE.
engine: Deprecated. Ignored.
verbose: Controls the verbosity of this function (and of internally called helpers).
limit_cohort: Deprecated. Use these_samples_metadata instead.
exclude_cohort: Deprecated. Use these_samples_metadata instead.
limit_pathology: Deprecated. Use these_samples_metadata instead.
limit_samples: Deprecated. Use these_samples_metadata instead.
from_flatfile: Deprecated. Ignored.
these_sample_ids: Deprecated. Use these_samples_metadata instead.
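
The filtering and column-selection arguments above can be pictured with a small, self-contained sketch on a toy MAF-like data frame. This is not the package's implementation: the helper toy_filter_maf() and the toy rows are hypothetical, basic_columns is omitted, and the sketch only assumes that min_read_support is compared against t_alt_count, that include_silent = FALSE drops rows whose Variant_Classification is "Silent", and that maf_cols accepts either column indexes or column names.

```r
# Hypothetical toy data: a few MAF-like rows (not real GAMBL data)
toy_maf <- data.frame(
  Hugo_Symbol = c("SAMD11", "SAMD11", "OR4F5", "FAM87B"),
  Chromosome = c("1", "1", "1", "1"),
  Start_Position = c(871158, 871192, 69634, 753589),
  Variant_Classification = c("Silent", "Missense_Mutation",
                             "Missense_Mutation", "Splice_Region"),
  t_alt_count = c(5, 2, 10, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented argument semantics (assumption)
toy_filter_maf <- function(maf, min_read_support = 3, include_silent = TRUE,
                           maf_cols = NULL) {
  # keep only variants with enough supporting reads in t_alt_count
  maf <- maf[maf$t_alt_count >= min_read_support, , drop = FALSE]
  # optionally drop synonymous (Silent) variants
  if (!include_silent) {
    maf <- maf[maf$Variant_Classification != "Silent", , drop = FALSE]
  }
  # optionally restrict columns, either by integer index or by name
  if (!is.null(maf_cols)) {
    maf <- maf[, maf_cols, drop = FALSE]
  }
  rownames(maf) <- NULL
  maf
}

# drops the SAMD11 missense variant (t_alt_count = 2 is below 3)
toy_filter_maf(toy_maf)
# also drops the Silent variant and keeps two columns selected by name
toy_filter_maf(toy_maf, include_silent = FALSE,
               maf_cols = c("Hugo_Symbol", "Variant_Classification"))
# integer indexes select the same two columns
toy_filter_maf(toy_maf, include_silent = FALSE, maf_cols = c(1, 4))
```

Accepting both index and name vectors works here because base R's `[` subsetting handles either form.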

Value

A data frame (a maf_data object) with one row per mutation, containing the requested MAF columns (see basic_columns and maf_cols) and a maf_seq_type column.

Details

Retrieve simple somatic mutation (SSM) results for either the capture or the genome seq_type (but not both at once). The resulting data frame is a maf_data object, which tracks the genome build (projection) of the variants, and it includes a maf_seq_type column recording the seq_type each variant originated from. In most cases, users should instead use the related function get_all_coding_ssm, which obtains SSMs across both the genome and capture seq_types.

Is this function not what you are looking for? Try one of: get_coding_ssm_status, get_ssm_by_patients, get_ssm_by_sample, get_ssm_by_samples, get_ssm_by_region, get_ssm_by_regions.
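
Because every row carries its origin in maf_seq_type, rows from the two seq_types stay distinguishable once they are stacked together. The minimal sketch below uses hypothetical plain data frames (not real GAMBL output, not maf_data objects, and not how get_all_coding_ssm is implemented) to show how the column can be used to tabulate and split a combined table.

```r
# Hypothetical toy results standing in for one genome call and one capture call
genome_maf <- data.frame(
  Hugo_Symbol = c("SAMD11", "SAMD11", "FAM87B"),
  Variant_Classification = c("Silent", "Missense_Mutation", "Splice_Region"),
  maf_seq_type = "genome",
  stringsAsFactors = FALSE
)
capture_maf <- data.frame(
  Hugo_Symbol = c("OR4F5", "OR4F5"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation"),
  maf_seq_type = "capture",
  stringsAsFactors = FALSE
)

# stack the two tables; maf_seq_type records where each row came from
combined <- rbind(genome_maf, capture_maf)

# count variants per seq_type
counts <- table(combined$maf_seq_type)
print(counts)

# recover only the rows that came from capture data
capture_only <- combined[combined$maf_seq_type == "capture", ]
```

In practice, get_all_coding_ssm is the documented way to obtain SSMs across both seq_types; the sketch only illustrates what the maf_seq_type column makes possible.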

Examples


  # basic usage (defaults to the genome seq_type)
  maf_genome = get_coding_ssm()
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)

  nrow(maf_genome)
#> [1] 248754

  dplyr::select(maf_genome,1,4,5,6,9,maf_seq_type)
#> genomic_data Object
#> Genome Build: grch37 
#> Showing first 10 rows:
#>      Hugo_Symbol NCBI_Build Chromosome Start_Position Variant_Classification
#> 1     AL627309.1     GRCh37          1         138626                 Silent
#> 2     AL627309.1     GRCh37          1         138972        Frame_Shift_Ins
#> 3  RP11-206L10.9     GRCh37          1         730845          Splice_Region
#> 4         FAM87B     GRCh37          1         753589          Splice_Region
#> 5         SAMD11     GRCh37          1         871158                 Silent
#> 6         SAMD11     GRCh37          1         871192      Missense_Mutation
#> 7         SAMD11     GRCh37          1         874416          Splice_Region
#> 8         SAMD11     GRCh37          1         874467      Missense_Mutation
#> 9         SAMD11     GRCh37          1         874648          Splice_Region
#> 10        SAMD11     GRCh37          1         874763      Missense_Mutation
#>    maf_seq_type
#> 1        genome
#> 2        genome
#> 3        genome
#> 4        genome
#> 5        genome
#> 6        genome
#> 7        genome
#> 8        genome
#> 9        genome
#> 10       genome

  # capture (exome) SSMs with coordinates in the hg38 projection
  maf_exome_hg38 = get_coding_ssm(this_seq_type = "capture",
                                  projection = "hg38")
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)

  dplyr::select(maf_exome_hg38,1,4,5,6,9,maf_seq_type)
#> genomic_data Object
#> Genome Build: hg38 
#> Showing first 10 rows:
#>    Hugo_Symbol NCBI_Build Chromosome Start_Position Variant_Classification
#> 1        OR4F5     GRCh38       chr1          69634      Missense_Mutation
#> 2        OR4F5     GRCh38       chr1          69644      Missense_Mutation
#> 3   FO538757.2     GRCh38       chr1         183189      Missense_Mutation
#> 4   FO538757.2     GRCh38       chr1         183937      Missense_Mutation
#> 5   FO538757.1     GRCh38       chr1         186356      Nonsense_Mutation
#> 6   FO538757.1     GRCh38       chr1         186385      Missense_Mutation
#> 7   FO538757.1     GRCh38       chr1         186404      Missense_Mutation
#> 8   FO538757.1     GRCh38       chr1         186440      Missense_Mutation
#> 9   FO538757.1     GRCh38       chr1         186440      Missense_Mutation
#> 10  FO538757.1     GRCh38       chr1         186475          Splice_Region
#>    maf_seq_type
#> 1       capture
#> 2       capture
#> 3       capture
#> 4       capture
#> 5       capture
#> 6       capture
#> 7       capture
#> 8       capture
#> 9       capture
#> 10      capture