
Retrieve all coding SSMs from one seq_type in GAMBL in MAF-like format.

Usage

get_coding_ssm(
  these_samples_metadata = NULL,
  force_unmatched_samples,
  projection = "grch37",
  this_seq_type = "genome",
  basic_columns = TRUE,
  maf_cols = NULL,
  augmented = TRUE,
  min_read_support = 3,
  groups = c("gambl", "icgc_dart"),
  include_silent = TRUE,
  engine,
  verbose = FALSE,
  limit_cohort,
  exclude_cohort,
  limit_pathology,
  limit_samples,
  from_flatfile,
  these_sample_ids
)

Arguments

these_samples_metadata

Optional (but highly recommended) metadata table that tells the function how to subset the data. Only samples (sample_id) in this table with the matching seq_type are included in the output. Some of these samples may still be absent from the output if they have no SSM results or had no mutations detected.
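The subsetting rule described above can be illustrated with a toy base-R sketch. The tables are hypothetical, the Tumor_Sample_Barcode column name is an assumption (the standard MAF sample column), and this is not GAMBLR's internal code. It shows why a requested sample can be missing from the output: it passes the seq_type filter but has no mutation rows.

```r
# Hypothetical toy tables; not GAMBLR data and not GAMBLR's implementation.
metadata <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  seq_type = c("genome", "genome", "capture", "genome"),
  stringsAsFactors = FALSE
)
# Assumption: mutations are keyed by the standard MAF Tumor_Sample_Barcode column
ssm <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S3", "S5"),
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  stringsAsFactors = FALSE
)
this_seq_type <- "genome"

# sample_ids in the metadata whose seq_type matches the requested one
wanted <- metadata$sample_id[metadata$seq_type == this_seq_type]

# Keep mutations from those samples only: S3 (capture) and S5 (not in the
# metadata) are dropped; S2 and S4 are requested but have no rows at all
subset_ssm <- ssm[ssm$Tumor_Sample_Barcode %in% wanted, ]
print(subset_ssm)
```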

force_unmatched_samples

Optional argument for forcing unmatched samples, using get_ssm_by_samples.

projection

Reference genome build for the coordinates in the MAF file. The default is grch37.

this_seq_type

The seq_type to retrieve SSMs from ("genome" or "capture"). Default is "genome".

basic_columns

If TRUE (the default), only the first 45 standard MAF columns are returned. Set this to FALSE to return all available columns instead.

maf_cols

If basic_columns is set to FALSE, the user can specify which columns to return in the MAF. This can be either a vector of column indexes (integers) or a vector of column names (characters) matching columns in the MAF.
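The two accepted forms of maf_cols (integer indexes or column names) pick out the same columns when they point at the same positions. The toy base-R sketch below uses a hypothetical five-column table, not a real GAMBLR MAF, and only illustrates the equivalence; it does not call get_coding_ssm.

```r
# Hypothetical toy MAF-like table (a real MAF has many more columns)
maf <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_B"),
  NCBI_Build = c("GRCh37", "GRCh37"),
  Chromosome = c("1", "2"),
  Start_Position = c(1000L, 2000L),
  Variant_Classification = c("Silent", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# Columns selected by integer index ...
by_index <- maf[, c(1, 3, 4)]

# ... and the same columns selected by name
by_name <- maf[, c("Hugo_Symbol", "Chromosome", "Start_Position")]

print(identical(by_index, by_name))
```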

augmented

Set to FALSE to return the original MAF from each sample of a multi-sample patient instead of the augmented MAF. Default is TRUE.

min_read_support

Only variants with at least this many supporting reads in t_alt_count are returned (mainly for cleaning up augmented MAFs). Default is 3.
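A minimal base-R sketch of the documented threshold: keep rows whose t_alt_count is at least min_read_support. The toy table is hypothetical, and this illustrates the filtering rule only, not GAMBLR's internal code.

```r
# Hypothetical toy rows with varying alt-allele read support
maf <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  t_alt_count = c(12L, 2L, 3L, 0L),
  stringsAsFactors = FALSE
)
min_read_support <- 3

# ">=" keeps variants with exactly min_read_support reads (GENE_C)
filtered <- maf[maf$t_alt_count >= min_read_support, ]
print(filtered)
```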

groups

Deprecated. Use these_samples_metadata instead.

include_silent

Logical indicating whether to include silent mutations in coding regions (i.e. synonymous). Default is TRUE.
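As a toy illustration of what include_silent = FALSE means for the result, the base-R sketch below drops rows whose Variant_Classification is "Silent" (the label seen in the example output). The table is hypothetical, and this is not necessarily how GAMBLR applies the filter internally.

```r
# Hypothetical toy rows; "Silent" marks synonymous coding mutations
maf <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_A", "GENE_B"),
  Variant_Classification = c("Silent", "Missense_Mutation", "Missense_Mutation"),
  stringsAsFactors = FALSE
)
include_silent <- FALSE

# When silent mutations are not wanted, remove them
if (!include_silent) {
  maf <- maf[maf$Variant_Classification != "Silent", ]
}
print(maf)
```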

engine

Deprecated. Ignored.

verbose

Controls the verbosity of this function (and of internally called helpers). Default is FALSE.

limit_cohort

Deprecated. Use these_samples_metadata instead.

exclude_cohort

Deprecated. Use these_samples_metadata instead.

limit_pathology

Deprecated. Use these_samples_metadata instead.

limit_samples

Deprecated. Use these_samples_metadata instead.

from_flatfile

Deprecated. Ignored.

these_sample_ids

Deprecated. Use these_samples_metadata instead.

Value

A data frame (a maf_data object) with one row per mutation, containing either the basic MAF columns or the columns selected via basic_columns and maf_cols.

Details

Retrieves simple somatic mutation (SSM) results for either the capture or the genome seq_type (but not both at once). The resulting data frame is a maf_data object, which tracks the genome build (projection) of the variants and includes a maf_seq_type column recording the seq_type each variant originated from. In most cases, users should instead use the related function get_all_coding_ssm, which obtains SSMs across both the genome and capture seq_types.

Is this function not what you are looking for? Try one of: get_coding_ssm_status, get_ssm_by_patients, get_ssm_by_sample, get_ssm_by_samples, get_ssm_by_region, get_ssm_by_regions.
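Because every row carries a maf_seq_type value, results for the two seq_types can be stacked and still be told apart. The base-R sketch below shows that idea with hypothetical plain data frames; it assumes both tables share the same projection, it drops the maf_data class, and it is not how get_all_coding_ssm is implemented, which remains the recommended route.

```r
# Hypothetical toy results from two separate seq_type queries
# (assumption: both use the same projection, e.g. grch37)
genome_maf <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_A"),
  maf_seq_type = "genome",
  stringsAsFactors = FALSE
)
capture_maf <- data.frame(
  Hugo_Symbol = "GENE_B",
  maf_seq_type = "capture",
  stringsAsFactors = FALSE
)

# Stack the rows; maf_seq_type still records where each variant came from
combined <- rbind(genome_maf, capture_maf)
print(table(combined$maf_seq_type))
```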

Examples


  # basic usage (defaults to genome seq_type)
  maf_genome = get_coding_ssm()
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)

  nrow(maf_genome)
#> [1] 248754

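  # keep Hugo_Symbol, NCBI_Build, Chromosome, Start_Position,
  # Variant_Classification and maf_seq_type
  dplyr::select(maf_genome,1,4,5,6,9,maf_seq_type)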
#> genomic_data Object
#> Genome Build: grch37 
#> Showing first 10 rows:
#>      Hugo_Symbol NCBI_Build Chromosome Start_Position Variant_Classification
#> 1     AL627309.1     GRCh37          1         138626                 Silent
#> 2     AL627309.1     GRCh37          1         138972        Frame_Shift_Ins
#> 3  RP11-206L10.9     GRCh37          1         730845          Splice_Region
#> 4         FAM87B     GRCh37          1         753589          Splice_Region
#> 5         SAMD11     GRCh37          1         871158                 Silent
#> 6         SAMD11     GRCh37          1         871192      Missense_Mutation
#> 7         SAMD11     GRCh37          1         874416          Splice_Region
#> 8         SAMD11     GRCh37          1         874467      Missense_Mutation
#> 9         SAMD11     GRCh37          1         874648          Splice_Region
#> 10        SAMD11     GRCh37          1         874763      Missense_Mutation
#>    maf_seq_type
#> 1        genome
#> 2        genome
#> 3        genome
#> 4        genome
#> 5        genome
#> 6        genome
#> 7        genome
#> 8        genome
#> 9        genome
#> 10       genome

  # capture (exome) SSMs with coordinates in hg38
  maf_exome_hg38 = get_coding_ssm(this_seq_type = "capture",
                                  projection = "hg38")
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)

  dplyr::select(maf_exome_hg38,1,4,5,6,9,maf_seq_type)
#> genomic_data Object
#> Genome Build: hg38 
#> Showing first 10 rows:
#>    Hugo_Symbol NCBI_Build Chromosome Start_Position Variant_Classification
#> 1        OR4F5     GRCh38       chr1          69634      Missense_Mutation
#> 2        OR4F5     GRCh38       chr1          69644      Missense_Mutation
#> 3   FO538757.2     GRCh38       chr1         183189      Missense_Mutation
#> 4   FO538757.2     GRCh38       chr1         183937      Missense_Mutation
#> 5   FO538757.1     GRCh38       chr1         186356      Nonsense_Mutation
#> 6   FO538757.1     GRCh38       chr1         186385      Missense_Mutation
#> 7   FO538757.1     GRCh38       chr1         186404      Missense_Mutation
#> 8   FO538757.1     GRCh38       chr1         186440      Missense_Mutation
#> 9   FO538757.1     GRCh38       chr1         186440      Missense_Mutation
#> 10  FO538757.1     GRCh38       chr1         186475          Splice_Region
#>    maf_seq_type
#> 1       capture
#> 2       capture
#> 3       capture
#> 4       capture
#> 5       capture
#> 6       capture
#> 7       capture
#> 8       capture
#> 9       capture
#> 10      capture