Get Coding SSM.
Retrieve all coding SSMs from one seq_type in GAMBL in MAF-like format.
Usage
get_coding_ssm(
these_samples_metadata = NULL,
force_unmatched_samples,
projection = "grch37",
this_seq_type = "genome",
basic_columns = TRUE,
maf_cols = NULL,
augmented = TRUE,
min_read_support = 3,
groups = c("gambl", "icgc_dart"),
include_silent = TRUE,
engine,
verbose = FALSE,
limit_cohort,
exclude_cohort,
limit_pathology,
limit_samples,
from_flatfile,
these_sample_ids
)
Arguments
- these_samples_metadata
Optional (but highly recommended) metadata table that tells the function how to subset the data. Only sample_id values in this table with the matching seq_type will appear in the output. Some of those samples may still be absent from the output if they are missing SSM results or had no mutations detected.
- force_unmatched_samples
Optional argument for forcing unmatched samples, using get_ssm_by_samples.
- projection
Reference genome build for the coordinates in the MAF file. The default is grch37.
- this_seq_type
The seq_type you want SSMs from. The default is genome.
- basic_columns
Basic columns refers to the first 45 standard MAF columns. Set this to FALSE if you want all the available columns instead.
- maf_cols
If basic_columns is set to FALSE, the user can specify which columns are returned in the MAF. This parameter can be either a vector of column indexes (integer) or a vector of column names (character) matching columns in the MAF.
- augmented
Set to FALSE if you want the original MAF from each sample for multi-sample patients instead of the augmented MAF.
- min_read_support
Only return variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs).
- groups
Deprecated. Use these_samples_metadata instead.
- include_silent
Logical indicating whether to include silent mutations in coding regions (i.e. synonymous). Default is TRUE.
- engine
Deprecated. Ignored.
- verbose
Controls the "verboseness" of this function (and internally called helpers).
- limit_cohort
Deprecated. Use these_samples_metadata instead.
- exclude_cohort
Deprecated. Use these_samples_metadata instead.
- limit_pathology
Deprecated. Use these_samples_metadata instead.
- limit_samples
Deprecated. Use these_samples_metadata instead.
- from_flatfile
Deprecated. Ignored.
- these_sample_ids
Deprecated. Use these_samples_metadata instead.
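The combined effect of these_samples_metadata, this_seq_type, min_read_support and include_silent can be pictured with a small base R sketch. This is not the package's internal code: filter_toy_maf, toy_maf and toy_meta are hypothetical names, the rows are invented, and the sketch assumes that silent variants are labelled "Silent" in Variant_Classification and that sample IDs are stored in Tumor_Sample_Barcode, as in a standard MAF.

```r
# Toy MAF-like table. Column names follow the MAF convention;
# the rows are invented purely for illustration.
toy_maf <- data.frame(
  Hugo_Symbol = c("MYC", "EZH2", "TP53", "KMT2D"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "Nonsense_Mutation", "Missense_Mutation"),
  t_alt_count = c(12, 5, 2, 8),
  stringsAsFactors = FALSE
)

# Toy metadata table: S3 is a capture sample.
toy_meta <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  seq_type = c("genome", "genome", "capture"),
  stringsAsFactors = FALSE
)

# Hypothetical helper that mimics the documented filter semantics.
filter_toy_maf <- function(maf, metadata, this_seq_type = "genome",
                           min_read_support = 3, include_silent = TRUE) {
  # Keep only samples listed in the metadata with the matching seq_type.
  keep_ids <- metadata$sample_id[metadata$seq_type == this_seq_type]
  out <- maf[maf$Tumor_Sample_Barcode %in% keep_ids, , drop = FALSE]
  # Keep variants with at least min_read_support reads in t_alt_count.
  out <- out[out$t_alt_count >= min_read_support, , drop = FALSE]
  # Optionally drop silent (synonymous) variants.
  if (!include_silent) {
    out <- out[out$Variant_Classification != "Silent", , drop = FALSE]
  }
  rownames(out) <- NULL
  out
}

with_silent <- filter_toy_maf(toy_maf, toy_meta)
without_silent <- filter_toy_maf(toy_maf, toy_meta, include_silent = FALSE)
```

In this toy data, the KMT2D variant is dropped because S3 is a capture sample, and the TP53 variant is dropped because it has only 2 supporting reads. Setting include_silent = FALSE additionally removes the EZH2 row.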
Details
Retrieves simple somatic mutation (SSM) results for either the capture or the genome seq_type (but not both at once). The resulting data frame will be a maf_data object, which tracks the genome build (projection) for the variants and has a maf_seq_type column that tracks the seq_type of origin of each variant. In most cases, users should instead use the related function that can obtain SSMs across both the genome and capture seq_type: get_all_coding_ssm.
Is this function not what you are looking for? Try one of: get_coding_ssm_status, get_ssm_by_patients, get_ssm_by_sample, get_ssm_by_samples, get_ssm_by_region, get_ssm_by_regions.
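To illustrate why it helps for a result to carry its genome build and a maf_seq_type column, here is a minimal base R sketch. It does not reproduce how the maf_data class is implemented: tag_maf and combine_tagged are hypothetical helpers, the build is stored in a plain attribute named genome_build as an assumption, and the rows are invented.

```r
# Hypothetical helper: label a MAF-like data frame with its seq_type
# and store the genome build as an attribute.
tag_maf <- function(maf, genome_build, seq_type) {
  maf$maf_seq_type <- seq_type
  attr(maf, "genome_build") <- genome_build
  maf
}

# Hypothetical helper: combine two tagged tables only when builds agree,
# so that coordinates from different projections are never mixed.
combine_tagged <- function(a, b) {
  if (!identical(attr(a, "genome_build"), attr(b, "genome_build"))) {
    stop("Cannot combine variants from different genome builds")
  }
  out <- rbind(a, b)
  attr(out, "genome_build") <- attr(a, "genome_build")
  out
}

# Invented rows, for illustration only.
genome_maf <- tag_maf(
  data.frame(Hugo_Symbol = "GENE_A", Chromosome = "1", Start_Position = 1000),
  genome_build = "grch37", seq_type = "genome"
)
capture_maf <- tag_maf(
  data.frame(Hugo_Symbol = "GENE_B", Chromosome = "1", Start_Position = 2000),
  genome_build = "grch37", seq_type = "capture"
)
hg38_maf <- tag_maf(
  data.frame(Hugo_Symbol = "GENE_C", Chromosome = "chr1", Start_Position = 3000),
  genome_build = "hg38", seq_type = "capture"
)

# Same build: rows are stacked and each keeps its seq_type of origin.
combined <- combine_tagged(genome_maf, capture_maf)

# Different builds: the combination is refused with an error.
mismatch <- tryCatch(combine_tagged(genome_maf, hg38_maf),
                     error = function(e) e)
```

After combining, the maf_seq_type column still records which rows came from genome and which from capture data, while the mismatched case returns an error condition rather than a mixed-build table.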
Examples
#basic usage (defaults to genome seq_type)
maf_genome = get_coding_ssm()
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
nrow(maf_genome)
#> [1] 248754
dplyr::select(maf_genome,1,4,5,6,9,maf_seq_type)
#> genomic_data Object
#> Genome Build: grch37
#> Showing first 10 rows:
#> Hugo_Symbol NCBI_Build Chromosome Start_Position Variant_Classification
#> 1 AL627309.1 GRCh37 1 138626 Silent
#> 2 AL627309.1 GRCh37 1 138972 Frame_Shift_Ins
#> 3 RP11-206L10.9 GRCh37 1 730845 Splice_Region
#> 4 FAM87B GRCh37 1 753589 Splice_Region
#> 5 SAMD11 GRCh37 1 871158 Silent
#> 6 SAMD11 GRCh37 1 871192 Missense_Mutation
#> 7 SAMD11 GRCh37 1 874416 Splice_Region
#> 8 SAMD11 GRCh37 1 874467 Missense_Mutation
#> 9 SAMD11 GRCh37 1 874648 Splice_Region
#> 10 SAMD11 GRCh37 1 874763 Missense_Mutation
#> maf_seq_type
#> 1 genome
#> 2 genome
#> 3 genome
#> 4 genome
#> 5 genome
#> 6 genome
#> 7 genome
#> 8 genome
#> 9 genome
#> 10 genome
maf_exome_hg38 = get_coding_ssm(this_seq_type = "capture",
projection="hg38")
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
dplyr::select(maf_exome_hg38,1,4,5,6,9,maf_seq_type)
#> genomic_data Object
#> Genome Build: hg38
#> Showing first 10 rows:
#> Hugo_Symbol NCBI_Build Chromosome Start_Position Variant_Classification
#> 1 OR4F5 GRCh38 chr1 69634 Missense_Mutation
#> 2 OR4F5 GRCh38 chr1 69644 Missense_Mutation
#> 3 FO538757.2 GRCh38 chr1 183189 Missense_Mutation
#> 4 FO538757.2 GRCh38 chr1 183937 Missense_Mutation
#> 5 FO538757.1 GRCh38 chr1 186356 Nonsense_Mutation
#> 6 FO538757.1 GRCh38 chr1 186385 Missense_Mutation
#> 7 FO538757.1 GRCh38 chr1 186404 Missense_Mutation
#> 8 FO538757.1 GRCh38 chr1 186440 Missense_Mutation
#> 9 FO538757.1 GRCh38 chr1 186440 Missense_Mutation
#> 10 FO538757.1 GRCh38 chr1 186475 Splice_Region
#> maf_seq_type
#> 1 capture
#> 2 capture
#> 3 capture
#> 4 capture
#> 5 capture
#> 6 capture
#> 7 capture
#> 8 capture
#> 9 capture
#> 10 capture