
Get the SSMs for a single sample (i.e. load its MAF).

Usage

get_ssm_by_sample(
  these_samples_metadata,
  tool_name = "slms-3",
  projection = "grch37",
  augmented = TRUE,
  flavour = "clustered",
  min_read_support = 3,
  basic_columns = TRUE,
  maf_cols = NULL,
  verbose = FALSE,
  this_sample_id,
  this_seq_type
)

Arguments

these_samples_metadata

Required if you do not specify both this_sample_id and this_seq_type. Either a single row or an entire metadata table containing your sample_id.

tool_name

The name of the variant calling pipeline (currently only slms-3 is supported).

projection

The projection genome build. Supports hg38 and grch37.

augmented

Default is TRUE. For patients with multiple samples, set to FALSE to return each sample's original MAF instead of the augmented MAF.

flavour

Currently, this function supports only one flavour option ("clustered"). The parameter exists for eventual compatibility with additional variant calling parameters or versions.

min_read_support

Only returns variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs).

basic_columns

Return only the first 43 columns of the MAF rather than all columns. Default is TRUE.

maf_cols

If basic_columns is set to FALSE, you can specify which MAF columns to return, either as an integer vector of column indexes or as a character vector of column names.

verbose

Enable for debugging/noisier output.

this_sample_id

Deprecated. Inferred from these_samples_metadata.

this_seq_type

Deprecated. Inferred from these_samples_metadata.
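To make the min_read_support, basic_columns and maf_cols descriptions concrete, here is a minimal base-R sketch on a toy MAF-like data frame. It is only an illustration of the documented behaviour, not the GAMBLR.results implementation; the helper names filter_by_read_support() and select_maf_columns() and the toy data are hypothetical.

```r
# Hypothetical sketch, NOT the GAMBLR.results implementation: mimic the
# documented min_read_support, basic_columns and maf_cols behaviour on a
# small MAF-like data frame.

toy_maf <- data.frame(
  Hugo_Symbol          = c("PIK3CD", "CAMTA1", "PER3", "CASZ1"),
  Chromosome           = "chr1",
  Start_Position       = c(9726972, 7244319, 7828459, 10720942),
  Tumor_Sample_Barcode = "sample_A",
  t_alt_count          = c(12, 2, 3, 0),
  stringsAsFactors     = FALSE
)

# min_read_support: keep variants with at least this many alt-allele reads.
filter_by_read_support <- function(maf, min_read_support = 3) {
  maf[maf$t_alt_count >= min_read_support, , drop = FALSE]
}

# basic_columns / maf_cols: either the leading "basic" columns, or a
# user-chosen set given as integer indexes or character column names.
select_maf_columns <- function(maf, basic_columns = TRUE, maf_cols = NULL) {
  if (basic_columns) {
    # min() guards against tables narrower than 43 columns, like this toy one.
    return(maf[, seq_len(min(43, ncol(maf))), drop = FALSE])
  }
  if (is.null(maf_cols)) {
    return(maf)
  }
  if (!is.numeric(maf_cols) && !is.character(maf_cols)) {
    stop("maf_cols must be an integer or character vector")
  }
  maf[, maf_cols, drop = FALSE]
}

supported <- filter_by_read_support(toy_maf, min_read_support = 3)
by_index  <- select_maf_columns(supported, basic_columns = FALSE, maf_cols = 1:3)
by_name   <- select_maf_columns(supported, basic_columns = FALSE,
                                maf_cols = c("Hugo_Symbol", "t_alt_count"))
```

With these toy values, supported keeps only the PIK3CD and PER3 rows (12 and 3 alt reads); the CAMTA1 and CASZ1 rows fall below the threshold of 3.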

Value

A data frame in MAF format.

Details

This function was implemented for flexibility, because for some samples we may want to use a different set of variants than those in the main GAMBL merge. The current use case is to let a force_unmatched output replace the SSMs from the merge for samples with known contamination in the normal. It will also be useful for applying a blacklist to individual MAFs when coupled with annotate_ssm_blacklist.

Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_coding_ssm_status, get_ssm_by_patients, get_ssm_by_samples, get_ssm_by_region, get_ssm_by_regions.
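The force_unmatched use case described in the Details can be pictured as swapping one sample's rows in a merged MAF for rows from an alternative per-sample MAF. The base-R sketch below is a hypothetical illustration of that idea only; the helper replace_sample_ssms() and the toy tables are invented for this example, and the page does not describe how the package performs the swap internally.

```r
# Hypothetical sketch: replace one sample's variants in a merged MAF with the
# variants from an alternative per-sample MAF (e.g. a force_unmatched run).
replace_sample_ssms <- function(merged_maf, sample_maf, sample_id) {
  # Drop every row of the affected sample from the merge ...
  kept <- merged_maf[merged_maf$Tumor_Sample_Barcode != sample_id, , drop = FALSE]
  # ... and append that sample's rows from the alternative MAF instead.
  replacement <- sample_maf[sample_maf$Tumor_Sample_Barcode == sample_id, , drop = FALSE]
  rbind(kept, replacement)
}

merged <- data.frame(
  Tumor_Sample_Barcode = c("s1", "s1", "s2"),
  Hugo_Symbol          = c("PIK3CD", "CAMTA1", "PER3"),
  stringsAsFactors     = FALSE
)
unmatched_s1 <- data.frame(
  Tumor_Sample_Barcode = "s1",
  Hugo_Symbol          = "CASZ1",
  stringsAsFactors     = FALSE
)

result <- replace_sample_ssms(merged, unmatched_s1, "s1")
```

Here result keeps the s2 variant (PER3) from the merge and carries only the replacement variant (CASZ1) for s1.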

Examples


maf_samp = GAMBLR.results:::get_ssm_by_sample(
  get_gambl_metadata() %>% dplyr::filter(sample_id=="13-27975_tumorA"),
  augmented = FALSE
)
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
nrow(maf_samp)
#> [1] 4705
maf_samp_aug = GAMBLR.results:::get_ssm_by_sample(
  get_gambl_metadata() %>% dplyr::filter(sample_id=="13-27975_tumorA"),
  augmented = TRUE
)
nrow(maf_samp_aug)
#> [1] 6118


some_maf = GAMBLR.results:::get_ssm_by_sample(
  these_samples_metadata = get_gambl_metadata() %>%
    dplyr::filter(sample_id == "HTMCP-01-06-00485-01A-01D",
                  seq_type == "genome"),
  projection = "hg38"
)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
dplyr::select(some_maf, 1:10)
#> genomic_data Object
#> Genome Build: hg38 
#> Showing first 10 rows:
#>        Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
#> 1          Unknown              0      .     GRCh38       chr1        5446963
#> 2           CAMTA1              0      .     GRCh38       chr1        7244319
#> 3             PER3              0      .     GRCh38       chr1        7828459
#> 4          TNFRSF9              0      .     GRCh38       chr1        7911008
#> 5  ENSR00000000893              0      .     GRCh38       chr1        8181883
#> 6           PIK3CD              0      .     GRCh38       chr1        9726972
#> 7            CASZ1              0      .     GRCh38       chr1       10720942
#> 8           CELA2A              0      .     GRCh38       chr1       15454068
#> 9          PLA2G2A              0      .     GRCh38       chr1       19975558
#> 10         PLA2G2C              0      .     GRCh38       chr1       20172387
#>    End_Position Strand Variant_Classification Variant_Type
#> 1       5446963      +                    IGR          SNP
#> 2       7244319      +                 Intron          SNP
#> 3       7828459      +                 Intron          SNP
#> 4       7911008      +                3'Flank          SNP
#> 5       8181883      +                    IGR          SNP
#> 6       9726972      +      Missense_Mutation          SNP
#> 7      10720942      +                 Intron          SNP
#> 8      15454068      +                5'Flank          SNP
#> 9      19975559      +                  3'UTR          DEL
#> 10     20172387      +                 Intron          SNP