Get SSM By Sample.
get_ssm_by_sample.Rd
Get the SSMs (i.e. load MAF) for a single sample.
Usage
get_ssm_by_sample(
these_samples_metadata,
tool_name = "slms-3",
projection = "grch37",
augmented = TRUE,
flavour = "clustered",
min_read_support = 3,
basic_columns = TRUE,
maf_cols = NULL,
verbose = FALSE,
this_sample_id,
this_seq_type
)
Arguments
- these_samples_metadata
Required unless both this_sample_id and this_seq_type are specified. A single row or an entire metadata table containing your sample_id.
- tool_name
The name of the variant calling pipeline (currently only slms-3 is supported).
- projection
The projection genome build. Supports hg38 and grch37.
- augmented
Default: TRUE. Set to FALSE if you want the original MAF from each sample for multi-sample patients rather than the augmented MAF.
- flavour
Currently this function supports only one flavour option, but this parameter is meant for eventual compatibility with additional variant calling parameters/versions.
- min_read_support
Only return variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs).
- basic_columns
Return the first 43 columns of the MAF rather than full details. Default is TRUE.
- maf_cols
If basic_columns is set to FALSE, the user can specify which columns are returned within the MAF. This parameter can be either a vector of indexes (integer) or a vector of column names (character).
- verbose
Enable for debugging/noisier output.
- this_sample_id
Deprecated. Inferred from these_samples_metadata.
- this_seq_type
Deprecated. Inferred from these_samples_metadata.
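To make the min_read_support and maf_cols arguments more concrete, here is a minimal base R sketch on a toy MAF-like data frame. It is not the package's implementation: the helpers filter_read_support and select_maf_cols are hypothetical names introduced only for illustration, and the t_alt_count values are made up.

```r
# Toy MAF-like table; real MAFs have many more columns.
# The t_alt_count values are invented for illustration.
toy_maf <- data.frame(
  Hugo_Symbol = c("PIK3CD", "CASZ1", "PER3", "CAMTA1"),
  Chromosome = c("chr1", "chr1", "chr1", "chr1"),
  Start_Position = c(9726972, 10720942, 7828459, 7244319),
  t_alt_count = c(12, 2, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of GAMBLR.results): keep only variants
# supported by at least `min_read_support` alternate-allele reads.
filter_read_support <- function(maf, min_read_support = 3) {
  maf[maf$t_alt_count >= min_read_support, , drop = FALSE]
}

# Hypothetical helper: pick columns either by integer index or by name,
# mirroring the two forms the maf_cols argument is described as accepting.
select_maf_cols <- function(maf, maf_cols) {
  maf[, maf_cols, drop = FALSE]
}

filtered <- filter_read_support(toy_maf, min_read_support = 3)
by_index <- select_maf_cols(filtered, c(1, 4))
by_name <- select_maf_cols(filtered, c("Hugo_Symbol", "t_alt_count"))
```

In this sketch, by_index and by_name hold the same two columns, which is the equivalence the maf_cols description implies when basic_columns is FALSE.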
Details
This function exists because some samples need a different set of variants than those in the main GAMBL merge. The current use case is to let a force_unmatched output replace the SSMs from the merge for samples with known contamination in the normal. It is also useful for applying a blacklist to individual MAFs when coupled with annotate_ssm_blacklist.
Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_coding_ssm_status, get_ssm_by_patients, get_ssm_by_samples, get_ssm_by_region, get_ssm_by_regions.
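The replacement use case described in the Details can be sketched with toy data. This is only an illustration under stated assumptions: replace_sample_ssm is a hypothetical helper, not a GAMBLR.results function, the sample names and genes are invented, and the only MAF convention relied on is the standard Tumor_Sample_Barcode column.

```r
# Toy stand-in for a merged MAF covering two samples (invented data).
merged_maf <- data.frame(
  Tumor_Sample_Barcode = c("sample_A", "sample_A", "sample_B"),
  Hugo_Symbol = c("MYC", "BCL2", "EZH2"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the per-sample MAF that should take precedence,
# e.g. one loaded separately for a sample with a contaminated normal.
replacement_maf <- data.frame(
  Tumor_Sample_Barcode = "sample_A",
  Hugo_Symbol = "MYC",
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the sample's rows from the merge,
# then append the sample-specific calls in their place.
replace_sample_ssm <- function(merged, replacement, sample_id) {
  kept <- merged[merged$Tumor_Sample_Barcode != sample_id, , drop = FALSE]
  out <- rbind(kept, replacement)
  rownames(out) <- NULL
  out
}

updated <- replace_sample_ssm(merged_maf, replacement_maf, "sample_A")
```

After the call, updated contains the untouched rows for sample_B plus only the replacement rows for sample_A.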
Examples
# dplyr supplies the %>% pipe used in these examples
library(dplyr)
maf_samp = GAMBLR.results:::get_ssm_by_sample(
  get_gambl_metadata() %>% dplyr::filter(sample_id == "13-27975_tumorA"),
  augmented = FALSE
)
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts: DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
nrow(maf_samp)
#> [1] 4705
maf_samp_aug = GAMBLR.results:::get_ssm_by_sample(
  get_gambl_metadata() %>% dplyr::filter(sample_id == "13-27975_tumorA"),
  augmented = TRUE
)
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts: DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
nrow(maf_samp_aug)
#> [1] 6118
some_maf = GAMBLR.results:::get_ssm_by_sample(
  these_samples_metadata = get_gambl_metadata() %>%
    dplyr::filter(sample_id == "HTMCP-01-06-00485-01A-01D",
                  seq_type == "genome"),
  projection = "hg38"
)
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts: DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
dplyr::select(some_maf, 1:10)
#> genomic_data Object
#> Genome Build: hg38
#> Showing first 10 rows:
#>        Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
#> 1          Unknown              0      .     GRCh38       chr1        5446963
#> 2           CAMTA1              0      .     GRCh38       chr1        7244319
#> 3             PER3              0      .     GRCh38       chr1        7828459
#> 4          TNFRSF9              0      .     GRCh38       chr1        7911008
#> 5  ENSR00000000893              0      .     GRCh38       chr1        8181883
#> 6           PIK3CD              0      .     GRCh38       chr1        9726972
#> 7            CASZ1              0      .     GRCh38       chr1       10720942
#> 8           CELA2A              0      .     GRCh38       chr1       15454068
#> 9          PLA2G2A              0      .     GRCh38       chr1       19975558
#> 10         PLA2G2C              0      .     GRCh38       chr1       20172387
#>    End_Position Strand Variant_Classification Variant_Type
#> 1       5446963      +                    IGR          SNP
#> 2       7244319      +                 Intron          SNP
#> 3       7828459      +                 Intron          SNP
#> 4       7911008      +                3'Flank          SNP
#> 5       8181883      +                    IGR          SNP
#> 6       9726972      +      Missense_Mutation          SNP
#> 7      10720942      +                 Intron          SNP
#> 8      15454068      +                5'Flank          SNP
#> 9      19975559      +                  3'UTR          DEL
#> 10     20172387      +                 Intron          SNP