Get SSM By Samples.
get_ssm_by_samples.Rd
Get the genome-wide set of mutations for one or more samples, including coding and non-coding mutations.
Usage
get_ssm_by_samples(
these_samples_metadata,
tool_name = "slms-3",
projection = "grch37",
flavour = "clustered",
these_genes,
min_read_support = 3,
basic_columns = TRUE,
maf_cols = NULL,
subset_from_merge = FALSE,
augmented = TRUE,
engine = "fread_maf",
these_sample_ids,
this_seq_type
)
Arguments
- these_samples_metadata
Optional metadata table. If provided, the function returns SSM calls for the samples in this table.
- tool_name
Only supports slms-3 currently.
- projection
Obtain variants projected to this reference (one of grch37 or hg38).
- min_read_support
Only returns variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs).
- basic_columns
Return first 45 columns of MAF rather than full details. Default is TRUE.
- maf_cols
If basic_columns is set to FALSE, the user can specify which columns to return from the MAF. This parameter can be either a vector of column indexes (integer) or a vector of column names (character).
- subset_from_merge
Instead of merging individual MAFs, the data will be subset from a pre-merged MAF of samples with the specified this_seq_type.
- augmented
Default is TRUE. Set to FALSE if you want the original MAF from each sample for multi-sample patients instead.
- engine
Specify one of readr or fread_maf (default) to change how the large files are loaded prior to subsetting. Performance may differ between the two, but in the author's experience fread_maf is faster and uses a lot less RAM.
- these_sample_ids
Deprecated. Inferred from these_samples_metadata.
- this_seq_type
Deprecated. Inferred from these_samples_metadata.
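The effect of min_read_support and maf_cols can be illustrated on a toy table. The sketch below uses base R only; the data frame toy_maf and all of its values are invented for illustration, and the function itself may apply these steps differently internally.

```r
# Toy MAF-like table; all values are made up for illustration only.
toy_maf <- data.frame(
  Hugo_Symbol = c("MYC", "BCL2", "EZH2", "CREBBP"),
  Tumor_Sample_Barcode = c("sampleA", "sampleA", "sampleB", "sampleB"),
  t_alt_count = c(12L, 2L, 3L, 0L),
  t_ref_count = c(30L, 41L, 25L, 38L),
  stringsAsFactors = FALSE
)

# Assumed equivalent of min_read_support = 3:
# keep rows with at least 3 reads supporting the alternate allele.
min_read_support <- 3
supported <- toy_maf[toy_maf$t_alt_count >= min_read_support, ]

# maf_cols can be integer indexes or column names;
# both forms select the same columns here.
by_index <- supported[, c(1, 3)]
by_name <- supported[, c("Hugo_Symbol", "t_alt_count")]

print(supported)
print(identical(by_index, by_name))
```

Rows with fewer than three supporting reads are dropped before the column selection, so either form of maf_cols operates on the already filtered table in this sketch.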
Details
The user can specify a metadata table (these_samples_metadata), subset to the sample IDs of interest.
In most situations, this function should not need to be run with subset_from_merge = TRUE, which is very inefficient.
This function does not scale well to many samples. In most cases, users will actually need either get_coding_ssm or get_ssm_by_region.
See get_ssm_by_sample for more information.
Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_ssm_by_patients, get_ssm_by_regions.
Examples
my_meta = get_gambl_metadata() %>%
  dplyr::filter(sample_id %in% c("HTMCP-01-06-00485-01A-01D",
                                 "14-35472_tumorA",
                                 "14-35472_tumorB"))
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts: DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
sample_ssms = get_ssm_by_samples(these_samples_metadata = my_meta)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
hg38_ssms = get_ssm_by_samples(projection = "hg38",
                               these_samples_metadata = my_meta)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
dplyr::group_by(hg38_ssms,Tumor_Sample_Barcode) %>%
dplyr::count()
#> genomic_data Object
#> Genome Build: hg38
#> Showing first 10 rows:
#> Tumor_Sample_Barcode n
#> 1 14-35472_tumorA 5265
#> 2 14-35472_tumorB 7966
#> 3 HTMCP-01-06-00485-01A-01D 2160
hg38_ssms_no_aug = get_ssm_by_samples(projection = "hg38",
                                      these_samples_metadata = my_meta,
                                      augmented = FALSE)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
dplyr::group_by(hg38_ssms_no_aug,Tumor_Sample_Barcode) %>%
dplyr::count()
#> genomic_data Object
#> Genome Build: hg38
#> Showing first 10 rows:
#> Tumor_Sample_Barcode n
#> 1 14-35472_tumorA 4001
#> 2 14-35472_tumorB 7470
#> 3 HTMCP-01-06-00485-01A-01D 2160
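The two tallies above can be compared to see how much augmentation changes each sample. This is a small sketch in base R that re-enters the printed counts by hand rather than reading them from the returned objects; the column names n_augmented, n_original and n_added are made up for this illustration.

```r
# Per-sample counts copied by hand from the two outputs shown above.
counts <- data.frame(
  Tumor_Sample_Barcode = c("14-35472_tumorA",
                           "14-35472_tumorB",
                           "HTMCP-01-06-00485-01A-01D"),
  n_augmented = c(5265L, 7966L, 2160L),
  n_original = c(4001L, 7470L, 2160L),
  stringsAsFactors = FALSE
)

# Difference in row counts between the augmented and original MAFs.
counts$n_added <- counts$n_augmented - counts$n_original
print(counts)
```

Only the two samples from patient 14-35472 differ between the runs, which fits the description of augmented as affecting multi-sample patients.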
if (FALSE) { # \dontrun{
my_metadata = dplyr::filter(get_gambl_metadata(), pathology == "FL")
sample_ssms = get_ssm_by_samples(these_samples_metadata = my_metadata)
} # }