
Get the genome-wide set of mutations for one or more samples, including coding and non-coding mutations.

Usage

get_ssm_by_samples(
  these_samples_metadata,
  tool_name = "slms-3",
  projection = "grch37",
  flavour = "clustered",
  these_genes,
  min_read_support = 3,
  basic_columns = TRUE,
  maf_cols = NULL,
  subset_from_merge = FALSE,
  augmented = TRUE,
  engine = "fread_maf",
  these_sample_ids,
  this_seq_type
)

Arguments

these_samples_metadata

Optional metadata table. If provided, it will return SSM calls for the samples in the metadata table.

tool_name

Only supports slms-3 currently.

projection

Obtain variants projected to this reference (one of grch37 or hg38).

min_read_support

Only returns variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs).

basic_columns

Return first 45 columns of MAF rather than full details. Default is TRUE.

maf_cols

If basic_columns is set to FALSE, the user can specify which columns are returned within the MAF. This parameter can be either a vector of column indexes (integer) or a vector of column names (character).

subset_from_merge

Instead of merging individual MAFs, the data will be subset from a pre-merged MAF of samples with the specified this_seq_type.

augmented

Default is TRUE. Set to FALSE if you instead want the original MAF from each sample for multi-sample patients.

engine

Specify one of readr or fread_maf (default) to change how the large files are loaded prior to subsetting. Performance may differ between the two; in the author's experience, fread_maf is faster and uses much less RAM.

these_sample_ids

Deprecated. Inferred from these_samples_metadata.

this_seq_type

Deprecated. Inferred from these_samples_metadata.
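
The following is a minimal sketch of what min_read_support and maf_cols amount to, using a small invented table rather than real GAMBL data. The toy_maf table and its values are hypothetical, and the filtering and subsetting shown here are assumptions based on the argument descriptions above, not the function's actual implementation.

```r
# Toy MAF-like table; all values are invented for illustration only.
toy_maf <- data.frame(
  Hugo_Symbol = c("MYC", "BCL2", "EZH2", "CREBBP"),
  Chromosome = c("8", "18", "7", "16"),
  Tumor_Sample_Barcode = c("sampleA", "sampleA", "sampleB", "sampleB"),
  t_alt_count = c(12L, 2L, 3L, 0L),
  stringsAsFactors = FALSE
)

min_read_support <- 3

# Keep only variants with at least min_read_support reads in t_alt_count.
supported <- toy_maf[toy_maf$t_alt_count >= min_read_support, ]

# maf_cols accepts integer positions or column names;
# on this toy table both forms select the same two columns.
by_index <- supported[, c(1L, 4L), drop = FALSE]
by_name <- supported[, c("Hugo_Symbol", "t_alt_count"), drop = FALSE]

print(supported$Hugo_Symbol)
print(identical(by_index, by_name))
```

Column names are usually the safer choice for maf_cols, since positions depend on the column order of the underlying MAF.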

Value

A data frame in MAF format.

Details

The user can specify a metadata table (these_samples_metadata) that has been subset to the sample IDs of interest. In most situations, there is no need to run this function with subset_from_merge = TRUE, which is very inefficient. This function does not scale well to many samples; in most cases, users will actually need either get_coding_ssm or get_ssm_by_region. See get_ssm_by_sample for more information.

Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_ssm_by_patients, get_ssm_by_regions.

Examples


my_meta = get_gambl_metadata() %>%
  dplyr::filter(sample_id %in% c("HTMCP-01-06-00485-01A-01D",
                                 "14-35472_tumorA",
                                 "14-35472_tumorB"))
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
sample_ssms = get_ssm_by_samples(these_samples_metadata = my_meta)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)

hg38_ssms = get_ssm_by_samples(projection="hg38",
                               these_samples_metadata = my_meta)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count

dplyr::group_by(hg38_ssms,Tumor_Sample_Barcode) %>% 
  dplyr::count()
#> genomic_data Object
#> Genome Build: hg38 
#> Showing first 10 rows:
#>        Tumor_Sample_Barcode    n
#> 1           14-35472_tumorA 5265
#> 2           14-35472_tumorB 7966
#> 3 HTMCP-01-06-00485-01A-01D 2160
hg38_ssms_no_aug = get_ssm_by_samples(projection = "hg38",
                                      these_samples_metadata = my_meta,
                                      augmented = FALSE)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count
#> Warning: The following named parsers don't match the column names: GENE_PHENO, FILTER, flanking_bps, vcf_id, vcf_qual, gnomAD_AF, gnomAD_AFR_AF, gnomAD_AMR_AF, gnomAD_SAS_AF, vcf_pos, gnomADg_AF, blacklist_count

dplyr::group_by(hg38_ssms_no_aug,Tumor_Sample_Barcode) %>% 
  dplyr::count()
#> genomic_data Object
#> Genome Build: hg38 
#> Showing first 10 rows:
#>        Tumor_Sample_Barcode    n
#> 1           14-35472_tumorA 4001
#> 2           14-35472_tumorB 7470
#> 3 HTMCP-01-06-00485-01A-01D 2160

if (FALSE) { # \dontrun{
# Start from the full metadata table and keep only FL samples
my_metadata = dplyr::filter(get_gambl_metadata(), pathology == "FL")

sample_ssms = get_ssm_by_samples(these_samples_metadata = my_metadata)
} # }
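
To make the effect of augmented easier to see, the per-sample counts printed in the examples above can be placed side by side. This is only a sketch: the counts are copied by hand from that output, and the data frame and column names (aug, no_aug, n_augmented, n_original, added_by_augmentation) are made up for illustration.

```r
# Per-sample variant counts copied from the example output above.
aug <- data.frame(
  Tumor_Sample_Barcode = c("14-35472_tumorA", "14-35472_tumorB",
                           "HTMCP-01-06-00485-01A-01D"),
  n_augmented = c(5265L, 7966L, 2160L),
  stringsAsFactors = FALSE
)
no_aug <- data.frame(
  Tumor_Sample_Barcode = c("14-35472_tumorA", "14-35472_tumorB",
                           "HTMCP-01-06-00485-01A-01D"),
  n_original = c(4001L, 7470L, 2160L),
  stringsAsFactors = FALSE
)

# Join the two tables on sample ID and compute the difference.
cmp <- merge(aug, no_aug, by = "Tumor_Sample_Barcode")
cmp$added_by_augmentation <- cmp$n_augmented - cmp$n_original

print(cmp)
```

In this example only the two tumours from the multi-sample patient (14-35472) differ between the two calls, while the single-sample case is unchanged. That is consistent with the description of augmented as affecting multi-sample patients.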