Get SSM By Samples. — get_ssm_by

Get MAF-format data frames for more than one sample and combine them into a single data frame.

get_ssm_by_samples(
  these_sample_ids,
  these_samples_metadata,
  tool_name = "slms-3",
  projection = "grch37",
  seq_type = "genome",
  flavour = "clustered",
  these_genes,
  min_read_support = 3,
  basic_columns = TRUE,
  maf_cols = NULL,
  subset_from_merge = FALSE,
  augmented = TRUE,
  engine = "fread_maf"
)

Arguments

these_sample_ids: A vector of sample_id that you want results for.
these_samples_metadata: Optional metadata table. If provided, the function will return SSM calls for the sample IDs in the provided metadata table.
tool_name: Only supports slms-3 currently.
projection: Obtain variants projected to this reference (one of grch37 or hg38).
seq_type: The seq type you want results for. Default is "genome".
flavour: Currently this function only supports one flavour option but this feature is meant for eventual compatibility with additional variant calling parameters/versions.
these_genes: A vector of genes to subset ssm to.
min_read_support: Only returns variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs).
basic_columns: Return first 45 columns of MAF rather than full details. Default is TRUE.
maf_cols: If basic_columns is set to FALSE, the user can specify which columns are returned within the MAF. This parameter can be either a vector of indexes (integer) or a vector of column names (character).
subset_from_merge: Instead of merging individual MAFs, the data will be subset from a pre-merged MAF of samples with the specified seq_type.
augmented: Default is TRUE. Set to FALSE if you want the original MAF from each sample for multi-sample patients instead.
engine: Specify one of readr or fread_maf (default) to change how the large files are loaded prior to subsetting. You may have better performance with one or the other but for me fread_maf is faster and uses a lot less RAM.
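
To make the filtering arguments above more concrete, here is a minimal base R sketch of what min_read_support, these_genes and maf_cols describe, applied to a toy MAF-like data frame. It is not the package's implementation: the toy_maf object and all of its values are invented for illustration, and only the column names (Hugo_Symbol, Tumor_Sample_Barcode, t_alt_count) follow the MAF convention.

```r
# Toy MAF-like data frame; all values are invented for illustration only.
toy_maf <- data.frame(
  Hugo_Symbol = c("MYC", "BCL2", "EZH2", "MYC"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2"),
  t_alt_count = c(12L, 2L, 3L, 0L),
  stringsAsFactors = FALSE
)

# Analogous to min_read_support: keep variants with at least this many
# supporting reads in t_alt_count.
min_read_support <- 3
supported <- toy_maf[toy_maf$t_alt_count >= min_read_support, ]

# Analogous to these_genes: restrict the result to genes of interest.
these_genes <- c("MYC")
supported_genes <- supported[supported$Hugo_Symbol %in% these_genes, ]

# Analogous to maf_cols: columns can be selected by name or by index.
by_name <- toy_maf[, c("Hugo_Symbol", "t_alt_count")]
by_index <- toy_maf[, c(1, 3)]
```

In this sketch both column selections pick the same two columns, which is why maf_cols can accept either form.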

Value

A data frame in MAF format.

Details

This function internally runs get_ssm_by_sample. The user can either give the function a vector of sample IDs of interest with these_sample_ids, or provide a metadata table (these_samples_metadata) that has already been subset to the sample IDs of interest. In most situations, there is no need to run this function with subset_from_merge = TRUE; use get_coding_ssm or get_ssm_by_region instead. See get_ssm_by_sample for more information.

Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_coding_ssm_status, get_ssm_by_patients, get_ssm_by_sample, get_ssm_by_region, get_ssm_by_regions.
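
The difference between merging individual MAFs and subsetting a pre-merged MAF (subset_from_merge = TRUE) can be sketched with toy data in base R. This is only an illustration under stated assumptions: toy_store, load_one_sample and toy_metadata are hypothetical stand-ins, not GAMBLR objects or functions, and the real function reads files from disk rather than from an in-memory list.

```r
# Hypothetical per-sample store; stands in for individual MAF files.
toy_store <- list(
  S1 = data.frame(Tumor_Sample_Barcode = "S1",
                  Hugo_Symbol = c("MYC", "BCL2"),
                  stringsAsFactors = FALSE),
  S2 = data.frame(Tumor_Sample_Barcode = "S2",
                  Hugo_Symbol = "EZH2",
                  stringsAsFactors = FALSE),
  S3 = data.frame(Tumor_Sample_Barcode = "S3",
                  Hugo_Symbol = "CREBBP",
                  stringsAsFactors = FALSE)
)

# Hypothetical loader for a single sample (not a GAMBLR function).
load_one_sample <- function(sample_id) toy_store[[sample_id]]

these_sample_ids <- c("S1", "S3")

# Strategy 1: load each requested sample and bind the rows together.
merged_on_the_fly <- do.call(rbind, lapply(these_sample_ids, load_one_sample))
rownames(merged_on_the_fly) <- NULL

# Strategy 2: start from one pre-merged table and keep the requested samples.
pre_merged <- do.call(rbind, toy_store)
from_merge <- pre_merged[pre_merged$Tumor_Sample_Barcode %in% these_sample_ids, ]
rownames(from_merge) <- NULL

# A metadata table that is already subset plays the same role as a vector
# of sample IDs: the IDs are taken from its sample_id column.
toy_metadata <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  pathology = c("FL", "DLBCL", "FL"),
  stringsAsFactors = FALSE
)
ids_from_metadata <- toy_metadata$sample_id[toy_metadata$pathology == "FL"]
```

Both strategies select the same samples here. With real data the trade-off is in cost: the first reads one file per sample, while the second must load one large merged file before subsetting.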

Examples

library(parallel)

#examples using the these_sample_ids parameter.
sample_ssms = get_ssm_by_samples(these_sample_ids = c("HTMCP-01-06-00485-01A-01D",
                                                      "14-35472_tumorA",
                                                      "14-35472_tumorB"))
#> WARNING: on-the-fly merges can be extremely slow and consume a lot of memory if many samples are involved. Use at your own risk. 

hg38_ssms = get_ssm_by_samples(projection="hg38",
                               these_sample_ids = c("HTMCP-01-06-00485-01A-01D",
                                                    "14-35472_tumorA",
                                                    "14-35472_tumorB"))
#> WARNING: on-the-fly merges can be extremely slow and consume a lot of memory if many samples are involved. Use at your own risk. 

readr_sample_ssms = get_ssm_by_samples(subset_from_merge = TRUE,
                                       engine = "readr",
                                       these_sample_ids = c("HTMCP-01-06-00485-01A-01D",
                                                            "14-35472_tumorA",
                                                            "14-35472_tumorB"))
#> using existing merge: /projects/nhl_meta_analysis_scratch/gambl/results_local/all_the_things/slms_3-1.0_vcf2maf-1.3/genome--projection/deblacklisted/augmented_maf/all_slms-3--grch37.maf

slow_sample_ssms = get_ssm_by_samples(subset_from_merge = TRUE,
                                      these_sample_ids = c("HTMCP-01-06-00485-01A-01D",
                                                           "14-35472_tumorA",
                                                           "14-35472_tumorB"))
#> using existing merge: /projects/nhl_meta_analysis_scratch/gambl/results_local/all_the_things/slms_3-1.0_vcf2maf-1.3/genome--projection/deblacklisted/augmented_maf/all_slms-3--grch37.maf

#example using a metadata table subset to sample IDs of interest.
my_metadata = get_gambl_metadata(seq_type_filter = "genome")
my_metadata = dplyr::filter(my_metadata, pathology == "FL")

sample_ssms = get_ssm_by_samples(these_samples_metadata = my_metadata)
#> WARNING: on-the-fly merges can be extremely slow and consume a lot of memory if many samples are involved. Use at your own risk.