Get MAF-format data frames for more than one sample and combine them into a single data frame.

get_ssm_by_samples(
  these_sample_ids,
  these_samples_metadata,
  tool_name = "slms-3",
  projection = "grch37",
  seq_type = "genome",
  flavour = "clustered",
  these_genes,
  min_read_support = 3,
  basic_columns = TRUE,
  maf_cols = NULL,
  subset_from_merge = FALSE,
  augmented = TRUE,
  engine = "fread_maf"
)

Arguments

these_sample_ids

A vector of sample IDs (sample_id) that you want results for.

these_samples_metadata

Optional metadata table. If provided, the function will return SSM calls for the sample IDs in the provided metadata table.

tool_name

Only supports slms-3 currently.

projection

Obtain variants projected to this reference (one of grch37 or hg38).

seq_type

The seq type you want results for. Default is "genome".

flavour

Currently this function only supports one flavour option but this feature is meant for eventual compatibility with additional variant calling parameters/versions.

these_genes

A vector of genes to subset the SSM calls to.

min_read_support

Only returns variants with at least this many reads in t_alt_count (for cleaning up augmented MAFs).

basic_columns

Return first 45 columns of MAF rather than full details. Default is TRUE.

maf_cols

If basic_columns is set to FALSE, the user can specify which columns are returned in the MAF. This parameter can be either a vector of column indexes (integer) or a vector of column names (character).

subset_from_merge

Instead of merging individual MAFs, the data will be subset from a pre-merged MAF of samples with the specified seq_type.

augmented

Default is TRUE. Set to FALSE if you want the original MAF from each sample for multi-sample patients instead.

engine

Specify one of readr or fread_maf (default) to change how the large files are loaded prior to subsetting. Performance may differ between the two, but in the author's experience fread_maf is faster and uses much less RAM.
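To make the filtering and column-selection arguments concrete, here is a minimal base R sketch on a toy data frame. It only illustrates what these_genes, min_read_support, and maf_cols are documented to do; it is not the package's implementation. The toy_maf table and the subset_toy_maf helper are hypothetical and exist only for this illustration.

```r
# Toy MAF-like table (made-up values; a real MAF has many more columns).
toy_maf <- data.frame(
  Hugo_Symbol = c("MYC", "BCL2", "EZH2", "MYC"),
  Tumor_Sample_Barcode = c("sampleA", "sampleA", "sampleB", "sampleB"),
  t_alt_count = c(12L, 2L, 5L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented behaviour of three arguments.
subset_toy_maf <- function(maf, these_genes = NULL,
                           min_read_support = 3, maf_cols = NULL) {
  # Keep variants supported by at least `min_read_support` alternate reads.
  maf <- maf[maf$t_alt_count >= min_read_support, , drop = FALSE]
  # Optionally restrict the rows to the genes of interest.
  if (!is.null(these_genes)) {
    maf <- maf[maf$Hugo_Symbol %in% these_genes, , drop = FALSE]
  }
  # Optionally keep a subset of columns, given as indexes or as names.
  if (!is.null(maf_cols)) {
    maf <- maf[, maf_cols, drop = FALSE]
  }
  maf
}

# The same columns requested by name and by position.
by_name <- subset_toy_maf(toy_maf, these_genes = "MYC",
                          maf_cols = c("Hugo_Symbol", "t_alt_count"))
by_index <- subset_toy_maf(toy_maf, these_genes = "MYC",
                           maf_cols = c(1, 3))
```

In this toy case the BCL2 variant is dropped by the read-support threshold before the gene filter is applied, and both column specifications select the same two columns.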

Value

A data frame in MAF format.

Details

This function internally runs get_ssm_by_sample. The user can either give the function a vector of sample IDs of interest with these_sample_ids, or provide a metadata table (these_samples_metadata) that is already subset to the sample IDs of interest. In most situations, this function should not need to be run with subset_from_merge = TRUE; use get_coding_ssm or get_ssm_by_region instead. See get_ssm_by_sample for more information.

Is this function not what you are looking for? Try one of the following similar functions: get_coding_ssm, get_coding_ssm_status, get_ssm_by_patients, get_ssm_by_sample, get_ssm_by_region, get_ssm_by_regions.
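The idea of running a per-sample loader and combining the results can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: fetch_one_sample is a hypothetical stand-in for a per-sample call such as get_ssm_by_sample, and it fabricates a tiny table so that the sketch runs without access to any GAMBL results.

```r
# Hypothetical stand-in for a per-sample loader (made-up variants).
fetch_one_sample <- function(sample_id) {
  data.frame(
    Tumor_Sample_Barcode = sample_id,
    Hugo_Symbol = c("GENE1", "GENE2"),
    t_alt_count = c(4L, 7L),
    stringsAsFactors = FALSE
  )
}

sample_ids <- c("sampleA", "sampleB", "sampleC")

# Load each sample separately, then stack the tables into one data frame.
per_sample <- lapply(sample_ids, fetch_one_sample)
combined <- do.call(rbind, per_sample)

# Count the variants contributed by each sample.
variants_per_sample <- table(combined$Tumor_Sample_Barcode)
```

Because every sample is loaded and stacked individually, the work in a merge of this kind grows with the number of samples requested.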

Examples

library(parallel)

#examples using the these_sample_ids parameter.
sample_ssms = get_ssm_by_samples(these_sample_ids = c("HTMCP-01-06-00485-01A-01D",
                                                      "14-35472_tumorA",
                                                      "14-35472_tumorB"))
#> WARNING: on-the-fly merges can be extremely slow and consume a lot of memory if many samples are involved. Use at your own risk. 

hg38_ssms = get_ssm_by_samples(projection="hg38",
                               these_sample_ids = c("HTMCP-01-06-00485-01A-01D",
                                                    "14-35472_tumorA",
                                                    "14-35472_tumorB"))
#> WARNING: on-the-fly merges can be extremely slow and consume a lot of memory if many samples are involved. Use at your own risk. 

readr_sample_ssms = get_ssm_by_samples(subset_from_merge = TRUE,
                                       engine = "readr",
                                       these_sample_ids = c("HTMCP-01-06-00485-01A-01D",
                                                            "14-35472_tumorA",
                                                            "14-35472_tumorB"))
#> using existing merge: /projects/nhl_meta_analysis_scratch/gambl/results_local/all_the_things/slms_3-1.0_vcf2maf-1.3/genome--projection/deblacklisted/augmented_maf/all_slms-3--grch37.maf

slow_sample_ssms = get_ssm_by_samples(subset_from_merge = TRUE,
                                      these_sample_ids = c("HTMCP-01-06-00485-01A-01D",
                                                           "14-35472_tumorA",
                                                           "14-35472_tumorB"))
#> using existing merge: /projects/nhl_meta_analysis_scratch/gambl/results_local/all_the_things/slms_3-1.0_vcf2maf-1.3/genome--projection/deblacklisted/augmented_maf/all_slms-3--grch37.maf

#example using a metadata table subset to sample IDs of interest.
my_metadata = get_gambl_metadata(seq_type_filter = "genome")
my_metadata = dplyr::filter(my_metadata, pathology == "FL")

sample_ssms = get_ssm_by_samples(these_samples_metadata = my_metadata)
#> WARNING: on-the-fly merges can be extremely slow and consume a lot of memory if many samples are involved. Use at your own risk.
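
The subset_from_merge = TRUE examples above read from an existing merged MAF rather than merging on the fly. Conceptually, that amounts to filtering one large table by sample ID, as in the minimal sketch below. The merged_maf table and its values are hypothetical, and the real function may apply further filters that are not shown here.

```r
# Hypothetical pre-merged table covering several samples (made-up values).
merged_maf <- data.frame(
  Tumor_Sample_Barcode = c("sampleA", "sampleB", "sampleC", "sampleD"),
  Hugo_Symbol = c("GENE1", "GENE2", "GENE1", "GENE3"),
  stringsAsFactors = FALSE
)

# Sample IDs of interest.
wanted <- c("sampleA", "sampleC")

# Keep only the rows that belong to the requested samples.
subset_maf <- merged_maf[merged_maf$Tumor_Sample_Barcode %in% wanted, ,
                         drop = FALSE]
```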