Return metadata for a selection of samples.

get_gambl_metadata(
  seq_type_filter = "genome",
  tissue_status_filter = "tumour",
  case_set,
  remove_benchmarking = TRUE,
  with_outcomes = TRUE,
  from_flatfile = TRUE,
  sample_flatfile,
  biopsy_flatfile,
  only_available = TRUE,
  seq_type_priority = "genome"
)

Arguments

seq_type_filter

Filtering criteria for sequencing type (default: "genome", i.e. all genomes).

tissue_status_filter

Filtering criteria for tissue status (default: "tumour", i.e. only tumour genomes; can be "mrna" or "any" for the superset of cases).

case_set

Optional short name for a pre-defined set of cases avoiding any embargoed cases (current options: 'BLGSP-study', 'FL-study', 'DLBCL-study', 'FL-DLBCL-study', 'FL-DLBCL-all', 'DLBCL-unembargoed', 'BL-DLBCL-manuscript', 'MCL', 'MCL-CLL').

remove_benchmarking

If TRUE (the default), the FFPE benchmarking duplicate samples are dropped.

with_outcomes

Optionally join to GAMBL outcome data (default: TRUE).

from_flatfile

If TRUE (the default), the metadata is read from the flat-files in your clone of the repo. Set to FALSE to use the database instead.

sample_flatfile

Optionally provide the full path to a sample table to use instead of the default.

biopsy_flatfile

Optionally provide the full path to a biopsy table to use instead of the default.

only_available

If TRUE, will remove samples with FALSE or NA in the bam_available column (default: TRUE).

seq_type_priority

When the same sample_id is available with more than one seq_type, the metadata for this seq_type is kept and the others are dropped (default: "genome").

Value

A data frame with metadata for each biopsy in GAMBL.

Details

This function returns metadata for GAMBL samples. Options for subsetting and filtering the returned data are available. For more information on how to use this function with different filtering criteria, refer to the parameter descriptions, examples and vignettes. Pre-defined sets of cases that avoid embargoed cases can be requested with case_set (current options: 'BLGSP-study', 'FL-study', 'DLBCL-study', 'FL-DLBCL-study', 'FL-DLBCL-all', 'DLBCL-unembargoed', 'BL-DLBCL-manuscript', 'MCL', 'MCL-CLL').
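
The effect of only_available and seq_type_priority, as described in the Arguments section, can be illustrated on a toy table. This is only a sketch in base R and not the function's actual implementation: the helper sketch_filter and the rows of toy_samples are made up for illustration, and the only column names assumed are the ones named in the argument descriptions (sample_id, seq_type, bam_available).

```r
# Toy stand-in for a sample table; all rows are invented for illustration.
toy_samples = data.frame(
  sample_id     = c("S1", "S1", "S2", "S3", "S4"),
  seq_type      = c("genome", "capture", "capture", "genome", "genome"),
  bam_available = c(TRUE, TRUE, TRUE, FALSE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package) that mimics the documented
# behaviour of only_available and seq_type_priority.
sketch_filter = function(df, only_available = TRUE, seq_type_priority = "genome") {
  if (only_available) {
    # Keep rows where bam_available is TRUE; FALSE and NA are both dropped.
    df = df[!is.na(df$bam_available) & df$bam_available, , drop = FALSE]
  }
  # Move rows of the prioritized seq_type to the top (ties keep their order),
  # then keep only the first row encountered for each sample_id.
  df = df[order(df$seq_type != seq_type_priority), , drop = FALSE]
  df = df[!duplicated(df$sample_id), , drop = FALSE]
  # Return the rows sorted by sample_id for readability.
  df[order(df$sample_id), , drop = FALSE]
}

filtered = sketch_filter(toy_samples)
print(filtered)
```

In this toy table, S3 and S4 are removed because their bam_available value is FALSE or NA, S1 keeps its genome row, and S2 keeps its capture row because no genome row exists for it.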

Examples

#basic usage
my_metadata = get_gambl_metadata()

#use pre-defined custom sample sets
only_blgsp_metadata = get_gambl_metadata(case_set = "BLGSP-study")

#override default filters and request metadata for samples other than tumour genomes,
#e.g. also get the normals
tumour_and_normal_metadata = get_gambl_metadata(tissue_status_filter = c('tumour','normal'))

#request both genome and capture samples; where a sample_id has both,
#keep the genome row and drop the capture row
non_duplicated_genome_and_capture = get_gambl_metadata(seq_type_filter = c('genome', 'capture'),
                                                       seq_type_priority = "genome")
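
The sample_flatfile and biopsy_flatfile arguments can be combined to read metadata from tables outside the default location. The following is a guarded sketch rather than a tested recipe: both paths are placeholders, it assumes the function is exported by the GAMBLR package, and it assumes the returned data frame has a seq_type column. The call is skipped when the package or the files are not available.

```r
# Placeholder paths (hypothetical); point these at your own tables.
my_sample_table = "/path/to/sample_table.tsv"
my_biopsy_table = "/path/to/biopsy_table.tsv"

# Only attempt the call when the package and both files are present.
can_run = requireNamespace("GAMBLR", quietly = TRUE) &&
  file.exists(my_sample_table) &&
  file.exists(my_biopsy_table)

if (can_run) {
  custom_metadata = GAMBLR::get_gambl_metadata(
    sample_flatfile = my_sample_table,
    biopsy_flatfile = my_biopsy_table,
    only_available = FALSE  # keep samples regardless of bam_available
  )
  # Quick summary of what was returned (assumes a seq_type column exists).
  print(table(custom_metadata$seq_type))
} else {
  message("GAMBLR or the example tables are not available; skipping.")
}
```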