Return metadata for a selection of samples.

Usage

og_get_gambl_metadata(
  seq_type_filter = "genome",
  tissue_status_filter = "tumour",
  case_set,
  remove_benchmarking = TRUE,
  with_outcomes = TRUE,
  sample_flatfile,
  biopsy_flatfile,
  only_available = TRUE,
  seq_type_priority = "genome",
  from_flatfile
)

Arguments

seq_type_filter

Filtering criteria for seq_type (default: "genome"). More than one value can be supplied, e.g. c("genome", "capture").

tissue_status_filter

Filtering criteria for tissue status. Possible values are "tumour" (the default) or "normal".

case_set

Optional short name for a pre-defined set of cases that avoids any embargoed cases (current options: 'BLGSP-study', 'FL-study', 'DLBCL-study', 'FL-DLBCL-study', 'FL-DLBCL-all', 'DLBCL-unembargoed', 'BL-DLBCL-manuscript', 'MCL', 'MCL-CLL').

remove_benchmarking

By default the FFPE benchmarking duplicate samples will be dropped.

with_outcomes

Optionally join to GAMBL outcome data.

sample_flatfile

Optionally provide the full path to a sample table to use instead of the default.

biopsy_flatfile

Optionally provide the full path to a biopsy table to use instead of the default.

only_available

If TRUE, will remove samples with FALSE or NA in the bam_available column (default: TRUE).

seq_type_priority

For duplicate sample_id with different seq_type available, the metadata will prioritize this seq_type and drop the others. Possible values are "genome" or "capture".

from_flatfile

Deprecated (will be ignored)

Value

A data frame with metadata for each biopsy in GAMBL, with the following columns:

compression

Format of the original data used as input for our analysis pipelines (cram, bam or fastq)

bam_available

Whether or not this file was available when last checked.

patient_id

The anonymized unique identifier for this patient. For BC samples, this will be Res ID.

sample_id

A unique identifier for the sample analyzed.

seq_type

The assay type used to produce this data (one of "genome", "capture", "mrna", "promethION")

genome_build

The name of the genome reference the data were aligned to.

tissue_status

Whether the sample was a tumour or normal.

cohort

Name for a group of samples that were added together (usually from a single study), often in the format pathology_cohort_descriptor.

library_id

The unique identifier for the sequencing library.

pathology

The diagnosis or pathology for the sample

time_point

Timing of biopsy in increasing alphabetical order (A = diagnosis, B = first relapse, etc.)

protocol

General protocol for library construction, e.g. "Ribodepletion", "PolyA", or "Genome"

ffpe_or_frozen

Whether the nucleic acids were extracted from a frozen or FFPE sample

read_length

The length of reads (required for RNA-seq libraries)

strandedness

Whether the RNA-seq library construction was strand-specific and, if so, which strand. Required for RNA-seq; one of "positive", "negative", or "unstranded".

seq_source_type

Required for RNA-seq. Usually the same value as ffpe_or_frozen, but sometimes immunotube or sorted cells.

data_path

Symbolic link to the bam or cram file (not usually relevant for GAMBLR)

link_name

Standardized naming for symbolic link (not usually relevant for GAMBLR)

fastq_data_path

Symbolic link to the fastq file (not usually relevant for GAMBLR)

fastq_link_name

Standardized naming for symbolic link for FASTQ file, if used (not usually relevant for GAMBLR)

unix_group

Whether a source is external and restricted by data access agreements (icgc_dart) or internal (gambl)

COO_consensus

TODO

DHITsig_consensus

TODO

COO_PRPS_class

TODO

DHITsig_PRPS_class

TODO

DLBCL90_dlbcl_call

TODO

DLBCL90_dhitsig_call

TODO

res_id

Duplicate of sample_id for local samples and NA otherwise

DLBCL90_code_set

Code set used for DLBCL90 call. One of DLBCL90, DLBCL90v2, or DLBCL90v3.

DLBCL90_dlbcl_score

TODO

DLBCL90_pmbl_score

TODO

DLBCL90_pmbl_call

TODO

DLBCL90_dhitsig_score

TODO

myc_ba

Result from breakapart FISH for MYC locus

myc_cn

Result from copy number FISH for MYC locus

bcl2_ba

Result from breakapart FISH for BCL2 locus

bcl2_cn

Result from copy number FISH for BCL2 locus

bcl6_ba

Result from breakapart FISH for BCL6 locus

bcl6_cn

Result from copy number FISH for BCL6 locus

time_since_diagnosis_years

TODO

relapse_timing

TODO

dtbx

TODO. OR REMOVE?

dtdx

TODO. OR REMOVE?

lymphgen_no_cnv

TODO

lymphgen_with_cnv

TODO

lymphgen_cnv_noA53

TODO

lymphgen_wright

The LymphGen call for this sample from Wright et al. (if applicable)

fl_grade

TODO

capture_frozen_sample_id

TODO

capture_FFPE_sample_id

TODO

capture_unknown_sample_id

TODO

genome_frozen_sample_id

TODO

genome_ctDNA_sample_id

TODO

genome_FFPE_sample_id

TODO

mrna_PolyA_frozen_sample_id

TODO

mrna_Ribodepletion_frozen_sample_id

TODO

XXX_cohort

Cohort name for batch effect correction(?)

transformation

TODO

relapse

TODO

ighv_mutation_original

TODO

normal_sample_id

TODO

pairing_status

TODO

ICGC_ID

TODO

ICGC_XXX

metadata value for ICGC cohort inferred from external metadata

detailed_pathology

TODO

COO_final

TODO

consensus_pathology

TODO

lymphgen

TODO

Tumor_Sample_Barcode

Duplicate of sample_id for simplifying joins to MAF data frames

consensus_coo_dhitsig

TODO

pathology_rank

Numeric rank for consistent ordering of samples by pathology

lymphgen_rank

Numeric rank for consistent ordering of samples by LymphGen

hiv_status

TODO

CODE_XXX

Event status at last follow-up for overall survival (OS), progression-free survival (PFS), etc. 0 = no event (censored), 1 = event.

XXX_YEARS

Time, in years, from diagnosis to last follow-up for overall survival (OS), progression-free survival (PFS), etc.

alive

Theoretically redundant with CODE_OS

is_adult

Adult or pediatric at diagnosis. One of "Adult" for adults and "Pediatric" otherwise

age_group

Adult_BL or Pediatric_BL or Other, specific to the BLGSP study

age

Patient age at diagnosis

sex

The biological sex of the patient, if available. Allowable options: M, F, NA

tx_primary

TODO

cause_of_death

TODO
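
The Tumor_Sample_Barcode column above is described as a convenience for joining to MAF data frames. A minimal sketch of such a join follows, using small toy data frames rather than real GAMBL data; the toy values and the toy_meta, toy_maf and maf_with_meta names are purely illustrative, and Hugo_Symbol is assumed here as a typical MAF column.

```r
# Toy metadata (illustrative values only, not real GAMBL samples)
toy_meta = data.frame(
  sample_id = c("S1", "S2"),
  Tumor_Sample_Barcode = c("S1", "S2"),
  pathology = c("DLBCL", "FL"),
  stringsAsFactors = FALSE
)

# Toy MAF-like table: one row per mutation (Hugo_Symbol is an assumed column)
toy_maf = data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2"),
  Hugo_Symbol = c("MYC", "EZH2", "KMT2D"),
  stringsAsFactors = FALSE
)

# Join on the shared column so every mutation carries its sample metadata
maf_with_meta = merge(toy_maf, toy_meta, by = "Tumor_Sample_Barcode")

# Count mutations per pathology
mutations_per_pathology = table(maf_with_meta$pathology)
```

Because both tables already share a column with the same name, no renaming of sample_id is needed before the join.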

Details

This function returns metadata for GAMBL samples. Options to subset and filter the returned data are available. For more information on how to use this function with different filtering criteria, refer to the parameter descriptions, examples and vignettes. To restrict the output to a pre-defined set of cases that avoids embargoed cases, use case_set (current options: 'BLGSP-study', 'FL-study', 'DLBCL-study', 'FL-DLBCL-study', 'FL-DLBCL-all', 'DLBCL-unembargoed', 'BL-DLBCL-manuscript', 'MCL', 'MCL-CLL').
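
To make the documented behaviour of only_available and seq_type_priority concrete, the sketch below reproduces the described filtering on a toy data frame in base R. It is an illustration of the behaviour as described in the argument list, not the function's actual implementation; toy_meta and the intermediate object names are hypothetical.

```r
# Toy metadata: sample "S1" was sequenced as both genome and capture,
# and "S3" has an unknown (NA) bam_available status.
toy_meta = data.frame(
  sample_id = c("S1", "S1", "S2", "S3"),
  seq_type = c("genome", "capture", "capture", "genome"),
  bam_available = c(TRUE, TRUE, TRUE, NA),
  stringsAsFactors = FALSE
)

# only_available = TRUE: drop rows where bam_available is FALSE or NA
# (which() silently skips NA values)
available = toy_meta[which(toy_meta$bam_available), ]

# seq_type_priority = "genome": sort rows of the preferred seq_type first,
# then keep only the first row for each sample_id
seq_type_priority = "genome"
ordered = available[order(available$seq_type != seq_type_priority), ]
deduplicated = ordered[!duplicated(ordered$sample_id), ]
```

In this toy case S3 is dropped for lacking an available file, S1 keeps only its genome row, and S2 keeps its capture row because it has no genome alternative.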

Examples

if (FALSE) { # \dontrun{
#basic usage
my_metadata = suppressMessages(get_gambl_metadata())

#use pre-defined custom sample sets
only_blgsp_metadata = get_gambl_metadata(case_set = "BLGSP-study")

#override default filters and request metadata for samples other than tumour genomes,
#e.g. also get the normals
tumour_and_normal_metadata = get_gambl_metadata(tissue_status_filter = c('tumour','normal'))

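#request both genome and capture samples; where a sample_id has both,
#keep the genome and drop the capture
non_redundant_genome_and_capture = get_gambl_metadata(seq_type_filter = c('genome', 'capture'),
                                                       seq_type_priority = "genome")

#request genome, capture and mrna samples for both tumours and normals
absolutely_everything = get_gambl_metadata(seq_type_filter = c('genome', 'capture', 'mrna'),
                                           tissue_status_filter = c('tumour', 'normal'))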
} # }
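
When with_outcomes = TRUE, the returned data frame carries the CODE_XXX and XXX_YEARS columns described under Value. The sketch below shows a simple summary of such columns on toy data. It assumes the overall survival columns are named CODE_OS and OS_YEARS (CODE_OS is mentioned above; OS_YEARS is inferred from the XXX_YEARS pattern), and all values are made up for illustration. It is a descriptive summary only, not a survival analysis.

```r
# Toy outcome data (illustrative values; column names partly assumed)
toy_outcomes = data.frame(
  patient_id = c("P1", "P2", "P3", "P4"),
  CODE_OS = c(1, 0, 0, NA),        # 1 = event, 0 = no event (censored)
  OS_YEARS = c(1.5, 6.2, 3.0, NA), # years from diagnosis to last follow-up
  stringsAsFactors = FALSE
)

# Keep only patients with both outcome fields recorded
has_outcome = !is.na(toy_outcomes$CODE_OS) & !is.na(toy_outcomes$OS_YEARS)
with_outcome = toy_outcomes[has_outcome, ]

# Count events and censored patients, and summarise follow-up time
n_events = sum(with_outcome$CODE_OS == 1)
n_censored = sum(with_outcome$CODE_OS == 0)
median_followup_years = median(with_outcome$OS_YEARS)
```

Filtering out rows with missing outcome fields first avoids NA results from sum() and median().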