Get GAMBL metadata.
og_get_gambl_metadata.Rd
Return metadata for a selection of samples.
Usage
og_get_gambl_metadata(
seq_type_filter = "genome",
tissue_status_filter = "tumour",
case_set,
remove_benchmarking = TRUE,
with_outcomes = TRUE,
sample_flatfile,
biopsy_flatfile,
only_available = TRUE,
seq_type_priority = "genome",
from_flatfile
)
Arguments
- seq_type_filter
Filtering criteria for seq_type (default: "genome").
- tissue_status_filter
Filtering criteria for tissue status. Possible values are "tumour" (the default) or "normal".
- case_set
Optional short name for a pre-defined set of cases avoiding any embargoed cases (current options: 'BLGSP-study', 'FL-study', 'DLBCL-study', 'FL-DLBCL-study', 'FL-DLBCL-all', 'DLBCL-unembargoed', 'BL-DLBCL-manuscript', 'MCL','MCL-CLL').
- remove_benchmarking
By default the FFPE benchmarking duplicate samples will be dropped.
- with_outcomes
Optionally join to gambl outcome data.
- sample_flatfile
Optionally provide the full path to a sample table to use instead of the default.
- biopsy_flatfile
Optionally provide the full path to a biopsy table to use instead of the default.
- only_available
If TRUE, will remove samples with FALSE or NA in the bam_available column (default: TRUE).
- seq_type_priority
When the same sample_id is available with more than one seq_type, the metadata will keep the row with this seq_type and drop the others. Possible values are "genome" (the default) or "capture".
- from_flatfile
Deprecated (will be ignored)
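The filtering arguments above can be pictured with a small base R sketch. This is not the package's implementation: `toy_meta` and `filter_toy_metadata()` are hypothetical names, the table contains made-up values, and only the documented behaviour of seq_type_filter, tissue_status_filter, only_available, and seq_type_priority is imitated.

```r
# Toy metadata table (hypothetical values, not real GAMBL data).
toy_meta <- data.frame(
  sample_id     = c("S1", "S1", "S2", "S3", "S4"),
  seq_type      = c("genome", "capture", "capture", "genome", "genome"),
  tissue_status = c("tumour", "tumour", "tumour", "normal", "tumour"),
  bam_available = c(TRUE, TRUE, TRUE, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper that imitates the documented filters; illustration only.
filter_toy_metadata <- function(meta,
                                seq_type_filter = "genome",
                                tissue_status_filter = "tumour",
                                only_available = TRUE,
                                seq_type_priority = "genome") {
  # Keep rows matching the requested seq_type and tissue_status values.
  keep <- meta$seq_type %in% seq_type_filter &
    meta$tissue_status %in% tissue_status_filter
  if (only_available) {
    # Drop rows where bam_available is FALSE or NA.
    keep <- keep & !is.na(meta$bam_available) & meta$bam_available
  }
  meta <- meta[keep, , drop = FALSE]
  # Sort so the prioritized seq_type comes first within each sample_id,
  # then keep only the first row per sample_id.
  ord <- order(meta$sample_id, meta$seq_type != seq_type_priority)
  meta <- meta[ord, , drop = FALSE]
  meta <- meta[!duplicated(meta$sample_id), , drop = FALSE]
  rownames(meta) <- NULL
  meta
}

result <- filter_toy_metadata(toy_meta, seq_type_filter = c("genome", "capture"))
print(result)
```

In this toy table, S3 is dropped as a normal, S4 is dropped because bam_available is NA, and S1 keeps its genome row rather than its capture row.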
Value
A data frame with metadata for each biopsy in GAMBL, with the following columns:
- compression
Format of the original data used as input for our analysis pipelines (cram, bam or fastq)
- bam_available
Whether or not this file was available when last checked.
- patient_id
The anonymized unique identifier for this patient. For BC samples, this will be Res ID.
- sample_id
A unique identifier for the sample analyzed.
- seq_type
The assay type used to produce this data (one of "genome", "capture", "mrna", "promethION")
- genome_build
The name of the genome reference the data were aligned to.
- tissue_status
Whether the sample was a tumour or normal.
- cohort
Name for a group of samples that were added together (usually from a single study), often in the format pathology_cohort_descriptor.
- library_id
The unique identifier for the sequencing library.
- pathology
The diagnosis or pathology for the sample
- time_point
Timing of biopsy in increasing alphabetical order (A = diagnosis, B = first relapse, etc.)
- protocol
General protocol for library construction. e.g. "Ribodepletion", "PolyA", or "Genome"
- ffpe_or_frozen
Whether the nucleic acids were extracted from a frozen or FFPE sample
- read_length
The length of reads (required for RNA-seq libraries)
- strandedness
Whether the RNA-seq library construction was strand-specific and, if so, which strand. Required for RNA-seq; one of "positive", "negative", or "unstranded".
- seq_source_type
Required for RNA-seq. Usually the same value as ffpe_or_frozen, but sometimes immunotube or sorted cells.
- data_path
Symbolic link to the bam or cram file (not usually relevant for GAMBLR)
- link_name
Standardized naming for symbolic link (not usually relevant for GAMBLR)
- data_path
Symbolic link to the fastq file (not usually relevant for GAMBLR)
- fastq_link_name
Standardized naming for symbolic link for FASTQ file, if used (not usually relevant for GAMBLR)
- unix_group
Whether a source is external and restricted by data access agreements (icgc_dart) or internal (gambl)
- COO_consensus
TODO
- DHITsig_consensus
TODO
- COO_PRPS_class
TODO
- DHITsig_PRPS_class
TODO
- DLBCL90_dlbcl_call
TODO
- DLBCL90_dhitsig_call
TODO
- res_id
duplicate of sample_id for local samples and NA otherwise
- DLBCL90_code_set
Code set used for DLBCL90 call. One of DLBCL90, DLBCL90v2, DLBCL90v3.
- DLBCL90_dlbcl_score
TODO
- DLBCL90_pmbl_score
TODO
- DLBCL90_pmbl_call
TODO
- DLBCL90_dhitsig_score
TODO
- myc_ba
Result from breakapart FISH for MYC locus
- myc_cn
Result from copy number FISH for MYC locus
- bcl2_ba
Result from breakapart FISH for BCL2 locus
- bcl2_cn
Result from copy number FISH for BCL2 locus
- bcl6_ba
Result from breakapart FISH for BCL6 locus
- bcl6_cn
Result from copy number FISH for BCL6 locus
- time_since_diagnosis_years
TODO
- relapse_timing
TODO
- dtbx
TODO. OR REMOVE?
- dtdx
TODO. OR REMOVE?
- lymphgen_no_cnv
TODO
- lymphgen_with_cnv
TODO
- lymphgen_cnv_noA53
TODO
- lymphgen_wright
The LymphGen call for this sample from Wright et al. (if applicable)
- fl_grade
TODO
- capture_frozen_sample_id
TODO
- capture_FFPE_sample_id
TODO
- capture_unknown_sample_id
TODO
- genome_frozen_sample_id
TODO
- genome_ctDNA_sample_id
TODO
- genome_FFPE_sample_id
TODO
- mrna_PolyA_frozen_sample_id
TODO
- mrna_Ribodepletion_frozen_sample_id
TODO
- XXX_cohort
Cohort name for batch effect correction(?)
- transformation
TODO
- relapse
TODO
- ighv_mutation_original
TODO
- normal_sample_id
TODO
- pairing_status
TODO
- ICGC_ID
TODO
- ICGC_XXX
metadata value for ICGC cohort inferred from external metadata
- detailed_pathology
TODO
- COO_final
TODO
- consensus_pathology
TODO
- lymphgen
TODO
- Tumor_Sample_Barcode
Duplicate of sample_id for simplifying joins to MAF data frames
- consensus_coo_dhitsig
TODO
- pathology_rank
Numeric rank for consistent ordering of samples by pathology
- lymphgen_rank
Numeric rank for consistent ordering of samples by LymphGen
- hiv_status
TODO
- CODE_XXX
Event status at last follow-up for overall survival (OS), progression-free survival (PFS), etc. 0 = no event/censored; 1 = event.
- XXX_YEARS
Time, in years, from diagnosis to last follow-up for overall survival (OS), progression-free survival (PFS)
- alive
Theoretically redundant with CODE_OS
- is_adult
Whether the patient was an adult or pediatric at diagnosis. Either "Adult" or "Pediatric".
- age_group
Adult_BL or Pediatric_BL or Other, specific to the BLGSP study
- age
Patient age at diagnosis.
- sex
The biological sex of the patient, if available. Allowable options: M, F, NA
- tx_primary
TODO
- cause_of_death
TODO
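As a rough illustration of how Tumor_Sample_Barcode and pathology_rank are meant to be used, the sketch below joins a toy metadata table to a toy MAF-like table and orders the result by pathology. All values, including the ranks, are invented for the example, and `toy_meta`, `toy_maf`, and `annotated` are hypothetical names rather than anything the package returns.

```r
# Toy metadata (hypothetical values); Tumor_Sample_Barcode duplicates sample_id.
toy_meta <- data.frame(
  sample_id      = c("S1", "S2", "S3"),
  pathology      = c("DLBCL", "BL", "FL"),
  pathology_rank = c(3, 1, 2),  # made-up ranks for illustration
  stringsAsFactors = FALSE
)
toy_meta$Tumor_Sample_Barcode <- toy_meta$sample_id

# Toy MAF-like table with one row per mutation (hypothetical values).
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S3"),
  Hugo_Symbol          = c("MYC", "EZH2", "BCL2"),
  stringsAsFactors = FALSE
)

# Join on the shared column, so no renaming of sample_id is needed.
meta_cols <- c("Tumor_Sample_Barcode", "pathology", "pathology_rank")
annotated <- merge(toy_maf, toy_meta[, meta_cols], by = "Tumor_Sample_Barcode")

# Order mutations consistently by pathology using the numeric rank.
annotated <- annotated[order(annotated$pathology_rank), ]
rownames(annotated) <- NULL
print(annotated)
```

Because the join is an inner join, samples without mutations in the toy MAF table (S2 here) do not appear in the result.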
Details
This function returns metadata for GAMBL samples. Options for subsetting and filtering the returned data are available. For more information on how to use this function with different filtering criteria, refer to the parameter descriptions, examples, and vignettes. To restrict the output to a pre-defined set of cases that avoids embargoed cases, use case_set (current options: 'BLGSP-study', 'FL-study', 'DLBCL-study', 'FL-DLBCL-study', 'FL-DLBCL-all', 'DLBCL-unembargoed', 'BL-DLBCL-manuscript', 'MCL', 'MCL-CLL').
Examples
if (FALSE) { # \dontrun{
# basic usage
my_metadata = suppressMessages(og_get_gambl_metadata())
# use pre-defined custom sample sets
only_blgsp_metadata = og_get_gambl_metadata(case_set = "BLGSP-study")
# override default filters and request metadata for samples other than tumour genomes,
# e.g. also get the normals
tumour_and_normal_metadata = og_get_gambl_metadata(tissue_status_filter = c("tumour", "normal"))
# genome and capture samples, keeping the genome row when a sample_id has both
non_redundant_genome_and_capture = og_get_gambl_metadata(
  seq_type_filter = c("genome", "capture"),
  seq_type_priority = "genome"
)
# genome, capture and mrna samples, both tumour and normal
absolutely_everything = og_get_gambl_metadata(
  seq_type_filter = c("genome", "capture", "mrna"),
  tissue_status_filter = c("tumour", "normal")
)
} # }