Data bundled with GAMBLR.data
GAMBLR.data comes with many bundled data objects, organized into the following categories:
Somatic variants
sample_data: A list of data frames containing the metadata, simple somatic, copy number, and structural variants collected from the supplemental tables of large sequencing studies of B-cell lymphomas.
Curated gene lists
gene_blacklist: A tibble with gene symbols (Hugo) that fall within blacklisted regions of the genome. The genes in this data object represent common sequencing artifacts and are discarded during data analysis.
lymphoma_genes: A data frame with a manually curated set of genes commonly mutated in lymphomas, with associated TRUE/FALSE columns annotating the lymphoma type(s) in which the particular mutations are present. This object always represents the most recent version of the curated list.
lymphoma_genes_bl_v0.1: A data frame with a manually curated set of genes commonly mutated in BL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This is the versioned data and can be referred to directly by its version number.
lymphoma_genes_bl_v_latest: A data frame with a manually curated set of genes commonly mutated in BL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This object always represents the most recent version of the curated list.
lymphoma_genes_comprehensive: A data frame with the curated list of genes reported as significantly mutated in the large lymphoma studies. Both Ensembl ID and Hugo Symbol are available as gene identifiers. This data contains annotations for the studies by Chapuy, Reddy, Wright (LymphGen), and Lacy, as well as annotations for whether the gene is curated, reported as SMG in other_studies, or a target of aSHM.
lymphoma_genes_dlbcl_v0.1: A data frame with a manually curated set of genes commonly mutated in DLBCL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This is the versioned data and can be referred to directly by its version number.
lymphoma_genes_dlbcl_v_latest: A data frame with a manually curated set of genes commonly mutated in DLBCL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This object always represents the most recent version of the curated list.
lymphoma_genes_lymphoma_genes_v0.0: Legacy version of the curated list of genes significantly mutated in lymphomas. Bundled here for backwards compatibility and can be referred to directly by its version number.
lymphoma_genes_mcl_v0.1: A data frame with a manually curated set of genes commonly mutated in MCL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This is the versioned data and can be referred to directly by its version number.
lymphoma_genes_mcl_v_latest: A data frame with a manually curated set of genes commonly mutated in MCL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This object always represents the most recent version of the curated list.
Coordinate-based resources
chromosome_arms_grch37: A data frame with the chromosome arm coordinates with respect to the grch37 projection.
chromosome_arms_hg38: A data frame with the chromosome arm coordinates with respect to the hg38 projection.
grch37_gene_coordinates: A data frame of all gene coordinates with respect to grch37. Contains both Ensembl ID and Hugo Symbol as identifiers.
grch37_lymphoma_genes_bed: A data frame in the bed format for genes commonly associated with B-cell lymphomas. Coordinates are with respect to grch37.
grch37_oncogene: A data frame with the coordinates of lymphoma oncogenes relative to grch37. Used in mapping of the breakpoint coordinates.
grch37_partners: A data frame of translocation partners for oncogenes with coordinates relative to grch37.
hg38_gene_coordinates: A data frame of all gene coordinates with respect to hg38. Contains both Ensembl ID and Hugo Symbol as identifiers.
hg38_lymphoma_genes_bed: A data frame in the bed format for genes commonly associated with B-cell lymphomas. Coordinates are with respect to hg38.
hg38_oncogene: A data frame with the coordinates of lymphoma oncogenes relative to hg38. Used in mapping of the breakpoint coordinates.
hg38_partners: A data frame of translocation partners for oncogenes with coordinates relative to hg38.
grch37_all_gene_coordinates: A data frame of protein-coding gene coordinates relative to grch37. Contains both Ensembl ID and Hugo Symbol as identifiers. Mainly here for backwards compatibility with earlier GAMBLR versions.
hotspot_regions_grch37: A data frame of mutation hotspot regions relative to grch37.
hotspot_regions_hg38: A data frame of mutation hotspot regions relative to hg38.
target_regions_grch37: A data frame with coordinates of the regions of the genome targeted by the whole exome sequencing panel Agilent V5 (no UTR) relative to grch37.
target_regions_hg38: A data frame with coordinates of the regions of the genome targeted by the whole exome sequencing panel Agilent V5 (no UTR) relative to hg38.
aSHM regions
grch37_ashm_regions: Aberrant somatic hypermutation (aSHM) regions relative to grch37. This object always refers to the most recent version of the aSHM regions.
hg38_ashm_regions: Aberrant somatic hypermutation (aSHM) regions relative to hg38. This object always refers to the most recent version of the aSHM regions.
somatic_hypermutation_locations_GRCh37_v0.0: Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh37_v0.1: Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh37_v0.2: Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh37_v0.3: Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh37_v0.4: Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh37_v0.5: Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh37_v_latest: Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is an alias for the latest version of this data.
somatic_hypermutation_locations_GRCh38_v0.0: Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh38_v0.1: Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh38_v0.2: Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh38_v0.3: Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh38_v0.4: Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh38_v0.5: Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is the versioned data and can be referred to directly by its version number.
somatic_hypermutation_locations_GRCh38_v_latest: Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is an alias for the latest version of this data.
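The naming scheme used for these objects (a `_v<version>` suffix for a pinned release, `_v_latest` for the alias) makes it possible to build object names programmatically. The following is a minimal sketch in R, not part of GAMBLR.data: `versioned_name` and `fetch_bundled` are hypothetical helper names, and the sketch assumes the installed package makes its data objects reachable through the same lookup that `GAMBLR.data::name` uses.

```r
# Hypothetical helper: build an object name from a prefix and a version,
# following the "_v0.2" / "_v_latest" naming pattern listed above.
versioned_name <- function(prefix, version = "latest") {
  suffix <- if (identical(version, "latest")) "_latest" else version
  paste0(prefix, "_v", suffix)
}

# Hypothetical helper: fetch a bundled object by name, or NULL when the
# package is not installed or the object does not exist.
fetch_bundled <- function(name, pkg = "GAMBLR.data") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NULL)
  }
  # getExportedValue() is the base R lookup behind the `::` operator.
  tryCatch(getExportedValue(pkg, name), error = function(e) NULL)
}

# Pin a specific aSHM release instead of relying on the moving alias.
pinned_name <- versioned_name("somatic_hypermutation_locations_GRCh37", "0.2")
pinned <- fetch_bundled(pinned_name)
```

Pinning a version in this way may help keep an analysis reproducible if the `_v_latest` alias later points to a newer release.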
Other resources
colour_codes: A data frame with colour codes (HEX) arranged into different categories and groups.
dhitsig_genes_with_weights: A data frame with double hit signature genes (both as Ensembl IDs and Hugo symbols) and importance scores.
gambl_metadata: A data frame with metadata for a collection of GAMBL samples. This represents a collection of whole genome, exome, targeted, RNA, and PromethION sequencing samples available as a data set known as GAMBL. This object is provided for reference only, as not all samples listed here are published and bundled with GAMBLR.data.
hgnc2pfam.df: A dataset containing the mapping table between Hugo symbol, UniProt ID, and Pfam ACC. This dataset comes from the g3viz package and was obtained via this URL: https://github.com/morinlab/g3viz/tree/master/data
hotspots_annotations: Hotspot coordinates used in the feature annotation during matrix assembly of data for the cFL classifier.
mirage_metrics: A data frame providing the data reported in the Supplemental Table of the MIRAGE study by Dreval et al., 2022.
mutation.table.df: A data frame providing the linkage between Variant Classification, Mutation_Class, and Short_Name for the simple somatic mutations.
reddy_genes: A data frame of the genes reported as significantly mutated by the study of Reddy et al., 2017.
wright_genes_with_weights: Wright genes with weight values from the study by Scott et al., 2014.
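
The catalogue above can be checked against an installed copy of the package. The following is a minimal sketch in R, assuming GAMBLR.data is installed and ships its objects as standard package datasets; `bundled_objects` is a hypothetical helper name and not a GAMBLR.data function.

```r
# Hypothetical helper (not part of GAMBLR.data): list the data objects a
# package bundles, using base R's data() index.
bundled_objects <- function(pkg) {
  # Return an empty vector if the package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  # The index returned by data(package = ) has an "Item" column of names.
  unname(utils::data(package = pkg)$results[, "Item"])
}

objs <- bundled_objects("GAMBLR.data")

# Show the objects whose names mark them as aSHM region sets.
print(grep("ashm|somatic_hypermutation", objs, value = TRUE))
```

Individual objects can then be inspected in the usual way, for example with `str()` or `head()`, to confirm their columns before use.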