Data bundled with GAMBLR.data

GAMBLR.data comes with many bundled data objects, organized into the following categories:

Somatic variants

  • sample_data A list of data frames containing the metadata, simple somatic mutations, copy number variants, and structural variants collected from the supplemental tables of large sequencing studies of B-cell lymphomas.

Curated gene lists

  • gene_blacklist A tibble with gene symbols (Hugo) that fall within blacklisted regions of the genome. The genes in this data object represent common sequencing artifacts and are discarded during the data analysis.
  • lymphoma_genes A data frame with a manually curated set of genes commonly mutated in lymphomas, with associated TRUE/FALSE columns annotating the lymphoma type(s) in which the mutations are present. This object always represents the most recent version of the curated list.
  • lymphoma_genes_bl_v0.1 A data frame with a manually curated set of genes commonly mutated in BL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This is versioned data and can be referred to directly by its version number.
  • lymphoma_genes_bl_v_latest A data frame with a manually curated set of genes commonly mutated in BL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This object always represents the most recent version of the curated list.
  • lymphoma_genes_comprehensive A data frame with the curated list of genes reported as significantly mutated in the large lymphoma studies. Both Ensembl ID and Hugo Symbol are available as gene identifiers. This data contains annotations for the studies by Chapuy, Reddy, Wright (LymphGen), Lacy, as well as annotations for whether the gene is curated, reported as SMG in other_studies, or a target of aSHM.
  • lymphoma_genes_dlbcl_v0.1 A data frame with a manually curated set of genes commonly mutated in DLBCL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This is versioned data and can be referred to directly by its version number.
  • lymphoma_genes_dlbcl_v_latest A data frame with a manually curated set of genes commonly mutated in DLBCL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This object always represents the most recent version of the curated list.
  • lymphoma_genes_lymphoma_genes_v0.0 Legacy version of the curated list of genes significantly mutated in lymphomas. Bundled here for backwards compatibility; it can be referred to directly by its version number.
  • lymphoma_genes_mcl_v0.1 A data frame with a manually curated set of genes commonly mutated in MCL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This is versioned data and can be referred to directly by its version number.
  • lymphoma_genes_mcl_v_latest A data frame with a manually curated set of genes commonly mutated in MCL, with associated TRUE/FALSE columns annotating whether the particular mutations are present. This object always represents the most recent version of the curated list.
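
As a rough illustration of how a blacklist such as gene_blacklist is typically applied, the sketch below drops mutation records whose Hugo symbol appears in a blacklist. It uses Python with toy values only. The gene symbols, the "Hugo_Symbol" and "Start_Position" keys, and the helper drop_blacklisted are hypothetical stand-ins and do not reflect the actual contents or API of GAMBLR.data.

```python
# Toy stand-in for gene_blacklist: a set of Hugo symbols (hypothetical values).
toy_blacklist = {"GENE_A", "GENE_B"}

# Toy mutation records; the key names are assumed, not taken from the package.
toy_mutations = [
    {"Hugo_Symbol": "GENE_A", "Start_Position": 100},
    {"Hugo_Symbol": "GENE_C", "Start_Position": 250},
    {"Hugo_Symbol": "GENE_B", "Start_Position": 400},
    {"Hugo_Symbol": "GENE_D", "Start_Position": 900},
]


def drop_blacklisted(mutations, blacklist):
    """Return only the records whose gene symbol is not in the blacklist."""
    return [m for m in mutations if m["Hugo_Symbol"] not in blacklist]


kept = drop_blacklisted(toy_mutations, toy_blacklist)
print([m["Hugo_Symbol"] for m in kept])
```

Keeping the blacklist as a set makes each membership check constant-time, which matters when the mutation table is large.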

Coordinate-based resources

  • chromosome_arms_grch37 A data frame with the chromosome arm coordinates with respect to the grch37 projection.
  • chromosome_arms_hg38 A data frame with the chromosome arm coordinates with respect to the hg38 projection.
  • grch37_gene_coordinates A data frame of all gene coordinates with respect to grch37. Contains both Ensembl ID and Hugo Symbol as identifiers.
  • grch37_lymphoma_genes_bed A data frame in the bed format for genes commonly associated with B-cell lymphomas. Coordinates are with respect to grch37.
  • grch37_oncogene A data frame with the coordinates of lymphoma oncogenes relative to grch37. Used in mapping of the breakpoint coordinates.
  • grch37_partners A data frame of translocation partners for oncogenes with coordinates relative to grch37.
  • hg38_gene_coordinates A data frame of all gene coordinates with respect to hg38. Contains both Ensembl ID and Hugo Symbol as identifiers.
  • hg38_lymphoma_genes_bed A data frame in the bed format for genes commonly associated with B-cell lymphomas. Coordinates are with respect to hg38.
  • hg38_oncogene A data frame with the coordinates of lymphoma oncogenes relative to hg38. Used in mapping of the breakpoint coordinates.
  • hg38_partners A data frame of translocation partners for oncogenes with coordinates relative to hg38.
  • grch37_all_gene_coordinates A data frame of protein-coding gene coordinates relative to grch37. Contains both Ensembl ID and Hugo Symbol as identifiers. Mainly here for backwards compatibility with earlier GAMBLR versions.
  • hotspot_regions_grch37 A data frame of mutation hotspot regions relative to grch37.
  • hotspot_regions_hg38 A data frame of mutation hotspot regions relative to hg38.
  • target_regions_grch37 A data frame with coordinates of the regions of the genome targeted by the whole exome sequencing panel Agilent V5 (no UTR) relative to grch37.
  • target_regions_hg38 A data frame with coordinates of the regions of the genome targeted by the whole exome sequencing panel Agilent V5 (no UTR) relative to hg38.
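
The oncogene tables are described above as being used to map breakpoint coordinates. A minimal sketch of that kind of lookup, assuming a simple interval-containment test, is shown below in Python. The region names, coordinates, key names, and the helper annotate_breakpoint are all hypothetical, and the closed-interval convention is an assumption: the real tables may use a different coordinate convention.

```python
# Toy stand-in for an oncogene coordinate table. All coordinates are made up
# for illustration; the key names are assumptions.
toy_oncogene_regions = [
    {"chrom": "8", "start": 1000, "end": 2000, "gene": "ONCO_1"},
    {"chrom": "18", "start": 5000, "end": 6500, "gene": "ONCO_2"},
]


def annotate_breakpoint(chrom, pos, regions):
    """Return the genes whose region contains the breakpoint.

    Assumes closed intervals (start <= pos <= end) and identical chromosome
    naming on both sides of the comparison.
    """
    return [
        r["gene"]
        for r in regions
        if r["chrom"] == chrom and r["start"] <= pos <= r["end"]
    ]


print(annotate_breakpoint("8", 1500, toy_oncogene_regions))   # inside ONCO_1
print(annotate_breakpoint("8", 2500, toy_oncogene_regions))   # outside any region
print(annotate_breakpoint("18", 5000, toy_oncogene_regions))  # on a region boundary
```

A breakpoint and a region table must refer to the same projection before they are compared. If chromosome names differ between sources (for example with or without a "chr" prefix), they need to be normalized first.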

aSHM regions

  • grch37_ashm_regions Aberrant somatic hypermutation (aSHM) regions relative to grch37. This object always refers to the most recent version of the aSHM regions.
  • hg38_ashm_regions Aberrant somatic hypermutation (aSHM) regions relative to hg38. This object always refers to the most recent version of the aSHM regions.
  • somatic_hypermutation_locations_GRCh37_v0.0 Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh37_v0.1 Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh37_v0.2 Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh37_v0.3 Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh37_v0.4 Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh37_v0.5 Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh37_v_latest Aberrant somatic hypermutation (aSHM) regions relative to grch37. This is an alias for the latest version of this data.
  • somatic_hypermutation_locations_GRCh38_v0.0 Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh38_v0.1 Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh38_v0.2 Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh38_v0.3 Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh38_v0.4 Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh38_v0.5 Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is versioned data and can be referred to directly by its version number.
  • somatic_hypermutation_locations_GRCh38_v_latest Aberrant somatic hypermutation (aSHM) regions relative to hg38. This is an alias for the latest version of this data.
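
The naming scheme above pairs numbered versions with a "_v_latest" alias. The Python sketch below shows one way to work out which numbered object such an alias would correspond to, assuming the alias tracks the highest listed version number. The helper newest_versioned_name is hypothetical and is not part of GAMBLR.data; only the object names are taken from the list above.

```python
import re

# Object names as listed above for the grch37 aSHM regions.
ashm_names = [
    "somatic_hypermutation_locations_GRCh37_v0.0",
    "somatic_hypermutation_locations_GRCh37_v0.1",
    "somatic_hypermutation_locations_GRCh37_v0.2",
    "somatic_hypermutation_locations_GRCh37_v0.3",
    "somatic_hypermutation_locations_GRCh37_v0.4",
    "somatic_hypermutation_locations_GRCh37_v0.5",
    "somatic_hypermutation_locations_GRCh37_v_latest",
]

# Matches a trailing "_v<major>.<minor>" suffix; "_v_latest" does not match.
VERSION_SUFFIX = re.compile(r"_v(\d+)\.(\d+)$")


def newest_versioned_name(names):
    """Return the name carrying the highest (major, minor) version suffix."""
    versioned = []
    for name in names:
        match = VERSION_SUFFIX.search(name)
        if match:  # aliases without a numeric suffix are skipped
            version = (int(match.group(1)), int(match.group(2)))
            versioned.append((version, name))
    if not versioned:
        raise ValueError("no versioned names found")
    return max(versioned)[1]


print(newest_versioned_name(ashm_names))
```

Versions are compared as integer pairs rather than as strings, so a hypothetical v0.10 would rank above v0.9. Pinning a numbered version keeps an analysis reproducible, while the alias follows whatever the current release designates as latest.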

Other resources

  • colour_codes A data frame with colour codes (HEX) arranged into different categories and groups.
  • dhitsig_genes_with_weights A data frame with double hit signature genes (both as Ensembl IDs and Hugo symbols) and importance scores.
  • gambl_metadata A data frame with metadata for a collection of GAMBL samples. This represents a collection of whole genome, exome, targeted, RNA, and PromethION sequencing samples available as a data set known as GAMBL. This object is provided for reference only, as not all samples listed here are published and bundled with GAMBLR.data.
  • hgnc2pfam.df A dataset containing the mapping table between Hugo symbol, UniProt ID, and Pfam ACC. This dataset comes from the g3viz package and was obtained via this URL: https://github.com/morinlab/g3viz/tree/master/data
  • hotspots_annotations Hotspot coordinates used for feature annotation during matrix assembly of data for the cFL classifier.
  • mirage_metrics A data frame providing the data reported in the Supplemental Table of the MIRAGE study by Dreval et al., 2022.
  • mutation.table.df A data frame providing the linkage between Variant Classification, Mutation_Class, and Short_Name for the simple somatic mutations.
  • reddy_genes A data frame of the genes reported as significantly mutated by the study of Reddy et al., 2017.
  • wright_genes_with_weights Wright genes with weight values from the study by Scott et al., 2014.