Get the expression for one or more genes for all GAMBL samples.

get_gene_expression(
  metadata,
  hugo_symbols,
  ensembl_gene_ids,
  join_with = "mrna",
  all_genes = FALSE,
  expression_data,
  from_flatfile = TRUE
)

Arguments

metadata

GAMBL metadata.

hugo_symbols

One or more gene symbols. These should match the Hugo_Symbol values in a MAF file.

ensembl_gene_ids

One or more Ensembl gene IDs. Only one of hugo_symbols or ensembl_gene_ids may be used.

join_with

How to restrict cases for the join. Can be one of "genome", "mrna" or "any". The default is "mrna".

all_genes

Set to TRUE to return the full expression data frame without any subsetting. This holds the entire expression data frame in memory, so avoid it unless you need every gene.

expression_data

Optional argument to use an already loaded expression data frame. This prevents the function from re-loading the full data frame from the flat file or database.

from_flatfile

Deprecated but left here for backwards compatibility.
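
The constraints stated in the argument descriptions (only one of hugo_symbols or ensembl_gene_ids, and join_with limited to three values) can be sketched as a small check. This is a minimal illustration only: check_expression_args is a hypothetical helper, not part of GAMBLR, and the function's actual validation may differ.

```r
# Hypothetical helper for illustration only; not part of GAMBLR.
check_expression_args = function(hugo_symbols = NULL,
                                 ensembl_gene_ids = NULL,
                                 join_with = "mrna") {
  # The two identifier types are mutually exclusive.
  if (!is.null(hugo_symbols) && !is.null(ensembl_gene_ids)) {
    stop("Provide only one of hugo_symbols or ensembl_gene_ids.")
  }
  # match.arg() raises an error if join_with is not one of the allowed choices.
  join_with = match.arg(join_with, choices = c("genome", "mrna", "any"))
  invisible(join_with)
}

# Passes: one identifier type and the default join_with.
check_expression_args(hugo_symbols = c("MYC"))
```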

Value

A data frame with gene expression.

Details

Get gene expression for one or multiple genes for all GAMBL samples. This function can also take an already loaded expression matrix (expression_data), so that the full expression matrix does not have to be re-loaded each time the function is run in an interactive session. For examples and more information, refer to the parameter descriptions as well as the vignette examples.
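
The load-once, subset-many-times pattern described here can be sketched on toy data without access to GAMBL. Everything below is an assumption made for illustration: the toy table layout (one row per gene with a Hugo_Symbol column, one column per sample), the sample names, the values, and the subset_expression helper, which is hypothetical and not part of GAMBLR. The real expression data frame and the shape of the returned data frame may differ.

```r
# Toy stand-in for an already loaded expression table.
# Assumed layout: one row per gene, one column per sample.
toy_expression = data.frame(
  Hugo_Symbol = c("MYC", "IRF4", "BCL2"),
  sample_A = c(10.1, 8.2, 7.5),
  sample_B = c(12.3, 6.4, 9.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the requested genes and return one row per sample.
subset_expression = function(expression_data, hugo_symbols) {
  keep = expression_data$Hugo_Symbol %in% hugo_symbols
  wide = expression_data[keep, , drop = FALSE]
  sample_cols = setdiff(colnames(wide), "Hugo_Symbol")
  # Transpose so that samples become rows and genes become columns.
  values = t(as.matrix(wide[, sample_cols, drop = FALSE]))
  colnames(values) = wide$Hugo_Symbol
  data.frame(sample_id = rownames(values), values,
             row.names = NULL, stringsAsFactors = FALSE)
}

# The table is built once and can then be subset repeatedly without re-loading.
irf4_myc = subset_expression(toy_expression, c("IRF4", "MYC"))
myc_only = subset_expression(toy_expression, c("MYC"))
```

Subsetting in memory like this is cheap compared with re-reading a large file, which is the motivation for passing expression_data in an interactive session.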

Examples

MYC_expr = get_gene_expression(hugo_symbols = c("MYC"), join_with = "mrna")
#> [1] "grep -w -F -e Hugo_Symbol -e MYC /projects/nhl_meta_analysis_scratch/gambl/results_local/icgc_dart/DESeq2-0.0_salmon-1.0/mrna--gambl-icgc-all/vst-matrix-Hugo_Symbol_tidy.tsv"

# Read the full expression data frame (no subsetting on genes)
full_expression_df = get_gene_expression(all_genes = TRUE,
                                         join_with = "genome")
#> Warning: NAs produced by integer overflow
#> Error in vec_init(value, nrow * ncol): `n` must be a single number, not an integer `NA`.

# Use the data frame loaded in the previous step to get expression values for IRF4 and MYC.
irf4_myc_expressions = get_gene_expression(hugo_symbols = c("IRF4", "MYC"),
                                           all_genes = FALSE,
                                           join_with = "genome",
                                           from_flatfile = FALSE,
                                           expression_data = full_expression_df)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'full_expression_df' not found