Get the expression for one or more genes for all GAMBL samples.

Usage

get_gene_expression(
  these_samples_metadata,
  hugo_symbols,
  ensembl_gene_ids,
  all_genes = FALSE,
  verbose = FALSE,
  engine = "grep",
  format = "wide",
  lazy_join = FALSE,
  arbitrarily_pick = FALSE,
  HGNC = FALSE,
  ...
)

Arguments

these_samples_metadata

The data frame with sample metadata. Usually the output of get_gambl_metadata().

hugo_symbols

One or more gene symbols. Cannot be used in conjunction with ensembl_gene_ids.

ensembl_gene_ids

One or more Ensembl gene IDs. Cannot be used in conjunction with hugo_symbols.

all_genes

Set to TRUE for the full expression data without any subsetting (see warnings below).

verbose

Set to TRUE for more verbose output.

engine

Either readr or grep. The grep engine usually loads data faster but does not work when you request all genes or a very long list of genes.

format

Either wide or long. Wide format returns one column of expression values per gene. Long format returns one column of expression values with the gene stored in a separate column.
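The difference between the two layouts can be sketched on a toy table. This is only an illustration in base R: the sample IDs, values, and the long-format column names (gene, expression) are invented here and may not match the names the function actually returns.

```r
# Toy wide-format table: one row per sample, one expression column per gene.
# Sample IDs and values are invented for illustration.
wide <- data.frame(
  sample_id = c("S1", "S2"),
  MYC = c(10.2, 11.5),
  BCL2 = c(8.1, 7.9),
  stringsAsFactors = FALSE
)

# Equivalent long-format table: one row per sample-gene pair, with the
# gene name stored in its own column (hypothetical column names).
gene_cols <- setdiff(names(wide), "sample_id")
long <- data.frame(
  sample_id = rep(wide$sample_id, times = length(gene_cols)),
  gene = rep(gene_cols, each = nrow(wide)),
  expression = unlist(wide[gene_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(wide)
print(long)
```

Wide format tends to suit per-gene comparisons across samples, while long format is usually easier to feed into grouped summaries and plotting code.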

lazy_join

If TRUE, the returned data frame also includes the capture_sample_id and genome_sample_id columns. See check_gene_expression for more information.

arbitrarily_pick

A stop-gap for handling the rare scenario where the same Hugo_Symbol has more than one ensembl_gene_id. Set to TRUE only if you encounter an error that states "Values are not uniquely identified; output will contain list-cols."
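The quoted error resembles the message that tidyr's pivot_wider() gives when a key maps to more than one value. The situation can be reproduced on a toy mapping. This is a sketch under assumptions, not the function's implementation: the gene symbols and Ensembl IDs are placeholders, and keeping the first ID is just one possible arbitrary choice.

```r
# Toy symbol-to-ID mapping in which one Hugo symbol has two Ensembl IDs.
# All identifiers here are placeholders, not real genes.
mapping <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_A", "GENE_B"),
  ensembl_gene_id = c("ENSG_placeholder_1", "ENSG_placeholder_2",
                      "ENSG_placeholder_3"),
  stringsAsFactors = FALSE
)

# Symbols that are not uniquely identified by a single Ensembl ID.
ambiguous <- unique(mapping$Hugo_Symbol[duplicated(mapping$Hugo_Symbol)])

# One arbitrary resolution: keep the first ID seen for each symbol.
picked <- mapping[!duplicated(mapping$Hugo_Symbol), ]

print(ambiguous)
print(picked)
```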

HGNC

When you request the wide matrix and all genes, this forces the columns to contain hgnc_id rather than ensembl_gene_id.

...

Optional parameters to pass along to get_gambl_metadata (only used in conjunction with lazy_join).

Value

A data frame with the first 9 columns identical to the columns from check_gene_expression and the remaining columns containing the expression values for each gene requested.

Details

Efficiently retrieve variance-stabilized and batch-effect-corrected gene expression values for one, several, or all genes for all GAMBL samples. For more information, refer to the parameter descriptions and examples.

Warnings:

  1. Loading time depends heavily on how many samples you load. For efficiency, do not specify samples you do not need.

  2. To reduce impact on memory (RAM), load only the data for the genes you need.

  3. Combining lazy_join with all_genes will result in a data table with samples in rows and genes in columns. Use with caution. This is practically guaranteed to use more RAM than you want.

  4. Before you run this function, it is recommended that you run check_gene_expression to determine which samples are available.
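The first two warnings amount to narrowing both the metadata and the gene list before calling the function. A minimal sketch follows. The toy metadata table and its column names (sample_id, pathology) are assumptions made for illustration. The real call uses only the arguments documented above and is skipped when the functions are not loaded; it still needs access to the GAMBL data to succeed.

```r
# Invented stand-in for the metadata table; real metadata would normally
# come from get_gambl_metadata().
toy_metadata <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  pathology = c("DLBCL", "FL", "DLBCL", "BL"),
  stringsAsFactors = FALSE
)

# Warning 1: keep only the samples you need.
dlbcl_metadata <- toy_metadata[toy_metadata$pathology == "DLBCL", ]

# Warning 2: request only the genes you need.
genes_of_interest <- c("MYC", "BCL2")

# Skipped unless both functions are available in the session.
if (exists("get_gene_expression") && exists("get_gambl_metadata")) {
  metadata <- get_gambl_metadata()
  # Assumes the real metadata has a pathology column.
  metadata <- metadata[metadata$pathology == "DLBCL", ]
  expression <- get_gene_expression(
    these_samples_metadata = metadata,
    hugo_symbols = genes_of_interest
  )
}
```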

Examples