Get Gene Expression.
Get the expression for one or more genes for all GAMBL samples.
Usage
get_gene_expression(
  these_samples_metadata,
  hugo_symbols,
  ensembl_gene_ids,
  all_genes = FALSE,
  verbose = FALSE,
  engine = "grep",
  format = "wide",
  lazy_join = FALSE,
  arbitrarily_pick = FALSE,
  HGNC = FALSE,
  ...
)
Arguments
- these_samples_metadata
A data frame of sample metadata, usually the output of get_gambl_metadata().
- hugo_symbols
One or more gene symbols. Cannot be used in conjunction with ensembl_gene_ids.
- ensembl_gene_ids
One or more Ensembl gene IDs. Cannot be used in conjunction with hugo_symbols.
- all_genes
Set to TRUE for the full expression data without any subsetting (see warnings below).
- verbose
Set to TRUE for more verbose output.
- engine
Either "readr" or "grep" (the default). The grep engine is usually faster at loading, but it does not work when you request all genes or a very long list of genes.
- format
Either "wide" (the default) or "long". Wide format returns one column of expression values per gene. Long format returns a single column of expression values, with the gene stored in a separate column.
- lazy_join
If TRUE, the returned data frame also includes the capture_sample_id and genome_sample_id columns. See check_gene_expression for more information.
- arbitrarily_pick
A stop-gap for the rare case where the same Hugo_Symbol maps to more than one ensembl_gene_id. Set to TRUE only if you encounter the error "Values are not uniquely identified; output will contain list-cols."
- HGNC
Only relevant when you request the wide format together with all_genes = TRUE. If TRUE, the gene columns are labelled with hgnc_id rather than ensembl_gene_id.
- ...
Optional parameters passed on to get_gambl_metadata (only used in conjunction with lazy_join).
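To make the difference between the two `format` options concrete, the sketch below reshapes a toy long-format table into the wide layout with base R's `reshape()`. The sample IDs, values, and the column names `sample_id`, `Hugo_Symbol`, and `expression` are hypothetical and chosen for illustration only; they are not necessarily the columns this function returns.

```r
# Hypothetical long-format table: one row per sample-gene pair
long_expr <- data.frame(
  sample_id   = c("S1", "S1", "S2", "S2"),
  Hugo_Symbol = c("MYC", "BCL2", "MYC", "BCL2"),
  expression  = c(8.1, 5.4, 7.9, 6.2),
  stringsAsFactors = FALSE
)

# Reshape to wide: one row per sample, one expression column per gene
wide_expr <- reshape(
  long_expr,
  idvar     = "sample_id",
  timevar   = "Hugo_Symbol",
  direction = "wide"
)

# reshape() names the new columns "expression.MYC" and "expression.BCL2";
# strip the prefix so each column is named after its gene
names(wide_expr) <- sub("^expression\\.", "", names(wide_expr))
rownames(wide_expr) <- NULL

print(wide_expr)
```

The long layout is convenient for grouping and plotting by gene; the wide layout is convenient when each gene should be a variable alongside the sample metadata.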
Value
A data frame with the first 9 columns identical to the columns from check_gene_expression and the remaining columns containing the expression values for each gene requested.
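The `arbitrarily_pick` argument exists because a wide table with one column per gene symbol needs each Hugo_Symbol to map to exactly one Ensembl gene ID. The sketch below, using a hypothetical mapping table with placeholder IDs, shows how such ambiguous symbols can be detected and how a simple "keep the first one seen" rule resolves them. It illustrates the general idea only and does not claim to reproduce how get_gene_expression resolves duplicates internally.

```r
# Hypothetical symbol-to-Ensembl mapping; the IDs are placeholders, not real
gene_map <- data.frame(
  Hugo_Symbol     = c("MYC", "BCL2", "BCL2"),
  ensembl_gene_id = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2",
                      "ENSG_PLACEHOLDER_3"),
  stringsAsFactors = FALSE
)

# Count distinct Ensembl IDs per symbol
ids_per_symbol <- tapply(
  gene_map$ensembl_gene_id,
  gene_map$Hugo_Symbol,
  function(x) length(unique(x))
)

# Symbols that cannot become a single wide column without a tie-break
ambiguous <- names(ids_per_symbol)[ids_per_symbol > 1]

# One possible (arbitrary) tie-break: keep the first row seen for each symbol
deduped <- gene_map[!duplicated(gene_map$Hugo_Symbol), ]
```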
Details
Efficiently retrieve variance-stabilized and batch effect corrected gene expression values for one, multiple or all genes for all GAMBL samples. For more information, refer to the parameter descriptions and examples.
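In general terms, a grep-style engine such as `engine = "grep"` gains speed by discarding unwanted rows before they are parsed. The base R sketch below illustrates that pre-filtering idea on a small hypothetical file with genes on rows and samples on columns (an assumed layout). It is not the package's implementation: an external grep process would filter the file before R reads it, whereas `readLines()` here still reads every line.

```r
# Write a small hypothetical tab-separated expression file (genes on rows)
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "Hugo_Symbol\tS1\tS2",
  "MYC\t8.1\t7.9",
  "BCL2\t5.4\t6.2",
  "TP53\t9.0\t8.7"
), tmp)

wanted <- c("MYC", "TP53")
all_lines <- readLines(tmp)

# Anchor each symbol to the line start and the following tab so that,
# for example, "MYC" would not also match a "MYCN" row
pattern <- paste0("^(", paste(wanted, collapse = "|"), ")\t")
kept <- all_lines[grepl(pattern, all_lines)]

# Only the header plus the matching rows are parsed into a data frame
subset_expr <- read.table(
  text = c(all_lines[1], kept),
  sep = "\t",
  header = TRUE,
  stringsAsFactors = FALSE
)
unlink(tmp)
```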
Warnings:
- The speed of loading data depends heavily on how many samples you load. For efficiency, do not include samples you do not need.
- To reduce memory (RAM) usage, load only the data for the genes you need.
- Combining lazy_join with all_genes returns a data frame with samples on rows and genes on columns. Use with caution: this is practically guaranteed to use more RAM than you want.
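The memory warnings follow from simple arithmetic: a wide table stores one double-precision value (8 bytes) per sample-gene cell. The helper below is a hypothetical back-of-the-envelope estimator, not part of the package, and the sample and gene counts are illustrative rather than the actual size of the GAMBL data. Real usage will be higher because of metadata columns, column names, and R object overhead.

```r
# Rough lower bound, in megabytes, for a dense numeric samples x genes table:
# each double-precision value takes 8 bytes (object overhead ignored)
estimate_dense_mb <- function(n_samples, n_genes) {
  n_samples * n_genes * 8 / 1e6
}

# Hypothetical sizes, chosen only to illustrate the arithmetic
estimate_dense_mb(n_samples = 1000, n_genes = 20)     # 0.16 MB for a few genes
estimate_dense_mb(n_samples = 1000, n_genes = 20000)  # 160 MB for ~all genes
```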
Before you run this function, it's recommended that you run check_gene_expression to determine which samples are available.
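Because loading time scales with the number of samples, it pays to filter the metadata before calling the function. The sketch below uses a small hypothetical metadata table; the column names `sample_id` and `pathology` and their values are assumptions for illustration. The final call is commented out because it needs access to the GAMBL expression data; it uses only arguments documented above.

```r
# Hypothetical metadata; real output of get_gambl_metadata() has more columns
meta <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  pathology = c("DLBCL", "FL", "DLBCL", "BL"),
  stringsAsFactors = FALSE
)

# Keep only the samples needed for this analysis
dlbcl_meta <- meta[meta$pathology == "DLBCL", ]

# The filtered data frame would then be passed on, for example:
# expr <- get_gene_expression(
#   these_samples_metadata = dlbcl_meta,
#   hugo_symbols = c("MYC", "BCL2"),
#   format = "long"
# )
```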