
Get raw RNA-seq read counts for one or more genes across all GAMBL samples.

Usage

get_raw_expression_counts(
  these_samples_metadata,
  existing_sample_id_column,
  new_sample_id_column,
  all_samples = FALSE,
  check = FALSE,
  from_flatfile = TRUE,
  map_to_symbol = FALSE,
  verbose = FALSE,
  TPM = FALSE
)

Arguments

these_samples_metadata

Data frame of sample metadata, usually the output of get_gambl_metadata().

existing_sample_id_column

Name of the metadata column containing the sample IDs to be replaced with the contents of new_sample_id_column.

new_sample_id_column

Name of the metadata column containing the sample IDs to use in place of those in existing_sample_id_column.

all_samples

Set to TRUE to force the function to return all available data. This should rarely be necessary.

check

For basic debugging. Set to TRUE to report how many samples in your metadata have expression data available.

from_flatfile

Set to FALSE to read from the database instead of from flat files.

map_to_symbol

Set to TRUE to obtain the mappings between the rows of the count matrix and HGNC gene symbols/aliases.

verbose

Set to TRUE to print additional progress messages.

TPM

Set to TRUE to return TPM (transcripts per million) estimates instead of raw counts.
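TPM first divides each gene's reads by its length and then rescales each sample so its values sum to one million, which makes values comparable across genes within a sample. The following is a minimal base-R sketch of that standard definition, for illustration only: it assumes gene lengths in bases are available (quantifiers such as Salmon or RSEM use effective lengths instead), and counts_to_tpm() is a hypothetical helper, not how this function obtains its TPM estimates.

```r
# Minimal sketch of the standard TPM definition (hypothetical helper, illustration only)
counts_to_tpm <- function(counts, gene_length) {
  # Reads per base: divide each gene (row) by its length
  rate <- counts / gene_length
  # Rescale each sample (column) so its values sum to one million
  t(t(rate) / colSums(rate)) * 1e6
}

# Toy data: 2 genes (rows) x 2 samples (columns)
counts <- matrix(c(10, 20, 30, 40), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
tpm <- counts_to_tpm(counts, gene_length = c(1000, 2000))
colSums(tpm)
```

Every column of the result sums to one million, whereas the raw counts in each column sum to the sample's library size.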

Value

A list with two elements: counts, the count matrix, and metadata, the associated sample metadata, ready to pass to DESeq2.
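DESeq2 expects the columns of the count matrix to be in the same order as the rows of the sample metadata. The base-R sketch that follows shows how an existing ID column and a new ID column could be used to relabel a count matrix while keeping its metadata aligned. It assumes the count-matrix columns are named with the values in existing_col. harmonize_ids() is a hypothetical helper for illustration, not the internal implementation of get_raw_expression_counts().

```r
# Hypothetical sketch: harmonize sample IDs between a count matrix and its
# metadata. harmonize_ids() is illustrative and NOT part of the package API.
harmonize_ids <- function(counts, metadata, existing_col, new_col) {
  # Locate each count-matrix column in the metadata
  idx <- match(colnames(counts), metadata[[existing_col]])
  if (anyNA(idx)) {
    stop("Some count-matrix columns are missing from the metadata")
  }
  # Reorder metadata rows to follow the count-matrix columns
  metadata <- metadata[idx, , drop = FALSE]
  # Swap in the new identifiers on both objects
  new_ids <- as.character(metadata[[new_col]])
  colnames(counts) <- new_ids
  rownames(metadata) <- new_ids
  list(counts = counts, metadata = metadata)
}

# Toy inputs for illustration only
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))
meta <- data.frame(sample_id = c("s3", "s1", "s2"),
                   patient_id = c("p3", "p1", "p2"),
                   stringsAsFactors = FALSE)
out <- harmonize_ids(counts, meta, "sample_id", "patient_id")
identical(colnames(out$counts), rownames(out$metadata))
```

Using match() rather than sorting keeps the count matrix untouched and makes any sample missing from the metadata fail loudly instead of silently misaligning.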

Details

Efficiently retrieves raw gene expression values (read counts) for one, multiple, or all genes across all GAMBL samples. For more information, see the parameter descriptions above, the examples below, and the package vignettes.

Examples

if (FALSE) { # \dontrun{
library(dplyr)
library(DESeq2)
library(ComplexHeatmap)
library(grid) # for gpar()

schmitz_meta = get_gambl_metadata() %>% 
    filter(seq_type == "mrna", cohort == "dlbcl_schmitz")
exp_out = get_raw_expression_counts(these_samples_metadata = schmitz_meta)

# Create DESeq data set directly from the two named objects in the output

dds <- DESeqDataSetFromMatrix(countData = exp_out$counts,
    colData = exp_out$metadata,
    design = ~ COO_consensus + sex)
    
# Run a basic DESeq analysis
dds <- DESeq(dds)
res <- results(dds, 
    name="COO_consensus_GCB_vs_ABC",
    lfcThreshold=2,alpha=0.1)
# Filter outputs on padj and baseMean (genes more highly expressed overall);
# the log fold-change cutoff was already applied via lfcThreshold above
res_df = as.data.frame(res) %>% 
    filter(padj<0.1,baseMean>500)

show_genes = rownames(res_df)
vsd <- vst(dds, blind=FALSE)

# Visualize the results with a heatmap
column_ha = HeatmapAnnotation(df=select(exp_out$metadata,COO_consensus,sex))
Heatmap(assay(vsd)[show_genes,],
    row_names_gp = gpar(fontsize=5),
    bottom_annotation = column_ha,
    show_column_names = FALSE)
} # }