Get Raw Expression Counts
get_raw_expression_counts.Rd
Get the raw read counts from RNA-seq for one or more genes for all GAMBL samples.
Usage
get_raw_expression_counts(
  these_samples_metadata,
  existing_sample_id_column,
  new_sample_id_column,
  all_samples = FALSE,
  check = FALSE,
  from_flatfile = TRUE,
  map_to_symbol = FALSE,
  verbose = FALSE,
  TPM = FALSE
)
Arguments
- these_samples_metadata
The data frame with sample metadata. Usually the output of get_gambl_metadata().
- existing_sample_id_column
The name of the metadata column holding the sample IDs you want replaced with the contents of new_sample_id_column.
- new_sample_id_column
The name of the metadata column holding the sample IDs you want used in place of those in existing_sample_id_column.
- all_samples
Set to TRUE to force the function to return all available data. This should rarely be necessary.
- check
For basic debugging. Set to TRUE to obtain basic information about the number of samples in your metadata with expression data available
- from_flatfile
Set to FALSE to use the database instead of reading from flatfiles
- map_to_symbol
Set to TRUE to obtain the mappings between the rows in the count matrix and HGNC gene symbol/alias
- verbose
Set to TRUE to print additional progress messages.
- TPM
Set to TRUE to get TPM estimates instead of counts
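The existing_sample_id_column and new_sample_id_column arguments describe an ID swap. The sketch below shows the general idea on toy data in base R: the columns of a count matrix are relabelled by looking up each existing ID in the metadata. It is only an illustration of the concept, not the function's internal code, and the column names mrna_sample_id and genome_sample_id as well as all values are hypothetical.

```r
# Toy metadata with two ID columns (names and values are invented)
toy_meta <- data.frame(
  mrna_sample_id   = c("RNA_01", "RNA_02", "RNA_03"),
  genome_sample_id = c("DNA_A", "DNA_B", "DNA_C"),
  stringsAsFactors = FALSE
)

# Toy count matrix whose columns are labelled with the existing IDs,
# deliberately in a different order than the metadata rows
toy_counts <- matrix(
  c(10L, 0L, 5L, 3L, 8L, 1L),
  nrow = 2,
  dimnames = list(c("gene1", "gene2"), c("RNA_02", "RNA_01", "RNA_03"))
)

# Find the metadata row for each column, then swap in the new ID
idx <- match(colnames(toy_counts), toy_meta$mrna_sample_id)
stopifnot(!anyNA(idx)) # every column must have a metadata match

relabelled <- toy_counts
colnames(relabelled) <- toy_meta$genome_sample_id[idx]
colnames(relabelled)
```

Using match() rather than assuming the same ordering keeps each column paired with the right sample even when the matrix and the metadata are sorted differently.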
Details
Efficiently retrieve raw gene expression values (read counts) for one, multiple or all genes for all GAMBL samples. For more information, see the parameter descriptions and the vignette examples.
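The TPM argument switches the output from read counts to TPM estimates. As a reminder of how the two quantities relate, the sketch below applies the standard TPM formula to a made-up count matrix in base R. The counts and gene lengths are invented, and this is not necessarily how the TPM values returned by this function are produced.

```r
# Toy count matrix: 3 genes x 2 samples (all numbers are made up)
counts <- matrix(
  c(100, 200, 700, 50, 50, 400),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)

# Hypothetical gene lengths in kilobases, one per row of the matrix
gene_length_kb <- c(geneA = 1, geneB = 2, geneC = 7)

# Reads per kilobase: the length vector is recycled down each column,
# so every gene's count is divided by that gene's length
rate <- counts / gene_length_kb

# Scale each sample (column) so that its values sum to one million
tpm <- t(t(rate) / colSums(rate)) * 1e6

colSums(tpm)
```

DESeq2, which is used in the Examples, expects un-normalized counts, so TPM should stay FALSE for that kind of workflow.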
Examples
if (FALSE) { # \dontrun{
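  # Packages this example relies on
  library(dplyr)          # %>%, filter(), select()
  library(DESeq2)         # DESeqDataSetFromMatrix(), DESeq(), results(), vst()
  library(ComplexHeatmap) # Heatmap(), HeatmapAnnotation()
  library(grid)           # gpar()

  # Restrict the metadata to RNA-seq samples from one cohort
  schmitz_meta = get_gambl_metadata() %>%
    dplyr::filter(seq_type == "mrna", cohort == "dlbcl_schmitz")
  exp_out = get_raw_expression_counts(these_samples_metadata = schmitz_meta)
  # Create DESeq data set directly from the two named objects in the output
  dds <- DESeqDataSetFromMatrix(countData = exp_out$counts,
                                colData = exp_out$metadata,
                                design = ~ COO_consensus + sex)
  # Run a basic DESeq analysis
  dds <- DESeq(dds)
  res <- results(dds,
                 name = "COO_consensus_GCB_vs_ABC",
                 lfcThreshold = 2, alpha = 0.1)
  # Filter outputs using padj and baseMean (more highly expressed overall);
  # the log fold change threshold is already part of the test via lfcThreshold
  res_df = as.data.frame(res) %>%
    dplyr::filter(padj < 0.1, baseMean > 500)
  show_genes = rownames(res_df)
  vsd <- vst(dds, blind = FALSE)
  # Visualize the results with a heatmap
  # (dplyr:: prefix avoids masking of select() by other attached packages)
  column_ha = HeatmapAnnotation(df = dplyr::select(exp_out$metadata, COO_consensus, sex))
  Heatmap(assay(vsd)[show_genes, ],
          row_names_gp = gpar(fontsize = 5),
          bottom_annotation = column_ha,
          show_column_names = FALSE)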
} # }