
Identifies the top N features (genes) for each subtype, ranked by their prevalence in the dataset, using either the truth labels or the subgroups predicted by DLBCLone.

Usage

posthoc_feature_enrichment(
  sample_metadata,
  features,
  label_column = "lymphgen",
  truth_classes = c("BN2", "EZB", "MCD", "ST2", "N1"),
  method = "frequency",
  num_feats = 10,
  p_threshold = 0.01,
  title = NULL,
  base_size = 7,
  separate_plot_per_group = TRUE
)

Arguments

sample_metadata

Data frame containing sample metadata with class labels, by default in a column named "lymphgen". Use label_column to specify a different column.

label_column

Name of the column in sample_metadata that contains the class labels (default: "lymphgen").

truth_classes

Vector of class labels to consider (default: c("BN2","EZB","MCD","ST2","N1")).
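As a rough illustration of how label_column and truth_classes relate to sample_metadata, the sketch below restricts a toy metadata frame to the classes of interest. The sample_id column and the label values are invented for illustration, and whether the function filters in exactly this way is an assumption.

```r
# Toy metadata; sample_id and the label values are made up for illustration.
sample_metadata <- data.frame(
  sample_id = paste0("S", 1:6),
  lymphgen  = c("BN2", "EZB", "MCD", "Other", "EZB", "A53"),
  stringsAsFactors = FALSE
)
label_column  <- "lymphgen"
truth_classes <- c("BN2", "EZB", "MCD", "ST2", "N1")

# Keep only samples whose label is one of the classes of interest.
kept <- sample_metadata[sample_metadata[[label_column]] %in% truth_classes, ]

# Per-class sample counts, including classes with no samples.
table(factor(kept[[label_column]], levels = truth_classes))
```

Samples labelled with anything outside truth_classes (here "Other" and "A53") drop out of the tally.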

method

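Method used to select the top features: "frequency" ranks features by how often they occur within each class; "chi_square" ranks features by how differentially mutated they are in each class versus all other classes (default: "frequency"). The Examples section also passes "fisher", which is the method that p_threshold refers to.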
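The difference between the two ranking strategies can be sketched in base R on a toy binary mutation matrix. This is not the package's implementation; the gene columns, labels, and the class-versus-rest 2x2 table layout are assumptions made for illustration.

```r
# Toy data: 8 samples, 3 binary features (1 = mutated). Values are invented.
labels <- c(rep("EZB", 4), rep("MCD", 4))
features <- data.frame(
  EZH2  = c(1, 1, 1, 0, 0, 0, 0, 0),
  MYD88 = c(0, 0, 0, 0, 1, 1, 1, 1),
  KMT2D = c(1, 1, 1, 1, 1, 1, 1, 0)
)
in_class <- labels == "EZB"

# "frequency"-style ranking: fraction of EZB samples carrying each feature.
freq <- sort(colMeans(features[in_class, ]), decreasing = TRUE)

# "chi_square"-style ranking: class vs. all other samples, one test per feature.
p_chisq <- sapply(features, function(x) {
  tab <- table(factor(in_class, levels = c(FALSE, TRUE)),
               factor(x, levels = c(0, 1)))
  suppressWarnings(chisq.test(tab)$p.value)  # small counts trigger a warning
})
p_chisq <- sort(p_chisq)

names(freq)     # ranked by prevalence within EZB
names(p_chisq)  # ranked by strength of association with EZB
```

In this toy data KMT2D tops the frequency ranking because it is common everywhere, while MYD88 tops the chi-square ranking even though it is absent from EZB: the test is two-sided, so it flags depletion as well as enrichment.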

num_feats

Number of top features to display per subtype (default: 10).

p_threshold

Maximum P value for a feature to be retained when method is "fisher" (default: 0.01).
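A minimal sketch of a P value cutoff applied to per-feature Fisher's exact tests, assuming a class-versus-rest 2x2 table for each feature. The data are invented, and treating the threshold as inclusive (<=) is an assumption rather than documented behaviour.

```r
# Toy data: 12 samples, 2 binary features (1 = mutated). Values are invented.
labels <- c(rep("MCD", 6), rep("EZB", 6))
features <- data.frame(
  MYD88 = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  KMT2D = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
)
in_class    <- labels == "MCD"
p_threshold <- 0.01

# One two-sided Fisher's exact test per feature: MCD vs. all other samples.
p_fisher <- sapply(features, function(x) {
  tab <- table(factor(in_class, levels = c(FALSE, TRUE)),
               factor(x, levels = c(0, 1)))
  fisher.test(tab)$p.value
})

# Retain only features at or below the threshold (inclusive cutoff assumed).
retained <- names(p_fisher)[p_fisher <= p_threshold]
retained
```

Here MYD88 separates the two groups perfectly and passes the cutoff, while KMT2D is equally common in both groups and is dropped.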

title

Title for the plot (default: NULL).

base_size

Base font size for the plot, passed to theme_Morons (default: 7).

separate_plot_per_group

If TRUE, creates a separate plot for each group and also combines them with ggarrange (default: TRUE).

Value

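A ggplot2 object representing the stacked bar plot. Note that the examples below store the result as a list and access its bar_plot and forest_plot elements.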

Examples

if (FALSE) { # \dontrun{
library(GAMBLR.predict)

# Assuming my_DLBCLone_opt is the output from DLBCLone_optimize_params
# Rank features by chi-square test and show the bar plot
plot_list <- posthoc_feature_enrichment(
    sample_metadata = my_DLBCLone_opt$predictions,
    features = my_DLBCLone_opt$features,
    method = "chi_square",
    num_feats = 10,
    title = "LymphGen"
)
print(plot_list$bar_plot)

plot_list <- posthoc_feature_enrichment(
    sample_metadata = my_DLBCLone_opt$predictions,
    features = my_DLBCLone_opt$features,
    label_column = "lymphgen",
    method = "fisher",
    num_feats = 10,
    base_size = 9
)
print(plot_list$forest_plot)
} # }