Collate Results — collate_results • GAMBLR

Bring together all derived sample-level results from many GAMBL pipelines.

collate_results(
  sample_table,
  write_to_file = FALSE,
  join_with_full_metadata = FALSE,
  these_samples_metadata,
  case_set,
  sbs_manipulation = "",
  seq_type_filter = "genome",
  from_cache = TRUE
)

Arguments

sample_table: A data frame with sample_id as the first column.
write_to_file: Boolean. If TRUE, writes the collated results to a TSV file (/projects/nhl_meta_analysis_scratch/gambl/results_local/shared/gambl_seq_type_filter_results.tsv, where seq_type_filter is replaced by the selected seq type, e.g. gambl_genome_results.tsv). Default is FALSE.
join_with_full_metadata: Boolean. If TRUE, joins the results with all columns of the metadata. Default is FALSE.
these_samples_metadata: Optional user-specified metadata data frame. When join_with_full_metadata is TRUE, it is used instead of the metadata returned by get_gambl_metadata.
case_set: Optional short name for a pre-defined set of cases.
sbs_manipulation: Optional string naming a transformation to apply to SBS values (e.g. log, scale). Default is "".
seq_type_filter: Seq type to return results for (e.g. "genome" or "capture"). Default is "genome".
from_cache: Boolean. If TRUE, reads previously cached results (/projects/nhl_meta_analysis_scratch/gambl/results_local/shared/gambl_seq_type_filter_results.tsv) instead of rerunning the individual collate helper functions. Default is TRUE. If write_to_file is TRUE, this parameter is automatically set to FALSE.
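The interplay between from_cache and write_to_file, together with the cache file naming visible in the example output (gambl_genome_results.tsv, gambl_capture_results.tsv), can be sketched in base R. This is a minimal sketch of the documented behaviour, not GAMBLR's implementation: the helper resolve_cache_settings and its base_dir argument are hypothetical, and the file-name pattern is inferred from the paths printed in the examples.

```r
# Hypothetical helper (not part of GAMBLR): mirrors the documented rule that
# write_to_file = TRUE forces from_cache = FALSE, and builds the cache file
# name from seq_type_filter (pattern inferred from the example output paths).
resolve_cache_settings <- function(seq_type_filter = "genome",
                                   from_cache = TRUE,
                                   write_to_file = FALSE,
                                   base_dir = "/projects/nhl_meta_analysis_scratch/gambl/results_local/shared") {
  # Regenerating the cache requires running the collate helpers,
  # so reading from the existing cache is switched off
  if (write_to_file) {
    from_cache <- FALSE
  }
  # One cached TSV per seq type, e.g. gambl_capture_results.tsv
  cache_path <- file.path(base_dir,
                          paste0("gambl_", seq_type_filter, "_results.tsv"))
  list(from_cache = from_cache,
       write_to_file = write_to_file,
       cache_path = cache_path)
}

# Requesting a cache rebuild for capture samples disables cache reading
settings <- resolve_cache_settings(seq_type_filter = "capture",
                                   write_to_file = TRUE)
settings$from_cache
settings$cache_path
```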

Value

A table keyed on biopsy_id containing per-sample results from the GAMBL pipelines.

Details

This function takes a data frame of sample IDs (sample_id in the first column) through the sample_table parameter and adds sample-level results from many of the available GAMBL pipelines.

Optional parameters are these_samples_metadata and join_with_full_metadata. If join_with_full_metadata is TRUE, the function joins the results either with an already subset metadata table supplied through these_samples_metadata or, if none is provided, with all metadata returned by get_gambl_metadata. This allows the user to extend the information available in a metadata table.

The function can also return cached results, so that the individual collate helper functions do not all have to be rerun. To do so, run it with from_cache = TRUE (the default). To regenerate the cached results instead, set write_to_file = TRUE; this automatically sets from_cache = FALSE.

case_set is an optional parameter for restricting the returned results to a pre-defined set of cases. Finally, seq_type_filter controls which seq type results are returned for; the default is "genome". For more information on how to get the most out of this function, see the examples, the vignettes and the parameter descriptions.
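The metadata join described above can be sketched with base R's merge(). This is a hypothetical illustration rather than GAMBLR code: join_collated_with_metadata and the toy data frames are made up, and treating the join as a left join from the metadata side (keeping every metadata row) is an assumption. The join keys match the columns reported in the "Joining with" messages in the examples.

```r
# Hypothetical helper (not part of GAMBLR): joins collated per-sample results
# onto a metadata table, standing in for these_samples_metadata or the output
# of get_gambl_metadata().
join_collated_with_metadata <- function(collated, metadata) {
  # Join on the identifier columns reported in the examples' "Joining with"
  # message, restricted to those present in both tables
  keys <- intersect(c("patient_id", "sample_id", "biopsy_id"),
                    intersect(names(collated), names(metadata)))
  # Assumed left join: every metadata row is kept, so a pre-filtered metadata
  # table (e.g. FL only) also restricts which samples are returned
  merge(metadata, collated, by = keys, all.x = TRUE)
}

# Toy inputs; ids, pathology values and example_result are illustrative only
collated <- data.frame(
  patient_id = c("P1", "P2", "P3"),
  sample_id = c("S1", "S2", "S3"),
  biopsy_id = c("B1", "B2", "B3"),
  example_result = c(0.1, 0.5, 0.9)
)
fl_only <- data.frame(
  patient_id = c("P1", "P3"),
  sample_id = c("S1", "S3"),
  biopsy_id = c("B1", "B3"),
  pathology = c("FL", "FL")
)

joined <- join_collated_with_metadata(collated, fl_only)
joined
```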

Examples

#get collated results for all capture samples, using cached results
capture_collated_everything = collate_results(seq_type_filter = "capture",
                                              from_cache = TRUE,
                                              write_to_file = FALSE)
#> /projects/nhl_meta_analysis_scratch/gambl/results_local/shared/gambl_capture_results.tsv

#use an already subset metadata table for getting collated results (cached)
my_metadata = get_gambl_metadata()
fl_metadata = dplyr::filter(my_metadata, pathology == "FL")

fl_collated = collate_results(seq_type_filter = "genome",
                              join_with_full_metadata = TRUE,
                              these_samples_metadata = fl_metadata,
                              write_to_file = FALSE,
                              from_cache = TRUE)
#> /projects/nhl_meta_analysis_scratch/gambl/results_local/shared/gambl_genome_results.tsv
#> Joining with `by = join_by(patient_id, sample_id, biopsy_id)`

#get collated results for all genome samples and join with full metadata
everything_collated = collate_results(seq_type_filter = "genome",
                                      from_cache = TRUE,
                                      join_with_full_metadata = TRUE)
#> /projects/nhl_meta_analysis_scratch/gambl/results_local/shared/gambl_genome_results.tsv
#> Joining with `by = join_by(patient_id, sample_id, biopsy_id)`

#another example demonstrating correct usage of the sample_table parameter.
fl_samples = dplyr::select(fl_metadata, sample_id, patient_id, biopsy_id)

fl_collated = collate_results(sample_table = fl_samples,
                              seq_type_filter = "genome",
                              from_cache = TRUE)
#> /projects/nhl_meta_analysis_scratch/gambl/results_local/shared/gambl_genome_results.tsv