Collate Results
collate_results.Rd
Bring together all derived sample-level results from many GAMBL pipelines.
Usage
collate_results(
sample_table,
write_to_file = FALSE,
join_with_full_metadata = FALSE,
these_samples_metadata,
case_set,
sbs_manipulation = "",
seq_type_filter = "genome",
from_cache = TRUE
)
Arguments
- sample_table
A data frame with sample_id as the first column.
- write_to_file
Boolean. If TRUE, writes the collated results to a tsv file (/projects/nhl_meta_analysis_scratch/gambl/results_local/shared/gambl_seq_type_filter_results.tsv). Default is FALSE.
- join_with_full_metadata
Boolean. If TRUE, joins the results with all columns of the metadata. Default is FALSE.
- these_samples_metadata
Optional user-specified metadata data frame. When provided, it is used instead of the output of get_gambl_metadata if join_with_full_metadata is TRUE.
- case_set
Optional short name for a pre-defined set of cases.
- sbs_manipulation
Optional variable for transforming sbs values (e.g. log, scale).
- seq_type_filter
The seq type to return results for, default is "genome".
- from_cache
Boolean. If TRUE, returns cached results (/projects/nhl_meta_analysis_scratch/gambl/results_local/shared/gambl_seq_type_filter_results.tsv). Default is TRUE. If write_to_file is TRUE, this parameter is automatically set to FALSE.
Details
This function takes a data frame with sample IDs in the first column (the sample_table parameter) and adds sample-level results from many of the available GAMBL pipelines.
Optional parameters are these_samples_metadata and join_with_full_metadata. If join_with_full_metadata is set to TRUE, the function either uses an already subset metadata table (these_samples_metadata) or, if none is provided, defaults to all metadata returned by get_gambl_metadata, allowing the user to extend the available information in a metadata table.
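The choice between a user-supplied metadata table and the full metadata can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: sketch_join_metadata is a hypothetical helper, full_metadata stands in for the output of get_gambl_metadata, the NULL default replaces whatever mechanism the real function uses to detect a missing argument, and the join direction (keep every metadata row) is assumed.

```r
# Hypothetical helper, not part of GAMBLR: illustrates the metadata choice only.
sketch_join_metadata <- function(results,
                                 full_metadata,
                                 join_with_full_metadata = FALSE,
                                 these_samples_metadata = NULL) {
  if (!join_with_full_metadata) {
    return(results)
  }
  # Prefer the user-supplied subset; otherwise fall back to all metadata
  # (full_metadata stands in for the output of get_gambl_metadata).
  metadata <- if (is.null(these_samples_metadata)) {
    full_metadata
  } else {
    these_samples_metadata
  }
  # Assumption: keep every metadata row and attach results by sample_id.
  merge(metadata, results, by = "sample_id", all.x = TRUE)
}

# Toy data, invented for illustration.
full_metadata <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  pathology = c("FL", "DLBCL", "FL"),
  stringsAsFactors = FALSE
)
results <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  some_result = c(10, 20, 30),
  stringsAsFactors = FALSE
)
fl_metadata <- full_metadata[full_metadata$pathology == "FL", ]

# With a subset metadata table, only the FL samples are returned.
joined_subset <- sketch_join_metadata(results, full_metadata,
                                      join_with_full_metadata = TRUE,
                                      these_samples_metadata = fl_metadata)

# Without one, all metadata rows are used.
joined_all <- sketch_join_metadata(results, full_metadata,
                                   join_with_full_metadata = TRUE)
```

In this sketch, passing a subset metadata table both restricts the returned samples and adds the metadata columns, which mirrors the behaviour described above.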
This function can also return cached results, so that the individual collate helper functions do not all have to be run to get results back.
To do so, run this function with from_cache = TRUE (default). The cached results can also be regenerated by setting write_to_file = TRUE; this automatically sets from_cache = FALSE. case_set is an optional parameter for subsetting the returned table to a pre-defined set of cases.
Lastly, seq_type_filter lets the user control which seq type results are returned for. Default is "genome". For more information on how to get the most out of this function, refer to the function examples, vignettes and parameter descriptions.
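The interplay between from_cache and write_to_file described above can be illustrated with a small, self-contained sketch. Everything here is hypothetical: sketch_collate_with_cache, compute_results and the temporary cache path are invented for illustration and do not reflect how the package stores or reads its cache.

```r
# Hypothetical helper, not part of GAMBLR: illustrates the flag interplay only.
sketch_collate_with_cache <- function(compute_results,
                                      cache_path,
                                      write_to_file = FALSE,
                                      from_cache = TRUE) {
  # Regenerating the cache implies that the cache must not be read.
  if (write_to_file) {
    from_cache <- FALSE
  }
  if (from_cache) {
    return(read.delim(cache_path, stringsAsFactors = FALSE))
  }
  # Stand-in for running the individual collate helper functions.
  results <- compute_results()
  if (write_to_file) {
    write.table(results, cache_path, sep = "\t", quote = FALSE,
                row.names = FALSE)
  }
  results
}

# Toy setup: a temporary cache file and a counter to show when work is redone.
cache_path <- tempfile(fileext = ".tsv")
compute_calls <- 0
compute_results <- function() {
  compute_calls <<- compute_calls + 1
  data.frame(sample_id = c("S1", "S2"), some_result = c(1.5, 2.5))
}

# First call regenerates the cache; from_cache is overridden to FALSE.
fresh <- sketch_collate_with_cache(compute_results, cache_path,
                                   write_to_file = TRUE)

# Second call reads the cached file and does not recompute anything.
cached <- sketch_collate_with_cache(compute_results, cache_path,
                                    from_cache = TRUE)
```

In this sketch the expensive step runs once, when the cache is written; the later call with from_cache = TRUE only reads the file.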
Examples
if (FALSE) { # \dontrun{
# get collated results for all capture samples, using cached results
capture_collated_everything = collate_results(seq_type_filter = "capture",
                                              from_cache = TRUE,
                                              write_to_file = FALSE)

# use an already subset metadata table for getting collated results (cached)
my_metadata = get_gambl_metadata()
fl_metadata = dplyr::filter(my_metadata, pathology == "FL")
fl_collated = collate_results(seq_type_filter = "genome",
                              join_with_full_metadata = TRUE,
                              these_samples_metadata = fl_metadata,
                              write_to_file = FALSE,
                              from_cache = TRUE)

# get collated results for all genome samples and join with full metadata
everything_collated = collate_results(seq_type_filter = "genome",
                                      from_cache = TRUE,
                                      join_with_full_metadata = TRUE)

# another example demonstrating correct usage of the sample_table parameter
fl_samples = dplyr::select(fl_metadata, sample_id, patient_id, biopsy_id)
fl_collated = collate_results(sample_table = fl_samples,
                              seq_type_filter = "genome",
                              from_cache = TRUE)
} # }