Functions overview
The GAMBLR.data package not only stores data but also provides basic functionality to retrieve and operate on it. Almost all of the functions listed below are “lighter” versions of those in the associated package GAMBLR.results, which means that the functions included here do not require you to be a Morin Lab member or to have access to GSC.
Only a limited number of basic functions are available, so they are not separated into categories. The available functions are:
annotate_hotspots
: Annotate maf data with hotspots. Will return the same maf as input with an additional logical column showing whether the particular SSM is in the hotspot region.
assign_cn_to_ssm
: Annotate maf data by assigning copy number state to SSM. Will return the same maf as input with an additional column showing the absolute copy number state of the variant in that position.
calc_mutation_frequency_bin_region
: Calculate mutation frequency by sliding window. Will return a numeric matrix with rows corresponding to samples and values showing number of mutations in the given window. Operates only on one given region. If you need to analyze many regions, use the plural version of this function.
calc_mutation_frequency_bin_regions
: Calculate mutation frequency by sliding window. Will return a numeric matrix with rows corresponding to samples and values showing number of mutations in the given window. This function is really just a wrapper for the singular version but works on many given regions.
check_excess_params
: Helper function that ensures arguments given to the functions in this package are appropriate, and will drop any unsupported arguments.
collate_results
: This currently adds the QC metrics to the given set of samples in the metadata.
get_ashm_count_matrix
: Returns a matrix showing how many mutations are in the given aSHM region for each given sample.
get_cn_segments
: Returns copy number data in seg format in a given region.
get_coding_ssm
: Returns maf data with only coding mutations.
get_coding_ssm_status
: Returns a binary matrix showing whether or not each given sample is mutated in each given gene.
get_gambl_metadata
: Returns metadata for the whole collection of samples present in GAMBL.
get_manta_sv
: Returns SV calls from manta.
get_sample_cn_segments
: Returns the CNV data for a given sample in seg format.
get_ssm_by_patients
: Returns maf data (both coding and non-coding mutations) for a given set of patients. This will return maf data for multiple samples if they exist for a given patient.
get_ssm_by_regions
: Returns maf data of the variants in the given regions.
get_ssm_by_samples
: Returns maf data (both coding and non-coding mutations) for a given set of samples. This will return maf data for only the specified samples even if multiple samples exist for a given patient.
id_ease
: Internal function that converts a vector of sample ids into a data frame with metadata and vice versa. It is not meant for standalone use.
process_regions
: Helper function that will harmonize genomic regions specified as character vectors or a data frame. Returns a list with two objects: regions as a vector and regions in bed format.
region_to_chunks
: Helper function that will separate a chromosome region specified in UCSC format (chr:start-end) into individual chunks of chromosome, start, and end.
review_hotspots
: Review hotspot mutations at certain genes to ensure they are correctly annotated. For example, will mark any missense mutation within the KAT domain of CREBBP as a hotspot. Only selected genes are supported.
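To make two of the ideas above more concrete (splitting a UCSC-style region into chunks, and counting mutations per sample in sliding windows), here is a minimal sketch in base R. It does not call GAMBLR.data and is not how the package implements these functions: the helper names `split_region` and `count_by_window`, their arguments, and the toy maf-like data frame are all assumptions made for illustration. Only the column names `Tumor_Sample_Barcode`, `Chromosome`, and `Start_Position` follow the standard maf layout.

```r
# Illustrative sketch in base R. These helpers are hypothetical and are NOT
# the GAMBLR.data implementations; the toy data is invented for the example.

# Split a UCSC-style region ("chr:start-end") into chromosome, start and end.
split_region <- function(region) {
  region <- gsub(",", "", region)            # drop thousands separators, if any
  parts <- strsplit(region, "[:-]")[[1]]
  stopifnot(length(parts) == 3)
  list(
    chromosome = parts[1],
    start = as.numeric(parts[2]),
    end = as.numeric(parts[3])
  )
}

# Count mutations per sample in sliding windows across a single region.
# Returns a matrix with one row per sample and one column per window.
count_by_window <- function(maf, region, window_size = 1000, slide_by = 500) {
  chunks <- split_region(region)
  # The region must be at least one window long.
  stopifnot(chunks$end - chunks$start + 1 >= window_size)
  starts <- seq(chunks$start, chunks$end - window_size + 1, by = slide_by)
  samples <- sort(unique(maf$Tumor_Sample_Barcode))
  in_chrom <- maf[maf$Chromosome == chunks$chromosome, ]
  window_names <- paste0(
    chunks$chromosome, ":",
    format(starts, scientific = FALSE, trim = TRUE)
  )
  counts <- matrix(
    0L,
    nrow = length(samples), ncol = length(starts),
    dimnames = list(samples, window_names)
  )
  for (i in seq_along(starts)) {
    # A window covers positions from its start up to start + window_size - 1.
    hits <- in_chrom$Start_Position >= starts[i] &
      in_chrom$Start_Position < starts[i] + window_size
    tab <- table(factor(in_chrom$Tumor_Sample_Barcode[hits], levels = samples))
    counts[, i] <- as.integer(tab)
  }
  counts
}

# Toy maf-like data (invented values).
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2", "S2"),
  Chromosome = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  Start_Position = c(1050, 1600, 1700, 2100, 1200),
  stringsAsFactors = FALSE
)

counts <- count_by_window(maf, "chr1:1000-2999")
print(counts)
# Windows start at 1000, 1500 and 2000; S1 has counts 2, 1, 0 and S2 has 1, 2, 1.
```

Samples with no mutations in a window are kept with a count of zero, so every sample in the input appears as a row. Applying the same helper to several regions and combining the results would mirror the relationship described above between the singular and plural versions of the sliding-window function.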