
Annotate a MAF with segmented absolute copy number data and add three additional columns (VAF, Ploidy and Final_purity).

Usage

estimate_purity(
  these_samples_metadata,
  maf_data,
  seg_data,
  show_plots = FALSE,
  assume_diploid = FALSE,
  coding_only = FALSE,
  projection,
  verbose = FALSE
)

Arguments

these_samples_metadata

Metadata for one sample

maf_data

Optional. A data frame of MAF data that is already loaded, used instead of a path to a maf file.

seg_data

Data frame or seg_data object for the sample of interest. Can contain data from other samples, which will be ignored.

show_plots

Optional. Show two faceted plots that display the VAF and purity distributions for each copy number state in the sample. Default is FALSE.

assume_diploid

Optional. If TRUE and no local seg file is provided, every mutation is annotated as copy neutral instead of defaulting to a GAMBL sample. Default is FALSE.

coding_only

Optional. Set to TRUE to restrict the analysis to coding variants only. Default is FALSE.

in_maf

Path to a local maf file.

in_seg

Path to a local corresponding seg file for the same sample ID as the input maf.

this_seq_type

Seq type for returned CN segments. One of "genome" (default) or "capture".

verbose

Set to TRUE for more feedback.

Value

A list containing a data frame (MAF-like format) with the segmented absolute copy number data and three extra columns:

VAF: the variant allele frequency, calculated from t_ref_count and t_alt_count.

Ploidy: the number of copies of an allele in the tumour cell.

Final_purity: the finalized purity estimate per mutation, after considering different copy number states and LOH events.
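As a rough illustration of the VAF column, the sketch below computes a variant allele frequency on a toy data frame as t_alt_count / (t_ref_count + t_alt_count). The toy_maf object and its values are invented for illustration. This is the usual definition of VAF and is assumed, not confirmed, to match what estimate_purity does internally.

```r
# Toy MAF-like data frame; gene names and read counts are made up
toy_maf <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_C"),
  t_ref_count = c(60, 30, 90),
  t_alt_count = c(40, 30, 10)
)

# VAF = alt reads / (ref reads + alt reads)
# (assumed definition; not taken from the package source)
toy_maf$VAF <- toy_maf$t_alt_count /
  (toy_maf$t_ref_count + toy_maf$t_alt_count)

print(toy_maf)
```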

Details

This function takes a single row of metadata that defines the sample whose purity you wish to estimate. An already loaded maf can be supplied with maf_data. Alternatively, paths to the maf/seg files of interest can be passed to this function with in_maf and in_seg. To visualize the VAF and purity distributions, set show_plots to TRUE (default is FALSE). For more information on how to run this function, refer to the parameter descriptions and function examples.
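To give some intuition for how a per-mutation purity can follow from VAF and copy number, here is a minimal sketch based on a simple tumour/normal mixture model. For a clonal mutation present on mult of cn tumour copies, with diploid normal cells, the expected VAF is p * mult / (p * cn + 2 * (1 - p)), which rearranges to p = 2 * VAF / (mult - VAF * (cn - 2)). The helper purity_from_vaf is hypothetical and not part of the package. The formula is a textbook approximation and is not claimed to be the calculation estimate_purity performs, which also accounts for LOH events.

```r
# Hypothetical helper (not part of the package): purity implied by one
# clonal mutation under a simple tumour/normal mixture model.
#   vaf   observed variant allele frequency
#   cn    total copy number at the locus in tumour cells
#   mult  number of tumour copies carrying the mutation
# Assumes normal cells are diploid at the locus.
purity_from_vaf <- function(vaf, cn = 2, mult = 1) {
  (2 * vaf) / (mult - vaf * (cn - 2))
}

# Copy-neutral, heterozygous mutation: purity reduces to 2 * VAF
purity_from_vaf(0.35)

# Three-copy gain with the mutation on two of the three copies
purity_from_vaf(0.5, cn = 3, mult = 2)
```

In the copy-neutral case this reduces to twice the VAF, which is the situation that applies to every mutation when assume_diploid is TRUE.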

Examples


# get metadata for one sample 
my_meta = suppressMessages(get_gambl_metadata()) %>% 
  dplyr::filter(sample_id == "HTMCP-01-06-00422-01A-01D",
  seq_type == "genome")

# estimate purity, allowing the data to be retrieved for you
outputs = estimate_purity(these_samples_metadata = my_meta,
                show_plots = TRUE,
                projection = "grch37")
#> dummy segments are not annotated in the inputs
#> fill_missing_with parameter will be ignored
#> Running in default mode of any...
outputs$sample_purity_estimation
#> [1] 1