Annotate and auto-drop a MAF data frame with existing blacklists.

annotate_ssm_blacklist(
  mutations_df,
  seq_type,
  tool_name = "slms_3",
  tool_version = "1.0",
  annotator_name = "vcf2maf",
  annotator_version = "1.2",
  genome_build = "grch37",
  project_base,
  blacklist_file_template,
  drop_threshold = 4,
  return_blacklist = FALSE,
  use_curated_blacklist = FALSE,
  verbose = FALSE,
  invert = FALSE
)

Arguments

mutations_df

A data frame with mutation data.

seq_type

The seq_type of your mutations, which determines the blacklist that is applied. More than one seq_type can be specified as a vector if desired. This parameter is required.

tool_name

The tool or pipeline that generated the files (should be the same for all).

tool_version

The version of the tool specified under tool_name.

annotator_name

Name of annotator, default is "vcf2maf".

annotator_version

Version of annotator specified under annotator_name.

genome_build

The genome build projection for the variants you are working with (default is grch37).

project_base

Optional: The full path to the directory that your blacklist_file_template is relative to.

blacklist_file_template

Optional: A string containing the path to your blacklist file, relative to project_base (i.e. results), with any wildcards surrounded by curly braces.

drop_threshold

The minimum count in one of the blacklists at which a variant is dropped. Default is 4.

return_blacklist

Boolean parameter for returning the blacklist. Default is FALSE.

use_curated_blacklist

Boolean parameter for using a curated blacklist, default is FALSE.

verbose

For debugging: print additional, possibly useful information. Default is FALSE.

invert

USE WITH CAUTION! This returns only the variants that would be dropped in the process (opposite of what you want, probably).
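
The project_base and blacklist_file_template arguments combine into a full file path once the curly-brace wildcards are filled in. The following is a minimal sketch of that idea in base R, not the function's actual implementation: the helper fill_template, the example paths and the wildcard names {seq_type} and {genome_build} are all hypothetical.

```r
# Hypothetical helper: replace each {name} wildcard with its value.
fill_template = function(template, values) {
  for (name in names(values)) {
    # fixed = TRUE treats the braces literally instead of as a regex
    template = gsub(paste0("{", name, "}"), values[[name]], template, fixed = TRUE)
  }
  template
}

# Hypothetical paths and wildcard names, for illustration only.
project_base = "/path/to/project"
blacklist_file_template = "results/blacklists/{seq_type}--{genome_build}.tsv"

relative_path = fill_template(
  blacklist_file_template,
  list(seq_type = "genome", genome_build = "grch37")
)

# The template is relative to project_base, so the two are joined.
full_path = file.path(project_base, relative_path)
print(full_path)
```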

Value

A MAF format data frame with two new columns indicating the number of occurrences of each variant in the two blacklists.

Details

Annotate and auto-drop a MAF data frame with existing blacklists to remove variants that would be dropped during the merge process. This function returns a MAF format data frame with two new columns, indicating the number of occurrences of each variant in the two blacklists. A collection of parameters makes the function flexible enough for many applications. For example, return_blacklist returns the blacklist that was used, either to the object the function output is assigned to or, if the output is not assigned, printed to the terminal. To return the variants that would be dropped, specify invert = TRUE. Please use this with caution, as it is most likely the opposite of what you want from this function. Lastly, the minimum count from one of the blacklists at which a variant is dropped is set with drop_threshold (default 4). The function also reports how many variants were dropped in the annotation process.
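
The interplay of drop_threshold and invert can be illustrated on a toy data frame. This is only a sketch under stated assumptions: the two count column names are hypothetical, and it assumes a variant is dropped when its count in either blacklist is greater than or equal to drop_threshold, which is how "minimum count" reads but is not confirmed against the function's code.

```r
# Toy MAF-like data frame; the two count column names are hypothetical.
toy_maf = data.frame(
  Hugo_Symbol = c("MYC", "EZH2", "BCL2", "TP53"),
  Start_Position = c(100L, 200L, 300L, 400L),
  blacklist_count_a = c(0L, 5L, 2L, 4L),
  blacklist_count_b = c(1L, 0L, 3L, 0L)
)

drop_threshold = 4

# Assumption: a variant is dropped when either count reaches the threshold.
is_blacklisted = toy_maf$blacklist_count_a >= drop_threshold |
  toy_maf$blacklist_count_b >= drop_threshold

kept = toy_maf[!is_blacklisted, ]    # analogous to invert = FALSE
dropped = toy_maf[is_blacklisted, ]  # analogous to invert = TRUE

# Report how many variants were removed.
message(nrow(dropped), " variants were dropped")
```

Raising drop_threshold in this sketch keeps more variants, while lowering it removes more.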

Examples


#annotate MAF
deblacklisted_maf = annotate_ssm_blacklist(grande_maf,
                                           seq_type = "genome",
                                           genome_build = "hg38")