annotate_ssm_blacklist.Rd
Annotate and auto-drop a MAF data frame with existing blacklists.
annotate_ssm_blacklist(
mutations_df,
seq_type,
tool_name = "slms_3",
tool_version = "1.0",
annotator_name = "vcf2maf",
annotator_version = "1.2",
genome_build = "grch37",
project_base,
blacklist_file_template,
drop_threshold = 4,
return_blacklist = FALSE,
use_curated_blacklist = FALSE,
verbose = FALSE,
invert = FALSE
)
mutations_df: A data frame with mutation data.
seq_type: The seq_type of your mutations; only the corresponding blacklist is applied. More than one seq_type can be given as a vector. This parameter is required.
tool_name: The tool or pipeline that generated the files (should be the same for all files). Default is "slms_3".
tool_version: The version of the tool specified under tool_name. Default is "1.0".
annotator_name: Name of the annotator. Default is "vcf2maf".
annotator_version: Version of the annotator specified under annotator_name. Default is "1.2".
genome_build: The genome build projection of the variants you are working with. Default is "grch37".
project_base: Optional. The full path to the directory that blacklist_file_template is relative to.
blacklist_file_template: Optional. A string giving the path to your blacklist file relative to project_base (e.g. starting from results), with any wildcards enclosed in curly braces.
drop_threshold: The minimum count in one of the blacklists at which a variant is dropped. Default is 4.
return_blacklist: Boolean; if TRUE, return the blacklist that was used. Default is FALSE.
use_curated_blacklist: Boolean; if TRUE, use a curated blacklist. Default is FALSE.
verbose: For debugging; if TRUE, print additional diagnostic information. Default is FALSE.
invert: USE WITH CAUTION! If TRUE, return only the variants that would be dropped (probably the opposite of what you want). Default is FALSE.
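The curly-brace wildcards in blacklist_file_template are filled in and the result is resolved relative to project_base. The base-R sketch below shows one way such a template could be expanded; the helper expand_blacklist_template, the example template string and the wildcard names {seq_type} and {genome_build} are hypothetical and are not taken from the package.

```r
# Hypothetical helper (not part of the package): replace each {name}
# wildcard in a template with the matching value, then prepend the base dir.
expand_blacklist_template <- function(project_base, template, values) {
  path <- template
  for (name in names(values)) {
    # fixed = TRUE treats the braces literally instead of as a regex
    path <- gsub(paste0("{", name, "}"), values[[name]], path, fixed = TRUE)
  }
  # Join the base directory and the expanded relative path
  file.path(project_base, path)
}

# Hypothetical template; the real blacklist file layout may differ.
template <- "results/{seq_type}/blacklist.{genome_build}.txt"
expand_blacklist_template("/projects/example", template,
                          list(seq_type = "genome", genome_build = "grch37"))
```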
A MAF format data frame with two new columns indicating the number of occurrences of each variant in the two blacklists.
Annotate a MAF data frame with existing blacklists and automatically drop the variants that would be removed during the merge process.
The function returns a MAF-format data frame with two new columns indicating the number of occurrences of each variant in the two blacklists.
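The two count columns can be pictured as the result of a left join between the MAF and each blacklist on genomic position. The sketch below assumes each blacklist is a data frame of positions with an occurrence count; the blacklist layout, the column names count_a and count_b, and the helper annotate_counts are hypothetical, and the package may match variants on different or additional columns.

```r
# Toy MAF using standard MAF coordinate columns
maf <- data.frame(
  Chromosome = c("1", "1", "2"),
  Start_Position = c(100L, 200L, 300L),
  Tumor_Sample_Barcode = c("s1", "s2", "s3")
)
# Hypothetical blacklists: one row per position with an occurrence count
blacklist_a <- data.frame(Chromosome = c("1", "2"),
                          Start_Position = c(100L, 300L),
                          count_a = c(5L, 1L))
blacklist_b <- data.frame(Chromosome = "1", Start_Position = 200L,
                          count_b = 2L)

# Hypothetical helper: left-join both blacklists onto the MAF by position
annotate_counts <- function(maf, bl_a, bl_b) {
  keys <- c("Chromosome", "Start_Position")
  out <- merge(maf, bl_a, by = keys, all.x = TRUE)
  out <- merge(out, bl_b, by = keys, all.x = TRUE)
  # Positions absent from a blacklist get a count of 0 instead of NA
  out$count_a[is.na(out$count_a)] <- 0L
  out$count_b[is.na(out$count_b)] <- 0L
  out
}

annotate_counts(maf, blacklist_a, blacklist_b)
```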
Note that this function has several parameters that add flexibility for different applications. For example, return_blacklist = TRUE returns the blacklist that was used, either to the variable the result is assigned to or, if it is not assigned, printed to the console. To return the variants that would be dropped, specify invert = TRUE; use this with caution, as it is most likely the opposite of what you want from this function. Lastly, drop_threshold sets the minimum count in one of the blacklists at which a variant is dropped (default 4).
This function also reports how many variants were dropped during annotation.
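The interaction of drop_threshold and invert can be sketched as a simple row filter. This assumes a variant is dropped when its count in either blacklist is at least drop_threshold; the helper filter_by_blacklist and the column names count_a and count_b are hypothetical stand-ins, not the package's implementation.

```r
# Toy annotated MAF; count_a and count_b are hypothetical names for the
# two blacklist-count columns
annotated <- data.frame(
  Chromosome = c("1", "1", "2"),
  Start_Position = c(100L, 200L, 300L),
  count_a = c(5L, 0L, 1L),
  count_b = c(0L, 2L, 4L)
)

# Hypothetical helper: keep or (with invert = TRUE) return flagged variants
filter_by_blacklist <- function(annotated, drop_threshold = 4, invert = FALSE) {
  # Flag a variant if its count in either blacklist reaches drop_threshold
  flagged <- annotated$count_a >= drop_threshold |
    annotated$count_b >= drop_threshold
  # Report how many variants were flagged for removal
  message(sum(flagged), " variant(s) flagged for removal")
  # invert = TRUE keeps only the variants that would otherwise be dropped
  kept <- if (invert) annotated[flagged, ] else annotated[!flagged, ]
  rownames(kept) <- NULL
  kept
}

filter_by_blacklist(annotated)                 # keeps only 1:200
filter_by_blacklist(annotated, invert = TRUE)  # returns 1:100 and 2:300
```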
# annotate MAF
deblacklisted_maf <- annotate_ssm_blacklist(grande_maf,
                                            seq_type = "genome",
                                            genome_build = "hg38")