Maf To Custom Track. — maf_to_custom

Convert mutations into a UCSC custom track file

Usage

maf_to_custom_track(
  maf_data,
  these_samples_metadata = NULL,
  this_seq_type = "genome",
  output_file,
  as_bigbed = FALSE,
  colour_column = "lymphgen",
  as_biglolly = FALSE,
  track_name = "GAMBL mutations",
  track_description = "mutations from GAMBL",
  verbose = FALSE,
  padding_size = 0,
  projection = "grch37",
  bedToBigBed_path = "config",
  these_sample_ids = NULL
)

Arguments

maf_data: MAF data obtained from one of the get_ssm family of functions.
these_samples_metadata: A metadata table to subset the samples of interest from the input maf_data. If NULL (the default), all samples in maf_data are kept.
this_seq_type: The seq type you want back, default is "genome".
output_file: Name for your new bed file that can be uploaded as a custom track to UCSC.
as_bigbed: Boolean. If TRUE, the output is written as a bigBed file instead of a plain BED file. Default is FALSE.
colour_column: The column used to assign colours to the features in the returned file. Default is "lymphgen".
as_biglolly: Boolean. If TRUE, the output is written in bigLolly format. Default is FALSE (i.e., a BED file is written).
track_name: Track name. Default is "GAMBL mutations".
track_description: Track description. Default is "mutations from GAMBL".
verbose: Default is FALSE.
padding_size: Optional parameter specifying the padding size in the returned file, default is 0.
projection: Specify which genome build to use. Possible values are "grch37" (default) or "hg38". This parameter has an effect only when as_bigbed or as_biglolly is TRUE.
bedToBigBed_path: Path to your local bedToBigBed UCSC tool or the string "config" (default). If set to "config", GAMBLR.helpers::check_config_value is called internally and the bedToBigBed path is obtained from the config.yml file saved in the current working directory. This parameter is ignored if both as_bigbed and as_biglolly are set to FALSE.
these_sample_ids: DEPRECATED

Value

Nothing. The track is written to output_file.

Details

This function takes a set of mutations as maf_data and converts it to a UCSC Genome Browser-ready BED (or bigBed/bigLolly) file, complete with the required header. Upload the resulting file to the UCSC Genome Browser to view your data as a custom track. Optional parameters are available for further customization of the returned file. For more information, refer to the parameter descriptions and function examples.
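
To make the output format concrete, the following base R sketch writes a tiny UCSC custom track in BED9 layout: a `track` header line followed by one tab-separated record per mutation. It is not the implementation of maf_to_custom_track. The helper name `write_toy_track`, the toy mutations, and the single fixed colour are assumptions made for illustration only. The input columns follow the usual MAF naming (Chromosome, Start_Position, End_Position, Tumor_Sample_Barcode).

```r
# Hypothetical helper (not part of GAMBLR): write a MAF-like data frame
# as a UCSC custom track in BED9 layout.
write_toy_track <- function(maf, path,
                            track_name = "toy mutations",
                            track_description = "toy example",
                            rgb = "200,0,0") {
  # UCSC expects chromosome names with a "chr" prefix
  chrom <- ifelse(startsWith(maf$Chromosome, "chr"),
                  maf$Chromosome,
                  paste0("chr", maf$Chromosome))
  # MAF positions are 1-based and inclusive; BED starts are 0-based
  start <- maf$Start_Position - 1L
  end <- maf$End_Position
  bed <- data.frame(chrom, start, end,
                    name = maf$Tumor_Sample_Barcode,
                    score = 0L, strand = "+",
                    thickStart = start, thickEnd = end,
                    itemRgb = rgb)
  # itemRgb="On" tells the browser to use the ninth column as the colour
  header <- sprintf('track name="%s" description="%s" itemRgb="On"',
                    track_name, track_description)
  writeLines(header, path)
  write.table(bed, path, append = TRUE, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  invisible(path)
}

# Two made-up point mutations inside the MYC region used in the examples
toy_maf <- data.frame(
  Chromosome = c("8", "8"),
  Start_Position = c(128748000L, 128750500L),
  End_Position = c(128748000L, 128750500L),
  Tumor_Sample_Barcode = c("sample_A", "sample_B")
)
toy_path <- write_toy_track(toy_maf, tempfile(fileext = ".bed"))
toy_lines <- readLines(toy_path)
```

In this sketch every feature gets the same colour. The real function assigns colours per feature from colour_column, and the exact columns it writes may differ from this layout.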

Examples

# using grch37 coordinates
myc_grch37 <- GAMBLR.utils::create_bed_data(
                GAMBLR.data::grch37_lymphoma_genes_bed
              ) %>%
              dplyr::filter(name == "MYC")

print(myc_grch37)
#> genomic_data Object
#> Genome Build: grch37 
#> Showing first 10 rows:
#>   chrom     start       end name
#> 1     8 128747680 128753674  MYC
# desired projection will be automatically set to the
# genome_build of your region object
genome_maf <- get_ssm_by_regions(regions_bed = myc_grch37,
                             these_samples_metadata = get_gambl_metadata(),
                             this_seq_type = "genome",
                             streamlined = FALSE)
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)

# myc_genome_hg19.bed will be created in your working directory

maf_to_custom_track(maf_data = genome_maf,
                   output_file = "myc_genome_hg19.bed")
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
#> Joining with `by = join_by(group)`

# lazy/concise way:
my_region <- "8:128747680-128753674"

capture_maf <- get_ssm_by_regions(regions_list = my_region,
                             these_samples_metadata = get_gambl_metadata(),
                             this_seq_type = "genome",
                             projection = "grch37",
                             streamlined = FALSE)
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
maf_to_custom_track(maf_data = capture_maf,
                   output_file = "myc_capture_hg19.bed")
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
#> Joining with `by = join_by(group)`
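
Before uploading, it can help to look at the first few lines of the written file. The sketch below assumes the plain BED output is a text file whose first line is the `track` header, as the Details section implies; it does not apply to bigBed or bigLolly output, which is binary. `peek_track` is a hypothetical helper, and the stand-in file is made up so that the snippet runs without GAMBLR data.

```r
# Hypothetical helper: return the header line and the first few records
peek_track <- function(path, n = 3) {
  lines <- readLines(path, n = n + 1)
  list(header = lines[1],
       is_track_header = startsWith(lines[1], "track"),
       records = strsplit(lines[-1], "\t", fixed = TRUE))
}

# Stand-in file for demonstration; point peek_track() at your own
# output (e.g. "myc_genome_hg19.bed") instead
demo_path <- tempfile(fileext = ".bed")
writeLines(
  c('track name="demo" description="demo track" itemRgb="On"',
    "chr8\t128747999\t128748000\tsample_A\t0\t+\t128747999\t128748000\t200,0,0"),
  demo_path
)
peek <- peek_track(demo_path)
peek$is_track_header   # whether the first line looks like a track header
lengths(peek$records)  # number of tab-separated fields per record
```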