Convert mutations into a UCSC custom track file

Usage

maf_to_custom_track(
  maf_data,
  these_samples_metadata = NULL,
  this_seq_type = "genome",
  output_file,
  as_bigbed = FALSE,
  colour_column = "lymphgen",
  as_biglolly = FALSE,
  track_name = "GAMBL mutations",
  track_description = "mutations from GAMBL",
  verbose = FALSE,
  padding_size = 0,
  projection = "grch37",
  bedToBigBed_path = "config",
  these_sample_ids = NULL
)

Arguments

maf_data

MAF data obtained from one of the get_ssm family of functions.

these_samples_metadata

A metadata table to subset the samples of interest from the input maf_data. If NULL (the default), all samples in maf_data are kept.

this_seq_type

The seq type you want back, default is "genome".

output_file

Name of the new BED file, which can be uploaded to UCSC as a custom track.

as_bigbed

Boolean parameter controlling the format of the output file. If TRUE, the output is written in bigBed format. Default is FALSE.

colour_column

The column used to colour the features in the output BED file. By default, colours are assigned based on "lymphgen".

as_biglolly

Boolean parameter controlling the format of the output file. Default is FALSE (i.e., a BED file will be written).

track_name

Track name. Default is "GAMBL mutations".

track_description

Track description. Default is "mutations from GAMBL".

verbose

Default is FALSE.

padding_size

Optional parameter specifying the padding size in the returned file, default is 0.

projection

Specify which genome build to use. Possible values are "grch37" (default) or "hg38". This parameter has an effect only when as_bigbed or as_biglolly is TRUE.

bedToBigBed_path

Path to your local bedToBigBed UCSC tool or the string "config" (default). If set to "config", GAMBLR.helpers::check_config_value is called internally and the bedToBigBed path is obtained from the config.yml file saved in the current working directory. This parameter is ignored if both as_bigbed and as_biglolly are set to FALSE.

these_sample_ids

DEPRECATED

Value

Nothing. The track is written to output_file.

Details

This function takes a set of mutations as maf_data and converts it to a UCSC Genome Browser-ready BED (or bigBed/bigLolly) file, complete with the required header. Upload the resulting file to the UCSC Genome Browser to view your data as a custom track. Optional parameters are available for further customization of the output file. For more information, refer to the parameter descriptions and function examples.
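To make the shape of the output concrete, the following is a minimal base R sketch of a BED custom track with a header line. It is not the implementation of maf_to_custom_track; the toy mutations, the group-to-colour map, and the choice of nine BED columns are all assumptions made for illustration. It does show the two things UCSC needs: a track line and, for per-feature colours, an itemRgb column with itemRgb="On" in the header.

```r
# Toy mutation table using standard MAF column names; the values are made up.
maf <- data.frame(
  Chromosome = c("8", "8"),
  Start_Position = c(128748500L, 128750700L),
  End_Position = c(128748500L, 128750701L),
  Tumor_Sample_Barcode = c("sample_A", "sample_B"),
  lymphgen = c("EZB", "MCD"),
  stringsAsFactors = FALSE
)

# Hypothetical colour map, written as the "R,G,B" strings BED expects.
group_rgb <- c(EZB = "230,159,0", MCD = "86,180,233")

# MAF positions are 1-based and inclusive; BED starts are 0-based.
bed <- data.frame(
  chrom = paste0("chr", maf$Chromosome),
  chromStart = maf$Start_Position - 1L,
  chromEnd = maf$End_Position,
  name = maf$Tumor_Sample_Barcode,
  score = 0L,
  strand = ".",
  thickStart = maf$Start_Position - 1L,
  thickEnd = maf$End_Position,
  itemRgb = unname(group_rgb[maf$lymphgen]),
  stringsAsFactors = FALSE
)

# Write the track line first, then append the tab-separated BED rows.
out <- tempfile(fileext = ".bed")
header <- 'track name="toy mutations" description="toy example" itemRgb="On"'
writeLines(header, out)
write.table(bed, out, append = TRUE, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The resulting file has one header line followed by one line per mutation, which is the general layout UCSC accepts for a BED custom track.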

Examples

# using grch37 coordinates
myc_grch37 <- GAMBLR.utils::create_bed_data(
                GAMBLR.data::grch37_lymphoma_genes_bed
              ) %>%
              dplyr::filter(name == "MYC")

print(myc_grch37)
#> genomic_data Object
#> Genome Build: grch37 
#> Showing first 10 rows:
#>   chrom     start       end name
#> 1     8 128747680 128753674  MYC
# desired projection will be automatically set to the
# genome_build of your region object
genome_maf <- get_ssm_by_regions(regions_bed = myc_grch37,
                             these_samples_metadata = get_gambl_metadata(),
                             this_seq_type = "genome",
                             streamlined = FALSE)
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)

# myc_genome_hg19.bed will be created in your working directory

maf_to_custom_track(maf_data = genome_maf,
                   output_file = "myc_genome_hg19.bed")
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
#> Joining with `by = join_by(group)`

# lazy/concise way: specify the region as a string
my_region = "8:128747680-128753674"

capture_maf <- get_ssm_by_regions(regions_list = my_region,
                             these_samples_metadata = get_gambl_metadata(),
                             this_seq_type = "genome",
                             projection = "grch37",
                             streamlined = FALSE)
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
maf_to_custom_track(maf_data = capture_maf,
                   output_file = "myc_capture_hg19.bed")
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
#> Joining with `by = join_by(group)`
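
The examples above only write plain BED files. As a hedged sketch of the bigBed route, the call below uses only arguments listed in the Usage section. The output file name (including its extension) and the bedToBigBed path are hypothetical placeholders, and the call is guarded so that it only runs when maf_to_custom_track and the genome_maf object from the first example are available.

```r
# Arguments come from the Usage section; the file name and tool path
# are hypothetical and must be adapted to your system.
bigbed_args <- list(
  output_file = "myc_genome_grch37.bb",
  as_bigbed = TRUE,
  projection = "grch37",
  bedToBigBed_path = "/path/to/bedToBigBed"
)

# Run only when the function and the MAF from the first example exist.
if (exists("maf_to_custom_track") && exists("genome_maf")) {
  do.call(maf_to_custom_track,
          c(list(maf_data = genome_maf), bigbed_args))
}
```

Leaving bedToBigBed_path at its default of "config" instead would make the function look up the tool path in config.yml, as described under Arguments.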