Get ASHM Count Matrix — get_ashm_count_matrix

Build a matrix of mutation counts with one row per sample and one column per region, using a set of aberrant somatic hypermutation (aSHM) regions.

Usage

get_ashm_count_matrix(
  regions_bed,
  maf_data,
  these_samples_metadata,
  this_seq_type = "genome",
  projection
)

Arguments

regions_bed: Regions in BED format, with one row per region. The first three columns MUST contain the chromosome, start position and end position.
maf_data: Optionally provide a data frame in MAF format; otherwise, mutations are retrieved using either GAMBLR.data or GAMBLR.results.
these_samples_metadata: Metadata used to complete the matrix, so that every listed sample gets a row. By default, all GAMBL samples of the specified seq_type are used. If you are using non-GAMBL data, provide a data frame with at least a sample_id column covering all samples.
this_seq_type: The seq_type to return results for. Must be a single value. Only used if no metadata is provided with these_samples_metadata.
projection: The genome build the regions and mutations are relative to.
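
The argument requirements above can be checked before calling the function. The following is a minimal sketch for non-GAMBL data, not part of the package: `check_inputs`, `my_regions` and `my_metadata` are hypothetical names, the coordinates are invented, and it is an assumption that a plain data frame is accepted for regions_bed and that "grch37" is a valid projection value.

```r
# Hypothetical regions: the first three columns hold chromosome, start and end.
# Coordinates are invented for illustration only.
my_regions <- data.frame(
  chrom = c("9", "9"),
  start = c(37000000, 37020000),
  end   = c(37001000, 37021000),
  name  = c("regionA", "regionB"),
  stringsAsFactors = FALSE
)

# Hypothetical metadata for non-GAMBL samples: sample_id is the required column.
my_metadata <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  seq_type  = "genome",
  stringsAsFactors = FALSE
)

# Hypothetical helper that checks the documented requirements.
check_inputs <- function(regions, metadata, seq_type) {
  stopifnot(
    ncol(regions) >= 3,                   # chromosome, start, end
    is.numeric(regions[[2]]),             # start positions are numeric
    is.numeric(regions[[3]]),             # end positions are numeric
    all(regions[[2]] <= regions[[3]]),    # start does not exceed end
    "sample_id" %in% colnames(metadata),  # required metadata column
    length(seq_type) == 1                 # this_seq_type must be a single value
  )
  invisible(TRUE)
}

check_inputs(my_regions, my_metadata, "genome")

# With the package loaded, the call might then look like this (not run here):
# my_matrix <- get_ashm_count_matrix(
#   regions_bed = my_regions,
#   maf_data = my_maf,                 # hypothetical MAF-format data frame
#   these_samples_metadata = my_metadata,
#   projection = "grch37"              # assumed value
# )
```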

Value

A data frame with one row for every sample in these_samples_metadata and one column for every region in regions_bed.

Details

Each value is the number of mutations found in that sample within the corresponding region.
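
To make the shape of the result concrete, here is a base-R sketch of the counting idea on toy data. It illustrates the description above and is not the package's implementation; all sample names, region names and coordinates are invented, and region boundaries are treated as inclusive for simplicity.

```r
# Toy regions in a BED-like layout: chromosome, start, end, plus a name.
regions <- data.frame(
  chrom = c("9", "9"),
  start = c(100, 500),
  end   = c(200, 600),
  name  = c("regionA", "regionB"),
  stringsAsFactors = FALSE
)

# Toy mutations using standard MAF column names.
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2"),
  Chromosome           = c("9", "9", "9", "9"),
  Start_Position       = c(150, 160, 550, 120),
  stringsAsFactors = FALSE
)

# Samples that must appear as rows, including S3, which has no mutations.
sample_ids <- c("S1", "S2", "S3")

# Start from an all-zero matrix so samples without mutations are retained.
count_matrix <- matrix(
  0L,
  nrow = length(sample_ids),
  ncol = nrow(regions),
  dimnames = list(sample_ids, regions$name)
)

# For each region, count the mutations per sample that fall inside it.
for (j in seq_len(nrow(regions))) {
  in_region <- maf$Chromosome == regions$chrom[j] &
    maf$Start_Position >= regions$start[j] &
    maf$Start_Position <= regions$end[j]
  per_sample <- table(
    factor(maf$Tumor_Sample_Barcode[in_region], levels = sample_ids)
  )
  count_matrix[, j] <- as.integer(per_sample)
}

count_matrix <- as.data.frame(count_matrix)
print(count_matrix)
```

Starting from an all-zero matrix keyed on the sample list is what guarantees a row for every sample, which mirrors the role the documentation gives to these_samples_metadata.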

Examples



# Restrict the GAMBL metadata to DLBCL samples
DLBCL_genome_meta = get_gambl_metadata() %>%
  dplyr::filter(pathology == "DLBCL")
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts:  DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
# Get the aSHM regions and keep only those whose name contains PAX5
some_regions = GAMBLR.utils::create_bed_data(
                              GAMBLR.data::grch37_ashm_regions,
                              fix_names = "concat",
                              concat_cols = c("gene","region"),
                              sep="-") %>%
  dplyr::filter(grepl("PAX5",name))

# Count mutations per sample in each PAX5 region
pax5_matrix <- get_ashm_count_matrix(
  regions_bed = some_regions,
  this_seq_type = "genome",
  these_samples_metadata = DLBCL_genome_meta
)
#> Streamlined is set to TRUE, this function will disregard anything specified with basic_columns
#> To return a MAF with standard 45 columns, set streamlioned = FALSE and basic_columns = TRUE
#> To return a maf with all (116) columns, set streamlined = FALSE and basic_columns = FALSE
#> Joining with `by = join_by(sample_id, region_name)`
head(pax5_matrix)
#>                   PAX5-TSS-1 PAX5-distal-enhancer-1 PAX5-distal-enhancer-3
#> 00-14595_tumorC            2                      4                      3
#> 00-15201_tumorA            0                      0                      3
#> 00-15201_tumorB            0                      0                      0
#> 00-17960_CLC01670          0                      0                      0
#> FL1015T2                   0                      1                      0
#> 00-23442_tumorB            0                      0                      2
#>                   PAX5-intron-1 PAX5-distal-enhancer-2
#> 00-14595_tumorC              12                      0
#> 00-15201_tumorA               7                      0
#> 00-15201_tumorB               1                      0
#> 00-17960_CLC01670            11                      2
#> FL1015T2                      0                      0
#> 00-23442_tumorB               0                      0
colMeans(pax5_matrix)
#>             PAX5-TSS-1 PAX5-distal-enhancer-1 PAX5-distal-enhancer-3 
#>              0.4402619              0.5319149              1.2487725 
#>          PAX5-intron-1 PAX5-distal-enhancer-2 
#>              1.9247136              0.6579378
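
Because a few heavily mutated samples can dominate a column mean, it may also help to look at the fraction of samples with at least one mutation in each region. The sketch below uses a small invented matrix (`toy_matrix`, a hypothetical stand-in with the same layout as the function's output) so that it runs without GAMBL data access.

```r
# Toy stand-in for the returned data frame: rows are samples, columns are regions.
# All values are invented for illustration.
toy_matrix <- data.frame(
  `PAX5-TSS-1`    = c(2, 0, 0, 0),
  `PAX5-intron-1` = c(12, 7, 1, 0),
  row.names = c("sampleA", "sampleB", "sampleC", "sampleD"),
  check.names = FALSE
)

# Mean number of mutations per region, as in the example above.
mean_counts <- colMeans(toy_matrix)

# Fraction of samples with at least one mutation in each region.
fraction_mutated <- colMeans(toy_matrix > 0)

print(mean_counts)
print(fraction_mutated)
```

The same two calls can be applied directly to a real result such as pax5_matrix.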