Get ASHM Count Matrix.
Prepare a matrix with one row per sample and one column per region using a set of hypermutated regions.
Usage
get_ashm_count_matrix(
  regions_bed,
  maf_data,
  these_samples_metadata,
  this_seq_type = "genome",
  projection
)
Arguments
- regions_bed
A bed file with one row for each region. The first three columns in this file MUST contain the chromosome, start, and end positions.
- maf_data
Optionally provide a data frame in MAF format; otherwise, either GAMBLR.data or GAMBLR.results will be used.
- these_samples_metadata
Sample metadata used to complete your matrix so that it has a row for every sample. All GAMBL samples of the specified seq_type will be used by default. Provide a data frame with at least sample_id for all samples if you are using non-GAMBL data.
- this_seq_type
The seq_type to return results for. Must be a single value. Only used if no metadata is provided with these_samples_metadata.
- projection
The genome build we are working with.
Value
A data frame with a row for every sample in these_samples_metadata and a column for every region in regions_bed.
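To make the shape of the return value concrete, here is a minimal base R sketch of the same idea on toy data. It is not the package's implementation: the region names, sample IDs, coordinates, the inclusive region bounds, and the helper count_region are all hypothetical and chosen only for illustration.

```r
# Toy inputs; every name and value here is hypothetical, not GAMBL data.
regions <- data.frame(
  chrom = c("9", "9"),
  start = c(100, 500),
  end   = c(200, 600),
  name  = c("geneA-TSS", "geneA-intron")
)
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2"),
  Chromosome           = c("9", "9", "9", "9"),
  Start_Position       = c(150, 160, 550, 900)
)
metadata <- data.frame(sample_id = c("S1", "S2", "S3"))

# Count the mutations of each sample that fall inside region i
# (inclusive bounds are an assumption of this sketch).
count_region <- function(i) {
  in_region <- maf$Chromosome == regions$chrom[i] &
    maf$Start_Position >= regions$start[i] &
    maf$Start_Position <= regions$end[i]
  # Using the metadata sample IDs as factor levels keeps samples with
  # no mutations in the result, with a count of zero.
  hits <- factor(maf$Tumor_Sample_Barcode[in_region],
                 levels = metadata$sample_id)
  as.vector(table(hits))
}

counts <- sapply(seq_len(nrow(regions)), count_region)
toy_matrix <- as.data.frame(counts, row.names = metadata$sample_id)
colnames(toy_matrix) <- regions$name
print(toy_matrix)
```

Sample S3 has no mutations at all but still gets a row of zeros, because the rows come from the metadata rather than from the mutation table.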
Examples
DLBCL_genome_meta = get_gambl_metadata() %>%
  dplyr::filter(pathology == "DLBCL")
#> 3273 capture samples are missing a value for protocol. Assuming Exome.
#> 138 biopsies are missing from the biopsy metadata. This should be fixed!
#> affected cohorts: DLBCL_LSARP_Trios,Ennishi_tapestri,SMZL_Strefford,cHL_Maura,MCL_Barcelona
#> 110 biopsies with discrepancies in the pathology field. This should be fixed!
#> 10 biopsies with discrepancies in the time_point field. This should be fixed!
# get aSHM regions
some_regions = GAMBLR.utils::create_bed_data(
  GAMBLR.data::grch37_ashm_regions,
  fix_names = "concat",
  concat_cols = c("gene", "region"),
  sep = "-") %>%
  dplyr::filter(grepl("PAX5", name))
pax5_matrix <- get_ashm_count_matrix(
  regions_bed = some_regions,
  this_seq_type = "genome",
  these_samples_metadata = DLBCL_genome_meta
)
#> Streamlined is set to TRUE, this function will disregard anything specified with basic_columns
#> To return a MAF with standard 45 columns, set streamlioned = FALSE and basic_columns = TRUE
#> To return a maf with all (116) columns, set streamlined = FALSE and basic_columns = FALSE
#> Joining with `by = join_by(sample_id, region_name)`
head(pax5_matrix)
#> PAX5-TSS-1 PAX5-distal-enhancer-1 PAX5-distal-enhancer-3
#> 00-14595_tumorC 2 4 3
#> 00-15201_tumorA 0 0 3
#> 00-15201_tumorB 0 0 0
#> 00-17960_CLC01670 0 0 0
#> FL1015T2 0 1 0
#> 00-23442_tumorB 0 0 2
#> PAX5-intron-1 PAX5-distal-enhancer-2
#> 00-14595_tumorC 12 0
#> 00-15201_tumorA 7 0
#> 00-15201_tumorB 1 0
#> 00-17960_CLC01670 11 2
#> FL1015T2 0 0
#> 00-23442_tumorB 0 0
colMeans(pax5_matrix)
#> PAX5-TSS-1 PAX5-distal-enhancer-1 PAX5-distal-enhancer-3
#> 0.4402619 0.5319149 1.2487725
#> PAX5-intron-1 PAX5-distal-enhancer-2
#> 1.9247136 0.6579378
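Because the result is an ordinary data frame of counts, it can be summarised with base R. The sketch below re-enters by hand the six rows printed by head(pax5_matrix) above, so it runs without GAMBL data; the object name toy is hypothetical, and the same calls should apply to the full matrix.

```r
# The six rows shown by head(pax5_matrix), typed in by hand for illustration.
toy <- data.frame(
  `PAX5-TSS-1`             = c(2, 0, 0, 0, 0, 0),
  `PAX5-distal-enhancer-1` = c(4, 0, 0, 0, 1, 0),
  `PAX5-distal-enhancer-3` = c(3, 3, 0, 0, 0, 2),
  `PAX5-intron-1`          = c(12, 7, 1, 11, 0, 0),
  `PAX5-distal-enhancer-2` = c(0, 0, 0, 2, 0, 0),
  check.names = FALSE,  # keep the hyphenated region names unchanged
  row.names = c("00-14595_tumorC", "00-15201_tumorA", "00-15201_tumorB",
                "00-17960_CLC01670", "FL1015T2", "00-23442_tumorB")
)

# Fraction of samples with at least one mutation in each region
frac_mutated <- colMeans(toy > 0)
print(frac_mutated)

# Samples with at least one mutation in PAX5-intron-1
intron_samples <- rownames(toy)[toy[["PAX5-intron-1"]] > 0]
print(intron_samples)
```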