Assign CN state to SSMs.
Annotate mutations with their copy number information.
Usage
assign_cn_to_ssm(
  these_samples_metadata,
  maf_data,
  seg_data,
  projection,
  coding_only = FALSE,
  assume_diploid = FALSE,
  include_silent = FALSE,
  ...
)
Arguments
- these_samples_metadata
Metadata table with one or more rows to specify the samples to process.
- maf_data
A data frame of mutations in MAF format or a maf_data object (e.g. from get_coding_ssm or get_ssm_by_sample).
- seg_data
A data frame of segmented copy number data or a seg_data object.
- projection
Genome projection that the returned data is relative to (e.g. "grch37" or "hg38"). Only required when it cannot be inferred from maf_data or seg_data, or when those are not provided.
- coding_only
Optional. Set to TRUE to restrict the output to variants in coding space. Default is FALSE, i.e. genome-wide variants are used.
- assume_diploid
Optional. When TRUE, every mutation is annotated as copy neutral. Default is FALSE.
- include_silent
Logical parameter indicating whether to include silent mutations in coding space. Default is FALSE. This parameter only makes sense if coding_only is set to TRUE.
- ...
Any additional parameters.
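To make the interaction between coding_only and include_silent concrete, here is a minimal base-R sketch of how such a filter could behave on a MAF-like data frame. It assumes the standard MAF Variant_Classification column; the helper filter_coding() and the list of "coding" classes are hypothetical illustrations, not the function's actual internals.

```r
# Hypothetical sketch, not the package's implementation.
# Assumed set of coding classes; it includes "Silent" so that
# include_silent = TRUE can keep those variants.
coding_classes <- c("Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del",
                    "In_Frame_Ins", "Missense_Mutation", "Nonsense_Mutation",
                    "Nonstop_Mutation", "Splice_Site",
                    "Translation_Start_Site", "Silent")

filter_coding <- function(maf, coding_only = FALSE, include_silent = FALSE) {
  if (!coding_only) {
    return(maf)  # genome-wide: keep every variant
  }
  keep <- maf$Variant_Classification %in% coding_classes
  if (!include_silent) {
    # drop silent variants unless explicitly requested
    keep <- keep & maf$Variant_Classification != "Silent"
  }
  maf[keep, , drop = FALSE]
}

maf <- data.frame(
  Variant_Classification = c("Missense_Mutation", "Silent", "Intron"),
  stringsAsFactors = FALSE
)
nrow(filter_coding(maf))                                            # 3
nrow(filter_coding(maf, coding_only = TRUE))                        # 1
nrow(filter_coding(maf, coding_only = TRUE, include_silent = TRUE)) # 2
```

As the argument description notes, include_silent has no effect in this sketch unless coding_only is TRUE.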
Value
A list containing a data frame (MAF-like format) with three extra columns:
- log.ratio: the log ratio from the seg file (NA when no overlap was found).
- LOH
- CN: the rounded absolute copy number estimate of the region based on log.ratio (NA when no overlap was found).
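The relationship between log.ratio and CN can be sketched as follows, assuming log.ratio is a log2 ratio relative to a diploid baseline, so that the absolute copy number is roughly 2 * 2^log.ratio. The helper log_ratio_to_cn() is hypothetical and the exact conversion used by assign_cn_to_ssm may differ; note that the example output on this page includes non-integer CN values, so not every reported value is necessarily rounded.

```r
# Hypothetical helper. Assumption: log.ratio = log2(CN / 2), i.e. relative
# to a diploid baseline, so CN = 2 * 2^log.ratio, then rounded.
log_ratio_to_cn <- function(log_ratio) {
  round(2 * 2^log_ratio)  # NA (no overlapping segment) stays NA
}

log_ratio_to_cn(c(0, 1, -1, 0.585, NA))
# 2 4 1 3 NA
```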
Details
This function takes a metadata table and returns all mutations
for the samples in that metadata. Each mutation is annotated with the
local copy number state of each mutated site. The user can specify if
only coding mutations are of interest. To do so, set coding_only = TRUE. When maf_data or seg_data are not supplied, this function relies on get_ssm_by_samples and get_cn_segments to obtain the required data.
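The core annotation step can be sketched as an interval lookup: each mutation takes the log.ratio of the segment from the same sample and chromosome that contains its position, or NA when there is no overlap. This is a minimal, unoptimised base-R sketch under assumed column names (Tumor_Sample_Barcode, Chromosome and Start_Position in the MAF; ID, chrom, start, end and log.ratio in the segments). The helper annotate_log_ratio() is hypothetical, and it represents assume_diploid = TRUE as log.ratio = 0 purely for illustration.

```r
# Hypothetical sketch; column names are assumptions.
annotate_log_ratio <- function(maf, seg, assume_diploid = FALSE) {
  if (assume_diploid) {
    maf$log.ratio <- 0  # copy neutral: log2(2 / 2) = 0
    return(maf)
  }
  # For each mutation, find a segment (same sample, same chromosome)
  # whose [start, end] interval contains the mutation position.
  maf$log.ratio <- vapply(seq_len(nrow(maf)), function(i) {
    hit <- seg$ID == maf$Tumor_Sample_Barcode[i] &
      seg$chrom == maf$Chromosome[i] &
      seg$start <= maf$Start_Position[i] &
      seg$end >= maf$Start_Position[i]
    if (any(hit)) seg$log.ratio[which(hit)[1]] else NA_real_
  }, numeric(1))
  maf
}

maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2"),
  Chromosome = c("1", "2", "1"),
  Start_Position = c(150, 500, 150),
  stringsAsFactors = FALSE
)
seg <- data.frame(
  ID = c("S1", "S1"),
  chrom = c("1", "2"),
  start = c(100, 1000),
  end = c(200, 2000),
  log.ratio = c(0.58, -1),
  stringsAsFactors = FALSE
)
annotate_log_ratio(maf, seg)$log.ratio
# 0.58 NA NA  (second mutation falls outside its segment; S2 has no segments)
```

For large inputs, an interval join (for example data.table::foverlaps) would usually be preferred over this per-row loop.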
Examples
if (FALSE) { # \dontrun{
# long-handed way (mostly for illustration)
# 1. get some metadata for a collection of samples
some_meta = suppressMessages(get_gambl_metadata()) %>%
  dplyr::filter(cohort == "DLBCL_ICGC")
# 2. Get the SSMs for these samples
ssm_genomes_grch37 = get_coding_ssm(projection = "grch37",
                                    these_samples_metadata = some_meta)
# peek at the results
ssm_genomes_grch37 %>% dplyr::select(1:8)
# 3. Lazily let this function obtain the corresponding seg_data
# for the right genome_build
cn_list = assign_cn_to_ssm(some_meta, ssm_genomes_grch37)
cn_list$maf %>% dplyr::select(1:8, log.ratio, CN)
# or using the other genome build:
ssm_genomes_hg38 = get_coding_ssm(projection = "hg38",
                                  these_samples_metadata = some_meta)
cn_list = assign_cn_to_ssm(some_meta, ssm_genomes_hg38)
cn_list$maf %>% dplyr::select(1:8, log.ratio, CN)
} # }
# Easiest/laziest way: Let the function obtain
# the seg_data and maf_data for you
# 1. get some metadata for a collection of samples
some_meta = suppressMessages(get_gambl_metadata()) %>%
  dplyr::filter(cohort == "DLBCL_ICGC") %>% head(3)
cn_list = assign_cn_to_ssm(these_samples_metadata = some_meta,
                           projection = "grch37")
#> dummy segments are not annotated in the inputs
#> fill_missing_with parameter will be ignored
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
#> Running in default mode of any...
cn_list$maf %>% dplyr::group_by(Tumor_Sample_Barcode, CN) %>%
  dplyr::count()
#> genomic_data Object
#> Genome Build: grch37
#> Showing first 10 rows:
#> Tumor_Sample_Barcode CN n
#> 1 SP124957 1.893249 17
#> 2 SP124957 2.000000 10263
#> 3 SP124957 2.112682 96
#> 4 SP124957 2.126767 59
#> 5 SP124957 2.144877 479
#> 6 SP124957 2.156021 149
#> 7 SP124957 3.000000 52
#> 8 SP124957 3.521206 344
#> 9 SP124957 3.551033 712
#> 10 SP124957 3.563755 145