calculate_pga
Returns a data frame with the estimated proportion of genome altered (PGA) for each sample.
calculate_pga(
this_seg,
seg_path,
projection = "grch37",
cutoff = 0.56,
exclude_sex = TRUE,
exclude_centromeres = TRUE
)
this_seg: Input data frame of a seg file.
seg_path: Optionally, the path to a local seg file.
projection: The projection of the seg file, which determines the chr prefix, chromosome coordinates, and genome size. Default is grch37, but hg38 is also accepted.
cutoff: The minimum log.ratio for a segment to be considered a CNV. Default is 0.56, which is 1 copy. This value is expected to be a positive float and applies to the log.ratio of both deletions and amplifications.
exclude_sex: Boolean specifying whether to exclude sex chromosomes from the calculation. Default is TRUE.
exclude_centromeres: Boolean specifying whether to exclude centromeres from the calculation. Default is TRUE.
A data frame with one row per sample, giving the sample ID and its PGA.
This function calculates the percent of genome altered (PGA) by CNV. It sums the total length of each sample's CNV segments and divides it by the total genome length to return the proportion affected by CNV. The input is expected to be a seg file, supplied either as a data frame or as a path to a local seg file. If a custom seg file is provided, the minimum required columns are sample, chrom, start, end, and log.ratio. The function works with both single-sample and multi-sample seg files. Telomeres are always excluded from the calculation; centromeres and sex chromosomes can be optionally included or excluded.
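The core arithmetic described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `sketch_pga()` and `toy_seg` are hypothetical names, the genome size is passed in by hand, the cutoff is assumed to be compared against the absolute log.ratio with `>=`, coordinates are assumed to be 1-based and inclusive, and the telomere, centromere, and sex chromosome exclusions are omitted entirely.

```r
# Hypothetical helper illustrating the PGA arithmetic only.
# It does NOT exclude telomeres, centromeres, or sex chromosomes.
sketch_pga <- function(seg, genome_size, cutoff = 0.56) {
  # Assumption: gains and losses are treated symmetrically by comparing
  # the absolute log.ratio against the positive cutoff.
  is_cnv <- abs(seg$log.ratio) >= cutoff
  altered <- seg[is_cnv, , drop = FALSE]

  # Segment length in base pairs (assuming 1-based inclusive coordinates).
  altered$length <- altered$end - altered$start + 1

  # Sum altered bases per sample; samples without CNV segments get 0.
  samples <- unique(seg$sample)
  totals <- vapply(
    samples,
    function(s) sum(altered$length[altered$sample == s]),
    numeric(1)
  )

  data.frame(
    sample_id = samples,
    PGA = round(unname(totals) / genome_size, 3),
    stringsAsFactors = FALSE
  )
}

# Toy multi-sample seg table with the minimum required columns.
toy_seg <- data.frame(
  sample = c("A", "A", "B"),
  chrom = c("1", "2", "1"),
  start = c(1, 1, 1),
  end = c(300, 200, 1000),
  log.ratio = c(0.9, -0.1, 0.2),
  stringsAsFactors = FALSE
)

sketch_pga(toy_seg, genome_size = 1000)
```

With this toy input, only the first segment of sample A passes the cutoff, so A gets a PGA of 0.3 and B gets 0, which mirrors how a sample with no qualifying segments still appears in the output.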
# Get the segments for one sample and rename the ID column to "sample"
sample_seg = get_sample_cn_segments(this_sample_id = "14-36022T")
sample_seg = dplyr::rename(sample_seg, "sample" = "ID")
calculate_pga(this_seg = sample_seg)
#> sample_id PGA
#> 1 14-36022T 0.871
# Include the sex chromosomes in the calculation
calculate_pga(this_seg = sample_seg,
              exclude_sex = FALSE)
#> sample_id PGA
#> 1 14-36022T 0.866
# Combine two samples into a multi-sample seg data frame
one_sample = get_sample_cn_segments(this_sample_id = "14-36022T")
one_sample = dplyr::rename(one_sample, "sample" = "ID")
another_sample = get_sample_cn_segments(this_sample_id = "BLGSP-71-21-00243-01A-11E")
another_sample = dplyr::rename(another_sample, "sample" = "ID")
multi_sample_seg = rbind(one_sample, another_sample)
calculate_pga(this_seg = multi_sample_seg)
#> sample_id PGA
#> 1 14-36022T 0.871
#> 2 BLGSP-71-21-00243-01A-11E 0.000
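When a custom seg table is used instead of one returned by `get_sample_cn_segments()`, it may help to confirm the required columns before calling `calculate_pga()`. The sketch below is an assumption-laden illustration: `check_seg_columns()` and `custom_seg` are hypothetical and not part of the package, and the rename is done in base R rather than with dplyr.

```r
# Hypothetical helper: verify that a custom seg table carries the
# minimum columns listed in the documentation.
check_seg_columns <- function(seg) {
  required <- c("sample", "chrom", "start", "end", "log.ratio")
  missing_cols <- setdiff(required, colnames(seg))
  if (length(missing_cols) > 0) {
    stop(
      "seg table is missing required columns: ",
      paste(missing_cols, collapse = ", ")
    )
  }
  invisible(seg)
}

# Made-up custom seg table that uses "ID" instead of "sample".
custom_seg <- data.frame(
  ID = c("my_sample", "my_sample"),
  chrom = c("1", "2"),
  start = c(100000, 500000),
  end = c(900000, 750000),
  log.ratio = c(0.8, -1.2),
  stringsAsFactors = FALSE
)

# Rename the ID column to the expected name, then validate.
names(custom_seg)[names(custom_seg) == "ID"] <- "sample"
checked <- check_seg_columns(custom_seg)

# With the package loaded, the validated table could then be passed on:
# calculate_pga(this_seg = checked)
```

Failing early with a clear message keeps a missing or misnamed column from surfacing later as a less obvious error inside the calculation.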