Calculate proportion of genome altered by CNV.

calculate_pga returns a data frame with the estimated proportion of the genome altered in each sample.

calculate_pga(
  this_seg,
  seg_path,
  projection = "grch37",
  cutoff = 0.56,
  exclude_sex = TRUE,
  exclude_centromeres = TRUE
)

Arguments

this_seg: Data frame containing the contents of a seg file.
seg_path: Path to a local seg file. Optional; can be provided instead of this_seg.
projection: Genome build of the seg file, which determines the chromosome prefix, chromosome coordinates, and genome size. Default is grch37; hg38 is also accepted.
cutoff: The minimum log.ratio for a segment to be considered a CNV. Default is 0.56, which corresponds to 1 copy. Specify a positive number; the same threshold is used for both deletions and amplifications.
exclude_sex: Boolean argument specifying whether to exclude sex chromosomes from calculation. Default is TRUE.
exclude_centromeres: Boolean argument specifying whether to exclude centromeres from calculation. Default is TRUE.
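
To make the cutoff argument more concrete, the following sketch converts copy numbers to log ratios and flags which toy segments would pass the default threshold. It assumes the common convention log.ratio = log2(copy number / 2) for a diploid reference, and it assumes the cutoff is compared against the absolute value of log.ratio; neither is confirmed by this page. The helper log_ratio_from_cn and the toy values are hypothetical and are not part of the package.

```r
# Assumption: log.ratio = log2(copy_number / ploidy) with a diploid reference.
# log_ratio_from_cn is a hypothetical helper, not a package function.
log_ratio_from_cn = function(copy_number, ploidy = 2) {
  log2(copy_number / ploidy)
}

# Toy segments with made-up copy numbers (not real data).
toy = data.frame(copy_number = c(1, 2, 3, 4))
toy$log.ratio = log_ratio_from_cn(toy$copy_number)

cutoff = 0.56

# Assumption: a single positive cutoff is applied to the magnitude of
# log.ratio, so losses (negative) and gains (positive) are treated alike.
toy$is_cnv = abs(toy$log.ratio) >= cutoff

print(toy)
```

Under these assumptions, a one-copy loss (log ratio of -1) and a one-copy gain (log ratio of about 0.585) both pass the default cutoff of 0.56, while a copy-neutral segment (log ratio of 0) does not.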

Value

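A data frame with one row per sample and two columns: sample_id and PGA (the estimated proportion of the genome altered).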

Details

This function calculates the proportion of the genome altered (PGA) by CNV. It sums the length of each sample's CNV segments and relates it to the total genome length to return the proportion affected by CNV. The input is expected to be a data frame of seg data; alternatively, the path to a local seg file can be provided. If a custom seg file is provided, the minimum required columns are sample, chrom, start, end, and log.ratio. The function works with both single-sample and multi-sample seg files. Telomeres are always excluded from the calculation, while centromeres and sex chromosomes can each be included or excluded.
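
The calculation described above can be illustrated with a minimal base R sketch. This is not the package's implementation: it ignores the exclusion of telomeres, centromeres, and sex chromosomes, uses a made-up genome size, treats segment coordinates as inclusive, and assumes the cutoff is applied to the absolute log.ratio. The function sketch_pga and all values are hypothetical.

```r
# Toy multi-sample seg table with the minimum required columns.
# All values are made up for illustration.
toy_seg = data.frame(
  sample    = c("A", "A", "A", "B"),
  chrom     = c("1", "1", "2", "1"),
  start     = c(1, 501, 1, 1),
  end       = c(500, 1000, 400, 1000),
  log.ratio = c(0.9, 0.1, -1.2, 0.05)
)

toy_genome_size = 2000  # hypothetical genome length, not a real build size
cutoff = 0.56

# sketch_pga is a hypothetical helper, not the package function.
sketch_pga = function(seg, genome_size, cutoff) {
  # Segment length, assuming inclusive start and end coordinates.
  seg$length = seg$end - seg$start + 1
  # Count a segment's length only if its |log.ratio| reaches the cutoff.
  seg$altered_length = ifelse(abs(seg$log.ratio) >= cutoff, seg$length, 0)
  # Sum altered lengths per sample, then divide by the genome size.
  out = aggregate(altered_length ~ sample, data = seg, FUN = sum)
  data.frame(
    sample_id = as.character(out$sample),
    PGA = round(out$altered_length / genome_size, 3)
  )
}

toy_pga = sketch_pga(toy_seg, toy_genome_size, cutoff)
print(toy_pga)
```

In this toy input, sample A has 900 altered positions out of 2000 (PGA of 0.45), and sample B has no segment reaching the cutoff (PGA of 0). Samples with no qualifying segments are kept in the result because their lengths are zeroed rather than filtered out.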

Examples

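# Get the segments for one sample and rename the ID column to sample,
# the column name calculate_pga expects
sample_seg = get_sample_cn_segments(this_sample_id = "14-36022T")
sample_seg = dplyr::rename(sample_seg, "sample" = "ID")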

calculate_pga(this_seg = sample_seg)
#>   sample_id   PGA
#> 1 14-36022T 0.871

# Include sex chromosomes in the calculation
calculate_pga(this_seg = sample_seg,
              exclude_sex = FALSE)
#>   sample_id   PGA
#> 1 14-36022T 0.866

# Build a multi-sample seg data frame from two samples
one_sample = get_sample_cn_segments(this_sample_id = "14-36022T")
one_sample = dplyr::rename(one_sample, "sample" = "ID")

another_sample = get_sample_cn_segments(this_sample_id = "BLGSP-71-21-00243-01A-11E")
another_sample = dplyr::rename(another_sample, "sample" = "ID")

multi_sample_seg = rbind(one_sample, another_sample)

calculate_pga(this_seg = multi_sample_seg)
#>                   sample_id   PGA
#> 1                 14-36022T 0.871
#> 2 BLGSP-71-21-00243-01A-11E 0.000
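
The examples above do not cover the seg_path argument, so here is a hedged sketch of preparing a custom seg file with the minimum required columns and checking it before use. The toy values are made up, and writing the file as tab-delimited text is an assumption about the expected seg format. The final call runs only if calculate_pga is available in the session; no output is shown because none was verified.

```r
# Toy seg table with the minimum required columns (made-up values).
custom_seg = data.frame(
  sample    = c("toy_sample", "toy_sample"),
  chrom     = c("1", "2"),
  start     = c(1000000, 2000000),
  end       = c(5000000, 9000000),
  log.ratio = c(0.8, -0.1)
)

# Assumption: a seg file is tab-delimited text with a header row.
seg_file = tempfile(fileext = ".seg")
write.table(custom_seg, seg_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Check that the file on disk has every required column before using it.
required_cols = c("sample", "chrom", "start", "end", "log.ratio")
from_disk = read.delim(seg_file, stringsAsFactors = FALSE)
missing_cols = setdiff(required_cols, colnames(from_disk))
stopifnot(length(missing_cols) == 0)

# Only call the function if it is available in the current session.
if (exists("calculate_pga")) {
  print(calculate_pga(seg_path = seg_file))
}
```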