adjust_ploidy returns seg-format data with log.ratio values adjusted to the overall sample ploidy.

adjust_ploidy(
  this_seg,
  seg_path,
  projection = "grch37",
  pga,
  pga_cutoff = 0.05,
  exclude_sex = TRUE,
  return_seg = TRUE
)

Arguments

this_seg

Input data frame in seg format.

seg_path

Optionally, specify the path to a local seg file.

projection

Argument specifying the projection of the seg file, which determines the chromosome prefix and genome size. Default is grch37, but hg38 is also accepted.

pga

If PGA was calculated from another source, a data frame with the columns sample_id and PGA can be provided in this argument.

pga_cutoff

Minimum PGA required for a sample's ploidy to be adjusted. Default is 0.05 (5%).

exclude_sex

Boolean argument specifying whether to exclude sex chromosomes from the calculation. Default is TRUE.

return_seg

Boolean argument specifying whether to return a data frame in seg-consistent format, or a raw data frame with all step-by-step transformations. Default is TRUE.

Value

A data frame in seg-consistent format with ploidy-adjusted log ratios. If return_seg = FALSE, a raw data frame with all step-by-step transformations is returned instead.

Details

This function adjusts the ploidy of each sample using the percent of genome altered (PGA). PGA is calculated internally, but can optionally be provided as a data frame if it was calculated from another source. Only samples with PGA above pga_cutoff have their ploidy adjusted. The function works with either a single-sample or a multi-sample seg file. Telomeres are always excluded from the calculation, and sex chromosomes can optionally be included or excluded. The supported projections are grch37 and hg38. The chromosome prefix is handled internally according to the projection, so it does not need to be consistent in the input.
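
To make the role of PGA and pga_cutoff more concrete, the following base R sketch computes a per-sample PGA from a toy seg-like data frame and flags which samples would pass the cutoff. It is only an illustration under stated assumptions, not the calculation used inside adjust_ploidy: the helper toy_pga and the toy data are hypothetical, the denominator is the total length of the segments present rather than the projection's genome size, and telomeres are not handled.

```r
# Toy seg-like data frame; all values are made up for illustration.
toy_seg <- data.frame(
  sample    = c("A", "A", "A", "B", "B"),
  chrom     = c("1", "1", "2", "1", "2"),
  start     = c(1, 101, 1, 1, 1),
  end       = c(100, 200, 100, 200, 100),
  log.ratio = c(0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): fraction of covered
# bases that sit in segments with a non-zero log.ratio, per sample.
toy_pga <- function(seg, exclude_sex = TRUE) {
  if (exclude_sex) {
    # Drop sex chromosomes, with or without a "chr" prefix.
    seg <- seg[!gsub("^chr", "", seg$chrom) %in% c("X", "Y"), ]
  }
  seg$length <- seg$end - seg$start + 1
  altered <- tapply(seg$length * (seg$log.ratio != 0), seg$sample, sum)
  covered <- tapply(seg$length, seg$sample, sum)
  # Same column names that the pga argument is documented to expect.
  data.frame(
    sample_id = names(covered),
    PGA = as.numeric(altered / covered),
    stringsAsFactors = FALSE
  )
}

pga <- toy_pga(toy_seg)

# Assumption: samples strictly above the cutoff are the ones adjusted.
pga_cutoff <- 0.05
pga$adjust <- pga$PGA > pga_cutoff
print(pga)
```

A data frame shaped like the first two columns of pga (sample_id and PGA) is what the pga argument is documented to accept when PGA comes from another source.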

Examples

sample_seg = get_sample_cn_segments(this_sample_id = "14-36022T")
sample_seg = dplyr::rename(sample_seg, "sample" = "ID")

adjust_ploidy(this_seg = sample_seg)
#> Calculating PGA ...
#> Returning the seg file with ploidy-adjusted CN ...
#> # A tibble: 187 × 6
#>    sample    chrom     start       end LOH_flag log.ratio
#>    <chr>     <chr>     <dbl>     <dbl>    <dbl>     <dbl>
#>  1 14-36022T 1         10001    762600        0         0
#>  2 14-36022T 1        762601 121500000        0         0
#>  3 14-36022T 1     142600000 248277662        0         0
#>  4 14-36022T 1     248277663 248278622        0         0
#>  5 14-36022T 1     248278623 249226346        0        -1
#>  6 14-36022T 1     249226347 249250620        0         0
#>  7 14-36022T 2         10001     11319        0         0
#>  8 14-36022T 2         11320  90500000        0         0
#>  9 14-36022T 2      96800000 186704965        0         0
#> 10 14-36022T 2     186704966 186712276        0         0
#> # ℹ 177 more rows

one_sample = get_sample_cn_segments(this_sample_id = "14-36022T")
one_sample = dplyr::rename(one_sample, "sample" = "ID")

another_sample = get_sample_cn_segments(this_sample_id = "BLGSP-71-21-00243-01A-11E")
another_sample = dplyr::rename(another_sample, "sample" = "ID")

multi_sample_seg = rbind(one_sample, another_sample)

adjust_ploidy(this_seg = multi_sample_seg)
#> Calculating PGA ...
#> Returning the seg file with ploidy-adjusted CN ...
#> # A tibble: 278 × 6
#>    sample    chrom     start       end LOH_flag log.ratio
#>    <chr>     <chr>     <dbl>     <dbl>    <dbl>     <dbl>
#>  1 14-36022T 1         10001    762600        0         0
#>  2 14-36022T 1        762601 121500000        0         0
#>  3 14-36022T 1     142600000 248277662        0         0
#>  4 14-36022T 1     248277663 248278622        0         0
#>  5 14-36022T 1     248278623 249226346        0        -1
#>  6 14-36022T 1     249226347 249250620        0         0
#>  7 14-36022T 2         10001     11319        0         0
#>  8 14-36022T 2         11320  90500000        0         0
#>  9 14-36022T 2      96800000 186704965        0         0
#> 10 14-36022T 2     186704966 186712276        0         0
#> # ℹ 268 more rows