Get Manta SVs
get_manta_sv.Rd
Retrieve Manta SVs for one or many samples
Usage
get_manta_sv(
these_samples_metadata = NULL,
projection = "grch37",
region,
min_vaf = 0.1,
min_score = 40,
pass_filters = TRUE,
verbose = TRUE,
from_cache = TRUE,
write_to_file = FALSE,
chromosome,
qstart,
qend,
these_sample_ids = NULL,
pairing_status
)
Arguments
- these_samples_metadata
A metadata data frame used to limit the result to the sample_ids it contains.
- projection
The projection genome build. Default is grch37.
- region
A single region, in the format "chrom:start-end", within which the returned SVs must be anchored.
- min_vaf
The minimum tumour VAF for an SV to be returned. Default is 0.1.
- min_score
The lowest Manta somatic score for an SV to be returned. Default is 40.
- pass_filters
If TRUE (default), only return SVs annotated with PASS in the FILTER column. Set to FALSE to keep all variants, regardless of whether they pass the filters.
- verbose
Set to FALSE to minimize console output. Default is TRUE. This parameter also controls the verbosity of any helper functions called internally by the main function.
- from_cache
Boolean; whether to use cached results. Default is TRUE. If write_to_file = TRUE, this parameter is forced to FALSE.
- write_to_file
Boolean; if TRUE, the result is written to a bedpe file. Default is FALSE. Setting this to TRUE forces from_cache = FALSE.
- chromosome
DEPRECATED. Use region instead.
- qstart
DEPRECATED. Use region instead.
- qend
DEPRECATED. Use region instead.
- these_sample_ids
DEPRECATED. Use these_samples_metadata instead.
- pairing_status
DEPRECATED. Subset your metadata and supply it via these_samples_metadata instead.
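Code that still passes the deprecated chromosome, qstart and qend arguments needs to combine them into a single region string. The sketch below uses a hypothetical helper, region_from_deprecated(), which is not part of GAMBLR; it only assumes the "chrom:start-end" format documented for region.

```r
# Hypothetical helper (not a GAMBLR function): build the "chrom:start-end"
# string expected by `region` from the deprecated chromosome/qstart/qend values.
region_from_deprecated <- function(chromosome, qstart, qend) {
  stopifnot(length(chromosome) == 1, qstart <= qend)
  # %d with as.integer() avoids scientific notation such as "1.28e+08"
  sprintf("%s:%d-%d", chromosome, as.integer(qstart), as.integer(qend))
}

region_from_deprecated("8", 128723128, 128774067)
#> [1] "8:128723128-128774067"
```

The resulting string can then be supplied as get_manta_sv(region = ...).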
Value
A data frame in a bedpe-like format with additional columns that allow filtering of high-confidence SVs.
Details
Retrieve Manta SVs with additional VCF information to allow for
filtering of high-confidence variants.
To get SV calls for multiple samples, supply a metadata table via these_samples_metadata that has been subset to only those samples.
The results will be restricted to the sample_ids within that data frame.
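Conceptually, the restriction keeps only SVs whose tumour sample appears in the supplied metadata. The base R sketch below illustrates that idea on made-up data; it assumes the metadata's sample_id column is matched against the tumour_sample_id column seen in the example output, and it is not the package's actual implementation.

```r
# Toy SV table and metadata; all values are made up for illustration.
sv <- data.frame(
  tumour_sample_id = c("S1", "S2", "S3", "S1"),
  SCORE = c(46, 52, 58, 84)
)
meta <- data.frame(sample_id = c("S1", "S3"), cohort = c("A", "A"))

# Keep only SVs whose tumour sample is listed in the metadata
restricted <- sv[sv$tumour_sample_id %in% meta$sample_id, , drop = FALSE]
restricted
```

In practice the metadata is subset first (for example with dplyr::filter on cohort, as in the Examples) and then passed to get_manta_sv().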
This function relies on internal helper functions, including get_manta_sv_by_samples (used when from_cache = FALSE).
This function can also restrict the returned breakpoints to a genomic region specified via region (in chr:start-end format).
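The MYC example further down returns SVs whose first breakpoint lies on another chromosome, which suggests an SV is kept when either breakpoint falls in the region. The sketch below illustrates that reading with two hypothetical helpers, parse_region() and sv_in_region(), which are not GAMBLR functions. It treats a breakpoint as anchored when its START..END interval overlaps the region; the package's exact rule may differ.

```r
# Hypothetical helpers, not GAMBLR functions.
parse_region <- function(region) {
  m <- regmatches(region, regexec("^([^:]+):([0-9]+)-([0-9]+)$", region))[[1]]
  if (length(m) != 4) stop("region must look like 'chrom:start-end': ", region)
  list(chrom = m[2], start = as.numeric(m[3]), end = as.numeric(m[4]))
}

# Keep SVs where either breakpoint interval (START..END) overlaps the region.
sv_in_region <- function(sv, region) {
  r <- parse_region(region)
  hits <- function(chrom, start, end) chrom == r$chrom & start <= r$end & end >= r$start
  keep <- hits(sv$CHROM_A, sv$START_A, sv$END_A) | hits(sv$CHROM_B, sv$START_B, sv$END_B)
  sv[keep, , drop = FALSE]
}

# Toy breakpoints; the first row mirrors a chr2-chr8 SV from the examples.
sv <- data.frame(
  CHROM_A = c("chr2", "chr8", "chr1"),
  START_A = c(9700440, 127000000, 10286),
  END_A   = c(9700440, 127000010, 10286),
  CHROM_B = c("chr8", "chr8", "chr1"),
  START_B = c(127726024, 127100000, 20000),
  END_B   = c(127726024, 127100010, 20000)
)
sv_in_region(sv, "chr8:127710883-127761821")  # returns the first row only
```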
Useful filtering parameters are also available: use min_vaf to set the minimum tumour VAF for an SV to be returned, and min_score to set the lowest Manta somatic score for an SV to be returned.
In addition, the user can choose to return all variants, even the ones not passing the filter criteria. To do so, set pass_filters = FALSE (defaults to TRUE).
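The three filters can be pictured as a row filter on the returned table. The sketch below is a hypothetical re-implementation on toy data, not the package's code. It assumes the column names SCORE and VAF_tumour (as in the example output) and FILTER (as named under pass_filters), and it assumes both thresholds are inclusive. "MaxDepth" is used only as an illustrative non-PASS value.

```r
# Hypothetical re-implementation of the documented filters (not GAMBLR code).
filter_sv <- function(sv, min_vaf = 0.1, min_score = 40, pass_filters = TRUE) {
  keep <- sv$VAF_tumour >= min_vaf & sv$SCORE >= min_score
  if (pass_filters) keep <- keep & sv$FILTER == "PASS"
  sv[keep, , drop = FALSE]
}

# Toy table: row 2 fails min_score, row 3 fails min_vaf, row 4 is not PASS.
sv <- data.frame(
  SCORE = c(46, 35, 81, 56),
  VAF_tumour = c(0.118, 0.250, 0.050, 0.333),
  FILTER = c("PASS", "PASS", "PASS", "MaxDepth")
)
filter_sv(sv)                        # keeps row 1 only
filter_sv(sv, pass_filters = FALSE)  # keeps rows 1 and 4
```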
Advanced settings (probably not for you)
It is advised to leave from_cache at its default setting of TRUE, which ensures Manta results are pulled from a pre-generated merge (i.e. the cached result).
If set to FALSE in combination with write_to_file = TRUE, the function will (re)generate new merged Manta calls, provided the user has the required file permissions.
Note that if write_to_file is set to TRUE, the function sets from_cache = FALSE to avoid nonsensical parameter combinations.
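The rule that write_to_file = TRUE overrides from_cache can be summarized as a small argument-resolution step. The sketch below uses a hypothetical helper, resolve_cache_args(), that only mirrors the documented behaviour; it is not the package's internal code.

```r
# Hypothetical sketch of the documented rule (not GAMBLR code):
# write_to_file = TRUE forces from_cache = FALSE.
resolve_cache_args <- function(from_cache = TRUE, write_to_file = FALSE) {
  if (isTRUE(write_to_file)) {
    from_cache <- FALSE
  }
  list(from_cache = from_cache, write_to_file = write_to_file)
}

str(resolve_cache_args())                      # default: read the cached merge
str(resolve_cache_args(write_to_file = TRUE))  # regenerate and write the merge
```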
Is this function not what you are looking for? You may want:
get_combined_sv
After running this or get_combined_sv, you most likely want to annotate the result using GAMBLR.utils::annotate_sv.
Examples
# lazily get every SV in the table with default quality filters
all_sv <- get_manta_sv()
#> [1] "no metadata provided, fetching all samples..."
#> [1] "dropping capture samples because manta results\n are only available for genome seq_type"
#>
#> The cached results were last updated: 2025-02-24 16:06:43.114603
#>
#> Reading cached results...
#> [1] "No Manta SVs found for 327 samples and 13 cohorts"
#> [1] "DLBCL_LSARP_Trios" "tFL_LSARP_Trios" "pFL_LSARP_Trios"
#> [4] "FL_FOLL_BR" "DLBCL_TFRI_DarkZone" "DLBCL_Pasqualucci"
#> [7] "DLBCL_montreal" "DLBCL_Jain" "DLBCL_cell_lines"
#> [10] "MCL_CellLines" "cHL_Maura" "MM_mmsanger"
#> [13] "SMZL_Strefford"
#>
#> The following VCF filters are applied;
#> Minimum VAF: 0.1
#> Minimum Score: 40
#> Only keep variants passing the quality filter: TRUE
#>
#> Returning 789098 variants from 1664 sample(s)
#>
#> Done!
dplyr::select(all_sv,1:14) %>% head()
#> genomic_data Object
#> Genome Build: grch37
#> Showing first 10 rows:
#> CHROM_A START_A END_A CHROM_B START_B END_B
#> 1 1 10286 10286 8 146301391 146301391
#> 2 1 10309 10837 12 95038 95505
#> 3 1 10347 10630 15 102520227 102520676
#> 4 1 10438 10438 8 146301391 146301391
#> 5 1 10438 10438 8 146301391 146301391
#> 6 1 10457 10839 12 94873 95291
#> manta_name SCORE STRAND_A STRAND_B tumour_sample_id
#> 1 MantaBND:5:1923:1927:0:0:0 46 + + 09-41114T
#> 2 MantaBND:1:6049:6050:1:0:0 52 + + 4687-03-01BD
#> 3 MantaBND:11:3940:4135:0:0:0 58 - - 12-34927T
#> 4 MantaBND:2:7221:7224:0:1:0 84 + + 102-01-01TD
#> 5 MantaBND:2:1723:1728:0:0:0 81 + + 102-0202-1DVT
#> 6 MantaBND:3:26317:26320:0:0:0 56 + + 4690-03-01BD
#> normal_sample_id VAF_tumour DP
#> 1 14-11247N 0.118 110
#> 2 14-11247Normal 0.250 52
#> 3 14-11247N 0.135 104
#> 4 14-11247Normal 0.520 25
#> 5 14-11247Normal 0.630 27
#> 6 14-11247Normal 0.333 18
# get all SVs for just one cohort
cohort_meta = suppressMessages(get_gambl_metadata()) %>%
dplyr::filter(cohort == "DLBCL_cell_lines")
some_sv <- get_manta_sv(these_samples_metadata = cohort_meta, verbose=FALSE)
dplyr::select(some_sv,1:14) %>% head()
#> genomic_data Object
#> Genome Build: grch37
#> Showing first 10 rows:
#> CHROM_A START_A END_A CHROM_B START_B END_B manta_name
#> 1 1 963851 963870 1 964461 964461 MantaDEL:14848:0:0:0:0:0
#> 2 1 1142719 1142719 1 1143140 1143140 MantaDEL:14306:0:0:0:0:0
#> 3 1 1142719 1142719 1 1143140 1143140 MantaDEL:14173:0:0:0:0:0
#> 4 1 1142719 1142719 1 1143140 1143140 MantaDEL:11910:0:0:0:0:0
#> 5 1 1161716 1161716 1 1161780 1161780 MantaDEL:15361:0:0:0:0:0
#> 6 1 1161716 1161716 1 1161780 1161780 MantaDEL:11880:0:0:0:0:0
#> SCORE STRAND_A STRAND_B tumour_sample_id normal_sample_id VAF_tumour DP
#> 1 144 + - Toledo 14-11247N 0.923 26
#> 2 94 + - HBL-1 14-11247N 0.300 100
#> 3 81 + - SU-DHL-4 14-11247N 0.256 78
#> 4 55 + - SU-DHL-9 14-11247N 0.183 60
#> 5 48 + - HT 14-11247N 0.273 44
#> 6 58 + - MD903 14-11247N 0.471 34
nrow(some_sv)
#> [1] 21216
# get the SVs in a region around MYC
# WARNING: This is not the best way to find MYC SVs.
# Use annotate_sv on the full SV set instead.
myc_region_hg38 = "chr8:127710883-127761821"
myc_region_grch37 = "8:128723128-128774067"
hg38_myc_locus_sv <- get_manta_sv(region = myc_region_hg38,
projection = "hg38",
verbose = FALSE)
dplyr::select(hg38_myc_locus_sv,1:14) %>% head()
#> genomic_data Object
#> Genome Build: hg38
#> Showing first 10 rows:
#> CHROM_A START_A END_A CHROM_B START_B END_B
#> 1 chr2 9700440 9700440 chr8 127726024 127726024
#> 2 chr2 28983233 28983240 chr8 127711264 127711271
#> 3 chr2 88858802 88858802 chr8 127744262 127744262
#> 4 chr2 88860304 88860306 chr8 127751936 127751938
#> 5 chr2 88860417 88860417 chr8 127751955 127751955
#> 6 chr2 88861500 88861500 chr8 127748752 127748752
#> manta_name SCORE STRAND_A STRAND_B
#> 1 MantaBND:80035:1:8:0:0:0 103 + +
#> 2 MantaBND:3:52907:52908:0:3:0 43 - -
#> 3 MantaBND:279432:0:1:0:0:0 148 + +
#> 4 MantaBND:194837:0:1:0:0:0:0 102 + +
#> 5 MantaBND:194837:0:1:0:0:0:0 73 - -
#> 6 MantaBND:1102030:0:1:0:0:0 89 + +
#> tumour_sample_id normal_sample_id VAF_tumour DP
#> 1 BLGSP-71-06-00252-01A-01D BLGSP-71-06-00252-10A-01D 0.194 252
#> 2 02-14764_tumorB 02-14764_normal 0.109 55
#> 3 SP59344 SP59342 0.386 88
#> 4 BLGSP-71-27-00414-01A-01E BLGSP-71-27-00414-10A-01D 0.171 280
#> 5 BLGSP-71-27-00414-01A-01E BLGSP-71-27-00414-10A-01D 0.117 230
#> 6 BLGSP-71-30-00647-01A-01E BLGSP-71-06-00286-99A-01D 0.283 46
nrow(hg38_myc_locus_sv)
#> [1] 458
# Deliberately pass grch37 coordinates while requesting the hg38 projection
incorrect_myc_locus_sv <- get_manta_sv(region = myc_region_grch37,
projection = "hg38",
verbose = FALSE)
dplyr::select(incorrect_myc_locus_sv,1:14) %>% head()
#> genomic_data Object
#> Genome Build: hg38
#> Showing first 10 rows:
#> CHROM_A START_A END_A CHROM_B START_B END_B
#> 1 chr4 77227094 77227100 chr8 128767241 128767247
#> 2 chr8 1287381 1287381 chr8 1287384 1287384
#> 3 chr8 128726344 128727379 chr11 93629113 93629647
#> 4 chr8 128726820 128726820 chr8 128726825 128726825
#> 5 chr8 128726820 128726820 chr8 128726825 128726825
#> 6 chr8 128738979 128738983 chr8 128752584 128752588
#> manta_name SCORE STRAND_A STRAND_B tumour_sample_id
#> 1 MantaBND:658884:1:2:0:0:0 42 - - 14-33798_tumorB
#> 2 MantaINS:1063533:0:0:0:4:0 51 + - 97-28459_tumorB
#> 3 MantaBND:28037:1:9:0:0:0 66 - + 01-20774T
#> 4 MantaINS:242009:0:0:0:3:0 76 + - PD26403a
#> 5 MantaINS:226876:7:7:1:3:0 84 + - PD26403c
#> 6 MantaDEL:1407936:0:1:0:0:0 118 + - 04-14093_tumorA
#> normal_sample_id VAF_tumour DP
#> 1 14-33798_normal 0.136 44
#> 2 FL3006N 0.308 26
#> 3 14-11247N 0.280 25
#> 4 PD26403b 0.400 105
#> 5 PD26403b 0.407 113
#> 6 04-14093_normal 0.442 43
nrow(incorrect_myc_locus_sv)
#> [1] 28
# Despite potentially being incomplete, we can nonetheless
# annotate these directly for more details
annotated_myc_hg38 = suppressMessages(
GAMBLR.utils::annotate_sv(hg38_myc_locus_sv, genome_build = "hg38")
)
head(annotated_myc_hg38)
#> chrom1 start1 end1 chrom2 start2 end2 name score strand1
#> 1 2 28983233 28983240 8 127711264 127711271 . 43 -
#> 2 4 1746419 1746421 8 127723483 127723485 . 77 -
#> 3 8 127226860 127226862 8 127759782 127759784 . 56 +
#> 4 8 127226860 127226860 8 127759821 127759821 . 51 -
#> 5 8 127301019 127301020 8 127742838 127742839 . 71 -
#> 6 8 127301020 127301022 8 127742838 127742840 . 65 +
#> strand2 tumour_sample_id gene partner fusion
#> 1 - 02-14764_tumorB ALK <NA> NA-ALK
#> 2 - 09-41114T WHSC1 <NA> NA-WHSC1
#> 3 + SP13307 MYC <NA> NA-MYC
#> 4 - SP13307 MYC <NA> NA-MYC
#> 5 - 365-16-01TD MYC <NA> NA-MYC
#> 6 + 365-16-01TD MYC <NA> NA-MYC
table(annotated_myc_hg38$partner)
#>
#> BCL6 CCNL1 DMD IGH IGK IGL LRMP PAX5 RFTN1
#> 3 1 2 293 5 6 1 5 1
# The usual MYC partners are seen here
annotated_myc_incorrect = suppressMessages(
GAMBLR.utils::annotate_sv(incorrect_myc_locus_sv, genome_build = "hg38")
)
head(annotated_myc_incorrect)
#> chrom1 start1 end1 chrom2 start2 end2 name score strand1
#> 1 8 128726344 128727379 11 93629113 93629647 . 66 -
#> 2 8 128726820 128726820 8 128726825 128726825 . 76 +
#> 3 8 128726820 128726820 8 128726825 128726825 . 84 +
#> 4 8 128738979 128738983 8 128752584 128752588 . 118 +
#> 5 8 128738979 128738983 8 128752584 128752588 . 127 +
#> 6 8 128738981 128738981 8 128752584 128752584 . 126 +
#> strand2 tumour_sample_id gene partner fusion
#> 1 + 01-20774T MYC <NA> NA-MYC
#> 2 - PD26403a MYC <NA> NA-MYC
#> 3 - PD26403c MYC <NA> NA-MYC
#> 4 - 04-14093_tumorA MYC <NA> NA-MYC
#> 5 - 04-14093_tumorB MYC <NA> NA-MYC
#> 6 - 05-24065T MYC <NA> NA-MYC
table(annotated_myc_incorrect$partner)
#> < table of extent 0 >
# The effect of specifying coordinates from the wrong genome build is evident:
# no MYC partner genes are annotated
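One cheap guard against this mistake is to compare the region's chromosome naming with the requested projection: in the examples above, the hg38 region uses "chr"-prefixed names and the grch37 region does not. The sketch below uses a hypothetical helper, check_region_build(), which is not a GAMBLR function. It is only a heuristic: it treats "hg38" as the sole prefixed build and cannot detect wrong numeric coordinates written in the correct naming style.

```r
# Hypothetical heuristic (not a GAMBLR function): flag a region whose
# chromosome naming does not match the convention used for `projection`.
check_region_build <- function(region, projection) {
  has_chr <- grepl("^chr", region)
  expected_chr <- projection == "hg38"
  if (has_chr != expected_chr) {
    warning("region '", region, "' may not use ", projection, " coordinates")
    return(FALSE)
  }
  TRUE
}

check_region_build("chr8:127710883-127761821", "hg38")  # TRUE
check_region_build("8:128723128-128774067", "hg38")     # FALSE, with a warning
```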