Get CNV and coding SSM combined status
get_cnv_and_ssm_status.Rd
For each specified chromosome region (gene name), return status 1 if the copy number (CN) state is non-neutral, i.e. different from 2, or if the region contains any coding simple somatic mutation (SSM).
Usage
get_cnv_and_ssm_status(
  genes_and_cn_threshs,
  these_samples_metadata,
  maf_df,
  seg_data,
  cn_matrix,
  only_cnv = "none",
  genome_build = "grch37",
  include_hotspots = TRUE,
  review_hotspots = TRUE,
  adjust_for_ploidy = TRUE,
  include_silent = FALSE,
  this_seq_type,
  verbose = FALSE
)
Arguments
- genes_and_cn_threshs
A data frame with columns "gene_id" and "cn_thresh". The "gene_id" column stores gene symbols (characters) that determine the regions for which CNV and/or coding SSM status is returned. The "cn_thresh" column stores integers giving the maximum or minimum CN state that returns status 1 (contains CNV) for the respective gene. If this integer is below 2 (the neutral CN state for diploids), it is taken as the maximum (the gene is considered a tumor suppressor); if above 2, it is taken as the minimum (oncogene); if equal to 2, CNV is not considered when returning the status.
- these_samples_metadata
The metadata for samples of interest to be included in the returned matrix. Can be created with the get_gambl_metadata function.
- maf_df
Optional data frame containing the coding variants for your samples (i.e. output from get_all_coding_ssm).
- seg_data
Optionally provide the function with a data frame of segments that will be used instead of the GAMBL flatfiles
- cn_matrix
Instead of seg_data, you can provide a matrix of CN values for the samples in the metadata. See GAMBLR.utils::segmented_data_to_cn_matrix for more information on how to create this matrix.
- only_cnv
A vector of gene names indicating the genes for which only CNV status should be considered, ignoring SSM status. Set this argument to "all" or "none" (default) to apply this behavior to all or none of the genes, respectively.
- genome_build
Reference genome build. Possible values are "grch37" (default) or "hg38".
- include_hotspots
Logical parameter indicating whether hotspots object should also be tabulated. Default is TRUE.
- review_hotspots
Logical parameter indicating whether hotspots object should be reviewed to include functionally relevant mutations or rare lymphoma-related genes. Default is TRUE.
- adjust_for_ploidy
Set to FALSE to disable scaling of CN values by the genome-wide average per sample.
- include_silent
Set to TRUE if you want synonymous mutations to also be considered.
- this_seq_type
Deprecated
- verbose
Set to TRUE for more text.
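The cn_thresh rule described for genes_and_cn_threshs can be sketched in a few lines of base R. This is only an illustration of the documented rule, not the package's implementation: the helper name cnv_status_from_thresh is hypothetical, and treating the threshold as inclusive is an assumption based on the words "maximum" and "minimum" in the argument description.

```r
# Hypothetical helper, NOT part of GAMBLR: applies the documented cn_thresh
# rule to a vector of CN states for one gene.
cnv_status_from_thresh <- function(cn, cn_thresh) {
  if (cn_thresh < 2) {
    # Below neutral: threshold is a maximum (loss, tumor suppressor).
    # Inclusive comparison is an assumption.
    as.integer(cn <= cn_thresh)
  } else if (cn_thresh > 2) {
    # Above neutral: threshold is a minimum (gain, oncogene).
    # Inclusive comparison is an assumption.
    as.integer(cn >= cn_thresh)
  } else {
    # Equal to 2: CNV is not considered for this gene.
    rep(0L, length(cn))
  }
}

cn_states <- c(0, 1, 2, 3, 5)
cnv_status_from_thresh(cn_states, 1)  # 1 1 0 0 0
cnv_status_from_thresh(cn_states, 3)  # 0 0 0 1 1
cnv_status_from_thresh(cn_states, 2)  # 0 0 0 0 0
```

With cn_thresh equal to 2, the CNV contribution is always 0, so the final status for that gene depends on coding SSMs alone.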
Details
For each region, the user can choose whether the returned status reflects only copy number
variation (CNV), only coding SSMs, or the presence of at least one of the two. This behavior
is controlled by the arguments genes_and_cn_threshs (column cn_thresh) and only_cnv.
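How the two controls interact can be illustrated with toy data. The sketch below assumes two 0/1 matrices (samples in rows, genes in columns), one for CNV status and one for coding SSM status; the function name combine_status and all values are made up for illustration and do not reflect the package internals.

```r
# Toy 0/1 status matrices (samples x genes); values are invented.
genes   <- c("MYC", "MIR17HG", "CCND3")
samples <- c("s1", "s2")
cnv <- matrix(c(1, 0,   # MYC
                1, 0,   # MIR17HG
                0, 0),  # CCND3 (cn_thresh = 2, so CNV never contributes)
              nrow = 2, dimnames = list(samples, genes))
ssm <- matrix(c(0, 1,   # MYC
                0, 1,   # MIR17HG
                1, 0),  # CCND3
              nrow = 2, dimnames = list(samples, genes))

# Hypothetical helper: status is 1 if either CNV or SSM is present, except
# for genes listed in only_cnv, where the SSM status is ignored.
combine_status <- function(cnv, ssm, only_cnv = "none") {
  stopifnot(identical(dimnames(cnv), dimnames(ssm)))
  combined <- pmax(cnv, ssm)  # element-wise "CNV or SSM"
  if (identical(only_cnv, "all")) {
    cnv_only_genes <- colnames(cnv)
  } else if (identical(only_cnv, "none")) {
    cnv_only_genes <- character(0)
  } else {
    cnv_only_genes <- intersect(only_cnv, colnames(cnv))
  }
  if (length(cnv_only_genes) > 0) {
    combined[, cnv_only_genes] <- cnv[, cnv_only_genes, drop = FALSE]
  }
  combined
}

combined <- combine_status(cnv, ssm, only_cnv = "MIR17HG")
combined
```

In this toy case, sample s2 has status 1 for MYC because of its SSM, but status 0 for MIR17HG because that gene is restricted to CNV.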
This function internally calls the get_cn_states, get_ssm_by_samples and get_coding_ssm_status
functions. Therefore, many of its arguments are passed on to these functions. If needed, see the
documentation of these functions for more information.
NA values in the output indicate that the get_cn_segments function was not able to internally
return any copy number segments for the specified chromosome region.
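Because NA values can appear in the result, it may be worth checking for them before summarizing. The following base R sketch uses an invented status matrix standing in for the function's output; the sample and gene labels are placeholders.

```r
# Invented status matrix standing in for the function's return value.
status <- matrix(c(1, NA, 0,    # MYC
                   0, NA, 1),   # SYNCRIP
                 nrow = 3,
                 dimnames = list(c("s1", "s2", "s3"), c("MYC", "SYNCRIP")))

# Number of NA entries per gene
na_per_gene <- colSums(is.na(status))

# Samples with at least one NA entry
samples_with_na <- rownames(status)[rowSums(is.na(status)) > 0]

# Per-gene counts of status 1, ignoring NA entries
counts <- colSums(status, na.rm = TRUE)
```

Without na.rm = TRUE, colSums would return NA for any gene that has a missing value in at least one sample.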
Examples
# Get sample metadata including a mix of seq_type
all_types_meta = suppressMessages(get_gambl_metadata()) %>%
  dplyr::filter(pathology == "BL")
dplyr::group_by(all_types_meta, seq_type) %>%
  dplyr::summarize(n = dplyr::n())
#> # A tibble: 3 × 2
#>   seq_type     n
#>   <chr>    <int>
#> 1 capture    174
#> 2 genome     259
#> 3 mrna       279
# For MYC and SYNCRIP, return CNV and SSM combined status; for MIR17HG,
# return only CNV status; for CCND3, ID3 and DDX3X (cn_thresh = 2),
# return only SSM status
genes_and_cn_threshs = data.frame(
  gene_id = c("MYC", "MIR17HG", "CCND3", "ID3", "DDX3X", "SYNCRIP"),
  cn_thresh = c(3, 3, 2, 2, 2, 1)
)
genome_cnv_ssm_status = suppressMessages(get_cnv_and_ssm_status(
  genes_and_cn_threshs,
  dplyr::filter(all_types_meta, seq_type == "genome"),
  only_cnv = "MIR17HG"))
print(dim(genome_cnv_ssm_status))
#> [1] 259 6
head(genome_cnv_ssm_status)
#>                           MYC MIR17HG CCND3 ID3 DDX3X SYNCRIP
#> BLGSP-71-06-00001-01A-11D   0       0     1   1     1       0
#> BLGSP-71-06-00002-01C-01D   1       0     0   1     0       0
#> BLGSP-71-06-00004-01A-11D   0       0     1   1     1       0
#> BLGSP-71-06-00005-01A-21D   0       0     1   1     0       0
#> BLGSP-71-06-00007-01A-11D   1       0     1   1     0       0
#> BLGSP-71-06-00008-01A-11D   0       0     0   0     0       0
colSums(genome_cnv_ssm_status)
#>     MYC MIR17HG   CCND3     ID3   DDX3X SYNCRIP
#>     187      47      76     120     122      14
all_seq_type_status = suppressMessages(get_cnv_and_ssm_status(
  genes_and_cn_threshs,
  all_types_meta,
  only_cnv = "MIR17HG"))
print(dim(all_seq_type_status))
#> [1] 433 6
head(all_seq_type_status)
#>                           MYC MIR17HG CCND3 ID3 DDX3X SYNCRIP
#> BLGSP-71-06-00001-01A-11D   0       0     1   1     1       0
#> BLGSP-71-06-00002-01C-01D   1       0     0   1     0       0
#> BLGSP-71-06-00004-01A-11D   0       0     1   1     1       0
#> BLGSP-71-06-00005-01A-21D   0       0     1   1     0       0
#> BLGSP-71-06-00007-01A-11D   1       0     1   1     0       0
#> BLGSP-71-06-00008-01A-11D   0       0     0   0     0       0
colSums(all_seq_type_status)
#>     MYC MIR17HG   CCND3     ID3   DDX3X SYNCRIP
#>     288      68     120     197     192      26