If you are viewing this page on GitHub, consider clicking this link to go to the GAMBLR webpage and learn more about this package.
If you have access to gphost, the easiest way to obtain and run GAMBLR is via RStudio on a gphost. If you do not have access to gphost, please refer to the Run Remote On A Local Machine section. Assuming you are running RStudio on gphost, clone the repo to your home directory (not your GAMBL working directory).
git clone git@github.com:morinlab/GAMBLR.git ~/GAMBLR-master
In RStudio (on a gphost), set your working directory to the folder where you just cloned the repo.
setwd("~/GAMBLR-master")
Install the package in R by running the following command (requires the devtools package):
devtools::install()
If you don’t have access to gphost on GSC, no worries: you can still execute GAMBLR functions in another way. Remote support was developed for this purpose. This section explains how to run GAMBLR remotely on a local machine (i.e. on your own computer). There are two different approaches to get this to work, each with its own advantages and limitations. Both are described below.
This section details how to deploy GAMBLR with limited functionality. This approach requires either a working GSC VPN connection or a direct connection to the GSC network.
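Create a directory for your git repos and clone the gambl and GAMBLR repositories into it:
mkdir ~/git_repos
cd ~/git_repos # set as working directory
git clone https://github.com/morinlab/gambl
git clone https://github.com/morinlab/GAMBLR GAMBLR-master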
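Edit the remote section of config.yml in your local GAMBLR repo so that project_base points to your local GAMBL results directory and repo_base points to your local gambl repo:
remote:
  project_base: "/path/to/your/local/gambl_results_directory/"
  repo_base: "/path/to/your/local/gambl_repo/"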
In R, install GAMBLR from your local repo and load it:
setwd("~/git_repos/GAMBLR-master")
devtools::install()
library(GAMBLR)
Activate the remote configuration:
Sys.setenv(R_CONFIG_ACTIVE = "remote")
Test GAMBLR, e.g. by running get_gambl_metadata() to retrieve metadata for all GAMBL samples:
get_gambl_metadata() %>%
  head()
This section details how to obtain GAMBLR with full functionality, using a dedicated Snakemake file to retrieve all necessary files and dependencies.
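Create a directory for your git repos and clone the gambl and GAMBLR repositories into it:
mkdir ~/git_repos
cd ~/git_repos
git clone https://github.com/morinlab/gambl gambl-master
git clone https://github.com/morinlab/GAMBLR GAMBLR-master
Create a local directory for the GAMBL results:
mkdir ~/gambl_results/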
Modify the remote section in your local config.yml (GAMBLR) to point to the recently cloned, local gambl folder (repo_base) and the recently created gambl_results folder (project_base). For example:
remote:
  project_base: "~/gambl_results/"
  repo_base: "~/git_repos/gambl-master/"
Copy config.yml and get_gambl_results.smk from your local GAMBLR repo to ~/gambl_results:
cp ~/git_repos/GAMBLR-master/config.yml ~/gambl_results/
cp ~/git_repos/GAMBLR-master/get_gambl_results.smk ~/gambl_results/
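In the terminal you will run Snakemake from, set the following environment variables to your GSC username, the path to your passphrase-protected SSH key, and the key's passphrase:
export GSC_USERNAME="your_gsc_username"
export GSC_KEY="path_to_SSH_key_with_passphrase_from_step_1"
export GSC_PASSPHRASE="passphrase_from_step_1"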
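Before running the snakefile, you may want to confirm that these variables are visible in your current shell. The snippet below is a minimal sketch and is not part of GAMBLR or get_gambl_results.smk: check_gsc_env is a hypothetical helper that only checks that the three variables are set and that GSC_KEY points to a readable file. It cannot tell whether the passphrase itself is correct.

```shell
# Minimal sketch, assuming a POSIX shell (sh, bash or zsh).
# check_gsc_env is a hypothetical helper; it is not part of GAMBLR or get_gambl_results.smk.
# It only checks that the variables are set and that the key file is readable;
# it does not test whether the passphrase is correct.
check_gsc_env() {
  gsc_status=0
  if [ -z "${GSC_USERNAME:-}" ]; then
    echo "GSC_USERNAME is not set" >&2
    gsc_status=1
  fi
  if [ -z "${GSC_PASSPHRASE:-}" ]; then
    echo "GSC_PASSPHRASE is not set" >&2
    gsc_status=1
  fi
  if [ -z "${GSC_KEY:-}" ]; then
    echo "GSC_KEY is not set" >&2
    gsc_status=1
  elif [ ! -r "$GSC_KEY" ]; then
    echo "GSC_KEY does not point to a readable file: $GSC_KEY" >&2
    gsc_status=1
  fi
  return "$gsc_status"
}

# Usage:
# check_gsc_env && echo "GSC variables look set"
```

A quoted ~ is not expanded by the shell (export GSC_KEY="~/.ssh/key" stores a literal ~), so this check reports such a path as unreadable; using "$HOME/..." or an absolute path avoids that.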
In R, install GAMBLR from your local repo:
setwd("~/git_repos/GAMBLR-master")
devtools::install()
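In the terminal where you exported the GSC variables, move to ~/gambl_results, then create and activate the conda environment used for the sync:
cd ~/gambl_results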
conda env create --name snakemake_gambl --file ~/git_repos/GAMBLR-master/get_gambl_results.yml
conda activate snakemake_gambl
Run the Snakemake file. We recommend using --cores 1 for this, since it seems to be the more stable option. In addition, if your sync gets interrupted, you only need to restart syncing one file, rather than several when running on multiple cores.
snakemake -s get_gambl_results.smk --cores 1
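Since a single-core sync only needs to redo one file after an interruption, simply re-running the same command is usually enough. The sketch below automates that. retry is a hypothetical helper (not part of GAMBLR or Snakemake), while --rerun-incomplete is a standard Snakemake option that re-runs jobs whose output was left incomplete by an earlier run.

```shell
# Minimal sketch, assuming a POSIX shell.
# retry is a hypothetical helper; it is not part of GAMBLR or Snakemake.
# It re-runs a command until it exits with status 0 or max_attempts is reached.
retry() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed: $*" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage, from ~/gambl_results with the snakemake_gambl environment active.
# --rerun-incomplete asks Snakemake to redo jobs whose output was left incomplete.
# retry 3 snakemake -s get_gambl_results.smk --cores 1 --rerun-incomplete
```

If a hard interruption leaves the working directory locked, Snakemake refuses to start until the lock is removed, which snakemake -s get_gambl_results.smk --unlock does; the loop above does not handle that case.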
Finally, in R, activate the remote configuration, check your GAMBLR config, and test GAMBLR:
Sys.setenv(R_CONFIG_ACTIVE = "remote")
check_gamblr_config()
get_gambl_metadata() %>%
  head()
Note: if you see the following message when trying to use GAMBLR, please ensure that config.yml and the local gambl repo are set up properly (see the remote config.yml steps above) and/or remember to load the remote configuration (i.e. Sys.setenv(R_CONFIG_ACTIVE = "remote")).
get_gambl_metadata(seq_type_filter = "capture") %>%
  pull(cohort) %>%
  table()
Error: '/projects/rmorin/projects/gambl-repos/gambl-rmorin/data/metadata/gambl_all_outcomes.tsv' does not exist.
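An error like this means GAMBLR tried to read a file that does not exist on your machine. The path in this example looks like a GSC server path, which is what you would expect if the remote configuration was not loaded. If it is loaded and you still get a similar error, it can help to check that the directories in the remote section of config.yml exist. The sketch below does this from the shell, assuming config.yml uses a simple layout (unindented section names such as remote:, keys indented beneath them as key: "value", no inline comments). remote_config_value and check_remote_paths are hypothetical helpers, not GAMBLR functions.

```shell
# Minimal sketch, assuming a POSIX shell and a simple config.yml layout:
# unindented section names (default:, remote:), indented key: "value" lines,
# and no inline comments. remote_config_value and check_remote_paths are
# hypothetical helpers, not GAMBLR functions.

# Print the value of a key ($2) from the remote section of a config file ($1).
remote_config_value() {
  awk -v key="$2" '
    # an unindented, non-comment line starts a new top-level section
    /^[^[:space:]#]/ { in_remote = ($0 ~ /^remote:/); next }
    in_remote && $1 == (key ":") {
      sub(/^[[:space:]]*[^:]*:[[:space:]]*/, "")  # drop the "key:" prefix
      gsub(/"/, "")                                # drop surrounding quotes
      print
      exit
    }
  ' "$1"
}

# Report whether project_base and repo_base from the remote section exist.
check_remote_paths() {
  cfg_file=$1
  rc=0
  for key in project_base repo_base; do
    dir=$(remote_config_value "$cfg_file" "$key")
    # a "~" read from a file is not expanded by the shell, so expand it here
    case "$dir" in
      "~"*) dir="$HOME${dir#\~}" ;;
    esac
    if [ -z "$dir" ]; then
      echo "remote:$key not found in $cfg_file" >&2
      rc=1
    elif [ ! -d "$dir" ]; then
      echo "remote:$key points to a missing directory: $dir" >&2
      rc=1
    else
      echo "remote:$key OK ($dir)"
    fi
  done
  return "$rc"
}

# Usage (full-functionality setup):
# check_remote_paths ~/git_repos/GAMBLR-master/config.yml
```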
As GAMBL users (GAMBLRs, so to speak) rely on the functionality of this package, the master branch is protected. All commits must be submitted via a pull request from a separate branch. Please refer to the GAMBL documentation for details on how to do this.
When designing new functions, please refer to the guidelines and best practices detailed here. For your convenience, here is an empty function skeleton that can be recycled when designing new GAMBLR functions. Always provide the required documentation for any new function. See this section for more details on best practices for documenting R functions. Unsure what information goes where in function documentation? Here is a brief outline of what the different sections should include. For more information, see this.
The title is taken from the first sentence. It should be written in sentence case, not end in a full stop, and be followed by a blank line. The title is shown in various function indexes (e.g. help(package = "somepackage")) and is what the user will usually see when browsing multiple functions.
The description is taken from the next paragraph. It’s shown at the top of documentation and should briefly describe the most important features of the function.
Additional details are anything after the description. Details are optional but can be of any length, so they are useful if you want to dig deep into some important aspect of the function. Note that, even though the details come right after the description in the roxygen block, they appear much later in the rendered documentation.
Detailed parameter descriptions should be included for all functions. Remember to state the required data type, the default value, whether the parameter is required or optional, etc.
Always import every package that you call functions from, other than base R and the R packages that are loaded by default. Do not import tidyverse; instead, import the individual tidyverse packages that the function depends on.
Should this function be exported to NAMESPACE (i.e. made directly accessible to anyone who loads GAMBLR), or is it an internal/helper function (i.e. not exported)?
Please provide fully reproducible examples for the function. Ideally, the examples should demonstrate basic usage as well as more advanced usage with different parameter combinations. Note that example lines must not exceed 100 characters, since longer lines are truncated in the rendered PDF manual.
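Two of these documentation guidelines can be checked mechanically: @import should not list tidyverse, and the title should not end in a full stop. The grep-based sketch below flags both. check_roxygen is a hypothetical helper, not part of GAMBLR, roxygen2 or R CMD check, and it only inspects single-line @import and @title tags.

```shell
# Minimal sketch, assuming a POSIX shell and grep with -E support.
# check_roxygen is a hypothetical helper, not part of GAMBLR, roxygen2 or R CMD check.
# It only inspects single-line tags written as "#' @tag ..." and flags:
#   1. @import lines that list the tidyverse meta-package
#   2. @title lines that end with a full stop
check_roxygen() {
  lint_status=0
  import_re="^#'[[:space:]]*@import([[:space:]]+[[:alnum:].]+)*[[:space:]]+tidyverse([[:space:]]|$)"
  title_re="^#'[[:space:]]*@title.*\.[[:space:]]*$"
  if grep -nHE "$import_re" "$@"; then
    echo "^ import the individual tidyverse packages (e.g. dplyr) instead" >&2
    lint_status=1
  fi
  if grep -nHE "$title_re" "$@"; then
    echo "^ titles should not end with a full stop" >&2
    lint_status=1
  fi
  return "$lint_status"
}

# Usage, from the root of the GAMBLR repo:
# check_roxygen R/*.R
```

Each offending line is printed as file:line:text, and the function returns a non-zero status if anything was found, so it can be chained with && in a pre-commit step.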
#' @title
#'
#' @description
#'
#' @details
#'
#' @param a_parameter
#' @param another_parameter
#'
#' @return
#'
#' @import
#' @export
#'
#' @examples
#' #this is an example
#' ###For your reference, this line is exactly 100 characters. Do not exceed 100 characters per line
#'
function_name = function(a_parameter,
                         another_parameter){
}
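To catch example lines over the 100-character limit before the manual is built, a short awk filter is enough. The sketch below is a hypothetical helper (find_long_example_lines), not part of GAMBLR or roxygen2. It assumes tags are written as #' @tag, only checks lines between @examples and the next tag, and reports each offending line as file:line: length.

```shell
# Minimal sketch, assuming a POSIX shell and awk.
# find_long_example_lines is a hypothetical helper, not part of GAMBLR or roxygen2.
# It reports roxygen lines inside @examples blocks that exceed MAX_EXAMPLE_WIDTH
# characters (default 100) and exits with status 1 if it finds any.
# Assumes tags are written as "#' @tag". Some awk implementations count bytes,
# not characters, in length(), so non-ASCII lines may be over-reported.
find_long_example_lines() {
  awk -v max="${MAX_EXAMPLE_WIDTH:-100}" -v prefix="#'" '
    FNR == 1 { in_examples = 0 }
    # any line that is not a roxygen comment ends an @examples block
    substr($0, 1, 2) != prefix { in_examples = 0; next }
    # a roxygen tag (e.g. @param, @examples) switches the block on or off
    $2 ~ /^@/ { in_examples = ($2 == "@examples"); next }
    in_examples && length($0) > max + 0 {
      printf "%s:%d: %d characters\n", FILENAME, FNR, length($0)
      found = 1
    }
    END { exit (found ? 1 : 0) }
  ' "$@"
}

# Usage, from the root of the GAMBLR repo:
# find_long_example_lines R/*.R
```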
For your convenience, here is an example of a well-documented GAMBLR function that follows the best practices detailed above.
#' @title ASHM Rainbow Plot
#'
#' @description Make a rainbow plot of all mutations in a region, ordered and coloured by metadata.
#'
#' @details This function creates a rainbow plot for all mutations in a region. Region can either be specified with the `region` parameter,
#' or the user can provide a maf that has already been subset to the region(s) of interest with `mutations_maf`.
#' As a third alternative, the regions can also be specified as a bed file with `bed`.
#' Lastly, this function has a variety of parameters that can be used to further customize the returned plot in many different ways.
#' Refer to the parameter descriptions, examples as well as the vignettes for more demonstrations of how this function can be called.
#'
#' @param mutations_maf A data frame containing mutations (MAF format) within a region of interest (e.g. as returned by get_ssm_by_region).
#' @param metadata A data frame with sample_id as a column.
#' @param exclude_classifications Optional argument for excluding specific classifications from a metadata file.
#' @param drop_unmutated Boolean argument for removing unmutated sample ids in mutated cases.
#' @param classification_column The name of the metadata column to use for ordering and colouring samples.
#' @param bed Optional data frame specifying the regions to annotate (required columns: start, end, name).
#' @param region Genomic region for plotting in bed format.
#' @param custom_colours Named vector (or named list of vectors) of custom annotation colours, for use instead of the standardized palette.
#' @param hide_ids Boolean argument, if TRUE, ids will be removed.
#'
#' @return ggplot2 object.
#'
#' @import dplyr ggplot2
#' @export
#'
#' @examples
#' #basic usage
#' region = "chr6:90975034-91066134"
#' metadata = get_gambl_metadata()
#' plot = ashm_rainbow_plot(metadata = metadata, region = region)
#'
#' #advanced usages
#' mybed = data.frame(start = c(128806578,
#'                              128805652,
#'                              128748315),
#'                    end = c(128806992,
#'                            128809822,
#'                            128748880),
#'                    name = c("TSS",
#'                             "enhancer",
#'                             "MYC-e1"))
#'
#' ashm_rainbow_plot(mutations_maf = my_mutations,
#'                   metadata = my_metadata,
#'                   bed = mybed)
#'