GAMBLR - an R package with convenience functions for working with GAMBL results.

If you are viewing this page on GitHub, consider clicking this link to go to the GAMBLR webpage and learn more about this package.

Installation

If you have access to a gphost, the easiest way to obtain and run GAMBLR is through RStudio on a gphost. If you do not have access to a gphost, please refer to the Running GAMBLR On Your Own Computer section. Assuming you are running RStudio on a gphost, clone the repo to your home directory (not your GAMBL working directory).

git clone git@github.com:morinlab/GAMBLR.git

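In RStudio (on a gphost), set your working directory to the folder you just cloned. Note that git clone names the folder GAMBLR by default; it is only called GAMBLR-master if you downloaded the repo as a ZIP archive, so adjust the path below to match.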

setwd("~/GAMBLR-master")

Install the package in R by running the following command (requires the devtools package):

devtools::install()

Running GAMBLR On Your Own Computer

If you don’t have access to a gphost at the GSC, no worries: you can still run GAMBLR functions through its remote support, which was developed for this purpose. This section explains how to run GAMBLR remotely on a local machine (i.e. on your own computer). There are two approaches, each with its own advantages and limitations, and both are covered below.

Approach 1 - Quick Start

This section details how to deploy GAMBLR with limited functionality. This approach requires either a working GSC VPN connection or a direct connection to the GSC network.

Setup VPN Connection

  1. You need a working GSC VPN connection to use this approach. For setting up a VPN connection, see this guide. Keep in mind that a VPN connection is not needed if you’re already connected to the GSC network.

Clone Repos, Update Paths, Install and Load R Packages

  2. Clone GAMBL and GAMBLR to your local computer. From your terminal, run the following commands (the folder structure can be whatever you want…)
mkdir ~/git_repos
cd ~/git_repos # set as working directory
git clone https://github.com/morinlab/gambl
git clone https://github.com/morinlab/GAMBLR
  3. Update the paths in your local config.yml (in the GAMBLR-master folder). In your favorite text editor, edit the repo_base line shown below (under remote) to point to the recently cloned, local gambl folder. Similarly, edit the project_base line above it to point to where you will eventually sync the GAMBL results.
remote:
    project_base: "/path/to/your/local/gambl_results_directory/"
    repo_base: "/path/to/your/local/gambl_repo/"
  4. Set the working directory in RStudio. Open RStudio on your local machine and point it to the repo you cloned previously.
setwd("~/git_repos/GAMBLR-master")
  5. Install GAMBLR in your local RStudio.
devtools::install()
  6. Load packages.
library(GAMBLR)
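
Before moving on, it can help to confirm from the terminal what the remote block of config.yml now points to. The following is a minimal sketch, not part of GAMBLR: `remote_path` is a hypothetical helper name, and it assumes the config.yml layout shown above (a top-level `remote:` key with indented `project_base` and `repo_base` entries, and paths that contain no spaces).

```shell
# Hypothetical helper (not part of GAMBLR): print the value of a key
# found under the top-level "remote:" block of a config.yml.
# Assumes paths without spaces, as in the example config above.
remote_path() {
    # $1 = key (project_base or repo_base), $2 = path to config.yml
    awk -v key="$1" '
        # Any non-indented, non-comment line starts a new top-level block.
        /^[^ \t#]/ { in_remote = ($1 == "remote:") }
        # Inside the remote block, print the requested value without quotes.
        in_remote && $1 == (key ":") {
            gsub(/"/, "", $2)
            print $2
        }
    ' "$2"
}

# Usage (adjust the path to wherever your config.yml lives):
# remote_path repo_base ~/git_repos/GAMBLR-master/config.yml
# remote_path project_base ~/git_repos/GAMBLR-master/config.yml
```

If either call prints nothing, the key is missing from the remote block or the file is not where you expect it to be.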

Set Config To Remote

  7. Execute the following in the RStudio console to make use of the updated paths in the config.yml from step 3.
Sys.setenv(R_CONFIG_ACTIVE = "remote")

Run GAMBLR

  8. Test if the setup was successful (e.g. call get_gambl_metadata() to retrieve metadata for all GAMBL samples).
get_gambl_metadata() %>%
  head()

Approach 2 - The Full Installation (Snakemake)

This section details how to obtain GAMBLR with full functionality, using a dedicated Snakemake file (get_gambl_results.smk) to retrieve all necessary files and dependencies.

Before You Get Started

  1. Make sure you have a working SSH key set up with a passphrase. If not, follow the instructions at the GSC Wiki. Warning: this will not work with a passphrase-less SSH key.

Clone Repos and Set Up Environment

  2. Clone GAMBL and GAMBLR.
mkdir ~/git_repos
cd ~/git_repos
git clone https://github.com/morinlab/gambl
git clone https://github.com/morinlab/GAMBLR
  3. On your local machine, make a new directory for the results, called gambl_results in this example.
mkdir ~/gambl_results/
  4. Update the paths under remote in your local config.yml (GAMBLR) to point to the recently cloned, local gambl folder (repo_base) and the recently created gambl_results folder (project_base). For example:
remote:
    project_base: "~/gambl_results/"
    repo_base: "~/git_repos/gambl-master/"
  5. Copy the following files from your recently cloned GAMBLR directory into the gambl_results folder you created above: config.yml and get_gambl_results.smk.
cp ~/git_repos/GAMBLR-master/config.yml ~/gambl_results/
cp ~/git_repos/GAMBLR-master/get_gambl_results.smk ~/gambl_results/
  6. Add the following environment variables to your ~/.bashrc or ~/.zshrc, or set them some other way that ensures they are available in your session (e.g. you can set them manually each time, just make sure they are set). For example, run the following commands in your local terminal (with updated values…).
export GSC_USERNAME="your_gsc_username"
export GSC_KEY="path_to_SSH_key_with_passphrase_from_step_1"
export GSC_PASSPHRASE="passphrase_from_step_1"
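
Since the later sync relies on these three variables being present in the session, a quick check can catch a missing one early. This is a minimal sketch under the assumption that only the three variables named above matter; `check_gsc_env` is a hypothetical helper name, not something shipped with GAMBLR. It reports variable names only and never prints their values.

```shell
# Hypothetical helper (not part of GAMBLR): report which of the
# variables from this step are unset or empty in the current session.
check_gsc_env() {
    missing=0
    for name in GSC_USERNAME GSC_KEY GSC_PASSPHRASE; do
        # Look the variable up by name; the value itself is never echoed.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    # Exit status 0 when everything is set, 1 otherwise.
    return "$missing"
}

# Usage:
# check_gsc_env && echo "all GSC variables are set"
```

Run it in the same terminal session from which you will later start the sync, since variables exported in one session are not visible in another.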

Install GAMBLR In Local Rstudio

  7. Open RStudio (locally) and, in the RStudio console, set the working directory to the GAMBLR folder you cloned in step 2.
setwd("~/git_repos/GAMBLR-master")
  8. Install GAMBLR and load it into your local R session.
devtools::install()
library(GAMBLR)

Create and Setup Snakemake Environment

  9. In the terminal on your local machine, create a new Snakemake environment from the get_gambl_results.yml file. You can name this new environment whatever you like; in this example, it is called snakemake_gambl.
cd ~/gambl_results
conda env create --name snakemake_gambl --file ~/git_repos/GAMBLR-master/get_gambl_results.yml
  10. Activate this newly created Snakemake environment with:
conda activate snakemake_gambl
  11. Retrieve the necessary files (this downloads a local copy of all files needed to run a collection of GAMBLR functions). It’s strongly advised to use --cores 1 for this, since it seems to be the more stable option. In addition, if your sync gets interrupted, you only need to restart the sync of one file, rather than several as when running on multiple cores.
snakemake -s get_gambl_results.smk --cores 1
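
Because an interrupted sync is restarted by running the same command again, the re-invocation can be wrapped in a small loop. The sketch below is an illustration only: `retry` is a hypothetical helper name, and whether a re-run resumes cleanly after a particular interruption depends on Snakemake's own state, which this wrapper does not inspect.

```shell
# Hypothetical helper (not part of GAMBLR): run a command up to $1 times,
# stopping at the first attempt that exits successfully.
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed" >&2
        n=$((n + 1))
    done
    # Every attempt failed.
    return 1
}

# Usage (command copied from the step above):
# retry 3 snakemake -s get_gambl_results.smk --cores 1
```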

Use GAMBLR Functions Locally

  12. In RStudio (local), open test_remote.R in the GAMBLR-master folder.
  13. Execute the following in the RStudio console to make use of the updated paths in the config.yml from step 5 (line 5 in test_remote.r).
Sys.setenv(R_CONFIG_ACTIVE = "remote")
  14. Check which files (if any) are currently missing.
check_gamblr_config()
  15. You should now be all set to explore a collection of GAMBLR functions remotely on your local machine. For example, you could try the following test code to confirm that your setup was successful. For a set of comprehensive examples and tutorials, please refer to the test_remote.r script.
get_gambl_metadata() %>%
  head()

Note: if you’re seeing the following message when trying to use GAMBLR, please ensure that the config/gambl repo is set up properly (steps 5 and 13) and/or remember to load the remote config (i.e. Sys.setenv(R_CONFIG_ACTIVE = "remote")).

get_gambl_metadata(seq_type_filter = "capture") %>%
  pull(cohort) %>%
  table()

Error: '/projects/rmorin/projects/gambl-repos/gambl-rmorin/data/metadata/gambl_all_outcomes.tsv' does not exist.
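
If a forgotten Sys.setenv() call is the usual cause of this error on your machine, one option is to persist the setting. R reads ~/.Renviron at startup, so a line `R_CONFIG_ACTIVE=remote` there has the same effect as the Sys.setenv() call above. The sketch below is an assumption-laden convenience, not part of GAMBLR: `persist_r_env` is a hypothetical helper name, and it expects the target file, if it already exists, to end with a newline.

```shell
# Hypothetical helper (not part of GAMBLR): add NAME=VALUE to an Renviron
# file unless NAME is already defined there.
persist_r_env() {
    # $1 = variable name, $2 = value, $3 = Renviron file (default: ~/.Renviron)
    file=${3:-"$HOME/.Renviron"}
    if [ -f "$file" ] && grep -q "^$1=" "$file"; then
        echo "$1 already set in $file"
    else
        printf '%s=%s\n' "$1" "$2" >> "$file"
        echo "added $1=$2 to $file"
    fi
}

# Usage:
# persist_r_env R_CONFIG_ACTIVE remote
```

Keep in mind that this makes remote the active configuration for every R session started by that user, which may not be what you want if you also work with other configurations on the same machine.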

Contributing

As GAMBL users (GAMBLRs, so to speak) rely on the functionality of this package, the master branch is protected. All changes must be submitted via pull request from a separate branch. Please refer to the GAMBL documentation for details on how to do this.

For Developers

When designing new functions, please refer to the guidelines and best practices detailed here. For your convenience, below is an empty function skeleton that can be recycled when designing new GAMBLR functions. Always provide the required documentation for any new function. See this section for more details on best practices for documenting R functions. Unsure what information goes where in a function’s documentation? Here is a brief outline of what the different sections should include. For more information, see this.

Title

The title is taken from the first sentence. It should be written in sentence case, not end in a full stop, and be followed by a blank line. The title is shown in various function indexes (e.g. help(package = "somepackage")) and is what the user will usually see when browsing multiple functions.

Description

The description is taken from the next paragraph. It’s shown at the top of documentation and should briefly describe the most important features of the function.

Details

Additional details are anything after the description. Details are optional, but can be any length so are useful if you want to dig deep into some important aspect of the function. Note that, even though the details come right after the description in the introduction, they appear much later in rendered documentation.

Parameters

Detailed parameter descriptions should be included for all functions. Remember to state the required data types, default values, if the parameter is required or optional, etc.

Return

Specify the returned object: is it a data frame, a list, a character vector, etc.?

Import

Always import every package whose functions you call, other than base R and the R packages that get loaded by default. Remember not to import tidyverse as a whole; instead, import the individual tidyverse packages that the function depends on.

Export

Should this function be exported to NAMESPACE (i.e. made directly accessible to anyone who loads GAMBLR), or is it considered an internal/helper function (i.e. don’t export it)?

Examples

Please provide fully reproducible examples for the function. Ideally, the examples should demonstrate basic usage as well as more advanced usage with different parameter combinations. Note that example lines cannot exceed 100 characters, since longer lines will be truncated in the rendered PDF manual.

#' @title
#'
#' @description
#'
#' @details
#'
#' @param a_parameter 
#' @param another_parameter 
#'
#' @return
#' 
#' @import
#' @export
#'
#' @examples
#' #this is an example
#' ###For your reference, this line is exactly 100 characters. Do not exceed 100 characters per line
#'

function_name = function(a_parameter,
                         another_parameter){
                         }
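
The 100-character limit on example lines can be checked mechanically before opening a pull request. The following is a minimal sketch, not an existing GAMBLR tool: `long_example_lines` is a hypothetical helper name. It counts the whole line including the `#'` prefix, the same way the 100-character reference line in the skeleton above is measured, and it only looks at lines that follow an `@examples` tag within a roxygen block.

```shell
# Hypothetical helper (not part of GAMBLR): list roxygen example lines
# longer than 100 characters in the given R files.
long_example_lines() {
    awk -v prefix="#'" '
        # A line that is not a roxygen comment ends the current block.
        index($0, prefix) != 1 { in_examples = 0; next }
        # A roxygen tag line switches the examples section on or off.
        index($0, prefix " @") == 1 { in_examples = ($2 == "@examples"); next }
        # Report over-long lines inside the examples section.
        in_examples && length($0) > 100 {
            printf "%s:%d: %d characters\n", FILENAME, FNR, length($0)
        }
    ' "$@"
}

# Usage (the R/ folder is an assumption about where the sources live):
# long_example_lines R/*.R
```

No output means no example line exceeds the limit; each reported entry gives the file, the line number, and the line length.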

Example Function

For your convenience, as an example, here is a perfectly documented GAMBLR function, following the best practices detailed above.

#' @title ASHM Rainbow Plot
#'
#' @description Make a rainbow plot of all mutations in a region, ordered and coloured by metadata.
#'
#' @details This function creates a rainbow plot for all mutations in a region. Region can either be specified with the `region` parameter,
#' or the user can provide a maf that has already been subset to the region(s) of interest with `mutations_maf`.
#' As a third alternative, the regions can also be specified as a bed file with `bed`.
#' Lastly, this function has a variety of parameters that can be used to further customize the returned plot in many different ways.
#' Refer to the parameter descriptions, examples as well as the vignettes for more demonstrations how this function can be called.
#'
#' @param mutations_maf A data frame containing mutations (MAF format) within a region of interest (e.g. as returned by get_ssm_by_region).
#' @param metadata should be a data frame with sample_id as a column.
#' @param exclude_classifications Optional argument for excluding specific classifications from a metadata file.
#' @param drop_unmutated Boolean argument for removing unmutated sample ids in mutated cases.
#' @param classification_column The name of the metadata column to use for ordering and colouring samples.
#' @param bed Optional data frame specifying the regions to annotate (required columns: start, end, name).
#' @param region Genomic region for plotting in bed format.
#' @param custom_colours Provide a named vector (or named list of vectors) containing custom annotation colours if you do not want to use the standardized palette.
#' @param hide_ids Boolean argument, if TRUE, ids will be removed.
#'
#' @return ggplot2 object.
#'
#' @import dplyr ggplot2
#' @export
#'
#' @examples
#' #basic usage
#' region = "chr6:90975034-91066134"
#' metadata = get_gambl_metadata()
#' plot = ashm_rainbow_plot(metadata = metadata, region = region)
#'
#' #advanced usages
#' mybed = data.frame(start = c(128806578,
#'                              128805652,
#'                              128748315),
#'                    end = c(128806992,
#'                            128809822,
#'                            128748880),
#'                    name = c("TSS",
#'                             "enhancer",
#'                             "MYC-e1"))
#'
#' ashm_rainbow_plot(mutations_maf = my_mutations,
#'                   metadata = my_metadata,
#'                   bed = mybed)
#'