Get the details, including file paths, for the anticipated outputs from a pipeline or tool.

find_expected_outputs(
  targ_df,
  tool_name,
  unix_group,
  filename_end_pattern,
  update_db = FALSE,
  target_path
)

Arguments

targ_df

Optionally provide a data frame with all file details.

tool_name

The tool or pipeline that generated the files (should be the same for all). Acceptable values are manta and gridss.

unix_group

The unix group (should be the same for all).

filename_end_pattern

Optionally specify a filename ending pattern to select the files of interest from the larger set of files in the outputs.

update_db

Set to TRUE to overwrite any existing rows in the table for this tool/unix_group combination.

target_path

Path to targets.

Details

This function takes the name of a tool or pipeline (tool_name) and a unix group (unix_group) and returns details of the expected outputs, such as the paths to individual files. Optionally, the user can provide an already loaded data frame with all the file details (targ_df). For more information and examples, refer to the parameter descriptions as well as the function examples.
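
To illustrate the idea behind filename_end_pattern, the sketch below selects rows from a small stand-in data frame whose file paths end with a given pattern. It does not call find_expected_outputs; the data frame, its paths, and the names file_details and matched_files are hypothetical, and it assumes the pattern is compared literally against the end of each path, which may differ from how the function matches internally.

```r
# Hypothetical stand-in for a table of file details; the paths are invented.
file_details <- data.frame(
  tumour_sample_id = c("sampleA", "sampleA", "sampleB"),
  file_path = c(
    "results/sampleA--None.unmatched.somaticSV.bedpe",
    "results/sampleA--None.unmatched.somaticSV.vcf",
    "results/sampleB--normalB.matched.somaticSV.bedpe"
  ),
  stringsAsFactors = FALSE
)

# Assumption: the pattern is a literal suffix, not a regular expression.
filename_end_pattern <- "unmatched.somaticSV.bedpe"

# Keep only the rows whose path ends with the pattern.
keep <- endsWith(file_details$file_path, filename_end_pattern)
matched_files <- file_details[keep, , drop = FALSE]

print(matched_files$file_path)
```

A literal suffix comparison avoids surprises from regular expression metacharacters such as the dots in the pattern.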

Examples

# Get paths to unmatched manta bedpe files
ex_outs = find_expected_outputs(tool_name = "manta",
                               unix_group = "gambl",
                               filename_end_pattern = "unmatched.somaticSV.bedpe")
#> # A tibble: 963 × 11
#>    unix_group tool_name tool_version seq_type genome_build tumour_sample_id     
#>    <chr>      <chr>     <chr>        <chr>    <chr>        <chr>                
#>  1 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00001-01…
#>  2 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00001-01…
#>  3 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00005-01…
#>  4 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00005-01…
#>  5 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00004-01…
#>  6 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00004-01…
#>  7 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00007-01…
#>  8 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00007-01…
#>  9 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00013-01…
#> 10 gambl      manta     2.3          genome   hg38         BLGSP-71-06-00080-01…
#> # ℹ 953 more rows
#> # ℹ 5 more variables: normal_sample_id <chr>, pairing_status <chr>,
#> #   file_path <chr>, file_timestamp <dttm>, output_type <chr>
#> [1] "/projects/rmorin/projects/gambl-repos/gambl-crushton-canary/targets/manta--gambl"
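
Because the returned table lists anticipated outputs, one possible follow-up is to check which of the listed paths are present on disk. The sketch below uses a small stand-in data frame that mimics three of the columns shown above (tumour_sample_id, pairing_status, file_path); the rows, the temporary files, and the file_found column are hypothetical and are not produced by find_expected_outputs.

```r
# Create one real temporary file so that one of the mocked paths exists.
existing_file <- tempfile(fileext = ".bedpe")
file.create(existing_file)

# Hypothetical stand-in for the returned table; only three columns are mocked.
expected <- data.frame(
  tumour_sample_id = c("sampleA", "sampleB"),
  pairing_status = c("unmatched", "unmatched"),
  file_path = c(existing_file, file.path(tempdir(), "not_created.bedpe")),
  stringsAsFactors = FALSE
)

# Flag each expected output according to whether the file is present.
expected$file_found <- file.exists(expected$file_path)

# List the samples whose expected output is missing.
missing_outputs <- expected[!expected$file_found, c("tumour_sample_id", "file_path")]
print(missing_outputs$tumour_sample_id)
```

With a real result such as ex_outs, the same file.exists() call could be applied to its file_path column.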