Process templates
When defining many similar processes, it can be useful to parameterize a single process template. This can be accomplished by defining a procedure that takes any number of arguments and returns a parameterized process. Here’s how to do this somewhat verbosely in plain Scheme:
(define (build-me-a-process thing)
  "Return a process that displays THING."
  (make-process
    (name (string-append "show-" thing))
    (procedure `(display ,thing))))

;; Now use this procedure to build concrete processes.
(define show-fruit
  (build-me-a-process "fruit"))
(define show-kitchen
  (build-me-a-process "kitchen"))
(define show-table
  (build-me-a-process "table"))
As this is a somewhat common thing to do in real workflows, the GWL provides simplified syntax to express the same concepts with a little less effort:
process build-me-a-process (with thing)
  name
    string-append "show-" thing
  procedure
    ` display ,thing

define show-fruit
  build-me-a-process "fruit"

define show-kitchen
  build-me-a-process "kitchen"

define show-table
  build-me-a-process "table"
The result is the same: you get a procedure build-me-a-process that you can use to define a number of similar processes. In the end you have the three processes show-fruit, show-kitchen, and show-table.
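To see what actually differs between the three processes, here is a minimal sketch in plain Guile. It does not use the GWL at all: build-me-a-sketch is a hypothetical stand-in for build-me-a-process that returns a plain list instead of a real process, so that it can be evaluated without loading the GWL.

```scheme
;; Hypothetical stand-in for the template above.  It returns a list of
;; fields rather than a GWL process, purely for illustration.
(define (build-me-a-sketch thing)
  `((name ,(string-append "show-" thing))
    (procedure (display ,thing))))

;; Print the fields produced for each template argument.
(for-each
 (lambda (thing)
   (write (build-me-a-sketch thing))
   (newline))
 '("fruit" "kitchen" "table"))
;; First line printed:
;; ((name "show-fruit") (procedure (display "fruit")))
```

In this sketch both the name and the procedure expression change with every argument, because the value of thing is spliced directly into them.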
In a real-life workflow, the above example would not be very efficient. The GWL generates an executable script for every process, passing the process properties (such as name, inputs, outputs, etc.) as arguments. It is a good idea to generate only one script per process template instead of one script per process, as this vastly reduces the preparation work that the GWL has to perform.
The GWL can arrange for scripts to be reused as long as you take care not to embed arbitrary variables in the process procedure field. To this end the GWL offers the values field for arbitrary value definitions that should be passed to process scripts as arguments.
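The idea of keeping the script constant and passing the varying parts separately can be sketched in plain Guile. This is only an illustration under stated assumptions, not the GWL's internal representation: build-reusable-sketch and the symbol thing-value are made up for this example, and a "process" here is simply a list holding a fixed script expression and the values to be handed to it.

```scheme
;; Hypothetical stand-in, not a GWL API: returns a two-element list of
;; (script-expression values).
(define (build-reusable-sketch thing)
  (list '(display thing-value)      ; identical for every THING
        `((thing ,thing))))         ; differs from process to process

(define fruit (build-reusable-sketch "fruit"))
(define table (build-reusable-sketch "table"))

;; The script part is the same for both, so it could be shared.
(display (equal? (car fruit) (car table)))    ; prints #t
(newline)
;; Only the values differ.
(display (equal? (cadr fruit) (cadr table)))  ; prints #f
(newline)
```

Because nothing that depends on the template argument ends up in the script expression, a single script would suffice for any number of processes built this way.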
Another thing to avoid is making the process name depend on template arguments. Doing so prevents script reuse, as the GWL is forced to generate scripts that are virtually identical except for their names. Here’s an example with ten processes that all share the same process script:
define LOG_DIR
  file "logs"

define SAMPLES
  list
    . "first-sample"
    . "second"
    . "third-sample"
    . "sample-no4"
    . "take-five"
    . "666"
    . "se7en"
    . "who-eight-nine?"
    . "NEIN!"
    string-reverse "net"

process index-bam (with sample)
  inputs
    file "mapped-reads" / sample "_Aligned.sortedByCoord.out.bam"
  outputs
    . bai:
    file "mapped-reads" / sample "_Aligned.sortedByCoord.out.bam.bai"
    . log:
    file LOG_DIR / "samtools_index_" sample ".log"
  packages
    . "samtools"
    . "coreutils"
  values
    . sample-id: sample
    . backwards:
    string-reverse
      first inputs
  # {
    mkdir -p {{LOG_DIR}}
    echo "The sample identifier is {{values:sample-id}}"
    samtools index {{inputs}} {{outputs:bai}} >> {{outputs:log}} 2>&1
    echo "By the way, the sample's file name in reverse is {{values:backwards}}."
  }

workflow test
  processes
    map index-bam SAMPLES
Here the value of the variable LOG_DIR is embedded in the generated script, but that’s fine because it is independent of the template argument sample. While we could have used sample directly, we instead defined it as a value in the values field and tagged it with the keyword sample-id:.
For the fun of it we also defined a value with the tag backwards:, which is defined in terms of another process field (inputs).
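As a rough illustration of the values each of the ten processes would receive, the following plain Guile sketch computes them without the GWL. The helper input-file-for is hypothetical; it assumes that the file form in the example joins its parts into a path such as mapped-reads/second_Aligned.sortedByCoord.out.bam.

```scheme
;; Hypothetical helper mimicking the assumed result of the `file' form
;; in the inputs field; not part of the GWL.
(define (input-file-for sample)
  (string-append "mapped-reads/" sample "_Aligned.sortedByCoord.out.bam"))

(define SAMPLES
  (list "first-sample" "second" "third-sample" "sample-no4" "take-five"
        "666" "se7en" "who-eight-nine?" "NEIN!"
        (string-reverse "net")))

;; One (sample-id . backwards) pair per sample, that is, per process.
(define per-process-values
  (map (lambda (sample)
         (cons sample (string-reverse (input-file-for sample))))
       SAMPLES))

(display (length per-process-values))            ; prints 10
(newline)
(display (car (list-ref per-process-values 9)))  ; prints ten
(newline)
```

Every pair differs, yet none of these values needs to appear in the script text itself, since the script only refers to them through the sample-id: and backwards: tags.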
References to the fields inputs, outputs, name, and values are resolved via arguments passed to the process script at execution time. They do not interfere with script reuse as their values are not embedded in the generated script.