process Fields
Both make-process and process accept the same fields, which we describe below.
name
The readable name of the process as a string. This is used for display purposes and to select processes by name. When the process constructor is used, the name field need not be provided explicitly.
version
This field holds an arbitrary version string. This can be used to disambiguate between different implementations of a process when searching by name.
synopsis
A short summary of what this process intends to accomplish.
description
A longer description about the purpose of this process.
packages
This field is used to specify what software packages need to be available when executing the process. Packages can be either Guix package specifications, such as the string "guile@3.0" for Guile version 3.0, or package variable names.
By default, package specifications are looked up in the context of the current Guix, i.e. the same version of Guix that you used to invoke guix workflow. This is to ensure that you get exactly those packages that you would expect given the Guix channels you have configured.
We strongly advise against using package variables from Guix modules. The workflow language uses Guix as a library and is compiled and tested with the version of Guix that is currently available as the guix package in (gnu packages package-management). The version of this Guix will likely be older than the version of Guix you use to invoke guix workflow.
Package variables are useful for one-off ad-hoc packages that are not contained in any channel and are defined in the workflow file itself. We suggest you use the procedure lookup-package from the (gwl packages) module to look up inputs in the context of the current Guix. To ensure reproducibility, however, we urge you to publish packages in a version-controlled channel. See the Guix reference manual to learn all there is to know about channels.
The packages field accepts a list of packages as well as multiple values (an “implicit list”). All of the following specifications are valid. A single package:
    process
      packages "guile"
      …
More than one package:
    process
      packages "guile" "python"
      …
A single list of packages:
    process
      packages
        list "guile" "python"
      …
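To make the equivalence of these three forms concrete, here is a small Python model of how an “implicit list” could be normalized. It is an illustration only and not how the GWL is implemented; the function name field_values is hypothetical.

```python
def field_values(*values):
    """Model of an "implicit list" field (hypothetical helper, not GWL code).

    A single list argument is taken as the list itself; any other
    combination of arguments is collected into a list.
    """
    if len(values) == 1 and isinstance(values[0], (list, tuple)):
        return list(values[0])
    return list(values)


single = field_values("guile")                  # a single package
several = field_values("guile", "python")       # more than one package
explicit = field_values(["guile", "python"])    # a single list of packages
```

In this model the second and third calls produce the same list, which mirrors the statement that both spellings are valid.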
inputs
This field holds inputs to the process. Commonly, this will be a list of file names that the process requires to be present. The GWL can automatically connect processes by matching up their declared inputs and outputs, so that processes generating certain outputs are executed before those that declare the same item as an input.
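The ordering rule can be pictured as a dependency graph: a process depends on every process that produces one of its inputs. The following Python sketch is an illustration only and makes no claim about how the GWL computes the order. The dictionary layout and the function execution_order are assumptions made for this sketch, and graphlib requires Python 3.9 or later. The file names are borrowed from the example processes one and two in the description of the outputs field.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified view of two processes: only file names matter here.
processes = {
    "two": {"inputs": ["first.txt"], "outputs": ["second.txt", "second.log"]},
    "one": {"inputs": ["input.txt"], "outputs": ["first.log", "first.txt"]},
}


def execution_order(processes):
    """Order processes so that producers run before their consumers."""
    # Map every declared output to the process that generates it.
    producers = {}
    for name, proc in processes.items():
        for output in proc["outputs"]:
            producers[output] = name

    # A process depends on the producers of those inputs that some
    # process declares as an output; other inputs must already exist.
    graph = {
        name: {producers[i] for i in proc["inputs"] if i in producers}
        for name, proc in processes.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(processes)
```

Here "input.txt" is produced by no process, so it creates no dependency, while "first.txt" links process two to process one.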
As with the packages field, the inputs field accepts an “implicit list” of multiple values as well as an explicit list. Additionally, individual inputs can be “tagged” or named by prefixing them with a keyword (see Keywords in GNU Guile Reference Manual). Here’s an example of an implicit list of inputs spread across multiple lines where two inputs have been tagged:
    process
      inputs
        . genome: "hg19.fa"
        . "cookie-recipes.txt"
        . samples: "foo.fq"
      …
The leading period is Wisp syntax to continue the previous line. You can, of course, do without the periods, but this may look a little more cluttered:
    process
      inputs genome: "hg19.fa" "cookie-recipes.txt" samples: "foo.fq"
      …
Why tag inputs at all? Because you can reference them in other parts of your process definition without having to awkwardly traverse the whole list of inputs. Here is one way to select the first input that was tagged with the genome: keyword:
    pick genome: inputs
To select the second item after the tag genome: do this:
    pick second genome: inputs
or, using a numerical zero-based index:
    pick 1 genome: inputs
See Code Snippets for a convenient way to access named items in code snippets without having to define your picks beforehand.
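The selection rules can be modelled in a few lines of Python. This is a sketch of the behaviour described in this section, not the GWL's pick procedure. The Tag class, the helper tagged_group, and the argument order of this pick are all hypothetical. The sketch also assumes that the items belonging to a tag end at the next tag, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Stand-in for a Scheme keyword such as genome: (hypothetical)."""
    name: str


def tagged_group(items, tag):
    """Return the values following `tag`, up to the next tag or the end."""
    group = []
    collecting = False
    for item in items:
        if isinstance(item, Tag):
            if collecting:
                break  # assumption: the next tag closes the group
            collecting = item.name == tag
        elif collecting:
            group.append(item)
    return group


def pick(tag, items, selector=0):
    """Select one value from the group tagged with `tag`.

    `selector` is either a zero-based index or a function applied to
    the group, mirroring "pick 1 genome: inputs" and
    "pick second genome: inputs".
    """
    group = tagged_group(items, tag)
    if callable(selector):
        return selector(group)
    return group[selector]


def second(group):
    """Analogue of Scheme's `second`: the item at index 1."""
    return group[1]


inputs = [Tag("genome"), "hg19.fa", "cookie-recipes.txt",
          Tag("samples"), "foo.fq"]

first_genome = pick("genome", inputs)
second_genome = pick("genome", inputs, second)
indexed_genome = pick("genome", inputs, 1)
```

Under these assumptions the last two calls select the same item, because index 1 is the second element of a zero-based list.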
The procedure process-inputs can be used to access the list of inputs of any given process. By default, tags are removed from the list. If you want to include tags (e.g. to select specific inputs with pick), you can pass the keyword with-tags.
Here is an example of two processes where the second process refers to the inputs of the first.
    process count-reads (with sample)
      packages
        . "r-minimal"
      inputs
        . bam: file sample "_Aligned.sortedByCoord.out.bam"
        . bai: file sample "_Aligned.sortedByCoord.out.bam.bai"
        . script: file "count-reads.R"
      outputs
        file sample ".read_counts.csv"
      # {
        R {{inputs:script}} {{inputs:bam}} {{inputs:bai}} > {{outputs}}
      }

    process genome-coverage (with sample)
      packages
        . "r-minimal"
      inputs
        define other-inputs
          process-inputs
            count-reads sample
            . with-tags:
        . files:
        pick bam: other-inputs
        pick bai: other-inputs
        . script: file "genome-coverage.R"
      outputs
        files sample / (list ".forward" ".reverse") ".bigwig"
      # {
        R {{inputs:script}} {{inputs::files}} > {{outputs}}
      }
outputs
This field holds a list of outputs that are expected to appear after executing the process. Usually this will be a list of file names. Just like the inputs field, this field accepts a plain list, an implicit list of one or more values, and lists with named items.
The GWL can automatically connect processes by matching up their declared inputs and outputs, so that processes generating certain outputs are executed before those that declare the same item as an input.
The procedure process-outputs can be used to access the list of outputs of any given process. By default, tags are removed from the list. If you want to include tags (e.g. to select specific outputs with pick), you can pass the keyword with-tags.
Here is an example of two processes where the second process refers to the outputs of the first.
    process one
      packages
        . "coreutils"
      inputs
        . "input.txt"
      outputs
        . log: "first.log"
        . text: "first.txt"
      # {
        tail {{inputs}} > {{outputs:text}}
      }

    process two
      packages
        . "coreutils"
      inputs
        pick text:
          process-outputs one with-tags:
      outputs
        . done: "second.txt"
        . log: "second.log"
      # {
        head {{inputs}} > {{outputs:done}}
      }
output-path
This is a directory prefix for all outputs.
run-time
This field is used to specify run-time resource estimates, such as the memory requirement of the process or the maximum time it should run. This is especially useful when submitting jobs to an HPC cluster scheduler such as Grid Engine, as these schedulers may give higher priority to jobs that declare a short run time.
Resources are specified as a complexity value with the fields space (for memory requirements), time (for the expected duration of the computation), and threads (to control the number of CPU threads). For convenience, memory requirements can be specified with the units kibibytes (or KiB), mebibytes (or MiB), or gibibytes (or GiB). Supported time units are seconds, minutes, and hours.
Here is an example of a single-threaded process that is granted 20 MiB of run-time memory for a duration of 10 seconds:
    process stamp-inputs
      inputs "first" "second" "third"
      outputs "inputs.txt"
      run-time
        complexity
          space 20 mebibytes
          time 10 seconds
          threads 1
      # {
        echo {{inputs}} > {{outputs}}
      }
When this process is executed by a scheduler that honors resource limits, the process will be granted at most 20 MiB of memory and will be killed if it has not concluded after 10 seconds.
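For readers unfamiliar with binary units, the following Python sketch spells out what the figures in this example amount to. It is an illustration only: the Complexity class and the complexity function here are hypothetical stand-ins, and whether the GWL stores bytes and seconds internally is an assumption. The unit sizes themselves are the standard ones (1 KiB = 1024 bytes, 1 MiB = 1024 KiB, 1 GiB = 1024 MiB).

```python
from dataclasses import dataclass

# Standard sizes of the binary memory units named above, in bytes.
MEMORY_UNITS = {
    "kibibytes": 1024,
    "mebibytes": 1024 ** 2,
    "gibibytes": 1024 ** 3,
}

# The supported time units, in seconds.
TIME_UNITS = {"seconds": 1, "minutes": 60, "hours": 3600}


@dataclass(frozen=True)
class Complexity:
    """Hypothetical model of a resource estimate."""
    space_bytes: int
    time_seconds: int
    threads: int = 1


def complexity(space, space_unit, time, time_unit, threads=1):
    """Convert a human-friendly estimate into bytes and seconds."""
    return Complexity(
        space_bytes=space * MEMORY_UNITS[space_unit],
        time_seconds=time * TIME_UNITS[time_unit],
        threads=threads,
    )


# The estimate declared by the stamp-inputs example.
stamp_inputs = complexity(20, "mebibytes", 10, "seconds", threads=1)
```

A scheduler that enforces limits would thus see a request for 20 × 1024 × 1024 bytes of memory and a run time of 10 seconds.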
values
This field holds a list with keyword-tagged items that can be used in code snippets. Values defined here are passed to the process script at execution time (rather than preparation time), so this field can be used to avoid embedding literal values in code snippets when generating processes from a template. To learn more about code snippets, see Code Snippets.
Here is a simple example of a process template with values:
    process greet (with name)
      packages
        . "hello"
        . "coreutils"
      outputs
        file name ".txt"
      values
        . capitalized: string-upcase name
      # {
        echo "This is a greeting for {{values:capitalized}}."
        hello >> {{outputs}}
      }

    map greet
      list "rekado" "civodul" "zimoun"
The generated script from this process does not embed any specific value for name or even capitalized. Instead it looks up the value for capitalized in the arguments passed to the script at execution time. So instead of generating three scripts that only differ in one value (the capitalized name), the GWL will only generate one script and pass it three different values for the three processes.
For another example and further discussion of embedding values versus referencing them at execution time, see Process templates.
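The difference between embedding a value and referencing it at execution time can be illustrated with a short Python model. It does not reflect how the GWL generates or invokes scripts; the template text and variable names are assumptions chosen to mirror the greet example.

```python
from string import Template

names = ["rekado", "civodul", "zimoun"]

# Embedding: the value is substituted when the script is prepared,
# so every process ends up with its own script text.
embedded_scripts = [
    f'echo "This is a greeting for {name.upper()}."' for name in names
]

# Referencing: a single script with a placeholder, plus one set of
# arguments per process that is supplied only at execution time.
shared_script = Template('echo "This is a greeting for $capitalized."')
arguments = [{"capitalized": name.upper()} for name in names]

# What each process effectively runs once its arguments are supplied.
executed = [shared_script.substitute(args) for args in arguments]
```

Both approaches run the same commands in the end, but the second needs only one script, with the per-process differences confined to the arguments.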
procedure
This field holds an expression of code that should be run when the process is executed. This is the “work” that a process should perform. By default that’s a quoted Scheme expression, but code snippets in other languages are also supported (see Code Snippets).
Here’s an example of a process with a procedure that writes a haiku to a file:
    process haiku
      outputs "haiku.txt"
      synopsis "Write a haiku to a file"
      description
        . "This process writes a haiku by Gary Hotham \
    to the file \"haiku.txt\"."
      procedure
        ` with-output-to-file ,outputs
            lambda ()
              display "\
    the library book
    overdue?
    slow falling snow"
The Scheme expression here is quasiquoted (with a leading `) to allow for unquoting (with ,) of variables, such as outputs.
Scheme will not always be the best choice for a process procedure. Sometimes all you want to do is fire off a few shell commands. While this is, of course, possible to express in Scheme, it is admittedly somewhat verbose. For convenience we offer a simple and surprisingly short syntax for this common use case. As a bonus you can even leave off the field name “procedure” and write your code snippet right there. How? See Code Snippets.