Getting started with the Guix workflow language
Installation
This guide assumes that GNU Guix and the GWL have already been installed. If the GWL is not installed yet, run:
guix install gwl
Then tell Guix where to find the GWL extension:
export GUIX_EXTENSIONS_PATH=$HOME/.guix-profile/share/guix/extensions
Introduction
In the GWL there are two concepts we need to know about: processes and workflows. We describe a computation (e.g. running a program) using a process. With a workflow we describe how multiple processes relate to each other (process B must run after process A, process C must run before process A).
Processes and workflows are composed using a domain-specific language embedded in the general-purpose language Scheme. They can be executed in order with the guix workflow command.
Example
Let's start by writing the obligatory “Hello, world!” to see what a workflow might look like.
process hello-world
  # { echo "Hello, world!" }
This text defines a process named “hello-world” that runs a shell snippet printing “Hello, world!” to the screen. Delightful!
Running programs
But “hello-world” alone doesn't justify building yet another workflow language. Moving a little closer to the real world, we can use the software deployment strengths and reproducibility guarantees of GNU Guix to automate the deployment of a potentially complex software environment with the packages field.
process samtools-index
  packages "samtools"
  inputs "/tmp/sample.bam"
  # {
    samtools index {{inputs}}
  }

workflow do-the-thing
  processes samtools-index
The packages field declares that we want the samtools package to be available in the environment of this process. The package variant is fully determined by the version of Guix in use and is installed automatically when the process is executed. It is important to list all packages required to run the process in the packages field.
We also defined a simple workflow named do-the-thing that executes just the samtools-index process.
In the next section, we will see how we can combine more processes in a workflow. We will also use process templates to generate processes from a list of input file names.
Defining workflows
A workflow describes how processes relate to each other. So before we can write the workflow, we must define some processes. In this example we will create a file with a process named create-file, and we will compress that file using a process named compress-file.
process create-file
  outputs
    file "file.txt"
  run-time
    complexity
      space 20 MiB
      time 10 seconds
  # { echo hello > {{outputs}} }

process compress-file
  packages "gzip"
  inputs
    file "file.txt"
  outputs
    file "file.txt.gz"
  run-time
    complexity
      space 20 mebibytes
      time 2 minutes
  # { gzip {{inputs}} -c > {{outputs}} }
With these definitions in place, we can run both in one go by defining a workflow.
workflow file-workflow
  processes
    auto-connect create-file compress-file
The workflow specifies all processes that should run. The auto-connect procedure links up all inputs and outputs of all specified processes and ensures that the processes are run in the correct order. Later we will see other ways to specify process dependencies.
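To make the idea behind auto-connect concrete, here is a minimal sketch in plain Guile Scheme (parenthesized syntax, runnable with guile, no GWL needed). It assumes a hypothetical, simplified model in which a process is just a list (name inputs outputs); GWL's real process records and its actual auto-connect implementation differ. Under this model, a process depends on every other process whose outputs include one of its inputs.

```scheme
(use-modules (srfi srfi-1))   ; first, second, third, any, filter-map

;; Hypothetical stand-ins for the two processes above:
;; each one is a plain list (name inputs outputs), not a GWL process.
(define create-file   '(create-file   ()           ("file.txt")))
(define compress-file '(compress-file ("file.txt") ("file.txt.gz")))

(define (proc-name p)    (first p))
(define (proc-inputs p)  (second p))
(define (proc-outputs p) (third p))

;; For each process, collect the names of the other processes that
;; produce one of its inputs.  The result is an association list of
;; the form (process dependency ...).
(define (connect processes)
  (map (lambda (p)
         (cons (proc-name p)
               (filter-map
                (lambda (q)
                  (and (not (eq? p q))
                       (any (lambda (input) (member input (proc-outputs q)))
                            (proc-inputs p))
                       (proc-name q)))
                processes)))
       processes))

(define connections (connect (list create-file compress-file)))
(write connections)
(newline)
;; prints ((create-file) (compress-file create-file))
```

Because compress-file reads file.txt, which create-file writes, compress-file ends up depending on create-file, so create-file has to run first.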
Process templates
We can parameterize the inputs and outputs of a process so that the same process template can serve for different inputs and outputs. Here is a process template that is parameterized on input:
process compress-file (with input)
  packages "gzip"
  inputs input
  outputs
    string-append input ".gz"
  run-time
    complexity
      space 20 mebibytes
      time 10 seconds
  # {
    gzip {{inputs}} -c > {{outputs}}
  }
Dynamic workflows
We can now dynamically create compression processes by instantiating the compress-file template with specific input file names. We use Scheme's define and map to simplify the work for us:
process create-file (with filename)
  outputs filename
  run-time
    complexity
      space 20 mebibytes
      time 10 seconds
  # { echo "Hello, world! This is {{outputs}}." > {{outputs}} }

process compress-file (with input)
  packages "gzip"
  inputs input
  outputs
    file input ".gz"
  run-time
    complexity
      space 20 mebibytes
      time 10 seconds
  # { gzip {{inputs}} -c > {{outputs}} }

;; All input files. The leading dot continues the previous line.
define files
  list "one.txt"
     . "two.txt"
     . "three.txt"

;; Map process templates to files to generate a list of processes.
define create-file-processes
  map create-file files

define compress-file-processes
  map compress-file files

workflow dynamic-workflow
  processes
    auto-connect compress-file-processes create-file-processes
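Because a process template behaves like a procedure that returns a process, map simply calls it once per file name. The plain Guile Scheme sketch below mimics this with hypothetical stand-in procedures that return simplified (inputs outputs) lists instead of real GWL processes, just to show which file names the generated processes would read and write.

```scheme
(use-modules (srfi srfi-1))   ; append-map, second

;; Hypothetical stand-ins for the templates: they return (inputs outputs)
;; lists rather than real GWL processes.
(define (create-file filename)
  (list '() (list filename)))

(define (compress-file input)
  (list (list input)
        (list (string-append input ".gz"))))

(define files (list "one.txt" "two.txt" "three.txt"))

;; One description per file, just as map produces one process per file.
(define create-file-processes   (map create-file files))
(define compress-file-processes (map compress-file files))

;; Gather every output of the generated compression steps.
(define compressed (append-map second compress-file-processes))
(write compressed)
(newline)
;; prints ("one.txt.gz" "two.txt.gz" "three.txt.gz")
```

Each file name yields one create-file and one compress-file instance, so auto-connect can match the one.txt written by one instance to the one.txt read by the other.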
In the GWL, we can also define process dependencies explicitly. This is useful when processes don't declare inputs or outputs. Processes can do something other than producing output files, such as inserting data into a database, so their dependencies sometimes have to be specified manually. Such explicit dependencies, called restrictions, can be specified as an association list mapping each process to its dependencies, or via the convenient graph syntax.
workflow graph-example
  processes
    graph
      A -> B C
      B -> D
      C -> B
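One way to read the graph above is as the association list ((A B C) (B D) (C B)), assuming that, as in the association-list form, A -> B C means that A depends on B and C. The plain Guile Scheme sketch below derives one valid sequential run order from such a list with a depth-first topological sort. It is an illustration only: GWL's scheduler is not claimed to work this way, and independent processes may well run in parallel.

```scheme
(use-modules (srfi srfi-1))   ; fold, concatenate, delete-duplicates

;; The graph above written as an association list: (process dependency ...).
;; Assumption: "A -> B C" corresponds to the entry (A B C).
(define restrictions
  '((A B C)
    (B D)
    (C B)))

(define (dependencies-of graph node)
  (let ((entry (assq node graph)))
    (if entry (cdr entry) '())))

;; Depth-first visit.  DONE holds the already scheduled processes, most
;; recent first; a process is added only after all of its dependencies.
;; Assumes the graph has no cycles.
(define (visit graph node done)
  (if (memq node done)
      done
      (cons node
            (fold (lambda (dep acc) (visit graph dep acc))
                  done
                  (dependencies-of graph node)))))

;; Visit every process mentioned in the graph, then reverse so that
;; dependencies come before the processes that need them.
(define (run-order graph)
  (let ((nodes (delete-duplicates (concatenate graph))))
    (reverse (fold (lambda (node acc) (visit graph node acc)) '() nodes))))

(define order (run-order restrictions))
(write order)
(newline)
;; prints (D B C A)
```

D comes first because B depends on it, and B must precede both C and A.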
Extending workflows
In dynamic-workflow we created files and compressed them. In the following workflow we will generate a file containing some information about these compressed files, to learn how we can extend a workflow at any point in a new workflow.
;; We are going to extend the workflow defined in the file
;; "example-workflow.w".
define dynamic-workflow
  load-workflow "example-workflow.w"

process list-file-template (with filename)
  name
    string-append "list-file-"
      basename filename
  packages "gzip"
  inputs filename
  outputs
    file filename ".list"
  run-time
    complexity
      space 20 mebibytes
      time 30 seconds
  # { gzip --list {{inputs}} > {{outputs}} }

;; Get all processes of the other workflow.
define foreign-processes
  workflow-processes dynamic-workflow

;; Get the processes that we want to extend on.
define compress-file-processes
  processes-filter-by-name foreign-processes "compress-file"

;; Create the new processes.
define list-file-processes
  map list-file-template
    append-map process-outputs compress-file-processes

workflow extended-dynamic-workflow
  processes
    append
      ;; These are the process connections of the imported workflow
      workflow-restrictions dynamic-workflow
      ;; And these are the new process connections. The "zip" procedure
      ;; pairs up each of the processes in "list-file-processes" with
      ;; one of the processes in "compress-file-processes".
      zip list-file-processes compress-file-processes
With list-file-template we created a procedure that returns a process that generates a file containing details about a compressed archive. In extended-dynamic-workflow we use it to create one such process per compressed file, each running after the compress-file process that produced its input.
In the processes field we include the restrictions of dynamic-workflow, thereby concisely extending it.
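The zip and append calls in extended-dynamic-workflow are ordinary Scheme list operations (zip comes from SRFI-1). The plain Guile Scheme sketch below uses symbols as hypothetical stand-ins for process objects to show the shape of the resulting restrictions: zip turns two lists into a list of two-element lists, each read as "the first process depends on the second". The imported restrictions are hypothetical values standing in for what workflow-restrictions would return.

```scheme
(use-modules (srfi srfi-1))   ; zip

;; Symbols stand in for process objects (hypothetical values).
(define compress-file-processes '(compress-one compress-two compress-three))
(define list-file-processes     '(list-one list-two list-three))

;; zip pairs element i of the first list with element i of the second,
;; giving one (process dependency) entry per list-file process.
(define new-restrictions
  (zip list-file-processes compress-file-processes))

;; Hypothetical restrictions of the imported workflow.
(define imported-restrictions
  '((compress-one create-one)
    (compress-two create-two)
    (compress-three create-three)))

;; The combined restrictions: old connections followed by the new ones.
(define all-restrictions (append imported-restrictions new-restrictions))

(write new-restrictions)
(newline)
;; prints ((list-one compress-one) (list-two compress-two) (list-three compress-three))
```

Note that zip pairs elements purely by position, so the two lists must be in matching order. In the workflow above this holds as long as each compress-file process has exactly one output, because list-file-processes is built from those outputs in order.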
Further reading
The GWL manual tries to cover everything you will need to know to write real-world scientific workflows with the GWL.
The GNU Guile and GNU Guix manuals are good places to learn the language and concepts on which GWL builds.