Getting started with the Guix workflow language
guix package -i gwl
In the GWL there are two concepts we need to know about: processes and workflows. We describe a computation (running a program, or evaluating a Scheme expression) using a process. With a workflow we describe how multiple processes relate to each other (for example, which processes must finish before another process may start). Running processes or workflows can be done programmatically using functions such as workflow-run, or through the command line by using the guix process and guix workflow commands.
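The ordering that a workflow expresses can be illustrated without GWL at all. The sketch below is plain Guile and is not the GWL workflow API. It assumes a hypothetical association list, dependencies, that maps each process name to the names of the processes it must run after, and derives an order that respects those relations (a simple topological sort).

```scheme
;; Hypothetical sketch, not the GWL workflow API: each entry maps a
;; process name to the names of the processes it must run after.
(use-modules (srfi srfi-1))   ; every, lset-difference

(define dependencies
  '((a)          ; a has no prerequisites
    (b a)        ; b must run after a
    (d b c)      ; d must run after b and c
    (c a)))      ; c must run after a

(define (run-order deps)
  "Return the process names in DEPS in an order that respects every
\"must run after\" relation (a simple topological sort)."
  (let loop ((pending (map car deps))
             (done '()))              ; finished processes, most recent first
    (if (null? pending)
        (reverse done)
        ;; A process is ready once all of its prerequisites are done.
        (let ((ready (filter (lambda (p)
                               (every (lambda (q) (memq q done))
                                      (cdr (assq p deps))))
                             pending)))
          (when (null? ready)
            (error "cyclic dependencies among" pending))
          (loop (lset-difference eq? pending ready)
                (append (reverse ready) done))))))

(display (run-order dependencies))   ; prints (a b c d)
(newline)
```

Processes that become ready in the same round (here b and c) have no ordering constraint between them, so in principle they could run in parallel.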
To make processes and workflows available to Scheme and to the command-line, we write them as a Guile Scheme module.
Let's start by writing the obligatory “Hello, world!” to familiarize ourselves with the components of a process.
(define-module (example-workflow)
  #:use-module (gwl processes)
  #:use-module (gwl workflows))

(define-public hello-world
  (process
    (name "hello-world")
    (run-time (complexity
                (space (megabytes 10))
                (time 10)      ; In seconds
                (threads 1)))  ; 1 thread is the default.
    (procedure '(format #t "Hello, world!~%"))))
With the define-module expression we tell the GNU Guile interpreter that this is a Scheme module. After the define-module statement, we've created a symbol hello-world that contains a process named “hello-world”, with a Scheme expression that displays “Hello, world!” on our screen as its computational procedure.
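Because the procedure is written with a leading quote ('), it is stored as a list rather than evaluated when the module is loaded. The plain-Guile sketch below (the name hello-procedure is ours, not GWL's) shows that the quoted expression stays data until something evaluates it; how a GWL engine performs that evaluation is not shown here.

```scheme
;; The same quoted expression as in hello-world's procedure field.
(define hello-procedure '(format #t "Hello, world!~%"))

;; Quoting makes it plain data: a list whose head is the symbol format.
(write (car hello-procedure))   ; prints format
(newline)

;; Only evaluating that data actually prints the greeting.
(eval hello-procedure (interaction-environment))   ; prints Hello, world!
```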
We also provided upper limits on the space and time requirements of the process (and its number of threads) using run-time. An engine may enforce these limits, but it is not required to do so. For example, when running the process with grid-engine, these limits are enforced by the job scheduler of your grid engine implementation, but when running the same process with simple-engine, they are not enforced.
But “hello-world” alone doesn't justify building yet another workflow language. Moving a little closer to real-world use, we can draw on the software deployment strengths of GNU Guix, which lets us capture the deployment of a program in a single Scheme symbol.
(define-module (example-workflow)
  #:use-module (gwl processes)
  #:use-module (gwl workflows)
  #:use-module (gnu packages bioinformatics))

(define-public samtools-index
  (process
    (name "samtools-index")
    (package-inputs (list samtools))
    (data-inputs "/tmp/sample.bam")
    (run-time (complexity
                (space (megabytes 500))
                (time (hours 2))))
    (procedure
     `(system (string-append "samtools index " ,data-inputs)))))
In the module (gnu packages bioinformatics) we find the symbol samtools, which is added to the environment of the process so that we can be sure this program is available when the process runs.
It is important to list all packages required to run the process in the package-inputs field.
For the newcomer to Scheme, the comma might seem misplaced. But notice the backquote (`) before system: this is the syntax for a quasiquote, and the seemingly misplaced comma is part of the plan. The comma unquotes the expression that follows it, so, as you might have guessed, the value of the data-inputs field is put in place of ,data-inputs inside the procedure.
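The difference between quote and quasiquote can be tried in plain Guile. In this sketch a top-level variable named data-inputs stands in for the field's value (an assumption made only for illustration; inside a GWL process the value comes from the data-inputs field).

```scheme
;; Stand-in for the value of the data-inputs field (illustration only).
(define data-inputs "/tmp/sample.bam")

;; Plain quote: data-inputs stays a symbol inside the list.
(define quoted
  '(system (string-append "samtools index " data-inputs)))

;; Quasiquote with unquote (,): the current value of data-inputs is inserted.
(define quasiquoted
  `(system (string-append "samtools index " ,data-inputs)))

(write quoted)       ; (system (string-append "samtools index " data-inputs))
(newline)
(write quasiquoted)  ; (system (string-append "samtools index " "/tmp/sample.bam"))
(newline)
```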
Now that we have the code of a Guile Scheme module that contains a process, we are ready to test it. To make sure Guile will find the module, we must name the file after the name we provided in the define-module expression. In our case, save the file as example-workflow.scm in an otherwise empty folder.
In a terminal, set the GUIX_WORKFLOW_PATH environment variable to the folder that contains example-workflow.scm. For example:
mkdir /tmp/workflows
touch /tmp/workflows/example-workflow.scm  # Make sure to put the code inside!
export GUIX_WORKFLOW_PATH=/tmp/workflows
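The naming rule comes from Guile's module system: a module name such as (example-workflow) maps to the relative file name example-workflow.scm, and (gnu packages bioinformatics) maps to gnu/packages/bioinformatics.scm. The sketch below reproduces that mapping and looks the file up in the directories listed in GUIX_WORKFLOW_PATH. The helper names are hypothetical, and treating GUIX_WORKFLOW_PATH as a colon-separated list searched like Guile's load path is an assumption, not a description of GWL internals.

```scheme
;; Guile maps a module name to a relative file name by joining its
;; parts with "/" and appending ".scm".
(define (module-name->file-name name)
  (string-append (string-join (map symbol->string name) "/") ".scm"))

;; Assumption: GUIX_WORKFLOW_PATH holds colon-separated directories that
;; are searched like Guile's own load path.
(define (workflow-directories)
  (let ((value (getenv "GUIX_WORKFLOW_PATH")))
    (if value (string-split value #\:) '())))

(define (find-workflow-module name)
  "Return the file defining module NAME, or #f when none is found."
  (search-path (workflow-directories) (module-name->file-name name)))

(display (module-name->file-name '(example-workflow)))  ; example-workflow.scm
(newline)
(display (find-workflow-module '(example-workflow)))    ; a full path, or #f
(newline)
```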
Now we can list the available processes with the command:
guix process -l
And run a process using:
guix process -r samtools-index
Free Software all the way down
Process engines are a layer between the written Scheme code and the running scripts. Let's look at the grid-engine as an example. If we prepare a process using:
guix process -p samtools-index -e grid-engine
the command shows the command it would use to schedule a job in the grid engine system:
qsub -N gwl-samtools-index /gnu/store/01iadgakyqw702sal89j3raxwd84fdzr-samtools-index
If we look inside the file specified in the last argument, we find the job script with the grid-engine-specific #$ comments for the memory, time, and threads, and the actual code that will run on the compute node once scheduling has succeeded.
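To make the idea of engine-specific directives concrete, here is a hypothetical sketch of how run-time limits could be rendered as Grid Engine #$ lines, using the standard qsub resource requests h_vmem (memory), h_rt (wall-clock time, in seconds) and a parallel environment for threads. It is not GWL's code; the directives GWL actually writes may differ, and the parallel environment name smp is site-specific and only assumed here.

```scheme
;; Hypothetical sketch: render run-time limits as Grid Engine "#$" lines.
;; GWL's actual output may differ; the "smp" parallel environment name
;; is site-specific and only an assumption here.
(define (grid-engine-directives megabytes seconds threads)
  (string-append
   "#$ -l h_vmem=" (number->string megabytes) "M\n"   ; memory limit
   "#$ -l h_rt=" (number->string seconds) "\n"         ; wall-clock limit
   "#$ -pe smp " (number->string threads) "\n"))       ; thread count

;; samtools-index: 500 megabytes, 2 hours (7200 seconds), default 1 thread.
(display (grid-engine-directives 500 7200 1))
```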
The first line that will be executed in the script loads the proper environment for the remainder of the script to run:
Because the code generated from the original Scheme code can be inspected, we can debug, verify, and prototype fixes at a lower level than the Scheme code.
The prepare option (the -p switch) provides all the insight required to manually reproduce each step of the computation.
Defining (dynamic) workflows
On the next page, we will use templated processes and combine them in workflows.