AVBP with Guix

This document describes how to deploy the AVBP software using Guix, whether or not Guix is available on the target machine.

Prerequisites

The AVBP packages are available in the Guix-HPC-non-free channel.

See https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free and https://guix.gnu.org/manual/en/html_node/Specifying-Additional-Channels.html for instructions regarding how to configure Guix to use software from this channel.
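For illustration, a minimal ~/.config/guix/channels.scm declaring this channel could look as follows. This is a sketch only: check the channel's own README for the authoritative definition, including any channel introduction used for authentication.

```scheme
;; Sketch of ~/.config/guix/channels.scm -- the exact definition should be
;; taken from the Guix-HPC-non-free documentation.
(cons (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git"))
      %default-channels)
```

After creating this file, run guix pull so the packages provided by the channel become available.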

Running AVBP with Guix locally

AVBP can be installed using the avbp Guix package.

In order to build this package, the following environment variables need to be set:

  • AVBP_GIT_REPO: path to your local clone of the git repository containing the source code of AVBP
  • AVBP_LIBSUP: path to the local folder containing the AVBP license file

The following commands instantiate a containerized environment in which a simulation is run:

  # Go to the folder containing your simulation.
  cd /path/to/simulation
  # Either export the required environment variables...
  export AVBP_GIT_REPO=... AVBP_LIBSUP=...
  # ...and run the guix command
  guix shell --container avbp coreutils openmpi@4 openssh
  # Or set the environment variables on the command line
  AVBP_GIT_REPO=... AVBP_LIBSUP=... guix shell --container avbp coreutils openmpi@4 openssh
  # Run AVBP from the folder containing the run.params file.
  cd RUN && avbp
  # Alternatively, start a parallel simulation using Open MPI
  cd RUN && mpirun -np 12 avbp

Notes:

  • in order to run AVBP from a containerized environment, the coreutils, openmpi@4 and openssh packages have to be explicitly selected (openssh being required by Open MPI).
  • in order to run a simulation, the root directory of the simulation must be accessible. This is not the case when the containerized shell is started from the RUN subdirectory. An alternative command that starts a simulation directly from within the RUN folder could be:

      AVBP_GIT_REPO=... AVBP_LIBSUP=... guix shell --container avbp coreutils openmpi@4 openssh --share=/path/to/simulation -- mpirun -np 12 avbp

Running AVBP on supercomputers

At the time of writing, Guix is not natively available on the national supercomputers.

In order to use AVBP on national supercomputers, Guix provides the guix pack command, which builds an archive containing the full software stack required to run AVBP.

This archive can then be deployed and run on the supercomputer.

So far, the techniques that have been tested are:

  • Relocatable binaries on Adastra and Jean-Zay (see the Example procedure on Adastra below, which can be adapted to Jean-Zay)
  • Singularity on Jean-Zay (see the Example procedure on Jean-Zay below)

Note: the following procedures use SLURM's srun command to start a simulation (in both interactive and batch mode). srun communicates directly with Open MPI using the library selected with the --mpi switch (see the Open MPI documentation). When using Open MPI 4.x (the current default version in Guix), this option has to be set to --mpi=pmi2 for proper communication with SLURM.

Example procedure on Adastra (relocatable binaries)

On a machine with Guix installed

The following commands:

  • create an archive that contains the avbp package,
  • copy the archive to the supercomputer
  # On the local machine, create the archive...
  AVBP_GIT_REPO=... AVBP_LIBSUP=... guix pack -R -S /bin=bin -C zstd avbp
  [...]
  /gnu/store/xxxxxxxxxxxxxxx-avbp-tarball-pack.tar.zst
  # ...then copy it to Adastra
  scp /gnu/store/xxxxxxxxxxxxxxx-avbp-tarball-pack.tar.zst user@adastra.cines.fr:/path/to/$CCFRWORK/avbp-pack.tar.zst

On Adastra

The following commands:

  • unpack the archive in the $CCFRWORK directory
  • set the required environment variables
  • start a simulation
  # Uncompress the archive in the $CCFRWORK space.
  cd $CCFRWORK && mkdir avbp-pack && zstd -d avbp-pack.tar.zst && tar xf avbp-pack.tar -C avbp-pack
  # Make sure no external library is loaded from the host machine
  unset LD_LIBRARY_PATH
  # This is needed by Slingshot when starting many MPI processes (hybrid mode gets message queue overflow).
  export FI_CXI_RX_MATCH_MODE=software
  # This is needed to run on a full node (192 cores) due to multiple PML being selected when not set.
  # This PML uses libfabric for Slingshot support.
  export OMPI_MCA_pml=cm
  # Start an interactive job from the folder containing the run.params file
  cd /path/to/simulation/run && srun -A user \
                                     --time=0:20:00 \
                                     --constraint=GENOA \
                                     --nodes=10 \
                                     --ntasks-per-node=192 \
                                     --cpus-per-task=1 \
                                     --threads-per-core=1 \
                                     --mpi=pmi2 \
                                     $CCFRWORK/avbp-pack/bin/avbp

An example sbatch script can be found below:

  #!/bin/bash
  #SBATCH -A user
  #SBATCH --constraint=GENOA
  #SBATCH --time=03:00:00
  #SBATCH --nodes=10
  #SBATCH --ntasks-per-node=192
  #SBATCH --cpus-per-task=1
  #SBATCH --threads-per-core=1

  # Make sure no external library is loaded from the host machine.
  unset LD_LIBRARY_PATH

  cd /path/to/simulation/run

  # Enforce the use of PMI2 to communicate with Open MPI 4, default Open MPI version in Guix.
  srun --mpi=pmi2 $CCFRWORK/avbp-pack/bin/avbp

Caveats

  • Interconnect errors can occur when starting a large number of MPI processes on Adastra (see the FI_CXI_RX_MATCH_MODE setting above)

Example procedure on Jean-Zay with Singularity

On a machine with Guix installed

The following commands:

  • create an archive that contains the avbp, coreutils and bash packages (the latter being required by Singularity),
  • copy the archive to the supercomputer
  # On the local machine, create the archive...
  AVBP_GIT_REPO=... AVBP_LIBSUP=... guix pack -f squashfs -S /bin=bin --entry-point=/bin/bash avbp coreutils bash
  [...]
  /gnu/store/xxxxxxxxxxxxxxx-avbp-coreutils-bash-squashfs-pack.gz.squashfs
  # ...then copy it to Jean-Zay
  scp /gnu/store/xxxxxxxxxxxxxxx-avbp-coreutils-bash-squashfs-pack.gz.squashfs user@jean-zay.idris.fr:/path/to/$WORK/avbp.sif

Note: the coreutils package is required when running AVBP in a containerized environment.

On Jean-Zay

The Singularity image has to be copied to an authorized folder according to the Jean-Zay documentation:

  # Make the image available to Singularity
  idrcontmgr cp $WORK/avbp.sif

The following commands start a simulation in interactive mode:

  # Load the Singularity environment
  module load singularity
  # Clean the environment variable
  unset LD_LIBRARY_PATH
  # Run the simulation on one full node
  srun -A user@cpu \
       --nodes=1 \
       --ntasks-per-node=40 \
       --cpus-per-task=1 \
       --time=01:00:00 \
       --hint=nomultithread \
       --mpi=pmi2 \
       singularity exec \
                   --bind $WORK:/work \
                    $SINGULARITY_ALLOWED_DIR/avbp.sif \
                   bash -c 'cd /work/path/to/simulation/run && avbp'

Below is a sample sbatch script:

  #!/bin/bash
  #SBATCH -A user@cpu
  #SBATCH --job-name=avbp
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=40
  #SBATCH --cpus-per-task=1
  #SBATCH --time=01:00:00
  #SBATCH --hint=nomultithread

  module purge
  module load singularity

  unset LD_LIBRARY_PATH
  srun --mpi=pmi2 singularity exec --bind $WORK:/work $SINGULARITY_ALLOWED_DIR/avbp.sif /bin/bash -c 'cd /work/path/to/simulation/run && avbp'

Caveats

  • Singularity doesn't seem to honour the -W flag, which sets the working directory. This requires using bash -c with multiple commands.
  • The $WORK space doesn't seem to be accessible from within the container: the --bind $WORK:/work option makes it accessible through the /work path.
  • Open MPI parameters need to be tweaked when running on multiple nodes and multiple cores at the same time on Jean-Zay.
  • Open MPI 5.x is not working at the time of writing on Jean-Zay.

Example procedure on Irene with PCOCC

PCOCC can import Docker images generated by Guix.

On a machine with Guix installed

The following commands:

  • create an archive that contains the avbp, coreutils and bash packages,
  • copy the archive to the supercomputer
  # On the local machine, create the archive...
  AVBP_GIT_REPO=... AVBP_LIBSUP=... guix pack -f docker bash coreutils avbp
  # ...then copy it to Irene
  scp /gnu/store/xxxxxxxxxxxxxxx-bash-coreutils-avbp-docker-pack.tar.gz \
      user@irene-fr.ccc.cea.fr:/path/to/$CCFRWORK/avbp.tar.gz

On Irene

The Docker image has to be imported using PCOCC (see TGCC documentation for more details):

  pcocc-rs image import docker-archive:$CCFRWORK/avbp.tar.gz avbp

The following commands start a simulation in interactive mode:

  cd /path/to/RUN
  ccc_mprun -p rome \
            -N 2 \
            -n 256 \
            -c 1 \
            -E '--mpi=pmi2' \
            -m work,scratch \
            -A project_id \
            -T 600 \
            -C avbp -- avbp

General notes related to HPC

  • Open MPI 4.x uses PMI2 to communicate with SLURM. This requires launching AVBP using srun --mpi=pmi2.
  • The LD_LIBRARY_PATH environment variable often gets in the way, causing the execution to fail, hence the unset.

Running a test suite

The avbp-tests package provides a script running a subset of the AVBP test suite with a single MPI process.

In order to build this package, an additional environment variable has to be defined:

  • AVBP_TEST_SUITE: path to the folder containing the AVBP test suite (it can be the local clone of the testcases repository).

The following command builds the package and runs the test suite:

  AVBP_GIT_REPO=... AVBP_LIBSUP=... AVBP_TEST_SUITE=... guix build avbp-tests

It is also possible to build the avbp-tests package without actually running the tests. This is useful if you want to run the tests manually and have a look at the output files. This can be achieved using the --without-tests flag:

  AVBP_GIT_REPO=... AVBP_LIBSUP=... AVBP_TEST_SUITE=... guix build --without-tests=avbp-tests avbp-tests

If you want to run a subset of the standard test cases, simply copy them to some directory on your system, set AVBP_TEST_SUITE to point there and (re)build the avbp-tests package.
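Sketched as shell commands, this procedure could look as follows (all paths here are placeholders, and the subset directory name is arbitrary):

```shell
# Copy the desired test cases into a dedicated directory (placeholder paths).
mkdir -p /path/to/my-test-subset
cp -r /path/to/testcases/some-case /path/to/my-test-subset/
# Point AVBP_TEST_SUITE at that directory and (re)build the package.
AVBP_GIT_REPO=... AVBP_LIBSUP=... AVBP_TEST_SUITE=/path/to/my-test-subset \
  guix build avbp-tests
```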

AVBP development environment

In order to instantiate a development environment for AVBP, the AVBP_LIBSUP variable has to be set.

On a machine using Guix

The following command instantiates a containerized development environment for AVBP:

  cd /path/to/avbp/source
  AVBP_LIBSUP=... guix shell --container --development avbp --expose=/path/to/avbp/license

Notes:

  • you might want to instantiate the containerized environment from the top-level directory of the AVBP sources so you can actually perform the build;
  • you probably want to expose the path to the AVBP license inside the container; this is done with the --expose flag;
  • you might want to add other packages to the development environment, for example grep, coreutils or a text editor: simply add them to the command line (see the documentation).

You can also store the list of packages for a development environment in a manifest file, track it under version control (for example in your branch/fork of the AVBP source code) and reuse it later:

  guix shell --export-manifest package1 package2 ... --development avbp > avbp-development-environment.scm
  # [...]
  export AVBP_LIBSUP=...
  guix shell --container \
             --manifest=avbp-development-environment.scm \
             --expose=/path/to/avbp/license

Using Singularity

Generate the Singularity image

A development environment can be generated with guix pack by providing a manifest file (see above):

  AVBP_LIBSUP=... guix pack -f squashfs --entry-point=/bin/bash -m avbp-development-environment.scm
  [...]
  /gnu/store/...-pack.gz.squashfs

Deploy the image (example on Jean-Zay)

The generated image then has to be copied to the remote machine and launched using Singularity.

Below is an example of how to deploy the image on Jean-Zay:

  # On the local machine: copy the image to Jean-Zay.
  scp /gnu/store/...-pack.gz.squashfs jean-zay.idris.fr:/path/to/$WORK/avbp-development-environment.sif
  # On Jean-Zay: copy the image to the authorized directory...
  idrcontmgr cp $WORK/avbp-development-environment.sif
  # ... load the Singularity module ...
  module load singularity
  # ... and launch the container (in this example a full node is allocated).
  srun \
    -A user@cpu \
    --time=02:00:00 \
    --exclusive \
    --nodes=1 \
    --ntasks-per-node=1 \
    --cpus-per-task=40 \
    --pty \
    --hint=nomultithread \
    singularity shell \
      --bind $WORK:/work \
      $SINGULARITY_ALLOWED_DIR/avbp-development-environment.sif

Notes:

  • The --pty flag enables pseudo-terminal mode in order to properly handle an interactive shell.
  • When --cpus-per-task is not specified, only a single core is associated with the shell task.