Gysela with Guix

Requisites

You need to have Guix available, either locally or remotely. If you're running a Linux environment, it can be installed on your machine according to these instructions.

Where to find the Gysela packages

The Guix HPC channel contains the CPU version of the Gysela package, while the Guix HPC non-free channel contains the CUDA version of the Gysela package.
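
As a rough sketch of what activating these channels can look like, the snippet below writes a channels file that declares both channels alongside the default Guix channels. The Git URLs and the local file name channels.scm are assumptions that are not stated in this document; check them against each channel's own README before use.

```shell
# Write a channels file to the current directory (hypothetical file name).
# The URLs below are assumptions; verify them in each channel's README.
cat > channels.scm <<'EOF'
(append
 (list (channel
        (name 'guix-hpc)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git"))
       (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")))
 %default-channels)
EOF

# Then update Guix against that file, for instance:
#   guix pull --channels=channels.scm
```

Only the non-free channel entry is needed for the CUDA variants; the CPU package needs the first entry alone.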

After activating the required channel(s), you should be able to access the gyselalibxx package entry from the available packages using the following command:

  $ guix show gyselalibxx
  name: gyselalibxx
  version: 0.1-1.a3be632
  outputs:
  + out: everything
  systems: x86_64-linux
  dependencies: eigen@3.4.0 fftw@3.3.10 fftwf@3.3.10 ginkgo@1.7.0
  + googletest@1.12.1 hdf5@1.10.9 kokkos@4.1.00 libyaml@0.2.5 mdspan@0.6.0
  + openblas@0.3.20 openmpi@4.1.6 paraconf@1.0.0 pdi@1.6.0
  + pdiplugin-decl-hdf5-parallel@1.6.0 pdiplugin-mpi@1.6.0
  + pdiplugin-set-value@1.6.0 pkg-config@0.29.2 python-dask@2023.7.0
  + python-h5py@3.8.0 python-matplotlib@3.8.2 python-numpy@1.23.2
  + python-pyyaml@6.0 python-scipy@1.12.0 python-sympy@1.11.1
  + python-xarray@2023.12.0 python@3.10.7
  location: guix-hpc/packages/gysela.scm:42:4
  homepage: https://gyselax.github.io/
  license: FreeBSD
  synopsis: Collection of C++ components for writing gyrokinetic semi-lagrangian codes
  description: Gyselalib++ is a collection of C++ components for writing
  + gyrokinetic semi-lagrangian codes and similar as well as a collection of such
  + codes.

Presentation of the different Gysela packages

There are different variants of the Gysela package. The default package is called gyselalibxx and can only perform CPU calculations without threading.

GPU support for CUDA-based architectures requires the Guix HPC non-free channel to be activated. CUDA variants are optimised for a specific GPU micro-architecture and are neither backward nor forward compatible. The following architectures have a CUDA variant of the Gysela package: K40M, P100, V100 and A100. The corresponding packages are gyselalibxx-cuda-k40, gyselalibxx-cuda-p100, gyselalibxx-cuda-v100 and gyselalibxx-cuda-a100.
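
Since each variant only runs on its own micro-architecture, a job script may want to pick the package name from the GPU model. The helper below is a minimal sketch of that mapping; gysela_cuda_package is a hypothetical name, not something provided by Guix or Gysela.

```shell
# Map a GPU model name to the matching CUDA variant listed above.
# gysela_cuda_package is a hypothetical helper, not part of Guix or Gysela.
gysela_cuda_package() {
  # Lowercase the argument so that "A100" and "a100" are both accepted.
  model=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$model" in
    k40|k40m) echo "gyselalibxx-cuda-k40" ;;
    p100)     echo "gyselalibxx-cuda-p100" ;;
    v100)     echo "gyselalibxx-cuda-v100" ;;
    a100)     echo "gyselalibxx-cuda-a100" ;;
    *)        echo "no CUDA variant for GPU model: $1" >&2; return 1 ;;
  esac
}

# Example: print the package to pass to guix shell for an A100 node.
gysela_cuda_package A100
```

Any other model makes the helper fail, which mirrors the fact that the variants are not compatible across architectures.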

A word on guix shell

In this tutorial, we rely on the guix shell subcommand with the --pure option in order to set up a controlled environment from which to launch a simulation. This command unsets various environment variables, but this behaviour can be controlled with the --preserve flag (useful, for example, when setting LD_PRELOAD for the CUDA packages).

The basic syntax is guix shell --pure package1 package2 package3, where package1, package2 and package3 are the (only) packages which will be accessible from the environment. A specific version of a package can be specified with the @ syntax. guix shell --pure openmpi@4 will drop you in a shell with the latest packaged version of the 4.x openmpi package.

By default, this command drops you into a shell from which you can manually launch your software or modify your environment. If you want to run a single command, you can do it like this: guix shell --pure package1 -- my_command, where my_command is a command provided by package1.

More information on guix shell can be found here, especially the --container option which is also of interest when attempting to control an execution environment.

Notes on the SLURM scheduling system

In order to use SLURM, the slurm package should be part of our environment and thus part of the list of packages passed as arguments to the guix shell command: guix shell --pure slurm package1 ....

When using SLURM with Guix, we should ensure that the major version of the SLURM package we have in our environment is the same as the one which is running on the cluster.

From the frontend, we can check the SLURM version with:

  $ squeue --version
  slurm 23.11.1

At the time of writing (2024/03/25), the default SLURM version in Guix is 23.02.6. This can be verified with a command such as:

  $ guix shell slurm -- squeue --version
  slurm 23.02.6

In our example above, the major version is 23 in both cases, so nothing needs to be done. If the SLURM package installed on the cluster was at, say, version 22.x, we would have to add slurm@22 to our list of packages.
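
The major-version check can be scripted. The sketch below turns the output of squeue --version into a package specification; slurm_spec is a hypothetical helper name, and it assumes the output has the "slurm X.Y.Z" form shown above.

```shell
# Turn the output of "squeue --version" into a Guix package spec.
# slurm_spec is a hypothetical helper name.
slurm_spec() {
  # "$1" looks like "slurm 23.11.1": drop everything up to the first
  # space, then keep what precedes the first dot.
  version=${1#* }
  printf 'slurm@%s\n' "${version%%.*}"
}

slurm_spec "slurm 23.11.1"
slurm_spec "slurm 22.05.8"
```

On the frontend, the result could be passed straight to guix shell, for example guix shell --pure gyselalibxx "$(slurm_spec "$(squeue --version)")" -- ..., assuming that major version is packaged in your channels.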

Running Gysela on a machine where Guix is available

In this chapter, we'll focus on running a simulation that is compiled as a binary and part of the Gysela package.

The first step will be to generate a configuration file for our simulation by providing the --dump-config flag (see section below).

Running precompiled binaries

From the terminal on the current machine

This command will generate a file named config.yaml containing the configuration needed to run the simulation:

  $ guix shell --pure gyselalibxx -- sheath_xperiod_vx --dump-config config.yaml

The simulation can then be run with the following command:

  $ guix shell --pure gyselalibxx -- sheath_xperiod_vx config.yaml

Using SLURM (interactive mode)

In order to use SLURM, we have to add slurm to our package list.

CPU package

At the time of writing, gyselalibxx doesn't support distributed computations using MPI. The following simulation runs on a single node.

  $ guix shell --pure gyselalibxx slurm -- srun -N 1 --exclusive sheath_xperiod_vx config.yaml

GPU (CUDA) package

As stated above, the Gysela packages with CUDA support are built for a specific micro-architecture. The example below uses the variant targeting the A100 micro-architecture, gyselalibxx-cuda-a100.

Because libcuda.so is tightly coupled to the kernel driver and its location is not standardised, the CUDA variants rely on LD_PRELOAD to locate libcuda.so. One way to set LD_PRELOAD is to use the env command provided by the coreutils package.

  $ guix shell --pure gyselalibxx-cuda-a100 slurm coreutils -- srun -N 1 -C a100 --exclusive env LD_PRELOAD=/usr/lib64/libcuda.so sheath_xperiod_vx config.yaml

Alternatively, if you don't want to include coreutils in your execution environment, you can set LD_PRELOAD on the command line and preserve it with the --preserve flag:

  $ LD_PRELOAD=/usr/lib64/libcuda.so guix shell --pure --preserve=^LD_PRELOAD gyselalibxx-cuda-a100 slurm -- srun -N 1 -C a100 --exclusive sheath_xperiod_vx config.yaml
  ERROR: ld.so: object '/usr/lib64/libcuda.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
  ERROR: ld.so: object '/usr/lib64/libcuda.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
  srun: job XXXX queued and waiting for resources
  [...]

Note the error due to the absence of libcuda.so on the frontend machine, which can be safely ignored.

Using OAR

Building your own version of Gysela from source

Transformations of the Guix package

Using your personal source tree

Running Gysela on a machine where Guix is not available

If Guix is not available on the target machine, it can still be used on another machine to generate an execution environment that is then deployed with another tool.

Using Singularity

In these examples, we will target the Jean Zay cluster, which supports custom Singularity images and uses SLURM as its scheduling system.

The method consists of three steps:

  • create an image locally using Guix
  • load the image on the cluster
  • run the simulation

Gysela CPU variant

Gysela GPU variant (CUDA)

We first create an image compatible with Singularity locally:

  $ guix pack -f squashfs bash coreutils gyselalibxx-cuda-v100 slurm -S /bin=bin --entry-point=/bin/bash
  [...]
  /gnu/store/xxxxxxxxxxxxxx-bash-coreutils-gyselalibxx-cuda-v100-squashfs-pack.gz.squashfs

We then copy it into the $WORK folder on the remote machine and make it accessible to Singularity:

  # Upload the image to the $WORK folder on Jean Zay with a .sif extension,
  # this is needed for the image to be accessible.
  [local-machine] $ scp /gnu/store/xxxxxxxxx-bash-coreutils-gyselalibxx-cuda-v100-squashfs-pack.gz.squashfs \
                  user@jean-zay:/path/to/work/folder/image.sif
  [...]
  [local-machine] $ ssh user@jean-zay
  # Make the image accessible to the Singularity runtime.
  [jean-zay] $ idrcontmgr cp /path/to/work/folder/image.sif

Finally, we run the simulation from the container, asking for one node with one GPU:

  # Activate the Singularity runtime.
  [jean-zay] $ module load singularity
  # Create the config file from the container. It will reside in $HOME, which
  # is automatically bound to the container's $HOME.
  [jean-zay] $ srun -A user@v100 --ntasks=1 --gres=gpu:1 --cpus-per-task=1 --hint=nomultithread -l singularity exec --nv $SINGULARITY_ALLOWED_DIR/image.sif env LD_PRELOAD=/.singularity.d/libs/libcuda.so sheath_xperiod_vx --dump-config config.yaml
  # Run the simulation. The results of the simulation can be found in the $HOME
  # folder.
  [jean-zay] $ srun -A user@v100 --ntasks=1 --gres=gpu:1 --cpus-per-task=1 --hint=nomultithread -l singularity exec --nv $SINGULARITY_ALLOWED_DIR/image.sif env LD_PRELOAD=/.singularity.d/libs/libcuda.so sheath_xperiod_vx config.yaml

A few notes on the command line options for singularity: according to the Jean Zay documentation, the --nv flag is required to access the GPU hardware; setting LD_PRELOAD=/.singularity.d/libs/libcuda.so gives access to the host's libcuda.so, which the Jean Zay administrators bind into the container.

Using relocatable binaries

When using relocatable binaries, it is highly preferable to use non-interactive (batch) mode with SLURM, as several commands are needed to set up the environment on the computation node.

Gysela GPU variant (CUDA)

We first locally create a tarball using guix pack:

  # This command exports the path to the generated tarball in the store.
  # guix pack could be as well called directly and the path manually copied from the standard output.
  $ export RR_TARBALL=$(guix pack -R gyselalibxx-cuda-v100 slurm -S /bin=bin -S /etc=etc -S /lib=lib | tail -n 1)
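
One caveat with this pattern: the exit status of a pipeline is that of its last command, so if guix pack fails, tail still succeeds and RR_TARBALL ends up empty. A small guard such as the following sketch can catch that before the upload; require_file is a hypothetical helper name.

```shell
# require_file is a hypothetical helper: it fails loudly when the path is
# empty or does not point to an existing file.
require_file() {
  if [ -z "$1" ] || [ ! -f "$1" ]; then
    echo "not a file: '${1}'" >&2
    return 1
  fi
}

# Usage, once RR_TARBALL has been exported:
#   require_file "$RR_TARBALL" && echo "pack ready: $RR_TARBALL"
```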

We then copy it into the $WORK folder on the remote machine, unpack it in a subfolder (this subfolder will contain the whole Guix filesystem hierarchy) and set up the environment:

  # Upload the tarball to the $WORK folder on Jean Zay.
  [local-machine] $ scp $RR_TARBALL \
                  user@jean-zay:/path/to/work/folder/tarball.tar.gz
  [...]
  [local-machine] $ ssh user@jean-zay
  # Unpack the tarball in a subfolder of the $WORK directory
  [jean-zay] $ mkdir $WORK/guix && tar xf $WORK/tarball.tar.gz -C $WORK/guix
  # Load the environment from the subfolder.
  [jean-zay] $ export GUIX_PROFILE=$WORK/guix && source $GUIX_PROFILE/etc/profile

Finally, we create a batch file and use it to run the simulation, asking for one node with one GPU:

  # Create the batch file in the $WORK folder.
  [jean-zay] $ cd $WORK
  [jean-zay] $ cat > gysela-run.sh <<EOF
  #!/bin/bash
  #SBATCH -A user@v100
  #SBATCH --job-name=gysela-run
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=1
  #SBATCH --time=0:10:00
  #SBATCH --gres=gpu:1
  #SBATCH --hint=nomultithread

  # Setup the environment on the node
  export GUIX_PROFILE=$WORK/guix
  source \$GUIX_PROFILE/etc/profile
  export LD_PRELOAD=/usr/lib64/libcuda.so

  # Ensure we are in the \$WORK folder
  cd $WORK
  # Generate the config file
  sheath_xperiod_vx --dump-config config.yaml
  # Launch the simulation
  sheath_xperiod_vx config.yaml
  EOF
  # Run the simulation. The results of the simulation can be found in
  # the current folder ($WORK).
  [jean-zay] $ sbatch gysela-run.sh

A few notes:

  • In this section, we started our simulation using the SLURM binaries provided by Guix. This happens transparently: sourcing the $GUIX_PROFILE/etc/profile file prepends the Guix binaries to the PATH variable. It is also possible to use the SLURM binaries provided by the cluster administrator, in which case the slurm package needs to be removed from the arguments of guix pack.
  • When generating the tarball using guix pack, the -S flag is used to set up different symbolic links. While the bin symlink is not needed (the binaries are accessed through the PATH), the etc and lib symlinks are needed, as they give access to etc/profile and to the required PDI plugins, respectively.
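
A remark on the batch file above: it is written through an unquoted here-document, so the shell running cat expands $WORK immediately, while the escaped \$GUIX_PROFILE is written literally and only expanded when the job runs. The following self-contained sketch shows the difference; the WORK value and the demo.sh file name are placeholders.

```shell
# Placeholder value standing in for the cluster's $WORK.
WORK=/tmp/work-demo

# Unquoted delimiter: $WORK is expanded now, \$GUIX_PROFILE is kept as-is.
cat > demo.sh <<EOF
export GUIX_PROFILE=$WORK/guix
source \$GUIX_PROFILE/etc/profile
EOF

# Show what was actually written to the file.
cat demo.sh
```

The first line of demo.sh contains the expanded path, the second still contains the literal $GUIX_PROFILE. Quoting the delimiter (<<'EOF') would instead disable all expansion, in which case the backslashes would have to be dropped.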

Spawning a development environment with Guix

As the software has a package definition in Guix, it is straightforward to create a development environment using the -D flag of the guix shell command.

Note that at the time of writing, the Guix package specifies a particular commit of the git repository and applies some transformations (specifically, it removes an add_subdirectory entry in the CMakeLists.txt file in order to use the libraries provided by Guix).

Manually

We can spawn a Guix shell containing all the dependencies of Gysela in a gyselalibxx git repo:

  # Go to the git repo
  $ cd /path/to/the/gysela/repo
  # Start the shell
  $ guix shell --pure -D gyselalibxx

You can now run cmake from this terminal.

Using direnv

The previous paragraph can be automated using direnv (make sure direnv is installed on your system):

  # Go to the git repo
  $ cd /path/to/the/gysela/repo
  # Create the .envrc file
  $ echo "use guix -D gyselalibxx" > .envrc
  # Activate direnv for this folder
  $ direnv allow .

Generate the completion file

From the development environment shell, it is possible to generate the compile_commands.json file, needed for code completion in the editor, by passing a specific flag to cmake:

  # Go to the git repo
  $ cd /path/to/the/gysela/repo
  # Create a build dir
  $ mkdir -p build
  # Prepare the build dir and generate the compile commands
  $ cmake . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DGYSELALIBXX_DEPENDENCY_POLICIES=INSTALLED
  # Link the compile commands to the root of the repo
  $ ln -s build/compile_commands.json .

Spawning a development environment when Guix is not available

When Guix is not available on your machine, you can generate an image using guix pack and deploy it using a tool available on the machine like Singularity or Docker.

In order to generate such an image with Guix, we first need to generate a manifest file.

  # Generate a manifest file containing the development dependencies of
  # gyselalibxx-cuda-v100 and a couple of extra packages.
  guix shell -D gyselalibxx-cuda-v100 neovim tmux --export-manifest > manifest-gyselalibxx-dev-env.scm

Some notes on the previous command: the -D flag asks for the development dependencies of its package argument and can be specified multiple times. The --export-manifest flag prints the corresponding Scheme code on stdout. The resulting environment will contain neovim, tmux and the development dependencies of gyselalibxx-cuda-v100 (which are the same as any CUDA variant).

With Singularity (Jean Zay)

Singularity needs a squashfs image, which can be either built using the previously generated manifest file or downloaded from the Guix HPC build farm.

Downloading the image

The easiest way to get started is by downloading the Guix pack that is built by our continuous integration system. It is a Singularity/Apptainer image and is available from https://guix.bordeaux.inria.fr/search/latest/archive?query=spec:images-x86_64+status:success+gyselalibxx-squashfs.

Once downloaded, make sure to rename the file so that it has a .sif extension. See Deploying the image on Jean-Zay for information on how to use that image.

Generating the image

Using the manifest file generated in the previous section, the following commands build the image and copy it to Jean-Zay:

  # Generate the pack file.
  export PACK_FILE=$(guix pack -f squashfs -S /bin=bin -S /lib=lib --entry-point=/bin/bash -m manifest-gyselalibxx-dev-env.scm | tail -n 1)
  # Copy the image file to Jean-Zay.
  scp $PACK_FILE jean-zay.idris.fr:gysela-dev-env.sif

Deploying the image on Jean-Zay

Opening an interactive shell in a Singularity image through SLURM requires passing the --pty flag to srun.

  # On Jean-Zay
  # Move the image to $WORK
  mv ~/gysela-dev-env.sif $WORK
  # Make the image available in Singularity
  idrcontmgr cp $WORK/gysela-dev-env.sif
  # Activate Singularity.
  module load singularity
  # Launch the container. Here we ask for 2 hours, one GPU and 10 CPU
  # cores.
  srun -A user@v100 --time=02:00:00 --ntasks=1 --gres=gpu:1 --cpus-per-task=10 --pty --hint=nomultithread -l singularity shell --bind $WORK:/work --nv $SINGULARITY_ALLOWED_DIR/gysela-dev-env.sif

You can now compile Gysela from within the container.

With Docker (Persée)

With relocatable binaries

  guix pack -RR -S /bin=bin -S /etc=etc -S /lib=lib -S /share=share -S /lib64=lib64 -m manifest-gyselalibxx-dev-env.scm