Subsections of HPC Environment

Binary deployments with Spack


Spack (https://spack.io) is a package manager for Linux that makes it easy to deploy scientific software on computers. One of the main differences from apt or dnf is that Spack can be installed at the user level, rather than globally by the system administrator. To install Spack, you only need to clone a GitHub repository, which can then be loaded and used to install packages that are usually installed at the system level, such as GCC, CUDA, and others.

Spack is also a source-based package manager. This means that the primary method of installing packages is to have them built from source by Spack itself. For example, if we want to install CMake, Spack will fetch the source code and build it locally — along with every dependency of CMake. In contrast, a binary-based package manager like APT will download the pre-built CMake from Debian’s servers, which was compiled in their server farm. Source-based package managers have several advantages over binary-based ones, especially in the context of supercomputers:

  • Optimization flags: When building C/C++ packages, you can apply compiler flags to optimize the build for specific CPU microarchitectures. This may make a binary for x86-64 unusable on other x86-64 machines if the compiler adds CPU instructions that are not present on older processors. Binary-based package managers must make a compromise by ensuring their pre-built packages are generic enough for many CPU variants, while sacrificing potential performance.

  • Fine control of version dependencies: Scientific software often relies on specific versions of libraries. A source-based package manager allows you to rebuild all reverse-dependencies of a package when you change its version, while you are very limited when using a binary-based one.

  • Different variants of the same package: Similar to having multiple versions of the same package, packages can be compiled with different sets of features, such as support for CUDA or ROCm, support for specific XML libraries, etc. This can be solved by binary-based package managers by providing multiple package variants that conflict with each other, but it is more cleanly solved by configuring your package variants in Spack.

On the other hand, one of the main issues with source-based package managers, including Spack, is the time required to compile packages. Because the entire dependency chain must be built from source, it can take a considerable amount of time and compute resources to prepare your toolchain. This can be an even worse problem when iterating on different package variants or versions to configure your environment.

To overcome this problem, this post discusses how to bridge the best of both worlds for Spack package management: using a binary cache. By using a binary cache, we can “pre-build” packages for an environment on our cloud platform, so that when we call spack install, it will attempt to use pre-built packages. Spack remains a source-based package manager, and if it doesn’t find a binary for a specific package, it can still build it from source.

By using a binary cache, we can provide users with an easy onboarding experience with Spack while retaining the features that make Spack useful, such as changing package variants, versions, or optimization levels.

Using a binary cache

Before discussing how to set up a binary cache, let’s see how the binary cache is used from a user’s perspective.

Tip

In Spack, the term “mirror” is used to refer to binary caches (as well as source mirrors).

$ spack mirror --help
usage: spack mirror [-hn] SUBCOMMAND ...

manage mirrors (source and binary)

positional arguments:
  SUBCOMMAND
    create           create a directory to be used as a spack mirror, and fill it with package archives
    destroy          given a url, recursively delete everything under it
    add              add a mirror to Spack
    remove (rm)      remove a mirror by name
    set-url          change the URL of a mirror
    set              configure the connection details of a mirror
    list             print out available mirrors to the console

options:
  -h, --help         show this help message and exit
  -n, --no-checksum  do not use checksums to verify downloaded files (unsafe)

To add a binary cache, the user simply needs to run a command. It is also possible to include the mirror as part of a Spack environment instead:

$ spack mirror add --unsigned inria-mirror oci://registry.gitlab.inria.fr/numpex-pc5/wp3/spack-stack/buildcache-rhel-8

# or in your environment's spack.yaml
spack:
  mirrors:
    inria-mirror:
      url: oci://registry.gitlab.inria.fr/numpex-pc5/wp3/spack-stack/buildcache-rhel-8
      signed: false

Finally, to install packages from the mirror, the user simply calls spack install as usual. As Spack installs packages, it will check if the package is cached in the mirror. If it is, it will download it instead of building from source.

It is important to understand that Spack checks if a package is cached by comparing the package’s spec hash. For example:

$ spack spec -L cmake | grep cmake
 -   gq5bqyhsvdlqs6qqjvs6useiq6kzhakf  cmake@3.31.6~doc+ncurses+ownlibs~qtgui build_system=generic build_type=Release arch=linux-ubuntu24.04-icelake %c,cxx=gcc@13.3.0

Spack will only download a CMake package from the cache if there is one with the same hash. The hash is calculated from the version of CMake, the build flags, the system architecture, and the hashes of all of its dependencies.

Important

This means packages in a binary cache must match the user’s system to be downloadable (Debian 11, RHEL 9, etc).
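
To check whether a given spec is already available in a configured mirror, you can query the build cache. A quick sketch, assuming the inria-mirror added above:

$ spack buildcache list --allarch          # list everything available in the configured mirrors
$ spack buildcache list --allarch cmake    # restrict the listing to cmake specs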

Creating a binary cache

To populate a binary cache, it is important to identify:

  • Where to store the packages
  • Where to build the packages

Package storage and serving

To serve packages from a binary cache, Spack doesn’t use any custom-made system that requires you to host a dedicated service. Instead, it relies on a standard OCI container registry, the same kind used for Docker images. To be clear: we don’t do anything related to containers. Spack pushes packages as if they were container layers to a container registry and pulls them back transparently.

The benefit of this approach is that many cloud providers offer container registry solutions, such as the GitHub Container Registry (GHCR) and the GitLab Container Registry.

The URL format for the Spack cache would be:

  • For GitHub: oci://ghcr.io/<username>/<repository>/<mirror name>
  • For GitLab (self-hosted): oci://<hostname>/<group>/<subgroup>/<repository>/<mirror name>

To be able to push to the cache, you will need a username and password. Since these are the same credentials you would use for the container registry of your choice, there should be documentation available for it. You can configure the credentials for the mirror as follows:

$ spack mirror add \
  --unsigned \
  --oci-username-variable OCI_USERNAME_VARIABLE \
  --oci-password-variable OCI_PASSWORD_VARIABLE \
  name \
  url

We have a quick guide to configure the built-in container registry for GitLab in the following link: Setup a Container Registry on GitLab.

Package building

After we have configured our mirror to push packages to, we need to build the packages themselves. The first option is to build them locally on your PC for debugging purposes. This is useful for getting comfortable with the tools and checking that pushing and pulling work as expected. Packages can be pushed with the following command:

$ spack buildcache push <mirror name> <specs...>
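
For example, a quick way to test the round trip locally (the mirror name below is illustrative, and --unsigned matches the unsigned mirror configuration used earlier):

$ spack install cmake                                    # build cmake and its dependencies from source
$ spack buildcache push --unsigned my-test-mirror cmake  # push the resulting binaries to the mirror
$ spack uninstall -y cmake
$ spack install --cache-only cmake                       # reinstall cmake from the binary cache only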

To automate this process, we can use a CI solution like GitHub Actions or GitLab pipelines. The checklist of elements you will need to consider includes:

  • Which packages to build
  • Committing the lockfile
  • Which microarchitecture to target
  • Which operating system to target
  • Configuring the padded_length

A simple Spack environment like the following should be enough to get started:

# spack.yaml
spack:
  view: true
  concretizer:
    unify: true

  # Declare which packages to be cached
  specs:
  - cmake
  - python

  # Set the padding length to a high value.
  config:
    install_tree:
      padded_length: 128

  # Declare the target microarchitecture
  packages:
    all:
      require:
      - target=x86_64_v2

We mark all packages to target x86_64_v2 (or whichever architecture you want to target) so that Spack doesn’t try to autodetect the architecture of the host system, but rather uses the explicitly set value. Otherwise, you may encounter the following problem:

  • GitHub Actions concretizes the environment to x86_64_v4 because of the CI system
  • You download the environment to an older machine
  • Programs crash due to missing instructions
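
To check which microarchitecture Spack would otherwise autodetect on a given host (and hence whether a generic target like x86_64_v2 is needed), something like the following should work:

$ spack arch            # full architecture triple, e.g. linux-ubuntu24.04-icelake
$ spack arch --target   # only the detected target microarchitecture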

As for padded_length: when you build a package, the build system will insert references to the absolute paths of other packages it depends on. The problem is that Spack can be installed to any path, so it must perform relocation.

Relocation is a process where a pre-built package gets its references replaced, for example /home/runner/spack/bin/cmake becomes /home/myuser/spack/bin/cmake. When dealing with an actual binary, this string will be embedded at some location:

       byte: 0 1 2 3 4 5 6 7 8 9 ...
   row 0:    x x x x x x x x / h o m e / r u n n e r / s p a c k / b i n / c
   row 1:    m a k e \0 x x x x x x x x x x x x x x x x x x x x x x x x x x

The problem arises when you want to replace it with a string longer than the original: you cannot assume that there will be free space after the original string. Therefore, the only operation you can safely perform is to shrink a string. By using padded_length, Spack will artificially create paths with a specified number of padding characters, so that if the number is large enough, we can assume that paths will always be shortened. For example, it will install packages to /home/runner/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/bin/cmake instead (I simplified the actual installation paths for clarity).

Regarding the spack.lock file, to enable users to easily download from the cache, I would recommend committing and sharing the spack.lock with users of the cache. The two possibilities are:

  • CI builds packages, but the spack.lock is not preserved: when users concretize the environment on their own, they may or may not get the same versions and hashes that are actually cached, since a spack.yaml doesn’t guarantee reproducibility.

  • CI builds packages and pushes the spack.lock back into the environment: users won’t have to re-concretize the environment, so they will get the hashes for the packages that were built and cached by CI.

To preserve the spack.lock from CI, your caching workflow might look like this:

  1. Set up Spack
  2. Concretize the environment, rewriting the spack.lock if needed
  3. Build and push the environment
  4. Commit and push the spack.lock back to the repository
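
As a rough illustration of these four steps, a GitLab CI job could look like the sketch below. The job name, the mirror name, the handling of registry credentials, and the final commit step are assumptions to adapt to your own setup:

# .gitlab-ci.yml (sketch)
build-cache:
  image: debian:11
  script:
    # 1. Set up Spack
    - apt-get update && apt-get install -y git python3 gcc g++ gfortran make patch curl xz-utils
    - git clone --depth=1 https://github.com/spack/spack
    - . spack/share/spack/setup-env.sh
    # 2. Concretize the environment, rewriting spack.lock
    - spack -e . concretize --force
    # 3. Build and push the environment to the mirror configured in spack.yaml
    - spack -e . install
    - spack -e . buildcache push --unsigned my-mirror
    # 4. Preserve spack.lock (commit it back to the repository, or keep it as an artifact)
  artifacts:
    paths:
      - spack.lock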

Finally, when populating a Spack binary cache, it is important to consider the Linux distribution to target. Since Spack doesn’t build its own glibc, its packages are actually linked against the system’s glibc. This adds an implicit dependency on the system, which is reflected in the package spec. For example, if I concretize a package:

$ spack spec -L cmake | grep cmake
 -   gq5bqyhsvdlqs6qqjvs6useiq6kzhakf  cmake@3.31.6~doc+ncurses+ownlibs~qtgui build_system=generic build_type=Release arch=linux-ubuntu24.04-icelake %c,cxx=gcc@13.3.0

This package is concretized for ubuntu24.04. Therefore, if you want to deploy pre-built Spack packages for an OS like Debian 11, you must ensure that you concretize and build the environment on Debian 11 as well.

This can be easily accomplished with a container, and both GitHub and GitLab provide means of running action steps under a container:

# github workflow
jobs:
  main:
    container: debian:11
  #...
# gitlab pipeline
main:
  image: debian:11
  # ...

Need more help?

If you need more help with creating your Spack binary cache to accelerate deployments on HPC centers, please get in contact with us!

Modern HPC Workflow Example (Guix)


This guide applies the workflow presented at Modern HPC Workflow with Containers to run an application container built with Guix.

  • This tutorial will focus on using Grid5000 for both building the container with Guix and deploying it with Singularity, as it provides both tools.

  • The container may be built on any computer with Guix installed. You may refer to the documentation if you wish to install Guix on your machine. Beware that if you build it on your local machine, you’ll have to copy it to Grid5000.

  • Additional instructions will be provided for deployment on Jean-Zay, which can be easily adapted to any cluster supporting Singularity and using SLURM as its job management system.

  • The application chosen as an example is Chameleon, a dense linear algebra software for heterogeneous architectures that supports MPI and NVIDIA GPUs through CUDA or AMD GPUs through ROCm.

Chameleon on NVIDIA GPUs

Build the container on Grid5000

  1. Log in to Grid5000 (detailed instructions here). The full list of resources shows where to find an NVIDIA GPU and an x86_64 CPU (for Singularity). For instance, the chifflot queue, located in Lille, contains nodes with NVIDIA P100 GPUs.
ssh lille.g5k
mkdir tuto && cd tuto
  2. Get the channels file. The chameleon-cuda package (the chameleon package variant with CUDA support) is defined in the Guix-HPC non-free channel, which is not activated by default.
wget https://numpex-pc5.gitlabpages.inria.fr/tutorials/hpc-env/workflow-example/channels.scm

The channels.scm file contains the following:

(list
  (channel
    (name 'guix-hpc)
    (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")
    (branch "master")
    (commit
      "ae4a812197a7e565c22f763a1f09684257e79723"))
  (channel
    (name 'guix-hpc-non-free)
    (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")
    (branch "master")
    (commit
      "c546e8121b4d8a715dcb6cd743680dd7eee30b0e"))
  (channel
    (name 'guix)
    (url "https://git.guix.gnu.org/guix.git")
    (branch "master")
    (commit
      "97fb1887ad10000c067168176c504274e29e4430")
    (introduction
      (make-channel-introduction
        "9edb3f66fd807b096b48283debdcddccfea34bad"
        (openpgp-fingerprint
          "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA"))))
  (channel
    (name 'guix-science-nonfree)
    (url "https://codeberg.org/guix-science/guix-science-nonfree.git")
    (branch "master")
    (commit
      "de7cda4027d619bddcecf6dd68be23b040547bc7")
    (introduction
      (make-channel-introduction
        "58661b110325fd5d9b40e6f0177cc486a615817e"
        (openpgp-fingerprint
          "CA4F 8CF4 37D7 478F DA05  5FD4 4213 7701 1A37 8446"))))
  (channel
    (name 'guix-science)
    (url "https://codeberg.org/guix-science/guix-science.git")
    (branch "master")
    (commit
      "4ace0bab259e91bf0ec9d37e56c2676b1517fccd")
    (introduction
      (make-channel-introduction
        "b1fe5aaff3ab48e798a4cce02f0212bc91f423dc"
        (openpgp-fingerprint
          "CA4F 8CF4 37D7 478F DA05  5FD4 4213 7701 1A37 8446"))))
  (channel
    (name 'guix-past)
    (url "https://codeberg.org/guix-science/guix-past.git")
    (branch "master")
    (commit
      "70fc56e752ef6d5ff6e1e1a0997fa72e04337b24")
    (introduction
      (make-channel-introduction
        "0c119db2ea86a389769f4d2b9c6f5c41c027e336"
        (openpgp-fingerprint
          "3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5")))))
  3. Generate the Singularity container image with the guix pack command, prefixed with guix time-machine in order to use our channels.scm file. The -r option creates a symbolic link, chameleon.sif, to the resulting container image in the Guix store.
guix time-machine -C channels.scm -- pack -f squashfs chameleon-cuda bash -r ./chameleon.sif
Tip

guix pack can generate different formats, like Singularity (squashfs), Docker or relocatable binaries.

Tip

Singularity needs bash to be in the package list.

Deploy the container on Grid5000

  1. Start an interactive job using OAR.
oarsub -I -p chifflot -l host=2
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libcuda.so
export OPENBLAS_NUM_THREADS=1
mpirun -machinefile $OAR_NODEFILE \
       --bind-to board \
       singularity exec \
                   --bind /usr/lib/x86_64-linux-gnu/:/usr/lib/x86_64-linux-gnu/ \
                   --bind /tmp:/tmp chameleon.sif \
                   chameleon_stesting -o gemm -n 4000 -b 160 --nowarmup -g 2
Tip

CUDA applications deployed with Guix need LD_PRELOAD to be set to the path of libcuda.so, since the library is provided by the proprietary CUDA driver installed on the machine and is not part of the Guix software stack.

Tip

The OPENBLAS_NUM_THREADS environment variable is set to improve computation performance and is not compulsory.

Deploy the container on Jean-Zay

  1. Copy the image to Jean-Zay. Depending on your SSH setup, you might have to adapt the commands below.
# Disconnect from Grid5000.
exit

# Copy the image from Grid5000 to Jean-Zay
scp lille.g5k:tuto/chameleon.sif jean-zay:chameleon.sif
  2. Set up the container image on Jean-Zay. First, the image has to be copied to the allowed space ($SINGULARITY_ALLOWED_DIR) in order to be accessible to Singularity. This step is specific to Jean-Zay; more details in the documentation. Then the singularity module needs to be loaded (this step is not always necessary, depending on the supercomputer, but is not specific to Jean-Zay).
ssh jean-zay
idrcontmgr cp chameleon.sif
module load singularity
  3. Start the job using the container with SLURM.
OPENBLAS_NUM_THREADS=1 \
  srun -A account@v100 \
       --time=0:10:00 \
       --nodes=2 \
       --cpu-bind=socket \
       --exclusive \
       --hint=nomultithread \
       --gres=gpu:4 \
       --mpi=pmi2 \
       singularity \
         exec --nv $SINGULARITY_ALLOWED_DIR/chameleon.sif \
         bash -c "LD_PRELOAD=/.singularity.d/libs/libcuda.so chameleon_stesting -o gemm -n 40000 -b 2000 --nowarmup -g 4"
Tip

Environment variables are propagated to the Singularity container context, but since the path to libcuda.so doesn’t exist outside of the container context (the path is bind-mounted by Singularity due to the --nv flag), declaring LD_PRELOAD outside of the container context leads to an error.

Deploy the container on Vega (EuroHPC)

  1. Copy the image to Vega. Depending on your SSH setup, you might have to adapt the commands below.
# Copy the image from Grid5000 to Vega
scp lille.g5k:tuto/chameleon.sif vega:chameleon.sif
  2. Start the job using the container with SLURM.
srun --exclusive \
     --partition=gpu \
     --gres=gpu:4 \
     -N 2 \
     --mpi=pmi2 singularity exec --bind /tmp:/tmp chameleon.sif \
     bash -c "LD_PRELOAD=/.singularity.d/libs/libcuda.so chameleon_stesting -o gemm -n 96000 -b 2000 --nowarmup -g 4"

Deploy the container on MeluXina (EuroHPC)

  1. Copy the image to MeluXina.
# Copy the image from Grid5000 to MeluXina
scp lille.g5k:tuto/chameleon.sif meluxina:chameleon.sif
  2. Start an interactive allocation with SLURM and load Singularity/Apptainer. On MeluXina, the singularity command is available through a module and the module command is only accessible on a compute node.
[login] srun --pty \
             -A project_id \
             --partition=gpu \
             -N 1 \
             --exclusive \
             --gpus-per-task=4 \
             --time=00:10:00 \
             --qos=test \
             bash
[compute] module load Apptainer/1.3.1-GCCcore-12.3.0
  3. Start the computation using Singularity.
[compute] OPENBLAS_NUM_THREADS=1 \
            singularity exec \
            --nv \
            --bind /tmp:/tmp \
            chameleon.sif \
            bash -c 'LD_PRELOAD=/.singularity.d/libs/libcuda.so chameleon_stesting -o gemm -n 20000 -b 2000 --nowarmup -g 4'
Tip

In this example, we use a single node. In order to use multiple nodes, a script should be submitted using sbatch (not covered in this tutorial).

Build a Docker image on Grid5000

  1. Build a Docker container on Grid5000.
ssh lille.g5k
guix time-machine -C channels.scm -- pack -f docker chameleon-cuda bash -r ./chameleon.tar.gz

Deploy a Docker container on Irene (TGCC)

  1. Copy the container to Irene. Depending on your SSH setup, you might have to adapt the commands below.
# Disconnect from Grid5000.
exit

# Copy the image from Grid5000 to Irene
scp lille.g5k:chameleon.tar.gz irene:
  2. SSH to Irene and import the image using pcocc-rs.
[irene-login] pcocc-rs image import docker-archive:chameleon.tar.gz chameleon
Tip

The TGCC uses a specific tool to deploy Docker images called pcocc-rs. See the documentation.

  3. Start a job in interactive mode.
[irene-login] ccc_mprun \
                -p v100 \
                -N 4 \
                -n 4 \
                -c 40 \
                -E '--mpi=pmi2' \
                -m work,scratch \
                -A user_account \
                -T 600 \
                -C chameleon \
                -E '--ctr-module nvidia' \
                -- bash -c "LD_PRELOAD=/pcocc/nvidia/usr/lib64/libcuda.so chameleon_stesting -o gemm -n 20000 -b 2000 --nowarmup -g 4"
Tip

On Irene, resources are allocated using ccc_mprun. See the documentation. For instance, the -s option spawns an interactive session directly on a compute node.

Tip

On Irene, the number of allocated GPUs is directly related to the number of allocated cores on the node. Here, 20 cores are allocated on a V100 node which contains 40 cores in total, so 50% of the GPUs available on the node (4 x V100) are allocated. See the documentation.

Tip

The --ctr-module nvidia option makes the CUDA libraries available inside the image in the /pcocc/nvidia/usr/lib64 folder.

Chameleon on AMD GPUs

Build the image on Grid5000

  1. Connect to Grid5000 and build the Singularity container.
ssh lille.g5k
cd tuto
guix time-machine -C channels.scm -- pack -f squashfs chameleon-hip bash -r ./chameleon-hip.sif

Deploy on Adastra

  1. Copy the Singularity image to Adastra. Depending on your SSH setup, you might have to adapt the commands below.
# Disconnect from Grid5000.
exit

# Copy the image from Grid5000 to Adastra
scp lille.g5k:tuto/chameleon-hip.sif adastra:chameleon-hip.sif
Warning

Before being able to use a custom Singularity image, it has to be manually copied to an authorized path by the support team, which should be contacted by email. See the documentation.

  2. Start a job in interactive mode.
ssh adastra
OPENBLAS_NUM_THREADS=1 \
  srun --cpu-bind=socket \
       -A user_account \
       --time=0:10:00 \
       --constraint=MI250 \
       --exclusive \
       --nodes=4 \
       --mpi=pmi2 \
       singularity exec \
                   --bind /proc \
                   --bind /sys \
                   /opt/software/containers/images/users/cad15174/chameleon-hip.sif \
                   chameleon_stesting -o gemm -n 96000 -b 2000 --nowarmup -g 8

Deploy on LUMI

  1. Copy the Singularity image to LUMI. Depending on your SSH setup, you might have to adapt the commands below.
# Copy the image from Grid5000 to LUMI
scp lille.g5k:tuto/chameleon-hip.sif lumi:chameleon-hip.sif
  2. Start a job in interactive mode.
ssh lumi
OPENBLAS_NUM_THREADS=1 \
  srun --cpu-bind=socket \
       -A project_id \
       --threads-per-core=1 \
       --cpus-per-task=56 \
       --ntasks-per-node=1 \
       -N 4 \
       --time=00:05:00 \
       --partition=dev-g \
       --mpi=pmi2 \
       singularity exec \
                   --bind /sys:/sys \
                   chameleon-hip.sif \
                   chameleon_stesting -o gemm -n 40000 -b 2000 --nowarmup -g 8

Bonus: relocatable binaries

For machines where Singularity is not available (or you have to ask support to deploy your custom image), an alternative can be the relocatable binary archive. The command below generates an archive containing chameleon-hip for AMD GPUs that can be run on e.g. Adastra:

guix time-machine -C channels.scm -- pack -R -S /bin=bin -C zstd chameleon-hip -r chameleon-hip.tar.zst

This archive can then be uploaded to a supercomputer (e.g. Adastra) and deployed:

# Copy the archive to Adastra
scp chameleon-hip.tar.zst adastra:
# SSH into Adastra
ssh adastra
# Extract the archive into its own folder
[adastra] mkdir chameleon-hip && zstd -d chameleon-hip.tar.zst \
                              && tar xf  chameleon-hip.tar -C chameleon-hip
# Start the job
[adastra] OPENBLAS_NUM_THREADS=1 \
            srun --cpu-bind=socket \
                 -A cad15174 \
                 --time=0:10:00 \
                 --constraint=MI250 \
                 --exclusive \
                 --nodes=4 \
                 --mpi=pmi2 \
                 chameleon-hip/bin/chameleon_stesting \
                              -o gemm -n 96000 -b 2000 --nowarmup -g 8

Modern HPC Workflow Example (Spack)


This is the second part of the Workflow Tutorial. In the previous example we showed how to use Singularity and Guix for our running example, Chameleon, on HPC clusters (Modern HPC Workflow Example (Guix)).

Warning

This tutorial relies on a GitLab access token for the registry. Since the tutorial took place, this token has expired.

In this second part, we will use Spack instead of Guix. We will also produce Spack-generated containers, for easy reproducibility of the workflow across different computers.

In summary, we are going to:

  • Install Spack on Grid'5000.
  • Build Chameleon with CUDA support.
  • Push the packages into a container registry.
  • Pull the packages as a Singularity container.
  • Run the container in the GPU partition of Grid'5000, or other supercomputer.

About Container Registries

There are 2 ways to generate containers with Spack: spack containerize (which generates a Dockerfile), and build caches.

The containerize option has a number of drawbacks, so we will push with the build cache option instead. This also has the benefit of being able to build and cache packages on CI/CD, allowing for quicker deployments.

The Spack build cache requires setting up a container registry on a Git forge. Both GitHub and GitLab provide their own container registry solutions. This guide presents how to create it: Setup a Container Registry on GitLab.

For this tutorial, we will use the container registry hosted at Inria’s GitLab.

Build the Container on Grid'5000

We will connect to the Lille site on Grid'5000, exactly the same as with the Guix guide.

Note

If you are having trouble at any step, you can skip this and download the container directly:

$ wget --user=<your_g5k_login> --ask-password https://api.grid5000.fr/sid/sites/lille/public/fayatsllamas/chameleon-spack.sif
$ ssh lille.g5k
$ mkdir tuto-spack && cd tuto-spack

Spack is installed at the user level. To install Spack, you have to clone the Spack repo, and load it with source:

$ git clone -c feature.manyFiles=true https://github.com/spack/spack

$ cd spack
$ git checkout b7f556e4b444798e5cab2f6bbfa7f6164862700e
$ cd ..

$ source spack/share/spack/setup-env.sh
$ spack --version
1.0.0.dev0 (b7f556e4b444798e5cab2f6bbfa7f6164862700e)

We will create a Spack environment, which holds our configuration and installed packages. The Spack environment will create a spack.yaml file, which we will edit:

$ spack env create --dir ./myenv   # this may be a bit slow
$ spack env activate ./myenv
$ spack env status
==> In environment /home/fayatsllamas/myenv

Open the ./myenv/spack.yaml with your favorite editor, and you will see something like this:

# ./myenv/spack.yaml
spack:
  specs: []
  view: true
  concretizer:
    unify: true

We will perform 3 modifications:

  • Add Chameleon to the list of installed packages
  • Configure Spack to build our packages for generic x86_64. This will ensure it doesn’t mix the ISAs of the nodes we will use.
  • Configure 2 mirrors:
    • inria-pull is a mirror I populated with caches of the packages for the tutorial.
    • inria-<name> is a mirror you will use to push the packages you build, as an example.
Important

Change inria-<name> and the URL .../buildcache-<name> to a unique name. You will push to this cache as an example, so that we don’t collide with each other. You can use your G5k login, for example.

# ./myenv/spack.yaml
spack:
  specs:
  - chameleon@master+cuda
  packages:
    all:
      require: target=x86_64

  view: true
  concretizer:
    unify: true

  mirrors:
    inria-pull:
      url: oci://registry.gitlab.inria.fr/numpex-pc5/tutorials/buildcache
      signed: false

    inria-<name>:
      url: oci://registry.gitlab.inria.fr/numpex-pc5/tutorials/buildcache-<name>
      access_pair:
      - guest
      - glpat-x_uFkxezH1iTKi6KmLrb
      signed: false

Edit the spack.yaml file and save it. After the environment has been modified, we call spack concretize to “lock” our changes (into a spack.lock file). We can use spack spec to preview the status of our environment; it will show which packages still need to be built.

Note

spack concretize locks the characteristics of the environment to the current machine. We are concretizing on the frontend node for convenience, and to be able to test our packages in it.

$ spack concretize
$ spack spec
 -   chameleon@master%gcc@10.2.1+cuda~fxt~ipo+mpi+shared~simgrid build_system=cmake build_type=Release cuda_arch=none generator=make runtime=starpu arch=linux-debian11-x86_64
 -       ^cmake@3.31.4%gcc@10.2.1~doc+ncurses+ownlibs~qtgui build_system=generic build_type=Release arch=linux-debian11-x86_64
 -           ^curl@8.11.1%gcc@10.2.1~gssapi~ldap~libidn2~librtmp~libssh~libssh2+nghttp2 build_system=autotools libs=shared,static tls=openssl arch=linux-debian11-x86_64
 -               ^nghttp2@1.64.0%gcc@10.2.1 build_system=autotools arch=linux-debian11-x86_64
...

The next step is to build the packages. Usually, this is a CPU-intensive job. Let’s move to a CPU node of G5k for this (chiclet):

$ oarsub -t allowed=special --project lab-2025-numpex-exadi-guix-spack-in-hpccenters -I -p chiclet -l /nodes=1
[compute]$ source spack/share/spack/setup-env.sh
[compute]$ spack env activate --dir ./myenv

To build our software stack, just call spack install. We have configured a pull-only build cache previously, so packages will not be re-compiled:

[compute]$ spack install

You may want to check that everything was built, by running spack spec again:

[compute]$ spack spec 
[+]  chameleon@master%gcc@10.2.1+cuda~fxt~ipo+mpi+shared~simgrid build_system=cmake build_type=Release cuda_arch=none generator=make runtime=starpu arch=linux-debian11-zen
[+]      ^cmake@3.31.4%gcc@10.2.1~doc+ncurses+ownlibs~qtgui build_system=generic build_type=Release arch=linux-debian11-zen
[+]          ^curl@8.11.1%gcc@10.2.1~gssapi~ldap~libidn2~librtmp~libssh~libssh2+nghttp2 build_system=autotools libs=shared,static tls=openssl arch=linux-debian11-zen

After the packages have been built, let’s push them into the container registry.

Important

To push our packages to be used as containers, we must add the --base-image flag. As Spack doesn’t build everything from the bottom up, we must provide a base image from which the libc library will be taken. You must match your --base-image to the system that built the packages. We have built the packages under Grid'5000’s Debian 11 installation, so the base image should be Debian 11 too. Not matching this, or not passing --base-image, will render the pushed image unusable.

Because Docker Hub might rate-limit the pulls of an image and we are sharing the same IP address (10 downloads per hour per IP), I mirrored the Debian 11 image to the Inria registry. Please use this image instead (otherwise, the command would be --base-image debian:11):

[compute]$ spack buildcache push --base-image registry.gitlab.inria.fr/numpex-pc5/tutorials/debian:11 inria-<name> chameleon
....
==> [55/55] Tagged chameleon@master/7p4fs2v as registry.gitlab.inria.fr/numpex-pc5/tutorials/buildcache:chameleon-master-7p4fs2vb7isbfol3ceumigyxqs7bhuoq.spack

Take note of the URL that Spack gives you.

Because Singularity might use heavy CPU/memory resources, we build the container image while we are on the compute node. The output is a SIF file (Singularity Image Format).

[compute]$ module load singularity
[compute]$ singularity pull chameleon-spack.sif docker://registry.gitlab.inria.fr/numpex-pc5/tutorials/<.....>
[compute]$ exit

Deploying on NVIDIA GPUs

The commands for running the container on other machines like Jean-Zay, Vega, etc., will be the same as in the Guix tutorial.

We will demonstrate how to run the container on the GPU partition of Grid'5000.

Deploying Chameleon on Grid'5000

Before trying the container, we can try our Spack-installed Chameleon directly:

$ oarsub -t allowed=special --project lab-2025-numpex-exadi-guix-spack-in-hpccenters -I -p chifflot -l /host=2,walltime=0:10:00
[compute]$ source spack/share/spack/setup-env.sh
[compute]$ spack env activate ./myenv
[compute]$ mpirun -machinefile $OAR_NODEFILE -x PATH chameleon_stesting -o gemm -n 4000 -b 160 --nowarmup -g 2
[compute]$ exit

To use the singularity container:

$ oarsub -t allowed=special --project lab-2025-numpex-exadi-guix-spack-in-hpccenters -I -p chifflot -l /host=2,walltime=0:10:00
[compute]$ mpirun -machinefile $OAR_NODEFILE \
                  --bind-to board \
                  singularity exec \
                    --bind /usr/lib/x86_64-linux-gnu/:/usr/lib/x86_64-linux-gnu/ \
                    --bind /tmp:/tmp chameleon-spack.sif \
                    chameleon_stesting -o gemm -n 4000 -b 160 --nowarmup -g 2

Modern HPC Workflow with Containers


Diagram

Software deployment in HPC systems is a complex problem, due to specific constraints, such as:

  • No access to root
  • No package install, update or modification as a user
  • Some kernel features are disabled, like user namespaces

As users develop more complex software, their needs for extra dependencies increase. The classical solution to providing extra software to the user involves modules. Modules can be loaded from the terminal of a user, and are managed by the HPC admin team.

$ module avail -t | grep kokkos
kokkos/3.2.00
kokkos/3.2.00-cuda
kokkos/3.4.01-cuda
kokkos/3.4.01-cuda-c++17
kokkos/3.4.01-cuda-static
kokkos/3.4.01-cuda-static-c++17

This solution has some shortcomings:

  • How to use software not provided by modules?
  • How to deploy different versions of the package, or different variants?
  • How to reproduce the software stack at a later point in time (even for archival purposes)?
  • How to move from one machine to another, given that the exposed modules are machine dependent?
  • How to modify a package in the dependency chain?

Shift in the paradigm of software deployment

In order to solve the above-mentioned issues, and with a view to a future of exascale computing, we propose a shift in the paradigm of software deployment: from the classical way, where the admin team provides the software stack for the users, to a new procedure where the users bring their own software stack.

This approach has a number of advantages, including:

  • The user is in full control of their software stack.
  • A container is portable across different compute centers.
  • The cost of moving to a new HPC system is reduced.

Diagram

Singularity/Apptainer

Singularity is an application that can run containers in an HPC environment. It is highly optimized for the task, and has interoperability with Slurm, MPI or GPU specific drivers.

Usually, we find a multiplicity of software stacks and platforms to deploy to:

Diagram

Containers (Singularity or Docker) solve this by providing a single interface that merges everything. From the software stack’s point of view, the container is the platform to deploy to. From the platform’s point of view, software comes bundled as a container:

Diagram

Singularity uses its own container format (sif), which can also be transparently generated from a Docker container.
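
For example, a SIF image can be obtained either by pulling directly from a registry or by converting a local Docker archive (such as one produced by docker save or guix pack -f docker); the image names below are illustrative:

$ singularity pull ubuntu.sif docker://ubuntu:24.04        # pull from a registry and convert to SIF
$ singularity build myapp.sif docker-archive://myapp.tar   # convert a local Docker archive to SIF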

Singularity is available in the majority of Tier-1 and Tier-0 HPC centers, either in the default environment or loaded from a module:

# On LUMI (European Tier-0 cluster)
$ singularity --version
singularity-ce version 4.1.3-150500.10.7
#
# On Jean-Zay (French Tier-1 cluster)
$ module load singularity
$ singularity --version
singularity version 3.8.5

Singularity can download and run a container image directly from an online container registry such as DockerHub using the docker:// reference:

[lumi] $ singularity shell docker://ubuntu:latest

Singularity> grep VERSION= /etc/os-release
VERSION="24.04.1 LTS (Noble Numbat)

This feature is not available on all clusters.
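
Where that is the case, one workaround is to pull and convert the image on a machine that does have registry access, then copy the resulting SIF file to the cluster. A sketch, with illustrative hostnames and image names:

# On a machine with registry access (laptop, Grid5000 frontend, ...)
$ singularity pull app.sif docker://ghcr.io/<user>/<image>:latest
# Copy the SIF file to the cluster and run it there
$ scp app.sif mycluster:~/
$ ssh mycluster
[cluster] $ singularity exec app.sif <command>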

See also the documentation about the GitHub Container Registry (GHCR) for setting up a GitHub-hosted registry.


Using containers through Singularity can provide a solution to some of the points mentioned in the previous section, but it also transfers to the user the task of building a container with the specific software stack they need.

Building a container can be streamlined using package managers.

In our approach, we selected two package managers to build the containers: Guix and Spack.

Differences between Guix and Spack

GNU Guix is a package manager for GNU/Linux systems. It is designed to give users more control over their general-purpose and specialized computing environments, and make these easier to reproduce over time and deploy to one or many devices. (source: Guix official website)

Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. Spack isn’t tied to a particular language; you can build a software stack in Python or R, link to libraries written in C, C++, or Fortran, and easily swap compilers or target specific microarchitectures. (source: Spack official website)

A key feature of the Spack package manager is that it allows users to integrate parts of the system they are building on: Spack packages can use compilers or link against libraries provided by the host system. Use of system-provided software is even a requirement at the lowest level of the stack.

Guix differs from Spack in two fundamental ways: self containment, and support for reproducibility and provenance tracking. Self containment stems from the fact that Guix packages never rely on software pre-installed on the system; its packages express all their dependencies, thereby ensuring control over the software stack, wherever Guix deploys it. This is in stark contrast with Spack, where packages may depend on software pre-installed on the system.

Unlike Spack, Guix builds packages in isolated environments (containers), which guarantees independence from the host system and allows for reproducible builds. As a result, reproducible deployment with Guix means that the same software stack can be deployed on different machines and at different points in time—there are no surprises. Conversely, deployment with Spack depends on the state of the host system.

TODO: Comparative table of features

Building containers with Guix

Guix is a package manager for Linux focused on the reproducibility of its artifacts. Given a fixed set of package definitions (a list of channels at a specific commit in Guix terminology), Guix will produce the same binaries bit-by-bit, even after years between experiments.

The Guix project itself maintains a list of package definitions installed together with the package manager tool.

For some specific scientific packages, it might be necessary to include extra package definitions from third-party channels: a list of science-related channels can be found here.

Note that these channels contain only FOSS-licensed packages. In order to access package definitions of proprietary software, or of software that depends on non-free software, additional channels (such as Guix-HPC non-free and Guix-Science non-free) could be included.

The Guix package manager is able by itself to instantiate a containerized environment with a set of packages using the guix shell --container command.

Unfortunately, Guix is not yet available on Tier-1 and Tier-0 supercomputers, but it can be used to generate a Singularity image locally before deploying it on a supercomputer. This gives the user both the reproducibility properties of the Guix package manager and the portability of Singularity containers.

To get started, install Guix or connect to a machine with a Guix installation (Grid5000 for example).

Guix generates Singularity images with the guix pack -f squashfs command, followed by a list of packages. For example, the following command would generate a Singularity image containing the bash and gcc-toolchain packages:

$ guix pack -f squashfs bash gcc-toolchain
[...]
/gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-gcc-toolchain-squashfs-pack.gz.squashfs

The image can be configured with an entry point, allowing an arbitrary program to be started directly when the image is invoked with the run subcommand of Singularity. This is done using the --entry-point flag:

# Create an image containing bash and hello, an "hello world" program,
# that will be started by default.
$ guix pack -f squashfs --entry-point=/bin/hello bash hello
[...]
/gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-hello-squashfs-pack.gz.squashfs

In order to easily find the generated image, the -r flag creates a link to the image (along with other actions):

# Create an image containing bash and hello, an "hello world" program,
# that will be started by default.
$ guix pack -f squashfs --entry-point=/bin/hello bash hello -r hello.sif
[...]
/gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-hello-squashfs-pack.gz.squashfs
$ ls -l
[...] hello.sif -> /gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-hello-squashfs-pack.gz.squashfs

The image can then be transferred to the target supercomputer and run using Singularity. Below is an example on LUMI:

$ scp hello.sif lumi:~
[...]
$ ssh lumi
[...]
[lumi] $ singularity run hello.sif 
[...]
Hello, world!
[lumi] $ singularity shell hello.sif
[lumi] Singularity> command -v hello
/gnu/store/xxxxxxxxxxxxxxxxxxxx-profile/bin/hello

Instead of specifying the list of packages on the command line, the packages can be specified through a manifest file. This file can be written by hand or generated using the command guix shell --export-manifest. Manifests are useful when dealing with a long list of packages or package transformations. Since they contain code, they can be used to perform a broad variety of modifications on the package set such as defining package variants or new packages that are needed in a specific context. The example below generates a simple manifest.scm file containing the bash and hello packages:

$ guix shell --export-manifest bash hello > manifest.scm

This manifest file can be then used to generate the same Singularity image as above with the following command:

$ guix pack -f squashfs --entry-point=/bin/hello -m manifest.scm

The command guix describe -f channels generates a channels file that is used to keep track of the current state of package definitions:

$ guix describe -f channels > channels.scm

Both files, channels.scm and manifest.scm, should be kept under version control and are sufficient to generate an image containing the exact same software stack down to the libc, with the exact same versions and compile options, on any machine where the guix command is available, using the command guix time-machine:

$ guix time-machine -C channels.scm -- pack -f squashfs --entry-point=/bin/hello -m manifest.scm

Note that in order to generate the exact same file (bit-for-bit identical), the same image specific options such as --entry-point have to be specified.

Building container images with Spack

Spack is a package manager specifically targeted at HPC systems. One of its selling points is that it can easily target specific features of the supercomputer, like compiler, CPU architecture, configuration, etc.

Unlike Guix, Spack can be installed directly on a supercomputer by the user, as it only requires git clone in the home directory. There are some problems with this:

  • High usage of inodes and storage
  • Reproducibility and portability of the environment across machines or time

Instead of using Spack directly on the supercomputer, it is possible to use Spack to generate Singularity or Docker containers. Once the container is generated, the same environment can be deployed to any machine.

To generate the container, Spack documents 2 ways:

  1. Generating a Dockerfile. This method has some downsides, so we will use the next one.
  2. Using a build cache

Going with the build cache approach, you need to create a Spack environment, configure a mirror pointing to your container registry, install the packages, and push them to the build cache:


Diagram

$ spack env create --dir ./myenv
==> Created independent environment in: /home/ubuntu/spack/myenv
==> Activate with: spack env activate ./myenv
$ spack env activate ./myenv

# MY_MIRROR is the name Spack will use for the mirror; <OCI_USERNAME>/<OCI_PASSWORD> are the
# registry credentials (e.g. an API token); oci://<URL> is the URL of the registry
$ spack mirror add \
    --oci-username <OCI_USERNAME> \
    --oci-password <OCI_PASSWORD> \
    MY_MIRROR \
    oci://<URL>

$ spack add cmake
==> Adding cmake to environment /home/ubuntu/spack/myenv
$ spack install
...

$ spack buildcache push --base-image ubuntu:24.04 MY_MIRROR cmake
==> [16/16] Tagged cmake@3.30.5/twhtshf as registry.gitlab.inria.fr/numpex-pc5/wp3/spack-repo/buildcache-myenv:cmake-3.30.5-twhtshfoj7cxxomelmcic2oyv6tr2jts.spack
Important

The argument --base-image <image> must be passed, and should match the host system used to build the packages.
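
A quick way to check which distribution (and version) the build host is running, and therefore which base image to pass, is to inspect /etc/os-release; for example, on the Ubuntu 24.04 host used above:

$ grep -E '^(ID|VERSION_ID)=' /etc/os-release
ID=ubuntu
VERSION_ID="24.04"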

Once Spack pushes the package, we can go into our supercomputer and run the container directly:

$ ssh lumi
[lumi] $ singularity shell docker://registry.gitlab.inria.fr/numpex-pc5/wp3/spack-repo/buildcache-myenv:cmake-3.30.5-twhtshfoj7cxxomelmcic2oyv6tr2jts.spack

Singularity> command -v cmake
/home/ubuntu/.spack/opt/linux-ubuntu24.04-icelake/gcc-13.2.0/cmake-3.30.5-twhtshfoj7cxxomelmcic2oyv6tr2jts/bin/cmake

Hands-on examples

The examples below follow the methodology described in this document to deploy containers on supercomputers:

Advanced topics

Using custom packages

Both Spack and Guix expose command-line mechanisms for package customization.

Guix uses so-called package transformations.

Spack uses the specification mechanism.
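
For example, both tools let you alter a package directly from the command line (the package names below are illustrative):

# Guix: build a variant of a package from the master branch of its upstream repository
$ guix build --with-branch=chameleon=master chameleon
# Spack: request a version, a variant and a specific MPI provider directly in the spec
$ spack install chameleon@master +cuda ^openmpi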

CPU optimization

In order to generate a binary optimized for a specific CPU micro-architecture, the --tune flag can be passed to a variety of Guix commands:

# Build a PETSc package optimized for Intel x86_64 Cascade Lake micro-architecture.
$ guix build --tune=cascadelake petsc
# Instantiate a containerized environment containing an optimized PETSc package.
$ guix shell --container --tune=cascadelake petsc
# Generate a manifest file where all the tunable packages are optimized.
$ guix shell --export-manifest --tune=cascadelake pkg1 pkg2 ... pkgN

For Spack, this can be done by adding the target specification on the command-line:

$ spack install petsc target=cascadelake

Spack is also capable of easily configuring compiler flags for a package:

$ spack install petsc cppflags=-O3

MPI performance

Diagram

The 3 aspects of concern when getting the best performance with MPI and Containers are:

  • Container runtime performance: the slowdown caused by the container runtime having to translate between namespaces is not significant in practice.
  • Network drivers: as long as the containers are properly built, the drivers should discover the high-speed network stack properly.
  • MPI distribution: the admin team might use custom compilation flags for their MPI distribution. The impact of this remains to be seen.

After many tests, we have concluded that Singularity doesn’t seem to pose a performance issue. Although the benchmark figures don’t indicate any significant performance loss, users are still expected to compare the performance with their own software.

Composyx CG convergence - 40 nodes - 40 subdomains

composyx perf

OSU bandwidth benchmark

osu1

If the MPI drivers aren’t properly detected, the performance figures for benchmarks will be orders of magnitude worse, as this usually means falling back to the TCP network stack instead of using the high-performance network. The network driver for MPI is controlled with MCA parameters: --mca key value.

Usually MPI detects the driver automatically, but you can force a specific driver with --mca pml <name>, or use it to check whether MPI is selecting the proper driver. This is further explained in Notes on MPI.

Regarding the actual MPI installation, a generic OpenMPI installation can usually achieve performance figures in the same order of magnitude as the MPI installation provided by the admin team, provided the network driver is properly selected. If the user has the technical expertise, the MPI installation can be passed through to the container and replaced at runtime. More investigation into the viability of this method remains to be done.

CUDA and ROCM stacks

  • Singularity allows passing through the graphics cards to the containers, with the --nv and --rocm flags.

  • Spack packages that support CUDA have a +cuda variant that can be enabled. Additionally, some packages support specifying the CUDA architecture with cuda_arch=<arch>. ROCm support is also provided in selected packages through the +rocm variant (see the example after this list).

  • Guix provides CUDA packages through the Guix-HPC non-free repository, which contains package variants with CUDA support. The ROCm software stack and package variants are hosted in the regular Guix-HPC channel.
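
As a short illustration (package and architecture values are examples, not prescriptions):

# Spack: enable the CUDA variant and target a specific GPU architecture
$ spack install chameleon +cuda cuda_arch=70
# Singularity: expose the NVIDIA or AMD driver inside the container at run time
$ singularity exec --nv   image.sif nvidia-smi
$ singularity exec --rocm image.sif <command>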

Application vs development containers

When building a container or a software environment, we usually make the distinction between “application” and “development” containers:

  • If it contains every dependency needed to build a package, except the package itself, it’s a development container.
  • If it only contains the application itself, it’s an app container.

Because of this, there are 2 separate use cases for a container:

  • Getting all the dependencies to iterate when developing a package.
  • Deploying a final package into a supercomputer.

Alternatives

This workflow provides some flexibility on how to use the tools proposed. Other alternative ways are:

  • Using Spack natively:

    • Useful for iterating a solution on a local machine.
    • Installing Spack doesn’t require admin, so it can be tested on a supercomputer as well.
    • Can run into inode limits if used on a supercomputer.
  • Using Guix natively:

    • Also useful for local testing.
    • Guix is not available on supercomputers.
  • Using Singularity containers not built with Guix or Spack:

    • Doesn’t have the guarantees of reproducibility or customizability, but still a good step towards isolation and portability.
  • Guix relocatable binaries:

    • This is an alternative format of guix pack, which produces a single file that can be run without Singularity.
    • Very good option for application deployment, but can be tricky to set up as a development solution.

HPC centers support for Singularity

The following list describes the platform support for the supercomputers we have tested the workflow on, and any caveats encountered.

Supercomputer   High-speed Network   CPU            GPU                   Singularity support?
Jean-Zay        InfiniBand           Intel x86-64   Nvidia (CUDA)         ✅*
Adastra         Cray                 AMD x86-64     AMD                   ✅*
Irene           InfiniBand           Intel x86-64   Nvidia P100 (CUDA)    ❌*
LUMI            Cray                 AMD x86-64     AMD MI250X (ROCM)     ✅
Vega            InfiniBand           Intel x86-64   Nvidia A100 (CUDA)    ✅
Meluxina        InfiniBand           AMD x86-64     Nvidia A100 (CUDA)    ✅

Jean-Zay

Containers must be placed in the “allowed directory” with idrcontmgr:

$ idrcontmgr cp image.sif
$ singularity shell $SINGULARITY_ALLOWED_DIR/image.sif

Irene

Singularity is not supported. Instead, a Docker-compatible runtime pcocc-rs is provided.

Guix images must be generated with -f docker instead.

Adastra

The admin team has to verify each container image before use.

If quick deployment is required, it is also possible to use Guix relocatable binaries or a native Spack installation. Guix can generate relocatable binaries with:

# Generate the pack, linking /bin
$ guix pack --relocatable -S /bin=bin <package>
...
/gnu/store/...-tarball-pack.tar.gz

# Move the pack to the Adastra and unpack it
$ scp /gnu/store/...-tarball-pack.tar.gz adastra:~/reloc.tar.gz
$ ssh adastra
[adastra] $ tar -xvf reloc.tar.gz

[adastra] $ ./bin/something

Notes on MPI


MCA parameters

Open MPI uses the MCA (Modular Component Architecture) framework to configure the different run-time parameters of an MPI application.

MCA parameters can be adjusted with the flag:

$ mpirun --mca <name> <value> ...

There are 2 ways to debug which MCA parameters are used:

  1. ompi_info --all will display all the MCA parameters that are available a priori.
  2. The mpi_show_mca_params MCA parameter can be set to all, default, file, api or enviro to display the selected values. Sometimes they will just show as key= (default), which is not useful.
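
For example (the application name is illustrative):

$ ompi_info --all | grep btl                                 # list the available byte-transfer-layer parameters
$ mpirun --mca mpi_show_mca_params enviro -np 2 ./my_app     # print the MCA parameters set via the environment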

Network drivers

There are 3 modes for MPI to select networks: ob1, cm and ucx, which can be set with --mca pml <ob1,cm,ucx> (PML: Point-to-point Message Layer).

  • ucx manages the devices on its own. It should be used for InfiniBand networks. UCX can be further configured with UCX-specific environment variables, for example mpirun --mca pml ucx -x UCX_LOG_LEVEL=debug ....
  • ob1 is the multi-device, multi-rail engine and is the “default” choice. It is configured with --mca pml ob1. It uses different backends for the Byte Transfer Layer (btl), which can be configured with --mca btl <name>, such as:
    • tcp
    • self
    • sm shared memory
    • ofi Libfabric, alternate way
    • uct UCX, alternate way
  • cm can interface with “matching” network cards that are MPI-enabled. It uses MTLs (not BTLs), which can be set with --mca mtl <name>
    • psm2 Single-threaded Omni-Path
    • ofi Libfabric

In short: ucx provides the performance for InfiniBand, cm can be used for specific setups, and ob1 as the fallback for low-performance TCP or local-device. libfabric can be used through cm or ob1.
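
For example, to force a given PML and check what is actually being used (the application name is illustrative):

# Force UCX (typical for InfiniBand) and let UCX report what it selects
$ mpirun --mca pml ucx -x UCX_LOG_LEVEL=info ./my_app
# Force the ob1 engine over TCP, e.g. to compare against the fallback path
$ mpirun --mca pml ob1 --mca btl tcp,self ./my_app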

TODO: Discuss MCA transports for CUDA

Using Grid'5000


The purpose of this tutorial is to let you experiment with the Grid'5000 platform, which is a large-scale and flexible testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing including Cloud, HPC, Big Data and AI.

As an example we will try to run an implementation of Conway’s Game of Life using Message Passing Interface (MPI) for parallelization.

Set up a Grid'5000 account

To request an account on Grid'5000, fill in that form and select the appropriate Group Granting Access, Team and Project. Members of the NumPEx-PC5 Team should use the values documented here.

Then make sure to generate an SSH keypair on your PC and to upload the public key to Grid'5000; this will allow direct connection using ssh from your PC. Detailed explanations are given here.

Read the documentation

Very extensive documentation is available on the Grid'5000 User Portal. For this tutorial you may start with these two articles:

If you are not familiar with MPI you might also have a look here.

Prepare the work

Connect to one Grid'5000 site

If you applied the correct SSH configuration on your PC (see here), you should be able to connect directly to a given Grid'5000 front-end, let’s say for instance Grenoble, with a simple ssh command:

jcharousset@DEDIPPCY117:~$ ssh grenoble.g5k
Linux fgrenoble 5.10.0-30-amd64 #1 SMP Debian 5.10.218-1 (2024-06-01) x86_64
----- Grid'5000 - Grenoble - fgrenoble.grenoble.grid5000.fr -----

** This site has 5 clusters (more details at https://www.grid5000.fr/w/Grenoble:Hardware)
 * Available in queue default with exotic job type:
 - drac   (2016): 12 nodes (2 CPUs POWER8NVL 1.0, 10 cores/CPU, 4 GPUs Tesla P100-SXM2-16GB, 128GB RAM, 2x931GB HDD, 1 x 10Gb Ethernet, 2 x 100Gb InfiniBand)
 - yeti   (2017): 4 nodes (4 CPUs Intel Xeon Gold 6130, 16 cores/CPU, 768GB RAM, 447GB SSD, 2x1490GB SSD, 3x1863GB HDD, 1 x 10Gb Ethernet, 1 x 100Gb Omni-Path)
 - troll  (2019): 4 nodes (2 CPUs Intel Xeon Gold 5218, 16 cores/CPU, 384GB RAM, 1536GB PMEM, 447GB SSD, 1490GB SSD, 1 x 25Gb Ethernet, 1 x 100Gb Omni-Path)
 - servan (2021): 2 nodes (2 CPUs AMD EPYC 7352, 24 cores/CPU, 128GB RAM, 2x1490GB SSD, 1 x 25Gb Ethernet, 2 x 100Gb FPGA/Ethernet)
 * Available in queue default:
 - dahu   (2017): 32 nodes (2 CPUs Intel Xeon Gold 6130, 16 cores/CPU, 192GB RAM, 223GB SSD, 447GB SSD, 3726GB HDD, 1 x 10Gb Ethernet, 1 x 100Gb Omni-Path)

** Useful links:
 - users home: https://www.grid5000.fr/w/Users_Home
 - usage policy: https://www.grid5000.fr/w/Grid5000:UsagePolicy
 - account management (password change): https://api.grid5000.fr/ui/account
 - support: https://www.grid5000.fr/w/Support

** Other sites: lille luxembourg lyon nancy nantes rennes sophia strasbourg toulouse

Last login: Fri Jun 14 11:40:27 2024 from 192.168.66.33
jcharous@fgrenoble:~$

Build the example application

Let’s first retrieve the original source code by cloning the GitHub repository:

jcharous@fgrenoble:~$ git clone https://github.com/giorgospan/Game-Of-Life.git
Cloning into 'Game-Of-Life'...
remote: Enumerating objects: 127, done.
remote: Total 127 (delta 0), reused 0 (delta 0), pack-reused 127
Receiving objects: 100% (127/127), 171.69 KiB | 1.95 MiB/s, done.
Resolving deltas: 100% (48/48), done.
Tip

There is a distinct home directory on each Grid'5000 site, so what has been stored in Grenoble will not be available if you connect to Lyon or Nancy.

To generate more verbose output, you may want to uncomment lines 257 to 264 of the file Game-Of-Life/mpi/game.c; a subset of the matrix will then be printed at each generation.

		/*Uncomment following lines if you want to see the generations of process with
		rank "my_rank" evolving*/
		
		
		// if(my_rank==0)
		// {
			// printf("Generation:%d\n",gen+1);
			// for(i=0;i<local_M;++i)
				// putchar('~');
			// putchar('\n');
			// print_local_matrix();
		// }

The next step is to build the application:

jcharous@fgrenoble:~$ cd Game-Of-Life/
jcharous@fgrenoble:~/Game-Of-Life$ make mpi
rm -f ./mpi/functions.o ./mpi/game.o ./mpi/main.o ./mpi/gameoflife
mpicc -g -O3 -c mpi/functions.c -o mpi/functions.o
mpicc -g -O3 -c mpi/game.c -o mpi/game.o
mpicc -g -O3 -c mpi/main.c -o mpi/main.o
mpicc -o mpi/gameoflife mpi/functions.o mpi/game.o mpi/main.o
jcharous@fgrenoble:~/Game-Of-Life$

The resulting executable is available at ~/Game-Of-Life/mpi/gameoflife.

Run the computation

Request nodes for a computation

Now we will ask the Grid'5000 platform to give us access to one node (comprising multiple CPU cores, 32 in our case) for an interactive session. We also use the walltime option to set an upper limit of 1 hour on our session; after that time the session will be automatically killed.

jcharous@fgrenoble:~/Game-Of-Life$ oarsub -I -l nodes=1,walltime=1
# Filtering out exotic resources (servan, drac, yeti, troll).
OAR_JOB_ID=2344074
# Interactive mode: waiting...
# [2024-06-14 13:41:46] Start prediction: 2024-06-14 13:41:46 (FIFO scheduling OK)

Let’s wait until the scheduler decides to serve our request… be patient. Eventually our request will be picked up from the queue and the scheduler will grant us access to one computation node (dahu-28 in our example):

# [2024-06-14 13:45:13] Start prediction: 2024-06-14 13:45:13 (FIFO scheduling OK)
# Starting...
jcharous@dahu-28:~/Game-Of-Life$

Launch the computation

And finally we can execute the command to launch the computation:

jcharous@dahu-28:~/Game-Of-Life$ mpirun --mca pml ^ucx -machinefile $OAR_NODEFILE ./mpi/gameoflife -n 3200 -m 3200 -max 100

Where:

  • mpirun is the command to launch an MPI application on multiple CPUs and cores,
  • --mca pml ^ucx tells Open MPI not to try to use the high-performance interconnect hardware, which avoids a huge amount of warnings being shown,
  • $OAR_NODEFILE is the list of CPU cores to be used for the computation - this file was generated by the oarsub command in the previous section,
  • -n 3200 -m 3200 -max 100 are the parameters for our application, asking for a grid size of 3200*3200 and 100 generations.

You should see printouts of the matrix at each generation, followed by information about the total time spent.

Congratulations, you did it 👋

What’s next ?

This very simple exercise should give you the basic idea. There are still a lot of additional topics you should explore:

  • Fine-tune the Open MPI configuration for performance optimization,
  • Use OAR batch jobs instead of interactive sessions (see the sketches after this list),
  • Use OAR options to precisely specify the resources you want, requesting a specific hardware property (e.g. 2 nodes with an SSD and 256 GB or more RAM) and/or a specific topology (e.g. 16 cores distributed over 4 different CPUs of the same node),
  • Automate complete workflows including transport of data, executables and/or source code to and from Grid'5000, before and after a computation,
Caution

Grid'5000 does NOT have any BACKUP service for users’ home directories; it is your responsibility to save what needs to be saved in some place outside Grid'5000.

  • Run computations on multiple nodes,
Tip

For this you will need to properly configure the high-performance interconnect hardware available on the specific nodes assigned to your computation, either InfiniBand or Omni-Path. See the specific subsection in Run MPI On Grid'5000.

  • Customize the software environment, add new packages, deploy specific images,
  • Make use of GPU acceleration,
  • Learn tools for debugging, benchmarking and monitoring
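
As hinted at in the list above, here are two sketches of further OAR usage: a batch submission and more specific resource requests. The cluster name, property values and walltimes are only illustrative; the exact properties available on each site are listed on the Grid'5000 Hardware pages and in the OAR documentation.

  # Batch job: submit a command instead of opening an interactive session
  # (single quotes keep $OAR_NODEFILE from being expanded at submission time):
  oarsub -l nodes=1,walltime=1 'mpirun --mca pml ^ucx -machinefile $OAR_NODEFILE ./mpi/gameoflife -n 3200 -m 3200 -max 100'

  # Two nodes of a given cluster for 2 hours (the cluster name is an example):
  oarsub -I -p "cluster='dahu'" -l nodes=2,walltime=2

  # 16 cores spread over 4 CPUs of a single node:
  oarsub -I -l /nodes=1/cpu=4/core=4,walltime=1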

Guix for HPC

Table of content

This short tutorial summarizes the steps to install Guix on a Linux distribution using systemd as its init system, and the additional steps that make it suitable for use in an HPC context.

Install Guix using the provided script

Download the script and run it as the superuser.

  cd /tmp
  wget https://guix.gnu.org/install.sh -O guix-install.sh
  chmod +x guix-install.sh
  sudo ./guix-install.sh

You can safely answer yes to all the questions asked by the script.

Tip

If you wish to do the installation manually, the steps are provided in the documentation.

Configure additional Guix channels

Per-user channel configuration in Guix is defined in the file channels.scm, located in $HOME/.config/guix.

The Guix-Science channel contains scientific software that is too specific to be included in Guix.

The Guix-HPC channel contains more HPC-centered software and the ROCm stack definition.

Both of these channels have a non-free counterpart containing package definitions of proprietary software (e.g. the CUDA toolkit) and of free software that depends on proprietary software (e.g. packages with CUDA support).

Since the Guix-HPC non-free channel depends on all the above-mentioned channels, it can be a good starting point, provided that you don't mind having access to non-free software.

In this case, the following channels.scm file could be used:

  (append
    (list
      (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")))
    %default-channels)

Tip

The content of the channels.scm file is Scheme code (it is actually a list of channel objects). The %default-channels variable is a list containing the Guix channel and should be used as a base to generate a list of channels.

If you'd like to have both the Guix-HPC and Guix-Science channels without any proprietary software definition, you could use the following channels.scm file:

  (append
   (list (channel
          (name 'guix-science)
          (url "https://codeberg.org/guix-science/guix-science.git")
          (branch "master")
          (introduction
           (make-channel-introduction
              "b1fe5aaff3ab48e798a4cce02f0212bc91f423dc"
              (openpgp-fingerprint
               "CA4F 8CF4 37D7 478F DA05  5FD4 4213 7701 1A37 8446"))))
         (channel
          (name 'guix-hpc)
          (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")))
   %default-channels)

Get the latest version of the package definitions

In a shell, launch the following command:

  $ guix pull

This will take some time, as this command updates the available channels and builds the package definitions.
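
To check which channel revisions are now in use, you can run guix describe, which lists each configured channel together with the commit it was pulled at:

  $ guix describe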

Add the Guix HPC substitute server

In order to avoid building the packages defined in the Guix HPC channels, it is possible to configure the guix-daemon to connect to the Guix HPC substitute server, which serves pre-compiled binaries of the software packaged in the Guix HPC channels and is located at https://guix.bordeaux.inria.fr.

This requires two steps: modifying the guix-daemon configuration and adding the new substitute server key to Guix.

Configure the guix-daemon

If you are using Guix System, please refer to the official documentation available here.

The following instructions apply when Guix is installed on a foreign distribution using systemd.

In order to add a new substitute server, the guix-daemon must be given the full list of substitute servers through the --substitute-urls switch. In our case the full list is 'https://guix.bordeaux.inria.fr https://ci.guix.gnu.org https://bordeaux.guix.gnu.org'.

The guix-daemon.service file (generally located in /etc/systemd/system or in /lib/systemd/system/) should be manually edited to add the above-mentioned flag:

  ExecStart=[...]/guix-daemon [...] --substitute-urls='https://guix.bordeaux.inria.fr https://ci.guix.gnu.org https://bordeaux.guix.gnu.org'

The guix-daemon service then needs to be restarted:

  # Reload the configuration.
  sudo systemctl daemon-reload
  # Restart the daemon.
  sudo systemctl restart guix-daemon.service
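
Alternatively, instead of editing the packaged unit file directly, a systemd drop-in override can carry the extra switch. This is only a sketch: copy the full ExecStart line from your own guix-daemon.service file in place of the [...] parts, and restart the service afterwards as above.

  sudo systemctl edit guix-daemon.service

In the override file, clear ExecStart and redefine it with the extra switch:

  [Service]
  ExecStart=
  ExecStart=[...]/guix-daemon [...] --substitute-urls='https://guix.bordeaux.inria.fr https://ci.guix.gnu.org https://bordeaux.guix.gnu.org'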

Authenticate the new substitute server

In order to accept substitutes from the Guix HPC substitute server, its key must be authorized:

  # Download the server key.
  wget https://guix.bordeaux.inria.fr/signing-key.pub
  # Add the key to Guix configuration.
  sudo guix archive --authorize < signing-key.pub
  # Optionally remove the key file.
  rm signing-key.pub

Check that everything is working properly

Run for instance the following command, which instantiates a dynamic environment containing the hello-mpi package defined in the Guix-HPC channel and runs it:

  guix shell hello-mpi -- hello-mpi
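
If you want to check whether the substitute server actually provides a pre-built binary for a package (rather than it being built locally), guix weather can be queried; the package name here is simply the one used above:

  guix weather --substitute-urls=https://guix.bordeaux.inria.fr hello-mpi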

Tips and Tricks

Error with guix shell --container

Depending on how user namespaces are set up, using guix shell with the --container or -C option may fail with an error like:

$ guix shell --container coreutils  
guix shell: error: clone: 2114060305: Invalid argument

User namespaces are crucial for process and resource isolation and are indispensable for containerization. For security reasons they are disabled by default on certain Debian and Ubuntu distributions: non-root users are not allowed to create or handle user namespaces, and setting user.max_user_namespaces to 0 causes guix shell --container to fail.

To enable user namespaces temporarily, run:

  sudo sysctl -w user.max_user_namespaces=1024

For the change to persist after a reboot:

  echo "user.max_user_namespaces = 1024" | sudo tee /etc/sysctl.d/local.conf
  sudo service procps force-reload
  sudo sysctl --system

In the settings above, the parameter is set to 1024; any non-zero integer would work. An alternative method for enabling user namespaces, specific to Debian and Ubuntu kernels, is to set kernel.unprivileged_userns_clone=1, as sketched below.
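
A minimal sketch of that alternative (assuming the Debian/Ubuntu kernel exposes this knob; the file name under /etc/sysctl.d is just an example):

  # Enable unprivileged user namespaces immediately:
  sudo sysctl -w kernel.unprivileged_userns_clone=1
  # Make the setting persistent across reboots:
  echo "kernel.unprivileged_userns_clone = 1" | sudo tee /etc/sysctl.d/userns.conf
  sudo sysctl --system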