Modern HPC Workflow with Containers
Software deployment on HPC systems is a complex problem, due to specific constraints, such as:
- No access to root
- No package install, update or modification as a user
- Some kernel features are disabled, like user namespaces
As users develop more complex software, their need for additional dependencies grows. The classical solution for providing extra software to users involves modules. Modules are loaded by the user from the terminal and are managed by the HPC admin team.
$ module avail -t | grep kokkos
kokkos/3.2.00
kokkos/3.2.00-cuda
kokkos/3.4.01-cuda
kokkos/3.4.01-cuda-c++17
kokkos/3.4.01-cuda-static
kokkos/3.4.01-cuda-static-c++17
This solution has some shortcomings:
- How to use software not provided by modules?
- How to deploy different versions of the package, or different variants?
- How to reproduce the software stack at a later point in time (even for archival purposes)?
- How to move from one machine to another, given that the exposed modules are machine-dependent?
- How to modify a package in the dependency chain?
Shift in the paradigm of software deployment
To address the issues mentioned above, and with Exascale computing in view, we propose a shift in the software deployment paradigm: from the classical approach, where the admin team provides the software stack for the users, to a new one where users bring their own software stack.
This method has a number of advantages, among them:
- The user is in full control of their software stack.
- A container is portable across different compute centers.
- The cost of moving to a new HPC system is reduced.
Singularity/Apptainer
Singularity is an application that runs containers in an HPC environment. It is well suited to this task and interoperates with Slurm, MPI, and GPU-specific drivers.
Usually, we face a duplication of effort: several software stacks each have to be adapted to several platforms to deploy to. Containers (Singularity or Docker) solve this by providing a single interface in between. From the software stack's point of view, the container is the platform to deploy to; from the platform's point of view, software comes bundled as a container.
Singularity uses its own container format (SIF), which can also be transparently generated from a Docker container.
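For example, an image from a Docker registry can be pulled and converted into a local SIF file (the image name below is only an illustration):
# Download ubuntu:24.04 from Docker Hub and convert it into a SIF file
$ singularity pull ubuntu.sif docker://ubuntu:24.04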
Singularity is available in the majority of Tier-1 and Tier-0 HPC centers, either in the default environment or loaded from a module:
# On LUMI (European Tier-0 cluster)
$ singularity --version
singularity-ce version 4.1.3-150500.10.7
#
# On Jean-Zay (French Tier-1 cluster)
$ module load singularity
$ singularity --version
singularity version 3.8.5
Singularity can download and run a container image directly from an online
container registry such as DockerHub using the
docker:// reference:
[lumi] $ singularity shell docker://ubuntu:latest
Singularity> grep VERSION= /etc/os-release
VERSION="24.04.1 LTS (Noble Numbat)This feature is not available in all clusters.
See also the documentation about the GitHub Container Registry (GHCR) for setting up a GitHub-hosted registry.
Using containers through Singularity can provide a solution to some of the points mentioned in the previous section, but it also transfers to the user the task of building a container with the specific software stack they need.
Building a container can be streamlined using package managers.
In our approach, we selected two package managers to build the containers: Guix and Spack.
Differences between Guix and Spack
GNU Guix is a package manager for GNU/Linux systems. It is designed to give users more control over their general-purpose and specialized computing environments, and make these easier to reproduce over time and deploy to one or many devices. (source: Guix official website)
Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. Spack isn’t tied to a particular language; you can build a software stack in Python or R, link to libraries written in C, C++, or Fortran, and easily swap compilers or target specific microarchitectures. (source: Spack official website)
A key feature of the Spack package manager is that it allows users to integrate parts of the system they are building on: Spack packages can use compilers or link against libraries provided by the host system. Use of system-provided software is even a requirement at the lowest level of the stack.
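For instance, Spack can be told to reuse compilers and libraries already present on the host; a minimal sketch (the openmpi package is only an example):
# Register the compilers found on the host system
$ spack compiler find
# Register a pre-installed library (e.g. the center's MPI) as an "external" package
$ spack external find openmpi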
Guix differs from Spack in two fundamental ways: self containment, and support for reproducibility and provenance tracking. Self containment stems from the fact that Guix packages never rely on software pre-installed on the system; its packages express all their dependencies, thereby ensuring control over the software stack, wherever Guix deploys it. This is in stark contrast with Spack, where packages may depend on software pre-installed on the system.
Unlike Spack, Guix builds packages in isolated environments (containers), which guarantees independence from the host system and allows for reproducible builds. As a result, reproducible deployment with Guix means that the same software stack can be deployed on different machines and at different points in time—there are no surprises. Conversely, deployment with Spack depends on the state of the host system.
TODO: Comparative table of features
Building containers with Guix
Guix is a package manager for Linux focused on the reproducibility of its artifacts. Given a fixed set of package definitions (in Guix terminology, a list of channels at specific commits), Guix will produce bit-for-bit identical binaries, even years apart between experiments.
The Guix project itself maintains a list of package definitions installed together with the package manager tool.
For some specific scientific packages, it might be necessary to include extra package definitions from third-party channels: a list of science-related channels can be found here.
Note that these channels contain only FOSS-licensed packages. In order to access package definitions for proprietary software, or for software that depends on non-free software, the following channels could be included:
The Guix package manager can by itself instantiate a containerized environment with a set of packages, using the guix shell --container command.
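For example, on a machine where Guix is installed (the packages are only an illustration):
# Spawn an isolated container providing Python and NumPy, then run a command inside it
$ guix shell --container python python-numpy -- python3 -c 'import numpy; print(numpy.__version__)'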
Unfortunately, Guix is not yet available on Tier-1 and Tier-0 supercomputers, but it can be used to generate a Singularity image locally before deploying it on a supercomputer. This gives the user both the reproducibility properties of the Guix package manager and the portability of Singularity containers.
To get started, install Guix or connect to a machine with a Guix installation (Grid5000 for example).
Guix generates Singularity images with the guix pack -f squashfs command, followed by a list of packages. For example, the following command would generate a Singularity image containing the bash and gcc-toolchain packages:
$ guix pack -f squashfs bash gcc-toolchain
[...]
/gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-gcc-toolchain-squashfs-pack.gz.squashfs
The image can be configured with an entry point, which allows an arbitrary program to be started directly when the image is invoked with Singularity's run subcommand. This is done using the --entry-point flag:
# Create an image containing bash and hello, a "hello world" program,
# that will be started by default.
$ guix pack -f squashfs --entry-point=/bin/hello bash hello
[...]
/gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-hello-squashfs-pack.gz.squashfs
In order to easily find the generated image, the -r flag creates a link to the
image (along with other actions):
# Create an image containing bash and hello, a "hello world" program,
# that will be started by default.
$ guix pack -f squashfs --entry-point=/bin/hello bash hello -r hello.sif
[...]
/gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-hello-squashfs-pack.gz.squashfs
$ ls -l
[...] hello.sif -> /gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-hello-squashfs-pack.gz.squashfs
The image can then be transferred to the target supercomputer and run using Singularity. Below is an example on LUMI:
$ scp hello.sif lumi:~
[...]
$ ssh lumi
[...]
[lumi] $ singularity run hello.sif
[...]
Hello, world!
[lumi] $ singularity shell hello.sif
[lumi] Singularity> command -v hello
/gnu/store/xxxxxxxxxxxxxxxxxxxx-profile/bin/hello
Instead of specifying the list of packages on the command line, the packages can
be specified through a manifest file. This file can be written by hand or
generated using the command guix shell --export-manifest. Manifests are useful
when dealing with a long list of packages or package transformations. Since they
contain code, they can be used to perform a broad variety of modifications on
the package set such as defining package variants or new packages that are
needed in a specific context. The example below generates a simple
manifest.scm file containing the bash and hello packages:
$ guix shell --export-manifest bash hello > manifest.scm
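The exported manifest is a short Scheme file; for the two packages above it is essentially equivalent to the following hand-written sketch:
;; manifest.scm (sketch)
(specifications->manifest
 (list "bash" "hello"))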
This manifest file can then be used to generate the same Singularity image as above with the following command:
$ guix pack -f squashfs --entry-point=/bin/hello -m manifest.scm
The command guix describe -f channels generates a channels file that is used
to keep track of the current state of package definitions:
$ guix describe -f channels > channels.scm
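The resulting channels.scm pins each channel to a specific commit; its shape is roughly as follows (a sketch where the commit hash is a placeholder):
;; channels.scm (sketch; the commit hash is a placeholder)
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "0123456789abcdef...")))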
Both channels.scm and manifest.scm should be kept under version control. Together they are sufficient to generate an image containing the exact same software stack, down to the C library, with the exact same versions and compile options, on any machine where the guix command is available, using guix time-machine:
$ guix time-machine -C channels.scm -- pack -f squashfs --entry-point=/bin/hello -m manifest.scm
Note that in order to generate the exact same file (bit-for-bit identical), the same image-specific options, such as --entry-point, have to be specified.
Building container images with Spack
Spack is a package manager specifically targeted at HPC systems. One of its selling points is that it can easily target specific features of the supercomputer, such as the compiler, the CPU micro-architecture, or particular build configurations.
Unlike Guix, Spack can be installed directly on a supercomputer by the user, as it only requires a git clone in the home directory (see the sketch after the list below). This approach has some problems, however:
- High usage of inodes and storage
- Reproducibility and portability of the environment across machines or time
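A minimal native installation looks roughly like this (a sketch; the shallow clone is optional):
# Clone Spack into the home directory and load its shell integration
$ git clone --depth=1 https://github.com/spack/spack.git ~/spack
$ . ~/spack/share/spack/setup-env.sh
$ spack --version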
Instead of using Spack directly on the supercomputer, it is possible to use Spack to generate Singularity or Docker containers. Once the container is generated, the same environment can be deployed to any machine.
To generate the container, Spack documents two approaches:
- Generating a Dockerfile. This method has some downsides, so we will use the next one.
- Using a build cache
Going with the build cache approach, you need to:
- Install Spack
- Configure a build cache. Many Git providers offer an integrated container registry that can be used for this purpose (for example, the GitHub and GitLab container registries):
$ spack env create --dir ./myenv
==> Created independent environment in: /home/ubuntu/spack/myenv
==> Activate with: spack env activate ./myenv
$ spack env activate ./myenv
# MY_MIRROR: mirror name for Spack; <OCI_USERNAME>: registry username;
# <OCI_PASSWORD>: API token; <URL>: URL of the registry
$ spack mirror add \
    --oci-username <OCI_USERNAME> \
    --oci-password <OCI_PASSWORD> \
    MY_MIRROR \
    oci://<URL>
$ spack add cmake
==> Adding cmake to environment /home/ubuntu/spack/myenv
$ spack install
...
$ spack buildcache push --base-image ubuntu:24.04 MY_MIRROR cmake
==> [16/16] Tagged cmake@3.30.5/twhtshf as registry.gitlab.inria.fr/numpex-pc5/wp3/spack-repo/buildcache-myenv:cmake-3.30.5-twhtshfoj7cxxomelmcic2oyv6tr2jts.spack
Important
The argument --base-image <image> must be passed, and should match the host
system used to build the packages.
Once Spack has pushed the package, we can log into the supercomputer and run the container directly:
$ ssh lumi
[lumi] $ singularity shell docker://registry.gitlab.inria.fr/numpex-pc5/wp3/spack-repo/buildcache-myenv:cmake-3.30.5-twhtshfoj7cxxomelmcic2oyv6tr2jts.spack
Singularity> command -v cmake
/home/ubuntu/.spack/opt/linux-ubuntu24.04-icelake/gcc-13.2.0/cmake-3.30.5-twhtshfoj7cxxomelmcic2oyv6tr2jts/bin/cmake
Hands-on examples
The examples below follow the methodology described in this document to deploy containers on supercomputers:
Advanced topics
Using custom packages
Both Spack and Guix expose command-line mechanisms for package customization.
Guix uses so-called package transformations.
Spack uses the specification mechanism.
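A couple of hedged command-line examples (the package, toolchain, and dependency names below are only for illustration):
# Guix: rebuild PETSc with a different C toolchain (a package transformation)
$ guix build petsc --with-c-toolchain=petsc=clang-toolchain
# Spack: request a variant and pin a dependency version directly in the spec
$ spack install hdf5 +mpi ^openmpi@4.1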
CPU optimization
In order to generate a binary optimized for a specific CPU micro-architecture,
the --tune flag can be passed to a variety of Guix commands:
# Build a PETSc package optimized for Intel x86_64 Cascade Lake micro-architecture.
$ guix build --tune=cascadelake petsc
# Instantiate a containerized environment containing an optimized PETSc package.
$ guix shell --container --tune=cascadelake petsc
# Generate a manifest file where all the tunable packages are optimized.
$ guix shell --export-manifest --tune=cascadelake pkg1 pkg2 ... pkgN
For Spack, this can be done by adding the target specification on the command line:
$ spack install petsc target=cascadelake
Spack is also capable of easily configuring compiler flags for a package:
$ spack install petsc cppflags=-O3
MPI performance
The three aspects of concern when seeking the best MPI performance with containers are:
- Container runtime performance: the overhead introduced by the container runtime translating between namespaces is not significant.
- Network drivers: as long as the containers are properly built, the drivers should discover the high-speed network stack properly.
- MPI distribution: the admin team might use custom compilation flags for their MPI distribution. The impact of this remains to be evaluated.
After many tests, we have concluded that Singularity does not appear to hurt performance. Although the benchmark figures do not show any significant performance loss, users are still expected to compare performance with their own software.
Composyx CG convergence - 40 nodes - 40 subdomains
OSU bandwidth benchmark
If the MPI drivers are not properly detected, benchmark figures will be off by orders of magnitude, as this usually means falling back to the TCP network stack instead of the high-performance network. With Open MPI, the network driver is controlled with MCA parameters (--mca key value). MPI usually detects the driver automatically, but a specific one can be forced with --mca pml <name>, which is also useful for checking whether MPI selects the proper driver. This is further explained in Notes on MPI.
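As a hedged illustration, forcing Open MPI's UCX transport and launching a containerized application under Slurm might look like this (the image name, application path, and PMI flavor are assumptions that depend on the cluster):
# Force the UCX PML via an MCA environment variable, then launch 4 containerized ranks
$ export OMPI_MCA_pml=ucx
$ srun -n 4 --mpi=pmix singularity exec app.sif /usr/bin/my_mpi_app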
Regarding the actual MPI installation, a generic Open MPI build can usually reach performance figures of the same order of magnitude as the MPI installation provided by the admin team, provided the network driver is properly selected. Users with the technical expertise can also bind-mount the host MPI installation into the container and substitute it at runtime; the viability of this method still needs further investigation.
CUDA and ROCM stacks
- Singularity allows passing GPUs through to the containers with the --nv and --rocm flags (see the example after this list).
- Spack packages that support CUDA have a +cuda variant that can be enabled. Additionally, some packages support specifying the CUDA architecture with cuda_arch=<arch>. ROCm support is also provided in selected packages through the +rocm variant.
- Guix provides CUDA packages through the Guix-HPC Non-free channel, which contains package variants with CUDA support. The ROCm software stack and related package variants are hosted in the regular Guix-HPC channel.
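For example, on a node with the corresponding host drivers installed (nvidia-smi and rocminfo are only available when those drivers are present):
# NVIDIA GPUs: bind the host driver stack into the container
$ singularity exec --nv image.sif nvidia-smi
# AMD GPUs: same idea with the ROCm flag
$ singularity exec --rocm image.sif rocminfo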
Application vs development containers
When building a container or a software environment, we usually make the distinction between “application” and “development” containers:
- If it contains every dependency needed to build some package, except the package itself, it is a development container.
- If it contains only the application itself, it is an application container.
Because of this, there are two separate use cases for a container:
- Getting all the dependencies to iterate when developing a package.
- Deploying a final package into a supercomputer.
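With Guix, for instance, this distinction roughly maps to two commands (a sketch; hello is just a stand-in for a real application):
# Development container: everything needed to build hello, but not hello itself
$ guix shell --container --development hello
# Application image: hello itself, packed for deployment
$ guix pack -f squashfs --entry-point=/bin/hello hello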
Alternatives
This workflow provides some flexibility in how the proposed tools are used. Alternative approaches include:
- Using Spack natively:
  - Useful for iterating on a solution on a local machine.
  - Installing Spack does not require admin rights, so it can be tested on a supercomputer as well.
  - Can run into inode limits when used on a supercomputer.
- Using Guix natively:
  - Also useful for local testing.
  - Guix is not available on supercomputers.
- Using Singularity containers not built with Guix or Spack:
  - Does not offer the same guarantees of reproducibility or customizability, but still a good step towards isolation and portability.
- Guix relocatable binaries:
  - An alternative output format of guix pack, which produces a single file that can be run without Singularity.
  - A very good option for application deployment, but can be tricky to set up as a development solution.
HPC centers' support for Singularity
The following table describes the platform support for the supercomputers we have tested the workflow on, and any caveats encountered.
| Supercomputer | High-speed Network | CPU | GPU | Singularity support? |
|---|---|---|---|---|
| Jean-Zay | InfiniBand | Intel x86-64 | Nvidia (CUDA) | ✅* |
| Adastra | Cray | AMD x86-64 | AMD | ✅* |
| Irene | InfiniBand | Intel x86-64 | Nvidia P100 (CUDA) | ❌* |
| LUMI | Cray | AMD x86-64 | AMD MI250X (ROCM) | ✅ |
| Vega | InfiniBand | Intel x86-64 | Nvidia A100 (CUDA) | ✅ |
| Meluxina | InfiniBand | AMD x86-64 | Nvidia A100 (CUDA) | ✅ |
Jean-Zay
Containers must be placed in the “allowed directory” with idrcontmgr:
$ idrcontmgr cp image.sif
$ singularity shell $SINGULARITY_ALLOWED_DIR/image.sif
Irene
Singularity is not supported. Instead, a Docker-compatible runtime pcocc-rs
is provided.
Guix images must be generated with -f docker instead.
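For instance, the image built earlier in the Guix section could be produced in Docker format instead (a sketch reusing the same manifest):
# Produce a Docker-format image archive instead of a squashfs/SIF image
$ guix pack -f docker --entry-point=/bin/hello -m manifest.scm
[...]
/gnu/store/...-docker-pack.tar.gz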
Adastra
The admin team has to verify each container image before use.
If quick deployment is required, it is also possible to use Guix relocatable binaries or a native Spack installation. Guix can generate relocatable binaries with:
# Generate the pack, linking /bin
$ guix pack --relocatable -S /bin=bin <package>
...
/gnu/store/...-tarball-pack.tar.gz
# Move the pack to Adastra and unpack it
$ scp /gnu/store/...-tarball-pack.tar.gz adastra:~/reloc.tar.gz
$ ssh adastra
[adastra] $ tar -xvf reloc.tar.gz
[adastra] $ ./bin/something