Exa-DI Tutorials

Welcome, this is the new front page for the Exa-DI Tutorials.


Subsections of Exa-DI Tutorials

NumPEx Software Catalog

This is a DRAFT DOCUMENT
Info

If you would like to add your own projects to this catalog, please open this GitHub badge and follow the instructions.

Gyselalib++

Gyselalib++ is a collection of C++ components for writing gyrokinetic semi-Lagrangian codes.

Feel++

Feel++ is an open-source C++ library for solving a large range of partial differential equations using Galerkin methods, e.g. the finite element method, the spectral element method, discontinuous Galerkin methods or reduced basis methods. Feel++ enables parallel computing in a seamless way and can solve large-scale systems on up to tens of thousands of cores.

BigDFT

BigDFT is a free software package, whose main program allows the total energy, charge density, and electronic structure of systems made of electrons and nuclei (molecules and periodic/crystalline solids) to be calculated within density functional theory (DFT), using pseudopotentials, and a wavelet basis.

Chameleon

Chameleon is a dense linear algebra solver relying on sequential task-based algorithms where sub-tasks of the overall algorithms are submitted to a run-time system. Such a system is a layer between the application and the hardware which handles the scheduling and the effective execution of tasks on the processing units. A run-time system such as StarPU can automatically manage data transfers between memory areas that are not shared (CPUs-GPUs, distributed nodes).

PDI

PDI supports loose coupling of simulation codes with data handling: the simulation code is annotated in a library-agnostic way, and the libraries to use are selected from the specification tree.

Melissa

Melissa is a file-avoiding, adaptive, fault-tolerant and elastic framework to run large-scale sensitivity analysis or deep-surrogate training on supercomputers.

FreeFEM++

FreeFEM is a popular 2D and 3D partial differential equations (PDE) solver. It allows you to easily implement your own physics modules using the provided FreeFEM language. FreeFEM offers a large list of finite elements, like the Lagrange, Taylor-Hood, etc., usable in the continuous and discontinuous Galerkin method framework.

StarPU

StarPU is a task-based runtime system for heterogeneous platforms coupling a performance modeling scheduler with a distributed shared-memory manager. It provides a framework for task scheduling on heterogeneous, accelerated platforms, together with an API for implementing various classes of scheduling algorithms. This scheduling framework jointly works with a distributed shared-memory manager to optimize task mappings and data transfers, and to overlap communications with computations.

Subsections of NumPEx Software Catalog

NumPEx Software Guidelines

This is a DRAFT DOCUMENT

The following guidelines have been defined by the NumPEx Transverse Working Group “Software Production & Integration”.

1. Packaging

Software should be packaged using the Spack and/or Guix package formats. Packages should be published in public (community-controlled) package repositories (Guix-science, etc.).

  • Packages exist
  • Packages are published in an easily usable repository
  • Packages installation is tested on supercomputers
  • Packages are available in community repositories
  • Either Guix or Spack packages are available
  • Both Guix and Spack packages are available
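
As a hedged illustration of this guideline, installing a packaged project then boils down to a single command with either package manager (the package name my-solver is hypothetical):

  guix install my-solver
  spack install my-solver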

2. Minimal Validation Tests

Software should include minimal validation tests triggered through an automated mechanism such as Guix. These tests should be automatic functional tests that do not require specific hardware.

  • Unit tests exist
  • CI exists
  • CI runs regularly (each new release)
  • CI runs regularly (each new commit in main branch)
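
As a sketch of the Guix-based mechanism mentioned above: guix build typically runs a package's test suite during its check phase, and --check forces a rebuild, and thus a re-run of the tests, of an already-built package (the package name my-solver is hypothetical):

  guix build my-solver
  guix build --check my-solver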

3. Public Repository

A public repository must be available for at least the development version of the software, allowing pull requests to be submitted.

  • A repository where sources can be downloaded by anyone
  • A repository where anyone can submit a modification proposition (pull request)

4. Clearly-identified license

Sources should be published under a clearly-identified free software license (preferably following REUSE).
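
For instance, if the project follows the REUSE specification, compliance can be checked from the root of the source tree with the reuse helper tool (assuming it is installed):

  reuse lint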

5. Minimal Documentation

Basic documentation should be publicly available to facilitate user understanding and usage of the software.

  • Documentation exists
  • It is easily discoverable and browsable online

6. Open Public Discussion Channel

An open, public discussion channel must be provided that is easily accessible to all potential users. The chosen platform must not require special permissions or memberships that could limit user participation.

  • A channel exists
  • Anyone can join the discussion channel, free of charge

7. Metadata

Each repository should include metadata easing integration and publicity on a software list.

8. API compatibility information

Each repository should include information enabling downstream users to know which versions they can use.

  • Any API addition or breakage should be documented
  • Semantic Versioning (https://semver.org) is used
  • A policy easing predictability of these aspects for future releases is provided (release schedule, support policy for previous releases)

9. Minimal Performance Tests

Software should include a minimal set of performance tests divided into three categories: single node without specific hardware, single node with specific hardware, and multi-node. These tests should be automated as much as possible.

  • Tests exist
  • Scripts exist to automate launching the tests on a supercomputer, and can be adapted to other machines
  • Scripts exist that rely on a tool easing portability to new hardware
Info

In the context of NumPEx, the decision was made to shift from strict policies to more flexible guidelines to accommodate a broader range of software development stages, from research proofs of concept to mature software. This approach contrasts with the E4S policies, which were deemed too restrictive due to their high maturity requirements for integrated software. By adopting guidelines, NumPEx aims to provide recommendations and frameworks that support diverse development levels without imposing mandatory compliance. This flexibility allows for the integration of innovative and experimental software alongside well-established solutions, fostering a more inclusive and dynamic development environment.

Subsections of Webinars & Trainings

Spack Tutorial for Beginners


Abstract

This second NUMPEX tutorial related to software packaging will specifically target beginners with the Spack package manager. The tutorial will introduce Spack installation, base commands, specifications, environments, and more, up to the very basics of software packaging. No prior experience with Spack is required.

The webinar will be organized as a hands-on session so users can directly experiment with Spack. We will provide accounts on Grid'5000 for all attendees. Users can also bring their own applications and try deployment on their preferred supercomputers. We will be available to assist you and answer questions via video and chat.

Note that this tutorial is part of the NUMPEX software integration strategy backed by the Exa-DI WP3 team. Our ambition is to have all NUMPEX-related libraries packaged with Guix and Spack, make Guix/Spack-based deployment part of every developer’s arsenal, and work with computing centers to make Guix/Spack-based user-level software deployment as frictionless as possible.

Resources

Recordings:

  • TBA

Slides:

Parallel IO and in situ analytics: High-performance data handling @ Exascale


Abstract

Maison de la Simulation, together with the Exa-DoST and Exa-DI projects of NumPEx, organizes a free Parallel IO and in situ analytics: High-performance data handling @ Exascale training.

Over 3.5 days, from Tuesday, June the 4th at 1PM to Friday, June the 7th, learn about the must-know tools for high-performance input-output on supercomputers and discover the latest tools for in situ data analytics. From ADIOS to PDI, meet the developers of the leading tools in their category as well as the IO administrators of the top French national supercomputing facility.

Context

The increase in computational power goes hand in hand with an increase in the amount of data to manage. At Exascale, IO can easily become the main performance bottleneck. Understanding parallel file system mechanisms and parallel IO libraries becomes critical to leverage the latest supercomputers.

With the increasing performance gap between compute and storage, even the best use of IO bandwidth might not be enough. In this case, in situ analytics become a requirement to exploit Exascale at its peak.

Content

This course introduces the concepts, libraries and tools for IO and in situ data handling to make the best of the top available computing power and storage technologies:

- ADIOS,
- Deisa,
- HDF5,
- Lustre parallel file-system,
- NetCDF,
- PDI.

Resources

Slides:

Training material:

Organizers:

Guix and Spack for Application Deployment Across Supercomputers

  • Day: Thursday, 6th of February, 2025
  • Time:
    • 10h-12h: General presentation
    • 13h-16h: Practice time
  • Speakers:
    • Fernando Ayats Llamas
    • Romain Garbage
    • Bruno Raffin (for the intro)

Abstract

The standard way to install a library or application on a supercomputer is to rely on modules to load most dependencies, then compile and install the remaining pieces of software manually. In many cases, this is a time-consuming process that needs to be revisited for each supercomputer. As part of NUMPEX software integration efforts, we advocate relying on advanced package managers to compile and deploy codes with their dependencies. Guix and Spack, the two package managers favored at NUMPEX, are backed by strong communities and have thousands of packages regularly tested on CI farms. They come with feature-rich CLIs enabling package tuning and transformations to produce customized binaries, including modules and containers. They carefully control the dependency graph for portability and reproducibility. However, most HPC users have not yet made package managers a central tool for controlling and deploying their software stack, and supercomputer centers usually do not install them for easy direct user-level access. The goal of this tutorial is to demystify their usage for HPC and show that they can indeed be used as a universal deployment solution. This tutorial will be split into two parts. The morning (10:00-12:00) will be dedicated to presentations and discussions. We will present how users can leverage Guix and Spack to deploy libraries (potentially with all necessary extra tools to provide a full-blown development environment) on different supercomputers (Jean-Zay, Adastra, CCRT), either directly if available on the machine or through containers generated with Spack and Guix, and discuss issues related to performance, portability, and reproducibility. Anyone who has dealt with HPC application installation should find useful material in this tutorial. No specific background in Spack or Guix is necessary, and this is not about learning Spack or Guix or how to package a given library (other dedicated tutorials will be scheduled later).

The afternoon will be organized as a hands-on session where interested users will experiment with Spack/Guix-based deployment. We will provide a base application and open accounts on Grid'5000 for all attendees (where Spack and Guix are installed), but users can also bring their own applications and try deployment on their preferred supercomputers. We will be available (via video and chat) to assist you and answer questions throughout the afternoon.

Note that this tutorial is part of the NUMPEX software integration strategy backed by the Exa-DI WP3 team. Our ambition is to have all NUMPEX-related libraries packaged with Guix and Spack, make Guix/Spack-based deployment part of every developer’s arsenal, and work with computing centers to make Guix/Spack-based user-level software deployment as frictionless as possible.

Resources

Recordings:

Slides:

Written guides:

Contact:

Subsections of Application Specific Setup

Bench-in-situ with Guix

This is a DRAFT DOCUMENT

This approach creates a Guix environment with the build dependencies of the benchmarks, such as gcc or openmpi. The environment is defined in the following manifest:

;; manifest.scm
(use-modules (guix build-system gnu)
             (ice-9 match))

(define stdenv
  (map (lambda* (pkg)
         (match pkg
           ((_ value _ ...)
            value)))
    (standard-packages)))

(concatenate-manifests
  (list
    (specifications->manifest
      (list
        "bash"
        "gcc-toolchain"

        "cmake"
        "openmpi@4"
        "pdi"
        "pdiplugin-mpi"
        "pdiplugin-user-code"
        "pdiplugin-decl-hdf5-parallel"
        "kokkos"
        "paraconf"
        "pkg-config"
        "libyaml"))
    (packages->manifest stdenv)))

Building the pack

On a machine with Guix and the guix-hpc channel, the following command produces the pack.sqsh Singularity image.

$ guix pack \
    --format=squashfs \
    -r pack.sqsh \
    -S /bin=bin \
    -S /etc=etc \
    -m ./manifest.scm

The Singularity image can then be copied to the target server with scp.
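
For example, assuming the target is Jean-Zay (hostname and destination path to adapt):

$ scp pack.sqsh user@jean-zay.idris.fr:/path/to/$WORK/pack.sqsh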

Loading the image

Jean-Zay

This machine requires you to copy the image into the allowed directory first, and to allocate a compute node:

$ idrcontmgr cp pack.sqsh
$ srun \
    --pty \
    --ntasks=2 \
    --mpi=pmix_v3 \
    --hint=nomultithread \
    bash

Getting a shell within the Singularity container then only requires:

$ singularity \
    shell \
    $SINGULARITY_ALLOWED_DIR/pack.sqsh

Building and running

$ git clone https://github.com/Maison-de-la-Simulation/bench-in-situ
$ cd bench-in-situ
$ mkdir build && cd build
$ cmake -DSESSION=MPI_SESSION -DKokkos_ENABLE_OPENMP=ON -DEuler_ENABLE_PDI=ON ..
$ make
$ ./main ../setup.ini ../io_chkpt.yml
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   r1i3n1
  Local device: hfi1_1
--------------------------------------------------------------------------
main: error: Invalid user for SlurmUser slurm, ignored
main: fatal: Unable to process configuration file

AVBP with Guix


This document describes how to deploy the AVBP software using Guix, whether it is available or not on the target machine.

Requisites

The AVBP packages are available in the Guix-HPC-non-free channel.

See https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free and https://guix.gnu.org/manual/en/html_node/Specifying-Additional-Channels.html for instructions regarding how to configure Guix to use software from this channel.

Running AVBP with Guix locally

AVBP can be installed using the avbp Guix package.

In order to build this package, the following environment variables need to be set:

  • AVBP_GIT_REPO: path to your local clone of the git repository containing the source code of AVBP
  • AVBP_LIBSUP: path to the local folder containing the AVBP license file

The following commands instantiate a containerized environment in which a simulation is run:

  # Go to the folder containing your simulation.
  cd /path/to/simulation
  # Either export the required environment variables...
  export AVBP_GIT_REPO=... AVBP_LIBSUP=...
  # ...and run the guix command
  guix shell --container avbp coreutils openmpi@4 openssh
  # Or set the environment variables on the command line
  AVBP_GIT_REPO=... AVBP_LIBSUP=... guix shell --container avbp coreutils openmpi@4 openssh
  # Run AVBP from the folder containing the run.params file.
  cd RUN && avbp
  # Alternatively, start a parallel simulation using Open MPI
  cd RUN && mpirun -np 12 avbp

Notes:

  • in order to run AVBP from a containerized environment, the coreutils, openmpi@4 and openssh packages have to be explicitly selected (openssh being required by Open MPI).
  • in order to run a simulation, the root directory of the simulation must be accessible. This won't be the case if the containerized shell is started from the RUN subdirectory. An alternative command that directly starts a simulation from within the RUN folder could be:

      AVBP_GIT_REPO=... AVBP_LIBSUP=... guix shell --container avbp coreutils openmpi@4 openssh --share=/path/to/simulation -- mpirun -np 12 avbp

Running AVBP on supercomputers

At the time of writing, Guix is not natively available on the national supercomputers.

In order to use AVBP on national supercomputers, Guix provides the guix pack command, which builds an archive containing the full software stack required to run AVBP.

This archive can then be deployed and run on the supercomputer.

So far, the techniques that have been tested are:

  • Relocatable binaries on Adastra and Jean-Zay (see the Example procedure on Adastra below, which can be adapted to Jean-Zay)
  • Singularity on Jean-Zay (see the Example procedure on Jean-Zay below)

Note: the following procedures use SLURM's srun command to start a simulation (in both interactive and batch modes). SLURM's srun command communicates directly with Open MPI using the library selected with the --mpi switch (see the Open MPI documentation). When using Open MPI 4.x (the current default version in Guix), this option has to be set to --mpi=pmi2 for proper communication with SLURM.

Example procedure on Adastra (relocatable binaries)

On a machine with Guix installed

The following commands:

  • create an archive that contains the avbp package,
  • copy the archive to the supercomputer
  # On the local machine, create the archive...
  AVBP_GIT_REPO=... AVBP_LIBSUP=... guix pack -R -S /bin=bin -C zstd avbp
  [...]
  /gnu/store/xxxxxxxxxxxxxxx-avbp-tarball-pack.tar.zst
  # ...then copy it to Adastra
  scp /gnu/store/xxxxxxxxxxxxxxx-avbp-tarball-pack.tar.zst user@adastra.cines.fr:/path/to/$CCFRWORK/avbp-pack.tar.zst

On Adastra

The following commands:

  • unpack the archive in the $CCFRWORK directory
  • set the required environment variables
  • start a simulation
  # Uncompress the archive in the $CCFRWORK space.
  cd $CCFRWORK && mkdir avbp-pack && zstd -d avbp-pack.tar.zst && tar xf avbp-pack.tar -C avbp-pack
  # Make sure no external library is loaded from the host machine
  unset LD_LIBRARY_PATH
  # This is needed by Slingshot when starting many MPI processes (hybrid mode gets message queue overflow).
  export FI_CXI_RX_MATCH_MODE=software
  # This is needed to run on a full node (192 cores) due to multiple PML being selected when not set.
  # This PML uses libfabric for Slingshot support.
  export OMPI_MCA_pml=cm
  # Start an interactive job from the folder containing the run.params file
  cd /path/to/simulation/run && srun -A user \
                                     --time=0:20:00 \
                                     --constraint=GENOA \
                                     --nodes=10 \
                                     --ntasks-per-node=192 \
                                     --cpus-per-task=1 \
                                     --threads-per-core=1 \
                                     --mpi=pmi2 \
                                     $CCFRWORK/avbp-pack/bin/avbp

An example sbatch script can be found below:

  #!/bin/bash
  #SBATCH -A user
  #SBATCH --constraint=GENOA
  #SBATCH --time=03:00:00
  #SBATCH --nodes=10
  #SBATCH --ntasks-per-node=192
  #SBATCH --cpus-per-task=1
  #SBATCH --threads-per-core=1

  # Make sure no external library is loaded from the host machine.
  unset LD_LIBRARY_PATH

  cd /path/to/simulation/run

  # Enforce the use of PMI2 to communicate with Open MPI 4, default Open MPI version in Guix.
  srun --mpi=pmi2 $CCFRWORK/avbp-pack/bin/avbp

Caveats

  • Interconnection errors when starting too many MPI processes on Adastra

Example usage on Jean-Zay with Singularity

On a machine with Guix installed

The following commands:

  • create an archive that contains the avbp, coreutils and bash packages (the last one being a Singularity requirement),
  • copy the archive to the supercomputer
  # On the local machine, create the archive...
  AVBP_GIT_REPO=... AVBP_LIBSUP=... guix pack -f squashfs -S /bin=bin --entry-point=/bin/bash avbp coreutils bash
  [...]
  /gnu/store/xxxxxxxxxxxxxxx-avbp-coreutils-bash-squashfs-pack.gz.squashfs
  # ...then copy it to Jean-Zay
  scp /gnu/store/xxxxxxxxxxxxxxx-avbp-coreutils-bash-squashfs-pack.gz.squashfs user@jean-zay.idris.fr:/path/to/$WORK/avbp.sif

Note: the coreutils package is required when running AVBP in a containerized environment.

On Jean-Zay

The Singularity image has to be copied to an authorized folder according to Jean-Zay documentation:

  # Make the image available to Singularity
  idrcontmgr cp $WORK/avbp.sif

The following commands start a simulation in interactive mode:

  # Load the Singularity environment
  module load singularity
  # Clean the environment variable
  unset LD_LIBRARY_PATH
  # Run the simulation on one full node
  srun -A user@cpu \
       --nodes=1 \
       --ntasks-per-node=40 \
       --cpus-per-task=1 \
       --time=01:00:00 \
       --hint=nomultithread \
       --mpi=pmi2 \
       singularity exec \
                   --bind $WORK:/work \
                   $SINGULARITY_ALLOWED_DIR/avbp.sif \
                   bash -c 'cd /work/path/to/simulation/run && avbp'

Below is a sample sbatch script:

  #!/bin/bash
  #SBATCH -A user@cpu
  #SBATCH --job-name=avbp
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=40
  #SBATCH --cpus-per-task=1
  #SBATCH --time=01:00:00
  #SBATCH --hint=nomultithread

  module purge
  module load singularity

  unset LD_LIBRARY_PATH
  srun --mpi=pmi2 singularity exec --bind $WORK:/work $SINGULARITY_ALLOWED_DIR/avbp.sif /bin/bash -c 'cd /work/path/to/simulation/run && avbp'

Caveats

  • Singularity doesn't seem to honour the -W flag which sets the workdir. This requires using bash -c with multiple commands.
  • The $WORK space doesn't seem to be accessible from within the container: the --bind $WORK:/work option makes it accessible through the /work path.
  • Open MPI parameters need to be tweaked when running on multiple nodes and multiple cores at the same time on Jean-Zay.
  • Open MPI 5.x is not working at the time of writing on Jean-Zay.

Example procedure on Irene with PCOCC

PCOCC can import Docker images generated by Guix.

On a machine with Guix installed

The following commands:

  • create an archive that contains the avbp, coreutils and bash packages,
  • copy the archive to the supercomputer
  # On the local machine, create the archive...
  AVBP_GIT_REPO=... AVBP_LIBSUP=... guix pack -f docker bash coreutils avbp
  # ...then copy it to Irene
  scp /gnu/store/xxxxxxxxxxxxxxx-bash-coreutils-avbp-docker-pack.tar.gz \
      user@irene-fr.ccc.cea.fr:/path/to/$CCFRWORK/avbp.tar.gz

On Irene

The Docker image has to be imported using PCOCC (see TGCC documentation for more details):

  pcocc-rs image import docker-archive:$CCFRWORK/avbp.tar.gz avbp

The following commands start a simulation in interactive mode:

  cd /path/to/RUN
  ccc_mprun -p rome \
            -N 2 \
            -n 256 \
            -c 1 \
            -E '--mpi=pmi2' \
            -m work,scratch \
            -A project_id \
            -T 600 \
            -C avbp -- avbp

General notes related to HPC

  • Open MPI 4.x uses PMI2 to communicate with SLURM. This requires launching AVBP using srun --mpi=pmi2.
  • The LD_LIBRARY_PATH environment variable often gets in the way, causing the execution to fail, hence the unset.
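
Putting these two notes together, a minimal launch on a SLURM-based cluster looks like the following sketch (path to the unpacked archive to adapt):

  unset LD_LIBRARY_PATH
  srun --mpi=pmi2 /path/to/avbp-pack/bin/avbp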

Running a test suite

The avbp-tests package provides a script running a subset of the AVBP test suite with a single MPI process.

In order to build this package, an additional environment variable has to be defined:

  • AVBP_TEST_SUITE: path to the folder containing the AVBP test suite (it can be the local clone of the testcases repository).

The following command builds the package and runs the test suite:

  AVBP_GIT_REPO=... AVBP_LIBSUP=... AVBP_TEST_SUITE=... guix build avbp-tests

It is also possible to build the avbp-tests package without actually running the tests. This is useful if you want to run the tests manually and have a look at the output files. This can be achieved using the --without-tests flag:

  AVBP_GIT_REPO=... AVBP_LIBSUP=... AVBP_TEST_SUITE=... guix build --without-tests=avbp-tests avbp-tests

If you want to run a subset of the standard test cases, simply copy them to some directory on your system, set AVBP_TEST_SUITE to point there and (re)build the avbp-tests package.

AVBP development environment

In order to instantiate a development environment for AVBP, the AVBP_LIBSUP variable has to be set.

On a machine using Guix

The following command instantiates a containerized development environment for AVBP:

  cd /path/to/avbp/source
  AVBP_LIBSUP=... guix shell --container --development avbp --expose=/path/to/avbp/license

Notes:

  • you might want to instantiate a containerized environment from the top level directory of AVBP sources so you can actually perform the build
  • you probably want to expose the path to the AVBP license inside the container; this is done with the --expose flag
  • you might want to add other packages to the development environment, for example grep, coreutils or a text editor: simply add them to the command line; see the documentation.

You can also store a list of packages for a development environment in a Manifest file, track it under version control (in your branch/fork of the AVBP source code for example) and use it later:

  guix shell --export-manifest package1 package2 ... --development avbp > avbp-development-environment.scm
  # [...]
  export AVBP_LIBSUP=...
  guix shell --container \
             --manifest=avbp-development-environment.scm \
             --expose=/path/to/avbp/license

Using Singularity

Generate the Singularity image

A development environment can be generated with guix pack by providing a Manifest file (see above):

  AVBP_LIBSUP=... guix pack -f squashfs --entry-point=/bin/bash -m avbp-development-environment.scm
  [...]
  /gnu/store/...-pack.gz.squashfs

Deploy the image (example on Jean-Zay)

The generated image has to be then copied to the remote machine and launched using Singularity.

Below is an example on how to deploy the image on Jean-Zay:

  # On the local machine: copy the image to Jean-Zay.
  scp /gnu/store/...-pack.gz.squashfs jean-zay.idris.fr:/path/to/$WORK/avbp-development-environment.sif
  # On Jean-Zay: copy the image to the authorized directory...
  idrcontmgr cp $WORK/avbp-development-environment.sif
  # ... load the Singularity module ...
  module load singularity
  # ... and launch the container (in this example a full node is allocated).
  srun \
    -A user@cpu \
    --time=02:00:00 \
    --exclusive \
    --nodes=1 \
    --ntasks-per-node=1 \
    --cpus-per-task=40 \
    --pty \
    --hint=nomultithread \
    singularity shell \
      --bind $WORK:/work \
      $SINGULARITY_ALLOWED_DIR/avbp-development-environment.sif

Notes:

  • The --pty flag sets pseudo-terminal mode in order to properly handle interactive shell mode.
  • When not specifying --cpus-per-task, only a single core is associated with the shell task.

Gysela Development Environment with Guix


This tutorial covers how to use Guix to get a development environment for Gysela. The core item of the tutorial is a Guix pack, a bundle containing all the software. The tutorial is divided into the following sections:

  • Fetching a pre-built Guix pack
  • Generating a Guix pack yourself
  • Using the Guix pack as a development environment

Downloading the pack

The easiest way to get started is by downloading the Guix pack that is built by our continuous integration system. It is a Singularity/Apptainer image and is available from https://guix.bordeaux.inria.fr/search/latest/archive?query=spec:images+status:success+gyselalibxx-squashfs.

Once downloaded, make sure to rename the file so that it has a .sif extension. See Using the Guix pack as a development environment for information on how to use that image.
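
For instance, assuming the downloaded file kept a squashfs name similar to the one produced by guix pack (file name illustrative):

  mv gyselalibxx-squashfs-pack.gz.squashfs gysela-dev-env.sif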

Generating the pack

Requisites

You need to have Guix available, either locally or remotely. If you're running a Linux environment, it can be installed on your machine according to these instructions.

The following commands build the pack image from a manifest file describing the Gysela development environment (manifest-gyselalibxx-dev-env.scm, see the Gysela with Guix tutorial for how to generate such a manifest) and copy the resulting image to Jean-Zay:

  guix pack -f squashfs -S /bin=bin -S /lib=lib --entry-point=/bin/bash -m manifest-gyselalibxx-dev-env.scm -r gysela-dev-env.sif

  # Send the pack from your computer to jean-zay
  scp ./gysela-dev-env.sif jean-zay.idris.fr:gysela-dev-env.sif

Jean-Zay requires that the image is moved into the $WORK directory, and then copied into the "authorized" directory:

  # In jean-zay
  mv -v gysela-dev-env.sif $WORK/gysela-dev-env.sif

  idrcontmgr cp $WORK/gysela-dev-env.sif
  # Now the container is copied into $SINGULARITY_ALLOWED_DIR/gysela-dev-env.sif

Using the Guix pack as a development environment

This step requires having the pack on the target machine. We have prepackaged an image at $WORK/../commun, which you can load into the authorized directory:

  idrcontmgr cp /gpfswork/rech/jpz/commun/gysela-dev-env.sif
  module load singularity

Singularity only works on compute nodes, so ask SLURM to give you an interactive session:

  # Launch the container. Here we ask for 2 hours and 10 CPU cores.
  srun \
    -A user@v100 \
    --time=02:00:00 \
    --ntasks=1 \
    --cpus-per-task=10 \
    --pty \
    --hint=nomultithread \
    -l \
    singularity shell \
      --bind $WORK:/work \
      $SINGULARITY_ALLOWED_DIR/gysela-dev-env.sif

You can now follow the regular Gysela development workflow:

  # clone gysela
  git clone --recurse-submodules git@gitlab.maisondelasimulation.fr:gysela-developpers/gyselalibxx.git gyselalibxx
  cd gyselalibxx

  # build
  mkdir build
  cd build
  cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-Wall -Wno-sign-compare" ..
  make -j$(nproc)

  # test
  ctest --output-on-failure

Gysela with Guix


Requisites

You need to have Guix available, either locally or remotely. If you're running a Linux environment, it can be installed on your machine according to these instructions.

Where to find the Gysela packages

The Guix HPC channel contains the CPU version of the Gysela package, while the Guix HPC non-free channel contains the CUDA version of the Gysela package.

After activating the required channel(s), you should be able to access the gyselalibxx package entry from the available packages using the following command:

  $ guix show gyselalibxx
  name: gyselalibxx
  version: 0.1-1.a3be632
  outputs:
  + out: everything
  systems: x86_64-linux
  dependencies: eigen@3.4.0 fftw@3.3.10 fftwf@3.3.10 ginkgo@1.7.0
  + googletest@1.12.1 hdf5@1.10.9 kokkos@4.1.00 libyaml@0.2.5 mdspan@0.6.0
  + openblas@0.3.20 openmpi@4.1.6 paraconf@1.0.0 pdi@1.6.0
  + pdiplugin-decl-hdf5-parallel@1.6.0 pdiplugin-mpi@1.6.0
  + pdiplugin-set-value@1.6.0 pkg-config@0.29.2 python-dask@2023.7.0
  + python-h5py@3.8.0 python-matplotlib@3.8.2 python-numpy@1.23.2
  + python-pyyaml@6.0 python-scipy@1.12.0 python-sympy@1.11.1
  + python-xarray@2023.12.0 python@3.10.7
  location: guix-hpc/packages/gysela.scm:42:4
  homepage: https://gyselax.github.io/
  license: FreeBSD
  synopsis: Collection of C++ components for writing gyrokinetic semi-lagrangian codes
  description: Gyselalib++ is a collection of C++ components for writing
  + gyrokinetic semi-lagrangian codes and similar as well as a collection of such
  + codes.

Presentation of the different Gysela packages

There are different variants of the Gysela package. The default package is called gyselalibxx and can only perform CPU calculations without threading.

GPU support for CUDA-based architectures requires the Guix HPC non-free channel to be activated. CUDA variants are optimised for a specific GPU micro-architecture and are neither backward nor forward compatible. The following architectures possess a CUDA variant of the Gysela package: K40M, P100, V100 and A100. The corresponding packages are gyselalibxx-cuda-k40, gyselalibxx-cuda-p100, gyselalibxx-cuda-v100 and gyselalibxx-cuda-a100.

A word on guix shell

In this tutorial, we rely on the guix shell subcommand with the option --pure in order to set up a controlled environment from which to launch a simulation. This command unsets various environment variables, but this behaviour can be controlled with the --preserve flag (this can be used when modifying LD_PRELOAD, when using CUDA packages for example).

The basic syntax is guix shell --pure package1 package2 package3, where package1, package2 and package3 are the (only) packages which will be accessible from the environment. A specific version of a package can be specified with the @ syntax. guix shell --pure openmpi@4 will drop you in a shell with the latest packaged version of the 4.x openmpi package.

By default, this command will drop you in a shell from where you can manually launch your software or modify your environment. If you want to run a single command, you can do it like this: guix shell --pure package1 -- my_command, where my_command is a command provided by package1.

More information on guix shell can be found here, especially the --container option which is also of interest when attempting to control an execution environment.

Notes on the SLURM scheduling system

In order to use SLURM, the slurm package should be part of our environment and thus part of the list of packages passed as arguments to the guix shell command: guix shell --pure slurm package1 ....

When using SLURM with Guix, we should ensure that the major version of the SLURM package we have in our environment is the same as the one which is running on the cluster.

From the frontend, we can check the SLURM version with:

  $ squeue --version
  slurm 23.11.1

At the time of writing (2024/03/25), the default SLURM version in Guix is 23.02.6. This can be verified with a command such as:

  $ guix shell slurm -- squeue --version
  slurm 23.02.6

In our example above, the major version is 23 in both cases, so nothing needs to be done. If the SLURM package installed on the cluster was at, say, version 22.x, we would have to add slurm@22 to our list of packages.
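
As a hypothetical example, if the cluster were running SLURM 22.x, the matching major version could be pinned and checked like this:

  $ guix shell --pure gyselalibxx slurm@22 -- srun --version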

Running Gysela on a machine where Guix is available

In this chapter, we'll focus on running a simulation that is compiled as a binary and part of the Gysela package.

The first step will be to generate a configuration file for our simulation by providing the --dump-config flag (see section below).

Running precompiled binaries

From the terminal on the current machine

This command will generate a file named config.yaml containing the configuration needed to run the simulation:

  $ guix shell --pure gyselalibxx -- sheath_xperiod_vx --dump-config config.yaml

The simulation can then be run with the following command:

  $ guix shell --pure gyselalibxx -- sheath_xperiod_vx config.yaml

Using SLURM (interactive mode)

In order to use SLURM, we have to add slurm to our package list.

CPU package

At the time of writing, gyselalibxx doesn't support distributed computations using MPI. The following simulation runs on a single node.

  $ guix shell --pure gyselalibxx slurm -- srun -N 1 --exclusive sheath_xperiod_vx config.yaml

GPU (CUDA) package

As stated above, the Gysela packages with CUDA support are built for a specific micro-architecture platform. The example below uses the variant targeted to the A100 micro-architecture, gyselalibxx-cuda-a100.

Due to libcuda.so being tightly coupled to the kernel driver and its location not being standard, the CUDA variants use LD_PRELOAD to set the path to libcuda.so. One way to set up LD_PRELOAD is to use the env command provided by the coreutils package.

  $ guix shell --pure gyselalibxx-cuda-a100 slurm coreutils -- srun -N 1 -C a100 --exclusive env LD_PRELOAD=/usr/lib64/libcuda.so sheath_xperiod_vx config.yaml

Alternatively, if you don't want to include coreutils in your execution environment, you can set LD_PRELOAD on the command line and preserve it with the --preserve flag:

  $ LD_PRELOAD=/usr/lib64/libcuda.so guix shell --pure --preserve=^LD_PRELOAD gyselalibxx slurm -- srun -N 1  -C a100 --exclusive sheath_xperiod_vx config.yaml
  ERROR: ld.so: object '/usr/lib64/libcuda.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
  ERROR: ld.so: object '/usr/lib64/libcuda.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
  srun: job XXXX queued and waiting for resources
  [...]

Note the error due to the absence of libcuda.so on the frontend machine, which can be safely ignored.

Using OAR

Building your own version of Gysela from source

Transformations of the Guix package

Using your personal source tree

Running Gysela on a machine where Guix is not available

If Guix is not available, it can still be used to generate an execution environment that will be deployed with another tool.

Using Singularity

In these examples, we will target the Jean Zay cluster which supports custom Singularity images and uses SLURM as scheduling system.

The method consists of three steps:

  • create an image locally using Guix
  • load the image on the cluster
  • run the simulation

Gysela CPU variant

Gysela GPU variant (CUDA)

We first create locally an image compatible with Singularity:

  $ guix pack -f squashfs bash coreutils gyselalibxx-cuda-v100 slurm  -S /bin=bin --entry-point=/bin/bash
  [...]
  /gnu/store/xxxxxxxxxxxxxx-bash-coreutils-gyselalibxx-cuda-v100-squashfs-pack.gz.squashfs

We then copy it into the $WORK folder on the remote machine and make it accessible for singularity:

   # Upload the image to the $WORK folder on Jean Zay with a .sif extension,
   # this is needed for the image to be accessible.
   [local-machine] $ scp /gnu/store/xxxxxxxxx-bash-coreutils-gyselalibxx-cuda-v100-squashfs-pack.gz.squashfs \
                   user@jean-zay:/path/to/work/folder/image.sif
   [...]
   [local-machine] $ ssh user@jean-zay
   # Make the image accessible to the Singularity runtime.
   [jean-zay] $ idrcontmgr cp /path/to/work/folder/image.sif

Finally, we run the simulation from the container, asking for one node with one GPU:

  # Activate the Singularity runtime.
  [jean-zay] $ module load singularity
  # Create the config file from the container. It will reside in $HOME, which
  # is automatically bound to the container's $HOME.
  [jean-zay] $ srun -A user@v100 --ntasks=1 --gres=gpu:1 --cpus-per-task=1 --hint=nomultithread -l singularity exec --nv $SINGULARITY_ALLOWED_DIR/image.sif env LD_PRELOAD=/.singularity.d/libs/libcuda.so sheath_xperiod_vx --dump-config config.yaml
  # Run the simulation. The results of the simulation can be found in the $HOME
  # folder.
  [jean-zay] $ srun -A user@v100 --ntasks=1 --gres=gpu:1 --cpus-per-task=1 --hint=nomultithread -l singularity exec --nv $SINGULARITY_ALLOWED_DIR/image.sif env LD_PRELOAD=/.singularity.d/libs/libcuda.so sheath_xperiod_vx config.yaml

A few notes on the command-line options for Singularity: according to the Jean Zay documentation, the --nv flag is required to access the GPU hardware; the LD_PRELOAD=/.singularity.d/libs/libcuda.so setting gives access to the host's libcuda.so, which is bound into the container by the Jean-Zay administrators.

Using relocatable binaries

When using relocatable binaries, it is highly preferable to use non-interactive (batch) mode with SLURM, as various commands are needed to set up the environment on the computation node.

Gysela GPU variant (CUDA)

We first locally create a tarball using guix pack:

  # This command exports the path to the generated tarball in the store.
  # guix pack could be as well called directly and the path manually copied from the standard output.
  $ export RR_TARBALL=$(guix pack -R gyselalibxx-cuda-v100 slurm -S /bin=bin -S /etc=etc -S /lib=lib | tail -n 1)

We then copy it into the $WORK folder on the remote machine, unpack it in a subfolder (this subfolder will contain all the Guix filesystem hierarchy) and setup the environment:

  # Upload the tarball to the $WORK folder on Jean Zay.
  [local-machine] $ scp $RR_TARBALL \
                  user@jean-zay:/path/to/work/folder/tarball.tar.gz
  [...]
  [local-machine] $ ssh user@jean-zay
  # Unpack the tarball in a subfolder of the $WORK directory
  [jean-zay] $ mkdir $WORK/guix && tar xf $WORK/tarball.tar.gz -C $WORK/guix
  # Load the environment from the subfolder.
  [jean-zay] $ export GUIX_PROFILE=$WORK/guix && source $GUIX_PROFILE/etc/profile

Finally, we create a batch file and use it to run the simulation, asking for one node with one GPU:

  # Create the batch file in the $WORK folder.
  [jean-zay] cd $WORK
  [jean-zay] cat > gysela-run.sh <<EOF
  #!/bin/bash
  #SBATCH -A user@v100
  #SBATCH --job-name=gysela-run
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=1
  #SBATCH --time=0:10:00
  #SBATCH --gres=gpu:1
  #SBATCH --hint=nomultithread

  # Setup the environment on the node
  export GUIX_PROFILE=$WORK/guix
  source \$GUIX_PROFILE/etc/profile
  export LD_PRELOAD=/usr/lib64/libcuda.so

  # Ensure we are in the \$WORK folder
  cd $WORK
  # Generate the config file
  sheath_xperiod_vx --dump-config config.yaml
  # Launch the simulation
  sheath_xperiod_vx config.yaml
  EOF
  # Run the simulation. The results of the simulation can be found in
  # the current folder ($WORK/experiment).
  [jean-zay] $ sbatch gysela-run.sh

A few notes:

  • In this paragraph, we started our simulation using the SLURM binaries provided by Guix. This is transparently done by prepending the PATH variable when sourcing the $GUIX_PROFILE/etc/profile file. It is also possible to use the SLURM binaries provided by the cluster administrator, in which case the slurm package needs to be removed from the arguments of guix pack.
  • When generating the tarball using guix pack, the -S flag is used to set up different symbolic links. While the bin symlink is not needed (the binaries are accessed through the PATH), the etc and lib symlinks are needed, as they allow access to etc/profile and the required PDI plugins, respectively.
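
As a quick, illustrative sanity check, you can verify that the SLURM client now comes from the unpacked Guix profile rather than from the cluster installation:

  [jean-zay] $ which srun
  # Expected to point inside the unpacked tarball, e.g. $WORK/guix/bin/srun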

Spawning a development environment with Guix

As the software has a package definition in Guix, it is straightforward to create a development environment using the -D flag of the guix shell command.

Note that at the time of writing, the Guix package specifies a particular commit of the git repository and performs some transformations (specifically, it removes an add_subdirectory entry in the CMakeLists.txt file in order to use the libraries provided by Guix).

Manually

We can spawn a Guix shell containing all the dependencies of Gysela in a gyselalibxx git repo:

  # Go to the git repo
  $ cd /path/to/the/gysela/repo
  # Start the shell
  $ guix shell --pure -D gyselalibxx

You can now run cmake from this terminal.

Using direnv

The previous paragraph can be automated using direnv (make sure direnv is installed on your system):

  # Go to the git repo
  $ cd /path/to/the/gysela/repo
  # Create the .envrc file
  $ echo "use guix -D gyselalibxx" > .envrc
  # Activate direnv for this folder
  $ direnv allow .

Generate the completion file

From the development environment shell, it is possible to generate the compile_commands.json file, needed for completion in the editor, by passing a specific flag to cmake:

  # Go to the git repo
  $ cd /path/to/the/gysela/repo
  # Create a build dir
  $ mkdir -p build
  # prepare the build dir and generate the compile commands
  $ cmake . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DGYSELALIBXX_DEPENDENCY_POLICIES=INSTALLED
  # Link the compile commands to the root of the repo
  $ ln -s build/compile_commands.json .

Spawning a development environment when Guix is not available

When Guix is not available on your machine, you can generate an image using guix pack and deploy it using a tool available on the machine like Singularity or Docker.

In order to generate such an image with Guix, we first need to generate a manifest file.

  # Generate a manifest file containing the development dependencies of
  # gyselalibxx-cuda-v100 and a couple of extra packages.
  guix shell -D gyselalibxx-cuda-v100 neovim tmux --export-manifest > manifest-gyselalibxx-dev-env.scm

Some notes on the previous command: the -D flag asks for the development dependencies of its package argument and can be specified multiple times. The --export-manifest flag prints the corresponding Scheme code on stdout. The resulting environment will contain neovim, tmux and the development dependencies of gyselalibxx-cuda-v100 (which are the same as any CUDA variant).

With Singularity (Jean Zay)

Singularity needs a squashfs image, which can be either built using the previously generated manifest file or downloaded from Guix HPC build farm.

Downloading the image

The easiest way to get started is by downloading the Guix pack that is built by our continuous integration system. It is a Singularity/Apptainer image and is available from https://guix.bordeaux.inria.fr/search/latest/archive?query=spec:images+status:success+gyselalibxx-squashfs.

Once downloaded, make sure to rename the file so that it has a .sif extension. See Deploying the image on Jean-Zay for information on how to use that image.

Generating the image

Using the manifest file generated in the previous section, the following commands build the image and copy it to Jean-Zay:

  # Generate the pack file.
  export PACK_FILE=$(guix pack -f squashfs -S /bin=bin -S /lib=lib --entry-point=/bin/bash -m manifest-gyselalibxx-dev-env.scm | tail -n 1)
  # Copy the image file on Jean-Zay.
  scp $PACK_FILE jean-zay.idris.fr:gysela-dev-env.sif

Deploying the image on Jean-Zay

Using a shell from a Singularity image requires using the --pty flag.

  # On Jean-Zay
  # Move the image to $WORK
  mv ~/gysela-dev-env.sif $WORK
  # Make the image available in Singularity
  idrcontmgr cp $WORK/gysela-dev-env.sif
  # Activate Singularity.
  module load singularity
  # Launch the container. Here we ask for 2 hours, one GPU and 10 CPU
  # cores.
  srun -A user@v100 --time=02:00:00 --ntasks=1 --gres=gpu:1 --cpus-per-task=10 --pty --hint=nomultithread -l singularity shell --bind $WORK:/work --nv $SINGULARITY_ALLOWED_DIR/gysela-dev-env.sif

You can now compile Gysela from within the container.

With Docker (Persée)

With relocatable binaries

  guix pack -RR -S /bin=bin -S /etc=etc -S /lib=lib -S /share=share -S /lib64=lib64 -m manifest-gyselalibxx-dev-env.scm

Subsections of Continuous Integration

Setup a Container Registry on GitLab


A Container Registry can be used by Docker and Singularity to store container images that can be shared with the rest of the internet. The most popular container registry is DockerHub, from which you can pull images with docker run docker.io/debian:latest.

A GitLab installation can be configured to expose a Container Registry for a repository. This is useful to store containers to be used by the CI/CD system, or by users of your application.

Container Registries and Spack

Spack can use a container registry as a place to store the packages that it builds. The interface is exposed through spack buildcache <subcommand>. Spack is known for optimizing builds for a specific architecture. If the user configures Spack correctly, the packages built for this configuration can be shared with other users.

Finally, it is possible to transform the Spack-generated containers in the registry so that they can be used with Docker or Singularity. The user must push the Spack packages with the --base-image flag, matching the system that built the image (Debian 11, Ubuntu 20.04, etc.). Then, you can pull the image normally.
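
As a sketch, assuming a recent Spack release with OCI build-cache support (registry URL, credentials and package name are placeholders):

  # Register the GitLab Container Registry as an OCI build-cache mirror.
  spack mirror add --oci-username <token-name> --oci-password <token> \
      gitlab-registry oci://registry.gitlab.example.com/mygroup/myproject
  # Push an already-installed spec; --base-image makes the result usable as a
  # regular container image (match the OS that performed the build).
  spack buildcache push --base-image ubuntu:22.04 gitlab-registry zlib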

Gitlab

  1. Enable the Container Registry in the repository’s configuration (Settings > General > Visibility, project features, permissions > Container registry) [screenshot from the GitLab web interface].

  2. Create an Access Token with at least the Developer role, with Read and Write registry permissions [screenshot from the GitLab web interface].

  3. Note the URLs to push to and pull from the Container Registry [screenshot from the GitLab web interface].
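
Once the registry is enabled and a token created, pushing and pulling images works as with any other container registry (registry URL, project path and image name below are placeholders):

  # Authenticate with the access token created in step 2.
  docker login registry.gitlab.example.com -u <token-name> -p <token>
  # Pull an image stored in the project's Container Registry.
  docker pull registry.gitlab.example.com/mygroup/myproject/myimage:latest
  # Singularity/Apptainer can pull the same image directly.
  singularity pull docker://registry.gitlab.example.com/mygroup/myproject/myimage:latest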

Setup a Gitlab runner containing Guix


This tutorial describes how to set up a Gitlab runner that can be used to run Guix commands, e.g. for CI/CD or pages generation.

VM creation

We use the infrastructure provided by https://ci.inria.fr but the instructions below can be adapted to another provider.

The first step is to create a new project: Projects → + Create new project.

In the project creation window, fill in the name of your project and select None in the Software part in order to use gitlab-runner.

Click on Create project and then go to Dashboard, where the new project should appear.

Click on Manage project. You should get an overview of the project, with no virtual machines (so called slaves) so far.

Click on Manage slaves, which leads you to the list of configured VMs. Add a new VM by clicking on + add slave.

Select Ubuntu 22.04 LTS amd64 server (ubuntu-22.04-amd64) as a template and configure the resources you need, depending on your usage of the VM. As a good starting point, 8GB of RAM, 4 cores and 50 GB of disk space will allow you to use Guix comfortably.

When you're done, click on Create slave. The newly created VM should appear in the VM list.

Keep the webpage open in your browser.

Gitlab runner configuration

In your Gitlab project, you should activate CI/CD in Settings → General under the Visibility submenu.

Once CI/CD is activated, you should see a new CI/CD item in the Settings menu. Click on it.

Now, in the Runners submenu, you can create a new runner for your project by clicking New project runner.

If you want to setup tagged jobs, add tags information, otherwise it is safe to check the Run untagged jobs box.

Click on Create runner and you should see the Register runner page.

Keep the webpage open in your browser and follow with the instructions below.

VM configuration

Before being able to use the VM as a Gitlab runner, we have to install both Guix and Gitlab runner related software.

Connexion to the VM

Click on the Connect button in the VM summary list on https://ci.inria.fr website (Dashboard → Manage project → Manage slaves) to display the commands needed to log into the VM using SSH.

It boils down to:

  ssh <user>@ci-ssh.inria.fr
  # From within the new shell
  ssh ci@<machine_name_or_ip>
  # It is a good time to change default root password...
  sudo passwd
  # ...and ci user password.
  passwd

Increasing the disk space on /

The default template comes with a somewhat tight root filesystem as an LVM logical volume, but with unallocated space in the corresponding LVM volume group.

  # Display the free space in the volume group.
  sudo vgdisplay
  # Display the logical volumes (only one in the template).
  sudo lvdisplay
  # Add 10GB to the logical volume containing the root filesystem.
  sudo lvextend -L +10G /dev/ubuntu-vg/ubuntu-lv
  # Online resize the underlying filesystem.
  sudo resize2fs /dev/ubuntu-vg/ubuntu-lv

Installing gitlab-runner

Here is a summary of the required commands. More details in the documentation.

  # This prevents issues where unattended-upgrades takes a lock on the package DB.
  sudo systemctl stop unattended-upgrades.service
  # Get the Debian/Ubuntu installation script from Gitlab.
  curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash
  # Install the Ubuntu package.
  sudo apt install gitlab-runner
  # Don't forget to restart unattended-upgrades.
  sudo systemctl start unattended-upgrades.service

Installing Guix in the VM

Here is a summary of the required commands. More details in the documentation.

  #
  # Guix installation from the script. The version packaged in Ubuntu is outdated
  # and is not compatible with the substitute server at Guix HPC.
  cd /tmp
  wget https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh
  chmod +x guix-install.sh
  # Answer No to "Discover substitute servers locally", Yes elsewhere.
  sudo ./guix-install.sh
  # Configure Guix to use Guix HPC substitute server.
  wget https://guix.bordeaux.inria.fr/signing-key.pub
  sudo guix archive --authorize < signing-key.pub
  rm signing-key.pub
  sudo sed -i.bak -e "s@\(ExecStart.*guix-daemon\)@\1 --substitute-urls='https://guix.bordeaux.inria.fr https://ci.guix.gnu.org https://bordeaux.guix.gnu.org'@" /etc/systemd/system/guix-daemon.service
  # Restart =guix-daemon=
  sudo systemctl daemon-reload
  sudo systemctl restart guix-daemon.service
  # Log in as user gitlab-runner, which is used by the CI
  sudo -u gitlab-runner -s
  # Add Guix HPC configuration
  mkdir -p $HOME/.config/guix
  cat > $HOME/.config/guix/channels.scm <<EOF
  (append
       (list
         (channel
           (name 'guix-hpc-non-free)
           (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")
           (branch "master")
           (commit "23d5f240e10f6431e8b6feb57bf20b4def78baa2")))
        %default-channels)
  EOF
  # Update the channels.
  guix pull

Configuring gitlab-runner

Go back to the Register runner webpage and follow the instructions there.

It boils down to executing the following command (use sudo to execute it as root):

  gitlab-runner register --url https://gitlab.inria.fr --token <TOKEN>

and selecting the right instance (https://gitlab.inria.fr in our case) and the right executor (shell in our case, see the documentation for more details).

Once in the Runners submenu of Settings → CI/CD, you should see a new runner up and running.

Using the runner

Here is a sample .gitlab-ci.yml file running a helloworld stage:

  stages:
    - helloworld

  helloworld:
    script:
      - guix shell -C hello -- hello

When pushed to your repo, this file should spawn a successful CI job and you should be able to see the output of the hello command in the corresponding log.

Subsections of HPC Environment

Modern HPC Workflow Example (Guix)


This guide applies the workflow presented at Modern HPC Workflow with Containers to run an application container built with Guix.

  • This tutorial will focus on using Grid5000 for both building the container with Guix and deploying it with Singularity, as it provides both tools.

  • The container may be built on any computer with Guix installed. You may refer to the documentation if you wish to install Guix on your machine. Beware that if you build it on your local machine, you’ll have to copy it to Grid5000.

  • Additional instructions will be provided for deployment on Jean-Zay, that can be easily adapted to any cluster supporting Singularity and using SLURM as job management system.

  • The application chosen as an example is Chameleon, a dense linear algebra software for heterogeneous architectures that supports MPI and NVIDIA GPUs through CUDA or AMD GPUs through ROCm.

Chameleon on NVIDIA GPUs

Build the container on Grid5000

  1. Log in to Grid5000 (detailed instructions here). The full list of resources shows where to find an NVIDIA GPU and an x86_64 CPU (for Singularity). For instance, the chifflot queue, located in Lille, contains nodes with NVIDIA P100 GPUs.
ssh lille.g5k
mkdir tuto && cd tuto
  2. Get the channels file. The chameleon-cuda package (the chameleon package variant with CUDA support) is defined in the Guix-HPC non-free channel, which is not activated by default.
wget https://numpex-pc5.gitlabpages.inria.fr/tutorials/hpc-env/workflow-example/channels.scm

The channels.scm file contains the following:

(list
  (channel
    (name 'guix-hpc)
    (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")
    (branch "master")
    (commit
      "ae4a812197a7e565c22f763a1f09684257e79723"))
  (channel
    (name 'guix-hpc-non-free)
    (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")
    (branch "master")
    (commit
      "c546e8121b4d8a715dcb6cd743680dd7eee30b0e"))
  (channel
    (name 'guix)
    (url "https://git.savannah.gnu.org/git/guix.git")
    (branch "master")
    (commit
      "97fb1887ad10000c067168176c504274e29e4430")
    (introduction
      (make-channel-introduction
        "9edb3f66fd807b096b48283debdcddccfea34bad"
        (openpgp-fingerprint
          "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA"))))
  (channel
    (name 'guix-science-nonfree)
    (url "https://codeberg.org/guix-science/guix-science-nonfree.git")
    (branch "master")
    (commit
      "de7cda4027d619bddcecf6dd68be23b040547bc7")
    (introduction
      (make-channel-introduction
        "58661b110325fd5d9b40e6f0177cc486a615817e"
        (openpgp-fingerprint
          "CA4F 8CF4 37D7 478F DA05  5FD4 4213 7701 1A37 8446"))))
  (channel
    (name 'guix-science)
    (url "https://codeberg.org/guix-science/guix-science.git")
    (branch "master")
    (commit
      "4ace0bab259e91bf0ec9d37e56c2676b1517fccd")
    (introduction
      (make-channel-introduction
        "b1fe5aaff3ab48e798a4cce02f0212bc91f423dc"
        (openpgp-fingerprint
          "CA4F 8CF4 37D7 478F DA05  5FD4 4213 7701 1A37 8446"))))
  (channel
    (name 'guix-past)
    (url "https://codeberg.org/guix-science/guix-past.git")
    (branch "master")
    (commit
      "70fc56e752ef6d5ff6e1e1a0997fa72e04337b24")
    (introduction
      (make-channel-introduction
        "0c119db2ea86a389769f4d2b9c6f5c41c027e336"
        (openpgp-fingerprint
          "3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5")))))
  3. Generate the Singularity container image with the guix pack command, prefixed with guix time-machine in order to use our channels.scm file. The -r option creates a symbolic link, chameleon.sif, to the resulting container image in the Guix store.
guix time-machine -C channels.scm -- pack -f squashfs chameleon-cuda bash -r ./chameleon.sif
Tip

guix pack can generate different formats, like Singularity (squashfs), Docker or relocatable binaries.
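For illustration, the same package set could be packed in other formats (hedged examples; the output file names are arbitrary):

guix time-machine -C channels.scm -- pack -f docker chameleon-cuda bash -r ./chameleon-docker.tar.gz
guix time-machine -C channels.scm -- pack -RR chameleon-cuda -r ./chameleon-relocatable.tar.gz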

Tip

Singularity needs bash to be in the package list.

Deploy the container on Grid5000

  1. Start an interactive job using OAR.
oarsub -I -p chifflot -l host=2
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libcuda.so
export OPENBLAS_NUM_THREADS=1
mpirun -machinefile $OAR_NODEFILE \
       --bind-to board \
       singularity exec \
                   --bind /usr/lib/x86_64-linux-gnu/:/usr/lib/x86_64-linux-gnu/ \
                   --bind /tmp:/tmp chameleon.sif \
                   chameleon_stesting -o gemm -n 4000 -b 160 --nowarmup -g 2
Tip

CUDA applications deployed with Guix need LD_PRELOAD to be set to the path of libcuda.so, since that library is provided by the proprietary CUDA driver installed on the machine and is not part of the Guix software stack.

Tip

The OPENBLAS_NUM_THREADS environment variable is set to improve computation performance; it is not compulsory.

Deploy the container on Jean-Zay

  1. Copy the image to Jean-Zay. Depending on your SSH setup, you might have to adapt the commands below.
# Disconnect from Grid5000.
exit

# Copy the image from Grid5000 to Jean-Zay
scp lille.g5k:tuto/chameleon.sif jean-zay:chameleon.sif
  2. Set up the container image on Jean-Zay. First, the image has to be copied to the allowed space ($SINGULARITY_ALLOWED_DIR) in order to be accessible to Singularity. This step is specific to Jean-Zay; more details in the documentation. Then the singularity module needs to be loaded (this step is not always necessary, depending on the supercomputer, but it is not specific to Jean-Zay).
ssh jean-zay
idrcontmgr cp chameleon.sif
module load singularity
  3. Start the job using the container with SLURM.
OPENBLAS_NUM_THREADS=1 \
  srun -A account@v100 \
       --time=0:10:00 \
       --nodes=2 \
       --cpu-bind=socket \
       --exclusive \
       --hint=nomultithread \
       --gres=gpu:4 \
       --mpi=pmi2 \
       singularity \
         exec --nv $SINGULARITY_ALLOWED_DIR/chameleon.sif \
         bash -c "LD_PRELOAD=/.singularity.d/libs/libcuda.so chameleon_stesting -o gemm -n 40000 -b 2000 --nowarmup -g 4"
Tip

Environment variables are propagated to the Singularity container context, but since the path to libcuda.so doesn't exist outside of the container context (the path is bind-mounted by Singularity due to the --nv flag), declaring LD_PRELOAD outside of the container context leads to an error.

Deploy the container on Vega (EuroHPC)

  1. Copy the image to Vega. Depending on your SSH setup, you might have to adapt the commands below.
# Copy the image from Grid5000 to Vega
scp lille.g5k:tuto/chameleon.sif vega:chameleon.sif
  2. Start the job using the container with SLURM.
srun --exclusive \
     --partition=gpu \
     --gres=gpu:4 \
     -N 2 \
     --mpi=pmi2 singularity exec --bind /tmp:/tmp chameleon.sif \
     bash -c "LD_PRELOAD=/.singularity.d/libs/libcuda.so chameleon_stesting -o gemm -n 96000 -b 2000 --nowarmup -g 4"

Deploy the container on MeluXina (EuroHPC)

  1. Copy the image to MeluXina.
# Copy the image from Grid5000 to MeluXina
scp lille.g5k:tuto/chameleon.sif meluxina:chameleon.sif
  2. Start an interactive allocation with SLURM and load Singularity/Apptainer. On MeluXina, the singularity command is available through a module, and the module command is only accessible on a compute node.
[login] srun --pty \
             -A project_id \
             --partition=gpu \
             -N 1 \
             --exclusive \
             --gpus-per-task=4 \
             --time=00:10:00 \
             --qos=test \
             bash
[compute] module load Apptainer/1.3.1-GCCcore-12.3.0
  3. Start the computation using Singularity.
[compute] OPENBLAS_NUM_THREADS=1 \
            singularity exec \
            --nv \
            --bind /tmp:/tmp \
            chameleon.sif \
            bash -c 'LD_PRELOAD=/.singularity.d/libs/libcuda.so chameleon_stesting -o gemm -n 20000 -b 2000 --nowarmup -g 4'
Tip

In this example, we use a single node. In order to use multiple nodes, a script should be submitted using sbatch (not covered in this tutorial).
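As an illustration only, a multi-node batch script for MeluXina might look like the sketch below; the account, QOS, module version and image path are assumptions to adapt to your project.

#!/bin/bash -l
#SBATCH -A project_id
#SBATCH --partition=gpu
#SBATCH --qos=test
#SBATCH -N 2
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-task=4
#SBATCH --time=00:10:00

# Load Apptainer (the module command is only available on compute nodes).
module load Apptainer/1.3.1-GCCcore-12.3.0

# One MPI task per node, each using the 4 GPUs of its node.
srun singularity exec --nv --bind /tmp:/tmp chameleon.sif \
     bash -c 'LD_PRELOAD=/.singularity.d/libs/libcuda.so chameleon_stesting -o gemm -n 40000 -b 2000 --nowarmup -g 4'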

Build a Docker image on Grid5000

  1. Build a Docker container on Grid5000.
ssh lille.g5k
guix time-machine -C channels.scm -- pack -f docker chameleon-cuda bash -r ./chameleon.tar.gz

Deploy a Docker container on Irene (TGCC)

  1. Copy the container to Irene. Depending on your SSH setup, you might have to adapt the commands below.
# Disconnect from Grid5000.
exit

# Copy the image from Grid5000 to Irene
scp lille.g5k:chameleon.tar.gz irene:
  2. SSH to Irene and import the image using pcocc-rs.
[irene-login] pcocc-rs image import docker-archive:chameleon.tar.gz chameleon
Tip

The TGCC uses a specific tool to deploy Docker images called pcocc-rs. See the documentation.

  3. Start a job in interactive mode.
[irene-login] ccc_mprun \
                -p v100 \
                -N 4 \
                -n 4 \
                -c 40 \
                -E '--mpi=pmi2' \
                -m work,scratch \
                -A user_account \
                -T 600 \
                -C chameleon \
                -E '--ctr-module nvidia' \
                -- bash -c "LD_PRELOAD=/pcocc/nvidia/usr/lib64/libcuda.so chameleon_stesting -o gemm -n 20000 -b 2000 --nowarmup -g 4"
Tip

On Irene, resources are allocated using ccc_mprun. See the documentation. For instance, the -s option spawns an interactive session directly on a compute node.

Tip

On Irene, the number of allocated GPUs is directly related to the number of allocated cores on the node. For instance, if 20 cores are allocated on a node that has 40 cores in total, then 50% of the GPUs available on the node (4 x V100) are allocated. See the documentation.

Tip

The --ctr-module nvidia option makes the CUDA libraries available inside the image, in the /pcocc/nvidia/usr/lib64 folder.

Chameleon on AMD GPUs

Build the image on Grid5000

  1. Connect to Grid5000 and build the Singularity container.
ssh lille.g5k
cd tuto
guix time-machine -C channels.scm -- pack -f squashfs chameleon-hip bash -r ./chameleon-hip.sif

Deploy on Adastra

  1. Copy the Singularity image to Adastra. Depending on your SSH setup, you might have to adapt the commands below.
# Disconnect from Grid5000.
exit

# Copy the image from Grid5000 to Adastra
scp lille.g5k:tuto/chameleon-hip.sif adastra:chameleon-hip.sif
Warning

Before being able to use a custom Singularity image, it has to be manually copied to an authorized path by the support team, who should be contacted by email. See the documentation.

  2. Start a job in interactive mode.
ssh adastra
OPENBLAS_NUM_THREADS=1 \
  srun --cpu-bind=socket \
       -A user_account \
       --time=0:10:00 \
       --constraint=MI250 \
       --exclusive \
       --nodes=4 \
       --mpi=pmi2 \
       singularity exec \
                   --bind /proc \
                   --bind /sys \
                   /opt/software/containers/images/users/cad15174/chameleon-hip.sif \
                   chameleon_stesting -o gemm -n 96000 -b 2000 --nowarmup -g 8

Deploy on LUMI

  1. Copy the Singularity image to LUMI. Depending on your SSH setup, you might have to adapt the commands below.
# Copy the image from Grid5000 to LUMI
scp lille.g5k:tuto/chameleon-hip.sif lumi:chameleon-hip.sif
  2. Start a job in interactive mode.
ssh lumi
OPENBLAS_NUM_THREADS=1 \
  srun --cpu-bind=socket \
       -A project_id \
       --threads-per-core=1 \
       --cpus-per-task=56 \
       --ntasks-per-node=1 \
       -N 4 \
       --time=00:05:00 \
       --partition=dev-g \
       --mpi=pmi2 \
       singularity exec \
                   --bind /sys:/sys \
                   chameleon-hip.sif \
                   chameleon_stesting -o gemm -n 40000 -b 2000 --nowarmup -g 8

Bonus: relocatable binaries

For machines where Singularity is not available (or you have to ask support to deploy your custom image), an alternative can be the relocatable binary archive. The command below generates an archive containing chameleon-hip for AMD GPUs that can be run on e.g. Adastra:

guix time-machine -C channels.scm -- pack -R -S /bin=bin -C zstd chameleon-hip -r chameleon-hip.tar.zst

This archive can then be uploaded to a supercomputer (e.g. Adastra) and deployed:

# Copy the archive to Adastra
scp chameleon-hip.tar.zst adastra:
# SSH into Adastra
ssh adastra
# Extract the archive into its own folder
[adastra] mkdir chameleon-hip && zstd -d chameleon-hip.tar.zst \
                              && tar xf  chameleon-hip.tar -C chameleon-hip
# Start the job
[adastra] OPENBLAS_NUM_THREADS=1 \
            srun --cpu-bind=socket \
                 -A cad15174 \
                 --time=0:10:00 \
                 --constraint=MI250 \
                 --exclusive \
                 --nodes=4 \
                 --mpi=pmi2 \
                 ./chameleon-hip/bin/chameleon_stesting \
                              -o gemm -n 96000 -b 2000 --nowarmup -g 8

Modern HPC Workflow Example (Spack)

Table of content

This is the second part of the Workflow Tutorial. In the previous example we showed how to use Singularity and Guix for our running example, Chameleon, on HPC clusters (Modern HPC Workflow Example (Guix)).

Warning

This tutorial relies on a GitLab access token for the registry. Since the tutorial took place, this token has expired.

In this second part, we will use Spack instead of Guix. We will also produce Spack-generated containers, for easy reproducibility of the workflow across different computers.

In summary, we are going to:

  • Install Spack on Grid'5000.
  • Build Chameleon with CUDA support.
  • Push the packages into a container registry.
  • Pull the packages as a Singularity container.
  • Run the container in the GPU partition of Grid'5000, or on another supercomputer.

About Container Registries

There are 2 ways to generate containers with Spack: generating a Dockerfile with spack containerize, or pushing built packages to a build cache (container registry).

The containerize option has a number of drawbacks, so we will use the Build Caches option. This also has the benefit of being able to build and cache packages on CI/CD, allowing for quicker deployments.

The Spack build cache will require setting up a container registry, in some Git Forge solution. Both GitHub and GitLab provide their own Container Registry solutions. This guide presents how to create it: Setup a Container Registry on GitLab.

For this tutorial, we will use the container registry hosted at Inria’s GitLab.

Build the Container on Grid'5000

We will connect to the Lille site on Grid'5000, exactly as in the Guix guide.

Note

If you are having trouble at any step, you can skip this and download the container directly:

$ wget --user=<your_g5k_login> --ask-password https://api.grid5000.fr/sid/sites/lille/public/fayatsllamas/chameleon-spack.sif
$ ssh lille.g5k
$ mkdir tuto-spack && cd tuto-spack

Spack is installed at the user level. To install Spack, you have to clone the Spack repo, and load it with source:

$ git clone -c feature.manyFiles=true https://github.com/spack/spack

$ cd spack
$ git checkout b7f556e4b444798e5cab2f6bbfa7f6164862700e
$ cd ..

$ source spack/share/spack/setup-env.sh
$ spack --version
1.0.0.dev0 (b7f556e4b444798e5cab2f6bbfa7f6164862700e)

We will create a Spack environment, which holds our configuration and installed packages. The Spack environment will create a spack.yaml file, which we will edit:

$ spack env create --dir ./myenv   # this may be a bit slow
$ spack env activate ./myenv
$ spack env status
==> In environment /home/fayatsllamas/myenv

Open the ./myenv/spack.yaml with your favorite editor, and you will see something like this:

# ./myenv/spack.yaml
spack:
  specs: []
  view: true
  concretizer:
    unify: true

We will perform 3 modifications:

  • Add Chameleon to the list of installed packages
  • Configure Spack to build our packages for generic x86_64. This will ensure it doesn't mix the ISAs of the nodes we will use.
  • Configure 2 mirrors:
    • inria-pull is a mirror I populated with caches of the packages for the tutorial.
    • inria-<name> is a mirror you will use to push the packages you build, as an example.
Important

Change inria-<name> and the URL .../buildcache-<name> to a unique name. You will push to this cache as an example, so we don't collide with each other. You can use your G5k login, for example.

# ./myenv/spack.yaml
spack:
  specs:
  - chameleon@master+cuda
  packages:
    all:
      require: target=x86_64

  view: true
  concretizer:
    unify: true

  mirrors:
    inria-pull:
      url: oci://registry.gitlab.inria.fr/numpex-pc5/tutorials/buildcache
      signed: false

    inria-<name>:
      url: oci://registry.gitlab.inria.fr/numpex-pc5/tutorials/buildcache-<name>
      access_pair:
      - guest
      - glpat-x_uFkxezH1iTKi6KmLrb
      signed: false

Edit the spack.yaml file and save it. After the environment has been modified, we call spack concretize to “lock” our changes (to a spack.lock file). We can use spack spec to preview the status of our environment; it will show which packages still need to be built.

Note

spack concretize locks the characteristics of the environment to the current machine. We are concretizing on the frontend node for convenience, and to be able to test our packages in it.

$ spack concretize
$ spack spec
 -   chameleon@master%gcc@10.2.1+cuda~fxt~ipo+mpi+shared~simgrid build_system=cmake build_type=Release cuda_arch=none generator=make runtime=starpu arch=linux-debian11-x86_64
 -       ^cmake@3.31.4%gcc@10.2.1~doc+ncurses+ownlibs~qtgui build_system=generic build_type=Release arch=linux-debian11-x86_64
 -           ^curl@8.11.1%gcc@10.2.1~gssapi~ldap~libidn2~librtmp~libssh~libssh2+nghttp2 build_system=autotools libs=shared,static tls=openssl arch=linux-debian11-x86_64
 -               ^nghttp2@1.64.0%gcc@10.2.1 build_system=autotools arch=linux-debian11-x86_64
...

The next step is to build the packages. Usually, this is a CPU-intensive job. Let’s move to a CPU node of G5k for this (chiclet):

$ oarsub -t allowed=special --project lab-2025-numpex-exadi-guix-spack-in-hpccenters -I -p chiclet -l /nodes=1
[compute]$ source spack/share/spack/setup-env.sh
[compute]$ spack env activate --dir ./myenv

To build our software stack, just call spack install. We have configured a pull-only build cache previously, so packages will not be re-compiled:

[compute]$ spack install

You may want to check that everything was built, by running spack spec again:

[compute]$ spack spec 
[+]  chameleon@master%gcc@10.2.1+cuda~fxt~ipo+mpi+shared~simgrid build_system=cmake build_type=Release cuda_arch=none generator=make runtime=starpu arch=linux-debian11-zen
[+]      ^cmake@3.31.4%gcc@10.2.1~doc+ncurses+ownlibs~qtgui build_system=generic build_type=Release arch=linux-debian11-zen
[+]          ^curl@8.11.1%gcc@10.2.1~gssapi~ldap~libidn2~librtmp~libssh~libssh2+nghttp2 build_system=autotools libs=shared,static tls=openssl arch=linux-debian11-zen

After the packages have been built, let’s push them into the container registry.

Important

To push our packages to be used as containers, we must add the --base-image flag. As Spack doesn't build everything from the bottom up, we must provide a base image, from which the libc library will be taken. You must match your --base-image to the system that built the packages. We have built the packages under Grid'5000's Debian 11 installation, so the base image should be Debian 11 too. Not matching this, or not passing --base-image, will render the pushed container unusable.

Because Docker Hub rate-limits image pulls and we are all sharing the same IP address (10 downloads per hour per IP), I mirrored the Debian 11 image to the Inria registry. Please use this image instead (otherwise, the flag would be --base-image debian:11):

[compute]$ spack buildcache push --base-image registry.gitlab.inria.fr/numpex-pc5/tutorials/debian:11 inria-<name> chameleon
....
==> [55/55] Tagged chameleon@master/7p4fs2v as registry.gitlab.inria.fr/numpex-pc5/tutorials/buildcache:chameleon-master-7p4fs2vb7isbfol3ceumigyxqs7bhuoq.spack

Take note of the URL that Spack gives you.

Because Singularity might use heavy CPU/memory resources, we build the container image while we are on the compute node. The output is a SIF file (Singularity Image Format).

[compute]$ module load singularity
[compute]$ singularity pull chameleon-spack.sif docker://registry.gitlab.inria.fr/numpex-pc5/tutorials/<.....>
[compute]$ exit

Deploying on NVIDIA GPUs

The commands for running the container on other machines like Jean-Zay, Vega, etc., will be the same as in the Guix Tutorial.

We will demonstrate how to run the container on the GPU partition of Grid'5000.

Deploying Chameleon on Grid'5000

Before trying the container, we can try our Spack-installed Chameleon directly:

$ oarsub -t allowed=special --project lab-2025-numpex-exadi-guix-spack-in-hpccenters -I -p chifflot -l /host=2,walltime=0:10:00
[compute]$ source spack/share/spack/setup-env.sh
[compute]$ spack env activate ./myenv
[compute]$ mpirun -machinefile $OAR_NODEFILE -x PATH chameleon_stesting -o gemm -n 4000 -b 160 --nowarmup -g 2
[compute]$ exit

To use the Singularity container:

$ oarsub -t allowed=special --project lab-2025-numpex-exadi-guix-spack-in-hpccenters -I -p chifflot -l /host=2,walltime=0:10:00
[compute]$ mpirun -machinefile $OAR_NODEFILE \
                  --bind-to board \
                  singularity exec \
                    --bind /usr/lib/x86_64-linux-gnu/:/usr/lib/x86_64-linux-gnu/ \
                    --bind /tmp:/tmp chameleon-spack.sif \
                    chameleon_stesting -o gemm -n 4000 -b 160 --nowarmup -g 2

Modern HPC Workflow with Containers

Table of content

Diagram

Software deployment in HPC systems is a complex problem, due to specific constraints, such as:

  • No access to root
  • No package install, update or modification as a user
  • Some kernel features are disabled, like user namespaces

As users develop more complex software, their needs for extra dependencies increase. The classical solution to providing extra software to the user involves modules. Modules can be loaded from the user's terminal and are managed by the HPC admin team.

$ module avail -t | grep kokkos
kokkos/3.2.00
kokkos/3.2.00-cuda
kokkos/3.4.01-cuda
kokkos/3.4.01-cuda-c++17
kokkos/3.4.01-cuda-static
kokkos/3.4.01-cuda-static-c++17

This solution has some shortcomings:

  • How to use software not provided by modules?
  • How to deploy different versions of the package, or different variants?
  • How to reproduce the software stack at a later point in time (even for archival purposes)?
  • How to move from one machine to another, given that the exposed modules are machine-dependent?
  • How to modify a package in the dependency chain?

Shift in the paradigm of software deployment

In order to solve the above-mentioned issues, and with a view to a future of Exascale computing, we propose a shift in the paradigm of software deployment: from the classical way, where the admin team provides the software stack for the users, to a new procedure where the user brings their own software stack.

This method has a number of advantages, including the following:

  • The user is in full control of their software stack.
  • A container is portable across different compute centers.
  • The cost of moving to a new HPC system is reduced.

Diagram

Singularity/Apptainer

Singularity is an application that can run containers in an HPC environment. It is highly optimized for the task and interoperates with Slurm, MPI and GPU-specific drivers.

Usually, we find a multitude of software stacks and a multitude of platforms to deploy to:

Diagram

Containers (Singularity or Docker) solve this by providing a single interface that merges everything. From the software stack's point of view, the container is the platform to deploy to. From the platform's point of view, software comes bundled as a container:

Diagram

Singularity uses its own container format (sif), which can also be transparently generated from a Docker container.
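For example, a Docker image can be converted into a SIF file either from a registry or from a local archive (image and file names below are illustrative):

$ singularity build ubuntu.sif docker://ubuntu:24.04
$ singularity build myapp.sif docker-archive://myapp.tar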

Singularity is available in the majority of Tier-1 and Tier-0 HPC centers, either in the default environment or loaded from a module:

# On LUMI (European Tier-0 cluster)
$ singularity --version
singularity-ce version 4.1.3-150500.10.7
#
# On Jean-Zay (French Tier-1 cluster)
$ module load singularity
$ singularity --version
singularity version 3.8.5

Singularity can download and run a container image directly from an online container registry such as DockerHub using the docker:// reference:

[lumi] $ singularity shell docker://ubuntu:latest

Singularity> grep VERSION= /etc/os-release
VERSION="24.04.1 LTS (Noble Numbat)"

This feature is not available in all clusters.

See also the documentation about the GitHub Container Registry (GHCR) for setting up a GitHub-hosted registry.


Using containers through Singularity can provide a solution to some of the points mentioned in the previous section, but it also transfers to the user the task of building a container with the specific software stack they need.

Building a container can be streamlined using package managers.

In our approach, we selected two package managers to build the containers: Guix and Spack.

Differences between Guix and Spack

GNU Guix is a package manager for GNU/Linux systems. It is designed to give users more control over their general-purpose and specialized computing environments, and make these easier to reproduce over time and deploy to one or many devices. (source: Guix official website)

Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. Spack isn’t tied to a particular language; you can build a software stack in Python or R, link to libraries written in C, C++, or Fortran, and easily swap compilers or target specific microarchitectures. (source: Spack official website)

A key feature of the Spack package manager is that it allows users to integrate parts of the system they are building on: Spack packages can use compilers or link against libraries provided by the host system. Use of system-provided software is even a requirement at the lowest level of the stack.

Guix differs from Spack in two fundamental ways: self containment, and support for reproducibility and provenance tracking. Self containment stems from the fact that Guix packages never rely on software pre-installed on the system; its packages express all their dependencies, thereby ensuring control over the software stack, wherever Guix deploys it. This is in stark contrast with Spack, where packages may depend on software pre-installed on the system.

Unlike Spack, Guix builds packages in isolated environments (containers), which guarantees independence from the host system and allows for reproducible builds. As a result, reproducible deployment with Guix means that the same software stack can be deployed on different machines and at different points in time—there are no surprises. Conversely, deployment with Spack depends on the state of the host system.

TODO: Comparative table of features

Building containers with Guix

Guix is a package manager for Linux focused on the reproducibility of its artifacts. Given a fixed set of package definitions (a list of channels at a specific commit in Guix terminology), Guix will produce the same binaries bit-by-bit, even after years between experiments.

The Guix project itself maintains a list of package definitions installed together with the package manager tool.

For some specific scientific packages, it might be necessary to include extra package definitions from third-party channels: a list of science-related channels can be found here.

Note that these channels contain only FOSS-licensed packages. In order to access package definitions of proprietary software, or of software that depends on non-free software, the following channels could be included:

The Guix package manager is able by itself to instantiate a containerized environment with a set of packages using the guix shell --container command.
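For instance (the package names below are just examples):

$ guix shell --container python python-numpy -- python3 -c 'import numpy; print(numpy.__version__)'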

Unfortunately, Guix is not yet available on Tier-1 and Tier-0 supercomputers, but it can be used to generate a Singularity image locally before deploying it on a supercomputer. This gives the user both the reproducibility properties of the Guix package manager and the portability of Singularity containers.

To get started, install Guix or connect to a machine with a Guix installation (Grid5000 for example).

Guix generates Singularity images with the guix pack -f squashfs command, followed by a list of packages. For example, the following command would generate a Singularity image containing the bash and gcc-toolchain packages:

$ guix pack -f squashfs bash gcc-toolchain
[...]
/gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-gcc-toolchain-squashfs-pack.gz.squashfs

The image can be configured with an entry point, allowing an arbitrary program to be started directly when the image is called with the run subcommand of Singularity. This is done using the --entry-point flag:

# Create an image containing bash and hello, an "hello world" program,
# that will be started by default.
$ guix pack -f squashfs --entry-point=/bin/hello bash hello
[...]
/gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-hello-squashfs-pack.gz.squashfs

In order to easily find the generated image, the -r flag creates a link to the image (along with other actions):

# Create an image containing bash and hello, an "hello world" program,
# that will be started by default.
$ guix pack -f squashfs --entry-point=/bin/hello bash hello -r hello.sif
[...]
/gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-hello-squashfs-pack.gz.squashfs
$ ls -l
[...] hello.sif -> /gnu/store/xxxxxxxxxxxxxxxxxxxxxxxxx-bash-hello-squashfs-pack.gz.squashfs

The image can then be transferred to the target supercomputer and run using Singularity. Below is an example on LUMI:

$ scp hello.sif lumi:~
[...]
$ ssh lumi
[...]
[lumi] $ singularity run hello.sif 
[...]
Hello, world!
[lumi] $ singularity shell hello.sif
[lumi] Singularity> command -v hello
/gnu/store/xxxxxxxxxxxxxxxxxxxx-profile/bin/hello

Instead of specifying the list of packages on the command line, the packages can be specified through a manifest file. This file can be written by hand or generated using the command guix shell --export-manifest. Manifests are useful when dealing with a long list of packages or package transformations. Since they contain code, they can be used to perform a broad variety of modifications on the package set such as defining package variants or new packages that are needed in a specific context. The example below generates a simple manifest.scm file containing the bash and hello packages:

$ guix shell --export-manifest bash hello > manifest.scm

This manifest file can then be used to generate the same Singularity image as above with the following command:

$ guix pack -f squashfs --entry-point=/bin/hello -m manifest.scm

The command guix describe -f channels generates a channels file that is used to keep track of the current state of package definitions:

$ guix describe -f channels > channels.scm

Both files, channels.scm and manifest.scm, should be kept under version control. They are sufficient to generate an image containing the exact same software stack, down to the libc, with the exact same versions and compile options, on any machine where the guix command is available, using the guix time-machine command:

$ guix time-machine -C channels.scm -- pack -f squashfs --entry-point=/bin/hello -m manifest.scm

Note that in order to generate the exact same file (bit-for-bit identical), the same image-specific options, such as --entry-point, have to be specified.

Building container images with Spack

Spack is a package manager specifically targeted at HPC systems. One of its selling points is that it can easily target specific features of the supercomputer, like compiler, CPU architecture, configuration, etc.

Unlike Guix, Spack can be installed directly on a supercomputer by the user, as it only requires a git clone into the home directory. There are some problems with this:

  • High usage of inodes and storage
  • Reproducibility and portability of the environment across machines or time

Instead of using Spack directly on the supercomputer, it is possible to use Spack to generate Singularity or Docker containers. Once the container is generated, the same environment can be deployed to any machine.

To generate the container, Spack documents 2 ways:

  1. Generating a Dockerfile. This method has some downsides, so we will use the next one.
  2. Using a build cache

Going with the build cache approach, you need to create a Spack environment, configure a mirror pointing to the container registry, install the packages, and push them to the build cache, as shown below.

Diagram

$ spack env create --dir ./myenv
==> Created independent environment in: /home/ubuntu/spack/myenv
==> Activate with: spack env activate ./myenv
$ spack env activate ./myenv

# MY_MIRROR is the mirror name for Spack; <OCI_USERNAME> and <OCI_PASSWORD> are the
# registry username and API token; <URL> is the URL of the registry.
$ spack mirror add MY_MIRROR \
    --oci-username <OCI_USERNAME> \
    --oci-password <OCI_PASSWORD> \
    oci://<URL>

$ spack add cmake
==> Adding cmake to environment /home/ubuntu/spack/myenv
$ spack install
...

$ spack buildcache push --base-image ubuntu:24.04 MY_MIRROR cmake
==> [16/16] Tagged cmake@3.30.5/twhtshf as registry.gitlab.inria.fr/numpex-pc5/wp3/spack-repo/buildcache-myenv:cmake-3.30.5-twhtshfoj7cxxomelmcic2oyv6tr2jts.spack
Important

The argument --base-image <image> must be passed, and should match the host system used to build the packages.

Once Spack pushes the package, we can go into our supercomputer and run the container directly:

$ ssh lumi
[lumi] $ singularity shell docker://registry.gitlab.inria.fr/numpex-pc5/wp3/spack-repo/buildcache-myenv:cmake-3.30.5-twhtshfoj7cxxomelmcic2oyv6tr2jts.spack

Singularity> command -v cmake
/home/ubuntu/.spack/opt/linux-ubuntu24.04-icelake/gcc-13.2.0/cmake-3.30.5-twhtshfoj7cxxomelmcic2oyv6tr2jts/bin/cmake

Hands-on examples

The examples below follow the methodology described in this document to deploy containers on supercomputers:

Advanced topics

Using custom packages

Both Spack and Guix expose command-line mechanisms for package customization.

Guix uses so-called package transformations.

Spack uses the specification mechanism.
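As hedged illustrations of both mechanisms (the package and variant names are examples and may not match the current repositories):

# Guix: replace a dependency everywhere in the graph via a package transformation.
$ guix build petsc --with-input=openblas=blis

# Spack: pin variants, versions and dependencies through the spec syntax.
$ spack install petsc +cuda ^openmpi@4.1 ^openblas threads=openmp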

CPU optimization

In order to generate a binary optimized for a specific CPU micro-architecture, the --tune flag can be passed to a variety of Guix commands:

# Build a PETSc package optimized for Intel x86_64 Cascade Lake micro-architecture.
$ guix build --tune=cascadelake petsc
# Instantiate a containerized environment containing an optimized PETSc package.
$ guix shell --container --tune=cascadelake petsc
# Generate a manifest file where all the tunable packages are optimized.
$ guix shell --export-manifest --tune=cascadelake pkg1 pkg2 ... pkgN

For Spack, this can be done by adding the target specification on the command-line:

$ spack install petsc target=cascadelake

Spack is also capable of easily configuring compiler flags for a package:

$ spack install petsc cppflags=-O3

MPI performance

Diagram

The 3 aspects of concern when getting the best performance with MPI and Containers are:

  • Container runtime performance: the slowdown caused by the container runtime having to translate between namespaces is not significant.
  • Network drivers: as long as the containers are properly built, the drivers should discover the high-speed network stack properly.
  • MPI distribution: the admin team might use custom compilation flags for their MPI distribution. The impact of this remains to be seen.

After many tests, we have concluded that Singularity does not seem to hurt performance. Although the benchmark figures don't indicate any significant performance loss, users are expected to compare performance with their own software.

Composyx CG convergence - 40 nodes - 40 subdomains


OSU bandwidth benchmark


If the MPI drivers aren't properly detected, benchmark performance figures will be orders of magnitude worse, as this usually means falling back to the TCP network stack instead of using the high-performance network. The network driver for MPI is controlled with MCA parameters (--mca key value).

Usually MPI detects the driver automatically, but you can force a specific driver with --mca pml <name>, or use this option to check whether MPI is selecting the proper driver. This is further explained in Notes on MPI.

Regarding the actual MPI installation, a generic Open MPI installation can usually reach performance figures in the same order of magnitude as the MPI installation provided by the admin team, provided the network driver is properly selected. If the user has the technical expertise, the host MPI installation can be passed through to the container and substituted at runtime. More investigation into the viability of this method remains to be done.

CUDA and ROCM stacks

  • Singularity allows passing through the graphics cards to the containers, with the --nv and --rocm flags.

  • Spack packages that support CUDA have a +cuda variant that can be enabled. Additionally, some packages support specifying the CUDA architecture with cuda_arch=<arch>. ROCm support is also provided in selected packages through the +rocm variant.

  • Guix provides CUDA packages through the Guix-HPC Non-free repository, which contains package variants with CUDA support. The ROCm software stack and the corresponding package variants are hosted in the regular Guix-HPC channel.

Application vs development containers

When building a container or a software environment, we usually make the distinction between “application” and “development” containers:

  • If we have every dependency to build some package, except the package itself, it’s a development container.
  • If it only contains the application itself, it’s an app container.

Because of this, there are 2 separate use cases for a container:

  • Getting all the dependencies to iterate when developing a package.
  • Deploying a final package into a supercomputer.

Alternatives

This workflow provides some flexibility on how to use the tools proposed. Other alternative ways are:

  • Using Spack natively:

    • Useful for iterating a solution on a local machine.
    • Installing Spack doesn’t require admin, so it can be tested on a supercomputer as well.
    • Can run into inode limits if used on a supercomputer.
  • Using Guix natively:

    • Also useful for local testing.
    • Guix is not available on supercomputers.
  • Using Singularity containers not built with Guix or Spack:

    • Doesn’t have the guarantees of reproducibility or customizability, but still a good step towards isolation and portability.
  • Guix relocatable binaries:

    • This is an alternative format of guix pack, which produces a single file that can be run without Singularity.
    • A very good option for application deployment, but it can be tricky to set up as a development solution.

HPC centers support for Singularity

The following list describes the platform support for the supercomputers we have tested the workflow on, and any caveats encountered.

| Supercomputer | High-speed network | CPU | GPU | Singularity support? |
|---|---|---|---|---|
| Jean-Zay | InfiniBand | Intel x86-64 | Nvidia (CUDA) | ✅* |
| Adastra | Cray | AMD x86-64 | AMD | ✅* |
| Irene | InfiniBand | Intel x86-64 | Nvidia P100 (CUDA) | ❌* |
| LUMI | Cray | AMD x86-64 | AMD MI250X (ROCm) | ✅ |
| Vega | InfiniBand | Intel x86-64 | Nvidia A100 (CUDA) | ✅ |
| MeluXina | InfiniBand | AMD x86-64 | Nvidia A100 (CUDA) | ✅ |

Jean-Zay

Containers must be placed in the “allowed directory” with idrcontmgr:

$ idrcontmgr cp image.sif
$ singularity shell $SINGULARITY_ALLOWED_DIR/image.sif

Irene

Singularity is not supported. Instead, a Docker-compatible runtime pcocc-rs is provided.

Guix images must be generated with -f docker instead.

Adastra

The admin team has to verify each container image before use.

If quick deployment is required, it is also possible to use Guix relocatable binaries or a native Spack installation. Guix can generate relocatable binaries with:

# Generate the pack, linking /bin
$ guix pack --relocatable -S /bin=bin <package>
...
/gnu/store/...-tarball-pack.tar.gz

# Move the pack to the Adastra and unpack it
$ scp /gnu/store/...-tarball-pack.tar.gz adastra:~/reloc.tar.gz
$ ssh adastra
[adastra] $ tar -xvf reloc.tar.gz

[adastra] $ ./bin/something

Notes on MPI

Table of content

MCA parameters

MPI uses the MCA (Modular Component Architectures) as a framework for configuration for different run-time parameters of an MPI application.

MCA parameters can be adjusted with the flag:

$ mpirun --mca <name> <value> ...

There are 2 ways to debug which MCA parameters are used:

  1. ompi_info --all will display all the MCA parameters that are available a priori.
  2. The mpi_show_mca_params MCA parameter can be set to all, default, file, api or enviro to display the selected values. Sometimes they will just show as key= (default), which is not useful.
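For example (the application name is a placeholder):

$ ompi_info --all | grep pml
$ mpirun --mca mpi_show_mca_params enviro -n 2 ./my_app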

Network drivers

There are 3 modes for MPI to select networks: ob1, cm and ucx. They can be set with --mca pml <ob1,cm,ucx> (PML: Point-to-point Messaging Layer).

  • ucx manages the devices on its own. It should be used for InfiniBand networks. UCX can be further configured with ucx-specific env variables, for example mpirun --mca pml ucx -X UCX_LOG_LEVEL=debug ....
  • ob1 is the multi-device, multi-rail engine and is the “default” choice. It is configured with --mca pml ob1. It uses different backends for the Byte Transfer Layer (BTL), which can be configured with --mca btl <name>, such as:
    • tcp
    • self
    • sm shared memory
    • ofi Libfabric, alternate way
    • uct UCX, alternate way
  • cm can interface with “matching” network cards that are MPI-enabled. It uses MTLs (Matching Transport Layers, not BTLs), which can be set with --mca mtl <name>
    • psm2 Single-threaded Omni-Path
    • ofi Libfabric

In short: ucx provides the performance for InfiniBand, cm can be used for specific setups, and ob1 is the fallback for low-performance TCP or local devices. Libfabric can be used through either cm or ob1.
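As a hedged illustration (the benchmark binary is a placeholder):

# Force UCX (typical for InfiniBand) and make UCX more verbose.
$ mpirun --mca pml ucx -x UCX_LOG_LEVEL=info ./osu_bw

# Fall back to ob1 over TCP and loopback (useful as a comparison point).
$ mpirun --mca pml ob1 --mca btl self,tcp ./osu_bw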

TODO: Discuss MCA transports for CUDA

Using Grid'5000

Table of content

The purpose of this tutorial is to let you experiment with the Grid'5000 platform, which is a large-scale and flexible testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing, including Cloud, HPC, Big Data and AI.

As an example we will try to run an implementation of Conway’s Game of Life using Message Passing Interface (MPI) for parallelization.

Set up a Grid'5000 account

To request an account on Grid'5000, fill in this form and select the appropriate Group Granting Access, Team and Project. Members of the NumPEx-PC5 Team should use the values documented here.

Then make sure to generate an SSH keypair on your PC and to upload the public key to Grid'5000; this will allow direct connection using ssh from your PC. Detailed explanations are given here.
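A typical ~/.ssh/config for external access looks like the sketch below (adapt the login; refer to the Grid'5000 SSH documentation for the authoritative version):

Host g5k
  User <your_g5k_login>
  Hostname access.grid5000.fr
  ForwardAgent no

Host *.g5k
  User <your_g5k_login>
  ProxyCommand ssh g5k -W "$(basename %h .g5k):%p"
  ForwardAgent no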

Read the documentation

Very extensive documentation is available on the Grid'5000 User Portal. For this tutorial you may start with these two articles:

If you are not familiar with MPI you might also have a look here.

Prepare the work

Connect to one Grid'5000 site

If you applied the correct SSH configuration on your PC (see here), you should be able to connect directly to a given Grid'5000 front-end, let's say for instance Grenoble, with a simple ssh command:

jcharousset@DEDIPPCY117:~$ ssh grenoble.g5k
Linux fgrenoble 5.10.0-30-amd64 #1 SMP Debian 5.10.218-1 (2024-06-01) x86_64
----- Grid'5000 - Grenoble - fgrenoble.grenoble.grid5000.fr -----

** This site has 5 clusters (more details at https://www.grid5000.fr/w/Grenoble:Hardware)
 * Available in queue default with exotic job type:
 - drac   (2016): 12 nodes (2 CPUs POWER8NVL 1.0, 10 cores/CPU, 4 GPUs Tesla P100-SXM2-16GB, 128GB RAM, 2x931GB HDD, 1 x 10Gb Ethernet, 2 x 100Gb InfiniBand)
 - yeti   (2017): 4 nodes (4 CPUs Intel Xeon Gold 6130, 16 cores/CPU, 768GB RAM, 447GB SSD, 2x1490GB SSD, 3x1863GB HDD, 1 x 10Gb Ethernet, 1 x 100Gb Omni-Path)
 - troll  (2019): 4 nodes (2 CPUs Intel Xeon Gold 5218, 16 cores/CPU, 384GB RAM, 1536GB PMEM, 447GB SSD, 1490GB SSD, 1 x 25Gb Ethernet, 1 x 100Gb Omni-Path)
 - servan (2021): 2 nodes (2 CPUs AMD EPYC 7352, 24 cores/CPU, 128GB RAM, 2x1490GB SSD, 1 x 25Gb Ethernet, 2 x 100Gb FPGA/Ethernet)
 * Available in queue default:
 - dahu   (2017): 32 nodes (2 CPUs Intel Xeon Gold 6130, 16 cores/CPU, 192GB RAM, 223GB SSD, 447GB SSD, 3726GB HDD, 1 x 10Gb Ethernet, 1 x 100Gb Omni-Path)

** Useful links:
 - users home: https://www.grid5000.fr/w/Users_Home
 - usage policy: https://www.grid5000.fr/w/Grid5000:UsagePolicy
 - account management (password change): https://api.grid5000.fr/ui/account
 - support: https://www.grid5000.fr/w/Support

** Other sites: lille luxembourg lyon nancy nantes rennes sophia strasbourg toulouse

Last login: Fri Jun 14 11:40:27 2024 from 192.168.66.33
jcharous@fgrenoble:~$

Build the example application

Let’s first retrieve the original source code by cloning the Github repository:

jcharous@fgrenoble:~$: git clone https://github.com/giorgospan/Game-Of-Life.git
Cloning into 'Game-Of-Life'...
remote: Enumerating objects: 127, done.
remote: Total 127 (delta 0), reused 0 (delta 0), pack-reused 127
Receiving objects: 100% (127/127), 171.69 KiB | 1.95 MiB/s, done.
Resolving deltas: 100% (48/48), done.
Tip

There is a distinct home directory on each Grid'5000 site, so what has been stored in Grenoble will not be available if you connect to Lyon or Nancy.

To generate more verbose output, you might want to uncomment lines 257 to 264 of the file Game-Of-Life/mpi/game.c; a subset of the matrix will then be printed at each generation.

		/*Uncomment following lines if you want to see the generations of process with
		rank "my_rank" evolving*/
		
		
		// if(my_rank==0)
		// {
			// printf("Generation:%d\n",gen+1);
			// for(i=0;i<local_M;++i)
				// putchar('~');
			// putchar('\n');
			// print_local_matrix();
		// }

Next step is to build the application:

jcharous@fgrenoble:~$ cd Game-Of-Life/
jcharous@fgrenoble:~/Game-Of-Life$ make mpi
rm -f ./mpi/functions.o ./mpi/game.o ./mpi/main.o ./mpi/gameoflife
mpicc -g -O3 -c mpi/functions.c -o mpi/functions.o
mpicc -g -O3 -c mpi/game.c -o mpi/game.o
mpicc -g -O3 -c mpi/main.c -o mpi/main.o
mpicc -o mpi/gameoflife mpi/functions.o mpi/game.o mpi/main.o
jcharous@fgrenoble:~/Game-Of-Life$

The resulting executable is available at ~/Game-Of-Life/mpi/gameoflife.

Run the computation

Request nodes for a computation

Now we will ask the Grid'5000 platform to give us access to one node (comprising multiple CPU cores, 32 in our case) for an interactive session. We also use the walltime option to set an upper limit of 1 hour on our session; after that time the session will be automatically killed.

jcharous@fgrenoble:~/Game-Of-Life$ oarsub -I -l nodes=1,walltime=1
# Filtering out exotic resources (servan, drac, yeti, troll).
OAR_JOB_ID=2344074
# Interactive mode: waiting...
# [2024-06-14 13:41:46] Start prediction: 2024-06-14 13:41:46 (FIFO scheduling OK)

Let’s wait until the scheduler decides to serve our request… be patient. Eventually our request will be picked up from the queue and the scheduler will grant us access to one computation node (dahu-28 in our example):

# [2024-06-14 13:45:13] Start prediction: 2024-06-14 13:45:13 (FIFO scheduling OK)
# Starting...
jcharous@dahu-28:~/Game-Of-Life$

Launch the computation

And finally we can execute the command to launch the computation:

jcharous@dahu-28:~/Game-Of-Life$ mpirun --mca pml ^ucx -machinefile $OAR_NODEFILE ./mpi/gameoflife -n 3200 -m 3200 -max 100

Where:

  • mpirun is the command to launch an MPI application on multiple CPUs and cores,
  • --mca pml ^ucx is the set of options telling Open MPI not to try to use the high-performance interconnect hardware, avoiding a HUGE amount of warnings being shown,
  • $OAR_NODEFILE is the list of cpu cores to be used for the computation - this file was generated by the oarsub command in the previous section,
  • -n 3200 -m 3200 -max 100 are the parameters for our application, asking for a grid size of 3200*3200 and 100 generations.

You should see printouts of the matrix at each generation, followed by information about the total time spent.

Congratulations, you did it :smiley:

What’s next ?

This very simple exercise should give you the basic idea. There are still a lot of additional topics you should explore:

  • Fine-tune the Open MPI configuration for performance optimisation,
  • Use OAR batch jobs instead of interactive sessions,
  • Use OAR options to precisely specify the resources you want, requesting a specific hardware property (e.g. 2 nodes with an SSD and 256G or more RAM) and/or a specific topology (e.g. 16 cores distributed on 4 different CPUs from the same node); see the examples after this list,
  • Automate complete workflows including transport of data, executables and/or source code to and from Grid'5000, before and after a computation,
Caution

Grid'5000 does NOT have any BACKUP service for users’ home directories; it is your responsibility to save what needs to be saved somewhere outside Grid'5000.

  • Run computations on multiple nodes,
Tip

For this you will need to properly configure the high-performance interconnect hardware available on the nodes assigned to your computation, either InfiniBand or Omni-Path. See the specific subsection in Run MPI On Grid'5000.

  • Customize the software environment, add new packages, deploy specific images,
  • Make use of GPU acceleration,
  • Learn tools for debugging, benchmarking and monitoring
  • …
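For instance, OAR resource selection can combine properties and a resource hierarchy. The examples below are hedged sketches: the cluster and property names vary by site, so check the Grid'5000 Hardware pages before using them.

# Request 2 nodes from a given cluster for 1 hour (cluster name is an example).
oarsub -I -p "cluster='dahu'" -l nodes=2,walltime=1
# Request 16 cores spread over 4 CPUs of a single node (hierarchical resource request).
oarsub -I -l /host=1/cpu=4/core=4,walltime=1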

Guix for HPC

Table of content

This short tutorial summarizes the steps to install Guix on a Linux distribution using systemd as an init system, and the additional steps that make it suitable for use in an HPC context.

Install Guix using the provided script.

Download the script and run it as the superuser.

  cd /tmp
  wget 'https://git.savannah.gnu.org/gitweb/?p=guix.git;a=blob_plain;f=etc/guix-install.sh;hb=HEAD' -O guix-install.sh
  chmod +x guix-install.sh
  sudo ./guix-install.sh

You can safely answer yes to all the questions asked by the script.
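You can check that the installation succeeded, for example with:

  guix --version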

Tip

If you wish to do the installation manually, the steps are provided in the documentation.

Configure additional Guix channels

Per-user channel configuration in Guix is defined in the file channels.scm, located in $HOME/.config/guix.

The Guix-Science channel contains scientific software that is too specific to be included in Guix.

The Guix-HPC channel contains more HPC-centered software and the ROCm stack definition.

Both these channels have a non-free counterpart containing package definitions of proprietary software (e.g. the CUDA toolkit) and of free software which depends on proprietary software (e.g. packages with CUDA support).

Since the Guix-HPC non-free channel depends on all the above-mentioned channels, it can be a good starting point, provided that you don't mind having access to non-free software.

In this case, the following channels.scm file could be used:

  (append
    (list
      (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")))
    %default-channels)

Tip

The content of the channels.scm file is Scheme code (it is actually a list of channel objects). The %default-channels variable is a list containing the Guix channel and should be used as a base to generate a list of channels.

If you'd like to have both the Guix-HPC and Guix-Science channels without any proprietary software definition, you could use the following channels.scm file:

  (append
   (list (channel
          (name 'guix-science)
          (url "https://codeberg.org/guix-science/guix-science.git")
          (branch "master")
          (introduction
           (make-channel-introduction
              "b1fe5aaff3ab48e798a4cce02f0212bc91f423dc"
              (openpgp-fingerprint
               "CA4F 8CF4 37D7 478F DA05  5FD4 4213 7701 1A37 8446"))))
         (channel
          (name 'guix-hpc)
          (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")))
   %default-channels)

Get the latest version of the package definitions

In a shell, launch the following command:

  $ guix pull

This will take some time as this command updates the available channels and builds up the package definitions.
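You can then list the channels (and commits) currently in use with:

  guix describe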

Add the Guix HPC substitute server

In order to avoid building the packages defined in the Guix HPC channels, it is possible to configure the guix-daemon to connect to the Guix HPC substitute server, which serves precompiled binaries of the software packaged in the Guix HPC channels and is located at https://guix.bordeaux.inria.fr.

This requires two steps: modifying the guix-daemon configuration and adding the new substitute server key to Guix.

Configure the guix-daemon

If you are using Guix System, please refer to the official documentation available here.

The following instructions apply when Guix is installed on a foreign distribution using systemd.

In order to add a new substitute server, the guix-daemon must be given the full list of substitute servers through the --substitute-urls switch. In our case the full list is 'https://guix.bordeaux.inria.fr https://ci.guix.gnu.org https://bordeaux.guix.gnu.org'.

The guix-daemon.service file (generally located in /etc/systemd/system or in /lib/systemd/system/) should be manually edited to add the above-mentioned flag:

  ExecStart=[...]/guix-daemon [...] --substitute-urls='https://guix.bordeaux.inria.fr https://ci.guix.gnu.org https://bordeaux.guix.gnu.org'

The guix-daemon service then needs to be restarted:

  # Reload the configuration.
  sudo systemctl daemon-reload
  # Restart the daemon.
  sudo systemctl restart guix-daemon.service

Authenticate the new substitute server

In order to accept substitutes from the Guix HPC substitute server, its key must be authorized:

  # Download the server key.
  wget https://guix.bordeaux.inria.fr/signing-key.pub
  # Add the key to Guix configuration.
  sudo guix archive --authorize < signing-key.pub
  # Optionally remove the key file.
  rm signing-key.pub

Check that everything is working properly

Run for instance the following command, which instantiates a dynamic environment containing the hello-mpi package defined in the Guix-HPC channel and runs it:

  guix shell hello-mpi -- hello-mpi
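Optionally, you can also check substitute availability for a package with guix weather (assuming the substitute server has been configured as described above):

  guix weather hello-mpi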

Package Managers

This is the landing page for the “Package Managers” sub-section of the site.

Subsections of Package Managers

Notes on Spack

Table of content

Creating Docker images from Spack

Based on the Build Caches tutorial.

Create a OCI registry in Inria GitLab

https://docs.gitlab.com/ee/user/packages/container_registry

  1. Go to a repository you are administrator of
  2. Go to Settings > General
  3. Open “Visibility, project features, permissions”, and enable “Container Registry”

Next, we will create some access tokens to push to the registry. Access tokens can be created with different scopes:

https://docs.gitlab.com/ee/user/packages/container_registry/authenticate_with_container_registry.html

To create an Access Token for the project (repository):

  1. Go to the same repository
  2. Go to Settings > Access tokens
  3. “Add new token”
  4. Enable write_registry and read_registry
  5. Take note of your username and token pair; we will use them in Spack

Create a Spack environment

A spack environment can be created from a spack.yaml file, with the following contents:

spack:
  view: true
  concretizer:
    unify: true
  packages:
    all:
      require:
      - target=<target>
  mirrors:
    inria:
      url: oci://registry.gitlab.inria.fr/<owner>/<repo>/<cache name>
      access_pair:
      - <user>
      - <token>
      signed: false

You have to fill out:

  • target: fix the compilation target to have binaries that are compatible with the target machine. For example: x86_64_v1, sandybridge, etc.
  • owner and repo for the repository that holds the cache.
  • cache name is a unique name, because a single repo may hold different caches.
  • user is your GitLab username and token the one we created before.
Warning

Make sure not to commit your token to git

Build and push packages to the cache

If we hold the spack.yaml in /path/to/my_environment/spack.yaml:

$ (activate your spack installation)
$ spacktivate -p /path/to/my_environment

# Add some packages to spack.yaml
$ spack add osu-micro-benchmarks

# Verify the concretization
$ spack spec && cat spack.yaml

# Build the packages into disk
$ spack install

After the packages are on disk, we can push our whole environment, or a single package, to the build cache:

$ spack buildcache push --base-image <BASE IMAGE> inria
Important

--base-image <BASE IMAGE> should point to a Docker image that matches your host system. For example, I build my packages on Ubuntu 24.04, so I pass --base-image ubuntu:24.04.

After everything is pushed, Spack will notify you about the OCI URLs:

$ spack buildcache push --base-image ubuntu:24.04 --force inria
==> Selected 39 specs to push to oci://registry.gitlab.inria.fr/numpex-pc5/wp3/spack-repo/buildcache-x86_64_v4
...
==> [39/39] Tagged osu-micro-benchmarks@7.4/c6t2x6x as registry.gitlab.inria.fr/numpex-pc5/wp3/spack-repo/buildcache-x86_64_v4:osu-micro-benchmarks-7.4-c6t2x6xfynvaop3ed66rtq3rvb5jaap5.spack

You can use the URI in docker run <uri> or singularity run docker://<uri>

Converting Docker images to Singularity

Simply:

$ singularity build <filename.sif> docker://<uri>

# For example:
$ singularity build osu.sif docker://registry.gitlab.inria.fr/numpex-pc5/wp3/spack-repo/buildcache-x86_64_v4:osu-micro-benchmarks-7.4-c6t2x6xfynvaop3ed66rtq3rvb5jaap5.spack
$ singularity exec ./osu.sif hostname