HPC Environment
This is the landing page for the “HPC Environment” sub-section of the site.
Spack (https://spack.io) is a package manager for Linux that makes it easy to deploy scientific software on computers. One of the main differences from apt or dnf is that Spack can be installed at the user level, rather than globally by the system administrator. To install Spack, you only need to clone a GitHub repository, which can then be loaded and used to install packages that are usually installed at the system level, such as GCC, CUDA, and others.
Spack is also a source-based package manager. This means that the primary method of installing packages is to have them built from source by Spack itself. For example, if we want to install CMake, Spack will fetch the source code and build it locally — along with every dependency of CMake. In contrast, a binary-based package manager like APT will download the pre-built CMake from Debian’s servers, which was compiled in their server farm. Source-based package managers have several advantages over binary-based ones, especially in the context of supercomputers:
Optimization flags: When building C/C++ packages, you can apply compiler flags to optimize the build for specific CPU microarchitectures. This may make a binary for x86-64 unusable on other x86-64 machines if the compiler adds CPU instructions that are not present on older processors. Binary-based package managers must make a compromise by ensuring their pre-built packages are generic enough for many CPU variants, while sacrificing potential performance.
Fine control of version dependencies: Scientific software often relies on specific versions of libraries. A source-based package manager allows you to rebuild all reverse-dependencies of a package when you change its version, while you are very limited when using a binary-based one.
Different variants of the same package: Similar to having multiple versions of the same package, packages can be compiled with different sets of features, such as support for CUDA or ROCm, support for specific XML libraries, etc. This can be solved by binary-based package managers by providing multiple package variants that conflict with each other, but it is more cleanly solved by configuring your package variants in Spack.
On the other hand, one of the main issues with source-based package managers, including Spack, is the time required to compile packages. Because the entire dependency chain must be built from source, it can take a considerable amount of time and compute resources to prepare your toolchain. This can be an even worse problem when iterating on different package variants or versions to configure your environment.
To overcome this problem, this post discusses how to bridge the best of both worlds for Spack package management: using a binary cache. By using a binary cache, we can “pre-build” packages for an environment on our cloud platform, so that when we call spack install, it will attempt to use pre-built packages. Spack remains a source-based package manager, and if it doesn’t find a binary for a specific package, it can still build it from source.
By using a binary cache, we can provide users with an easy onboarding experience with Spack while retaining the features that make Spack useful, such as changing package variants, versions, or optimization levels.
Before discussing how to set up a binary cache, let’s see how the binary cache is used from a user’s perspective.
In Spack, binary caches are referred to by the name “mirror”.
To add a binary cache, the user simply needs to run a command. It is also possible to include the mirror as part of a Spack environment instead:
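A sketch of such a command (the mirror name and OCI URL here are placeholders):

```shell
# Register a binary cache (mirror); the URL points at a container registry.
spack mirror add my-cache oci://ghcr.io/<username>/<repository>/my-cache
```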
Finally, to install packages from the mirror, the user simply calls spack install as usual. As Spack installs packages, it will check if the package is cached in the mirror. If it is, it will download it instead of building from source.
It is important to understand that Spack checks if a package is cached by comparing the package’s spec hash. For example:
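For instance, spack spec with the --long flag prints the hash prefix of each spec (output is abbreviated and version-dependent):

```shell
# Show the concretized spec of cmake, including hash prefixes.
spack spec --long cmake
```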
Spack will only download a CMake package from the cache if there is one with the same hash. The hash is calculated from the version of CMake, the build flags, the system architecture, and the hashes of all of its dependencies.
This means packages in a binary cache must match the user’s system to be downloadable (Debian 11, RHEL 9, etc).
To populate a binary cache, it is important to identify:
To serve packages from a binary cache, Spack doesn’t use any custom-made system that requires you to host a solution. Instead, it relies on a Docker Container Registry. To be clear: we don’t do anything related to containers. Spack pushes packages as if they were container layers to a container registry and pulls them transparently.
The benefit of this approach is that there are many cloud providers that offer container registry solutions:
The URL format for the Spack cache would be:
- GitHub: oci://ghcr.io/&lt;username&gt;/&lt;repository&gt;/&lt;mirror name&gt;
- GitLab: oci://&lt;hostname&gt;/numpex-pc5/&lt;group&gt;/&lt;subgroup&gt;/&lt;repository&gt;/&lt;mirror name&gt;

To be able to push to the cache, you will need a username and password. Since these are the same credentials you would use for the container registry of your choice, there should be documentation available for it. You can configure the credentials for the mirror as follows:
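A sketch of how credentials might be attached to the mirror (flag names have varied across Spack versions; check `spack mirror set --help`):

```shell
# Store push credentials for the OCI mirror; <token> is a registry access token.
spack mirror set --push \
  --oci-username <username> \
  --oci-password <token> \
  my-cache
```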
We have a quick guide to configure the built-in container registry for GitLab in the following link: Setup a Container Registry on GitLab.
After we have configured our mirror to push packages to, we need to build the packages themselves. The first option is to build them locally on your PC for debugging purposes. This is useful for getting comfortable with the tools and checking that pushing and pulling work as expected. Packages can be pushed with the following command:
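As a sketch, assuming a mirror named my-cache has already been registered:

```shell
# Build the active environment locally, then push the results to the cache.
spack install
spack buildcache push --update-index my-cache
```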
To automate this process, we can use a CI solution like GitHub Actions or GitLab pipelines. The checklist of elements you will need to consider includes:
- The target architecture
- The padded_length setting
- Preserving the spack.lock file

A simple Spack environment like the following should be enough to get started:
We mark all packages to target x86_64_v2 (or whichever architecture you want to target) so that Spack doesn’t try to autodetect the architecture of the host system, but rather uses the explicitly set value. Otherwise, you may encounter the following problem:
For example, packages may be concretized for x86_64_v4 because of the CI system’s CPU, making them unusable on machines with older processors.

As for padded_length: when you build a package, the build system will insert references to the absolute paths of other packages it depends on. The problem is that Spack can be installed to any path, so it must perform relocation.
Relocation is a process where a pre-built package gets its references replaced, for example /home/runner/spack/bin/cmake → /home/myuser/spack/bin/cmake. When dealing with an actual binary, this string will be embedded at some location:
The problem arises when you want to replace it with a string longer than the original: you cannot assume that there will be free space after the original string. Therefore, the only operation you can safely perform is to shrink a string. By using padded_length, Spack will artificially create paths with a specified number of padding characters, so that if the number is large enough, we can assume that paths will always be shortened. For example, it will install packages to /home/runner/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/bin/cmake instead (I simplified the actual installation paths for clarity).
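The setting lives under config:install_tree. As a sketch (the value 128 is an arbitrary choice):

```shell
# Pad install paths to 128 characters so cached binaries can be relocated.
spack config add "config:install_tree:padded_length:128"
```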
Regarding the spack.lock file, to enable users to easily download from the cache, I would recommend committing and sharing the spack.lock with users of the cache. The two possibilities are:
CI builds packages, but the spack.lock is not preserved: when users concretize the environment on their own, they may or may not get the same versions and hashes that are actually cached, since a spack.yaml doesn’t guarantee reproducibility.
CI builds packages and pushes the spack.lock back into the environment: users won’t have to re-concretize the environment, so they will get the hashes for the packages that were built and cached by CI.
To preserve the spack.lock from CI, your caching workflow might look like this:
- Concretize the environment, regenerating the spack.lock if needed
- Build and push the packages
- Push the spack.lock back to the repository

Finally, when populating a Spack binary cache, it is important to consider the Linux distribution to target. Since Spack doesn’t bootstrap its own glibc, it links its packages against the system’s glibc. This adds an implicit dependency on the system and is reflected in the package spec. For example, if I concretize a package:
This package is concretized for ubuntu24.04. Therefore, if you want to deploy pre-built Spack packages for an OS like Debian 11, you must ensure that you concretize and build the environment on Debian 11 as well.
This can be easily accomplished with a container, and both GitHub and GitLab provide means of running action steps under a container:
If you need more help with creating your Spack binary cache to accelerate deployments on HPC centers, please get in contact with us !
This guide applies the workflow presented at Modern HPC Workflow with Containers to run an application container built with Guix.
This tutorial will focus on using Grid5000 for both building the container with Guix and deploying it with Singularity, as it provides both tools.
The container may be built on any computer with Guix installed. You may refer to the documentation if you wish to install Guix on your machine. Beware that if you build it on your local machine, you’ll have to copy it to Grid5000.
Additional instructions will be provided for deployment on Jean-Zay, which can be easily adapted to any cluster that supports Singularity and uses SLURM as its job management system.
The application chosen as an example is Chameleon, a dense linear algebra software for heterogeneous architectures that supports MPI and NVIDIA GPUs through CUDA or AMD GPUs through ROCm.
You will need a node with an NVIDIA GPU and an x86_64 CPU (for Singularity). For instance, the chifflot queue, located in Lille, contains nodes with NVIDIA P100 GPUs.

The chameleon-cuda package (the chameleon package variant with CUDA support) is defined in the Guix-HPC non-free channel, which is not activated by default. The channels.scm file contains the following:
The container is built with the guix pack command, prefixed with guix time-machine in order to use our channels.scm file. The -r option creates a symbolic link to the resulting container image in the Guix store, as chameleon.sif.

A few notes:
- guix pack can generate different formats, such as Singularity (squashfs), Docker, or relocatable binaries.
- Singularity needs bash to be in the package list.
- CUDA applications deployed with Guix need LD_PRELOAD to be set to the path of libcuda.so, since that library is provided by the proprietary CUDA driver installed on the machine and is not part of the Guix software stack.
- The OPENBLAS_NUM_THREADS environment variable is set to improve computation performance; it is not compulsory.
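Putting this together, the build command might look like the following sketch (chameleon-cuda is the package from the Guix-HPC non-free channel declared in channels.scm):

```shell
# Build a Singularity (squashfs) image pinned to our channels.scm,
# and link it in the current directory as chameleon.sif.
guix time-machine -C channels.scm -- \
  pack -f squashfs -r chameleon.sif bash chameleon-cuda
```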
The image must be copied to an allowed directory ($SINGULARITY_ALLOWED_DIR) in order to be accessible to Singularity. This step is specific to Jean-Zay; more details in the documentation.
Then the singularity module needs to be loaded (this step is not always necessary, depending on the supercomputer, but is not specific to Jean-Zay).

Environment variables are propagated to the Singularity container context, but since the path to libcuda.so doesn’t exist outside of the container context (the path is bind-mounted by Singularity due to the --nv flag), declaring LD_PRELOAD outside of the container context leads to an error.
On some systems, the singularity command is available through a module, and the module command is only accessible on a compute node.

In this example, we use a single node. In order to use multiple nodes, a script should be submitted using sbatch (not covered in this tutorial).
The TGCC uses a specific tool to deploy Docker images called pcocc-rs. See the documentation.
On Irene, resources are allocated using ccc_mprun. See the documentation.
For instance, the -s option spawns an interactive session directly on a compute node.
On Irene, the number of allocated GPUs is directly related to the number of allocated cores on the node. Here, 20 cores are allocated on a V100 node, which contains 40 cores in total, so 50% of the GPUs available on the node (4 × V100) are allocated. See the documentation.
The --module nvidia option makes the CUDA libraries available inside the image, in the /pcocc/nvidia/usr/lib64 folder.
Before being able to use a custom Singularity image, it has to be manually copied to an authorized path by the support team, who should be contacted by email. See the documentation.
For machines where Singularity is not available (or you have to ask
support to deploy your custom image), an alternative can be the
relocatable binary archive. The command below generates an archive
containing chameleon-hip for AMD GPUs that can be run on e.g.
Adastra:
This archive can then be uploaded to a supercomputer (e.g. Adastra) and deployed:
This is the second part of the Workflow Tutorial. In the previous part, we showed how to use Singularity and Guix for our running example, Chameleon, on HPC clusters (Modern HPC Workflow Example (Guix)).
This tutorial relies on a GitLab access token for the registry. Since the tutorial took place, this token has expired.
In this second part, we will use Spack instead of Guix. We will also produce Spack-generated containers, for easy reproducibility of the workflow across different computers.
In summary, we are going to:
There are 2 ways to generate containers with Spack:
- spack containerize: https://spack.readthedocs.io/en/latest/containers.html#a-quick-introduction
- Build caches pushed to a container registry

The containerize option has a number of drawbacks, so we want to push with the
Build Caches option. This also has the benefit of being able to build and cache
packages on CI/CD, allowing for quicker deployments.
The Spack build cache will require setting up a container registry, in some Git Forge solution. Both GitHub and GitLab provide their own Container Registry solutions. This guide presents how to create it: Setup a Container Registry on GitLab.
For this tutorial, we will use the container registry hosted at Inria’s GitLab.
We will connect to the Lille site on Grid'5000, exactly the same as with the Guix guide.
If you are having trouble at any step, you can skip this and download the container directly:
Spack is installed at the user level. To install Spack, you have to clone the
Spack repo, and load it with source:
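A minimal sketch of that installation:

```shell
# Clone Spack and make the spack command available in the current shell.
git clone https://github.com/spack/spack.git
source spack/share/spack/setup-env.sh
```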
We will create a Spack environment, which holds our configuration and installed packages. The Spack environment will create a spack.yaml file, which we will edit:
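A sketch of creating and activating a directory-based environment named myenv:

```shell
# Create the environment in ./myenv (this writes ./myenv/spack.yaml)...
spack env create -d ./myenv
# ...and activate it for the current shell.
spack env activate -d ./myenv
```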
Open the ./myenv/spack.yaml with your favorite editor, and you will see something
like this:
We will perform 3 modifications:
- inria-pull is a mirror I populated with caches of the packages for the tutorial.
- inria-&lt;name&gt; is a mirror you will use to push the packages you build, as an example.

Change inria-&lt;name&gt; and the URL .../buildcache-&lt;name&gt; to a unique name. You will push to this cache as an example, so we don’t collide with each other. You can use your G5k login, for example.
Edit the spack.yaml file and save it. After the environment has been modified, we call spack concretize to “lock” our changes (to a spack.lock file). We can use spack spec to preview the status of our environment. It will show which packages still need to be built.
spack concretize locks the characteristics of the environment to the current
machine. We are concretizing on the frontend node for convenience, and to be
able to test our packages in it.
The next step is to build the packages. Usually, this is a CPU-intensive job.
Let’s move to a CPU node of G5k for this (chiclet):
To build our software stack, just call spack install. We have configured a
pull-only build cache previously, so packages will not be re-compiled:
You may want to check that everything was built, by running spack spec again:
After the packages have been built, let’s push them into the container registry.
To push our packages to be used as containers, we must add the
--base-image flag. As Spack doesn’t build everything from scratch, we must provide a base image, from which the libc library will be taken. You must match your --base-image to the system that built the packages. We have built the packages under Grid'5000’s Debian 11 installation, so the base image should be a Debian 11 too. Not matching this, or not passing --base-image, will render the pushed packages unusable.
Because Docker might put a rate-limit on the pulls of an image, and we are
sharing the same IP address (10 downloads per hour per IP), I mirrored the
Debian 11 image to the Inria registry. Please use this image instead (otherwise,
the command would be --base-image debian:11):
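The push command might then look like this sketch (the registry path for the mirrored Debian image is a placeholder; use the URL given during the tutorial):

```shell
# Push the built packages, using the mirrored Debian 11 as the base image.
spack buildcache push \
  --base-image <registry>/<mirror-path>/debian:11 \
  inria-<name>
```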
Take note of the URL that Spack gives you.
Because Singularity might use heavy CPU/memory resources, we build the container image while on the compute node. The output is a SIF file (Singularity Image Format).
The commands for running the container on other machines like Jean-Zay, Vega, etc., will be the same as in the Guix Tutorial.
We will demonstrate how to run the container on the GPU partition of Grid'5000.
Before trying the container, we can try our Spack-installed Chameleon directly:
To use the singularity container:
Software deployment on HPC systems is a complex problem, due to specific constraints, such as:
As users develop more complex software, their needs for extra dependencies increase. The classical solution to providing extra software to the user involves modules. Modules can be loaded from the terminal of a user, and are managed by the HPC admin team.
This solution has some shortcomings:
In order to solve the above-mentioned issues, and with a view toward a future of exascale computing, we propose a shift in the paradigm of software deployment: from the classical approach, where the admin team provides the software stack for the users, to a new procedure where the users bring their own software stack.
This method has a number of advantages, including:
Singularity is an application that can run containers in an HPC environment. It is highly optimized for the task, and has interoperability with Slurm, MPI or GPU specific drivers.
Usually, we find a duplication of software stacks and platforms to deploy to:
Containers (Singularity or Docker) solve this by having a single interface that merges everything. From the software stack, the container is the platform to deploy to. From the platform point of view, software comes bundled as a container:
Singularity uses its own container format (sif), which can also be
transparently generated from a Docker container.
Singularity is available in the majority of Tier-1 and Tier-0 HPC centers, either in the default environment or loaded from a module:
Singularity can download and run a container image directly from an online
container registry such as DockerHub using the
docker:// reference:
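For example (assuming network access to DockerHub from the cluster):

```shell
# Pull and run an image straight from DockerHub.
singularity run docker://hello-world
```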
This feature is not available in all clusters.
See also the documentation about the GitHub Container Registry (GHCR) for setting up a GitHub-hosted registry.
Using containers through Singularity can provide a solution to some of the points mentioned in the previous section, but it also transfers to the user the task of building a container with the specific software stack they need.
Building a container can be streamlined using package managers.
In our approach, we selected two package managers to build the containers: Guix and Spack.
GNU Guix is a package manager for GNU/Linux systems. It is designed to give users more control over their general-purpose and specialized computing environments, and make these easier to reproduce over time and deploy to one or many devices. (source: Guix official website)
Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. Spack isn’t tied to a particular language; you can build a software stack in Python or R, link to libraries written in C, C++, or Fortran, and easily swap compilers or target specific microarchitectures. (source: Spack official website)
A key feature of the Spack package manager is that it allows users to integrate parts of the system they are building on: Spack packages can use compilers or link against libraries provided by the host system. Use of system-provided software is even a requirement at the lowest level of the stack.
Guix differs from Spack in two fundamental ways: self containment, and support for reproducibility and provenance tracking. Self containment stems from the fact that Guix packages never rely on software pre-installed on the system; its packages express all their dependencies, thereby ensuring control over the software stack, wherever Guix deploys it. This is in stark contrast with Spack, where packages may depend on software pre-installed on the system.
Unlike Spack, Guix builds packages in isolated environments (containers), which guarantees independence from the host system and allows for reproducible builds. As a result, reproducible deployment with Guix means that the same software stack can be deployed on different machines and at different points in time—there are no surprises. Conversely, deployment with Spack depends on the state of the host system.
TODO: Comparative table of features
Guix is a package manager for Linux focused on the reproducibility of its artifacts. Given a fixed set of package definitions (a list of channels at a specific commit in Guix terminology), Guix will produce the same binaries bit-by-bit, even after years between experiments.
The Guix project itself maintains a list of package definitions installed together with the package manager tool.
For some specific scientific packages, it might be necessary to include extra package definitions from third-party channels: a list of science-related channels can be found here.
Note that these channels contain only FOSS-licensed packages. In order to access package definitions of proprietary software, or of software that depends on non-free software, the following channels can be included:
The Guix package manager is able by itself to instantiate a containerized environment with a set of packages using the guix shell --container command.
Unfortunately, Guix is not yet available on Tier-1 and Tier-0 supercomputers, but it can be used to generate a Singularity image locally before deploying it on a supercomputer. This gives the user both the reproducibility properties of the Guix package manager and the portability of Singularity containers.
To get started, install Guix or connect to a machine with a Guix installation (Grid5000 for example).
Guix generates Singularity images with the guix pack -f squashfs command, followed by a list of packages. For example, the following command would generate a Singularity image containing the bash and gcc-toolchain packages:
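A sketch of that command:

```shell
# Build a Singularity (squashfs) image containing bash and gcc-toolchain;
# the resulting image path in /gnu/store is printed on completion.
guix pack -f squashfs bash gcc-toolchain
```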
The image can be configured with an entry point, allowing an arbitrary program to be started directly when the image is invoked with the run subcommand of Singularity. This is done using the --entry-point flag:
In order to easily find the generated image, the -r flag creates a link to the
image (along with other actions):
The image can then be transferred to the target supercomputer and run using Singularity. Below is an example on LUMI:
Instead of specifying the list of packages on the command line, the packages can
be specified through a manifest file. This file can be written by hand or
generated using the command guix shell --export-manifest. Manifests are useful
when dealing with a long list of packages or package transformations. Since they
contain code, they can be used to perform a broad variety of modifications on
the package set such as defining package variants or new packages that are
needed in a specific context. The example below generates a simple
manifest.scm file containing the bash and hello packages:
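A sketch of that command:

```shell
# Write a manifest describing the bash and hello packages.
guix shell --export-manifest bash hello > manifest.scm
```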
This manifest file can then be used to generate the same Singularity image as above with the following command:
The command guix describe -f channels generates a channels file that is used
to keep track of the current state of package definitions:
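For example:

```shell
# Record the current channel commits for later reproduction.
guix describe -f channels > channels.scm
```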
Both files, channels.scm and manifest.scm, should be kept under version control. They are sufficient to generate an image containing the exact same software stack down to the libc, with the exact same versions and compile options, on any machine where the guix command is available, using the command guix time-machine:
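A sketch, combining the two files:

```shell
# Re-create the image from the pinned channels and the manifest.
guix time-machine -C channels.scm -- \
  pack -f squashfs -m manifest.scm
```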
Note that in order to generate the exact same file (bit-for-bit identical), the
same image specific options such as --entry-point have to be specified.
Spack is a package manager specifically targeted at HPC systems. One of its selling points is that it can easily target specific features of the supercomputer, like compiler, CPU architecture, configuration, etc.
Unlike Guix, Spack can be installed directly on a supercomputer by the user, as
it only requires git clone in the home directory. There are some problems with
this:
Instead of using Spack directly on the supercomputer, it is possible to use Spack to generate Singularity or Docker containers. Once the container is generated, the same environment can be deployed to any machine.
To generate the container, Spack documents 2 ways:
Going with the build cache approach, you need to:
The argument --base-image <image> must be passed, and should match the host
system used to build the packages.
Once Spack pushes the package, we can go into our supercomputer and run the container directly:
The examples below follow the methodology described in this document to deploy containers on supercomputers:
Both Spack and Guix expose command-line mechanisms for package customization.
Guix uses so-called package transformations.
Spack uses the specification mechanism.
In order to generate a binary optimized for a specific CPU micro-architecture,
the --tune flag can be passed to a variety of Guix commands:
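A sketch (this only affects packages marked as tunable; skylake is an example micro-architecture, and the package list is illustrative):

```shell
# Build the image with tunable packages optimized for Skylake CPUs.
guix pack -f squashfs --tune=skylake bash chameleon
```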
For Spack, this can be done by adding the target specification on the command-line:
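A sketch, using Chameleon as the running example:

```shell
# Build for the x86_64_v3 micro-architecture level.
spack install chameleon target=x86_64_v3
```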
Spack is also capable of easily configuring the CFLAGS for a package:
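For example:

```shell
# Inject custom compiler flags into the build of a single package.
spack install chameleon cflags="-O3"
```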
The 3 aspects of concern when getting the best performance with MPI and Containers are:
After many tests, we have concluded that Singularity doesn’t seem to hurt performance. Although the benchmark figures don’t indicate any significant performance loss, users are expected to compare the performance with their own software.
If the MPI drivers aren’t properly detected, the performance figures for
benchmarks will be orders of magnitude different, as this usually means falling
back to the TCP network stack instead of using the high-performance network. The
network driver for MPI is controlled with MCA parameters --mca key value.
Usually MPI detects the driver automatically, but you can force a specific driver with --mca pml <name>, or use it to debug whether MPI is selecting the proper driver. This is
further explained in Notes on MPI.
Regarding the actual MPI installation: a generic OpenMPI installation can usually reach performance figures in the same order of magnitude as the MPI installation provided by the admin team, provided the network driver is properly selected. If the user has the technical expertise, the host MPI installation can be passed through to the container and substituted at runtime. More investigation into the viability of this method remains to be done.
Singularity allows passing through the graphics cards to the containers, with
the --nv and --rocm flags.
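For example (the image name is a placeholder):

```shell
# NVIDIA GPUs:
singularity exec --nv chameleon.sif nvidia-smi
# AMD GPUs:
singularity exec --rocm chameleon.sif rocminfo
```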
Spack packages that support CUDA have a +cuda variant that can be enabled. Additionally, some packages support specifying the CUDA architecture with cuda_arch=&lt;arch&gt;. ROCm support is also provided in selected packages through the +rocm variant.
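A sketch, again with Chameleon (the cuda_arch value depends on the target GPU; 70 corresponds to V100):

```shell
# Enable CUDA support and target the Volta architecture.
spack install chameleon +cuda cuda_arch=70
```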
Guix provides CUDA packages through the Guix-HPC non-free repository, which contains package variants with CUDA support. The ROCm software stack and package variants are hosted in the regular Guix-HPC channel.
When building a container or a software environment, we usually make the distinction between “application” and “development” containers:
Because of this, there are two separate use cases for a container:
This workflow provides some flexibility on how to use the tools proposed. Other alternative ways are:
Using Spack natively:
Using Guix natively:
Using Singularity containers not built with Guix or Spack:
Guix relocatable binaries:
Generated with guix pack, this produces a single file that can be run without Singularity.

The following list describes the platform support for the supercomputers we have tested the workflow on, and any caveats encountered.
| Supercomputer | High-speed Network | CPU | GPU | Singularity support? |
|---|---|---|---|---|
| Jean-Zay | InfiniBand | Intel x86-64 | Nvidia (CUDA) | ✅* |
| Adastra | Cray | AMD x86-64 | AMD | ✅* |
| Irene | InfiniBand | Intel x86-64 | Nvidia P100 (CUDA) | ❌* |
| LUMI | Cray | AMD x86-64 | AMD MI250X (ROCM) | ✅ |
| Vega | InfiniBand | Intel x86-64 | Nvidia A100 (CUDA) | ✅ |
| Meluxina | InfiniBand | AMD x86-64 | Nvidia A100 (CUDA) | ✅ |
Containers must be placed in the “allowed directory” with idrcontmgr:
Singularity is not supported. Instead, a Docker-compatible runtime pcocc-rs
is provided.
Guix images must be generated with -f docker instead.
The admin team has to verify each container image before use.
If quick deployment is required, it is also possible to use Guix relocatable binaries or a native Spack installation. Guix can generate relocatable binaries with:
MPI uses the MCA (Modular Component Architecture) as a framework for configuring different run-time parameters of an MPI application.
MCA parameters can be adjusted with the flag:
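For example:

```shell
# Force the UCX point-to-point layer for this run.
mpirun --mca pml ucx -np 4 ./my_app
```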
There are 2 ways to debug which MCA parameters are used:
- ompi_info --all will display all the MCA parameters that are available a priori.
- The mpi_show_mca_params MCA parameter can be set to all, default, file, api or enviro to display the selected values. Sometimes they will just show as key= (default), which is not useful.

There are 3 modes for MPI to select networks: ob1, cm and ucx. They can be set with --mca pml &lt;ob1,cm,ucx&gt; (PML: Point-to-point Messaging Layer).
- ucx manages the devices on its own. It should be used for InfiniBand networks. UCX can be further configured with UCX-specific environment variables, for example mpirun --mca pml ucx -x UCX_LOG_LEVEL=debug ....
- ob1 is the multi-device, multi-rail engine and is the “default” choice. It is configured with --mca pml ob1. It uses different backends for the Byte Transfer Layer (BTL), which can be configured with --mca btl &lt;name&gt;, such as:
  - tcp
  - self
  - sm (shared memory)
  - ofi (Libfabric, alternate way)
  - uct (UCX, alternate way)
- cm can interface with “matching” network cards that are MPI-enabled. It uses MTLs (not BTLs), which can be set with --mca mtl &lt;name&gt;:
  - psm2 (single-threaded Omni-Path)
  - ofi (Libfabric)

In short: ucx provides the performance for InfiniBand, cm can be used for specific setups, and ob1 is the fallback for low-performance TCP or local devices. Libfabric can be used through either cm or ob1.
TODO: Discuss MCA transports for CUDA
The purpose of this tutorial is to let you experiment with the Grid'5000 platform, a large-scale and flexible testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing, including Cloud, HPC, Big Data and AI.
As an example we will try to run an implementation of Conway’s Game of Life using Message Passing Interface (MPI) for parallelization.
To request an account on Grid’5000 fill that form and select the appropriate Group Granting Access, Team and Project. Members of the NumPEx-PC5 Team should use the values documented here.
Then make sure to generate an SSH keypair on your PC and to upload the public key to Grid'5000; this will allow direct connection using ssh from your PC. Detailed explanations are given here.
Very extensive documentation is available on the Grid'5000 User Portal. For this tutorial you may start with these two articles:
If you are not familiar with MPI you might also have a look here.
If you applied the correct SSH configuration on your PC (see here), you should be able to connect directly to a given Grid'5000 front-end, say Grenoble, with a simple ssh command:
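Assuming the recommended SSH configuration aliases Grid'5000 front-ends under a g5k suffix (an assumption based on the usual setup; your alias may differ), the connection would look like:

```shell
# Connect to the Grenoble front-end (alias defined in ~/.ssh/config).
ssh grenoble.g5k
```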
Let’s first retrieve the original source code by cloning the GitHub repository:
There is a distinct home directory on each Grid'5000 site, so what has been stored in Grenoble will not be available if you connect to Lyon or Nancy.
To generate more verbose output, you may uncomment lines 257 to 264 of the file Game-of-Life/mpi/game.c; a subset of the matrix will then be printed at each generation.
The next step is to build the application:
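Assuming the repository provides a Makefile in its mpi directory (an assumption; it may instead require invoking the mpicc wrapper directly), the build could be:

```shell
cd ~/Game-of-Life/mpi
make    # compiles game.c with the MPI compiler wrapper
```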
The resulting executable is available at ~/Game-Of-Life/mpi/gameoflife
Now we will ask the Grid'5000 platform to give us access to one node (comprising multiple CPU cores, 32 in our case) for an interactive session. We also use the walltime option to set an upper limit of 1 hour on our session; after that time the session will be automatically killed.
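On Grid'5000, resource reservation goes through the OAR scheduler; an interactive request along these lines (the exact resource syntax is an assumption and may differ per site) would be:

```shell
# Request 1 node interactively (-I) for at most 1 hour.
oarsub -I -l nodes=1,walltime=1:00:00
```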
Let’s wait until the scheduler decides to serve our request… be patient.
Eventually our request will be picked up from the queue and the scheduler will grant us access to one computation node (dahu-28 in our example):
And finally we can execute the command to launch the computation:
Where:

- mpirun is the command to launch an MPI application on multiple CPUs and cores,
- --mca pml ^ucx tells Open MPI not to try to use the high-performance interconnect hardware, avoiding a HUGE amount of warnings being shown,
- $OAR_NODEFILE is the list of CPU cores to be used for the computation; this file was generated by the oarsub command in the previous section,
- -n 3200 -m 3200 -max 100 are the parameters for our application, asking for a grid size of 3200*3200 and 100 generations.

You should see printouts of the matrix at each generation, followed by information about the total time spent.
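Piecing these options together, the launch command was presumably of the following form (the -machinefile flag and argument order are assumptions):

```shell
mpirun --mca pml ^ucx -machinefile $OAR_NODEFILE \
    ~/Game-Of-Life/mpi/gameoflife -n 3200 -m 3200 -max 100
```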
Congratulations, you did it 👋
This very simple exercise should give you the basic idea. There are still a lot of additional topics you should explore:
Grid'5000 does NOT have any backup service for users’ home directories; it is your responsibility to save what needs to be saved somewhere outside Grid'5000.
For this you will need to properly configure the high-performance interconnect hardware available on the specific nodes assigned for your computation, either InfiniBand or Omni-Path. See the specific subsection in Run MPI On Grid'5000.
This short tutorial summarizes the steps to install Guix on a Linux
distribution using systemd as its init system, along with the additional
steps that make it suitable for use in an HPC context.
Download the script and run it as the superuser.
cd /tmp
wget https://guix.gnu.org/install.sh -O guix-install.sh
chmod +x guix-install.sh
sudo ./guix-install.sh

You can safely answer yes to all the questions asked by the script.
If you wish to do the installation manually, the steps are provided in
the documentation.
Tip
Per-user channel configuration in Guix is defined in the file
channels.scm, located in $HOME/.config/guix.
The Guix-Science channel contains scientific software that is too specific to be included in Guix.
It has a non-free counterpart containing package definitions of proprietary software (e.g. the CUDA toolkit) and of free software that depends on proprietary software (e.g. packages with CUDA support).
Since the Guix-Science-nonfree channel depends on the Guix-Science channel, it can be a good starting point, provided that you don't mind having access to non-free software.
In this case, the following channels.scm file could be used:
(append
(list
(channel
(name 'guix-science-nonfree)
(url "https://codeberg.org/guix-science/guix-science-nonfree.git")
(introduction
(make-channel-introduction
"58661b110325fd5d9b40e6f0177cc486a615817e"
(openpgp-fingerprint
"CA4F 8CF4 37D7 478F DA05 5FD4 4213 7701 1A37 8446")))))
%default-channels)
Tip
The content of the channels.scm file is Scheme code (it is actually
a list of channel objects). The %default-channels variable is a
list containing the Guix channel and should be used as a base to
generate a list of channels.
If you'd like to have the Guix-Science channel without any proprietary
software definition, you could use the following channels.scm file:
(append
(list (channel
(name 'guix-science)
(url "https://codeberg.org/guix-science/guix-science.git")
(branch "master")
(introduction
(make-channel-introduction
"b1fe5aaff3ab48e798a4cce02f0212bc91f423dc"
(openpgp-fingerprint
"CA4F 8CF4 37D7 478F DA05 5FD4 4213 7701 1A37 8446")))))
%default-channels)

In a shell, launch the following command:
$ guix pull

This will take some time, as this command updates the available channels and builds the package definitions.
In order to avoid building the packages defined in the Guix HPC
channels, it is possible to configure the guix-daemon to connect to
the Guix HPC substitute server, which serves precompiled binaries of the
software packaged in various channels, including the Guix-Science
channels, and is located at https://guix.bordeaux.inria.fr.
This requires two steps: modifying the guix-daemon configuration and
adding the new substitute server key to Guix.
guix-daemon
If you are using Guix System, please refer to the official documentation available here.
The following instructions apply when Guix is installed on a foreign
distribution using systemd.
In order to add a new substitute server, the guix-daemon must be
given the full list of substitute servers through the
--substitute-urls switch. In our case the full list is
'https://guix.bordeaux.inria.fr https://ci.guix.gnu.org
https://bordeaux.guix.gnu.org'.
The guix-daemon.service file (generally located in
/etc/systemd/system or in /lib/systemd/system/) should be manually
edited to add the above-mentioned flag:
ExecStart=[...]/guix-daemon [...] --substitute-urls='https://guix.bordeaux.inria.fr https://ci.guix.gnu.org https://bordeaux.guix.gnu.org'
The guix-daemon service then needs to be restarted:
# Reload the configuration.
sudo systemctl daemon-reload
# Restart the daemon.
sudo systemctl restart guix-daemon.service

In order to accept substitutes from the Guix HPC substitute server, its key must be authorized:
# Download the server key.
wget https://guix.bordeaux.inria.fr/signing-key.pub
# Add the key to Guix configuration.
sudo guix archive --authorize < signing-key.pub
# Optionally remove the key file.
rm signing-key.pub
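To verify that the new server actually provides substitutes, guix weather can query substitute availability for a package (hello is just an example package name):

```shell
# Report substitute availability for "hello" from the Guix HPC server.
guix weather --substitute-urls=https://guix.bordeaux.inria.fr hello
```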
Run for instance the following command, which instantiates a dynamic
environment containing the hello-mpi package defined in the
Guix-Science channel and runs it:
guix shell hello-mpi -- hello-mpi
Due to user namespaces set up, using guix shell with the --container or -C option may fail with an error like:
$ guix shell --container coreutils
guix shell: error: clone: 2114060305: Invalid argument
User namespaces are crucial
for achieving process and resource isolation and are indispensable for containerization.
For security reasons they are disabled by default on certain Debian and Ubuntu distributions,
so that non-root users are not allowed to create or handle user namespaces; setting
user.max_user_namespaces to 0 causes guix shell --container to fail.
To enable user namespaces temporarily, run:

sudo sysctl user.max_user_namespaces=1024

For the change to be persistent after a reboot:
echo "user.max_user_namespaces = 1024" | sudo tee /etc/sysctl.d/local.conf
sudo service procps force-reload
sudo sysctl --system
In the settings above, the parameter is set to 1024; any non-zero value would work.
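To inspect the current value before changing it, the kernel parameter can be queried directly:

```shell
# Either via sysctl...
sysctl user.max_user_namespaces
# ...or by reading the corresponding procfs entry.
cat /proc/sys/user/max_user_namespaces
```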
An alternative method for enabling user namespaces, specific to Debian and Ubuntu distributions,
is to set kernel.unprivileged_userns_clone=1.