Notes on MPI

Table of contents

MCA parameters

Open MPI uses the MCA (Modular Component Architecture) as a framework for configuring the run-time parameters of an MPI application.

MCA parameters can be adjusted with the flag:

$ mpirun --mca <name> <value> ...

There are two ways to inspect which MCA parameters are in use:

  1. ompi_info --all displays all the MCA parameters that are available a priori.
  2. The mpi_show_mca_params MCA parameter can be set to all, default, file, api or enviro to display the selected values at run time. Sometimes they will just show as key= (default), which is not useful.
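For example (this assumes an Open MPI installation; ./app stands in for a hypothetical MPI binary):

```shell
# List everything known a priori, or drill into one framework/component.
ompi_info --all
ompi_info --param btl tcp --level 9   # tcp BTL parameters, up to verbosity level 9

# Print the MCA parameters actually used by a run, with their sources.
mpirun --mca mpi_show_mca_params all ./app
```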

Network drivers

There are three PMLs (PML: Point-to-point Message Layer) that Open MPI can use to drive the network: ob1, cm and ucx. The PML is selected with --mca pml <ob1,cm,ucx>.

  • ucx manages the devices on its own. It should be used for InfiniBand networks. UCX can be further configured with UCX-specific environment variables, for example mpirun --mca pml ucx -x UCX_LOG_LEVEL=debug ....
  • ob1 is the multi-device, multi-rail engine and the “default” choice. It is selected with --mca pml ob1. It uses different backends for the Byte Transfer Layer (BTL), which can be chosen with --mca btl <name>, such as:
    • tcp
    • self
    • sm shared memory
    • ofi Libfabric, alternate way
    • uct UCX, alternate way
  • cm can interface with “matching” network cards whose hardware is MPI-enabled. It uses MTLs (Matching Transport Layer, not BTLs), which can be selected with --mca mtl <name>:
    • psm2 Single-threaded Omni-Path
    • ofi Libfabric
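A sketch of how the three PMLs are selected on the command line (./app is a hypothetical MPI binary; which BTLs/MTLs are actually available depends on how Open MPI was built):

```shell
# ob1 over TCP; keep self (loopback) and sm (shared memory) enabled as well.
mpirun --mca pml ob1 --mca btl tcp,sm,self ./app

# ucx for InfiniBand, exporting extra UCX debugging into the environment.
mpirun --mca pml ucx -x UCX_LOG_LEVEL=debug ./app

# cm with a matching-capable transport, e.g. Omni-Path via psm2.
mpirun --mca pml cm --mca mtl psm2 ./app
```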

In short: ucx provides the performance for InfiniBand, cm can be used for specific matching-capable hardware, and ob1 is the fallback for low-performance TCP or local devices. Libfabric can be used through either cm or ob1.
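The two Libfabric routes can be sketched as follows (hedged: component availability depends on the Open MPI build, and ./app is a hypothetical binary):

```shell
# Libfabric through the cm PML (MTL side).
mpirun --mca pml cm --mca mtl ofi ./app

# Libfabric through the ob1 PML (BTL side).
mpirun --mca pml ob1 --mca btl ofi,self ./app
```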

TODO: Discuss MCA transports for CUDA