Using Grid'5000
The purpose of this tutorial is to let you experiment with the Grid'5000 platform, a large-scale and flexible testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing including Cloud, HPC, Big Data and AI.
As an example we will try to run an implementation of Conway’s Game of Life using Message Passing Interface (MPI) for parallelization.
Set up a Grid'5000 account
To request an account on Grid'5000, fill in that form and select the appropriate Group Granting Access, Team and Project. Members of the NumPEx-PC5 Team should use the values documented here.
Then make sure to generate an SSH key pair on your PC and to upload the public key to Grid'5000; this will allow direct connection from your PC using ssh. Detailed explanations are given here.
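If you do not already have a key pair, one can be generated with `ssh-keygen`; the `ed25519` key type and the comment shown here are just a common choice, adjust them to your preference:

```shell
# Generate a new SSH key pair (file name and comment are up to you)
ssh-keygen -t ed25519 -C "my-grid5000-key"

# The public key to upload to Grid'5000 is then found in:
cat ~/.ssh/id_ed25519.pub
```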
Read the documentation
Very extensive documentation is available on the Grid'5000 User Portal. For this tutorial you may start with these two articles:
If you are not familiar with MPI you might also have a look here.
Prepare the work
Connect to one Grid'5000 site
If you applied the correct SSH configuration on your PC (see here), you should be able to connect directly to a given Grid'5000 front-end, let's say for instance Grenoble, with a simple ssh command:
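Assuming the recommended SSH configuration from the Grid'5000 documentation is in place (it defines a `*.g5k` host alias that tunnels through the Grid'5000 access machine), the connection could look like this; replace `grenoble` with any other site name as needed:

```shell
# Connect to the Grenoble front-end
# (assumes the "*.g5k" alias from the recommended ~/.ssh/config)
ssh grenoble.g5k
```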
Build the example application
Let’s first retrieve the original source code by cloning the Github repository:
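As a sketch, the clone command would look as follows; the repository URL below is a placeholder, so substitute the actual GitHub repository of the example:

```shell
# Clone the example application into your home directory on the front-end
# (placeholder URL -- replace <owner> with the actual GitHub account)
git clone https://github.com/<owner>/Game-of-Life.git
```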
Tip
There is a distinct home directory on each Grid'5000 site, so what has been stored in Grenoble will not be available if you connect to Lyon or Nancy.
To generate more verbose output, you might want to uncomment lines 257 to 264 of the file Game-of-Life/mpi/game.c; a subset of the matrix will then be printed at each generation.
The next step is to build the application:
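Assuming the repository ships a Makefile in its mpi subdirectory (as the paths used in this tutorial suggest), the build would be along these lines:

```shell
# Build the MPI version of the application
cd ~/Game-of-Life/mpi
make

# Alternatively, a direct compilation with the MPI compiler wrapper:
# mpicc -O2 -o gameoflife game.c
```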
The resulting executable is available at ~/Game-of-Life/mpi/gameoflife
Run the computation
Request nodes for a computation
Now we will ask the Grid'5000 platform to give us access to one node (comprising multiple CPU cores, 32 in our case) for an interactive session. We also use the walltime option to set an upper limit of 1 hour on our session; after that time the session will be automatically killed.
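With OAR, Grid'5000's resource manager, such an interactive one-node reservation with a one-hour walltime can be requested like this:

```shell
# Request 1 node interactively (-I) for at most 1 hour
oarsub -I -l nodes=1,walltime=1:00:00
```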
Let’s wait until the scheduler decides to serve our request… be patient.
Eventually our request will be picked up from the queue and the scheduler will grant us access to one computation node (dahu-28 in our example):
Launch the computation
And finally we can execute the command to launch the computation:
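A sketch of that command, assuming the executable path from the build step above:

```shell
# Launch the computation on the cores allocated by oarsub
mpirun --mca pml ^ucx -machinefile $OAR_NODEFILE \
    ~/Game-of-Life/mpi/gameoflife -n 3200 -m 3200 -max 100
```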
Where:
- `mpirun` is the command to launch an MPI application on multiple CPUs and cores,
- `--mca pml ^ucx` is the set of options telling Open MPI not to try to use the high-performance interconnect hardware, which avoids a huge amount of warnings being shown,
- `$OAR_NODEFILE` is the list of CPU cores to be used for the computation; this file was generated by the `oarsub` command in the previous section,
- `-n 3200 -m 3200 -max 100` are the parameters for our application, asking for a grid of size 3200*3200 and 100 generations.
You should see printouts of the matrix at each generation, followed by information about the total time spent.
Congratulations, you did it 👋
What’s next?
This very simple exercise should give you the basic idea. There are still a lot of additional topics you should explore:
- Fine tune Open MPI config for performance optimisation,
- Use OAR batch jobs instead of interactive sessions,
- Use OAR options to precisely specify the resources you want, requesting a specific hardware property (e.g. 2 nodes with an SSD and 256 GB or more of RAM) and/or a specific topology (e.g. 16 cores distributed over 4 different CPUs of the same node),
- Automate complete workflows including transport of data, executables and/or source code to and from Grid'5000, before and after a computation,
Caution
Grid'5000 does NOT have any BACKUP service for users’ home directories; it is your responsibility to save what needs to be saved in some place outside Grid'5000.
- Run computations on multiple nodes,
Tip
For this you will need to properly configure the High Performance Interconnect hardware available on the specific nodes assigned to your computation, either InfiniBand or Omni-Path. See the specific subsection in Run MPI On Grid'5000.
- Customize the software environment, add new packages, deploy specific images,
- Make use of GPU acceleration,
- Learn tools for debugging, benchmarking and monitoring
- …