Affiliation: OMNI Cluster
Cost: Free of charge
Difficulty: Medium

Graphics Processing Units (GPUs) are specialized processors that can perform certain operations much faster than CPUs thanks to their massively parallel architecture. They therefore belong to the broader class of accelerators (of which GPUs are only one subgroup). The OMNI cluster has 10 nodes with a total of 24 NVIDIA Tesla V100 GPUs.

For a program to be able to use the GPUs, the relevant parts of the code must be adapted using appropriate programming models and libraries (e.g. CUDA, OpenACC, OpenCL, OpenMP).

This page describes how to request GPU nodes for your jobs, which software libraries are available for programming GPUs, and how to load these libraries.

Requesting GPU nodes on the OMNI cluster

To request a node with GPUs, the queue (partition) gpu must be specified in the job script. In addition, the exact number of GPUs required must be specified via the --gres=gpu:<n> option, where <n> is the number of GPUs.

The 24 GPUs on the OMNI cluster are distributed across the 10 GPU nodes as follows:

Node                Number of GPUs
gpu-node[001-004]   4
gpu-node[005-008]   1
gpu-node[009-010]   2

Example job script header:

#!/bin/bash
#SBATCH --time=0:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
...

This requests one node from the gpu partition with 2 GPUs allocated to the job. Vary --gres=gpu:[1|2|4] depending on how many GPUs your use case needs. The Slurm documentation describes a number of further parameters for controlling GPU usage.
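Inside a running job you can verify which GPUs Slurm has actually assigned to you. The sketch below assumes nvidia-smi is available on the GPU nodes and relies on the CUDA_VISIBLE_DEVICES variable that Slurm sets for GPU allocations:

```shell
#!/bin/bash
#SBATCH --time=0:10:00
#SBATCH --nodes=1
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2

# Print the GPU indices Slurm made visible to this job
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"

# List the allocated GPUs by model name and UUID
nvidia-smi -L
```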

GPU programming on the OMNI cluster

Most GPU-compatible modules are not immediately available on the OMNI cluster (and are, for example, not visible with module avail) because they are grouped in a separate software stack, the GPU modules. This is necessary for compatibility reasons. To switch to the GPU stack, enter the following command:

module load GpuModules

When the GPU stack is loaded, you can list all modules available in the GPU stack as usual with module avail.

To switch back to the normal software stack, enter:

module unload GpuModules

Please remember that you must also include these module commands in your job scripts.
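Putting the pieces together, a complete GPU job script might look like the following sketch. The module name cuda and the program ./my_gpu_program are placeholders for whatever your workflow uses; check module avail after loading GpuModules for the exact module names:

```shell
#!/bin/bash
#SBATCH --time=0:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2

# Switch to the GPU software stack (must also be done inside job scripts)
module load GpuModules

# Load a GPU-enabled toolchain from the GPU stack (placeholder name)
module load cuda

# Run the (hypothetical) GPU program
srun ./my_gpu_program
```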

GPU sharding

To enable a more efficient use of GPU resources on the OMNI cluster, it is now possible to use GPU sharding to run multiple Slurm jobs on one GPU. There are 64 shards available per GPU node, i.e. up to 64 jobs can use the GPUs simultaneously on each GPU node. Shards can be requested via the Slurm parameter --gres=shard:<n>, where <n> is the number of shards. Please only request as many shards as you actually need for your calculation.

Example job script header for 2 shards:

#!/bin/bash
#SBATCH --time=0:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --partition=gpu
#SBATCH --gres=shard:2
...

Please note that the number of available shards is 64 per GPU node, regardless of how many GPUs the node has. The GPU of a node with only one GPU is therefore divided into 64 shards, while on nodes with 2 GPUs each GPU is divided into only 32 shards. A single shard thus corresponds to a larger share of a GPU, and correspondingly more performance, on nodes where fewer shards are mapped to each GPU. If you want to make sure that your job runs on a node with a certain number of GPUs, you can exclude the other nodes with --exclude.

Example job script header for nodes with one GPU:

#!/bin/bash
#SBATCH --time=0:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --partition=gpu
#SBATCH --gres=shard:2
#SBATCH --exclude=gpu-node[001-004],gpu-node[009-010]
...
The --exclude lists for the three node types are:

  • 1 GPU: --exclude=gpu-node[001-004],gpu-node[009-010]
  • 2 GPUs: --exclude=gpu-node[001-008]
  • 4 GPUs: --exclude=gpu-node[005-010]

Further notes

  • If an entire GPU is allocated with --gres=gpu:1 on a node with two GPUs, this proportionally consumes 32 of the node's 64 shards.
  • With 64 shards and 256 GB of memory per node, slightly less than 4 GB of memory is available per shard.
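The two notes above follow from simple arithmetic; as a sketch, using the per-node figures given above (64 shards, 256 GB of memory, 2 GPUs on gpu-node[009-010]):

```shell
#!/bin/sh
# Per-node figures from the notes above
total_shards=64
mem_gb=256

# Shards consumed per whole GPU on a 2-GPU node: 64 / 2 = 32
gpus=2
shards_per_gpu=$((total_shards / gpus))
echo "shards per GPU on a 2-GPU node: $shards_per_gpu"

# Nominal memory per shard: 256 GB / 64 = 4 GB (slightly less in
# practice, since the operating system also needs memory)
mem_per_shard_gb=$((mem_gb / total_shards))
echo "nominal memory per shard: ${mem_per_shard_gb} GB"
```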