
HPC system information

The cluster spans nine water-cooled double racks (cabinets) in the New Data Center on the Hölderlinstraße campus. Some details of the cluster hardware are listed below.

Nodes

The cluster is divided into nodes. Each node has several cores (CPUs) and its own local working memory. Access is via four login nodes (hpc-login01 to hpc-login04). Hard disk storage is centralized; the individual nodes have no local hard disks.

Compute nodes

The cluster has 439 regular compute nodes, named hpc-node001 to hpc-node439. Each of these nodes has 64 cores and 256 gigabytes of RAM. An additional eight compute nodes, named fat-node001 to fat-node008, are each equipped with 512 gigabytes of RAM.

All compute nodes share the same hardware architecture: each node contains two AMD EPYC 7452 processors with 32 cores each, clocked at 2.35-3.35 GHz. Every core has its own L1 (32 kB) and L2 (512 kB) cache; the L3 cache is shared among groups of 4 cores (16 MB per group), giving 256 MB per node. The working memory of a node is logically divided into 8 NUMA domains of 32 GB each (8 × 32 GB = 256 GB). Simultaneous multithreading (the AMD equivalent of Intel's "hyperthreading") is deactivated.

The compute nodes are divided into several partitions/queues in order to serve jobs with different requirements (number of nodes, runtime). The section on setting up jobs contains detailed information on the queues and on making use of this architecture.
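Whether a job exploits this layout depends on where its threads end up. The following C/OpenMP program is a minimal sketch (our illustration, not part of the cluster software) that prints which core each thread runs on; the assumption that core IDs map onto the 8 NUMA domains in consecutive blocks of 8 should be verified on a node itself, e.g. with numactl --hardware.

/* topology_check.c — print which core each OpenMP thread runs on.
 * Build: gcc -fopenmp topology_check.c -o topology_check
 * Run on a full node, e.g. with OMP_NUM_THREADS=64.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>   /* sched_getcpu() */
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int core = sched_getcpu();      /* physical core; SMT is off */
        #pragma omp critical
        printf("thread %2d on core %2d (NUMA domain %d, assuming blocks of 8)\n",
               tid, core, core / 8);    /* 64 cores / 8 domains = 8 per domain */
    }
    return 0;
}

If consecutive threads scatter across domains, pinning them (for example with OMP_PLACES=cores and OMP_PROC_BIND=close) usually improves memory locality.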

GPU nodes

In addition to the regular compute nodes, the cluster has 10 GPU nodes, each equipped with 1, 2, or 4 NVIDIA Tesla V100 GPUs.

The Tesla V100 supports vectorized arithmetic on double-precision floating-point numbers. The rest of the GPU nodes' configuration (CPUs, RAM, etc.) is identical to that of the regular compute nodes. Further information on the use of GPUs can be found in the GPU usage submenu.
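As a concrete illustration of double-precision work on these GPUs, here is a minimal C sketch using the cuBLAS DAXPY routine (y = a·x + y) in FP64. It assumes a CUDA toolkit with cuBLAS is available on the GPU nodes; the paths in the build line are common defaults rather than cluster-specific settings, and error checking is omitted for brevity.

/* daxpy_v100.c — double-precision AXPY on the GPU via cuBLAS.
 * Build (typical paths, adjust as needed):
 *   gcc daxpy_v100.c -I/usr/local/cuda/include \
 *       -L/usr/local/cuda/lib64 -lcublas -lcudart -o daxpy_v100
 */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int n = 1 << 20;
    double *hx = malloc(n * sizeof(double));
    double *hy = malloc(n * sizeof(double));
    for (int i = 0; i < n; ++i) { hx[i] = 1.0; hy[i] = 2.0; }

    double *dx, *dy;
    cudaMalloc((void **)&dx, n * sizeof(double));
    cudaMalloc((void **)&dy, n * sizeof(double));

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSetVector(n, sizeof(double), hx, 1, dx, 1);
    cublasSetVector(n, sizeof(double), hy, 1, dy, 1);

    const double alpha = 3.0;
    cublasDaxpy(handle, n, &alpha, dx, 1, dy, 1);   /* FP64 on the V100 */

    cublasGetVector(n, sizeof(double), dy, 1, hy, 1);
    printf("y[0] = %f (expected 5.0)\n", hy[0]);    /* 3*1.0 + 2.0 */

    cublasDestroy(handle);
    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}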

SMP nodes

The OMNI cluster has 2 nodes for shared-memory multiprocessing (SMP), sometimes also referred to as fat nodes (not to be confused with the nodes fat-node001 to fat-node008 described above). Each of these two nodes has 4 Intel Xeon Gold 5218 CPUs and 1536 GB of RAM. The nodes are called smp-node001 and smp-node002 and are accessible via the smp queue.

Please note that, due to their Intel processors, these nodes have a different architecture than all other nodes in the cluster. Please contact us if you have any questions or problems with compatibility.
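A common source of such compatibility problems is the vector instruction set a binary was compiled for: the Xeon Gold 5218 supports AVX-512, which the AMD EPYC 7452 does not. The following C sketch (our illustration, not an official tool) reports the widest vector extension a binary was built with; a binary compiled with -march=native on an SMP node may therefore abort with "illegal instruction" on the compute nodes.

/* isa_check.c — report the vector ISA this binary was compiled for.
 * Build on the node type in question, e.g.:
 *   gcc -march=native isa_check.c -o isa_check
 */
#include <stdio.h>

int main(void)
{
#if defined(__AVX512F__)
    /* AVX-512: Xeon Gold 5218 only; EPYC 7452 (Zen 2) lacks it */
    puts("built with AVX-512: will only run on the Intel SMP nodes");
#elif defined(__AVX2__)
    /* AVX2 is supported by both the AMD and the Intel nodes */
    puts("built with AVX2: runs on all node types");
#else
    puts("built without AVX2/AVX-512 vector extensions");
#endif
    return 0;
}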

Storage

The cluster has a number of central file systems (central in the sense that they are accessible from every node). The first, with a total of 10 TB of storage space, contains the users' home directories, which are limited to 100 GB per user. The workspaces are located on a separate, likewise central file system with a total of approx. 1 PB. Individual workspaces are unlimited in size but have a maximum duration of 30 days, which can be extended three times by 30 days each (i.e. up to 120 days in total). The cluster also has a so-called burst buffer, a file system for calculations that have to read or write large amounts of data in a particularly short time. The burst buffer physically consists of solid-state disks (SSDs) and has a total size of 32 TB.

Network

The nodes are connected to each other via a fast InfiniBand interconnect and can be accessed externally via the regular (Ethernet) network.
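Inter-node jobs typically use this interconnect through MPI. The sketch below (assuming an MPI implementation with the mpicc wrapper is installed, which this page does not state) lets each rank report the node it runs on, a quick way to confirm that a job was really spread across several nodes.

/* mpi_hello.c — each MPI rank reports its host name.
 * Build: mpicc mpi_hello.c -o mpi_hello
 * Launching it across nodes depends on the local batch system
 * (not specified on this page).
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char host[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(host, &len);

    printf("rank %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}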

Technical data at a glance:

  • Nodes:
    • Compute nodes: 439
      • CPUs (2 per node): AMD EPYC 7452, 32 cores, 2.35-3.35 GHz, 128 MB cache
      • RAM: 256 GB DDR4, 3200 MHz
    • GPU nodes: 10
      • CPUs (2 per node): AMD EPYC 7452, 32 cores, 2.35-3.35 GHz, 128 MB cache
      • RAM: 256 GB DDR4, 3200 MHz
      • GPUs (1/2/4 per node): NVIDIA Tesla V100, 5120 CUDA cores, 16 GB HBM2 memory
    • SMP nodes: 2
      • CPUs (4 per node): Intel Xeon Gold 5218, 16 cores, 2.3-3.9 GHz, 22 MB cache
      • RAM: 1536 GB DDR4, 2666/2933 MHz
    • Fat nodes: 8
      • CPUs (2 per node): AMD EPYC 7452, 32 cores, 2.35-3.35 GHz, 128 MB cache
      • RAM: 512 GB DDR4, 3200 MHz
    • Login nodes: 4
      • CPUs (2 per node): AMD EPYC 7452, 32 cores, 2.35-3.35 GHz, 128 MB cache
      • RAM: 512 GB DDR4, 3200 MHz
  • Total computing power: approx. 1044 TFlop/s (peak)
  • Storage:
    • Home directories: 8.5 TB
    • Work directories: 1 PB IBM Spectrum Scale
    • Burst buffer: 32 TB SSD storage
  • Network:
    • InfiniBand HDR100
    • Ethernet
  • Power consumption: approx. 240 kilowatts

Operating system

The cluster's operating system is Rocky Linux, Release 8.6 (as of August 2022).

The cluster is managed with the Bright Cluster Manager (version 9.1, as of August 2022).