System Overview

Introduction

Four nodes within the raad2 infrastructure contain graphics processing units (GPUs) that can be used for general-purpose computing. Logically, these four nodes form a separate cluster with its own login node, raad2-gfx. Physically, however, they are connected to the same infrastructure as raad2: an FDR InfiniBand network links them to the same DDN storage system that the non-GPU nodes use. Each GPU node is equipped with two NVIDIA V100 GPUs and Intel Xeon Skylake processors. Users who want to accelerate AI, HPC, or data science applications can benefit substantially from this resource, and the most commonly used GPU software packages are already installed on the system.

GPU	2 NVIDIA Tesla V100 per node
GPU Nodes	gfx[1-4]
Memory	192 GB per node
NVIDIA Tensor Cores	640 per GPU
NVIDIA CUDA Cores	5,120 per GPU
CPU	Intel Xeon Gold 6140
CPU Base Frequency	2.30 GHz
CPU Max Turbo Frequency	3.70 GHz
Sockets	2 per node
Cores Per Socket	18
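To put the per-node figures in cluster-wide terms, the short Python sketch below multiplies them out over the four nodes gfx[1-4]. It is plain arithmetic on the numbers in the table; the class and function names are illustrative only, and the CPU figures count physical cores (hyper-threading, if enabled, is not accounted for).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuNodeSpec:
    """Per-node hardware figures copied from the table above."""
    gpus: int = 2
    cuda_cores_per_gpu: int = 5120
    tensor_cores_per_gpu: int = 640
    sockets: int = 2
    cores_per_socket: int = 18
    memory_gb: int = 192


def cluster_totals(spec: GpuNodeSpec, nodes: int = 4) -> dict:
    """Aggregate capacity across `nodes` identical GPU nodes (gfx[1-4] by default)."""
    cpu_cores_per_node = spec.sockets * spec.cores_per_socket
    return {
        "cpu_cores_per_node": cpu_cores_per_node,
        "cpu_cores_total": nodes * cpu_cores_per_node,
        "gpus_total": nodes * spec.gpus,
        "cuda_cores_total": nodes * spec.gpus * spec.cuda_cores_per_gpu,
        "tensor_cores_total": nodes * spec.gpus * spec.tensor_cores_per_gpu,
        "memory_gb_total": nodes * spec.memory_gb,
    }


if __name__ == "__main__":
    # Print each aggregate with thousands separators for readability.
    for name, value in cluster_totals(GpuNodeSpec()).items():
        print(f"{name}: {value:,}")
```

With the defaults it reports 36 CPU cores per node and, across the four nodes, 144 CPU cores, 8 GPUs, 40,960 CUDA cores, 5,120 Tensor Cores and 768 GB of memory.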

Job Scheduler

The GPU cluster uses Slurm as its job scheduler.

Workload Manager	Slurm 20.11.7
Queue (partition)	gpu
Local SSD Storage	/tmp
Per-User GPU Limit	1 GPU per job
Per-User CPU Limit	18 CPUs per job
Per-User Memory Limit	92 GB per job
Default Job Walltime	1 hour
Maximum Job Walltime	24 hours
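The limits above map directly onto `sbatch` directives. The following Python sketch, a hypothetical helper rather than an official raad2 tool, checks a request against the per-job limits and renders a batch script for the `gpu` partition. The names `GpuJobRequest` and `to_sbatch` and the `python train.py` payload are made up for illustration, and the GRES string `gpu:1` assumes the GPUs are registered under the plain name `gpu`; confirm the exact form with the raad2 support team (or via `scontrol show node gfx1`) before relying on it.

```python
from dataclasses import dataclass

# Per-job limits taken from the scheduler table above.
MAX_GPUS = 1
MAX_CPUS = 18
MAX_MEM_GB = 92
MAX_HOURS = 24


@dataclass
class GpuJobRequest:
    job_name: str
    command: str      # hypothetical payload, e.g. "python train.py"
    gpus: int = 1
    cpus: int = 4
    mem_gb: int = 32
    hours: int = 1    # same as the 1-hour default walltime

    def validate(self) -> None:
        """Raise ValueError if any resource is below 1 or above its per-job limit."""
        checks = [
            ("gpus", self.gpus, MAX_GPUS),
            ("cpus", self.cpus, MAX_CPUS),
            ("mem_gb", self.mem_gb, MAX_MEM_GB),
            ("hours", self.hours, MAX_HOURS),
        ]
        for name, value, limit in checks:
            if not 1 <= value <= limit:
                raise ValueError(f"{name}={value} is outside 1..{limit}")

    def to_sbatch(self) -> str:
        """Validate the request, then render a batch script for the 'gpu' partition."""
        self.validate()
        lines = [
            "#!/bin/bash",
            f"#SBATCH --job-name={self.job_name}",
            "#SBATCH --partition=gpu",
            # Assumption: the GPU GRES is named plain "gpu" on this cluster.
            f"#SBATCH --gres=gpu:{self.gpus}",
            f"#SBATCH --cpus-per-task={self.cpus}",
            f"#SBATCH --mem={self.mem_gb}G",
            f"#SBATCH --time={self.hours:02d}:00:00",
            self.command,
        ]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    job = GpuJobRequest(job_name="demo", command="python train.py",
                        cpus=8, mem_gb=64, hours=4)
    print(job.to_sbatch(), end="")
```

Writing the returned string to a file such as `job.sh` and running `sbatch job.sh` on raad2-gfx would submit it. Requests above 1 GPU, 18 CPUs, 92 GB or 24 hours are rejected locally before anything is submitted.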