Benchmark Cluster Access
How to Access and Use the Benchmark Cluster
Login
To log in, please connect to our gateway benchmarkcenter.megware.de using ssh and
the login name which you received via email.
ssh <login name>@benchmarkcenter.megware.de
On the frontend and all nodes we have installed the batch system SLURM and the software environment management tool Modules.
As the frontend is not updated as often as the nodes and may not have all necessary libraries installed, please use an interactive job (see below) to compile your application on a target node.
To use different compilers, MPI implementations, or math libraries, use the "module" tool (see below).
SLURM
The SLURM batch system manages all nodes of the cluster. A batch system controls access to resources (compute nodes, CPUs, memory, and GPUs) and the execution of applications on those resources. A "job" is either interactive or a simple shell script which prepares and starts an application. SLURM reserves compute nodes for these jobs and only allows jobs to run when the requested resources are available. At the MEGWARE benchmark center we have "partitions" for certain kinds of hardware. Typically, all nodes with a certain CPU are within the same CPU_xxx partition. The same is true for certain GPUs; they can be found in the corresponding GPU_xxx partition.
The command
"sinfo"
shows the resources currently available to you. You can restrict the output to one partition using the "-p" switch: "sinfo -p GPU_A100" shows only the servers in that partition.
"squeue"
shows all running and pending jobs.
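To narrow the output down to your own jobs, squeue's standard "-u" switch can be used; a small convenience sketch:

```shell
# Show only your own jobs, with their state (R = running, PD = pending)
squeue -u $USER
```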
Interactive Jobs
If you want to run an interactive job on a node, you can simply use our script
/cluster/tools/slurm/interactive.sh <partition>
to login to an available node in the given partition.
Alternatively, you can start an interactive job with "srun":
For example, you can reserve an NVIDIA A100 node for 2 hours within the GPU_A100 partition using the following command:
srun --exclusive -p GPU_A100 --nodes=1 -t 2:00:00 --pty bash
To stop a job, type
exit
to exit the node and
exit
again, to end the job.
Note: Please do not use "srun" to start MPI jobs, as we did not configure SLURM for this. Use the MPI implementation's "mpirun" command instead.
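As a sketch, an MPI run inside an interactive allocation might look like this (the module names follow the examples below; "my_mpi_app" is a placeholder for your own binary):

```shell
# Inside an interactive job on a compute node:
module add intel-studio-2020 mpi/intelmpi/latest
mpicc -O2 -o my_mpi_app my_mpi_app.c   # compile on the target node
mpirun -np 8 ./my_mpi_app              # start via mpirun, not srun
```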
Batch Jobs
To submit several jobs at once, one can use "sbatch". Depending on the state of the cluster and the number of nodes requested, these jobs will run in parallel or one after another.
A simple sbatch script for IntelMPI looks like this:
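A minimal sketch of such a script, assuming the IntelMPI module shown below and a placeholder application name:

```shell
#!/bin/bash
# Load compiler and MPI environment (module names as used elsewhere on this page)
module add intel-studio-2020 mpi/intelmpi/latest
# Start the application; node and process counts are taken from the sbatch command line
mpirun ./my_mpi_app
```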
Save this script as run.sbatch and submit it with:
sbatch -N <NumberOfNodes> -n <TotalNumberOfProcesses> -p <Partition> ./run.sbatch
A file called "slurm-<slurm-job-id>.log" will be created in the directory you submitted the job from. The output of your job (standard output as well as standard error) will be logged in this file.
Modules
The modules environment system allows you to use different compilers, libraries, and MPI implementations, and multiple versions of each, on the same node. You can list the available software on the cluster with
module avail
This software is available on all nodes.
To add software to your environment type
module add <software module name>
So if you want to add Intel Compilers and Intel MPI type:
module add intel-studio-2020 mpi/intelmpi/latest
You can list loaded modules with
module list
You can remove modules with
module rm <software module name>
Installing and Using Your Own Python Packages
Create a virtual Python environment in your home directory:
python -m venv ~/MyPythonEnv/
source ~/MyPythonEnv/bin/activate
pip install pip --upgrade # update pip itself for up-to-date packages
pip install <python package>
Whenever you want to use the newly installed Python packages, make sure to activate the environment first with "source ~/MyPythonEnv/bin/activate".
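A quick way to check that the environment is active is to look at the interpreter's prefix; a short sketch (~/MyPythonEnv is just the example path from above):

```shell
# Create, activate, and sanity-check a virtual environment
python -m venv ~/MyPythonEnv
source ~/MyPythonEnv/bin/activate
python -c "import sys; print(sys.prefix)"   # should point inside ~/MyPythonEnv
deactivate                                  # leave the environment again
```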
Installing new Software with Dependencies
While we provide some software via the modules system, we recommend Spack ( https://spack.io/about/ ) if you want to install additional software yourself.
Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. With Spack, you can build a package with multiple versions, configurations, platforms, and compilers, and all of these builds can coexist on the same machine.
To install Spack, run the following commands in your home directory:
git clone https://github.com/spack/spack.git
source ~/spack/share/spack/setup-env.sh
Let Spack find preinstalled compilers:
module add intel-studio-2020 comp/gcc/11.1.0
spack compiler find
spack compilers
Now you can install additional software using
spack install hdf5
or, to compile with GCC 11.1.0:
spack install hdf5%gcc@11.1.0
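Before a long installation it can help to preview what Spack would build; "spack spec" prints the resolved versions, compiler, and dependencies without installing anything:

```shell
# Show the concretized spec first
spack spec hdf5%gcc@11.1.0
# Then install once the spec looks right
spack install hdf5%gcc@11.1.0
```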
Please log in to your target node before installing software.
Working with multiple NVidia GPGPUs
Some of our benchmark nodes are equipped with multiple GPGPUs (of different kinds). To select a certain GPGPU first run
nvidia-smi
This displays all GPGPUs installed in the system. To benchmark only certain GPGPUs, use the environment variable CUDA_VISIBLE_DEVICES and set it to the ID(s) displayed by nvidia-smi:
export CUDA_VISIBLE_DEVICES=0,1; /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
export CUDA_VISIBLE_DEVICES=2; /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
Further Help
In case of problems don't hesitate to contact us. Please also inform us when you've finished your tests.