Benchmark Cluster Access
How to Access and Use the Benchmark Cluster
Login
To log in, please connect to our gateway benchmarkcenter.megware.de using ssh and
the login name which you received via email.
ssh <login name>@benchmarkcenter.megware.de
On the frontend and all nodes we have installed the batch system SLURM and the software environment management tool Modules.
As the frontend is not updated as often as the nodes and may not have all necessary libraries installed, please use an interactive job (see below) to compile your application on a target node.
To use different compilers, MPI implementations, or math libraries, use the "module" tool (see below).
SLURM
The SLURM batch system manages all nodes of the cluster. A batch system controls access to resources (compute nodes, CPUs, memory, and GPUs) and the execution of applications on those resources. A "job" is either interactive or a simple shell script which prepares and starts an application. SLURM reserves compute nodes for these jobs and only allows jobs to run when the requested resources are available. At the MEGWARE benchmark center we have "partitions" for certain kinds of hardware. Typically, all nodes with a certain CPU are within the same CPU_xxx partition. The same is true for certain GPUs; they can be found in the corresponding GPU_xxx partition.
The command
"sinfo"
shows the resources currently available to you. You can restrict the output to one partition using the "-p" switch: "sinfo -p GPU_A100" shows only the servers in that partition.
"squeue"
shows all running and pending jobs.
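To narrow the output down to your own jobs, squeue's standard "-u" switch can be used; a small convenience sketch:

```shell
# Show only your own jobs, with their state (R = running, PD = pending)
squeue -u $USER
```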
Interactive Jobs
If you want to run an interactive job on a node, you can simply use our script
/cluster/tools/slurm/interactive.sh <partition>
to login to an available node in the given partition.
Alternatively, you can start an interactive job with "srun":
For example, you can reserve an NVIDIA A100 node for 2 hours within the GPU_A100 partition using the following command:
srun --exclusive -p GPU_A100 --nodes=1 -t 2:00:00 --pty bash
To stop a job, type
exit
to exit the node and
exit
again, to end the job.
Note: Please do not use "srun" to start MPI jobs, as we did not configure SLURM for this. Use the MPI implementation's "mpirun" command instead.
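As a sketch, an MPI run inside an interactive allocation might look like this (the module names follow the examples below; "my_mpi_app" is a placeholder for your own binary):

```shell
# Inside an interactive job on a compute node:
module add intel-studio-2020 mpi/intelmpi/latest
mpicc -O2 -o my_mpi_app my_mpi_app.c   # compile on the target node
mpirun -np 8 ./my_mpi_app              # start via mpirun, not srun
```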
Batch Jobs
To submit several jobs at once, one can use "sbatch". Depending on the state of the cluster and the number of nodes requested, these jobs will run in parallel or one after another.
A simple sbatch script for IntelMPI looks like this:
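A minimal sketch of such a script, assuming the IntelMPI module shown below and a placeholder application name:

```shell
#!/bin/bash
# Load compiler and MPI environment (module names as used elsewhere on this page)
module add intel-studio-2020 mpi/intelmpi/latest
# Start the application; node and process counts are taken from the sbatch command line
mpirun ./my_mpi_app
```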
Save this script as run.sbatch and submit it with:
sbatch -N <NumberOfNodes> -n <TotalNumberOfProcesses> -p <Partition> ./run.sbatch
A file called "slurm-<slurm-job-id>.log" will be created in the directory you submitted the job from. The output of your job (standard output as well as standard error) will be logged in this file.
Modules
The modules environment system allows you to use different compilers, libraries, and MPI implementations, and multiple versions of each, on the same node. You can list the available software on the cluster with
module avail
This software is available on all nodes.
To add software to your environment type
module add <software module name>
So if you want to add Intel Compilers and Intel MPI type:
module add intel-studio-2020 mpi/intelmpi/latest
You can list loaded modules with
module list
You can remove modules with
module rm <software module name>
Installing and Using Your Own Python Packages
Create a virtual Python environment in your home directory:
python -m venv ~/MyPythonEnv/
source ~/MyPythonEnv/bin/activate
pip install pip --upgrade # update pip itself for up-to-date packages
pip install <python package>
Whenever you want to use the newly installed Python packages, make sure to activate the environment first with "source ~/MyPythonEnv/bin/activate".
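A quick way to check that the environment is active is to look at the interpreter's prefix; a short sketch (~/MyPythonEnv is just the example path from above):

```shell
# Create, activate, and sanity-check a virtual environment
python -m venv ~/MyPythonEnv
source ~/MyPythonEnv/bin/activate
python -c "import sys; print(sys.prefix)"   # should point inside ~/MyPythonEnv
deactivate                                  # leave the environment again
```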
Installing new Software with Dependencies
While we provide some software via the modules system, we recommend Spack ( https://spack.io/about/ ) if you want to install additional software yourself.
Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. With Spack, you can build a package with multiple versions, configurations, platforms, and compilers, and all of these builds can coexist on the same machine.
To install Spack, run the following commands in your home directory:
git clone https://github.com/spack/spack.git
source ~/spack/share/spack/setup-env.sh
Let Spack find preinstalled compilers:
module add intel-studio-2020 comp/gcc/11.1.0
spack compiler find
spack compilers
Now you can install additional software using
spack install hdf5
or, to compile with GCC 11.1.0:
spack install hdf5%gcc@11.1.0
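Before a long installation it can help to preview what Spack would build; "spack spec" prints the resolved versions, compiler, and dependencies without installing anything:

```shell
# Show the concretized spec first
spack spec hdf5%gcc@11.1.0
# Then install once the spec looks right
spack install hdf5%gcc@11.1.0
```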
Please log in to your target node before installing software.
Working with multiple NVidia GPGPUs
Some of our benchmark nodes are equipped with multiple GPGPUs (of different kinds). To select a certain GPGPU first run
nvidia-smi
This displays all GPGPUs installed in the system. To benchmark only certain GPGPUs, use the environment variable CUDA_VISIBLE_DEVICES and set it to the ID(s) displayed by nvidia-smi:
export CUDA_VISIBLE_DEVICES=0,1; /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
export CUDA_VISIBLE_DEVICES=2; /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
Further Help
In case of problems don't hesitate to contact us. Please also inform us when you've finished your tests.