
Using the Tesla GPGPU Compute Nodes

Each node of the GPGPU compute cluster has a Tesla M2090 GPGPU card along with 16 CPU cores (Intel Xeon E5-2670) and 32 GB of RAM; the cluster consists of 48 nodes. For testing and development of GPU-enabled code, you must compile and run your code from within an interactive session on one of the GPU-enabled compute nodes, obtained with the command:

$ qlogin -l gpgpu=1

Once you have working code, you may access the nodes through the Grid Engine on Newton by adding the following line to your job file:

#$ -l gpgpu=1
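
For example, a minimal single-GPU batch script might look like the following sketch (the job name and executable name are placeholders, not part of the site documentation):

#!/bin/bash
# Request one GPU; the job name and executable below are placeholders.
#$ -N gpu_job
#$ -cwd
#$ -l gpgpu=1

module load cuda
./executable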

Note that this resource request works in the same manner as memory and dedicated core requests, in that it is multiplied by the number of parallel environment tasks that you request. So, for example, if you were to combine it with

#$ -pe openmpi* 8

you are requesting 8 MPI tasks, each of which will have dedicated access to a GPU.

The CUDA toolkit and SDK are installed at /data/apps/cuda/ and can be loaded by executing "module load cuda". For general information about GPU programming with CUDA, see NVIDIA's Introduction to CUDA in C. Once you have written some CUDA code you would like to compile, you can do so by issuing the following commands:

module load cuda
nvcc -o executable sourcefile.cu
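
If you simply want to verify the toolchain, a minimal sourcefile.cu along the following lines (a small vector addition written for illustration, not taken from the site documentation) will compile with the nvcc command above:

// Minimal CUDA sketch: add two vectors on the GPU and check one result on the host.
#include <cstdio>

__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2 * i; }

    float *da, *db, *dc;
    cudaMalloc((void **)&da, n * sizeof(float));
    cudaMalloc((void **)&db, n * sizeof(float));
    cudaMalloc((void **)&dc, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch one thread per element, 256 threads per block.
    add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("hc[10] = %f (expected 30.0)\n", hc[10]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}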

You will then need to use qlogin to log in to one of the GPGPU-enabled machines and run your job interactively, or use qsub to submit a batch job to the GPGPU cluster.

Additionally, the OpenACC interface to the GPU is available for programs compiled with the PGI compilers, which are available via "module load pgi". Further details regarding the OpenACC API are available from the OpenACC website. You can compile OpenACC code for use on the GPU via

module load pgi
pgcc -o executable sourcefile.c -acc -ta=nvidia
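
As a quick compiler check, a minimal sourcefile.c along these lines (a simple vector addition written for illustration, not taken from the site documentation) can be built with the pgcc command above:

/* Minimal OpenACC sketch: the parallel loop directive asks the compiler
   to offload the loop to the GPU, copying a and b in and c back out. */
#include <stdio.h>

int main(void)
{
    const int n = 1024;
    float a[1024], b[1024], c[1024];
    int i;

    for (i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }

    #pragma acc parallel loop copyin(a, b) copyout(c)
    for (i = 0; i < n; ++i)
        c[i] = a[i] + b[i];

    printf("c[10] = %f (expected 30.0)\n", c[10]);
    return 0;
}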

If you instead wish to compile OpenACC code to run on the host processor, change -ta=nvidia to -ta=host in the above command line. As with CUDA code, OpenACC code compiled with -ta=nvidia must be run on the GPU cluster, whereas code compiled with -ta=host may also be run on any of the non-GPU clusters.

The usage policy for the nodes in the Tesla cluster is identical to that for other nodes in the Newton system, meaning that your usage limits on the GPU cluster are determined only by the cluster-wide limits for CPU slots and job run time.