
Using Dedicated Nodes

By default, jobs are scheduled onto the Newton compute nodes on a CPU-core basis. This means, for example, that a job may use a single CPU core on a compute node that has multiple cores available; in that case, the Grid Engine may schedule other jobs onto the remaining cores of that node. This system offers flexibility by allowing a user's job to reserve only as many CPU cores as it needs, but it has the disadvantage that different users' jobs can potentially interfere with each other: a job could accidentally launch too many processes and improperly use CPU cores that were reserved for other users' jobs.

To resolve this potential problem, the Newton system allows a job to request all CPU cores on a given node. To do this, a user must first decide how many cores per node the job will use and then modify the job file to request the "cores_per_node" Grid Engine complex. When a job requests cores_per_node=N, it will run only on nodes that have N cores, and all N cores will be reserved for the job's exclusive use; no other jobs will be allocated to this job's nodes. This method of requesting exclusive node access works for multi-threaded, MPI, and hybrid job types.

Note that this procedure replaces the old method of node allocation using the "dedicated" Grid Engine complex. Any jobs using "-l dedicated=#" should be converted to use this new node allocation method.
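For reference, the same resource request can also be made on the qsub command line rather than inside the job file. A minimal sketch, where the script name myjob.sh is a placeholder:

qsub -q "short*" -l cores_per_node=12 myjob.sh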

Multi-threaded jobs

For multi-threaded applications, a single job can make simultaneous use of all CPU cores on a single compute node. A user decides how many cores the application will need to use and then sets the Grid Engine complex "cores_per_node" equal to this number. The job will then execute only on compute nodes with this number of CPU cores. For example, here is a job file for an OpenMP application using 12 cores:

#$ -N ThreadedJob
#$ -cwd
#$ -q short*
#$ -l cores_per_node=12
echo Running on $CORES_PER_NODE cores
./application

When this job is executed, the system sets two variables:

$CORES_PER_NODE = 12
$OMP_NUM_THREADS = 12

The variable $OMP_NUM_THREADS automatically informs OpenMP applications of the desired number of threads to use. The variable $CORES_PER_NODE can be used by non-OpenMP applications to control the number of threads or processes they launch. Since the job has exclusive access to the node, the application can also query the system to determine the number of available CPU cores without risk of using cores allocated to another job.
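As a minimal sketch of this, assuming a hypothetical application that accepts a thread-count option, the job script could pass the value explicitly:

# Hypothetical option; substitute your application's own thread-count flag
./application --threads=$CORES_PER_NODE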

MPI Jobs

For exclusive-node-access MPI or other network-parallel jobs, a job must request the number of compute nodes that it needs and the number of cores per node on those compute nodes. For example, an MPI job can request 10 compute nodes with 16 CPU cores each for a total of 160 CPU cores (MPI processes). Here is the job file:

#$ -N MPIJob
#$ -cwd
#$ -q short*
#$ -pe openmpi* 10
#$ -l cores_per_node=16
echo Running on $NSLOTS nodes with $CORES_PER_NODE cores each
mpirun application

The line -pe openmpi* 10 requests 10 compute nodes, and the line -l cores_per_node=16 requests that each of those compute nodes provide 16 CPU cores for the job. When the job is executed, the Grid Engine sets the variable $NSLOTS equal to the number of compute nodes and $CORES_PER_NODE equal to the number of CPU cores on each node. The mpirun program automatically multiplies $CORES_PER_NODE by $NSLOTS to get the total number of CPU cores for the job (160 cores) and then runs 160 instances of "application."
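As a minimal sketch, the job script itself can reproduce this arithmetic, for example to log the total process count (assuming the job runs under a POSIX shell such as bash):

# Total MPI processes = nodes * cores per node (10 * 16 = 160 in this example)
TOTAL_CORES=$(($NSLOTS * $CORES_PER_NODE))
echo Running $TOTAL_CORES total MPI processes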

If you know how many total MPI processes your application needs, first decide which type of compute node to use, then divide the total number of MPI processes by that node type's cores_per_node value to get the number of compute nodes to request. The total number of MPI processes must be evenly divisible by cores_per_node. For example, a job needing 144 MPI processes could request any of the following combinations of node count and cores_per_node (a shell sketch of this division check follows the list):

#$ -pe openmpi* 18
#$ -l cores_per_node=8

#$ -pe openmpi* 12
#$ -l cores_per_node=12

#$ -pe openmpi* 9
#$ -l cores_per_node=16

#$ -pe openmpi* 3
#$ -l cores_per_node=48
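The division described above can be checked in the shell before writing the job file. A minimal sketch for the 144-process example, assuming a POSIX shell such as bash:

TOTAL_PROCS=144
NODE_CORES=16
if [ $(($TOTAL_PROCS % $NODE_CORES)) -ne 0 ]; then
  echo "Total MPI processes must be a multiple of cores_per_node" >&2
else
  echo "Request $(($TOTAL_PROCS / $NODE_CORES)) compute nodes with cores_per_node=$NODE_CORES"
fi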

Hybrid MPI+Threaded Jobs

Some parallel applications use shared memory to communicate within each compute node and use MPI to communicate between compute nodes. These jobs typically launch only one MPI process on each compute node and then spawn threads to make use of all of the node's CPU cores. The job file for this type of job is similar to that of a simple MPI job, except that it must use the command "mpirun_hybrid" instead of "mpirun" to launch the application. Here is an example job file for an application that uses 9 MPI processes (one per compute node), with each MPI process using 16 threads (CPU cores):

#$ -N HybridMPIJob
#$ -cwd
#$ -q short*
#$ -pe openmpi* 9
#$ -l cores_per_node=16
echo Running on $NSLOTS nodes with $CORES_PER_NODE cores each
mpirun_hybrid application
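Note that mpirun_hybrid is a Newton-specific wrapper, and its internals are not documented here. As a rough illustration only (not the actual wrapper), the behavior described above, one MPI process per node with each process using the node's cores as threads, resembles an Open MPI invocation such as:

# Illustration only; on Newton use mpirun_hybrid as shown above
mpirun -npernode 1 -x OMP_NUM_THREADS=$CORES_PER_NODE application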

Other Notes

Because this functionality requires the "mpirun" command to be overloaded with a custom command, you may not use "time mpirun" or any other prefix wrapper on the mpirun command line.
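Since mpirun cannot be prefixed with a wrapper, one simple alternative for timing a run is to record timestamps around it in the job script itself. A minimal sketch, assuming a POSIX shell such as bash:

echo "Start: $(date)"
mpirun application
echo "End:   $(date)"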

Currently the tao and dao clusters are reserved exclusively for cores_per_node=8 jobs; single-processor jobs cannot run on these clusters. All other clusters can run either dedicated-node or single-CPU jobs. The available cores_per_node values for each cluster are as follows:

Cluster   cores_per_node   # of nodes   Single CPU jobs
alpha     8                16           yes
tao       8                84           no
dao       8                36           no
phi       12               72           yes
psi       12               20           yes
chi       48               36           yes
rho       16               36           yes
sigma     24               108          yes
