The University of Tennessee

Grid Engine Tricks

Parallel Environment ranges

When requesting a parallel environment (e.g. to run an MPI job), you can give a range of slot counts that would work for your job. The scheduler then starts the job on the largest number of slots within that range that allows it to execute immediately. For example, if my job can run on between 10 and 100 CPUs (and I want it to execute as soon as possible), then I could use a qsub request of "-pe openmpi* 10-100". This is usually only recommended if your job will run for a relatively short period of time. If your job will run longer, it is often best to request a fixed slot count: the job may wait longer to start, but it will finish sooner by making use of a larger number of CPUs. Your parallel executable must also be able to adapt to whatever number of slots it is actually given.
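As a minimal sketch of how a job script might adapt to a slot range: Grid Engine sets the environment variable NSLOTS to the number of slots actually granted, so the script can pass that value to the MPI launcher. The program name ./my_mpi_app and the MPIRUN override are assumptions made for illustration only.

```shell
#!/bin/sh
#$ -pe openmpi* 10-100

# Launch an MPI program on however many slots the scheduler granted.
# NSLOTS is set by Grid Engine inside a job; default to 1 elsewhere.
# MPIRUN is a hypothetical override (e.g. MPIRUN=echo) for a dry run.
run_mpi() {
    ${MPIRUN:-mpirun} -np "${NSLOTS:-1}" "$@"
}
```

Inside the job, a call such as "run_mpi ./my_mpi_app" would then start between 10 and 100 processes, depending on what was free at scheduling time. When the range is given on the qsub command line instead of in the script, quoting the pattern ('openmpi*') keeps the shell from expanding the asterisk.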

Array jobs

Array jobs are the preferred way to submit large numbers of jobs at once. The idea is that you create one batch file that describes a number of jobs which are similar enough that they can be parameterized by a single integer value. The Grid Engine then takes that job definition and makes x copies of the job, each with a different task ID number. This technique is particularly well suited to data-partitioned tasks:

1. Break your input data into x files with names containing sequential integers: for example "inputfile-#.dat".
2. Submit the jobs with the option "-t 1-x" to create array job tasks.
3. The job executable accesses the environment variable $SGE_TASK_ID in order to determine which input file to use.
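The first step above can be sketched as follows, assuming the input is a line-oriented text file that may be cut at any line boundary. The function name split_input and its arguments are hypothetical; it writes contiguous chunks named prefix-1.dat, prefix-2.dat, and so on.

```shell
#!/bin/sh
# split_input FILE CHUNKS PREFIX
# Cut FILE into at most CHUNKS pieces of consecutive lines,
# named PREFIX-1.dat, PREFIX-2.dat, ...
split_input() {
    [ "$2" -gt 0 ] || return 1
    total=$(wc -l < "$1")
    per=$(( (total + $2 - 1) / $2 ))   # lines per chunk, rounded up
    awk -v per="$per" -v prefix="$3" '
        {
            idx = int((NR - 1) / per) + 1
            out = prefix "-" idx ".dat"
            if (out != prev) {
                if (prev != "") close(prev)   # keep only one file open
                prev = out
            }
            print > out
        }
    ' "$1"
}
```

For example, "split_input all.dat 100 inputfile" would produce files for use with "-t 1-100". Because the chunk size is rounded up, fewer files than requested can result when the line count does not divide evenly, so it is worth counting the output files before choosing the task range.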

Here is an example job submit file using array jobs:

#$ -t 1-100
my_application inputfile-$SGE_TASK_ID.dat

Memory requests

If your jobs require more than the standard RAM allocation of 2GB per batch slot, then you must request the amount of RAM needed using a qsub request such as "-l mem=5G" (here, 5GB).

Job Holds

If your computing workflow includes dependencies, you may want to consider using job holds to help automate the process. This often occurs when a task is data-parallel: you use something like array jobs to analyse the data, and once all the analysis jobs are done you want to run a final job to aggregate the results. All the array jobs must be finished before the aggregation job is executed. You can do this by placing a hold on the aggregation job:

1. First submit the data analysis jobs, all using the name "analysis" (qsub option "-N analysis").
2. Then submit the aggregation job using the additional qsub option "-hold_jid analysis".

The "-hold_jid" option also accepts job ID numbers and wildcard patterns that match job names.
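The two-step workflow above might be scripted roughly like this. It is only a sketch: analysis.sh and aggregate.sh are hypothetical job scripts, and the QSUB variable is an assumption added so the commands can be inspected without submitting anything.

```shell
#!/bin/sh
# submit_workflow NTASKS
# Submit an array job named "analysis", then an aggregation job
# that is held until every job named "analysis" has finished.
# QSUB is a hypothetical override (e.g. QSUB=echo) for a dry run.
submit_workflow() {
    ${QSUB:-qsub} -N analysis -t "1-$1" analysis.sh &&
    ${QSUB:-qsub} -N aggregate -hold_jid analysis aggregate.sh
}
```

Calling "submit_workflow 100" would submit 100 analysis tasks followed by the held aggregation job. The second submission is skipped if the first one fails.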

Interactive jobs

You may want to execute a short interactive task (such as a compile) through the batch system if it is very resource intensive (requires a lot of RAM or processor threads). This will give you dedicated use of the RAM or processors for this task so that it may finish sooner than if you simply ran it on the command line. You can do this using the "qlogin" command.

If you type "qlogin", it will immediately create a batch job on a compute node and redirect your input and output streams to that node. This has the effect of putting you on the compute node that was allocated to you. You may then type commands as normal (such as "make"). By default, qlogin allocates one CPU and starts your default shell. However, you can give it a command to run and/or many of the same options that are accepted by "qsub". E.g.:

$ qlogin -cwd -l dedicated=4 make -j4

If your current working directory was a software source tree with a makefile, this would have the effect of executing the compile using 4 CPUs.

E-mail notifications

You can get email notifications sent to any address when your jobs change state. You set the email address using "-M user@domain" syntax and then specify which events you wish to be notified of by using the "-m" option to qsub. The specifics of using these options (and all options mentioned in this document) are available on the qsub manual page: "man qsub".
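As a small illustration, the two options can be combined in one submission. In the sketch below, the function name submit_with_mail and the QSUB override are hypothetical; the "-m" letters b, e, and a request mail at the beginning, end, and abort of the job respectively.

```shell
#!/bin/sh
# submit_with_mail ADDRESS JOBSCRIPT
# Submit JOBSCRIPT and ask for mail to ADDRESS when the job
# begins (b), ends (e), or is aborted (a).
# QSUB is a hypothetical override (e.g. QSUB=echo) for a dry run.
submit_with_mail() {
    ${QSUB:-qsub} -M "$1" -m bea "$2"
}
```

For example, "submit_with_mail user@domain myjob.sh" is equivalent to typing "qsub -M user@domain -m bea myjob.sh".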
