Running Jobs

Tivoli LoadLeveler

All jobs on the BG/P, the P690 machines and the cluster are scheduled through the Tivoli LoadLeveler software.

The user guide, LoadLeveler v3.4 Using & Administering (am2ug305.pdf), is available as a local PDF, from IBM, or in an HTML/JSP version. A copy may also be found on the CHPC Unix machine at chpcln:/CHPC/usr/local/doc/QUICKSTART.

By default, LoadLeveler uses the 1 Gigabit Ethernet adapters over TCP/IP rather than the faster InfiniBand adapters with the Voltaire drivers, irrespective of which version of MPI is being used (MVAPICH or MPICH).

A worked example using LoadLeveler command scripts is given in Hello World.

Essential LoadLeveler commands

llclass: Listing the available classes (queues)

llclass

Use this to determine which of the par*_* classes to submit to. Below is a typical result:

Name       MaxJobCPU   MaxProcCPU   Free    Max  Description
           d+hh:mm:ss  d+hh:mm:ss  Slots  Slots
smp_1      undefined   undefined      32     32  32 CPUs (1 nodes), 3 day P690
par16_2d   undefined   undefined     123    576  64 CPUs (16 nodes), 2 day class
par64_1    undefined   undefined     123    576  256 CPUs (64 nodes), 1 hour class
par128_12  undefined   undefined     123    576  512 CPUs (128 nodes), 12 hour class
par128_6   undefined   undefined     123    576  512 CPUs (128 nodes), 6 hour class
par128_3   undefined   undefined     123    576  512 CPUs (128 nodes), 3 hour class
par32_1w   undefined   undefined       0    576  128 CPUs (32 nodes), 1 week class
cspeed     undefined   undefined      32     32  Test ClearSpeed class
compile    undefined   undefined       0     36  4 CPUs (1 node), 1 hour class for compile
UAT        undefined   undefined     123    612  640 CPUs (160 nodes), 12 hour class for UAT
par1_12    undefined   undefined     155    644  4 CPUs (1 nodes), 12 hour class
par2_12    undefined   undefined     123    612  8 CPUs (2 nodes), 12 hour class
par4_12    undefined   undefined     123    612  16 CPUs (4 nodes), 12 hour class
par2_2w    undefined   undefined       0    612  8 CPUs (2 nodes), 2 week class
par160_1   undefined   undefined     123    612  640 CPUs (160 nodes), 1 hour class
par160_2   undefined   undefined     123    612  640 CPUs (160 nodes), 2 hour class
par160_1d  undefined   undefined     123    612  640 CPUs (160 nodes), 1 day class
par160_2d  undefined   undefined     123    612  640 CPUs (160 nodes), 2 day class
"Free Slots" values of the classes "par16_2d", "par64_1", "par128_12", "par128_6", "par128_3", "par32_1w", "compile", "UAT", "par1_12", "par2_12", "par4_12", "par2_2w", "par160_1", "par160_2", "par160_1d", "par160_2d" are constrained by the MAX_STARTERS limit(s).

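When choosing a class, it can help to filter the llclass listing down to the classes that currently have free slots. The following is a minimal sketch, assuming the column order shown in the listing above (name, MaxJobCPU, MaxProcCPU, free slots, max slots, description). The function name free_classes is hypothetical, and the sample rows are copied from the listing above.

```shell
#!/bin/sh
# Print "class free_slots" for every class with at least one free slot.
# Reads llclass output on stdin; header and note lines are skipped because
# their fourth and fifth fields are not plain integers.
free_classes() {
    awk '$4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ && $4 > 0 { print $1, $4 }'
}

# Sample input: three rows taken from the listing above.
sample='smp_1 undefined undefined 32 32 32 CPUs (1 nodes), 3 day P690
par32_1w undefined undefined 0 576 128 CPUs (32 nodes), 1 week class
par1_12 undefined undefined 155 644 4 CPUs (1 nodes), 12 hour class'

result=$(printf '%s\n' "$sample" | free_classes)
printf '%s\n' "$result"
```

On a login node the same function could be fed live data with llclass | free_classes, provided the output there follows the same layout.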
List all available clusters:

llclass -X all

List available classes on Blue Gene/P:

llclass -X bgp

Cluster bgp
Name       MaxJobCPU   MaxProcCPU   Free    Max  Description
           d+hh:mm:ss  d+hh:mm:ss  Slots  Slots
BGPCOMP    undefined   undefined      15     16
BGP        undefined   undefined      15     16
"Free Slots" value of the class "BGPCOMP" is constrained by the MAX_STARTERS limit(s).

llsubmit: Submitting a job

llsubmit <script.ll>

llq and llstatus: Querying the status of the queues

Usage:

llq [ -? ] [ -H ] [ -v ] [ -W ] [ -x [ -d ] ] [ -s ] [ -l ] [ -w ]
[ -X {cluster_list | all} ] [ -j joblist | joblist ]
[ -u userlist ] [ -h hostlist ] [ -c classlist ] [ -R reservationlist ]
[ -f category_list ] [ -r category_list ] [ -b ]

Usage:

llstatus [-?] [-H] [-v] [-W] [-R] [-F] [-M] [-l] [-a] [-C] [-b]
[-B {base_partition_list | all}] [-P {partition_list | all}]
[-X {cluster_list | all}] [-f category_list] [-r category_list]
[-h hostlist | hostlist]
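A few of the options in the usage summaries above cover most day-to-day queries. The sketch below wraps them in small shell functions. The function names are hypothetical, and only flags that appear in the usage text (-u, -l, -j, -c, -X) are used.

```shell
#!/bin/sh
# Show only one user's jobs (defaults to the current user).
my_jobs() {
    llq -u "${1:-$USER}"
}

# Show the long listing for a single job, given its job id.
job_details() {
    llq -l -j "$1"
}

# Show the jobs queued in one class, for example par1_12.
class_jobs() {
    llq -c "$1"
}

# Show machine status for all clusters.
all_status() {
    llstatus -X all
}
```

For example, class_jobs par1_12 lists the jobs in the par1_12 class, and my_jobs with no argument lists the jobs of the user who runs it.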

llcancel: Cancelling a job

llcancel <job_id>

Email notification of job status

To receive LoadLeveler notifications by email, add the following two lines to the head section of the LoadLeveler script.

# @ notification = always
# @ notify_user = user@email.domain
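To show where these two lines sit, the sketch below writes a small job command file with the notification keywords in its head section and submits it only if llsubmit is on the PATH. The file name hello.ll, the job name, the choice of the par1_12 class and the echo body are all placeholder assumptions. A real parallel job would also need the job type and node keywords appropriate to this site.

```shell
#!/bin/sh
# Write a minimal job command file; every value below is a placeholder.
job_file=hello.ll

cat > "$job_file" <<'EOF'
#!/bin/sh
# @ job_name = hello
# @ class = par1_12
# @ output = $(job_name).$(jobid).out
# @ error = $(job_name).$(jobid).err
# @ notification = always
# @ notify_user = user@email.domain
# @ queue

echo "Job running on $(hostname)"
EOF

# Submit only where LoadLeveler is installed; otherwise just keep the file.
if command -v llsubmit >/dev/null 2>&1; then
    llsubmit "$job_file"
else
    echo "llsubmit not found; wrote $job_file without submitting"
fi
```

The heredoc delimiter is quoted so that the shell leaves $(job_name) and $(jobid) untouched for LoadLeveler to substitute. Other values accepted by the notification keyword include error, start, complete and never.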

Debugging

mpirun_dbg.dbx, mpirun_dbg.ddd, mpirun_dbg.gdb

Monitoring

To monitor a node, use one of:

  • nmon
  • vmstat
  • top
  • xloadl (X11)

The standard ps and free commands are also useful.
