HPC Job Scheduler

By default, all programs run on the CQ University HPC systems must be executed on the "compute nodes", unless you are performing simple tasks or doing a "quick test".

The CQ University HPC facilities are large shared resources.  Unlike personal computers, these systems are used by multiple users at the same time.  Because usage of the HPC varies over time, an "HPC Scheduler" is needed.  The scheduler checks whether the requested resources are available.  If they are, it executes the job on one of the available compute resources; if not, the request is "queued" until resources become available.
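The run-or-queue decision described above can be illustrated with a toy model. This is only a sketch of the idea, not how PBS Pro is implemented: the node names, the CPU counts, and the `submit` function are all hypothetical, and a real scheduler also weighs memory, walltime, and priorities.

```python
from collections import deque


def submit(job_cpus, free_cpus, queue):
    """Place a job on the first node with enough free CPUs, else queue it.

    free_cpus maps node name -> free CPU count and is updated in place.
    Returns the chosen node name, or None if the job was queued.
    """
    for node, free in free_cpus.items():
        if free >= job_cpus:
            free_cpus[node] = free - job_cpus
            return node
    queue.append(job_cpus)
    return None


# Hypothetical two-node cluster with 16 CPUs per node.
free_cpus = {"n001": 16, "n002": 16}
queue = deque()

# Four jobs requesting 8, 16, 8 and 4 CPUs, submitted in that order.
placements = [submit(cpus, free_cpus, queue) for cpus in (8, 16, 8, 4)]
print(placements)   # ['n001', 'n002', 'n001', None]
print(list(queue))  # [4]
```

The last job requests only 4 CPUs but still has to wait, because every CPU on both nodes is already allocated.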

If users execute large jobs on any of the "Login" nodes, this will slow those nodes down and impact other users' performance.

The CQ University HPC facilities use "PBS Pro" as the scheduler for resource management.  Information on PBS commands can be found in the "PBS Commands" user guide.

To make the scheduler easier to use, a number of PBS sample scripts have been created (see here for sample information).  Additionally, some simple scripts have been created to highlight current HPC usage and to assist with deleting HPC jobs.
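To give a rough idea of what a PBS job script contains, the sketch below renders a minimal one as text. It is not one of the site's sample scripts. The `render_pbs_script` helper, the resource values, and the program name `./my_program` are hypothetical. The `#PBS` directives are standard PBS Pro options, and the queue name `workq` is taken from the example output on this page.

```python
def render_pbs_script(name, ncpus, mem_gb, walltime, command, queue="workq"):
    """Return the text of a minimal PBS Pro job script."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",                                # job name
        f"#PBS -q {queue}",                               # destination queue
        f"#PBS -l select=1:ncpus={ncpus}:mem={mem_gb}gb",  # one chunk: CPUs and memory
        f"#PBS -l walltime={walltime}",                   # maximum run time
        "",
        "# Run from the directory the job was submitted from.",
        'cd "$PBS_O_WORKDIR"',
        command,
    ]
    return "\n".join(lines) + "\n"


script = render_pbs_script(
    "Test-run1", ncpus=4, mem_gb=10, walltime="01:00:00", command="./my_program"
)
print(script)
```

Saving the printed text to a file and passing that file to `qsub` would submit it to the scheduler.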

Each command is listed below with its usage and example output.

qusers

This command will provide an overall summary of HPC usage.

Thu Sep 12 12:32:30 EST 2013
There are 3 users with jobs on:  hawking
 Username  #jobs  #run  #cpus   #Memory    #queued  #other  Real Name
=========================================================================
    moserg  150   150    150    3000 gb      0       0   Gerhard Moser
  vanderj2    1     1     16      20 gb      0       0   Jeremy VanDerWal
      wuq1    1     1      1       1 gb      0       0   Qing Wu
=========================================================================
Totals      152   152    167    3021 gb      0       0
=========================================================================
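As a worked check of how the Totals line relates to the per-user rows, the sketch below parses the three rows shown above and sums each numeric column. The parsing assumes whitespace-separated fields in the order given by the header; `parse_row` is a hypothetical helper, not part of `qusers`.

```python
# The three per-user rows from the example output above.
QUSERS_ROWS = """\
    moserg  150   150    150    3000 gb      0       0   Gerhard Moser
  vanderj2    1     1     16      20 gb      0       0   Jeremy VanDerWal
      wuq1    1     1      1       1 gb      0       0   Qing Wu
"""


def parse_row(line):
    """Split one row into named fields (the unit 'gb' is field 5 and is skipped)."""
    fields = line.split()
    return {
        "user": fields[0],
        "jobs": int(fields[1]),
        "run": int(fields[2]),
        "cpus": int(fields[3]),
        "mem_gb": int(fields[4]),
        "queued": int(fields[6]),
        "other": int(fields[7]),
        "name": " ".join(fields[8:]),
    }


rows = [parse_row(line) for line in QUSERS_ROWS.splitlines() if line.strip()]
columns = ("jobs", "run", "cpus", "mem_gb", "queued", "other")
totals = {col: sum(row[col] for row in rows) for col in columns}
print(totals)
# {'jobs': 152, 'run': 152, 'cpus': 167, 'mem_gb': 3021, 'queued': 0, 'other': 0}
```

The sums match the Totals line: 152 jobs, 152 running, 167 CPUs and 3021 gb of memory.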
myjobs

This command will provide information on your current HPC jobs, as well as a comparison of the resources requested from the HPC Scheduler against the compute resources actually used, for all (R)unning jobs.

bellj@newton:~> myjobs

Jobs running for bellj
--------------------------------

pbsserver:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
407256.pbsserve bellj    workq    Test-run1   65575   4  32    --  01:00 R 00:55
n005[0]/0*8+n005[0]/1*8+n008[0]/0*8+n008[0]/1*8
407257.pbsserve bellj    workq    Test-run2   44175   4  32    --  01:00 R 00:55
n009[0]/0*8+n009[0]/1*8+n022[0]/1*8+n023[0]/0*8
407260.pbsserve bellj    workq    Test-run5   86742   4  32    --  01:00 R 00:55
n027[0]/1*8+n028[0]/0*8+n028[0]/1*8+gn002[1]/0*8
407828.pbsserve bellj    workq    STDIN         --    1   4   10gb   --  Q   --   --

=====================================================================================
Job D            Job          #CPU's      CPU's (%)            Memory (gb)    Memory (gb)
                 Name         requested    Utilisation          requested      in use
=====================================================================================
407256.pbsserver   Test-run1           32             99            0            1
407257.pbsserver   Test-run2           32             98            0            1
407260.pbsserve   Test-run5           32             98            0            0
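The line printed under each running job lists where the job has been placed. The sketch below shows one way to read it, assuming each `+`-separated chunk has the form `node[index]/slot*cpus` and that the number after `*` is the CPU count for that chunk. That reading is an assumption drawn from the example output, and `summarise_exec_host` is a hypothetical helper.

```python
import re

# One chunk looks like "n005[0]/0*8": node name, bracketed index, slot, CPU count.
CHUNK = re.compile(r"^(?P<node>[A-Za-z0-9]+)\[\d+\]/\d+\*(?P<cpus>\d+)$")


def summarise_exec_host(exec_host):
    """Count chunks, CPUs and distinct nodes in a placement string."""
    chunks = []
    for part in exec_host.split("+"):
        match = CHUNK.match(part)
        if match is None:
            raise ValueError(f"unrecognised chunk: {part!r}")
        chunks.append((match.group("node"), int(match.group("cpus"))))
    return {
        "chunks": len(chunks),
        "cpus": sum(cpus for _, cpus in chunks),
        "nodes": sorted({node for node, _ in chunks}),
    }


# Placement line for job 407256 (Test-run1) from the example output.
summary = summarise_exec_host("n005[0]/0*8+n005[0]/1*8+n008[0]/0*8+n008[0]/1*8")
print(summary)
# {'chunks': 4, 'cpus': 32, 'nodes': ['n005', 'n008']}
```

Under that reading, the 4 chunks and 32 CPUs agree with the NDS and TSK columns for the same job.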

deletemyjobs

This command will delete all your submitted jobs (both "R"unning and "Q"ueued).
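For readers curious how a helper like this could work, here is a minimal sketch built on the standard PBS commands `qselect` (list job IDs for a user) and `qdel` (delete jobs). It is an assumption about the general approach, not the actual `deletemyjobs` script; both function names are hypothetical, and the dry-run default is a safety choice of this sketch.

```python
import getpass
import subprocess


def qdel_command(job_ids):
    """Build the qdel argument list for the given job IDs ([] if there are none)."""
    ids = [job_id.strip() for job_id in job_ids if job_id.strip()]
    return ["qdel", *ids] if ids else []


def delete_my_jobs(dry_run=True):
    """List the current user's jobs with qselect, then delete them with qdel.

    With dry_run=True the qdel command is returned but not executed.
    Requires the PBS client commands to be on PATH.
    """
    user = getpass.getuser()
    listing = subprocess.run(
        ["qselect", "-u", user], capture_output=True, text=True, check=True
    )
    command = qdel_command(listing.stdout.split())
    if command and not dry_run:
        subprocess.run(command, check=True)
    return command


# Only the command builder is exercised here, so no PBS installation is needed.
print(qdel_command(["407256.pbsserver", "407828.pbsserver"]))
# ['qdel', '407256.pbsserver', '407828.pbsserver']
```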