Where "cpus" are mentioned, read "cores". I'll try to make this clearer at some point.
When Gridengine (e.g., 8.1.9) calculates cpu usage, it bases the result on the number of slots used and either cpu time or wallclock time. This works if each slot corresponds to a single cpu. However, if more than one cpu is allocated to a slot, the calculation does not give the desired usage figure.
For the purposes of this article:
execd_params is configured with SHARETREE_RESERVED_USAGE=true so that wallclock time rather than cpu time is used.
cpu = wallclock * nslots * ncpus_per_slot
The complex used to track the number of cpus used per slot is specified in the
The complex would be defined as:
ncpus ncpus INT <= FORCED YES 0 1000
A queue would be configured with:
So, a job request for 4 slots X 16 cpus would be:
#$ -pe dev 4
#$ -l ncpus=16
and a run of 30s would amount to:
cpu = 30 * 4 * 16 = 1920
versus what is currently returned:
cpu = 30 * 4 = 120
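The arithmetic above can be checked with a short Python sketch. It is only an illustration of the formula cpu = wallclock * nslots * ncpus_per_slot; the function name `reserved_cpu` and its parameters are made up for this example and are not part of Gridengine.

```python
def reserved_cpu(wallclock_s, nslots, ncpus_per_slot=1):
    """Reserved cpu usage: wallclock * nslots * ncpus_per_slot.

    ncpus_per_slot defaults to 1, which reproduces the slots-only
    calculation (hypothetical helper, not a Gridengine function).
    """
    return wallclock_s * nslots * ncpus_per_slot


# Job from the example: 4 slots x 16 cpus, running for 30s.
with_multiplier = reserved_cpu(30, 4, 16)
slots_only = reserved_cpu(30, 4)

print(with_multiplier)  # 1920
print(slots_only)       # 120
```

The two results differ by exactly the per-slot cpu count (16), which is the factor the slots-only calculation leaves out.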
slot_multiplier_name static variable to hold value set in
build_reserved_usage() declaration to accept reference to job
build_reserved_usage() to accept reference to job
build_reserved_usage() to get slot multiplier and use it when calculating cpu usage
get_slot_multiplier() static function to get the multiplier value if defined, or 1.0 otherwise
calculate_reserved_usage() to get job reference and provide it when calling
build_derived_final_usage() to provide job reference when calling
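The fallback behaviour described for get_slot_multiplier() (use the multiplier if defined, otherwise 1.0) might be modelled as follows. This is a Python sketch of the logic only, not the patched C code: the helper names, the dictionary of job requests, and the way the complex name is passed in are all assumptions made for illustration.

```python
def slot_multiplier(job_requests, multiplier_name):
    """Return the per-slot multiplier requested by the job, or 1.0.

    job_requests: hypothetical dict of the job's resource requests,
                  e.g. {"ncpus": 16}.
    multiplier_name: name of the complex that carries the multiplier;
                     None or "" means no multiplier is configured.
    """
    if not multiplier_name:
        return 1.0
    value = job_requests.get(multiplier_name)
    if value is None:
        return 1.0
    return float(value)


def reserved_cpu_for_job(wallclock_s, nslots, job_requests, multiplier_name):
    """Reserved cpu usage scaled by the job's slot multiplier."""
    return wallclock_s * nslots * slot_multiplier(job_requests, multiplier_name)


# A job that requested ncpus=16 on 4 slots, running for 30s.
print(reserved_cpu_for_job(30, 4, {"ncpus": 16}, "ncpus"))  # 1920.0
# A job without the request falls back to the slots-only figure.
print(reserved_cpu_for_job(30, 4, {}, "ncpus"))             # 120.0
```

Defaulting to 1.0 means that jobs which do not request the complex, and installations which do not configure a multiplier, get the same figure as before.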
Patch files (based on https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/):
Changes to the code are minimal. The highlights are:
build_reserved_usage() function to take a job reference so that the consumable information can be obtained
build_reserved_usage() to use the slot multiplier value
These changes make it possible to account for the number of cpus per slot, which is critical to calculating cpu usage by wallclock.