Introduction

When Gridengine (e.g., 8.1.9) calculates cpu usage, it bases the result on the number of slots used and either cpu time or wallclock time. This works when each slot corresponds to a single cpu. But if more than one cpu is allocated to a slot, the calculation does not give the desired usage information.

For the purposes of this article:

cpu = wallclock * nslots * ncpus_per_slot

Usage

The complex that tracks the number of cpus used per slot is named with SLOT_MULTIPLIER_NAME in the execd_params. For example:

execd_params    SLOT_MULTIPLIER_NAME=ncpus

The complex would be defined as:

ncpus               ncpus       INT       <=    FORCED      YES        0        1000

A queue would be configured with:

complex_values        ncpus=4

So, a job request for 4 slots with 16 cpus per slot would be:

#$ -pe dev 4
#$ -l ncpus=16

and a run of 30s would amount to:

cpu = 30 * 4 * 16 = 1920

versus what is currently returned:

cpu = 30 * 4 = 120
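
The arithmetic above can be reproduced with a small Python sketch. It is purely illustrative and is not the patch itself: the function names parse_request and cpu_usage are hypothetical, the parser handles only the two simple directive forms shown in this article, and the complex name ncpus is assumed to match SLOT_MULTIPLIER_NAME.

```python
import re

# Job-script directives copied from the example in this article.
JOB_SCRIPT = """\
#$ -pe dev 4
#$ -l ncpus=16
"""


def parse_request(script, multiplier_name="ncpus"):
    """Return (nslots, ncpus_per_slot) from embedded '#$' directives.

    Hypothetical helper: only '-pe <name> <n>' and '-l <complex>=<n>'
    are recognised; anything missing falls back to 1.
    """
    nslots = 1
    ncpus_per_slot = 1
    resource = re.compile(r"-l\s+" + re.escape(multiplier_name) + r"=(\d+)")
    for line in script.splitlines():
        if not line.startswith("#$"):
            continue
        pe = re.search(r"-pe\s+\S+\s+(\d+)", line)
        if pe:
            nslots = int(pe.group(1))
        res = resource.search(line)
        if res:
            ncpus_per_slot = int(res.group(1))
    return nslots, ncpus_per_slot


def cpu_usage(wallclock, nslots, ncpus_per_slot=1):
    """cpu = wallclock * nslots * ncpus_per_slot (definition used in this article)."""
    return wallclock * nslots * ncpus_per_slot


nslots, ncpus = parse_request(JOB_SCRIPT)
print(cpu_usage(30, nslots, ncpus))  # slots and cpus per slot: 1920
print(cpu_usage(30, nslots))         # slots only: 120
```

Leaving ncpus_per_slot at its default of 1 gives the slots-only figure, so the two calls show the difference the multiplier makes for the same 30s run.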

Implementation

Changes

Patch files (based on https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/):

Summary

Changes to the code are minimal. The highlights are:

Conclusion

These changes make it possible to account for the number of cpus per slot, which is critical to calculating cpu usage from wallclock time.


Ⓒ 2023 expl.info