Introduction

One of the most popular use cases for cron is to perform maintenance operations on a machine. This is manageable for a handful of machines, but as the number of systems grows it becomes a chore, and questions arise such as:

  • Are all systems configured?
  • With what configurations?
  • When will the operations be done?
  • Are the operations timed/coordinated well?
  • What to do when one or more machines need to be taken out of the maintenance loop?

We will present a setup to show how to manage this using hcron.

Event Tree

For each operation, we use the following structure:

.../events/
  <opname>/
    template
    <timegroup>/
      <hostname> -> ../template
      ...

Notes:

  • At least one template is created for each operation.
  • The timegroup name encodes when the event should be launched.
  • The template decodes the timegroup name from the HCRON_EVENT_NAME.
  • The template extracts the hostname from the HCRON_EVENT_NAME.
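
The structure above lends itself to scripting. As a minimal sketch (not part of hcron itself), the following Python helper adds a host to a timegroup by creating the symlink to the shared template. The function name `add_host` and its parameters are hypothetical, and the template file is assumed to already exist at `<opname>/template`.

```python
import os
import tempfile


def add_host(events_dir, opname, timegroup, hostname):
    """Create <events_dir>/<opname>/<timegroup>/<hostname> -> ../template.

    Hypothetical helper: it only creates the timegroup directory and the
    symlink. The template file itself is expected to exist already at
    <events_dir>/<opname>/template.
    """
    group_dir = os.path.join(events_dir, opname, timegroup)
    os.makedirs(group_dir, exist_ok=True)
    link_path = os.path.join(group_dir, hostname)
    if not os.path.islink(link_path):
        # Relative target, matching the "<hostname> -> ../template" layout.
        os.symlink("../template", link_path)
    return link_path


if __name__ == "__main__":
    # Demonstrate against a throwaway directory rather than a real events tree.
    with tempfile.TemporaryDirectory() as events_dir:
        for host in ("node1", "node2", "node3"):
            print(add_host(events_dir, "cleantmp", "0,12_0", host))
```

Taking a machine out of the maintenance loop is then a matter of removing its symlink from the timegroup directory.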

For example:

.../events/
  archivebackups/
    template
    1_0/
      node1 -> ../template
      ...
    2_0/
      node101 -> ../template
      ...
  cleantmp/
    template
    0,12_0/
      node1 -> ../template
      ...
  processacct/
    ...
  rotatelogs/
    ...

Notes:

  • 1_0 is decoded to "when_hour=1" and "when_minute=0".
  • 0,12_0 is decoded to "when_hour=0,12" and "when_minute=0".
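
To make the decoding concrete, here is a small Python illustration of the same logic. It is not hcron's implementation; it only mimics the idea of splitting the event name on "/" and the timegroup on "_". The function name `decode_event_name` is hypothetical, and the event name is assumed to look like `/<opname>/<timegroup>/<hostname>`.

```python
def decode_event_name(event_name):
    """Return (host, when_hour, when_minute) from an event name.

    Assumes the last path component is the hostname and the one before
    it is the timegroup, named <when_hour>_<when_minute>.
    """
    parts = event_name.split("/")
    host = parts[-1]
    timegroup = parts[-2]
    fields = timegroup.split("_")
    # Index from the end, as the templates do.
    return host, fields[-2], fields[-1]


print(decode_event_name("/archivebackups/1_0/node1"))  # ('node1', '1', '0')
print(decode_event_name("/cleantmp/0,12_0/node1"))     # ('node1', '0,12', '0')
```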

Template Events

To limit the complexity and the need to create an event file for each operation+host, templates are used, each of which decodes the HCRON_EVENT_NAME to obtain the host, when_hour, and when_minute settings.

/archivebackups/template
# archivebackups
#
# copy backup files to archive server; prepend dst copy with hostname

HOST=$HCRON_EVENT_NAME[-1]
TIMEGROUP=$HCRON_EVENT_NAME[-2]
HR=$TIMEGROUP[_!-2]
MN=$TIMEGROUP[_!-1]
# YYYY:MM:DD:hh:mm:ss:WOY:DOY -> YYYYMMDDhhmm
DATESTAMP=$HCRON_SCHEDULE_DATETIME[:?!0:5]

as_user=
hostname=$HOST
command=scp /var/spool/sys-backup.tgz archive:$HOST-$DATESTAMP-sys-backup.tgz
notify_email=
notify_subject=
notify_message=
when_month=*
when_day=*
when_hour=$HR
when_minute=$MN
when_dow=*
template_name=template
failover_event=/report_failure

/cleantmp/template
# cleantmp
#
# remove files more than 2 days old

HOST=$HCRON_EVENT_NAME[-1]
TIMEGROUP=$HCRON_EVENT_NAME[-2]
HR=$TIMEGROUP[_!-2]
MN=$TIMEGROUP[_!-1]

as_user=
hostname=$HOST
command=cd /tmp; find /tmp -mtime +2 -exec rm {} \;
notify_email=
notify_subject=
notify_message=
when_month=*
when_day=*
when_hour=$HR
when_minute=$MN
when_dow=*
template_name=template
failover_event=/report_failure
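
The DATESTAMP line in the archivebackups template is the least obvious part of these templates. Going by the comment in that template, the expression splits the schedule datetime on ":", keeps the first five fields, and joins them with nothing in between. The Python sketch below mimics that behaviour for illustration only; `select_fields` is a hypothetical name and the datetime value is made up.

```python
def select_fields(value, split_sep, join_sep, start, stop):
    """Split value, keep fields [start:stop], and join them again.

    Assumption: this mirrors what "[:?!0:5]" does in the template,
    as described by the template's own comment.
    """
    return join_sep.join(value.split(split_sep)[start:stop])


# Made-up value in the YYYY:MM:DD:hh:mm:ss:WOY:DOY layout.
schedule_datetime = "2024:03:07:01:00:00:10:067"
datestamp = select_fields(schedule_datetime, ":", "", 0, 5)
print(datestamp)  # 202403070100

# Destination name as built by the archivebackups command line.
host = "node1"
print(f"{host}-{datestamp}-sys-backup.tgz")  # node1-202403070100-sys-backup.tgz
```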

Extras

Tagging Timegroups

As presented, the timegroup name is effectively: <when_hour>_<when_minute>. Though clear for scheduling, the name does not communicate anything about the members. We can address this by adding a tag. Two ways to do this are:

  1. Incorporate the tag into the timegroup name.
  2. Add an additional level to the event tree.

Enhancing the Timegroup Name

Incorporating the tag into the timegroup name is easiest because the current setup requires almost no change. Simply prefix the timegroup name with a tag.

For example, if the event tree is modified as:

.../events/
  archivebackups/
    template
    red_1_0/
      node1 -> ../template
      ...
    blue_1_0/
      node51 -> ../template
      ...

the template event does not need to change because the HR and MN values are read from the end of the timegroup name:

/archivebackups/template
...
TIMEGROUP=$HCRON_EVENT_NAME[-2]
HR=$TIMEGROUP[_!-2]
MN=$TIMEGROUP[_!-1]

...

so that anything preceding is effectively ignored.
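
The effect of indexing from the end can be checked with a short Python illustration. This is a sketch of the idea rather than hcron code, and `parse_timegroup` is a hypothetical name. Because only the last two "_"-separated fields are read, the tag is optional and, in this sketch, may itself contain underscores.

```python
def parse_timegroup(timegroup):
    """Return (tag, when_hour, when_minute) for [<tag>_]<hour>_<minute>."""
    fields = timegroup.split("_")
    hour, minute = fields[-2], fields[-1]
    # Everything before the last two fields is treated as the tag.
    tag = "_".join(fields[:-2]) or None
    return tag, hour, minute


print(parse_timegroup("1_0"))       # (None, '1', '0')
print(parse_timegroup("red_1_0"))   # ('red', '1', '0')
print(parse_timegroup("blue_1_0"))  # ('blue', '1', '0')
```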

Adding a Level to the Event Tree

Alternatively, another level could be added to the event tree.

For example:

.../events/
  archivebackups/
    template
    1_0/
      red/
        node1 -> ../../template
        ...
      blue/
        node51 -> ../../template
        ...

This will require that:

  • The symlinks be updated with an additional "../".
  • The template be modified to extract the TIMEGROUP from index -3 not -2.

/archivebackups/template
...
TIMEGROUP=$HCRON_EVENT_NAME[-3]
HR=$TIMEGROUP[_!-2]
MN=$TIMEGROUP[_!-1]

...

It is also possible to reverse the order of the timegroup and tag to get "red/1_0" instead of "1_0/red", at the cost of making the tag more important than the timegroup.
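
When the depth of the tree changes, the number of "../" components in each symlink changes with it. Rather than counting by hand, the relative target can be computed. The following Python sketch shows one way to do so; `template_target` is a hypothetical helper and the paths are examples only.

```python
import os


def template_target(link_path, template_path):
    """Return the relative symlink target from link_path to template_path."""
    return os.path.relpath(template_path, start=os.path.dirname(link_path))


template = "events/archivebackups/template"

# Original layout: <opname>/<timegroup>/<hostname>
print(template_target("events/archivebackups/1_0/node1", template))

# Extra level: <opname>/<timegroup>/<tag>/<hostname>
print(template_target("events/archivebackups/1_0/red/node1", template))
```

On a POSIX system the first call gives "../template" and the second gives "../../template".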

Summary

Both approaches achieve the goal, each with its own strengths. However, incorporating the tag into the timegroup name is the simplest.

Conclusion

Using hcron and the setup described above, all the questions raised in the introduction are easily answered. There is only one authoritative place to look for the information. And with much of the configuration immediately visible in the structure and event names, documentation of the setup comes for free (this is enhanced with the information fields of v1.5 and the hcron doc support). Logging helps to track what happened, when, and where. Updates are easy to do and unaffected by whether a machine is up or down, answering or not. The solution scales from tens to thousands of machines. The setup can be backed up and managed with a version control system. This approach makes administration of such maintenance-type operations much easier than it ever was with local crontab files.
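
Because the configuration lives in the directory structure, questions such as "which hosts are configured, for which operation, and when?" can be answered by walking the tree. The Python sketch below assumes the `<opname>/<timegroup>/<hostname>` layout described earlier; `list_schedule` is a hypothetical helper, not an hcron tool.

```python
import os
import tempfile


def list_schedule(events_dir):
    """Return sorted (opname, timegroup, hostname) tuples for host symlinks."""
    rows = []
    for opname in sorted(os.listdir(events_dir)):
        op_dir = os.path.join(events_dir, opname)
        if not os.path.isdir(op_dir):
            continue
        for timegroup in sorted(os.listdir(op_dir)):
            group_dir = os.path.join(op_dir, timegroup)
            if not os.path.isdir(group_dir):
                continue  # skips the template file
            for hostname in sorted(os.listdir(group_dir)):
                if os.path.islink(os.path.join(group_dir, hostname)):
                    rows.append((opname, timegroup, hostname))
    return rows


if __name__ == "__main__":
    # Build a tiny throwaway tree and list it.
    with tempfile.TemporaryDirectory() as events_dir:
        group_dir = os.path.join(events_dir, "cleantmp", "0,12_0")
        os.makedirs(group_dir)
        open(os.path.join(events_dir, "cleantmp", "template"), "w").close()
        os.symlink("../template", os.path.join(group_dir, "node1"))
        for opname, timegroup, hostname in list_schedule(events_dir):
            print(opname, timegroup, hostname)
```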
