We’re sensitive to the fact that your jobs may need to run over our maintenance window and will make a reasonable effort to ensure they aren’t disrupted. To keep disruption to a minimum, here is what you can do, and what we will do:
You can set a time limit on your job that will expire before the maintenance window begins. A limit that ends before the window ensures the job does not conflict with it, so the scheduler can start the job normally and it can finish before the outage begins. Note that a job that reaches its time limit without completing will automatically be killed.
If you are submitting your jobs from the command line with sbatch, you can use the --time argument to set a time limit, in the form days-hours:minutes:seconds. For example, this will submit a job that will run for a maximum of 2 days:
sbatch --time=2-00:00:00 my_job.sh
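Alternatively, you can set the limit inside the job script itself with an #SBATCH directive, so it applies every time the script is submitted. A minimal sketch of my_job.sh with the same 2-day limit (the Rscript command is just a placeholder for whatever your job actually runs):

#!/bin/bash
#SBATCH --time=2-00:00:00
# Your job's actual commands go here, for example:
Rscript my_analysis.R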
If you are using slurm_apply() from the rslurm package to submit your job, specify the time limit within the list of options passed to the slurm_options argument, like this:
slurm_apply(f, params, slurm_options = list(time = '2-00:00:00'))
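For completeness, here is a self-contained sketch of such a submission. The function f, the params data frame, and the job name are invented for illustration; nodes, cpus_per_node, and slurm_options are standard slurm_apply() arguments:

library(rslurm)

# Toy function and parameter grid, invented for this example
f <- function(mu, sd) rnorm(1, mean = mu, sd = sd)
params <- data.frame(mu = 1:10, sd = 0.1)

# Submit with a 2-day limit so the scheduler can fit the job
# in before the maintenance window begins
sjob <- slurm_apply(f, params,
                    jobname = 'pre_maintenance',  # invented name
                    nodes = 1, cpus_per_node = 2,
                    slurm_options = list(time = '2-00:00:00'))

Once the job finishes, its output can be collected with get_slurm_out(sjob).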
If your job must run through the window, we’ll allow it to continue running; the node it is running on will go offline once your processing finishes, so that we can patch it.