Our queue is set up so that job priority is basically first in, first out. At times, however, a user will have many jobs in the queue but only a couple of them running. Then, if more jobs are submitted to the queue, those newer jobs run before the earlier ones are done. This happens when something goes wrong with the scheduler and the earlier jobs are given DEFER status. Unfortunately, these jobs remain deferred even after whatever caused the problem has passed. The current solution is to run (as root) /var/maui/bin/releasehold followed by the job number, which releases the hold on that job. I haven’t yet found a way to do this for all jobs at once, so for now the command has to be run separately for each job.
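
Since the command has to be run once per job, the repetition can at least be scripted. Below is a minimal sketch in Python, assuming the numbers of the deferred jobs have already been collected by hand. The function names build_release_commands and release_all are made up for illustration; only the releasehold path comes from the notes above. By default it only prints the commands it would run. An actual run needs root.

```python
import subprocess

# Path as given in the notes above.
RELEASEHOLD = "/var/maui/bin/releasehold"


def build_release_commands(job_ids):
    """Return one releasehold argument list per job number."""
    return [[RELEASEHOLD, str(job_id)] for job_id in job_ids]


def release_all(job_ids, dry_run=True):
    """Print (dry_run=True) or execute one releasehold call per job.

    The job numbers are supplied by the caller; this sketch does not
    try to discover which jobs are deferred.
    """
    commands = build_release_commands(job_ids)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Stops at the first failing job because check=True raises.
            subprocess.run(cmd, check=True)
    return commands
```

For example, release_all([1234, 1235]) prints the two commands without running them (the job numbers here are placeholders), and release_all([1234, 1235], dry_run=False) would actually run them.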

This manual should give some clues as to what we can do to eliminate this problem.