Archive for September, 2006

Delete the highlighted part of the message (shown in the photo) and replace it with:

Approved: “password” without the quotes

approve.jpg

After installing the Maui scheduler, I made a couple of changes to the queue. Here are the current settings:

Qmgr: p s
#
# Create queues and set their attributes.
#
#
# Create and define queue cdf1
#
create queue cdf1
set queue cdf1 queue_type = Execution
set queue cdf1 from_route_only = True
set queue cdf1 resources_max.cput = 240:00:00
set queue cdf1 resources_min.cput = 00:00:01
set queue cdf1 enabled = True
set queue cdf1 started = True
#
# Create and define queue cdf
#
create queue cdf
set queue cdf queue_type = Route
set queue cdf route_destinations = cdf1
set queue cdf route_held_jobs = True
set queue cdf route_waiting_jobs = True
set queue cdf enabled = True
set queue cdf started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = *.uchicago.edu
set server default_queue = cdf
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.nodect = 1
set server resources_default.nodes = 1
set server resources_default.walltime = 24:00:00
set server scheduler_iteration = 300
set server node_check_rate = 120
set server tcp_timeout = 6
set server pbs_version = 2.1.1
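
The dump above shows that cdf is a route queue that forwards jobs to the execution queue cdf1. As a quick sanity check, a small filter can pull that routing out of the qmgr output. This is only a sketch: the function name queue_routes is made up for illustration, and it assumes the dump uses the "set queue NAME route_destinations = DEST" form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): summarize queue routing from a
# qmgr dump read on stdin. Prints one "route_queue -> destination" line for
# each route_destinations setting it finds.
queue_routes() {
    awk '$1 == "set" && $2 == "queue" && $4 == "route_destinations" {
        # Field 3 is the route queue name; the destination is the last field.
        print $3 " -> " $NF
    }'
}
```

Typical use would be something like `qmgr -c 'print server' | queue_routes`, which for the settings above should report the single route from cdf to cdf1. The same dump can also be saved to a file and, as far as I know, fed back in with `qmgr < file` to recreate the queues.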

I’m trying again to get the Maui scheduler working, because the plain PBS scheduler is having lots of problems.

I found a page on the web that said to leave the --with-pbs flag off the configure command. So the configure command I used was:

./configure --prefix=/var/maui
make ran without any errors
make install

I had forgotten that these environment variables were set during the build. I don’t know whether they had any effect.

CPPFLAGS='-I/var/torque/include'
LDFLAGS='-L/var/torque/lib'

Edit the file /var/maui/maui.cfg. Basically, I just took the one from the Tier2 cluster and changed the name of the server. To start the Maui scheduler, be sure to stop pbs_sched and then run: /var/maui/sbin/maui -C /var/maui/maui.cfg

Maui also created a bunch of files in /usr/spool/maui, which is NOT where I’ve been keeping that stuff. So I copied it all to /var/maui and made a link from /usr/spool/maui to /var/maui. I then edited /etc/rc.d/init.d/torque to reflect the new scheduler and restarted it.
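
The copy-and-link step can be sketched as a small shell function. This is an illustration rather than the exact commands I ran: the function name relocate_spool is made up, both paths are passed as arguments so it can be tried on scratch directories first, and it keeps the old directory as a ".orig" backup instead of deleting it.

```shell
#!/bin/sh
# Hypothetical helper: copy a spool directory to a new home and leave a
# symlink at the old location.
relocate_spool() {
    src=$1   # old location, e.g. /usr/spool/maui
    dst=$2   # new location, e.g. /var/maui
    # Refuse to run if the source is missing or is already a symlink.
    [ -d "$src" ] && [ ! -L "$src" ] || return 1
    mkdir -p "$dst" || return 1
    # Copy everything (including dotfiles), preserving modes and timestamps.
    cp -Rp "$src"/. "$dst"/ || return 1
    # Assumption: keep the original as a backup rather than removing it.
    mv "$src" "$src.orig" || return 1
    # Point the old path at the new directory.
    ln -s "$dst" "$src"
}
```

It would be called as `relocate_spool /usr/spool/maui /var/maui`, with the scheduler stopped while the files are moved.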

Sometimes the queue would have open machines and jobs waiting, but wouldn’t start the jobs. The comment on these jobs said they weren’t running because they were waiting for starving jobs to finish. The default setting is to not run any other jobs while there are jobs that have been waiting for at least 24 hours (classified as starving). I disabled this behavior by editing /var/spool/pbs/sched_priv/sched_config and changing the help_starving_jobs option to false.
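
The sched_config edit can also be scripted. The sketch below assumes the option appears on a line of the form "help_starving_jobs true ALL" (optionally with a colon after the name), which is how I understand the stock file to look; the function name disable_starving_help is made up. It leaves a ".bak" copy of the original file next to it.

```shell
#!/bin/sh
# Hypothetical helper: set help_starving_jobs to false in a sched_config file.
disable_starving_help() {
    cfg=$1   # e.g. /var/spool/pbs/sched_priv/sched_config
    [ -f "$cfg" ] || return 1
    # Replace only the value word after the option name; the rest of the
    # line (such as the trailing ALL) is left untouched.
    sed -i.bak \
        's/^\(help_starving_jobs:\{0,1\}[[:space:]]\{1,\}\)[A-Za-z]\{1,\}/\1false/' \
        "$cfg"
}
```

After running `disable_starving_help /var/spool/pbs/sched_priv/sched_config`, pbs_sched most likely has to be restarted (or told to reread its configuration) before the change takes effect.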

Previously, I had set the ideal_load to 0.3 and the max_load to 1.0. These are too low for the machines in the glass room. I’m changing them all to:

$ideal_load 2.0
$max_load 3.0
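
Since the same two lines have to go into the pbs_mom config on every node, a small helper makes the change repeatable. This is a sketch under assumptions: the function name set_mom_load is made up, and the config path is passed in because it depends on the installation (usually a file named config in the mom_priv directory under the Torque home).

```shell
#!/bin/sh
# Hypothetical helper: set $ideal_load and $max_load in a pbs_mom config
# file, replacing any existing values and keeping all other lines.
set_mom_load() {
    cfg=$1     # path to the mom config file (installation dependent)
    ideal=$2   # new $ideal_load value, e.g. 2.0
    max=$3     # new $max_load value, e.g. 3.0
    [ -f "$cfg" ] || return 1
    tmp=$(mktemp) || return 1
    # Drop any existing load lines; grep exits non-zero if nothing is left,
    # which is fine here.
    grep -v -e '^\$ideal_load' -e '^\$max_load' "$cfg" > "$tmp" || true
    # Append the new values.
    printf '$ideal_load %s\n$max_load %s\n' "$ideal" "$max" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}
```

For the values above the call would be `set_mom_load <path-to-mom-config> 2.0 3.0`, followed by restarting pbs_mom on that node so it picks up the new thresholds.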