After the recent reinstallations, torque and maui need to be reinstalled. Since we’ve changed the setup a bit, I think that it’s now ok to take the default installation locations (/usr/local). So the command to configure and install the software is:
cd /system/software/linux/torque-2.0.0
./configure --with-rcp=scp
make
make install (as root)
This puts the user commands in /usr/local/bin, the daemons in /usr/local/sbin, and the needed libraries in /usr/local/lib. The spool directory is /var/spool/torque.
Maui is done with:
cd /system/software/linux/maui-3.2.6p19
./configure
make
make install (as root)
Maui’s home is /usr/local/maui.
Startup scripts are provided in /system/software/linux/torque-2.3.0/contrib/init.d. We only need pbs_mom and pbs_server because we’ll be using maui for the scheduler. They need to be edited with the correct values.
PBS_DAEMON=/usr/local/sbin/pbs_server
PBS_HOME=/var/spool/torque
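The same edit can be scripted. This is only a minimal sketch, assuming the init scripts assign these variables with plain NAME=value lines at the start of a line. The helper name set_var, the scratch file and its initial placeholder values are hypothetical, for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a NAME=value assignment in an init script.
# Assumes the variable is assigned at the start of a line and that the
# value contains no | or & characters (true for the paths used here).
set_var() {
    file=$1
    name=$2
    value=$3
    # Use | as the sed delimiter because the values are paths containing /.
    # Write back with cat so the script keeps its permission bits.
    sed "s|^${name}=.*|${name}=${value}|" "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" &&
        rm -f "$file.tmp"
}

# Demonstration on a scratch file rather than the real init script.
# The starting values below are placeholders, not what the script ships with.
demo=$(mktemp)
printf 'PBS_DAEMON=/placeholder/pbs_server\nPBS_HOME=/placeholder/spool\n' > "$demo"
set_var "$demo" PBS_DAEMON /usr/local/sbin/pbs_server
set_var "$demo" PBS_HOME /var/spool/torque
cat "$demo"
rm -f "$demo"
```

To use it for real, point set_var at the copies of pbs_server and pbs_mom after checking how those scripts actually define the variables.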
Copy pbs_mom and pbs_server to /etc/rc.d/init.d, then run /etc/rc.d/init.d/pbs_server start. Once the server is a running process, create the queues with qmgr:
[root@cpserver init.d]# qmgr
Max open servers: 4
Qmgr: p s
#
# Set server attributes.
#
set server acl_hosts = cpserver
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
Qmgr: c q cp1
Qmgr: s q cp1 queue_type=Execution
Qmgr: s q cp1 from_route_only=True
Qmgr: s q cp1 resources_max.cput=240:00:00
Qmgr: s q cp1 resources_min.cput=00:00:01
Qmgr: s q cp1 enabled=True
Qmgr: s q cp1 started=True
Qmgr: c q cp
Qmgr: s q cp queue_type=Route
Qmgr: s q cp route_destinations=cp1
Qmgr: s q cp route_held_jobs=True
Qmgr: s q cp route_waiting_jobs=True
Qmgr: s q cp enabled=True
Qmgr: s q cp started=True
Qmgr: s s scheduling=True
Qmgr: s s acl_host_enable=True
Qmgr: s s acl_hosts=*.uchicago.edu
Qmgr: s s default_queue=cp
Qmgr: s s query_other_jobs=True
Qmgr: s s resources_default.nodect=1
Qmgr: s s resources_default.nodes=1
Qmgr: s s resources_max.walltime=96:00:00
Qmgr: s s submit_hosts = cpserver
Qmgr: c n cpserver np=2
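The queue setup can also be kept in a script so that it is repeatable after the next reinstall. The sketch below writes out the queue directives from the session in their unabbreviated form (c q is create queue, s q is set queue, s s is set server) and pipes them to qmgr on standard input. The function name queue_directives and the APPLY switch are hypothetical. By default the script only prints the directives.

```shell
#!/bin/sh
# Print the qmgr directives for the cp1 execution queue and the cp routing
# queue, copied from the interactive session in unabbreviated form.
queue_directives() {
    cat <<'EOF'
create queue cp1
set queue cp1 queue_type = Execution
set queue cp1 from_route_only = True
set queue cp1 resources_max.cput = 240:00:00
set queue cp1 resources_min.cput = 00:00:01
set queue cp1 enabled = True
set queue cp1 started = True
create queue cp
set queue cp queue_type = Route
set queue cp route_destinations = cp1
set queue cp route_held_jobs = True
set queue cp route_waiting_jobs = True
set queue cp enabled = True
set queue cp started = True
set server default_queue = cp
EOF
}

# APPLY is a hypothetical opt-in switch: with APPLY=1 the directives are
# piped to qmgr (run as root on the server); otherwise they are only shown.
if [ "${APPLY:-0}" = 1 ]; then
    queue_directives | qmgr
else
    queue_directives
fi
```

Running it a second time with APPLY=1 will report errors for the create lines, since the queues already exist.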
Maui’s startup script is provided in /system/software/linux/maui-3.2.6p19/contrib/service-scripts/redhat.maui.d. Edit this file:
MAUI_PREFIX=/usr/local/maui
Also change the user that maui should run as. We don't have a maui user, so I first tried my own username instead. That turned out to be a big problem, so maui has to run as root.
Then copy the script to /etc/rc.d/init.d/maui.
Now run chkconfig --add for each of pbs_mom, pbs_server and maui. Restart them all and submit a test job.
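Registering and restarting the three services can be done in one loop. This is a sketch, assuming each init script accepts a restart argument. The run wrapper and the DRY_RUN switch are hypothetical additions so that the commands are only printed unless explicitly enabled.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command by default; execute it only when
# DRY_RUN=0 is set (which needs root on the server).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Register each service with chkconfig, then restart it through its
# init script in /etc/rc.d/init.d.
enable_services() {
    for svc in pbs_mom pbs_server maui; do
        run chkconfig --add "$svc"
        run /etc/rc.d/init.d/"$svc" restart
    done
}

enable_services
```

With DRY_RUN left at its default this prints six command lines, two per service, which can be reviewed before running it for real.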
The job was accepted into the queue, but never executed. Oops, I forgot to edit maui.cfg. Add ADMIN1 and ADMIN3, and change the RMCFG line:
ADMIN1 root maryh
ADMIN3 ALL
#RMCFG[CPS1] TYPE=PBS@RMNMHOST@
RMCFG[base] TYPE=PBS
The test job now works, so we can move on to the compute node.
The compute node doesn’t need maui, only torque. So simply run make install on the compute node.
In /var/spool/torque, check that server_name has the proper name. Copy the pbs_mom startup script from the server to this node. Start it up. Back on the server, create a new node in qmgr.
c n cpcompute np=8
Create /var/spool/torque/mom_priv/config
$usecp cpserver.uchicago.edu
$ideal_load 8.0
$max_load 10.0
$restricted *.uchicago.edu
This node has eight cores, so the ideal_load is eight.
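For nodes with a different core count, the two load lines can be generated rather than typed. This is a rough sketch: the helper name load_lines is hypothetical, and the headroom of 2 above the core count is an assumption read off the single example here (8 cores, max_load 10.0), not a documented rule.

```shell
#!/bin/sh
# Hypothetical helper: print the $ideal_load and $max_load lines for a node.
#   $1 = number of cores (ideal_load is set equal to this)
#   $2 = headroom added to the core count to get max_load (an assumption)
load_lines() {
    cores=$1
    headroom=$2
    # Single quotes keep the leading $ literal in the printf format string.
    printf '$ideal_load %s.0\n' "$cores"
    printf '$max_load %s.0\n' "$((cores + headroom))"
}

# Reproduces the two load lines used for the eight-core compute node.
load_lines 8 2
```

On the node itself, the core count could be taken from getconf _NPROCESSORS_ONLN instead of being passed by hand.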
Finally, back on the server, go into qmgr and add the compute host as another submit host:
qmgr
s s submit_hosts += cpcompute