Archive for November, 2006

Each time I tried to log in to SquirrelMail, I was immediately bounced back to a page saying that you have to log in before you can access the page. The problem turned out to be the ownership of /var/lib/php/session, which was set to root:root. After I changed it to root:apache, I could log in.
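
The ownership change can be sketched as a small shell helper. The function name fix_session_group is made up for illustration, and it only changes the group (the owner stays root, as in the note above). It is a sketch, not necessarily the exact command that was run.

```shell
# Hypothetical helper: set the group of a session directory and show the result.
fix_session_group() {
    dir=$1
    group=$2
    # Change only the group; the owner is left untouched
    chgrp "$group" "$dir" || return 1
    # Print the directory entry so the new group can be checked by eye
    ls -ld "$dir"
}

# On the mail server this would be run as root:
#   fix_session_group /var/lib/php/session apache
```

The group has to match whatever group the web server runs as; apache is the value from this host.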

When attempting to start httpd on the new server, I kept getting the message:

hep1:init.d$ ./httpd start
Starting httpd: execvp: Permission denied

This appears to be SELinux blocking the program. So, I changed the setting in /etc/sysconfig/selinux from enforcing to disabled. I tried permissive for a while, but things still didn’t work, and I wanted to rule SELinux out as the cause.
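
For reference, the relevant line in /etc/sysconfig/selinux would look roughly like the fragment below. This is a sketch of the one setting discussed here, not a copy of the file on hep1. A change to this file takes effect at the next boot, and getenforce reports the mode currently in force.

```shell
# /etc/sysconfig/selinux (fragment)
# SELINUX can be set to enforcing, permissive, or disabled
SELINUX=disabled
```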

Queue server: pnn
Queue computes: pnn2 pnn3 pnn4 pnn5 pnn6

Computes:
1. In /support/data1/maryh/torque-2.1.1-27, run:

make install_mom install_clients

2. Edit /var/spool/pbs/mom_priv/config:

$usecp pnn.uchicago.edu
$ideal_load 2.0
$max_load 3.0
$restricted *.uchicago.edu

3. Edit /var/spool/pbs/server_name:
pnn

4. Get /etc/rc.d/init.d/torque from another pbs compute node.

Server:
1. Unpack new torque directory in /support/data1/maryh

2. ./configure --prefix=/var/torque --exec-prefix=/var/torque --with-rcp=scp --with-server-home=/var/spool/pbs --with-server-name=/var/torque/server_name

3. make (this gave errors at first; I needed to run yum install tclx-devel)

4. make install

5. Edit /var/torque/server_name so that it contains pnn

6. Edit /var/spool/pbs/torque.cfg:

SERVERHOST pnn
ALLOWCOMPUTEHOSTSUBMIT true

7. Set up Maui
Maui would not compile. The config file had some strange lines in it. So, I just copied the Makefile from the cdf30 setup and it compiled fine. Again, I had to copy everything from /usr/local/maui to /var/maui and make a link to /usr/local/maui from /var/maui.

8. Edit /var/maui/maui.cfg; look at the cdf30 copy for the two lines.

Our queue setup for job priority is basically first in, first out. However, at times a user will have many jobs in the queue with only a couple running. Then, if more jobs are submitted to the queue, those jobs run before the first ones are done. This happens when something goes wrong with the scheduler and the waiting jobs are given DEFER status. Unfortunately, these jobs remain deferred even after whatever caused the problem has passed. The current solution is to run (as root) /var/maui/bin/releasehold followed by the job number, which releases the hold. I haven’t yet found a way to do this for all jobs at once, so as of now, this has to be run for each job.
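
Until a single command turns up, the per-job call can at least be scripted. The sketch below is a hypothetical helper (the name release_all and the RELEASEHOLD override are made up for illustration) that reads job numbers, one per line, and calls releasehold on each. How the list of deferred job numbers is produced is left open here.

```shell
# Hypothetical helper: read job numbers (one per line) on stdin and
# release the hold on each one.
# RELEASEHOLD can be overridden for a dry run, e.g. RELEASEHOLD=echo.
release_all() {
    cmd=${RELEASEHOLD:-/var/maui/bin/releasehold}
    while read -r job; do
        # Skip blank lines
        [ -n "$job" ] || continue
        "$cmd" "$job"
    done
}

# Dry run with made-up job numbers (prints them instead of releasing):
#   printf '1001\n1002\n' | RELEASEHOLD=echo release_all
```

For real use, it would be run as root with the job numbers of the deferred jobs piped in, and without the RELEASEHOLD override.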

This manual should give some clues as to what we can do to eliminate this problem.