Archive for the ‘Hardware’ Category


/c2/p2 show all
.
.
.

The light on the drive corresponding to this ID will flash when this command is run, which helps identify which drive is which. It is especially useful because /cx/px set identify doesn’t work on our RAID card (9650SE-8LPML).

On our RHEL5 monitoring station with 10 monitors, we’ve been having problems with crashes. It turns out that the standard kernel we were using could not access all of the RAM in the system (currently 4GB). After installing and booting the PAE (Physical Address Extension) kernel, we can now use all of it.

Old kernel and RAM:

[~]# more mem1.txt
Linux monstation 2.6.18-164.2.1.el5 #1 SMP Mon Sep 21 04:37:51 EDT 2009 i686 i686 i386 GNU/Linux
             total       used       free     shared    buffers     cached
Mem:          2265        938       1327          0        191        605
-/+ buffers/cache:        141       2124
Swap:         1027          0       1027
Total:        3293        938       2355

New kernel and RAM:

[~]# more mem-PAE.txt
Linux monstation 2.6.18-164.2.1.el5PAE #1 SMP Mon Sep 21 04:45:05 EDT 2009 i686 i686 i386 GNU/Linux
             total       used       free     shared    buffers     cached
Mem:          4043        837       3205          0         65        569
-/+ buffers/cache:        202       3841
Swap:         1027          0       1027
Total:        5071        837       4233

I’ve ordered four more GB of RAM for this system, so we’ll have it maxed out at 8GB.
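Before switching kernels, it is worth confirming that the CPU advertises PAE and checking how much memory the running kernel actually sees. The following is a minimal sketch; the function names are made up for illustration, and the file arguments exist only so the functions can be tried against saved copies of /proc/cpuinfo and /proc/meminfo. On RHEL5 the PAE kernel normally comes from the kernel-PAE package, but check what your own channel provides.

```shell
# Succeed if the CPU flags line contains the "pae" flag.
# $1: path to a cpuinfo-style file (defaults to /proc/cpuinfo).
has_pae() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" | grep -qw pae
}

# Print MemTotal in whole MB, the same figure `free -m` shows as total.
# $1: path to a meminfo-style file (defaults to /proc/meminfo).
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Typical use on the machine itself:
#   has_pae && echo "CPU supports PAE"
#   echo "kernel sees $(mem_total_mb) MB"
```

Running mem_total_mb before and after the kernel change gives the same comparison as the two saved files above.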

My problem from the other day was rebuilding a RAID when I got ECC errors on a different disk than the one being rebuilt. I did a rescan and the ECC errors went away, but the rebuild seemed to be stuck. I contacted 3ware, the maker of our RAID card, and was told to do this:

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> maint remove c0 p6
Exporting port /c0/p6 ... Done.

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     NOT-PRESENT      -      -           -             -
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p6].

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     OK               -      465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> /c0/u1 start rebuild disk=6 ignoreecc
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Done.


//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     0      64K     1396.95   OFF    OFF      ON       

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

This seems to be working. I guess I’ll know in a few hours if everything is ok.

If this still doesn’t work, I’m supposed to send 3ware an error log.

./tw_cli info c0 diag > error.txt

The easy RAID rebuild from yesterday turned out to be not so easy. Checking today, I got this:

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     ECC-ERROR        u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288   

This does not look good at all, but the disk seems to be ok. So I tried this:

//cdfs3> /c0 rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [none].

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

That’s good. I know disk p6 is good because it was just replaced. I’m not sure whether the rebuild will continue without problems or whether I need to restart it. I think I’ll leave it for a while to see if things start working again. If it stays stuck at 89% for a while, I’ll run the rebuild command again.

I tried to reissue the command and got the following:

//cdfs3> /c0/u1 start rebuild disk=6
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Failed.

(0x0B:0x0032): Unit is rebuilding
//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

So I think I’ll just have to leave it and hope that it finishes.
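Rather than rerunning info c0 by hand to see whether the rebuild is still stuck at 89%, the status and percentage can be pulled out of the output and compared between two samples. This is only a sketch: unit_status and is_stalled are hypothetical helper names, and it assumes the column layout shown in the listings above (status in the third column, %Cmpl in the fourth).

```shell
# Print "STATUS PERCENT" for one unit from `tw_cli info cN` output on stdin.
# $1: unit name, e.g. u1
unit_status() {
    awk -v unit="$1" '$1 == unit { print $3, $4; exit }'
}

# Succeed if two samples are identical and the unit is still REBUILDING,
# i.e. the percentage has not moved between them.
# $1, $2: strings produced by unit_status
is_stalled() {
    [ "$1" = "$2" ] && [ "${1%% *}" = "REBUILDING" ]
}

# Typical use, one hour apart:
#   before=$(./tw_cli info c0 | unit_status u1)
#   sleep 3600
#   after=$(./tw_cli info c0 | unit_status u1)
#   is_stalled "$before" "$after" && echo "u1 has not advanced: $after"
```

The %Cmpl column only shows whole percent, so a slow rebuild can look stalled over a short interval; a long gap between samples is the safer choice.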

Fdisk doesn’t always work properly on disks over 2TB, because the MBR partition table it writes cannot describe anything larger. Our new servers have been coming with 3TB RAIDs, so we use parted with a GPT label instead. Here are the steps for setting up the disk.

[root@server ~]# parted /dev/sda
GNU Parted 1.6.19
Copyright (C) 1998 - 2004 Free Software Foundation, Inc.
This program is free software, covered by the GNU General Public License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

Using /dev/sda
(parted) print                                                            
Disk geometry for /dev/sda: 0.000-2860962.000 megabytes
Disk label type: gpt
Minor    Start       End     Filesystem  Name                  Flags
1          0.017   3000.000  ext3                              
(parted) mklabel gpt
(parted) mkpart primary 0 -0                                              
(parted) quit                                                             
Information: Don't forget to update /etc/fstab, if necessary.             

[root@server ~]# mkfs.ext3 /dev/sda1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
366215168 inodes, 732406263 blocks
36620313 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=734003200
22352 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
	102400000, 214990848, 512000000, 550731776, 644972544

Writing inode tables: done                            
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@cps3 ~]# tune2fs -c0 -i0 /dev/sda1
tune2fs 1.35 (28-Feb-2004)
Setting maximal mount count to -1
Setting interval between check 0 seconds

The mkpart primary 0 -0 says to use the entire disk for the partition.
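The 2TB boundary comes from the MBR format storing sector counts in 32 bits. The arithmetic can be sketched as below, assuming 512-byte sectors and assuming that this version of parted reports sizes in binary megabytes; the function names are invented for the example.

```shell
# Largest size an MBR (msdos) label can address, in bytes:
# 2^32 sectors of 512 bytes each.
mbr_limit_bytes() {
    echo $(( (1 << 32) * 512 ))
}

# Suggest a label type for a disk whose size is given in binary megabytes.
# Decimals, as printed by parted ("2860962.000"), are dropped first.
label_for_mb() {
    mb=${1%%.*}
    limit_mb=$(( $(mbr_limit_bytes) / 1024 / 1024 ))
    if [ "$mb" -gt "$limit_mb" ]; then
        echo gpt
    else
        echo msdos
    fi
}

mbr_limit_bytes            # 2199023255552 bytes, i.e. 2 TiB
label_for_mb 2860962.000   # the disk from the parted session above: gpt
```

Anything at or under the limit could still use an msdos label, which is why the smaller RAIDs never showed the problem.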

//cdfs3> info

Ctl   Model        Ports   Drives   Units   NotOpt   RRate   VRate   BBU
------------------------------------------------------------------------
c0    9650SE-8LPML 8       8        2       1        4       4       -        

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEVICE-ERROR     u1     465.76 GB   976773168     WD-WCANU1069353     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

Took out disk p6 and replaced it with a new one. Waited a minute or so and got:

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     OK               -      465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> /c0/u1 start rebuild disk=6
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Done.


//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     0      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

Now just need to wait for the rebuild to finish.

Until I find a place to store info on all the RAIDs and disks that I’ve replaced, this blog will have to do. Here is the latest replacement.

[root@pnn tw_cli]# ./tw_cli
//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307     
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824     
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579     
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766     
p4     NOT-PRESENT      -      -           -             -
p5     OK               u0     233.76 GB   490234752     WD-WCANY1787889     
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952     
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819     

//pnn> /c1 remove p4
Exporting port /c1/p4 ... Failed.

(0x0B:0x002E): Port empty
//pnn> info c1 

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307     
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824     
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579     
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766     
p4     NOT-PRESENT      -      -           -             -
p5     OK               u0     233.76 GB   490234752     WD-WCANY1787889     
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952     
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819     

//pnn> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [none].

//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307     
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824     
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579     
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766     
p4     OK               -      233.81 GB   490350672     WD-WCAT1A628774     
p5     OK               u0     233.76 GB   490234752     WD-WCANY1787889     
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952     
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819     

//pnn> /c1/u0 start rebuild disk=4
Sending rebuild start request to /c1/u0 on 1 disk(s) [4] ... Done.


//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    REBUILDING     0      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307     
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824     
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579     
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766     
p4     DEGRADED         u0     233.81 GB   490350672     WD-WCAT1A628774     
p5     OK               u0     233.76 GB   490234752     WD-WCANY1787889     
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952     
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819     

//pnn> exit
[root@pnn tw_cli]# 
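One possible way to keep a record of replaced disks outside the blog is to reduce each info listing to port and serial number and append it to a file. The sketch below assumes the port table layout shown above, with the serial number as the last column; port_serials and the log file name are hypothetical.

```shell
# Read `tw_cli info cN` output on stdin and print "port,status,unit,serial"
# for every port that has a drive in it. Rows for empty ports end in "-"
# and are skipped.
port_serials() {
    awk '$1 ~ /^p[0-9]+$/ && $NF != "-" { print $1 "," $2 "," $3 "," $NF }'
}

# Typical use, prefixing each row with date, host and controller:
#   ./tw_cli info c1 | port_serials \
#       | sed "s/^/$(date +%F),$(hostname),c1,/" >> ~/raid-disks.csv
```

Running it before and after a swap leaves both the old and the new serial number for the port in the log.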

Here are the steps we followed:
1. Make the following directories. You might have to be root to create them; if so, change the owner and group of /opt/atlas to a regular user afterwards.

mkdir -p /opt/atlas/apt/rpmdb
mkdir -p /opt/atlas/apt/rpm
chown -R user:group /opt/atlas

Exit root.

2. Get apt and install it in /opt/atlas

cd /opt/atlas
rpm -ivh --nodeps --relocate=/=$PWD/apt --dbpath=$PWD/apt/rpmdb http://atlas-computing.web.cern.ch/atlas-computing/links/reposDirectory/apt/apt-0.5_15lorg3.90-1.slc4.atlas.i386.rpm

3. Edit ~/.rpmmacros (don’t do this for the root account) and add:
%_dbpath /opt/atlas/apt/var/lib/rpm
%_rpmlock_path /opt/atlas/apt/rpm/transaction

4. Get and fix the apt config file

cd /opt/atlas
wget http://pcatd12.cern.ch/releases/download/config/apt.conf
sed "s#INSTALL_ROOT#$PWD#g" apt.conf > apt/etc/apt.conf

5. Add the sources for these files to the sources list

cd /opt/atlas/apt/etc/apt/sources.list.d
wget http://pcatd12.cern.ch/releases/download/config/atlas.list

6. Get the setup file

cd /opt/atlas/apt
wget http://pcatd12.cern.ch/releases/download/config/setup.sh

7. Source the setup file

source setup.sh

8. Update apt lists

apt-get update

9. Install the tdaq software and source (all dependencies will be installed as well)

apt-get install tdaq-01-09-01_i686_slc4_gcc34_opt
apt-get install tdaq_src

The library we are most interested in is:
/opt/atlas/tdaq/tdaq-01-09-01/installed/i686-slc4-gcc34-opt/lib/libROSslink.so

We copied this file to /usr/lib.
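Step 4 above rewrites the INSTALL_ROOT placeholder in the downloaded apt.conf. A small sketch of what that sed call does, wrapped in a hypothetical helper; the sample line in the usage comment is invented and is not taken from the real apt.conf.

```shell
# Replace every INSTALL_ROOT placeholder on stdin with the directory in $1.
# '#' is used as the sed delimiter because the replacement is a path full
# of '/' characters. A path containing '#' or '&' would need escaping.
fill_install_root() {
    sed "s#INSTALL_ROOT#$1#g"
}

# Typical use:
#   echo 'Dir "INSTALL_ROOT/apt";' | fill_install_root /opt/atlas
#   fill_install_root "$PWD" < apt.conf > apt/etc/apt.conf
```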

I found a new program called lshw. It lists the hardware in the computer, much like /etc/sysconfig/hwconf, but the part I like is that it also gives the model of the motherboard. That saves me a walk to other buildings to see what’s inside a computer.

Example showing motherboard (output is long):

*-core
       description: Motherboard
       product: P4S800
       vendor: ASUSTeK Computer INC.
       physical id: 0
       version: REV 1.xx
       serial: xxxxxxxxxxx
     *-firmware
          description: BIOS
          vendor: Award Software, Inc.
          physical id: 0
          version: ASUS P4S800 ACPI BIOS Revision 1009 (06/08/2004)
          size: 64KiB
          capacity: 192KiB
          capabilities: pci pnp apm upgrade shadowing escd cdboot bootselect socketedrom edd int13floppy360 int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer int10video acpi usb agp
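Since the full output is long, the motherboard model can be picked out with a short filter. This is a sketch that assumes the layout shown above, where the board is the *-core node and its model is on the product: line; board_product is a made-up name.

```shell
# Print the "product:" value of the *-core (motherboard) node from lshw
# output on stdin. Product lines belonging to other nodes are ignored.
board_product() {
    awk '
        /^[ \t]*\*-/ { in_core = ($1 == "*-core") }   # a new node starts here
        in_core && $1 == "product:" {
            sub(/^[ \t]*product:[ \t]*/, "")          # keep only the value
            print
            exit
        }
    '
}

# Typical use (lshw reports more when run as root):
#   lshw 2>/dev/null | board_product
```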

gtf 1440 900 60 -x

The first two numbers are the resolution of the monitor and the third is the refresh rate in Hz. In this case, it’s for a Dell 1908WFP. I got the numbers by looking at the monitor info in the menu that pops up on the monitor.

Put the output (a Modeline) in the Monitor section of /etc/X11/xorg.conf and add the resolution to the list of modes near the bottom of the file.
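The name to add to the list of modes is the quoted string that follows the word Modeline in the gtf output. A rough sketch for pulling it out and building the Modes line; both function names are invented, and the existing mode in the usage comment is only a placeholder for whatever is already in your xorg.conf.

```shell
# Print the quoted mode name from gtf output on stdin.
modeline_name() {
    awk '$1 == "Modeline" { print $2; exit }'
}

# Print a Modes line with the new mode first.
# $1: quoted mode name; remaining arguments: modes already listed.
modes_line() {
    echo "Modes $*"
}

# Typical use:
#   name=$(gtf 1440 900 60 -x | modeline_name)
#   modes_line "$name" '"1280x1024"'
```

The Modeline itself still has to be pasted into the Monitor section by hand; this only helps keep the name in the Modes list identical to the one gtf generated.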