/c2/p2 show all
.
.
.
The light on the drive corresponding to this ID flashes when this command is run, which helps identify which drive is which. This is especially useful because /cx/px set identify doesn’t work on our RAID card (9650SE-8LPML).
Archive for the ‘Hardware’ Category
Our RHEL5 monitoring station, which drives 10 monitors, has been crashing. It turns out that the standard kernel we were using could not address all of the RAM in the system (currently 4GB). After installing and booting the PAE (Physical Address Extension) kernel, we can now use all of it.
Old kernel and ram:
[~]# more mem1.txt
Linux monstation 2.6.18-164.2.1.el5 #1 SMP Mon Sep 21 04:37:51 EDT 2009 i686 i686 i386 GNU/Linux
             total       used       free     shared    buffers     cached
Mem:          2265        938       1327          0        191        605
-/+ buffers/cache:        141       2124
Swap:         1027          0       1027
Total:        3293        938       2355
New kernel and ram:
[~]# more mem-PAE.txt
Linux monstation 2.6.18-164.2.1.el5PAE #1 SMP Mon Sep 21 04:45:05 EDT 2009 i686 i686 i386 GNU/Linux
             total       used       free     shared    buffers     cached
Mem:          4043        837       3205          0         65        569
-/+ buffers/cache:        202       3841
Swap:         1027          0       1027
Total:        5071        837       4233
I’ve ordered another 4GB of RAM for this system, so we’ll have it maxed out at 8GB.
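Before switching kernels on another 32-bit box, it may be worth checking that the CPU advertises the pae flag and how much memory the running kernel actually sees. The helpers below are only a sketch and not what we ran at the time; the function names has_pae_flag and mem_total_mb are made up for illustration, and each takes an optional file argument so it can be pointed at a saved copy of /proc/cpuinfo or /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original post).

# Succeeds if the first "flags" line in the given cpuinfo file lists "pae".
has_pae_flag() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" | grep -qw pae
}

# Prints MemTotal from the given meminfo file, converted from kB to MB.
mem_total_mb() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { print int($2 / 1024) }' "$meminfo"
}
```

Running "has_pae_flag && echo PAE capable" and "mem_total_mb" before and after the kernel change gives the same comparison as the two free listings above.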
My problem from the other day was a RAID rebuild that hit ECC errors on a different disk than the one being rebuilt. I did a rescan and the ECC errors went away, but the rebuild seemed to be stuck. I contacted 3ware, the maker of our RAID card, and was told to do this:
//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288

//cdfs3> maint remove c0 p6
Exporting port /c0/p6 ... Done.

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     NOT-PRESENT      -      -           -             -
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288

//cdfs3> rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p6].

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     OK               -      465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288

//cdfs3> /c0/u1 start rebuild disk=6 ignoreecc
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Done.

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     0      64K     1396.95   OFF    OFF      ON

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
This seems to be working. I guess I’ll know in a few hours if everything is ok.
If this still doesn’t work, I’m supposed to send 3ware an error log.
./tw_cli info c0 diag > error.txt
The easy raid rebuild from yesterday turned out to not be so easy. Checking today, I got this message:
//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     ECC-ERROR        u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
This does not look good at all, but the disk seems to be ok. So I tried this:
//cdfs3> /c0 rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [none].

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
That’s good. I know disk p6 is good because it was just replaced. What I don’t know is whether the rebuild will continue on its own or whether I need to restart it. I’ll leave it for a while to see if things start working again. If it stays stuck at 89%, I’ll run the rebuild command again.
I tried to reissue the command and got the following:
//cdfs3> /c0/u1 start rebuild disk=6
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Failed.
(0x0B:0x0032): Unit is rebuilding

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
So I think I’ll just have to leave it and hope that it finishes.
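Rather than rerunning info c0 by hand to see whether the rebuild has moved past 89%, the status line could be pulled out with awk and logged on a timer. This is only a sketch: unit_status and watch_rebuild are names I made up, it assumes tw_cli is on the PATH and accepts "info c0" as command-line arguments (as in the diag example earlier), and the 10-minute default interval is arbitrary.

```shell
#!/bin/sh
# Hypothetical helpers; names and the polling interval are assumptions.

# Reads "info cN" output on stdin and prints "<status> <%cmpl>" for one unit.
# Port rows start with p0, p1, ... so only the unit row matches $1.
unit_status() {
    unit="$1"
    awk -v u="$unit" '$1 == u { print $3, $4 }'
}

# Logs the unit status every <interval> seconds until it stops REBUILDING.
# Not called here; usage: watch_rebuild c0 u1 600
watch_rebuild() {
    ctl="$1"; unit="$2"; interval="${3:-600}"
    while :; do
        # Intentional word splitting: $1 = status, $2 = percent complete.
        set -- $(tw_cli info "$ctl" | unit_status "$unit")
        echo "$(date '+%F %T') $unit ${1:-unknown} ${2:-}"
        [ "${1:-}" = "REBUILDING" ] || break
        sleep "$interval"
    done
}
```

If the percentage in the log never changes between samples, that is a reasonable hint the rebuild really is stuck rather than just slow.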
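Fdisk doesn’t always work properly on disks over 2TB, because the MBR partition tables it writes can’t describe anything larger. Our new servers come with 3TB RAIDs, so we set them up with parted and a GPT disk label instead. Here are the instructions for setting up the disk.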
[root@server ~]# parted /dev/sda
GNU Parted 1.6.19
Copyright (C) 1998 - 2004 Free Software Foundation, Inc.
This program is free software, covered by the GNU General Public License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

Using /dev/sda
(parted) print
Disk geometry for /dev/sda: 0.000-2860962.000 megabytes
Disk label type: gpt
Minor    Start       End     Filesystem  Name                  Flags
1          0.017   3000.000  ext3
(parted) mklabel gpt
(parted) mkpart primary 0 -0
(parted) quit
Information: Don't forget to update /etc/fstab, if necessary.

[root@server ~]# mkfs.ext3 /dev/sda1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
366215168 inodes, 732406263 blocks
36620313 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=734003200
22352 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848, 512000000, 550731776, 644972544

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

[root@cps3 ~]# tune2fs -c0 -i0 /dev/sda1
tune2fs 1.35 (28-Feb-2004)
Setting maximal mount count to -1
Setting interval between check 0 seconds
The mkpart primary 0 -0 command creates a partition that runs from the start of the disk (0) to the end (-0), so the partition uses the entire disk.
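The 2TB figure follows from the MBR format, which stores sector counts in 32 bits; with 512-byte sectors that works out to 2^32 * 512 bytes. The snippet below just does that arithmetic and compares a disk size against it. It is a sketch under that 512-byte-sector assumption, and needs_gpt is a made-up name.

```shell
#!/bin/sh
# Largest size an MBR (msdos) label can describe with 512-byte sectors:
# 2^32 sectors * 512 bytes = 2199023255552 bytes (2 TiB).
MBR_LIMIT_BYTES=$(( 4294967296 * 512 ))

# Hypothetical helper: succeeds if a disk of the given size in bytes
# is too large for an MBR label and so needs GPT.
needs_gpt() {
    [ "$1" -gt "$MBR_LIMIT_BYTES" ]
}
```

For the disk above, parted reports 2860962 megabytes, so "needs_gpt $(( 2860962 * 1024 * 1024 ))" succeeds, while a 500GB disk would not.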
//cdfs3> info

Ctl   Model        Ports   Drives   Units   NotOpt   RRate   VRate   BBU
------------------------------------------------------------------------
c0    9650SE-8LPML 8       8        2       1        4       4       -

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEVICE-ERROR     u1     465.76 GB   976773168     WD-WCANU1069353
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
I took out disk p6 and replaced it with a new one. After waiting a minute or so, I got:
//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     OK               -      465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288

//cdfs3> /c0/u1 start rebuild disk=6
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Done.

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     0      64K     1396.95   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288

Now I just need to wait for the rebuild to finish.
Until I find a place to store info on all the raids and disks that I’ve replaced, this blog will have to do. So here is the latest replacement.
[root@pnn tw_cli]# ./tw_cli
//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766
p4     NOT-PRESENT      -      -           -             -
p5     OK               u0     233.76 GB   490234752     WD-WCANY1787889
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819

//pnn> /c1 remove p4
Exporting port /c1/p4 ... Failed.
(0x0B:0x002E): Port empty

//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766
p4     NOT-PRESENT      -      -           -             -
p5     OK               u0     233.76 GB   490234752     WD-WCANY1787889
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819

//pnn> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [none].

//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766
p4     OK               -      233.81 GB   490350672     WD-WCAT1A628774
p5     OK               u0     233.76 GB   490234752     WD-WCANY1787889
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819

//pnn> /c1/u0 start rebuild disk=4
Sending rebuild start request to /c1/u0 on 1 disk(s) [4] ... Done.

//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    REBUILDING     0      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766
p4     DEGRADED         u0     233.81 GB   490350672     WD-WCAT1A628774
p5     OK               u0     233.76 GB   490234752     WD-WCANY1787889
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819

//pnn> exit
[root@pnn tw_cli]#
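Since the point of these posts is to keep a record of which disks sit in which ports, the port table could be turned into CSV lines and appended to a log file instead. The function below is a sketch of that idea, not something we have in place: port_inventory, the column order, and the log file name in the usage note are all my own inventions, and it assumes the port rows look like the ones above (port first, serial last).

```shell
#!/bin/sh
# Hypothetical helper: reads "info cN" output on stdin and prints one CSV
# line per port: date,host,controller,port,status,unit,serial
# The third argument (date stamp) defaults to today.
port_inventory() {
    host="$1"; ctl="$2"; stamp="${3:-$(date +%F)}"
    awk -v h="$host" -v c="$ctl" -v d="$stamp" '
        # Port rows start with p0, p1, ...; the serial is the last field
        # ("-" when the port is empty).
        $1 ~ /^p[0-9]+$/ { print d "," h "," c "," $1 "," $2 "," $3 "," $NF }
    '
}
```

Something like "./tw_cli info c1 | port_inventory pnn c1 >> raid-disks.csv" after each replacement would keep a running history of serial numbers per port.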
Here are the steps we did:
1. Make the following directories. You might have to be root to make them; if so, change the owner and group on /opt/atlas to a regular user afterwards.
mkdir -p /opt/atlas/apt/rpmdb
mkdir -p /opt/atlas/apt/rpm
chown -R user:group /opt/atlas
Exit root.
2. Get apt and install in /opt/atlas
cd /opt/atlas
rpm -ivh --nodeps --relocate=/=$PWD/apt --dbpath=$PWD/apt/rpmdb http://atlas-computing.web.cern.ch/atlas-computing/links/reposDirectory/apt/apt-0.5_15lorg3.90-1.slc4.atlas.i386.rpm
3. Edit ~/.rpmmacros as the regular user (don’t do this in root’s account) and add:
%_dbpath /opt/atlas/apt/var/lib/rpm
%_rpmlock_path /opt/atlas/apt/rpm/transaction
4. Get and fix the apt config file
cd /opt/atlas
wget http://pcatd12.cern.ch/releases/download/config/apt.conf
sed s#INSTALL_ROOT#$PWD#g apt.conf > apt/etc/apt.conf
5. Add the sources for these files to the sources list
cd /opt/atlas/apt/etc/apt/sources.list.d
wget http://pcatd12.cern.ch/releases/download/config/atlas.list
6. Get the setup file
cd /opt/atlas/apt
wget http://pcatd12.cern.ch/releases/download/config/setup.sh
7. Source the setup file
source setup.sh
8. Update apt lists
apt-get update
9. Install the tdaq software and source (all dependencies will be installed as well)
apt-get install tdaq-01-09-01_i686_slc4_gcc34_opt
apt-get install tdaq_src
The library we are most interested in is:
/opt/atlas/tdaq/tdaq-01-09-01/installed/i686-slc4-gcc34-opt/lib/libROSslink.so
We copied this file to /usr/lib.
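The sed command in step 4 is the one piece of this that is easy to get wrong, so here is the same substitution wrapped in a small function that can be tried on any text first. This is a sketch: fill_install_root is a made-up name, and I have not checked what the downloaded apt.conf actually contains beyond the INSTALL_ROOT placeholder.

```shell
#!/bin/sh
# Hypothetical helper: replaces every INSTALL_ROOT placeholder on stdin
# with the given directory. Using # as the sed delimiter means the slashes
# in the path need no escaping (the path itself must not contain # or &).
fill_install_root() {
    root="$1"
    sed "s#INSTALL_ROOT#${root}#g"
}
```

With that defined, step 4 becomes "fill_install_root /opt/atlas < apt.conf > apt/etc/apt.conf", which no longer depends on the current directory being /opt/atlas.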
I found a new program called lshw. It lists the hardware in the computer, much like /etc/sysconfig/hwconf, but the part I like is that it also gives the model of the motherboard. That saves me walking to other buildings to see what’s inside a computer.
Example showing motherboard (output is long):
*-core
     description: Motherboard
     product: P4S800
     vendor: ASUSTeK Computer INC.
     physical id: 0
     version: REV 1.xx
     serial: xxxxxxxxxxx
   *-firmware
        description: BIOS
        vendor: Award Software, Inc.
        physical id: 0
        version: ASUS P4S800 ACPI BIOS Revision 1009 (06/08/2004)
        size: 64KiB
        capacity: 192KiB
        capabilities: pci pnp apm upgrade shadowing escd cdboot bootselect socketedrom edd int13floppy360 int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer int10video acpi usb agp
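Because the full output is long, the motherboard lines can be picked out with a little awk. This is a rough sketch that assumes lshw prints each field on its own line under a "*-core" heading, as it does on our machines; motherboard_model is a name I made up.

```shell
#!/bin/sh
# Hypothetical helper: reads lshw output on stdin and prints
# "<vendor> <product>" taken from the *-core (motherboard) section.
motherboard_model() {
    awk '
        # Any "*-name" line starts a new section; remember if it is *-core.
        substr($1, 1, 2) == "*-" { in_core = ($1 == "*-core"); next }
        in_core && $1 == "vendor:"  { sub(/^[ \t]*vendor:[ \t]*/, "");  vendor = $0 }
        in_core && $1 == "product:" { sub(/^[ \t]*product:[ \t]*/, ""); product = $0 }
        END { if (product != "") print vendor " " product }
    '
}
```

Run as "lshw | motherboard_model" (lshw generally wants root to see everything), this gives a one-line answer that is easy to collect from several machines over ssh.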
gtf 1440 900 60 -x
The numbers are the monitor’s resolution (1440 by 900) and refresh rate (60 Hz). In this case, it’s for a Dell 1908WFP. I got the numbers from the monitor info in the menu that pops up on the monitor.
Put the output in /etc/X11/xorg.conf and add the resolution to the list of resolutions near the bottom of the file.
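The name to add to the list of resolutions is the quoted string on the Modeline that gtf prints. A small filter can pull it out so it doesn’t have to be retyped by hand. This is a sketch that assumes the output contains a line beginning with the word Modeline followed by the quoted mode name; modeline_name is a made-up function name.

```shell
#!/bin/sh
# Hypothetical helper: reads gtf -x output on stdin and prints the mode
# name, i.e. the quoted second field of the Modeline, without the quotes.
modeline_name() {
    awk '$1 == "Modeline" { gsub(/"/, "", $2); print $2 }'
}
```

For example, "gtf 1440 900 60 -x | modeline_name" prints the name that goes in the resolution list, while the full Modeline itself goes in the monitor section of xorg.conf.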