Archive for August, 2009

My problem from the other day was rebuilding a RAID when I got ECC errors on a different disk than the one being rebuilt. I did a rescan and the ECC errors went away, but the rebuild seemed to be stuck. I contacted 3ware, the maker of our RAID card, and was told to do this:

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> maint remove c0 p6
Exporting port /c0/p6 ... Done.

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     NOT-PRESENT      -      -           -             -
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p6].

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     OK               -      465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> /c0/u1 start rebuild disk=6 ignoreecc
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Done.


//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     0      64K     1396.95   OFF    OFF      ON       

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

This seems to be working. I guess I’ll know in a few hours if everything is ok.

If this still doesn’t work, I’m supposed to send 3ware a diagnostic log, generated like this:

./tw_cli info c0 diag > error.txt

The easy RAID rebuild from yesterday turned out to be not so easy. Checking today, I got this status:

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     ECC-ERROR        u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288   

This does not look good at all, but the disk seems to be ok. So I tried this:

//cdfs3> /c0 rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [none].

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

That’s good: the ECC error on p5 has cleared. I know the disk in p6 is good because it was just replaced. What I don’t know is whether the rebuild will continue on its own or whether I need to restart it. I’ll leave it for a while to see if it starts moving again. If it stays stuck at 89%, I’ll run the rebuild command again.
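To tell a stalled rebuild from a slow one, it can help to compare the %Cmpl column across a few saved copies of the `info c0` output. The following is only a rough sketch: the function names `parse_units` and `looks_stuck` are made up for illustration, and the regular expression assumes the column layout shown in the listings above.

```python
import re

# Matches unit rows such as:
# "u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF"
# (assumes the column layout of the listings in this post)
UNIT_ROW = re.compile(
    r"^(u\d+)[ \t]+(\S+)[ \t]+(\S+)[ \t]+(\d+|-)[ \t]", re.MULTILINE
)

def parse_units(info_text):
    """Return {unit: (status, percent_or_None)} from `info c0` output."""
    units = {}
    for name, _unit_type, status, pct in UNIT_ROW.findall(info_text):
        units[name] = (status, None if pct == "-" else int(pct))
    return units

def looks_stuck(samples, unit):
    """True if `unit` is REBUILDING with an identical %Cmpl in every sample.

    `samples` is a list of `info c0` outputs captured some time apart.
    """
    readings = [parse_units(text).get(unit) for text in samples]
    if len(readings) < 2 or None in readings:
        return False
    all_rebuilding = all(status == "REBUILDING" for status, _ in readings)
    same_percent = len({pct for _, pct in readings}) == 1
    return all_rebuilding and same_percent

SAMPLE = """\
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF
"""

print(parse_units(SAMPLE))
print(looks_stuck([SAMPLE, SAMPLE], "u1"))
```

How far apart to take the samples is a judgment call. A rebuild of a unit this size takes hours, so two readings a few minutes apart showing the same percentage do not prove much.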

I tried to reissue the command and got the following:

//cdfs3> /c0/u1 start rebuild disk=6
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Failed.

(0x0B:0x0032): Unit is rebuilding

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

So I think I’ll just have to leave it and hope that it finishes.

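Fdisk doesn’t work properly on disks over 2TB: it writes an MBR partition table, which cannot address more than 2TiB with 512-byte sectors. In our new servers, we’ve been having 3TB RAIDs installed, so we use parted with a GPT label instead. Here are the instructions for setting up the disk.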

[root@server ~]# parted /dev/sda
GNU Parted 1.6.19
Copyright (C) 1998 - 2004 Free Software Foundation, Inc.
This program is free software, covered by the GNU General Public License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

Using /dev/sda
(parted) print                                                            
Disk geometry for /dev/sda: 0.000-2860962.000 megabytes
Disk label type: gpt
Minor    Start       End     Filesystem  Name                  Flags
1          0.017   3000.000  ext3                              
(parted) mklabel gpt
(parted) mkpart primary 0 -0                                              
(parted) quit                                                             
Information: Don't forget to update /etc/fstab, if necessary.             

[root@server ~]# mkfs.ext3 /dev/sda1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
366215168 inodes, 732406263 blocks
36620313 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=734003200
22352 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
	102400000, 214990848, 512000000, 550731776, 644972544

Writing inode tables: done                            
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@cps3 ~]# tune2fs -c0 -i0 /dev/sda1
tune2fs 1.35 (28-Feb-2004)
Setting maximal mount count to -1
Setting interval between check 0 seconds

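The mkpart primary 0 -0 command creates a single primary partition that starts at the beginning of the disk (0) and runs to the end of the disk (-0), so the partition uses the entire disk.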
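The 2TB limit and the sizes in the transcript can be checked with a little arithmetic. This sketch assumes 512-byte sectors and assumes that the "megabytes" parted prints here are units of 1024*1024 bytes, which is consistent with the block count mkfs.ext3 reports.

```python
SECTOR_BYTES = 512         # assumed logical sector size
MBR_MAX_SECTORS = 2**32    # MBR stores sector counts in 32-bit fields
MIB = 1024**2
TIB = 1024**4

mbr_limit_bytes = MBR_MAX_SECTORS * SECTOR_BYTES

# Figures copied from the transcript above.
disk_mib = 2860962         # parted: "0.000-2860962.000 megabytes"
fs_blocks = 732406263      # mkfs.ext3: "732406263 blocks"
block_bytes = 4096         # mkfs.ext3: "Block size=4096"

disk_bytes = disk_mib * MIB
fs_bytes = fs_blocks * block_bytes

print(f"MBR limit:  {mbr_limit_bytes / TIB:.2f} TiB")
print(f"Disk size:  {disk_bytes / TIB:.2f} TiB")
print(f"Filesystem: {fs_bytes / TIB:.2f} TiB")
print("Needs GPT:", disk_bytes > mbr_limit_bytes)
```

The disk comes out at roughly 2.73 TiB, well past the 2 TiB an MBR table can describe, and the filesystem fills almost all of it.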

//cdfs3> info

Ctl   Model        Ports   Drives   Units   NotOpt   RRate   VRate   BBU
------------------------------------------------------------------------
c0    9650SE-8LPML 8       8        2       1        4       4       -        

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEVICE-ERROR     u1     465.76 GB   976773168     WD-WCANU1069353     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

I took out disk p6 and replaced it with a new one. After waiting a minute or so, I got:

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     OK               -      465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> /c0/u1 start rebuild disk=6
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Done.


//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF      
u1    RAID-5    REBUILDING     0      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212     
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078     
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743     
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924     
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981     
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927     
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756     
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

Now I just need to wait for the rebuild to finish.
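Before starting a rebuild, it is worth confirming that the controller really sees the new drive on the expected port. One way is to compare the serial numbers in the port table before and after the swap. This is a minimal sketch: `parse_ports` and `replaced_ports` are hypothetical helper names, and the parsing assumes the port table layout shown in the listings above.

```python
import re

def parse_ports(info_text):
    """Return {port: {"status", "unit", "serial"}} from `info c0` output."""
    ports = {}
    for line in info_text.splitlines():
        fields = line.split()
        # Port rows start with p0, p1, ...; the serial is the last column.
        if len(fields) >= 4 and re.fullmatch(r"p\d+", fields[0]):
            ports[fields[0]] = {
                "status": fields[1],
                "unit": fields[2],
                "serial": fields[-1],
            }
    return ports

def replaced_ports(before, after):
    """Ports whose drive serial differs between two listings."""
    old, new = parse_ports(before), parse_ports(after)
    return {
        port: (old[port]["serial"], new[port]["serial"])
        for port in old
        if port in new and old[port]["serial"] != new[port]["serial"]
    }

BEFORE = """\
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEVICE-ERROR     u1     465.76 GB   976773168     WD-WCANU1069353
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
"""

AFTER = """\
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     OK               -      465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
"""

print(replaced_ports(BEFORE, AFTER))
print(parse_ports(AFTER)["p6"]["unit"])  # "-" until the drive joins a unit
```

A port that shows OK with "-" in the Unit column is a drive the controller can see but has not yet assigned, which is the state p6 was in just before the rebuild command was issued.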