//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766
p4     OK               u0     233.76 GB   490234752     WD-WCANY1796905
p5     NOT-PRESENT      -      -           -             -
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819

//pnn> /c1 remove p5
Exporting port /c1/p5 ... Failed.
(0x0B:0x002E): Port empty
Got the message above: the drive on p5 had failed and dropped off the controller entirely, so as far as tw_cli was concerned the port was already empty and there was nothing to remove. So I put a different disk in (one that had previously been part of another RAID) and got the following:
//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF
u1    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766
p4     OK               u0     233.76 GB   490234752     WD-WCANY1796905
p5     OK               u1     233.76 GB   490234752     WD-WCANY1787889
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819
Crazy. The controller had apparently read the leftover metadata on the replacement disk and now thought p5 was part of another RAID (u1) on this controller: an INOPERABLE one, since none of that unit's other member disks were present.
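As an aside, a quick way to catch units in a bad state without eyeballing the output is to parse it. Here's a minimal sketch, assuming the column layout shown above (this isn't part of any 3ware tooling, and column positions may differ between tw_cli versions):

```python
# Hedged sketch: flag non-OK units on controller c1 by parsing "tw_cli info c1".
# Assumes the column layout shown above (Unit, UnitType, Status, ...).
import subprocess

out = subprocess.run(
    ["tw_cli", "info", "c1"], capture_output=True, text=True, check=True
).stdout

for line in out.splitlines():
    fields = line.split()
    # Unit rows start with u<number>, e.g. "u0 RAID-5 DEGRADED - 64K ..."
    if fields and fields[0][0] == "u" and fields[0][1:].isdigit():
        unit, status = fields[0], fields[2]
        if status != "OK":
            print(f"unit {unit} on c1 is {status}")
```

The controller-level summary told the same story: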
//pnn> info

Ctl   Model        Ports   Drives   Units   NotOpt   RRate   VRate   BBU
------------------------------------------------------------------------
c0    9550SX-8LP   8       8        1       0        4       4       -
c1    9550SX-8LP   8       7        2       2        4       4       -
Yep, it now shows two not-optimal units on c1 (the NotOpt column). I tried rescanning, and even pulling the disk and reformatting it, but it didn't matter, presumably because a format only touches the filesystem and not the controller metadata the drive carries. The only solution was to delete that second unit, which is always nerve-wracking, for fear of deleting the RAID I wanted to keep. So, after quadruple-checking the command, I ran:
//pnn> maint deleteunit c1 u1
Deleting unit c1/u1 ...Done.

//pnn> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANY1850307
p1     OK               u0     233.76 GB   490234752     WD-WCANY1790824
p2     OK               u0     233.76 GB   490234752     WD-WCANY1851579
p3     OK               u0     233.76 GB   490234752     WD-WCANY1789766
p4     OK               u0     233.76 GB   490234752     WD-WCANY1796905
p5     OK               -      233.76 GB   490234752     WD-WCANY1787889
p6     OK               u0     233.76 GB   490234752     WD-WCANY1788952
p7     OK               u0     233.76 GB   490234752     WD-WCANY1788819

//pnn> /c1/u0 start rebuild disk=5
Sending rebuild start request to /c1/u0 on 1 disk(s) [5] ... Done.
That deleted the extra unit, leaving p5 unassigned, and let me start rebuilding the degraded unit onto disk 5.
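Rebuilds on an array this size take hours, so rather than re-running info c1 by hand, something like the following can watch for completion. Again a sketch, not gospel: it assumes the unit's Status column reads REBUILDING with a percentage in %Cmpl while the rebuild runs, and flips back to OK when it finishes (that matches the 9000-series output format, but check it against your tw_cli version):

```python
# Hedged sketch: poll "tw_cli info c1" until unit u0 reports OK again.
import subprocess
import time

def unit_state(ctl="c1", unit="u0"):
    """Return (Status, %Cmpl) for one unit, parsed from tw_cli's info output."""
    out = subprocess.run(
        ["tw_cli", "info", ctl], capture_output=True, text=True, check=True
    ).stdout
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0] == unit:
            return fields[2], fields[3]  # Status and %Cmpl columns
    raise RuntimeError(f"{unit} not found on {ctl}")

while True:
    status, pct = unit_state()
    print(f"u0: {status} ({pct})")
    if status == "OK":
        break
    time.sleep(300)  # rebuilds take hours; poll every five minutes
```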