I got an email alert this morning about a disk failure in one of our RAID arrays.

//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     DEVICE-ERROR     u0     233.76 GB   490234752     WD-WCANK2941755     
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               u0     233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

Here is a log of what I did:

//cdfs1> /c1 remove p3
Exporting port /c1/p3 ... Failed.

Drive not degraded port=3 
//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     NOT-PRESENT      -      -           -             -
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               u0     233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

I then replaced the disk with a working one.

//cdfs1> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [/c1/u0].
Found the following drive(s): [none].

//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322     
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

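That’s bad. Disk p5 dropped out of the unit as well: it still shows OK, but its Unit column now reads “-” instead of u0, and with two members missing the RAID-5 unit is INOPERABLE. I tried rescanning a few times, but p5 never rejoined u0. So I tried just rebuilding disk 3 anyway.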
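
The dropped membership is easy to miss by eye, since every port still says OK. Below is a minimal sketch of checking the listing mechanically. It assumes the column layout seen in the `info c1` output in this post, and works on pasted text rather than calling the CLI. The names `parse_ports`, `orphaned_ports`, and `raid5_state` are made up for illustration, and `raid5_state` is only the textbook single-parity rule, not the controller's actual logic.

```python
# Sketch: find drives that report OK but no longer belong to a unit.
# The column layout is assumed from the `info c1` listings in this post.

SAMPLE = """\
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
"""


def parse_ports(listing):
    """Return {port: (status, unit)} for every line that starts with pN."""
    ports = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("p") and fields[0][1:].isdigit():
            ports[fields[0]] = (fields[1], fields[2])
    return ports


def orphaned_ports(ports):
    """Ports whose drive reports OK but whose Unit column is '-'."""
    return sorted(p for p, (status, unit) in ports.items()
                  if status == "OK" and unit == "-")


def raid5_state(total_members, present_members):
    """Textbook RAID-5 rule: one missing member is survivable, two are not."""
    missing = total_members - present_members
    if missing == 0:
        return "OK"
    if missing == 1:
        return "DEGRADED"
    return "INOPERABLE"


ports = parse_ports(SAMPLE)
members = [p for p, (_, unit) in ports.items() if unit == "u0"]
print(orphaned_ports(ports))          # ['p3', 'p5']
print(raid5_state(8, len(members)))   # INOPERABLE
```

With only six of the eight members attached, the simplified rule agrees with the INOPERABLE status the controller reported. The rebuild attempt from the log follows.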

//cdfs1> /c1/u0 start rebuild disk=3
Sending rebuild start request to /c1/u0 on 1 disk(s) [3] ... Failed.

(0x0B:0x0033): Unit busy

That didn’t work either. So I tried removing disk 3 and putting it back in.

//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322     
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

//cdfs1> /c0 remove p3   <-----OOPS--This should have been /c1 remove p3
Exporting port /c0/p3 ... Done.


//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322     
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

//cdfs1> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [/c1/u0].
Found the following drive(s): [none].

//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322     
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

//cdfs1> /c1 remove p3
Exporting port /c1/p3 ... Done.


//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     NOT-PRESENT      -      -           -             -
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

//cdfs1> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [/c1/u0].
Found the following drive(s): [/c1/p3].

//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322     
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

Nope, that didn’t work either. So I decided to remove disk 5 in software (I never physically took it out of the case) and rescan.

//cdfs1> /c1 remove p5
Exporting port /c1/p5 ... Done.


//cdfs1> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [/c1/u0].
Found the following drive(s): [none].

//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322     
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               u0     233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

Ah, success. I don’t know why disk 5 got goofy all of a sudden, but I could now rebuild the new disk.

//cdfs1> /c1/u0 start rebuild disk=3
Sending rebuild start request to /c1/u0 on 1 disk(s) [3] ... Done.


//cdfs1> info c1

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    REBUILDING     0      64K     1629.74   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638     
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939     
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884     
p3     DEGRADED         u0     233.76 GB   490234752     WD-WCANY1569322     
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794     
p5     OK               u0     233.76 GB   490234752     WD-WCANY3726392     
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937     
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415     

//cdfs1> 

One more wrinkle: this computer has two RAID controllers, c0 and c1. The errors that I got were all from c1. After it had been rebuilding for a while, I went to check on it and found that disk 3 on c0 was now showing up as NOT-PRESENT, because above I had stupidly run /c0 remove p3 instead of /c1 remove p3. So I rescanned c0 and rebuilt the drive on unit u0 there.
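
A slip like the /c0 one is cheap to guard against if commands go through a small wrapper first. The following is a rough sketch, not part of the 3ware tooling: `controller_of` and `check_command` are hypothetical helpers that only inspect the command string and never run anything. It assumes commands start with the controller object, as in `/c1 remove p3` or `/c1/u0 start rebuild disk=3`.

```python
import re

# Matches the leading controller object of a command such as
# "/c1 remove p3" or "/c1/u0 start rebuild disk=3".
_CONTROLLER = re.compile(r"^/c(\d+)(?:[/\s]|$)")


def controller_of(command):
    """Return the controller number a command addresses, or None."""
    match = _CONTROLLER.match(command.strip())
    return int(match.group(1)) if match else None


def check_command(command, expected_controller):
    """Hypothetical guard: raise unless the command targets the expected controller."""
    actual = controller_of(command)
    if actual != expected_controller:
        raise ValueError(
            f"{command!r} targets controller {actual}, expected {expected_controller}"
        )
    return command


print(check_command("/c1 remove p3", 1))   # passes through unchanged
try:
    check_command("/c0 remove p3", 1)      # the typo from the log above
except ValueError as err:
    print("refused:", err)
```

Commands that do not begin with a controller object (for example `info c1`) are refused too, which seems like the safer default for a guard meant for destructive commands.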