I got an email alert this morning about a disk failure in one of our RAID arrays.
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     DEVICE-ERROR     u0     233.76 GB   490234752     WD-WCANK2941755
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               u0     233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
Here is a log of what I did:
//cdfs1> /c1 remove p3
Exporting port /c1/p3 ... Failed. Drive not degraded port=3
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     NOT-PRESENT      -      -           -             -
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               u0     233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
I then replaced the disk with a working one.
//cdfs1> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [/c1/u0].
Found the following drive(s): [none].
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
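That’s bad. Disk p5 dropped out of unit u0 as well (its Unit column now shows "-" even though its status is OK), which is why the array went from DEGRADED to INOPERABLE. I tried rescanning a few times, but that didn’t bring it back. So I tried just rebuilding disk 3 anyway.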
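A quick way to spot this kind of problem without eyeballing the table is to parse the port listing. The following is only a sketch: it assumes the `info c1` output has been captured as text with one row per line (as the CLI prints it), and the function names `parse_ports` and `ports_needing_attention` are made up for illustration, not part of any 3ware tool.

```python
import re

# Matches port rows such as "p5 OK - 233.76 GB ..." and captures
# the port name, its status, and the unit it belongs to.
PORT_LINE = re.compile(r"^(p\d+)\s+(\S+)\s+(\S+)")


def parse_ports(info_text):
    """Return (port, status, unit) tuples from the port table of an info listing."""
    rows = []
    for line in info_text.splitlines():
        match = PORT_LINE.match(line.strip())
        if match:
            rows.append(match.groups())
    return rows


def ports_needing_attention(rows):
    """Ports that are not OK, or that are OK but not assigned to any unit."""
    return [(p, s, u) for p, s, u in rows if s != "OK" or u == "-"]


# A few rows copied from the listing above (hypothetical capture).
SAMPLE = """\
Port   Status           Unit   Size        Blocks        Serial
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392
"""

if __name__ == "__main__":
    for port, status, unit in ports_needing_attention(parse_ports(SAMPLE)):
        print(port, status, unit)
```

On the listing above this would flag both p3 (the new, not yet rebuilt disk) and p5 (the one that unexpectedly fell out of the unit).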
//cdfs1> /c1/u0 start rebuild disk=3
Sending rebuild start request to /c1/u0 on 1 disk(s) [3] ... Failed. (0x0B:0x0033): Unit busy
That didn’t work either. So I tried removing disk 3 and putting it back in.
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
//cdfs1> /c0 remove p3 <-----OOPS--This should have been /c1 remove p3
Exporting port /c0/p3 ... Done.
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
//cdfs1> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [/c1/u0].
Found the following drive(s): [none].
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
//cdfs1> /c1 remove p3
Exporting port /c1/p3 ... Done.
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     NOT-PRESENT      -      -           -             -
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
//cdfs1> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [/c1/u0].
Found the following drive(s): [/c1/p3].
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    INOPERABLE     -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               -      233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
Nope, that didn’t work either. So I decided to remove disk 5 in software (I never physically took it out of the case) and rescan.
//cdfs1> /c1 remove p5
Exporting port /c1/p5 ... Done.
//cdfs1> /c1 rescan
Rescanning controller /c1 for units and drives ...Done.
Found the following unit(s): [/c1/u0].
Found the following drive(s): [none].
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    DEGRADED       -      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     OK               -      233.76 GB   490234752     WD-WCANY1569322
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               u0     233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
Ah, success. I don’t know why disk 5 got goofy all of a sudden, but I could now rebuild the new disk.
//cdfs1> /c1/u0 start rebuild disk=3
Sending rebuild start request to /c1/u0 on 1 disk(s) [3] ... Done.
//cdfs1> info c1
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    REBUILDING     0      64K     1629.74   OFF    OFF      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     233.76 GB   490234752     WD-WCANK2922638
p1     OK               u0     233.76 GB   490234752     WD-WCANK2785939
p2     OK               u0     233.76 GB   490234752     WD-WCANK2785884
p3     DEGRADED         u0     233.76 GB   490234752     WD-WCANY1569322
p4     OK               u0     233.76 GB   490234752     WD-WCANK2922794
p5     OK               u0     233.76 GB   490234752     WD-WCANY3726392
p6     OK               u0     233.76 GB   490234752     WD-WCANK2785937
p7     OK               u0     233.76 GB   490234752     WD-WCANK2941415
//cdfs1>
What made this weirder is that this computer has two RAID controllers, c0 and c1, and all the errors I got were from c1. After the rebuild had been running a while, I went to check on it and found that disk 3 on c0 was now showing up as NOT-PRESENT because, stupidly, I had run /c0 remove p3 above instead of /c1 remove p3. So I rescanned c0 and rebuilt that drive on its unit u0 as well.
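With two controllers in one box, a small guard in front of any scripted command would have caught the /c0 typo. This is only a sketch of the idea: `controller_of` and `check_target` are hypothetical helpers, and they only inspect the command string; they do not run anything.

```python
import re


def controller_of(command):
    """Return the controller a command addresses (e.g. 'c1'), or None."""
    match = re.match(r"\s*/(c\d+)\b", command)
    return match.group(1) if match else None


def check_target(command, expected):
    """Return the command unchanged, or raise if it targets another controller."""
    actual = controller_of(command)
    if actual != expected:
        raise ValueError(
            f"command targets {actual!r}, expected {expected!r}: {command}"
        )
    return command


if __name__ == "__main__":
    check_target("/c1/u0 start rebuild disk=3", "c1")  # passes silently
    try:
        check_target("/c0 remove p3", "c1")
    except ValueError as err:
        print("refused:", err)
```

The point of the design is that the intended controller is stated once, up front, and every later command is checked against it instead of relying on typing the right digit each time.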