My problem the other day came up while rebuilding a RAID: I got ECC errors on a different disk than the one being rebuilt. I did a rescan and the ECC errors went away, but the rebuild seemed to be stuck. I contacted 3ware, the makers of our RAID card, and was told to do this:
//cdfs3> info c0

Unit  UnitType  Status      %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK          -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING  89     64K     1396.95   OFF    OFF      OFF

Port  Status       Unit  Size       Blocks     Serial
---------------------------------------------------------------
p0    OK           u0    465.76 GB  976773168  WD-WCANU1137212
p1    OK           u0    465.76 GB  976773168  WD-WCANU1090078
p2    OK           u0    465.76 GB  976773168  WD-WCANU1119743
p3    OK           u0    465.76 GB  976773168  WD-WCANU1089924
p4    OK           u1    465.76 GB  976773168  WD-WCANU1136981
p5    OK           u1    465.76 GB  976773168  WD-WCANU1109927
p6    DEGRADED     u1    465.76 GB  976773168  WD-WCAPW5103756
p7    OK           u1    465.76 GB  976773168  WD-WCANU1125288

//cdfs3> maint remove c0 p6
Exporting port /c0/p6 ... Done.

//cdfs3> info c0

Unit  UnitType  Status      %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK          -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    DEGRADED    -      64K     1396.95   OFF    OFF      OFF

Port  Status       Unit  Size       Blocks     Serial
---------------------------------------------------------------
p0    OK           u0    465.76 GB  976773168  WD-WCANU1137212
p1    OK           u0    465.76 GB  976773168  WD-WCANU1090078
p2    OK           u0    465.76 GB  976773168  WD-WCANU1119743
p3    OK           u0    465.76 GB  976773168  WD-WCANU1089924
p4    OK           u1    465.76 GB  976773168  WD-WCANU1136981
p5    OK           u1    465.76 GB  976773168  WD-WCANU1109927
p6    NOT-PRESENT  -     -          -          -
p7    OK           u1    465.76 GB  976773168  WD-WCANU1125288

//cdfs3> rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p6].

//cdfs3> info c0

Unit  UnitType  Status      %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK          -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    DEGRADED    -      64K     1396.95   OFF    OFF      OFF

Port  Status       Unit  Size       Blocks     Serial
---------------------------------------------------------------
p0    OK           u0    465.76 GB  976773168  WD-WCANU1137212
p1    OK           u0    465.76 GB  976773168  WD-WCANU1090078
p2    OK           u0    465.76 GB  976773168  WD-WCANU1119743
p3    OK           u0    465.76 GB  976773168  WD-WCANU1089924
p4    OK           u1    465.76 GB  976773168  WD-WCANU1136981
p5    OK           u1    465.76 GB  976773168  WD-WCANU1109927
p6    OK           -     465.76 GB  976773168  WD-WCAPW5103756
p7    OK           u1    465.76 GB  976773168  WD-WCANU1125288

//cdfs3> /c0/u1 start rebuild disk=6 ignoreecc
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Done.

//cdfs3> info c0

Unit  UnitType  Status      %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK          -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING  0      64K     1396.95   OFF    OFF      ON

Port  Status       Unit  Size       Blocks     Serial
---------------------------------------------------------------
p0    OK           u0    465.76 GB  976773168  WD-WCANU1137212
p1    OK           u0    465.76 GB  976773168  WD-WCANU1090078
p2    OK           u0    465.76 GB  976773168  WD-WCANU1119743
p3    OK           u0    465.76 GB  976773168  WD-WCANU1089924
p4    OK           u1    465.76 GB  976773168  WD-WCANU1136981
p5    OK           u1    465.76 GB  976773168  WD-WCANU1109927
p6    DEGRADED     u1    465.76 GB  976773168  WD-WCAPW5103756
p7    OK           u1    465.76 GB  976773168  WD-WCANU1125288
This seems to be working. I guess I’ll know in a few hours if everything is OK.
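Rather than eyeballing `info c0` every so often, the unit rows could be pulled out with a small script. This is only a rough sketch, not anything 3ware supplies: the `parse_units` function and the `UNIT_ROW` pattern are my own hypothetical names, and the sketch assumes the unit rows always have the column order shown in the output above (Unit, UnitType, Status, %Cmpl, ...). The sample text is copied from the first `info c0` above.

```python
import re

# Matches a unit row of `info c0` output, for example:
#   u1    RAID-5    REBUILDING  89     64K     1396.95   OFF    OFF      OFF
# Assumption: the first four columns are Unit, UnitType, Status, %Cmpl.
UNIT_ROW = re.compile(r"^(u\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s")


def parse_units(info_text):
    """Return {unit: (status, percent_complete or None)} from `info c0` text."""
    units = {}
    for line in info_text.splitlines():
        match = UNIT_ROW.match(line.strip())
        if match:
            unit, _unit_type, status, cmpl = match.groups()
            # %Cmpl is "-" when no rebuild is running on that unit.
            units[unit] = (status, int(cmpl) if cmpl.isdigit() else None)
    return units


# Unit table copied from the first `info c0` in this post.
SAMPLE = """\
Unit  UnitType  Status      %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK          -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING  89     64K     1396.95   OFF    OFF      OFF
"""

if __name__ == "__main__":
    for unit, (status, pct) in parse_units(SAMPLE).items():
        print(unit, status, pct)
```

To use it for real, the saved output of `info c0` would be passed to `parse_units` in place of `SAMPLE`; a unit whose status stays at REBUILDING with the same percentage over several checks would be the "stuck" case I hit before.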
If this still doesn’t work, I’m supposed to send 3ware an error log.
./tw_cli info c0 diag > error.txt