Experiences with Smart Array E200i

TL;DR: Smart Array E200i should not be trusted; the HP Array Configuration Utilities are horrible. Always prefer Linux’s md.

About four years ago we purchased an ML350 G5 server with some SATA disks and a small BBWC, and put the disks into RAID 10 with a spare. Everything went peachy until about a week ago, when we had a full server lockup.

Post-lockup, there was nothing interesting in the logs (from cpqarrayd), nor any other obvious cause. The next lockup came a few days later, and this time I noticed that a spare had been activated successfully. Replacement disks were ordered and I started preparing a replacement server.

Once the replacement server was up and running, I inserted a new drive into a spare bay of the old server, which by then had fault LEDs burning on two of the five original drives. I rebooted into the offline HP Array Configuration Utility, hoping to add the new drive to the array as a spare. During the reboot, the E200i decided to forgive the failed drives and automatically began to recover. I was not sure what that meant at the time, but apparently the spare device was deactivated.

While the E200i is busy doing anything, you cannot reconfigure the array. Nor can you, for example, add a new spare through the Array Configuration Utility CD; you have to do it with hpacucli.

In the end I was unable to add the new spare, and the old spare stayed deactivated. hpacucli claimed that the array was all OK (logical drives OK, physical drives OK) but that the array was “Ready for Rebuild”. Ready for Rebuild seemed odd, as there was no clear sign of what I should do next; later I found on serverfault.com that replacing a drive was required, because the E200i had postponed the rebuild after a previous rebuild had failed. I pulled out the drive which had first received a red light and replaced it with a new one. The rebuild failed somewhere between 98% and 100%, and the array went back to “Ready for Rebuild”. cpqarrayd reported a read error on the other drive which had also had a red light; by now I had more new drives, so I replaced that faulty one as well. Yes, another rebuild.

During the last rebuild I took the faulty drives, plugged them into a PC and fired off long offline SMART tests: zero errors, no SMART problems at all. (I’m guessing that when I later do the same for the rest of the drives, none will show any errors.)
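For anyone wanting to repeat the check, here is a minimal sketch of how the long self-test can be started and its log read back from Python. It assumes smartmontools (smartctl) is installed on the PC; the helper names are made up for illustration, and the device path is whatever the pulled drive shows up as on your machine.

```python
import subprocess


def smart_long_test_cmd(device):
    """Command line that starts a long (extended) offline self-test."""
    return ["smartctl", "-t", "long", device]


def smart_selftest_log_cmd(device):
    """Command line that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd):
    """Run a command and return whatever it printed to stdout.

    smartctl normally needs root to talk to the drive.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout
```

Something like run(smart_long_test_cmd("/dev/sdb")) starts the test, which on a large SATA drive can take hours; run(smart_selftest_log_cmd("/dev/sdb")) shows the result once it has finished.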

Going back to the server, it now showed it was again ready for rebuild. No obvious reason had been reported for the rebuild failure. I was rather pissed at this point at reading that everything was OK, except that the array could use a rebuild. Apparently the E200i had absolutely no idea which drives were OK and which were not. The Array Diagnostics Utility showed errors even for the new drives, reported as a horrible SCSI command and sense code table. All of those errors were related to hot-swapping them in.

It did look, though, like the drives which never had any red lights also had unrecoverable read errors, which to me sounds like a reason to mark a drive as failed.

Later I got frustrated, pulled one of the never-failed drives from an all-“OK” logical drive, and watched the whole array unrecoverably fail.

So, does anyone know of a reliable 8-channel SATA PCI-E x4 HBA card, so that I could continue using the server and some of the disks through Linux’s md software RAID?
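Part of the appeal of md is that its state is plain text in /proc/mdstat, where a degraded array shows an underscore in its member status (for example [U_UU] instead of [UUUU]). As a rough sketch of how little it takes to check that yourself, assuming the usual /proc/mdstat layout (the function name is made up for illustration):

```python
import re

# Matches the "[4/3] [U_UU]" part of an array's status line.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Matches the "md0 : active raid10 ..." header line of an array.
HEADER_RE = re.compile(r"^(md\S*)\s*:")


def degraded_arrays(mdstat_text):
    """Return {array_name: status_string} for arrays with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status and "_" in status.group(3):
            degraded[current] = status.group(3)
    return degraded
```

On a live box you would feed it open("/proc/mdstat").read(); an empty dict means every array has all of its members. This is only a sketch for eyeballing the idea, mdadm has its own monitoring mode for real use.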

4 Comments

  1. Tony C
    Posted 2012-11-30 at 19:47

    Did you ever find a good replacement? I’m using the same card and am getting ready to migrate from VMware Server to ESXi, and it would be as good a time as any to just blow the whole thing away.

  2. Lars Dam
    Posted 2012-12-13 at 12:48

    We are experiencing a similar problem with an E200i in a Blade 460c.
    First the RAID fails while the HDs are OK and the rebuild never seems to finish correctly; then the controller throws a lot of POST errors and fails; then the GRUB loader on the RAID array is corrupted; then the controller fails at the hardware level with cmd 0h, err00h, dlu error; then the HDs report SMART errors.

    WHAT IS IT!?

    Anyway. We are now in the process of updating everything to the latest firmware. But now the Offline ACU CLI does not recognize the E200i!?

  3. Posted 2013-01-20 at 15:12

    Hah, I’m the lucky one: Debian 5.x (or so) didn’t recognize the RAID array, so I stuck with mdadm, and it has been running fine on SATA disks for a few years now. I’m running at least 2 servers with the E200i and happily using mdadm; one of the drives failed by hanging, but mdadm kicked it out of the array and the server booted up fine.
    However, for your problem there is a firmware update from 01/26/2009:
    http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=pl&cc=pl&taskId=130&prodSeriesId=3201178&prodTypeId=15351&objectID=c01318999
    ALWAYS do full backup before proceeding! Good luck.

  4. Posted 2014-09-08 at 09:55

    Your style is very unique in comparison to other people I have read stuff from.

    I appreciate you posting when you have the opportunity. Guess I’ll just bookmark this site.
