RamSan 500 Pictures / Failure
We had two DRAM cache modules fail in our RamSan 500 recently. The actual error was:
Uncorrected ECC event detected on boards 4,6
The DRAM cache fronts the Flash storage drives in the unit. Basically, losing these two boards took down the unit completely. To get the unit back up, boards 4 and 6 needed to be removed. RamSan actually had me remove four of the boards instead of just the two that were bad. I'm guessing this is because empty slots aren't allowed between active modules.
Doing so took the DRAM cache from 64 GB down to 32 GB. All data on the Flash RAID array was lost, forcing us to restore the Oracle database from backup. After the cards were removed and a low-level format of the RAID array was done, the unit was operational again. Below are pictures from within the RamSan 500 for all the storage people out there.
- RamSan Percentage Error. What?
- Internal Batteries and Power Module
- DRAM Cache (left) and Fiber Cards (right)
- Internal Zoomed Out
- Cache With Cards Removed
- Cache DRAM module
The best part of this is that we had already racked two RamSan 630s that were being configured as a mirrored pair. The plan was to implement them in two weeks. When the 500 failed, they were implemented that night.

Any idea on the root cause of failure? The data loss is concerning.
steve said this on March 9, 2011 at 1:53 pm
Too bad there are no storage vendors that leverage flash drives in their arrays. Maybe one day.
Kaiser Sose said this on March 9, 2011 at 2:49 pm
Of course EMC has EFD support in their CX series arrays, but they come nowhere close to the 250,000 IOPS range of the RamSan devices (within a reasonable price range). Now that the VNX series is out, we will definitely be investigating their offering more, since bus speed has been improved.
Kevin Goodman said this on March 9, 2011 at 5:09 pm