By Robin Harris
news.zdnet.com
February 22, 2010
Three years ago I warned that RAID 5 would stop working in 2009. Sure enough, no enterprise storage vendor now recommends RAID 5.
They now recommend RAID 6, which protects against two drive failures. But in 2019 even RAID 6 won’t protect your data. Here’s why.
The power of power functions
I said that even RAID 6 would have a limited lifetime.
Late last year Adam Leventhal, a Sun engineer, DTrace co-inventor, flash architect and ZFS developer, did the heavy lifting to analyze the expected life of RAID 6 as a viable data protection strategy. He lays it out in the Association for Computing Machinery’s Queue magazine, in the article Triple-Parity RAID and Beyond, which I draw from for much of this post.
The good news: Mr. Leventhal found that RAID 6 protection levels will be as good as RAID 5 was until 2019.
The bad news: Mr. Leventhal assumed that drives are more reliable than they really are. The lead time may be shorter unless drive vendors get their game on. More good news: one of them already has - and I’ll tell you who that is.
The crux of the problem
RAID arrays are groups of disks with special logic in the controller that stores the data with extra bits so the loss of 1 or 2 disks won’t destroy the information (I’m speaking of RAID levels 5 and 6, not 0, 1 or 10). The extra bits - parity - enable the lost data to be reconstructed by reading all the data off the remaining disks and writing to a replacement disk.
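To make the parity idea concrete, here is a minimal sketch of single-parity (RAID 5 style) reconstruction using XOR. It is a simplification under stated assumptions: the block contents and the xor_blocks helper are hypothetical, and a real controller works on stripes and rotates parity across the disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data disk.
data = [b"RAID", b"five", b"demo"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild its block by XORing
# every surviving data block with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Note that the rebuild has to read every surviving block, which is why a single unreadable sector on any remaining disk matters so much.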
The problem with RAID 5 is that disk drives have read errors. SATA drives are commonly specified with an unrecoverable read error (URE) rate of 1 in 10^14 bits read. Which means that roughly once every 24 billion 512-byte sectors, the disk will not be able to read a sector.
24 billion sectors is about 12 terabytes. When a drive fails in a 7 drive RAID 5 built from 2 TB SATA disks, you’ll have 6 remaining 2 TB drives, or 12 TB to read. As the RAID controller is reconstructing the data it is very likely it will see a URE. At that point the RAID reconstruction stops.
Here’s the math for the chance of reading all of those sectors without an error:
(1 - 1/(2.4 x 10^10)) ^ (2.3 x 10^10) = 0.3835
You have a 62% chance of data loss due to an uncorrectable read error on a 7 drive RAID with one failed disk, assuming a 10^14 read error rate and ~23 billion sectors in 12 TB. Feeling lucky?
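If you want to try other array sizes, the same arithmetic can be sketched in a few lines of Python. This is a rough model, not a vendor formula: it assumes 512-byte sectors, decimal terabytes, and independent read errors at exactly the specified rate, and the function name and parameters are my own for illustration.

```python
import math

def rebuild_loss_probability(surviving_drives, drive_tb,
                             ure_bits=1e14, sector_bytes=512):
    """Chance of hitting at least one URE while reading every surviving drive."""
    # Sectors read, on average, per unrecoverable read error.
    sectors_per_ure = ure_bits / 8 / sector_bytes
    # Sectors that must be read to rebuild the failed drive.
    sectors_to_read = surviving_drives * drive_tb * 1e12 / sector_bytes
    # Probability that every single sector reads cleanly; log1p keeps
    # precision when the per-sector error probability is tiny.
    p_clean = math.exp(sectors_to_read * math.log1p(-1 / sectors_per_ure))
    return 1 - p_clean

# The 7 drive, 2 TB example: 6 surviving drives after one failure.
loss = rebuild_loss_probability(surviving_drives=6, drive_tb=2)
print(f"{loss:.2f}")
```

With unrounded inputs this lands at roughly 0.62, in line with the figure above. Pass ure_bits=1e15 to see how much a better-specified drive helps.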
RAID 6
RAID 6 tackles this problem by creating enough parity data to handle 2 failures. You can lose a disk and have a URE and still reconstruct your data.
Some complain about the increased overhead of 2 parity disks. But doubling the size of a RAID 5 stripe gives you dual disk protection with the same usable fraction of capacity. Instead of a 7 drive RAID 5 stripe with 1 parity disk, build a 14 drive stripe with 2 parity disks: the same share of capacity goes to parity, and you get protection against 2 failures.
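The overhead comparison is simple enough to check directly. A small sketch, with a hypothetical parity_fraction helper, using exact fractions so no rounding is involved:

```python
from fractions import Fraction

def parity_fraction(total_drives, parity_drives):
    """Share of raw capacity spent on parity in one stripe."""
    return Fraction(parity_drives, total_drives)

raid5_overhead = parity_fraction(7, 1)    # 7 drive RAID 5, 1 parity disk
raid6_overhead = parity_fraction(14, 2)   # 14 drive RAID 6, 2 parity disks

print(raid5_overhead, raid6_overhead)
```

Both come out to one seventh of the raw capacity. This only compares capacity; it says nothing about rebuild time or how a wider stripe changes the odds of a second failure.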
http://blogs.zdnet.com/storage/?p=805