Among these levels, RAID 5 and 6 have been two of the most popular ones in recent times, as they provide a combination of both performance and safety. Due to their various similarities, it can be confusing to figure out when it’s best to use RAID 5 vs RAID 6. As such, in this article, we’ll discuss exactly what these two RAID levels are, their main similarities and differences, and when to use either one.
What is RAID 5?
As stated, different RAID levels focus on data protection and performance improvement to varying degrees. RAID 5 provides both of these through block-interleaved distributed parity. This means that striping occurs at the block level. The size of these blocks, also known as chunk size, is up to the user to set, but it typically ranges from 64KB – 1MB. Additionally, for each stripe, one chunk of parity data is written. These parity blocks are spread across the array instead of being stored on a dedicated parity disk. We’ll cover why RAID 5 handles parity like this further in the article, but ultimately, this results in one disk worth of space being reserved for parity data.
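To make the idea of distributed parity a little more concrete, here is a minimal sketch of how the parity chunk might move from disk to disk with each stripe. The rotation rule (start on the last disk, shift one disk left per stripe) and the names `parity_disk` and `layout` are assumptions chosen for illustration; real controllers and software RAID implementations offer several layouts.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Return the index of the disk holding parity for a given stripe.

    Assumption: parity starts on the last disk and moves one disk to the
    left with each stripe, wrapping around.
    """
    return (num_disks - 1 - stripe) % num_disks


def layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Build one row per stripe: 'D' for a data chunk, 'P' for the parity chunk."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


if __name__ == "__main__":
    for stripe, row in enumerate(layout(num_disks=4, num_stripes=4)):
        print(f"stripe {stripe}: {' '.join(row)}")
```

Over any run of `num_disks` consecutive stripes, each disk holds parity exactly once in this sketch, which is why the parity overhead adds up to one disk worth of space.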
What is RAID 6?
RAID 6 is a lot like RAID 5, but it uses two distributed parity blocks across a stripe instead of one. This one detail changes everything from the level of fault tolerance provided by the array to the performance and usable storage. Writing parity twice makes the array much more reliable but by the same token, write performance also suffers twice the penalty. Read performance, though, much like RAID 5, is excellent.
RAID 5 Vs RAID 6 – Main Differences
RAID 5 and 6 mainly differ in the fact that RAID 6 uses two parity blocks per stripe, while RAID 5 only uses one. But as stated, this leads to a number of other differences as well, which we’ll cover in the following sections.
Fault Tolerance
The first thing that the parity block count impacts is the fault tolerance level. In a RAID 5 array, one block-sized chunk of parity data is written for every stripe. In the event of disk failure, the lost data can be recomputed using the parity data and the data on the other disks in the array. Essentially, this means that a RAID 5 array can handle one disk failure without any data loss. Usually, anyway.
This fault tolerance was the reason why RAID 5 was very popular until the 2010s. These days though, RAID 5 is rarely used as its reliability is no longer up to par. This is due to the way most hardware RAID controllers handle rebuilds. If the controller encounters an Unrecoverable Read Error (URE) during the rebuild, it will typically mark the entire array as failed to prevent further data corruption. Unless you have backups or plan to recover data from individual disks, the data is lost.
HDD sizes grew exponentially in the last two decades, but read/write speed improvements were much more moderate. Essentially, the size of arrays increased at much greater rates than data transfer speeds, which meant that rebuild times started to get very long. Depending on the setup, rebuilding the array after a disk fails could take from hours to days. Such rebuild times meant a higher chance of encountering UREs during the rebuild, which translates to a higher chance of the entire array failing.
In recent years, URE occurrence rates in HDDs have dropped significantly thanks to technological improvements. Due to this, RAID 5 is still used here and there. But the general industry consensus is to still opt for RAID 6 or other levels, and for good reason.
In RAID 6, parity data is written twice per stripe. This means a RAID 6 array can sustain up to two disk failures without data loss. This makes RAID 6 much more reliable and thus better suited for larger arrays with important data.
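To get a rough feel for why rebuilds on large disks became a concern, the sketch below estimates the chance of hitting at least one URE while reading a given amount of data. The default rate of one error per 10^14 bits is an assumed figure used only for illustration (actual drive specifications vary), and the model treats errors as independent, which is a simplification. The function name `rebuild_ure_probability` is hypothetical.

```python
import math


def rebuild_ure_probability(data_read_tb: float, ure_rate_bits: float = 1e14) -> float:
    """Estimate the probability of at least one URE while reading data_read_tb terabytes.

    Assumptions: errors are independent and occur on average once every
    ure_rate_bits bits read; 1 TB is taken as 10**12 bytes.
    """
    bits_read = data_read_tb * 1e12 * 8
    # 1 - (1 - p)**n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-1.0 / ure_rate_bits))


if __name__ == "__main__":
    # Rebuilding a 4-disk RAID 5 array of 4 TB disks reads the 3 surviving disks.
    for rate in (1e14, 1e15):
        p = rebuild_ure_probability(data_read_tb=12, ure_rate_bits=rate)
        print(f"URE rate 1 in {rate:.0e} bits: {p:.1%} chance of a URE during rebuild")
```

Under these assumptions, the estimate falls sharply when the error rate improves by a factor of ten, which is consistent with lower URE rates making RAID 5 rebuilds less risky than they once were.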
Write Performance
To update a block, a RAID 5 array has to read the old data and the old parity, calculate the new parity, and then write both the new data and the new parity. Due to this, RAID 5 suffers a penalty on workloads involving writes. RAID 6 involves calculating and writing parity twice, which is great for reliability, but it also means that it suffers twice the parity overhead for writing operations. For smaller I/O sizes (typically 256 KB and under), RAID 5 and 6 have very comparable write performance. But with larger I/O sizes, RAID 5 is definitely superior.
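The write sequence described above is commonly implemented as a read-modify-write: the new parity is the old parity XORed with the old data and the new data, so the other disks in the stripe don't need to be read. The sketch below illustrates this for a single RAID 5 stripe. The helper names (`xor_bytes`, `full_parity`, `small_write`) and the in-memory "disks" are hypothetical, and the I/O count is simply a constant that mirrors the four steps marked in the comments.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(chunks: list[bytes]) -> bytes:
    """Recompute parity from every data chunk in the stripe."""
    return reduce(xor_bytes, chunks)


def small_write(chunks: list[bytes], parity: bytes, index: int, new_data: bytes):
    """Update one chunk via read-modify-write; returns (chunks, parity, io_count)."""
    old_data = chunks[index]          # I/O 1: read old data
    old_parity = parity               # I/O 2: read old parity
    # Remove the old data's contribution from parity, then add the new data's.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    updated = list(chunks)
    updated[index] = new_data         # I/O 3: write new data
    return updated, new_parity, 4     # I/O 4: write new parity


stripe = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = full_parity(stripe)
stripe2, parity2, ios = small_write(stripe, parity, 1, b"\xaa\xbb")
print(parity2 == full_parity(stripe2), ios)
```

The incremental result matches a full recomputation of the stripe's parity. Under the same scheme, RAID 6 would also have to read and write its second parity block, which brings a small write to six I/Os instead of four.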
Number of Disks
RAID 5 requires two disks for striping and one disk worth of space to store parity data. This means that a RAID 5 array requires 3 disk units at the minimum. RAID 6 is similar, but it requires a minimum of 4 disks because parity data occupies two disks worth of space.
Usable Storage
In a RAID 5 array, the usable storage can be calculated with (N – 1) x (Smallest disk size), where N is the number of disk units. For instance, we’ve shown a RAID 5 array with three 1 TB disks below. One disk worth of space is used to store parity data, and since the smallest disk size is 1 TB, the usable space comes out to 2 TB. It’s important to try to use same-size disks, as otherwise, the smallest disk would create a bottleneck which results in a lot of unusable space. The example below shows the same scenario, where the 500 GB disk has resulted in 1.5 TB being unusable. In a RAID 6 array, the usable storage is calculated with (N – 2) x (Smallest disk size). Once again, it’s important to use same-size disks to ensure there’s no unusable space in the array.
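The two formulas above can be sketched as a small helper. The function name `usable_capacity` and the disk sizes in the usage lines are assumptions for illustration, not the exact arrays from the original figures; `parity_disks` is 1 for RAID 5 and 2 for RAID 6.

```python
def usable_capacity(disk_sizes_gb: list[int], parity_disks: int) -> tuple[int, int]:
    """Return (usable_gb, unusable_gb) for a parity RAID array.

    usable   = (N - parity_disks) x smallest disk size
    unusable = space on larger disks beyond the smallest disk's size
    """
    n = len(disk_sizes_gb)
    # Two data disks for striping plus the parity overhead:
    # 3 disks minimum for RAID 5, 4 for RAID 6.
    if n < parity_disks + 2:
        raise ValueError(f"need at least {parity_disks + 2} disks, got {n}")
    smallest = min(disk_sizes_gb)
    usable = (n - parity_disks) * smallest
    unusable = sum(size - smallest for size in disk_sizes_gb)
    return usable, unusable


print(usable_capacity([1000, 1000, 1000], parity_disks=1))        # RAID 5, equal disks
print(usable_capacity([1000, 1000, 1000, 500], parity_disks=1))   # RAID 5, mixed disks
print(usable_capacity([1000, 1000, 1000, 1000], parity_disks=2))  # RAID 6, equal disks
```

With equal-size disks, the unusable figure is always zero, which is the reason for matching disk sizes in the first place.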
Parity Calculation
In RAID 5, parity information is calculated by performing an XOR operation on each byte of data. For instance, let’s say the first byte of data on each of the three data strips in a 4-disk array looks something like this:
A1 – 11010101
A2 – 10001100
A3 – 10101100
If we perform an XOR operation on the first two strips (A1 and A2) and then do the same with the output and the third strip (A3), the output is the parity information (Ap). In this case, its value is 11110101.
When any disk (for instance, Disk 1) fails, here’s what happens. First, A2 XOR A3 gives us the output 00100000. When we use this output in an XOR operation with Ap, we get 11010101 as a result, which is the lost data:
00100000 (A2 XOR A3)
11110101 (Ap)
11010101 (recovered A1)
This is basically how parity data is calculated and used to recompute lost data in RAID 5. RAID 6 is much more complex as it computes parity twice. Depending on the setup, this is implemented in various ways, such as dual check data computation (parity and Reed–Solomon), orthogonal dual parity check data, diagonal parity, etc.
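The RAID 5 example above can be reproduced in a few lines of code. This is only a sketch of the XOR arithmetic on a single byte per strip, not how any particular controller implements it; the function name `xor_parity` is hypothetical.

```python
from functools import reduce


def xor_parity(strips: list[int]) -> int:
    """XOR all strips together; works for computing parity and for rebuilding."""
    return reduce(lambda a, b: a ^ b, strips)


# The three data bytes from the example above.
a1, a2, a3 = 0b11010101, 0b10001100, 0b10101100

# Parity is the XOR of every data strip in the stripe.
ap = xor_parity([a1, a2, a3])
print(f"Ap           = {ap:08b}")         # 11110101

# If Disk 1 fails, XOR the surviving strips with the parity to rebuild A1.
recovered = xor_parity([a2, a3, ap])
print(f"recovered A1 = {recovered:08b}")  # 11010101
```

The same function handles both directions because XOR is its own inverse: XORing the parity with every surviving strip cancels them out and leaves only the missing one.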
RAID Controller
RAID 5 can be implemented through both hardware and software means. The former obviously involves the use of a dedicated hardware RAID controller. As RAID 5 requires parity computation, this is the recommended route. This is especially important in certain cases, like with a NAS, where the processor isn’t powerful enough to handle the calculations without creating a significant bottleneck. Although not ideal for performance reasons, RAID 5 can also be set up using software solutions. For instance, Windows allows you to pool your disks together using the Storage Spaces feature. You can also create a RAID 5 volume via Disk Management. RAID 6, on the other hand, is best run on a hardware RAID controller. This is because the polynomial calculations performed to compute the second parity layer are quite processor intensive. Software implementations do exist, though; Linux’s md driver (managed with mdadm), for example, supports RAID 6.
Are RAID 5 and RAID 6 Similar?
It should be evident at this point that while RAID 5 and 6 have some key differences, they’re also similar in many ways. For starters, unlike RAID 1, RAID 5 and 6 provide fault tolerance through parity instead of mirroring. Specifically, they use distributed parity, which is different from the dedicated parity disks used by RAID 2, 3, and 4. With distributed parity, you don’t have to worry about bottlenecks as with a single parity disk. Both RAID 5 and 6 have excellent read performance thanks to data striping. But by the same token, both of them also suffer penalties on write performance, albeit to varying degrees.
What’s Good About RAID 5?
RAID 5 offers a good mix of usable storage, data protection, and performance. You can also set it up with fewer disks, which makes it a budget-efficient option. If you want to think in terms of performance, RAID 5 is best suited for read-heavy workloads such as email servers. As for fault tolerance, we’ve already covered how RAID 5 has grown less reliable over the years. It’s still fine for small-sized arrays, but with larger arrays, where there’s a higher chance of failed rebuilds, we wouldn’t recommend RAID 5.
When is RAID 6 Better?
RAID 6’s reliability does come at the cost of write performance and usable storage. However, this slight disparity is undoubtedly worth it when the data on the disks is important. RAID 6 isn’t the best for smaller arrays (e.g., 4 disks), as a significant portion of storage is lost to redundancy. If redundancy is required in small arrays, RAID 5 or something like RAID 10 would be better. Instead, RAID 6 is best suited for larger arrays where there’s a chance of losing much more data if the setup isn’t reliable.
Final Verdict – RAID 5 Vs RAID 6
RAID 5 isn’t completely unreliable, and it’s still usable for smaller arrays. But with really critical data, you’ll want to prioritize protection over minor performance differences, and that’s where RAID 6 takes the cake. Regardless of which RAID level you opt for, though, it’s important to understand that RAID isn’t a backup. RAID’s redundancy only protects against disk failure. Even a RAID 6 array can fail during rebuilds. If the data on the disks is important enough for you to use RAID 6 or other ‘reliable’ versions, then you shouldn’t neglect backups and patrol reads, either. Finally, to recap, here are the key differences between RAID 5 and RAID 6: