Large disk arrays are organized into storage nodes (SNs), or bricks, each with its own cached RAID controller for multiple disks. Erasure coding at the SN level is attained via parity or Reed-Solomon codes. Hierarchical RAID (HRAID) provides an additional level of coding across SNs, e.g., check strips P and Q at the intra-SN level and R at the inter-SN level. Failed disks and SNs are not replaced; rebuild is accomplished by restriping, e.g., overwriting P and Q for disk failures and R for an SN failure. For a given total redundancy level, we use an approximate reliability analysis method and Monte Carlo simulation to explore how check blocks are best apportioned between intra-SN and inter-SN redundancy. Our study indicates that a higher Mean Time to Data Loss (MTTDL) is attained by providing higher reliability at the intra-SN level rather than at the inter-SN level, which is contrary to the conclusion of an IBM study.
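To make the Monte Carlo side of the comparison concrete, the following is a minimal sketch of how MTTDL could be estimated for a toy HRAID model. It is not the simulator used in the study. The function names and parameters are hypothetical, and the model rests on strong simplifying assumptions: disk lifetimes are independent and exponential, restriping is instantaneous, each SN survives `intra_tol` disk failures, the array survives `inter_tol` SN losses, and controller failures and sector errors are ignored.

```python
import random


def time_to_kth_failure(n, k, rate, rng):
    """Time of the k-th failure among n units with i.i.d. exponential lifetimes."""
    lifetimes = sorted(rng.expovariate(rate) for _ in range(n))
    return lifetimes[k - 1]


def simulate_hraid_ttdl(num_sns, disks_per_sn, intra_tol, inter_tol, disk_rate, rng):
    """Draw one time-to-data-loss sample for the toy HRAID model."""
    # Assumption: an SN is lost at its (intra_tol + 1)-th disk failure,
    # since no failed disk is replaced and each failure consumes a check strip.
    sn_loss_times = sorted(
        time_to_kth_failure(disks_per_sn, intra_tol + 1, disk_rate, rng)
        for _ in range(num_sns)
    )
    # Assumption: data is lost at the (inter_tol + 1)-th SN loss.
    return sn_loss_times[inter_tol]


def estimate_mttdl(num_sns, disks_per_sn, intra_tol, inter_tol, disk_rate,
                   runs=20000, seed=1):
    """Average many simulated times to data loss to estimate MTTDL."""
    rng = random.Random(seed)
    total = sum(
        simulate_hraid_ttdl(num_sns, disks_per_sn, intra_tol, inter_tol,
                            disk_rate, rng)
        for _ in range(runs)
    )
    return total / runs


def analytic_single_array_mttdl(n, tol, rate):
    """Closed-form MTTDL of one n-disk array without repair.

    With i failures so far, the next failure arrives at rate (n - i) * rate,
    so the expected time to the (tol + 1)-th failure is a sum of such terms.
    """
    return sum(1.0 / ((n - i) * rate) for i in range(tol + 1))


if __name__ == "__main__":
    rate = 1e-6  # hypothetical disk failure rate per hour
    # Two apportionments of three check strips, in the spirit of P, Q, R.
    print("intra=2, inter=1:", estimate_mttdl(4, 8, 2, 1, rate))
    print("intra=1, inter=2:", estimate_mttdl(4, 8, 1, 2, rate))
```

With a single SN and `inter_tol=0`, the estimate can be checked against `analytic_single_array_mttdl`, which gives a quick sanity test of the sampler. Numbers produced by this toy model depend entirely on the assumptions above and should not be read as reproducing the results of the study.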