In a modern distributed storage system, storage nodes are organized in racks, and cross-rack communication dominates the system bandwidth. We study the rack-aware storage system, in which nodes within the same rack can communicate freely without taxing the system bandwidth. Rack-aware regenerating codes (RRCs) were proposed to minimize the repair bandwidth for single erasures. In the initial setting of RRCs, the repair of a single node requires the participation of all the remaining nodes in the rack containing the failed node, as well as a large number of helper racks containing no failures. Consequently, the repair may be infeasible in the presence of multiple node failures. In this work, we propose a relaxed repair model that tolerates multiple node failures by simultaneously reducing the number of intra-rack and cross-rack connections. We derive a tradeoff between storage and repair bandwidth under the relaxed repair model, and characterize the parameters of the two extreme points on the tradeoff curve, corresponding to minimum storage and minimum bandwidth respectively. Moreover, we explicitly construct two codes corresponding to the extreme points, over fields of size comparable to the code length and with the lowest sub-packetization. Finally, for convenience in practical use, we also establish systematic encoding processes for the two codes.