Distributed storage systems must store large amounts of data over long periods of time. To avoid data loss due to device failures, an $[n,k]$ erasure code is used to encode $k$ data symbols into a codeword of $n$ symbols that are stored across different devices. However, device failure rates change throughout the life of the data, and tuning $n$ and $k$ according to these changes has been shown to save significant storage space. Code conversion is the process of converting multiple codewords of an initial $[n^I,k^I]$ code into codewords of a final $[n^F,k^F]$ code that decode to the same set of data symbols. In this paper, we study conversion bandwidth, defined as the total amount of data transferred between nodes during conversion. In particular, we consider the case where the initial and final codes are MDS and a single initial codeword is split into several final codewords ($k^I=\lambda^F k^F$ for integer $\lambda^F \geq 2$), called the split regime. We derive lower bounds on the conversion bandwidth in the split regime and propose constructions that significantly reduce conversion bandwidth and are optimal for certain parameters.
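As an illustration of the split-regime condition $k^I=\lambda^F k^F$, the following minimal sketch checks whether a pair of code parameters falls in that regime and reports how many symbols are stored before and after conversion. The function name `split_conversion_plan`, the dictionary keys, and the example parameters $[14,12] \to [8,6]$ are assumptions chosen for illustration and are not taken from the paper; the sketch covers only parameter bookkeeping, not the bounds or constructions.

```python
from fractions import Fraction


def split_conversion_plan(n_i, k_i, n_f, k_f):
    """Check split-regime parameters and summarise storage cost.

    Hypothetical helper: (n_i, k_i) describes the initial code and
    (n_f, k_f) the final code. No encoding is performed here.
    """
    if not (0 < k_i <= n_i and 0 < k_f <= n_f):
        raise ValueError("need 0 < k <= n for both codes")

    # Split regime: one initial codeword becomes lam final codewords.
    lam, rem = divmod(k_i, k_f)
    if rem != 0 or lam < 2:
        raise ValueError("split regime requires k_i = lam * k_f with integer lam >= 2")

    return {
        "lambda_f": lam,
        "initial_symbols": n_i,            # symbols stored before conversion
        "final_symbols": lam * n_f,        # symbols stored after conversion
        "initial_overhead": Fraction(n_i, k_i),
        "final_overhead": Fraction(n_f, k_f),
    }


# Example parameters (assumed): one [14,12] codeword split into two [8,6] codewords.
plan = split_conversion_plan(14, 12, 8, 6)
print(plan)
```

In this example the same 12 data symbols are protected by 2 parity symbols before conversion and by 4 parity symbols (2 per final codeword) afterwards, which is the kind of redundancy change that tuning $n$ and $k$ is meant to allow.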