This paper investigates scheduling policies for file retrieval in linear storage devices, such as magnetic tapes. Tapes are the technology of choice for long-term storage in data centers due to their low cost per capacity, reliability, and data security. While scheduling problems associated with data retrieval in tapes are classical, existing works focus on more straightforward heuristic approaches due to limited computational times imposed by standard tape specifications. Our first contribution is a theoretical investigation of three standard policies, presenting their worst-case performance and special cases of practical relevance for which they are optimal. Next, we show that the problem is polynomially solvable via two interleaved recursive models, albeit with high computational complexity. We leverage our previous results to develop two new scheduling policies with constant-ratio performance and low computational cost. Finally, we investigate properties associated with the online variant of the problem, presenting a new constant-factor competitive algorithm. Our numerical analysis on synthetic and real-world tapes from an industry partner provides insights into dataset configurations where each policy is more effective, which is of relevance to data center managers. In particular, our new best-performing policy is practical for large datasets and significantly improves upon standard algorithms in the area.
翻译:本文调查了线性存储装置(如磁带)的文件检索时间安排政策。 磁带是选择在数据中心长期储存的技术,因为每个能力、可靠性和数据安全成本低。 与磁带数据检索有关的时间安排问题是古典的, 现有工作的重点是由于标准磁带规格规定的计算时间有限而采用更为直接的超时方法。 我们的第一个贡献是对三种标准政策进行理论调查, 展示其最差的情况性能和具有实际相关性的特例, 它们是最佳的。 其次, 我们显示, 问题在于通过两个相互分离的循环模型( 尽管计算复杂程度很高) 长期储存在数据中心的多元性。 我们利用我们以前的成果来制定两种新的时间安排政策, 其性能是恒定的, 计算成本低。 最后, 我们调查与问题在线变量相关的属性, 提出一个新的恒定因素竞争性算法。 我们对一个行业伙伴的合成磁带和真实世界磁带的数值分析, 提供了对数据集配置的洞察力, 其中每一种政策都更有效, 与数据中心管理者有关。 特别是, 我们的新的最佳执行政策在大型数据分析是实用的。