This paper investigates sequencing policies for file reading requests in linear storage devices, such as magnetic tapes. Tapes are the technology of choice for long-term storage in data centers due to their low cost and reliability. However, their physical structure imposes challenges to data retrieval operations reflected in classic optimization and operations research problems. In this work, we provide a theoretical and numerical performance analysis of low-complexity algorithms under deterministic, stochastic, and online settings, which are key in practice due to their interpretability and the large scale of existing data services. In the deterministic setting, we show that traditional policies, such as first-in first-out (FIFO), have arbitrarily poor performance, and we develop and investigate new constant-factor approximations. For the stochastic setting, we present a fully polynomial-time approximation scheme that weighs files based on their access frequencies. Finally, we investigate an online extension and propose a new algorithm with constant competitive-factor guarantees. Our numerical analysis on synthetic and real-world data suggest that the proposed algorithms may significantly outperform policies currently adopted in practice with respect to average reading times.
翻译:本文对磁带等线性存储装置的文件阅读请求的排序政策进行了调查。磁带是数据中心长期存储的首选技术,因为成本和可靠性低。然而,磁带的物理结构对典型优化和业务研究中所反映的数据检索操作提出了挑战。在这项工作中,我们对确定性、随机性和在线设置下的低复杂算法进行了理论和数字性业绩分析,这些算法因其可解释性和现有数据服务规模庞大而在实践中十分关键。在确定性环境中,我们显示传统政策,如头等先出(FIFFO)的性能任意地差,我们开发和调查新的常态近似值。对于随机设置,我们提出了一个完全的多数值时间近似方案,根据文件的存取频率对文件进行权衡。最后,我们调查了在线扩展,并提出了具有持续竞争因素保证的新算法。我们对合成数据和现实世界数据进行的数字分析表明,拟议的算法可能大大超出目前平均阅读时采用的政策。