Data prefetching is important for optimizing storage systems and improving access performance. Traditional prefetchers work well for mining sequential logical block address (LBA) access patterns but cannot handle the complex non-sequential patterns that are common in real-world applications. State-of-the-art (SOTA) learning-based prefetchers cover more LBA accesses. However, they do not adequately consider the spatial interdependencies between LBA deltas, which limits their performance and robustness. This paper proposes a novel Stream-Graph neural network-based Data Prefetcher (SGDP). Specifically, SGDP models LBA delta streams as a weighted directed graph that represents the interactive relations among LBA deltas, and then extracts hybrid features with graph neural networks for data prefetching. We conduct extensive experiments on eight real-world datasets. Empirical results verify that SGDP outperforms the SOTA methods by 6.21% in hit ratio and 7.00% in effective prefetching ratio, and speeds up inference time by 3.13X, on average. In addition, we generalize SGDP to different variants by using different stream constructions, which further expands its application scenarios and demonstrates its robustness. SGDP offers a novel data prefetching solution and has been verified in commercial hybrid storage systems in the experimental phase. Our code and appendix are available at https://github.com/yyysjz1997/SGDP/.
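To make the idea of modeling an LBA delta stream as a weighted directed graph more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that nodes are distinct LBA deltas and that an edge weight counts how often one delta directly follows another; the function names (`lba_deltas`, `build_delta_graph`, `normalize_out_edges`) and the sample trace are hypothetical and chosen only for illustration. The graph neural network stage described in the abstract is not shown.

```python
from collections import Counter


def lba_deltas(lbas):
    """Return the differences between consecutive LBAs in an access trace."""
    return [b - a for a, b in zip(lbas, lbas[1:])]


def build_delta_graph(deltas):
    """Build a weighted directed graph over deltas (illustrative assumption).

    Each distinct delta is a node. The weight of edge (u, v) is the number
    of times delta v immediately follows delta u in the stream.
    """
    return Counter(zip(deltas, deltas[1:]))


def normalize_out_edges(edges):
    """Scale each edge weight by the total outgoing weight of its source node."""
    out_totals = Counter()
    for (src, _dst), weight in edges.items():
        out_totals[src] += weight
    return {(src, dst): w / out_totals[src] for (src, dst), w in edges.items()}


# Hypothetical LBA trace, used only to exercise the functions above.
lbas = [100, 108, 116, 120, 128, 136, 140]
deltas = lba_deltas(lbas)             # [8, 8, 4, 8, 8, 4]
edges = build_delta_graph(deltas)     # {(8, 8): 2, (8, 4): 2, (4, 8): 1}
weights = normalize_out_edges(edges)  # {(8, 8): 0.5, (8, 4): 0.5, (4, 8): 1.0}
```

In this toy trace, a delta of 4 is always followed by a delta of 8, while a delta of 8 is followed equally often by 8 and by 4. A learned model would consume this kind of transition structure as input features; how SGDP actually constructs its streams and edge weights is specified in the paper and repository, not here.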