Erasure codes have been widely considered a promising way to enhance data reliability at low storage cost. However, in modern geo-distributed storage systems, erasure codes may incur high data access latency because they require data retrieval from multiple remote storage nodes. This hinders the wider adoption of erasure codes in data-intensive applications. This paper proposes novel caching schemes to achieve low latency in distributed coded storage systems. Experiments based on Amazon Simple Storage Service (Amazon S3) confirm a positive correlation between data retrieval latency and the physical distance over which data is retrieved. The average data access latency is used as the performance metric to quantify the benefits of caching. Assuming that future data popularity and network latency information is available, an offline caching scheme is proposed to find the optimal caching solution. Guided by the optimal scheme, an online caching scheme is then proposed that relies on data popularity and network latency measured in real time. Experimental results demonstrate that the online scheme closely approximates the optimal scheme with dramatically reduced computational complexity.
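The average-latency metric and the offline (full-information) setting can be illustrated with a toy model. The sketch below is not the paper's formulation. It assumes an (n, k) MDS code where any k of the n chunks reconstruct an item, that the chunks of one item are fetched in parallel so a read completes when the slowest needed chunk arrives, that cached chunks are served at a fixed local latency, and that popularity is given as request rates. The names `item_latency`, `average_latency` and `optimal_allocation`, and the example data, are hypothetical. With popularity and latencies known in advance, the cache allocation that minimizes average latency can be found exactly by a small dynamic program over cache slots.

```python
from typing import List, Sequence


def item_latency(latencies: Sequence[float], k: int, cached: int,
                 local: float = 0.0) -> float:
    """Read latency of one item under an (n, k) MDS code.

    Assumptions (toy model): any k of the n chunks reconstruct the item,
    remote chunks are fetched in parallel, and `cached` chunks are served
    locally at latency `local`. The remaining k - cached chunks come from
    the fastest regions, so the read ends when the slowest of them arrives.
    """
    need = k - cached
    if need <= 0:
        return local  # every required chunk is already in the cache
    remote = sorted(latencies)[need - 1]  # need-th fastest remote chunk
    return max(local, remote) if cached > 0 else remote


def average_latency(popularity: Sequence[float],
                    latencies: Sequence[Sequence[float]], k: int,
                    allocation: Sequence[int], local: float = 0.0) -> float:
    """Popularity-weighted average data access latency over all items."""
    weighted = sum(p * item_latency(l, k, c, local)
                   for p, l, c in zip(popularity, latencies, allocation))
    return weighted / sum(popularity)


def optimal_allocation(popularity: Sequence[float],
                       latencies: Sequence[Sequence[float]], k: int,
                       capacity: int, local: float = 0.0) -> List[int]:
    """Chunks to cache per item, minimizing average latency exactly.

    Dynamic program over cache slots (one slot = one chunk), assuming
    popularity and latencies are known in advance (the offline setting).
    """
    best = [0.0] * (capacity + 1)  # best[j]: min weighted latency, <= j slots
    picks: List[List[int]] = []
    for p, lat in zip(popularity, latencies):
        new = [float("inf")] * (capacity + 1)
        pick = [0] * (capacity + 1)
        for j in range(capacity + 1):
            for c in range(min(k, j) + 1):
                cost = best[j - c] + p * item_latency(lat, k, c, local)
                if cost < new[j]:
                    new[j], pick[j] = cost, c
        best = new
        picks.append(pick)
    # Walk back through the recorded choices to recover the allocation.
    allocation = [0] * len(picks)
    j = capacity
    for i in range(len(picks) - 1, -1, -1):
        allocation[i] = picks[i][j]
        j -= allocation[i]
    return allocation


if __name__ == "__main__":
    # Hypothetical data: request rates and per-region chunk latencies (ms)
    # for two items, each stored with a (3, 2) code.
    rates = [7, 3]
    lat = [[20, 80, 150], [30, 60, 200]]
    print(average_latency(rates, lat, 2, [0, 0]))  # prints 74.0
    alloc = optimal_allocation(rates, lat, k=2, capacity=1)
    print(alloc, average_latency(rates, lat, 2, alloc))  # prints [1, 0] 32.0
```

In this toy model the benefit of caching one more chunk of an item depends on the gaps between its sorted chunk latencies and need not diminish, so a simple greedy allocation is not guaranteed to be optimal; that is why the sketch uses dynamic programming. An online variant in the spirit of the abstract would feed measured, rather than known-in-advance, popularity and latency values into a cheaper heuristic.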