Storage systems using a Peer-to-Peer (P2P) architecture are an alternative to traditional client-server systems. They offer better scalability and fault tolerance while eliminating the single point of failure. The nature of P2P storage systems, which consist of heterogeneous nodes, however, introduces data placement challenges that create implementation trade-offs (e.g., between performance and scalability). The existing Kademlia-based distributed hash table (DHT) data placement method stores data at the closest node, where distance is measured by a bitwise XOR operation between the data key and the identifier of a given node. This approach is highly scalable because it requires no global knowledge for either data placement or data retrieval. It does not, however, consider the heterogeneous performance of the nodes, which can result in imbalanced resource usage that affects the overall latency of the system. Other works implement criteria-based selection that addresses the heterogeneity of nodes, but they often cause subsequent data retrieval to require global knowledge of where the data is stored. This paper introduces Residual Performance-based Data Placement (RPDP), a novel data placement method based on the dynamic temporal residual performance of data nodes. RPDP places data on the most appropriate nodes, selected based on their throughput and latency, with the aim of achieving lower overall latency by balancing data distribution with respect to the individual performance of nodes. RPDP relies on a Kademlia-based DHT with a modified data structure that allows data to be retrieved subsequently without the need for global knowledge. The experimental results indicate that RPDP reduces the overall latency of the baseline Kademlia-based P2P storage system by 4.87% and also reduces the variance of latency among the nodes, with minimal impact on data retrieval complexity.
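To make the contrast between the two placement strategies concrete, the following is a minimal Python sketch, not the RPDP algorithm itself. The function `closest_nodes` shows baseline Kademlia-style placement by XOR distance. The function `pick_by_residual` and its `residual` score (a single number per node, higher meaning more spare capacity) are hypothetical stand-ins for a performance-aware choice among the closest candidates; the abstract does not specify how RPDP combines throughput and latency, so that scoring is an assumption.

```python
from typing import Dict, List


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two identifiers, read as an integer."""
    return a ^ b


def closest_nodes(key: int, nodes: Dict[str, int], k: int = 1) -> List[str]:
    """Return the names of the k nodes whose IDs are XOR-closest to the key.

    This is the baseline placement rule: it needs only the key and the
    known node IDs, and ignores how fast or loaded each node is.
    """
    return sorted(nodes, key=lambda name: xor_distance(key, nodes[name]))[:k]


def pick_by_residual(candidates: List[str], residual: Dict[str, float]) -> str:
    """Hypothetical performance-aware step (an assumption, not the paper's rule).

    Among candidate nodes, choose the one with the highest residual score,
    where the score is assumed to summarise spare throughput and latency.
    """
    return max(candidates, key=lambda name: residual[name])
```

With node IDs `{"a": 0b0001, "b": 0b0100, "c": 0b0111}` and key `0b0101`, the XOR distances are 4, 1, and 2, so baseline placement selects `b`. If the two closest candidates `b` and `c` carry assumed residual scores of 0.2 and 0.7, the performance-aware step selects `c` instead. Retrieving the data later then requires some record of that choice, which is the role the abstract assigns to the modified DHT data structure.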