Memory disaggregation has recently gained traction in research. With memory disaggregation, data center compute nodes can directly access memory on adjacent nodes and can therefore overcome local memory restrictions, introducing a new data management paradigm for distributed computing. This paper proposes and demonstrates a memory-disaggregated in-memory object store framework for big data applications that leverages the newly introduced ThymesisFlow memory disaggregation system. The framework extends the pre-existing Apache Arrow Plasma object store framework to distributed systems by enabling clients to easily and efficiently produce and consume data objects across multiple compute nodes. This allows big data applications to make greater use of parallel processing at reduced development cost. In addition, the paper includes latency and throughput measurements indicating that remote disaggregated memory access incurs only a modest performance penalty compared with local access (~5.75 GiB/s remote vs ~6.5 GiB/s local). The results can be used to guide the design of future systems that leverage memory disaggregation as well as the newly presented framework. This work is open source and publicly accessible at https://doi.org/10.5281/zenodo.6368998.
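To put the reported throughput figures in proportion, the short sketch below computes the relative penalty of remote access. It assumes that ~6.5 GiB/s is the local figure and ~5.75 GiB/s the remote one, which is inferred from the abstract's statement that remote access incurs the penalty; the function and constant names are illustrative and do not come from the paper or its code.

```python
def relative_penalty(local_gibps: float, remote_gibps: float) -> float:
    """Fractional throughput loss of remote access relative to local access."""
    if local_gibps <= 0:
        raise ValueError("local throughput must be positive")
    return (local_gibps - remote_gibps) / local_gibps


# Approximate figures quoted in the abstract (assumed assignment: local is the higher one).
LOCAL_GIBPS = 6.5
REMOTE_GIBPS = 5.75

penalty = relative_penalty(LOCAL_GIBPS, REMOTE_GIBPS)
print(f"remote penalty: {penalty:.1%}")  # prints "remote penalty: 11.5%"
```

Under that assumption, remote disaggregated access retains roughly 88.5% of local throughput.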