Memory disaggregation has attracted great attention recently because of its benefits in efficient memory utilization and ease of management. So far, memory disaggregation research has all taken one of two approaches, building/emulating memory nodes with either regular servers or raw memory devices with no processing power. The former incurs higher monetary cost and face tail latency and scalability limitations, while the latter introduce performance, security, and management problems. Server-based memory nodes and memory nodes with no processing power are two extreme approaches. We seek a sweet spot in the middle by proposing a hardware-based memory disaggregation solution that has the right amount of processing power at memory nodes. Furthermore, we take a clean-slate approach by starting from the requirements of memory disaggregation and designing a memory-disaggregation-native system. We propose a hardware-based disaggregated memory system, Clio, that virtualizes and manages disaggregated memory at the memory node. Clio includes a new hardware-based virtual memory system, a customized network system, and a framework for computation offloading. In building Clio, we not only co-design OS functionalities, hardware architecture, and the network system, but also co-design the compute node and memory node. We prototyped Clio's memory node with FPGA and implemented its client-node functionalities in a user-space library. Clio achieves 100 Gbps throughput and an end-to-end latency of 2.5 us at median and 3.2 us at the 99th percentile. Clio scales much better and has orders of magnitude lower tail latency than RDMA, and it has 1.1x to 3.4x energy saving compared to CPU-based and SmartNIC-based disaggregated memory systems and is 2.7x faster than software-based SmartNIC solutions.
翻译:内存分解最近引起极大关注, 因为它对有效记忆利用和易于管理的好处。 到目前为止, 内存分解研究已经采取了两种方法之一, 即用正常服务器或没有处理电的原始内存装置来建立/ 模拟内存节点。 前一种产生更高的货币成本, 并面临尾部延缩和缩缩缩限制, 而后一种则引入了性能、 安全和管理问题。 基于服务器的内存节点和内存节点是两种极端的办法。 我们通过提出基于硬件的内存分解办法, 寻求中间的甜蜜点。 我们寻求一种基于硬件的内存分解办法, 在内存节点上处理的电量是正确的。 此外, 我们采用清洁式的内存节点方法, 从内存分解要求开始, 设计一个内存分解的内存系统, Clio, 在内存节点中, C- 内存流流流流流流系统比 C- 内存系统要快得多, 网络内存系统比 C- 内存和内存系统要多。