Resource disaggregation offers a cost-effective solution to resource scaling, utilization, and failure handling in data centers by physically separating the hardware devices of a server. Servers are architected as pools of processor, memory, and storage devices, organized as independent, failure-isolated components interconnected by a high-bandwidth network. A critical challenge, however, is the high performance penalty of accessing data in a remote memory module over the network. Addressing this challenge is difficult because disaggregated systems exhibit high runtime variability in network latency and bandwidth, and because page migrations can significantly delay critical-path cache-line accesses to other pages. This paper introduces DaeMon, the first software-transparent and robust mechanism to significantly alleviate data movement overheads in fully disaggregated systems. First, to enable scalability to multiple hardware components in the system, we enhance each compute and memory unit with specialized engines that transparently handle data migrations. Second, to achieve high performance and provide robustness across various network, architecture, and application characteristics, we implement a synergistic approach of bandwidth partitioning, link compression, decoupled data movement at multiple granularities, and adaptive granularity selection in data movements. We evaluate DaeMon on a wide variety of workloads under different network and architecture configurations using a state-of-the-art, accurate simulator and demonstrate that DaeMon significantly improves system performance and reduces data access costs compared to the widely adopted approach of moving data at page granularity.
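To make the motivation for bandwidth partitioning concrete, the following is a minimal back-of-the-envelope sketch, not DaeMon's actual design. It compares the latency of one cache-line request that is queued behind page migrations on a shared FIFO link with the latency of the same request when cache lines have their own queue and a reserved share of the link. The 64-byte and 4096-byte sizes, the link bandwidth, the 25% share, and all function names are illustrative assumptions that do not come from the abstract; propagation latency and compression are ignored.

```python
# Toy model only: sizes, bandwidth, and the reserved share are assumptions
# chosen for illustration, not values reported for DaeMon.
CACHE_LINE_BYTES = 64    # assumed cache-line size
PAGE_BYTES = 4096        # assumed page size


def transfer_ns(size_bytes: int, bandwidth_bytes_per_ns: float) -> float:
    """Serialization time of one transfer on a link (propagation delay ignored)."""
    return size_bytes / bandwidth_bytes_per_ns


def line_latency_shared_fifo(link_bw: float, pages_ahead: int) -> float:
    """Cache line waits behind `pages_ahead` page migrations on one FIFO link."""
    wait = pages_ahead * transfer_ns(PAGE_BYTES, link_bw)
    return wait + transfer_ns(CACHE_LINE_BYTES, link_bw)


def line_latency_partitioned(link_bw: float, line_share: float) -> float:
    """Cache line uses a separate queue with a reserved fraction of the bandwidth."""
    if not 0.0 < line_share <= 1.0:
        raise ValueError("line_share must be in (0, 1]")
    return transfer_ns(CACHE_LINE_BYTES, link_bw * line_share)


if __name__ == "__main__":
    bw = 4.0  # bytes per nanosecond, a hypothetical link
    shared = line_latency_shared_fifo(bw, pages_ahead=2)
    partitioned = line_latency_partitioned(bw, line_share=0.25)
    print(f"shared FIFO link: {shared} ns")        # 2064.0 ns
    print(f"partitioned link: {partitioned} ns")   # 64.0 ns
```

Under these assumed numbers, the critical-path cache line is delayed by the two in-flight pages on the shared link, whereas a reserved share keeps its latency independent of the pending migrations, at the cost of a lower peak rate for page transfers.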