Resource disaggregation offers a cost-effective solution to resource scaling, utilization, and failure handling in data centers by physically separating the hardware devices of a server. Servers are architected as pools of processor, memory, and storage devices, organized as independent, failure-isolated components interconnected by a high-bandwidth network. A critical challenge, however, is the high performance penalty of accessing data from a remote memory module over the network. Addressing this challenge is difficult for two reasons: disaggregated systems exhibit high runtime variability in network latency and bandwidth, and page migrations can significantly delay critical-path cache line accesses to other pages. This paper characterizes different data movement strategies in fully disaggregated systems, evaluates their performance overheads across a variety of workloads, and introduces DaeMon, the first software-transparent mechanism to significantly alleviate data movement overheads in fully disaggregated systems. First, to enable scalability to multiple hardware components in the system, we enhance each compute and memory unit with specialized engines that transparently handle data migrations. Second, to achieve high performance and provide robustness across various network, architecture, and application characteristics, we implement a synergistic approach of bandwidth partitioning, link compression, decoupled data movement at multiple granularities, and adaptive granularity selection for data movements. We evaluate DaeMon on a wide variety of workloads under different network and architecture configurations using a state-of-the-art, accurate simulator. DaeMon improves system performance by 2.39$\times$ and reduces data access costs by 3.06$\times$ over the widely adopted approach of moving data at page granularity.
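To make the abstract's observation about page migrations delaying cache line accesses concrete, the following is a minimal back-of-the-envelope sketch, not DaeMon's actual mechanism. It compares the serialization delay of one 64-byte cache line when it queues behind page transfers on a shared FIFO link with the delay when a fixed share of link bandwidth is reserved for cache line traffic. The function names, the 4 KB page and 64 B line sizes, the link bandwidth, and the reserved share are all illustrative assumptions; propagation latency, compression, and adaptive granularity selection are ignored.

```python
# Illustrative model only: all names and numbers are assumptions,
# not values or interfaces taken from the paper.

PAGE_BYTES = 4096  # assumed page size
LINE_BYTES = 64    # assumed cache line size


def transfer_time_ns(num_bytes, bandwidth_bytes_per_ns):
    """Serialization time of num_bytes on a link (1 byte/ns equals 1 GB/s)."""
    return num_bytes / bandwidth_bytes_per_ns


def line_latency_shared_fifo(bandwidth_bytes_per_ns, pages_ahead):
    """Cache line delay when it waits behind `pages_ahead` page transfers."""
    queued = pages_ahead * transfer_time_ns(PAGE_BYTES, bandwidth_bytes_per_ns)
    return queued + transfer_time_ns(LINE_BYTES, bandwidth_bytes_per_ns)


def line_latency_partitioned(bandwidth_bytes_per_ns, line_share):
    """Cache line delay when `line_share` of the link is reserved for lines."""
    return transfer_time_ns(LINE_BYTES, bandwidth_bytes_per_ns * line_share)


if __name__ == "__main__":
    bw = 4.0  # hypothetical link bandwidth in bytes/ns (4 GB/s)
    print("shared FIFO, 1 page ahead:", line_latency_shared_fifo(bw, 1), "ns")
    print("partitioned, 25% for lines:", line_latency_partitioned(bw, 0.25), "ns")
```

Under these assumed numbers, the cache line waits for a full page transfer in the shared case but only for its own, slower, serialization in the partitioned case. The cost is that page transfers receive less bandwidth, which is one reason a fixed split alone would not be robust across workloads.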