We present RDMAbox, a set of low-level RDMA optimizations that provide better performance than previous approaches. The optimizations are packaged in easy-to-use kernel and userspace libraries and presented through simple node-level abstractions. We demonstrate the flexibility and effectiveness of RDMAbox by implementing a kernel remote paging system and a userspace file system using RDMAbox. RDMAbox employs two optimization techniques. First, we suggest Load-aware Batching to further reduce the total number of I/O operations to the RDMA NIC beyond existing doorbell batching. The I/O merge queue at the same time functions as a traffic regulator to enforce admission control and avoid overloading the NIC. Second, we propose Adaptive Polling to achieve higher efficiency of polling Work Completions than existing busy polling while maintaining the low CPU overhead of event triggering. Our implementation of a remote paging system with RDMAbox outperforms existing representative solutions with up to 6.48x throughput improvement and up to 83% decrease in average tail latency in big data workloads, and up to 83% reduction in completion time in machine learning workloads. Our implementation of a userspace file system based on RDMAbox achieves up to 6x higher throughput over existing representative solutions.
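To make the idea of an I/O merge queue that also acts as a traffic regulator more concrete, the following is a minimal sketch under stated assumptions, not the RDMAbox implementation. It assumes that requests to contiguous remote addresses can be coalesced into one larger request, and that admission control is a simple cap on in-flight bytes. The names `Request`, `merge_contiguous`, `MergeQueue`, `max_merge_bytes`, and `max_inflight_bytes` are hypothetical and do not come from the paper or from any RDMA library.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    offset: int  # start address in remote memory (hypothetical layout)
    length: int  # size in bytes


def merge_contiguous(requests, max_bytes):
    """Coalesce address-contiguous requests, capping each merged request at max_bytes."""
    merged = []
    for req in sorted(requests, key=lambda r: r.offset):
        last = merged[-1] if merged else None
        if (last is not None
                and last.offset + last.length == req.offset
                and last.length + req.length <= max_bytes):
            last.length += req.length  # extend the previous merged request
        else:
            merged.append(Request(req.offset, req.length))
    return merged


class MergeQueue:
    """Illustrative merge queue with a byte-based admission window (an assumption)."""

    def __init__(self, max_inflight_bytes, max_merge_bytes):
        self.pending = deque()
        self.inflight_bytes = 0
        self.max_inflight_bytes = max_inflight_bytes
        self.max_merge_bytes = max_merge_bytes

    def submit(self, req):
        # Requests accumulate here while the window is full, so heavier load
        # naturally leaves more candidates to merge on the next drain.
        self.pending.append(req)

    def drain(self):
        """Merge pending requests and return those admitted under the window."""
        batch = merge_contiguous(self.pending, self.max_merge_bytes)
        self.pending.clear()
        admitted = []
        for i, req in enumerate(batch):
            if self.inflight_bytes + req.length > self.max_inflight_bytes:
                self.pending.extend(batch[i:])  # hold back the rest
                break
            self.inflight_bytes += req.length
            admitted.append(req)
        return admitted

    def complete(self, req):
        # Called when a completion for req is observed; frees window space.
        self.inflight_bytes -= req.length
```

In this sketch, four contiguous 4 KiB requests plus one distant request collapse into three requests, and only as many as fit in the in-flight window are handed to the NIC at once. A real system would post the admitted requests as RDMA work requests and call `complete` from its completion-handling path.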