Storage disaggregation, wherein storage is accessed over the network, is popular because it allows applications to independently scale storage capacity and bandwidth based on dynamic application demand. However, the added network processing introduced by disaggregation can consume significant CPU resources. In many storage systems, logical storage operations (e.g., lookups, aggregations) involve a series of simple but dependent I/O access patterns. Therefore, one way to reduce the network processing overhead is to execute dependent series of I/O accesses at the remote storage server, reducing the back-and-forth communication between the storage layer and the application. We refer to this approach as \emph{remote-storage pushdown}. We present BPF-oF, a new remote-storage pushdown protocol built on top of NVMe-oF, which enables applications to safely push custom eBPF storage functions to a remote storage server. The main challenge in integrating BPF-oF with storage systems is preserving the benefits of their client-based in-memory caches. We address this challenge by designing novel caching techniques for storage pushdown, including splitting queries into separate in-memory and remote-storage phases and periodically refreshing the client cache with sampled accesses from the remote storage device. We demonstrate the utility of BPF-oF by integrating it with three storage systems, including RocksDB, a popular persistent key-value store that has no existing storage pushdown capability. We show BPF-oF provides significant speedups in all three systems when accessed over the network, for example improving RocksDB's throughput by up to 2.8$\times$ and tail latency by up to 2.6$\times$.
翻译:暂无翻译