Towards an Arrow-native Storage System

With ever-increasing dataset sizes, several file formats such as Parquet, ORC, and Avro have been developed to store data efficiently and to save network and interconnect bandwidth at the price of additional CPU utilization. However, with the advent of networks supporting 25-100 Gb/s and storage devices delivering 1,000,000 reqs/sec, the CPU has become the bottleneck, struggling to keep up with feeding data in and out of these fast devices. As a result, data access libraries executed on single clients are often CPU-bound and cannot exploit the scale-out benefits of distributed storage systems. One attractive solution to this problem is to offload data-reducing processing and filtering tasks to the storage layer. However, modifying legacy storage systems to support compute offloading is often tedious and requires extensive understanding of their internals. Previous approaches re-implemented the functionality of data processing frameworks and access libraries for a particular storage system, a duplication of effort that might have to be repeated for each different storage system. In this paper, we introduce a new design paradigm that allows extending programmable object storage systems to embed existing, widely used data processing frameworks and access libraries into the storage layer with minimal modifications. In this approach, data processing frameworks and access libraries can evolve independently from storage systems while leveraging the scale-out and availability properties of distributed storage systems. We present one example implementation of our design paradigm using Ceph, Apache Arrow, and Parquet. We provide a brief performance evaluation of our implementation and discuss key results.
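To make the offloading idea concrete, the following toy sketch contrasts client-side filtering with filtering inside the storage layer. It is only an illustration under stated assumptions: `StorageNode`, `read_all`, `scan`, and the sample columns (`id`, `fare`, `tip`) are hypothetical names invented here, and the sketch uses plain Python dictionaries rather than the Ceph, Apache Arrow, or Parquet APIs used in the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, int]


@dataclass
class StorageNode:
    """Hypothetical stand-in for one storage server holding a table fragment."""
    rows: List[Row]

    def read_all(self) -> List[Row]:
        # Baseline: ship every row and every column to the client.
        return list(self.rows)

    def scan(self, columns: Sequence[str],
             predicate: Callable[[Row], bool]) -> List[Row]:
        # Offloaded: filter and project on the node, return only the matches.
        return [{c: r[c] for c in columns} for r in self.rows if predicate(r)]


def count_cells(rows: List[Row]) -> int:
    # Crude proxy for bytes moved over the network: number of values shipped.
    return sum(len(r) for r in rows)


def high_fare(row: Row) -> bool:
    return row["fare"] >= 45


# Four nodes, each holding 1000 synthetic rows with three columns.
nodes = [
    StorageNode([{"id": i, "fare": i % 50, "tip": i % 7}
                 for i in range(n * 1000, (n + 1) * 1000)])
    for n in range(4)
]

# Client-side processing: fetch everything, then filter and project locally.
shipped = [r for node in nodes for r in node.read_all()]
client_result = [{"id": r["id"]} for r in shipped if high_fare(r)]

# Offloaded processing: each node filters and projects before replying.
offloaded_result = [r for node in nodes for r in node.scan(["id"], high_fare)]

cells_shipped = count_cells(shipped)             # values moved without offload
cells_offloaded = count_cells(offloaded_result)  # values moved with offload
```

Both paths return the same rows, but in this toy setup the offloaded scan moves 400 values to the client instead of 12000, and the filtering work is spread across the four nodes rather than concentrated on a single client CPU.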
