Storage disaggregation is fundamental to today's cloud due to its cost and scalability benefits. Unfortunately, this design must cope with an inherent network bottleneck between the storage and compute tiers. The widely deployed mitigation strategy is to provide computational resources next to storage so that part of an application can be pushed down, reducing the amount of data transferred to the compute tier. Overall, users of disaggregated storage must consider two main constraints: the network may remain a bottleneck, and the storage-side computational resources are limited. This paper identifies transfer learning (TL) as a natural fit for the disaggregated cloud. TL, famously described as the next driver of ML commercial success, is widely popular and has a broad range of applications. We show how to leverage the unique structure of TL's fine-tuning phase (i.e., a combination of feature extraction and training) to flexibly address the aforementioned constraints and improve both user- and operator-centric metrics. The key to improving user-perceived performance is to mitigate the network bottleneck by carefully splitting the TL deep neural network (DNN) such that feature extraction is, partially or entirely, executed next to storage. Crucially, such splitting enables decoupling the feature-extraction batch size from the training batch size, facilitating efficient storage-side batch size adaptation that increases concurrency in the storage tier while avoiding out-of-memory errors. Guided by these insights, we present HAPI, a processing system for TL that spans the compute and storage tiers while remaining transparent to the user. Our evaluation with several DNNs, such as ResNet, VGG, and Transformer, shows up to 11x improvement in application runtime and up to 8.3x reduction in the data transferred from the storage tier to the compute tier compared to running the entire computation in the compute tier.
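To make the core idea concrete, the following is a minimal PyTorch sketch of the split described above, not HAPI's actual implementation: a frozen feature-extraction prefix of a TL model (the part that would run next to storage) executes at one batch size, while the remaining layers train at another. The split index, batch sizes, class count, and synthetic data are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of DNN splitting with decoupled batch sizes (illustrative only).
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(weights=None)  # stand-in for a pretrained TL model
layers = list(model.children())
split = 6                              # hypothetical split point in the DNN

# Frozen feature-extraction prefix: this is the portion that would be
# pushed down next to storage.
extractor = nn.Sequential(*layers[:split]).eval()
for p in extractor.parameters():
    p.requires_grad_(False)

# Trainable suffix, fine-tuned on the compute tier (10 classes assumed).
head = nn.Sequential(*layers[split:-1], nn.Flatten(),
                     nn.Linear(model.fc.in_features, 10))

# Feature extraction has no training state, so its batch size can be chosen
# independently, e.g., adapted to the memory available storage-side ...
extract_bs, train_bs = 16, 64
images = torch.randn(train_bs, 3, 224, 224)  # synthetic stand-in for a dataset
with torch.no_grad():
    feats = torch.cat([extractor(chunk)
                       for chunk in images.split(extract_bs)])

# ... while training consumes the accumulated features at its own batch size.
labels = torch.randint(0, 10, (train_bs,))
opt = torch.optim.SGD(head.parameters(), lr=0.01)
loss = nn.functional.cross_entropy(head(feats), labels)
opt.zero_grad(); loss.backward(); opt.step()
```

In this sketch, shrinking `extract_bs` bounds the extractor's peak activation memory independently of `train_bs`, which is the property that lets the storage tier adapt its batch size to increase concurrency without out-of-memory errors.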