Near-data computation techniques have been successfully deployed to mitigate the cloud network bottleneck between the storage and compute tiers. At Huawei, we are looking to get more value from these techniques by broadening their applicability. Machine learning (ML) applications are an appealing and timely target. This paper describes our experience applying near-data computation techniques to transfer learning (TL), a widely used ML technique, in the context of disaggregated cloud object stores. Our techniques benefit both cloud providers and users: they improve our operational efficiency while giving users the performance improvements they demand from us. The main practical challenge is that storage-side computational resources are limited. Our approach is to split the TL deep neural network (DNN) during the feature extraction phase, before the training phase. The split reduces network transfers to the compute tier and decouples the batch size of feature extraction from the training batch size. This decoupling facilitates our second technique, storage-side batch adaptation, which enables increased concurrency in the storage tier while avoiding out-of-memory errors. Guided by these insights, we present HAPI, our processing system for TL that spans the compute and storage tiers while remaining transparent to the user. Our evaluation with several state-of-the-art DNNs, such as ResNet, VGG, and Transformer, shows up to 11x improvement in application runtime and up to 8.3x reduction in the data transferred from the storage tier to the compute tier, compared to running the computation entirely in the compute tier.
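To make the two ideas above concrete, the following is a minimal sketch of how a split point and a storage-side batch size might be chosen. It is not HAPI's actual algorithm: the `Layer` fields, the per-sample size figures, the `max_storage_layers` limit, and the simple "smallest output wins" and "largest batch that fits" rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    output_bytes_per_sample: int   # size of this layer's output for one sample (assumed known)
    memory_bytes_per_sample: int   # hypothetical peak memory to run this layer on one sample
    frozen: bool                   # True if the layer belongs to feature extraction


def choose_split(layers: List[Layer], input_bytes_per_sample: int,
                 max_storage_layers: int) -> int:
    """Return how many leading frozen layers to run in the storage tier.

    0 means no pushdown. Among the allowed prefixes, pick the one whose
    output is smallest, since that output is what crosses the network.
    """
    best_k, best_bytes = 0, input_bytes_per_sample
    for k, layer in enumerate(layers, start=1):
        if not layer.frozen or k > max_storage_layers:
            break  # never push trainable layers, and respect the storage limit
        if layer.output_bytes_per_sample < best_bytes:
            best_k, best_bytes = k, layer.output_bytes_per_sample
    return best_k


def adapt_batch_size(layers: List[Layer], split: int,
                     memory_budget_bytes: int, requested_batch: int) -> int:
    """Largest storage-side batch (at most requested_batch) that fits the budget.

    Returns 0 if nothing is pushed down or not even one sample fits.
    """
    if split == 0:
        return 0
    peak = max(l.memory_bytes_per_sample for l in layers[:split])
    return min(requested_batch, memory_budget_bytes // peak)


# Illustrative numbers only; they do not come from the paper.
layers = [
    Layer("conv1", 800_000, 3_000_000, True),
    Layer("conv2", 400_000, 2_000_000, True),
    Layer("pool", 100_000, 500_000, True),
    Layer("fc", 4_000, 50_000, False),   # trainable head stays in the compute tier
]
split = choose_split(layers, input_bytes_per_sample=600_000, max_storage_layers=3)
batch = adapt_batch_size(layers, split, memory_budget_bytes=256_000_000,
                         requested_batch=128)
```

Because the storage-side batch size is chosen independently of the training batch size, the storage tier can shrink it under memory pressure without changing what the training phase sees.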