Training a Convolutional Neural Network (CNN) model typically requires significant computing power, and cloud computing resources are widely used as a training environment. However, because cloud services evolve quickly, it is difficult for CNN algorithm developers to keep up with system updates and apply them to their training environment. It is therefore important for cloud computing service vendors to design and deliver an optimal training environment for various training tasks, reducing the system operation and management overhead placed on algorithm developers. To this end, we propose PROFET, which predicts the training latency of an arbitrary CNN implementation on various Graphics Processing Unit (GPU) devices in order to build a cost-effective and time-efficient cloud training environment. Unlike previous work on training latency prediction, PROFET does not rely on the implementation details of the CNN architecture, which makes it suitable for use in a public cloud environment. Thorough evaluations show that PROFET achieves higher prediction accuracy than state-of-the-art related work, and a demonstration service illustrates the practicality of the proposed system.