Deploying deep learning (DL) on mobile devices has been a notable trend in recent years. To support fast on-device DL inference, DL libraries play a role as critical as that of algorithms and hardware. Unfortunately, no prior work has dived deep into the ecosystem of modern DL libs or provided quantitative results on their performance. In this paper, we first build a comprehensive benchmark that includes 6 representative DL libs and 15 diversified DL models. We then perform extensive experiments on 10 mobile devices, which reveal a complete landscape of the current mobile DL lib ecosystem. For example, we find that the best-performing DL lib is severely fragmented across different models and hardware, and that the performance gap between DL libs can be huge. In fact, the impact of DL libs can overwhelm the optimizations from algorithms or hardware, e.g., model quantization and GPU/DSP-based heterogeneous computing. Finally, based on these observations, we summarize practical implications for the different roles in the DL lib ecosystem.