Fast inference of deep learning models is in high demand on edge devices and even servers, for both financial and environmental reasons. To address this demand, we propose SoftNeuro, a novel, high-performance inference framework with efficient performance tuning. The key idea is to separate algorithmic routines from network layers. Our framework maximizes inference performance by profiling various routines for each layer and selecting the fastest path. To find the best path efficiently, we propose a routine-selection algorithm based on dynamic programming. Experiments show that the proposed framework achieves both fast inference and efficient tuning.
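The routine selection described above can be sketched as a dynamic program over the layers of a sequential network. The sketch below is a minimal illustration, not the paper's implementation: it assumes hypothetical inputs of per-layer profiled routine costs plus conversion costs between the routines chosen for consecutive layers (e.g., for changing data layouts), and it keeps, for each layer, the cheapest path ending in each routine.

```python
# Hypothetical DP sketch of routine selection (assumed interface, not the
# paper's actual API). Inputs:
#   layer_costs[i]       : {routine_name: profiled_time} for layer i
#   transition_costs[i]  : {(prev_routine, routine): conversion_time}
#                          between layers i and i+1 (missing pairs cost 0)
def select_routines(layer_costs, transition_costs):
    # best[r] = (total_cost, path) for the cheapest path ending in routine r
    best = {r: (c, [r]) for r, c in layer_costs[0].items()}
    for i in range(1, len(layer_costs)):
        nxt = {}
        for r, c in layer_costs[i].items():
            # extend the cheapest predecessor path, adding any conversion cost
            nxt[r] = min(
                (best[p][0] + transition_costs[i - 1].get((p, r), 0.0) + c,
                 best[p][1] + [r])
                for p in best
            )
        best = nxt
    return min(best.values())
```

With R candidate routines per layer and L layers, this runs in O(L·R²) time rather than enumerating all R^L paths; conversion costs are what make the problem a genuine DP instead of an independent per-layer minimum.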