Many deep learning applications are intended to run on mobile devices, where both accuracy and inference time matter. Although the number of FLOPs is commonly used as a proxy for neural network latency, it is often a poor one. To obtain a better approximation, the research community uses look-up tables containing the latency of every possible layer, so that the total inference time on a mobile CPU can be predicted from only a small number of measurements. Unfortunately, this method is not directly applicable to mobile GPUs and shows low precision there. In this work, we treat latency approximation on mobile GPUs as a data- and hardware-specific problem. Our main goal is to construct a convenient Latency Estimation Tool for Investigation (LETI) of neural network inference, and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we build open-source tools that provide a convenient way to conduct massive experiments on different target devices, with a focus on mobile GPUs. After collecting a dataset of measurements, we fit a regression model on the experimental data and use it for subsequent latency prediction and analysis. We experimentally demonstrate the applicability of this approach on a subset of the popular NAS-Bench-101 dataset and also evaluate the most popular neural network architectures on two mobile GPUs. As a result, we construct a latency prediction model with good precision on the target evaluation subset. We consider LETI a useful tool for neural architecture search and massive latency evaluation. The project is available at https://github.com/leti-ai
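The two estimation strategies contrasted above can be sketched as follows. This is a minimal illustrative sketch, not LETI's actual API: the table keys, layer features, and function names are all assumptions made for the example. It shows (a) the CPU-style per-layer look-up table, which sums pre-measured layer latencies, and (b) a simple learned regression that maps layer features to measured latency, in the spirit of the data-driven approach the work proposes for mobile GPUs.

```python
# Hypothetical sketch of two latency-estimation approaches:
# (a) a per-layer look-up table, as commonly used for mobile CPUs, and
# (b) a regression model fit on measured data, as proposed for mobile GPUs.
# All names, keys, and features here are illustrative assumptions.
import numpy as np

# (a) Look-up table: measured latency (ms) per layer configuration.
# Key format (assumed): (op type, channels, spatial size).
LATENCY_TABLE = {
    ("conv3x3", 32, 112): 1.8,
    ("conv1x1", 64, 56): 0.6,
    ("pool", 64, 56): 0.2,
}

def table_latency(layers):
    """Predict network latency as the sum of per-layer table entries."""
    return sum(LATENCY_TABLE[layer] for layer in layers)

# (b) Regression: fit a least-squares linear model from layer features
# (e.g. FLOPs, parameter count) to measured end-to-end latency.
def fit_latency_regressor(features, measured_ms):
    X = np.asarray(features, dtype=float)
    X = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w, *_ = np.linalg.lstsq(X, np.asarray(measured_ms, dtype=float), rcond=None)
    return w

def predict_latency(w, feature_row):
    x = np.append(np.asarray(feature_row, dtype=float), 1.0)
    return float(x @ w)

# Toy measurements: features = (MFLOPs, parameters in thousands).
feats = [(10, 5), (20, 9), (40, 20), (80, 41)]
times = [1.1, 2.0, 4.1, 8.2]
w = fit_latency_regressor(feats, times)
```

In practice the regression would be trained per target device on a large set of measured architectures, which is exactly what makes it hardware-specific.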