Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks incurs minimal computation cost. Pruning each network separately is suboptimal because it ignores the redundancy among related tasks. A promising remedy is to merge the networks into a multitask network that eliminates this cross-task redundancy before pruning. However, pruning a multitask network constructed by existing network merging schemes cannot minimise the computation cost of every task combination, because these schemes do not account for the subsequent pruning. To this end, we theoretically identify the conditions under which pruning a multitask network minimises the computation cost of all task combinations. On this basis, we propose Pruning-Aware Merging (PAM), a heuristic network merging scheme that constructs a multitask network approximating these conditions. The merged network can then be further pruned by existing network pruning methods. Evaluations with different pruning schemes, datasets, and network architectures show that PAM achieves up to 4.87x less computation than the baseline without network merging, and up to 2.01x less computation than the baseline built on a state-of-the-art network merging scheme.
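To make the merge-then-prune pipeline concrete, below is a minimal, hypothetical sketch: two single-task networks are replaced by one multitask network with a shared trunk and per-task heads, after which the shared layers are pruned with an off-the-shelf method (here, PyTorch's L1 unstructured pruning). This is not the PAM algorithm itself; the architecture, sharing pattern, and 50% sparsity are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class MultitaskNet(nn.Module):
    """Merged network: a shared trunk plus one lightweight head per task."""

    def __init__(self, in_dim=64, hidden=128, out_dims=(10, 5)):
        super().__init__()
        # Shared trunk: the cross-task redundancy lives here.
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Task-specific heads; any subset of tasks can be executed.
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in out_dims)

    def forward(self, x, tasks=(0, 1)):
        # Run the shared trunk once, then only the requested heads.
        z = self.trunk(x)
        return [self.heads[t](z) for t in tasks]

net = MultitaskNet()

# After merging (and, in practice, fine-tuning), apply an existing
# pruning scheme to the shared layers; magnitude pruning is one example.
for module in net.trunk:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# Selective execution: run task 0 alone at reduced computation cost.
out_task0, = net(torch.randn(8, 64), tasks=(0,))
```

The point of the sketch is the ordering: because pruning happens after merging, the quality of the final per-combination computation cost depends on how the merge was constructed, which is the gap PAM targets.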