State-of-the-art deep learning models have achieved strong performance on various benchmarks. However, this excellent performance comes at the cost of expensive inference. Light-weight architectures, on the other hand, achieve only moderate accuracy, but at a much more desirable latency. This paper presents a new method for jointly using large accurate models together with small fast ones. To this end, we propose an Energy-Based Joint Reasoning (EBJR) framework that adaptively distributes samples between shallow and deep models to achieve an accuracy close to that of the deep model at a latency close to that of the shallow one. Our method is applicable to out-of-the-box pre-trained models, as it requires neither an architecture change nor re-training. Moreover, it is easy to use and deploy, especially for cloud services. Through a comprehensive set of experiments on different downstream tasks, we show that our method outperforms strong state-of-the-art approaches by a considerable margin. In addition, we propose specialized EBJR, an extension of our method in which we create a smaller specialized side model that performs the target task only partially, but yields even higher accuracy and faster inference. We verify the strengths of our methods with both theoretical and experimental evaluations.
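To make the routing idea concrete, below is a minimal PyTorch sketch of energy-based joint inference: the shallow model answers first, and only samples whose output energy (a standard confidence proxy, where lower energy indicates higher confidence) exceeds a threshold are escalated to the deep model. The `temperature` and `threshold` values and the per-sample batch routing here are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch


def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Free-energy confidence proxy: E(x) = -T * logsumexp(f(x) / T).

    Lower energy corresponds to a more confident shallow prediction.
    """
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)


@torch.no_grad()
def joint_inference(x: torch.Tensor,
                    shallow_model: torch.nn.Module,
                    deep_model: torch.nn.Module,
                    threshold: float) -> torch.Tensor:
    """Keep the shallow prediction where it is confident (low energy);
    route the remaining samples to the deep model."""
    shallow_logits = shallow_model(x)
    uncertain = energy_score(shallow_logits) > threshold  # (batch,) bool mask
    logits = shallow_logits.clone()
    if uncertain.any():
        logits[uncertain] = deep_model(x[uncertain])
    return logits.argmax(dim=-1)
```

In this sketch, the threshold controls the accuracy-latency trade-off: raising it sends fewer samples to the deep model, approaching the shallow model's latency, while lowering it approaches the deep model's accuracy.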