Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs) that achieve human-competitive performance on a variety of artificial intelligence tasks. Often consisting of hundreds of millions, if not hundreds of billions, of parameters, these DNNs are too large to be deployed to, or efficiently run on, resource-constrained devices such as mobile phones or IoT microcontrollers. Systems relying on large-scale DNNs thus have to invoke the corresponding model over the network, leading to substantial costs for hosting and running the large-scale remote model, costs which are often charged on a per-use basis. In this paper, we propose BiSupervised, a novel architecture in which, before relying on a large remote DNN, a system attempts a prediction on a small-scale local model. A DNN supervisor monitors this prediction process and identifies easy inputs for which the local prediction can be trusted. For these inputs, the remote model does not have to be invoked, saving costs while only marginally impacting the overall system accuracy. Our architecture furthermore foresees a second supervisor that monitors the remote predictions and identifies inputs for which not even these can be trusted, allowing the system to raise an exception or run a fallback strategy instead. We evaluate the cost savings and the ability to detect incorrectly predicted inputs on four diverse case studies: IMDB movie review sentiment classification, Github issue triaging, Imagenet image classification, and SQuADv2 free-text question answering.
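The two-stage decision flow described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names (`predict_local`, `predict_remote`), the use of maximum softmax probability as the supervisor's confidence score, and the threshold values are all assumptions made for the example.

```python
import math

def softmax_confidence(scores):
    """Confidence of a prediction, taken here as the max softmax probability.

    Returns (confidence, predicted_label_index). Using max softmax probability
    as the supervisor score is an illustrative choice, not the paper's method.
    """
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(probs)
    return best, probs.index(best)

def bisupervised_predict(x, predict_local, predict_remote,
                         local_threshold=0.9, remote_threshold=0.5):
    """Two-stage supervised prediction: try the cheap local model first.

    Hypothetical parameters: predict_local and predict_remote map an input
    to a list of raw class scores; the thresholds are example values.
    Returns (label, source); raises when neither prediction can be trusted.
    """
    # First supervisor: trust the local model only on easy inputs.
    conf, label = softmax_confidence(predict_local(x))
    if conf >= local_threshold:
        return label, "local"  # remote model not invoked, cost saved

    # Otherwise pay for the large remote model.
    conf, label = softmax_confidence(predict_remote(x))
    # Second supervisor: even the remote prediction may be untrustworthy.
    if conf >= remote_threshold:
        return label, "remote"

    raise RuntimeError("prediction not trusted; run fallback strategy")
```

In this sketch, the cost saving comes from every input that exits at the first return: only inputs the first supervisor deems hard incur a remote call, and the second supervisor turns untrusted remote predictions into an explicit exception rather than a silent misprediction.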