An increasing number of artificial intelligence (AI) applications involve the execution of deep neural networks (DNNs) on edge devices. Many practical reasons motivate the need to update the DNN model on the edge device post-deployment, such as refining the model, coping with concept drift, or an outright change in the learning task. In this paper, we consider the scenario where retraining can be done on the server side based on a copy of the DNN model, with only the necessary data transmitted to the edge to update the deployed model. However, due to bandwidth constraints, we want to minimise the transmission required to achieve the update. We develop a simple approach based on matrix factorisation to compress the model update -- this differs from compressing the model itself. The key idea is to preserve the existing knowledge in the current model and optimise only a small set of additional parameters for the update, which can then be used to reconstitute the updated model on the edge. We compare our method with similar techniques used in federated learning; our method usually requires less than half the update size of existing methods to achieve the same accuracy.
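To illustrate the general idea of compressing a model update rather than the model, the sketch below shows one possible low-rank parameterisation: the deployed weights are kept frozen, only two small factors are trained on the server, and the edge device reconstitutes the updated layer from them. This is a minimal sketch under assumed design choices; the class name `LowRankUpdate`, the `rank` hyperparameter, and the use of PyTorch are illustrative and not details taken from the paper, whose exact factorisation may differ.

```python
import torch
import torch.nn as nn

class LowRankUpdate(nn.Module):
    """Wraps a frozen linear layer with a small trainable update.

    Only the factors A and B are optimised on the server and transmitted;
    the edge reconstitutes the updated weight as W' = W + A @ B.
    (Illustrative sketch, not the paper's exact method.)
    """
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # preserve existing knowledge in the deployed model
        out_features, in_features = base.weight.shape
        # Small additional parameters: (out x r) and (r x in) instead of (out x in).
        self.A = nn.Parameter(torch.zeros(out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, in_features) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to applying the reconstituted weight (W + A @ B).
        return self.base(x) + x @ (self.A @ self.B).T

    def update_payload(self) -> dict:
        # Only these small factors need to be sent over the bandwidth-constrained link.
        return {"A": self.A.detach().cpu(), "B": self.B.detach().cpu()}
```

For a layer of size 1024 x 1024, transmitting the two rank-4 factors amounts to roughly 8K parameters instead of about 1M, which is the kind of saving a factorised update is intended to provide.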