Deep neural networks have been highly successful in reinforcement learning and control, yet few theoretical guarantees exist for deep learning in these settings. There are two main challenges to deriving performance guarantees: (a) control involves state information and is thus inherently online, and (b) deep networks are non-convex predictors, for which online learning cannot provide provable guarantees in general. Building on the linearization technique for overparameterized neural networks, we derive provable regret bounds for efficient online learning with deep neural networks. Specifically, we show that over any sequence of convex loss functions, any low-regret algorithm can be adapted to optimize the parameters of a neural network such that it competes with the best net in hindsight. As an application of these results in the online setting, we obtain provable bounds for online episodic control with deep neural network controllers.
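To make the core idea concrete, here is a minimal, purely illustrative sketch (all dimensions, losses, and data are hypothetical, not taken from the paper): a network is linearized around its initialization, so the prediction is linear in the trainable weights, and online gradient descent — one example of a low-regret algorithm — is run on the resulting convex losses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a two-layer net with a frozen random first layer,
# trained only in its output weights. Around initialization the prediction
# phi(x) @ w is linear in w, so each squared loss is convex in w.
d, width, T = 5, 64, 200
W1 = rng.normal(size=(width, d)) / np.sqrt(d)   # frozen first layer
w = rng.normal(size=width) / np.sqrt(width)     # trainable output weights
eta = 0.5 / np.sqrt(T)                          # OGD step size ~ 1/sqrt(T)

def features(x):
    # Gradient of the prediction w.r.t. w: phi(x) = relu(W1 x).
    return np.maximum(W1 @ x, 0.0)

losses = []
for t in range(T):
    x_t = rng.normal(size=d)
    y_t = np.tanh(x_t.sum())           # arbitrary bounded target
    phi = features(x_t)
    pred = phi @ w                     # linear (in w) prediction
    losses.append(0.5 * (pred - y_t) ** 2)
    g = (pred - y_t) * phi             # gradient of the convex loss
    w -= eta * g                       # online gradient descent step

avg_loss = float(np.mean(losses))
```

Because each round's loss is convex in `w`, standard OGD analysis gives O(sqrt(T)) regret against the best fixed weight vector in hindsight; the paper's contribution is showing that this comparator class, via linearization, captures the best neural network in hindsight.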