On the Sample Complexity of Stability Constrained Imitation Learning

We study the following question in the context of imitation learning for continuous control: how are the underlying stability properties of an expert policy reflected in the sample-complexity of an imitation learning task? We provide the first results showing that a surprisingly granular connection can be made between the underlying expert system's incremental gain stability, a novel measure of robust convergence between pairs of system trajectories, and the dependency on the task horizon $T$ of the resulting generalization bounds. In particular, we propose and analyze incremental gain stability constrained versions of behavior cloning and a DAgger-like algorithm, and show that the resulting sample-complexity bounds naturally reflect the underlying stability properties of the expert system. As a special case, we delineate a class of systems for which the number of trajectories needed to achieve $\varepsilon$-suboptimality is sublinear in the task horizon $T$, and do so without requiring (strong) convexity of the loss function in the policy parameters. Finally, we conduct numerical experiments demonstrating the validity of our insights on both a simple nonlinear system for which the underlying stability properties can be easily tuned, and on a high-dimensional quadrupedal robotic simulation.
