Machine-learned black-box policies are ubiquitous for nonlinear control problems. Meanwhile, crude model information is often available for these problems, for example from linear approximations of the nonlinear dynamics. We study the problem of equipping a black-box control policy with model-based advice for nonlinear control on a single trajectory. We first show a general negative result: a naive convex combination of a black-box policy and a linear model-based policy can lead to instability, even if both policies are stabilizing on their own. We then propose an adaptive $\lambda$-confident policy, with a coefficient $\lambda$ indicating the confidence in the black-box policy, and prove its stability. In addition, when the nonlinearity is bounded, we show that the adaptive $\lambda$-confident policy achieves a bounded competitive ratio when the black-box policy is near-optimal. Finally, we propose an online learning approach to implement the adaptive $\lambda$-confident policy and verify its efficacy in case studies on the CartPole problem and a real-world electric vehicle (EV) charging problem with data bias due to COVID-19.
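The negative result about naive convex combinations can be made concrete with a toy calculation. The sketch below is not the paper's construction: it assumes a hypothetical two-dimensional linear system $x_{t+1} = A x_t + B u_t$ with $A = 0$, $B = I$, and two hand-picked linear feedback gains standing in for the black-box and model-based policies. All names and numbers are illustrative assumptions. It only shows that blending two individually stabilizing gains with a fixed weight $\lambda$ can produce an unstable closed loop.

```python
import cmath


def spectral_radius_2x2(m):
    """Largest eigenvalue magnitude of a 2x2 matrix, via trace and determinant."""
    (a, b), (c, d) = m
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


def closed_loop(K):
    """Closed-loop matrix A - B K for the assumed toy system A = 0, B = I."""
    return [[-k for k in row] for row in K]


def blend(K1, K2, lam):
    """Fixed convex combination lam * K1 + (1 - lam) * K2 of two gains."""
    return [
        [lam * p + (1 - lam) * q for p, q in zip(r1, r2)]
        for r1, r2 in zip(K1, K2)
    ]


# Hypothetical gains (assumptions for illustration only); u_t = -K x_t.
K_BLACK_BOX = [[0.0, 4.0], [0.0, 0.0]]
K_MODEL = [[0.0, 0.0], [4.0, 0.0]]

# A discrete-time linear closed loop is stable iff its spectral radius is < 1.
rho_black_box = spectral_radius_2x2(closed_loop(K_BLACK_BOX))
rho_model = spectral_radius_2x2(closed_loop(K_MODEL))
rho_blend = spectral_radius_2x2(closed_loop(blend(K_BLACK_BOX, K_MODEL, 0.5)))

print("black-box alone:", rho_black_box)
print("model-based alone:", rho_model)
print("equal-weight blend:", rho_blend)
```

In this toy setting each gain alone yields a nilpotent closed loop (spectral radius 0), while the equal-weight blend has closed-loop eigenvalues $\pm 2$ and therefore diverges. This is the kind of failure that motivates choosing $\lambda$ adaptively rather than fixing it in advance.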