Many real-world systems involve physical components or operating environments with highly nonlinear and uncertain dynamics. A number of control algorithms can be used to design optimal controllers for such systems, assuming a reasonably high-fidelity model of the actual system. However, the assumptions made about the stochastic dynamics of the model when designing the optimal controller may no longer hold once the system is deployed in the real world. The problem addressed by this paper is the following: given an optimal trajectory obtained by solving a control problem in the training environment, how do we ensure that the real-world system trajectory tracks this optimal trajectory with minimal error in a deployment environment? In other words, we want to learn how to adapt an optimally trained policy to distribution shifts in the environment. Distribution shifts are especially problematic in safety-critical systems, where a trained policy may lead to unsafe outcomes during deployment. We show that this problem can be cast as a nonlinear optimization problem that can be solved using heuristic methods such as particle swarm optimization (PSO). However, if we instead consider a convex relaxation of this problem, we can learn policies that track the optimal trajectory with substantially lower error and faster computation times. We demonstrate the efficacy of our approach on tracking an optimal path using a Dubins car model, and on collision avoidance using both linear and nonlinear models for adaptive cruise control.
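To make the convex-relaxation viewpoint concrete, the following is a minimal sketch, not the paper's actual formulation: it tracks a given reference trajectory under assumed linearized dynamics x_{t+1} = A x_t + B u_t by solving a convex quadratic program with cvxpy. The dynamics matrices, horizon, reference, and input bound are all illustrative placeholders.

```python
# Hedged sketch: convex trajectory tracking as a QP (assumed linearized model,
# NOT the paper's formulation). All problem data below are placeholders.
import numpy as np
import cvxpy as cp

T, n, m = 20, 2, 1                      # horizon, state dim, input dim (assumed)
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # example double-integrator dynamics
B = np.array([[0.0], [0.1]])
x_ref = np.vstack([np.linspace(0, 1, T + 1), np.zeros(T + 1)])  # assumed reference

x = cp.Variable((n, T + 1))             # state trajectory
u = cp.Variable((m, T))                 # control inputs

cost = 0
constraints = [x[:, 0] == x_ref[:, 0]]  # start on the reference
for t in range(T):
    # Penalize tracking error plus a small control effort term.
    cost += cp.sum_squares(x[:, t + 1] - x_ref[:, t + 1]) \
            + 0.1 * cp.sum_squares(u[:, t])
    constraints += [x[:, t + 1] == A @ x[:, t] + B @ u[:, t],
                    cp.abs(u[:, t]) <= 1.0]  # input bound (assumed)

prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
print("tracking cost:", prob.value)
```

Because the relaxed problem is a QP, it can be solved to global optimality by off-the-shelf convex solvers in milliseconds, which is the kind of contrast with population-based heuristics such as PSO that the abstract alludes to.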