Safety-critical cyber-physical systems require control strategies whose worst-case performance is robust against adversarial disturbances and modeling uncertainties. In this paper, we present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon. We model disturbances to the system as finite-valued uncertain variables with unknown probability distributions. For problems with known system dynamics, we construct a dynamic programming (DP) decomposition to compute the optimal control strategy. Our first contribution is to define information states that improve the computational tractability of this DP without loss of optimality. Then, we describe a simplification for a class of problems where the incurred cost is observable at each time instant. Our second contribution is a definition of approximate information states that can be constructed or learned directly from observed data for problems with observable costs. We derive bounds on the performance loss of the resulting approximate control strategy.
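To illustrate the kind of worst-case DP decomposition described above, the following is a minimal sketch of min-max value iteration on a toy fully observed, finite-state model, where the finite-valued disturbance acts as an adversary with no assumed distribution. The states, controls, dynamics, and costs here are hypothetical examples, not the paper's model, and the sketch omits partial observability and information states.

```python
# Minimal sketch: worst-case (min-max) discounted dynamic programming on a
# toy finite model. The disturbance w is a finite-valued uncertain variable
# with unknown distribution, so we take the max over its values.
# All model components below are illustrative assumptions.

GAMMA = 0.9  # discount factor

states = [0, 1]
controls = [0, 1]
disturbances = [0, 1]  # finite-valued, no probability distribution assumed

def dynamics(x, u, w):
    # toy transition: next state determined by state, control, disturbance
    return (x + u + w) % 2

def cost(x, u, w):
    # toy stage cost
    return abs(x - u) + w

def worst_case_value_iteration(tol=1e-8, max_iters=10_000):
    """Iterate V(x) = min_u max_w [ c(x,u,w) + GAMMA * V(f(x,u,w)) ]."""
    V = {x: 0.0 for x in states}
    for _ in range(max_iters):
        V_new = {}
        for x in states:
            # min over controls of the worst case over disturbances
            V_new[x] = min(
                max(cost(x, u, w) + GAMMA * V[dynamics(x, u, w)]
                    for w in disturbances)
                for u in controls
            )
        if max(abs(V_new[x] - V[x]) for x in states) < tol:
            return V_new
        V = V_new
    return V

V = worst_case_value_iteration()
```

In this toy model the worst-case disturbance always adds one unit of cost per stage, so the value function converges to the discounted sum 1 / (1 - GAMMA) = 10 in every state. The paper's setting replaces the state x with an information state computed from the observation history.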