Learned models and policies can generalize effectively when evaluated within the distribution of the training data, but can produce unpredictable and erroneous outputs on out-of-distribution inputs. In order to avoid distribution shift when deploying learning-based control algorithms, we seek a mechanism to constrain the agent to states and actions that resemble those that it was trained on. In control theory, Lyapunov stability and control-invariant sets allow us to make guarantees about controllers that stabilize the system around specific states, while in machine learning, density models allow us to estimate the training data distribution. Can we combine these two concepts, producing learning-based control algorithms that constrain the system to in-distribution states using only in-distribution actions? In this work, we propose to do this by combining concepts from Lyapunov stability and density estimation, introducing Lyapunov density models: a generalization of control Lyapunov functions and density models that provides guarantees on an agent's ability to stay in-distribution over its entire trajectory.
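To make the core idea concrete, below is a minimal, hypothetical sketch (not the paper's actual construction) of the kind of constraint a Lyapunov density model is meant to enforce: given a learned density model over state-action pairs and a dynamics model, only admit actions that keep the current pair in-distribution *and* leave at least one in-distribution action available at the next state. The functions `density` and `dynamics` here are toy stand-ins for learned models, and the check is a one-step, myopic approximation of the trajectory-wide guarantee described above.

```python
import numpy as np

def density(s, a):
    # Toy stand-in for a learned density model over (state, action) pairs.
    return np.exp(-0.5 * (np.sum(s ** 2) + a ** 2))

def dynamics(s, a):
    # Toy stand-in for a (learned or known) transition model.
    return s + 0.1 * np.array([a, -s[0]])

def safe_actions(s, candidate_actions, log_density_threshold):
    """Return actions that keep (s, a) above the density threshold and for which
    some follow-up action at the next state also stays above the threshold.
    This is a greedy, one-step approximation of the trajectory-level
    in-distribution guarantee an LDM is meant to provide."""
    safe = []
    for a in candidate_actions:
        s_next = dynamics(s, a)
        in_dist_now = np.log(density(s, a)) >= log_density_threshold
        recoverable = any(
            np.log(density(s_next, a_next)) >= log_density_threshold
            for a_next in candidate_actions
        )
        if in_dist_now and recoverable:
            safe.append(a)
    return safe

if __name__ == "__main__":
    s0 = np.array([0.2, -0.1])
    actions = np.linspace(-1.0, 1.0, 11)
    print(safe_actions(s0, actions, log_density_threshold=-1.0))
```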