We study finite-time-horizon, continuous-time linear-convex reinforcement learning problems in an episodic setting, in which an unknown linear jump-diffusion process is controlled subject to nonsmooth convex costs. We show that the associated linear-convex control problems admit Lipschitz continuous optimal feedback controls, and we further prove the Lipschitz stability of these feedback controls: the performance gap between applying the feedback control computed for an incorrect model and the one computed for the true model depends Lipschitz-continuously on the magnitude of the perturbations in the model coefficients. The proof relies on a stability analysis of the associated forward-backward stochastic differential equation. We then propose a novel least-squares algorithm that achieves a regret of order $O(\sqrt{N\ln N})$ on linear-convex learning problems with jumps, where $N$ is the number of learning episodes; the analysis leverages the Lipschitz stability of feedback controls and concentration properties of sub-Weibull random variables.
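To give a rough feel for the episodic least-squares idea, the following is a minimal sketch and not the algorithm analysed here. It assumes a scalar linear diffusion $dX_t = (aX_t + bu_t)\,dt + \sigma\,dW_t$ without jumps, an Euler-Maruyama discretisation, and a hypothetical policy (linear feedback built from the current estimate plus a deterministic probing term). All function names, parameter values, and the policy itself are illustrative assumptions.

```python
import numpy as np

def simulate_episode(theta, feedback, x0=1.0, T=1.0, n_steps=200, sigma=0.2, rng=None):
    """Euler-Maruyama simulation of dX = (a X + b u) dt + sigma dW (no jumps; assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    a, b = theta
    dt = T / n_steps
    x = x0
    Z = np.empty((n_steps, 2))   # regressors (state, control) at each time step
    dX = np.empty(n_steps)       # observed state increments
    for i in range(n_steps):
        u = feedback(i * dt, x)
        Z[i] = (x, u)
        dx = (a * x + b * u) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        dX[i] = dx
        x += dx
    return Z, dX, dt

def least_squares_update(V, s, Z, dX, dt):
    """Accumulate the normal equations over episodes and solve for (a, b)."""
    V = V + Z.T @ Z * dt
    s = s + Z.T @ dX
    return V, s, np.linalg.solve(V, s)

def run(n_episodes=200, seed=0):
    rng = np.random.default_rng(seed)
    theta_true = np.array([-0.5, 1.0])   # unknown to the learner; illustrative values
    V = 1e-3 * np.eye(2)                 # small ridge term keeps V invertible
    s = np.zeros(2)
    theta_hat = np.zeros(2)
    for _ in range(n_episodes):
        # Hypothetical policy: feedback gain taken from the current estimate,
        # plus a probing term so that state and control are not collinear.
        k = float(np.clip(theta_hat[1], -2.0, 2.0))
        feedback = lambda t, x, k=k: -k * x + np.cos(2.0 * np.pi * t)
        Z, dX, dt = simulate_episode(theta_true, feedback, rng=rng)
        V, s, theta_hat = least_squares_update(V, s, Z, dX, dt)
    return theta_true, theta_hat

if __name__ == "__main__":
    theta_true, theta_hat = run()
    print("true coefficients:     ", theta_true)
    print("estimated coefficients:", theta_hat)
```

In this toy setting the estimate is refreshed once per episode from all data collected so far, and the next episode's feedback control is built from it. The stability result described above is what would let an estimation error of this kind be translated into a bound on the performance gap.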