We present a general framework for optimizing the Conditional Value-at-Risk (CVaR) of dynamical systems using stochastic search. The framework handles uncertainty arising from the initial condition, stochastic dynamics, and uncertain model parameters. We compare the algorithm against a risk-sensitive distributional reinforcement learning framework and show that it outperforms that baseline on a pendulum and a cartpole with stochastic dynamics. We also showcase the framework's applicability to robotics as an adaptive risk-sensitive controller by optimizing with respect to the fully nonlinear belief provided by a particle filter on a pendulum, cartpole, and quadcopter in simulation.
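For readers unfamiliar with the risk measure, the Conditional Value-at-Risk (CVaR) at level alpha is commonly described as the expected cost over the worst (1 - alpha) fraction of outcomes. The sketch below is only an illustration of a sample-based estimate of that quantity from sampled trajectory costs; it is not the algorithm proposed in this work. The function name `empirical_cvar`, the convention that larger costs are worse, and the default `alpha` are assumptions made for this example.

```python
import math


def empirical_cvar(costs, alpha=0.9):
    """Sample-based CVaR: mean of the worst (1 - alpha) fraction of costs.

    Hypothetical helper for illustration; assumes larger cost is worse.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if not costs:
        raise ValueError("need at least one cost sample")
    # Sort so the largest (worst) costs come first.
    ordered = sorted(costs, reverse=True)
    # Size of the tail; rounding guards against floating-point noise in
    # (1 - alpha) * n, and at least one sample is always kept.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / k


if __name__ == "__main__":
    # Ten made-up rollout costs; the worst 20% are 10 and 9.
    sampled_costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(empirical_cvar(sampled_costs, alpha=0.8))
```

In a stochastic-search setting, an estimate of this kind could serve as the objective evaluated for each candidate control sequence, with the cost samples drawn over initial conditions, process noise, and parameter values; how the paper actually forms and optimizes its objective is not specified in this abstract.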