Classical value iteration approaches are not applicable to environments with continuous states and actions. For such environments, the states and actions are usually discretized, which leads to an exponential increase in computational complexity. In this paper, we propose continuous fitted value iteration (cFVI). This algorithm enables dynamic programming for continuous states and actions with a known dynamics model. Leveraging the continuous-time formulation, the optimal policy can be derived for non-linear control-affine dynamics. This closed-form solution enables the efficient extension of value iteration to continuous environments. We show in non-linear control experiments that the dynamic programming solution obtains the same quantitative performance as deep reinforcement learning methods in simulation but excels when transferred to the physical system. The policy obtained by cFVI is more robust to changes in the dynamics despite using only a deterministic model and without explicitly incorporating robustness in the optimization. Videos of the physical system are available at \url{https://sites.google.com/view/value-iteration}.
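As a minimal illustration of the closed-form policy referenced above, consider control-affine dynamics with a quadratic action cost. The notation ($a(x)$, $B(x)$, $q(x)$, $R$, $V$) and the quadratic cost are assumptions for this sketch and are not taken verbatim from the paper; discounting is omitted for brevity, and the paper's general formulation may use a broader class of action costs.
\begin{align}
  0 &= \min_u \Big[\, q(x) + \tfrac{1}{2} u^\top R u
        + \nabla_x V(x)^\top \big( a(x) + B(x)\, u \big) \Big], \\
  u^*(x) &= -R^{-1} B(x)^\top \nabla_x V(x).
\end{align}
Because the minimization over $u$ can be solved analytically, each value-iteration update only requires evaluating $\nabla_x V$, which is what makes the extension to continuous actions tractable.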