CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards global optimality

This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework. The algorithm is motivated by the two main limitations of TO and RL when they are applied to continuous nonlinear systems to minimize a non-convex cost function. Specifically, TO can get stuck in poor local minima when the search is not initialized close to a "good" minimum. On the other hand, when dealing with continuous state and control spaces, the RL training process may be excessively long and strongly dependent on the exploration strategy. Thus, our algorithm learns a "good" control policy via TO-guided RL policy search; when this policy is used as an initial-guess provider for TO, the trajectory optimization process becomes less prone to converging to poor local optima. Our method is validated on several reaching problems featuring non-convex obstacle avoidance with different dynamical systems, including a car model with a 6D state and a 3-joint planar manipulator. Our results show that CACTO can escape local minima while being more computationally efficient than the Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) RL algorithms.
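The sensitivity of a local optimizer to its initial guess, which the abstract names as the main limitation of TO, can be illustrated with a minimal sketch. This is not the CACTO algorithm: the scalar cost, its gradient, the step size, and the two initial guesses are all hypothetical choices made for illustration, and plain gradient descent stands in for a trajectory optimizer. In CACTO, according to the abstract, the learned policy supplies the initial guess; here the "good" guess is simply picked by hand.

```python
def cost(x):
    # Hypothetical non-convex cost with two minima (not from the paper):
    # a poor local minimum near x = 0.96 and a better one near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    # Analytic derivative of the toy cost above.
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_descent(x0, step=0.01, iters=5000):
    # Plain gradient descent, used here as a stand-in for a local
    # trajectory optimizer: it only ever moves downhill from x0.
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x


# Two initial guesses (assumed values). The first lies in the basin of the
# poor local minimum, the second in the basin of the better minimum.
poor_guess = 2.0
good_guess = -2.0

x_poor = local_descent(poor_guess)
x_good = local_descent(good_guess)

print("from poor guess:", x_poor, "cost:", cost(x_poor))
print("from good guess:", x_good, "cost:", cost(x_good))
```

Both runs converge to a stationary point, but only the run started from the good guess reaches the lower-cost minimum. This is the behaviour that motivates using a learned policy to warm-start TO.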
