Efficient Planning in a Compact Latent Action Space

While planning-based sequence modelling methods have shown great potential in continuous control, scaling them to high-dimensional state-action sequences remains an open challenge because of the high computational cost and inherent difficulty of planning in high-dimensional spaces. We propose the Trajectory Autoencoding Planner (TAP), a planning-based sequence modelling RL method that scales to high state-action dimensionalities. Using a state-conditional Vector-Quantized Variational Autoencoder (VQ-VAE), TAP models the conditional distribution of trajectories given the current state. When deployed as an RL agent, TAP avoids planning step by step in a high-dimensional continuous action space and instead searches for the optimal latent code sequence by beam search. Unlike the $O(D^3)$ complexity of Trajectory Transformer (TT), TAP has constant $O(C)$ planning complexity with respect to the state-action dimensionality $D$. Our empirical evaluation also shows that TAP performs increasingly strongly as dimensionality grows. On Adroit robotic hand manipulation tasks, which have high state and action dimensionality, TAP surpasses existing model-based methods, including TT, by a large margin and also beats strong model-free actor-critic baselines.
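To make the planning idea concrete, the following is a minimal sketch of beam search over sequences of discrete latent codes. It is not the authors' implementation: the function `beam_search_codes`, its parameters, and the toy scoring function `toy_score` are all hypothetical stand-ins. In TAP the score would come from learned components (the VQ-VAE decoder and a prior over codes), which are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple


def beam_search_codes(
    num_codes: int,
    horizon: int,
    beam_width: int,
    score_fn: Callable[[Sequence[int]], float],
) -> Tuple[List[int], float]:
    """Return the best code sequence found by beam search, with its score.

    score_fn is a hypothetical stand-in for a learned model that rates a
    (partial) sequence of latent codes; higher is better.
    """
    beams: List[Tuple[List[int], float]] = [([], 0.0)]
    for _ in range(horizon):
        candidates: List[Tuple[List[int], float]] = []
        for prefix, _ in beams:
            # Expand every kept prefix by every possible next code.
            for code in range(num_codes):
                seq = prefix + [code]
                candidates.append((seq, score_fn(seq)))
        # Keep only the beam_width highest-scoring candidates.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


# Toy usage: the score penalises distance from an arbitrary target sequence.
TARGET = [2, 0, 3]


def toy_score(seq: Sequence[int]) -> float:
    return float(-sum(abs(c - t) for c, t in zip(seq, TARGET)))


best_seq, best_score = beam_search_codes(
    num_codes=4, horizon=3, beam_width=2, score_fn=toy_score
)
```

In this sketch the search evaluates at most `beam_width * num_codes` candidates per step, so the number of candidates depends on the codebook size and beam width rather than on the dimensionality of the raw states and actions.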
