We present temporally abstract actor-critic (TAAC), a simple but effective off-policy RL algorithm that incorporates closed-loop temporal abstraction into the actor-critic framework. TAAC adds a second-stage binary policy to choose between the previous action and a new action output by an actor. Crucially, its "act-or-repeat" decision hinges on the actually sampled action instead of the expected behavior of the actor. This post-acting switching scheme lets the overall policy make more informed decisions. TAAC has two important features: a) persistent exploration, and b) a new compare-through Q operator for multi-step TD backup, specially tailored to the action-repetition scenario. We demonstrate TAAC's advantages over several strong baselines across 14 continuous control tasks. Surprisingly, we find that while achieving top performance, TAAC's trained policy still "mines" a significant number of repeated actions, even on continuous tasks whose problem structures on the surface seem to repel action repetition. This suggests that beyond encouraging persistent exploration, action repetition can be part of good policy behavior. Code is available at https://github.com/hnyu/taac.
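To make the two-stage decision concrete, below is a minimal, hypothetical Python sketch of the act-or-repeat step described above; the names `actor`, `switch_policy`, and `taac_act` are illustrative placeholders and do not correspond to the released implementation.

```python
import torch

# Illustrative sketch (not the authors' code) of TAAC's two-stage action selection.
def taac_act(obs, prev_action, actor, switch_policy):
    """obs: [B, obs_dim]; prev_action: [B, act_dim]."""
    # Stage 1: the actor proposes a new candidate action.
    candidate = actor(obs).sample()                      # [B, act_dim]

    # Stage 2: the binary policy conditions on the actually sampled candidate
    # (not just the actor's expected behavior) and decides act-or-repeat.
    logits = switch_policy(obs, prev_action, candidate)  # [B, 1]
    repeat = torch.distributions.Bernoulli(logits=logits).sample()  # [B, 1]

    # repeat == 1 -> keep the previous action; repeat == 0 -> take the candidate.
    return repeat * prev_action + (1.0 - repeat) * candidate
```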