Despite the promising results achieved, state-of-the-art interactive reinforcement learning schemes rely on passively receiving supervision signals from advisor experts, in the form of either continuous monitoring or pre-defined rules, which inevitably results in a cumbersome and expensive learning process. In this paper, we introduce a novel initiative advisor-in-the-loop actor-critic framework, termed Ask-AC, that replaces the unilateral advisor-guidance mechanism with a bidirectional learner-initiative one, and thereby enables a customized and efficacious message exchange between learner and advisor. At the heart of Ask-AC are two complementary components, namely the action requester and the adaptive state selector, which can be readily incorporated into various discrete actor-critic architectures. The former allows the agent to initiatively seek advisor intervention in the presence of uncertain states, while the latter identifies the unstable states potentially missed by the former, especially when the environment changes, and then learns to promote the ask action on such states. Experimental results on both stationary and non-stationary environments and across different actor-critic backbones demonstrate that the proposed framework significantly improves the learning efficiency of the agent, and achieves performance on par with that obtained by continuous advisor monitoring.
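To make the learner-initiative idea concrete, below is a minimal sketch of how an "ask" step could sit inside a discrete actor-critic agent. It is not the paper's algorithm: the uncertainty measure (policy entropy), the threshold, and the `advisor_policy` stand-in are all assumptions introduced purely for illustration; the actual action requester and adaptive state selector are learned components described in the paper.

```python
# Hypothetical sketch of a learner-initiative "ask" step for a discrete actor-critic agent.
# Assumptions (not specified in the abstract): uncertainty is proxied by policy entropy,
# the threshold is hand-picked, and `advisor_policy` stands in for any external expert.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.actor = nn.Linear(hidden, n_actions)   # policy logits
        self.critic = nn.Linear(hidden, 1)           # state value

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return Categorical(logits=self.actor(h)), self.critic(h)

def act_or_ask(model, obs, advisor_policy, entropy_threshold=1.0):
    """Return (action, asked): query the advisor when the policy looks uncertain."""
    dist, _value = model(obs)
    if dist.entropy().item() > entropy_threshold:    # uncertain state -> ask advisor
        return advisor_policy(obs), True
    return dist.sample(), False                      # confident state -> act alone

# Usage with a dummy advisor as a stand-in:
if __name__ == "__main__":
    model = ActorCritic(obs_dim=4, n_actions=2)
    obs = torch.randn(4)
    action, asked = act_or_ask(model, obs, advisor_policy=lambda o: torch.tensor(0))
    print(action.item(), asked)
```

In Ask-AC, this decision is not a fixed threshold rule: the action requester learns when to ask, and the adaptive state selector additionally flags unstable states (e.g., after an environment change) and encourages asking on them.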