This paper investigates deceptive reinforcement learning for privacy preservation in model-free and continuous action space domains. In reinforcement learning, the reward function defines the agent's objective. In adversarial scenarios, an agent may need to both maximise rewards and keep its reward function private from observers. Recent research presented the ambiguity model (AM), which uses pre-trained $Q$-functions to select actions that are ambiguous over a set of possible reward functions. Despite promising results in model-based domains, our investigation shows that AM is ineffective in model-free domains due to misdirected state space exploration; it is also inefficient to train and inapplicable in continuous action space domains. We propose the deceptive exploration ambiguity model (DEAM), which explores with the deceptive policy itself during training, leading to targeted exploration of the state space. DEAM is also applicable in continuous action spaces. We evaluate DEAM in discrete and continuous action space path planning environments. DEAM achieves similar performance to an optimal, model-based version of AM and outperforms a model-free version of AM in terms of path cost, deceptiveness, and training efficiency. These results extend to the continuous domain.
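To make the idea of action selection that is "ambiguous over a set of possible reward functions" concrete, the following minimal Python sketch picks, from pre-trained $Q$-functions (one per candidate reward function), the action over which an observer would find the candidates hardest to distinguish. The softmax-entropy ambiguity measure, the function name `ambiguous_action`, and the temperature `beta` are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def ambiguous_action(q_tables, state, actions, beta=1.0):
    """Select the action that is most ambiguous over candidate reward functions.

    q_tables: list of dicts mapping (state, action) -> Q-value, one per
              candidate reward function (pre-trained, as in AM).
    Ambiguity here is the entropy of a softmax over per-candidate Q-values,
    an assumed stand-in for the observer's inference about the true reward.
    """
    best_action, best_entropy = None, -np.inf
    for a in actions:
        # Boltzmann weight of each candidate reward function for this action.
        qs = np.array([q[(state, a)] for q in q_tables])
        weights = np.exp(beta * (qs - qs.max()))
        probs = weights / weights.sum()
        entropy = -np.sum(probs * np.log(probs + 1e-12))
        # Higher entropy: the action looks equally plausible under more candidates.
        if entropy > best_entropy:
            best_action, best_entropy = a, entropy
    return best_action
```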