Human feedback is widely used to train agents across many domains. However, previous works rarely consider the uncertainty inherent in human feedback, especially in cases where the optimal action is not obvious to the trainer. For example, the reward of a sub-optimal action can be stochastic and sometimes exceed that of the optimal action, which is common in games and real-world settings. Trainers are therefore likely to provide positive feedback for sub-optimal actions, negative feedback for optimal actions, or even no feedback at all in confusing situations. Existing works, which utilize the Expectation Maximization (EM) algorithm and treat the feedback model as hidden parameters, do not account for uncertainty in either the learning environment or the human feedback. To address this challenge, we introduce a novel feedback model that captures the uncertainty of human feedback. However, this model makes the expectation step of the EM algorithm computationally intractable. To this end, we propose a novel approximate EM algorithm in which the expectation step is approximated by gradient descent. Experimental results in both synthetic scenarios and two real-world scenarios with human participants demonstrate the superior performance of our proposed approach.
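To make the flavor of the approximation concrete, the following is a minimal sketch, not the authors' implementation: it assumes a hypothetical Bernoulli feedback model with a single error rate eps, treats the identity of the optimal action as the hidden quantity, and replaces the step that would otherwise need a closed-form expectation with plain gradient ascent on the expected log-likelihood. All function and parameter names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def approximate_em(pos_fb, neg_fb, n_iters=50, lr=0.5, gd_steps=100):
    """pos_fb, neg_fb: per-action counts of positive / negative trainer
    feedback. Returns a belief q over which action is optimal and the
    fitted feedback error rate eps. Absent feedback is ignored here."""
    n_actions = len(pos_fb)
    q = np.full(n_actions, 1.0 / n_actions)  # belief over the optimal action
    # Unconstrained parameter; eps = sigmoid(logit_eps) is the probability
    # that feedback contradicts an action's true label. Initialized small,
    # assuming trainers are right more often than not.
    logit_eps = -2.0
    for _ in range(n_iters):
        eps = sigmoid(logit_eps)
        # Score each action's optimality under the current feedback model.
        log_w = pos_fb * np.log(1.0 - eps) + neg_fb * np.log(eps)
        w = np.exp(log_w - log_w.max())
        q = w / w.sum()
        # Expected counts of feedback agreeing / disagreeing with the
        # hidden optimality labels, under the belief q.
        right = q @ pos_fb + (1.0 - q) @ neg_fb
        wrong = q @ neg_fb + (1.0 - q) @ pos_fb
        total = max(right + wrong, 1.0)
        # Approximate expectation step: no closed form is assumed, so we run
        # gradient ascent on L(eps) = wrong*log(eps) + right*log(1 - eps).
        for _ in range(gd_steps):
            eps = sigmoid(logit_eps)
            grad = (wrong / eps - right / (1.0 - eps)) * eps * (1.0 - eps)
            logit_eps += lr * grad / total
    return q, sigmoid(logit_eps)

# Toy usage: action 0 mostly receives positive feedback, the others negative.
q, eps = approximate_em(np.array([8.0, 2.0, 1.0]), np.array([1.0, 5.0, 6.0]))
print(q.round(3), round(eps, 3))
```

Parameterizing eps through a sigmoid keeps the gradient steps unconstrained, and the small initial eps breaks the label-switching symmetry that EM-style loops are prone to.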