Motivated by human-machine interactions such as training chatbots to improve customer satisfaction, we study human-guided human-machine interaction involving private information. We model this interaction as a two-player turn-based game, where one player (Alice, a human) guides the other player (Bob, a machine) towards a common goal. Specifically, we focus on offline reinforcement learning (RL) in this game, where the goal is to find a policy pair for Alice and Bob that maximizes their expected total rewards based on an offline dataset collected a priori. The offline setting presents two challenges: (i) we cannot collect Bob's private information, which leads to a confounding bias when standard RL methods are applied, and (ii) there is a distributional mismatch between the behavior policy used to collect the data and the desired policy we aim to learn. To tackle the confounding bias, we treat Bob's previous action as an instrumental variable for Alice's current decision making so as to adjust for the unmeasured confounding. We develop a novel identification result and use it to propose a new off-policy evaluation (OPE) method for evaluating policy pairs in this two-player turn-based game. To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for both Alice and Bob. Finally, we prove that under mild assumptions such as partial coverage of the offline data, the policy pair obtained through our method converges to the optimal one at a satisfactory rate.
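As a rough illustration only (the notation below is assumed for exposition and is not taken from the paper), the pessimism principle for off-policy learning can be sketched as selecting the policy pair that maximizes the worst-case value estimate over all value functions consistent with the offline data:
$$
(\widehat{\pi}_A, \widehat{\pi}_B) \in \operatorname*{arg\,max}_{(\pi_A,\,\pi_B)} \; \min_{Q \in \mathcal{C}(\mathcal{D})} \widehat{J}(\pi_A, \pi_B; Q),
$$
where $\mathcal{D}$ is the offline dataset, $\mathcal{C}(\mathcal{D})$ is a confidence set of value functions consistent with $\mathcal{D}$, and $\widehat{J}(\pi_A, \pi_B; Q)$ is an OPE estimate of the expected total reward of the policy pair $(\pi_A, \pi_B)$ under $Q$. Taking the minimum over the confidence set penalizes policy pairs whose value rests on state-action regions poorly covered by the data, which is why only partial coverage of the offline data is needed.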