Self-modification by agents embedded in complex environments is hard to avoid, whether it happens directly (e.g. modification of the agent's own code) or indirectly (e.g. influencing the operator, exploiting bugs or the environment). It has been argued that intelligent agents have an incentive to avoid modifying their utility function so that their future instances work towards the same goals. Everitt et al. (2016) formally show that providing an option to self-modify is harmless for perfectly rational agents. We show that this result no longer holds for agents with bounded rationality. For such agents, self-modification may cause exponential deterioration in performance and gradual misalignment of a previously aligned agent. We investigate how the size of this effect depends on the type and magnitude of the imperfections in the agent's rationality (cases 1-4 below). We also discuss model assumptions and the wider problem and framing space. We examine four ways in which an agent can be bounded-rational: it (1) does not always choose the optimal action, (2) is not perfectly aligned with human values, (3) has an inaccurate model of the environment, or (4) uses the wrong temporal discount factor. We show that while in cases (2)-(4) the misalignment caused by the agent's imperfection does not grow over time, in case (1) the misalignment may grow exponentially.
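A minimal toy sketch (our own illustration, not the paper's formal model) of why case (1) compounds while the others stay bounded: if a policy-imperfect agent errs with some small probability eps at each step, and one available error is an irreversible self-modification of its utility function, then the probability of still being aligned after t steps is (1 - eps)^t, which decays exponentially in t. The simulation below (the function name fraction_aligned and all parameter values are hypothetical choices for this sketch) checks that intuition against the analytic value.

    import random

    def fraction_aligned(eps, horizon, trials=10000, seed=0):
        # Toy assumption: each step the agent errs with probability eps,
        # and an error here means an irreversible self-modification of
        # its goal. Returns the fraction of runs still aligned per step.
        rng = random.Random(seed)
        still_aligned = [0] * horizon
        for _ in range(trials):
            aligned = True
            for t in range(horizon):
                if aligned and rng.random() < eps:
                    aligned = False  # a mistaken self-modification sticks
                still_aligned[t] += aligned  # bool counts as 0/1
        return [n / trials for n in still_aligned]

    for eps in (0.01, 0.05):
        sim = fraction_aligned(eps, horizon=100)[-1]
        # Analytic prediction: (1 - eps)^t, i.e. exponential decay in t.
        print(f"eps={eps}: simulated {sim:.3f}, predicted {(1 - eps) ** 100:.3f}")

Under the same toy assumptions, imperfections of types (2)-(4) add a per-step error that does not feed back into the agent's future goals, so their cost stays bounded rather than compounding over time.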