Power-seeking can be probable and predictive for trained agents - Zhuanzhi (专知) Papers


Agent · Situation · Reward function · Consistency · Artificial intelligence

April 13, 2023

Power-seeking can be probable and predictive for trained agents


Victoria Krakovna, Janos Kramar

Power-seeking behavior is a key source of risk from advanced AI, but our theoretical understanding of this phenomenon is relatively limited. Building on existing theoretical results demonstrating power-seeking incentives for most reward functions, we investigate how the training process affects power-seeking incentives and show that they are still likely to hold for trained agents under some simplifying assumptions. We formally define the training-compatible goal set (the set of goals consistent with the training rewards) and assume that the trained agent learns a goal from this set. In a setting where the trained agent faces a choice to shut down or avoid shutdown in a new situation, we prove that the agent is likely to avoid shutdown. Thus, we show that power-seeking incentives can be probable (likely to arise for trained agents) and predictive (allowing us to predict undesirable behavior in new situations).
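The idea of a training-compatible goal set can be illustrated with a small counting sketch. This is not the paper's proof or formal setting; it is a toy model under loudly stated assumptions: goals are reward functions that agree with the training rewards on seen states and impose an arbitrary strict ranking on unseen states, all rankings are counted equally, shutting down leads to a single unseen state, and avoiding shutdown lets the agent reach any one of several other unseen states. The state names, `TRAIN_REWARDS`, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical training rewards: states seen in training have fixed, known rewards.
TRAIN_REWARDS = {"s0": 0, "s1": 1}


def training_compatible_goals(train_rewards, unseen_states):
    """Yield every reward function that matches the training rewards and
    assigns the unseen states some strict ranking (assumption: all strict
    rankings are counted equally)."""
    for ranks in permutations(range(len(unseen_states))):
        goal = dict(train_rewards)              # agrees with training rewards
        goal.update(zip(unseen_states, ranks))  # unconstrained on unseen states
        yield goal


def avoids_shutdown(goal, shutdown_state, other_states):
    """A goal favors avoiding shutdown if the best state reachable after
    avoiding shutdown is ranked above the shutdown state."""
    return max(goal[s] for s in other_states) > goal[shutdown_state]


def fraction_avoiding_shutdown(train_rewards, shutdown_state, other_states):
    """Fraction of training-compatible goals under which the agent avoids shutdown."""
    unseen = [shutdown_state, *other_states]
    goals = list(training_compatible_goals(train_rewards, unseen))
    avoiding = sum(avoids_shutdown(g, shutdown_state, other_states) for g in goals)
    return Fraction(avoiding, len(goals))


if __name__ == "__main__":
    print(fraction_avoiding_shutdown(TRAIN_REWARDS, "shutdown", ["a", "b", "c"]))
```

In this toy model, the more states that remain reachable after avoiding shutdown, the larger the share of training-compatible goals that prefer to avoid it, because the training rewards place no constraint on how the unseen states are ranked.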



