It is well known that reinforcement learning can be cast as inference in an appropriate probabilistic model. However, this commonly involves introducing a distribution over agent trajectories with probabilities proportional to exponentiated rewards. In this work, we formulate reinforcement learning as Bayesian inference without resorting to rewards, and show that rewards are derived from the agent's preferences, rather than the other way around. We argue that agent preferences should be specified stochastically rather than deterministically. Reinforcement learning via inference with stochastic preferences naturally describes agent behaviors, does not require introducing rewards or exponential weighting of trajectories, and allows reasoning about agents on the solid foundation of Bayesian statistics. Stochastic conditioning, a probabilistic programming paradigm for conditioning models on distributions rather than values, is the formalism behind agents with stochastic preferences. We demonstrate our approach on case studies involving both a two-agent coordination game and a single agent acting in a noisy environment, showing that, despite superficial differences, both cases can be modeled and reasoned about according to the same principles.
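As a minimal sketch of the formalism named above, assuming the standard definition of stochastic conditioning from the probabilistic programming literature: whereas ordinary conditioning fixes an observation $y$ to a single value, conditioning on $y \sim D$ constrains $y$ to follow a distribution $D$, giving the posterior

\[
p(x \mid y \sim D) \;\propto\; p(x)\,\exp\!\Big(\mathbb{E}_{y \sim D}\big[\log p(y \mid x)\big]\Big).
\]

Under this reading, an agent's stochastic preferences enter the model as the distribution $D$ on which the trajectory model is conditioned, rather than as a reward whose exponential weights trajectories.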