双层因果多用途发展方案的高效干预算法 (Intervention Efficient Algorithm for Two-Stage Causal MDPs) - 专知论文

会员服务 ·

0

可辨认的 · 图 · 示例 · Atom（文本编辑器） · SimPLe ·

2021 年 11 月 1 日

Intervention Efficient Algorithm for Two-Stage Causal MDPs

翻译：双层因果多用途发展方案的高效干预算法

Rahul Madhavan,Aurghya Maiti,Gaurav Sinha,Siddharth Barman

from arxiv, 29 pages

We study Markov Decision Processes (MDP) wherein states correspond to causal graphs that stochastically generate rewards. In this setup, the learner's goal is to identify atomic interventions that lead to high rewards by intervening on variables at each state. Generalizing the recent causal-bandit framework, the current work develops (simple) regret minimization guarantees for two-stage causal MDPs, with parallel causal graph at each state. We propose an algorithm that achieves an instance dependent regret bound. A key feature of our algorithm is that it utilizes convex optimization to address the exploration problem. We identify classes of instances wherein our regret guarantee is essentially tight, and experimentally validate our theoretical results.

翻译：我们研究Markov决定过程(MDP ), 其中各州对应的因果图表可以产生丰厚的回报。在这个设置中, 学习者的目标是通过干预每个州的变数来确定导致高回报的原子干预。概括最近的因果断层框架, 当前的工作为两阶段因果 MDP 制定了( 简单) 最小化的保证, 在每个州都有平行的因果图表。我们提出了一种实现一个依存的因果图表的算法。我们算法的一个关键特征是它利用 convex 优化来解决勘探问题。我们找出了几类我们的遗憾保证基本上很紧的事例, 实验性地验证了我们的理论结果。

0

相关内容

可辨认的

【因果人工智能系统】106页ppt，Causal AI for Systems

专知会员服务

97+阅读 · 2021年8月28日

【斯坦福新书】决策算法，464页pdf，Algorithms for Decision Making

【斯坦福新书】决策算法，464页pdf，Algorithms for Decision Making

专知会员服务

124+阅读 · 2020年12月7日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习相关资源(框架、库、软件)大列表

机器学习相关资源(框架、库、软件)大列表

专知会员服务

40+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Escape saddle points by a simple gradient-descent based algorithm

Arxiv

4+阅读 · 2021年11月28日

Multi-task Learning of Order-Consistent Causal Graphs

Arxiv

10+阅读 · 2021年11月3日

Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations

Arxiv

5+阅读 · 2021年6月7日

Network Inference and Influence Maximization from Samples

Arxiv

7+阅读 · 2021年6月7日

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Arxiv

8+阅读 · 2021年4月22日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

A Survey on Causal Inference

Arxiv

112+阅读 · 2020年2月5日

Large-Scale Stochastic Sampling from the Probability Simplex

Arxiv

3+阅读 · 2018年6月19日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

VIP会员

文章信息

相关主题

Atom（文本编辑器）

相关VIP内容

【因果人工智能系统】106页ppt，Causal AI for Systems

专知会员服务

97+阅读 · 2021年8月28日

【斯坦福新书】决策算法，464页pdf，Algorithms for Decision Making

【斯坦福新书】决策算法，464页pdf，Algorithms for Decision Making

专知会员服务

124+阅读 · 2020年12月7日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习相关资源(框架、库、软件)大列表

机器学习相关资源(框架、库、软件)大列表

专知会员服务

40+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Escape saddle points by a simple gradient-descent based algorithm

Arxiv

4+阅读 · 2021年11月28日

Multi-task Learning of Order-Consistent Causal Graphs

Arxiv

10+阅读 · 2021年11月3日

Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations

Arxiv

5+阅读 · 2021年6月7日

Network Inference and Influence Maximization from Samples

Arxiv

7+阅读 · 2021年6月7日

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Arxiv

8+阅读 · 2021年4月22日

Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Arxiv

8+阅读 · 2020年11月26日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

A Survey on Causal Inference

Arxiv

112+阅读 · 2020年2月5日

Large-Scale Stochastic Sampling from the Probability Simplex

Arxiv

3+阅读 · 2018年6月19日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

微信扫码咨询专知VIP会员