Necessary and Sufficient Conditions for Inverse Reinforcement Learning of Bayesian Stopping Time Problems


Identifiable · Inverse reinforcement learning · Cost function · Optimizer · Functional

January 17, 2023


Kunal Pattanayak, Vikram Krishnamurthy

This paper presents an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify whether these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality with respect to the observed actions. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function. To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search, and also on a real-world YouTube dataset. Finally, for finite datasets, we propose an IRL detection algorithm and give finite sample bounds on its error probabilities.
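To make the setting concrete, the following is a minimal sketch of the forward problem only, that is, the kind of Bayesian decision maker whose actions an inverse learner would observe in sequential hypothesis testing. It is not the paper's IRL algorithm. The function names, the binary observation model, and the fixed belief thresholds `lower` and `upper` are all assumptions made for illustration; a two-threshold rule on the posterior is the classical structure for this problem, but the specific values here are arbitrary.

```python
import random


def posterior_update(belief, obs, p_obs_h1, p_obs_h0):
    """Bayes update of the belief P(H1) after one binary observation.

    p_obs_h1 and p_obs_h0 are the (assumed) probabilities of observing 1
    under hypotheses H1 and H0 respectively.
    """
    like1 = p_obs_h1 if obs == 1 else 1.0 - p_obs_h1
    like0 = p_obs_h0 if obs == 1 else 1.0 - p_obs_h0
    numerator = like1 * belief
    return numerator / (numerator + like0 * (1.0 - belief))


def run_sequential_test(true_h1, prior=0.5, p_obs_h1=0.7, p_obs_h0=0.3,
                        lower=0.05, upper=0.95, max_steps=1000, rng=None):
    """Simulate one decision maker: observe, update the belief, stop or continue.

    Returns (stopping_time, action, final_belief), where action 1 means
    "declare H1" and action 0 means "declare H0". The thresholds are
    hypothetical stand-ins for whatever an optimal policy would induce.
    """
    rng = rng or random.Random(0)
    belief = prior
    p_one = p_obs_h1 if true_h1 else p_obs_h0
    for t in range(1, max_steps + 1):
        obs = 1 if rng.random() < p_one else 0
        belief = posterior_update(belief, obs, p_obs_h1, p_obs_h0)
        if belief >= upper:      # confident enough in H1: stop
            return t, 1, belief
        if belief <= lower:      # confident enough in H0: stop
            return t, 0, belief
    # Forced stop at the horizon: pick the more likely hypothesis.
    return max_steps, (1 if belief >= 0.5 else 0), belief


if __name__ == "__main__":
    stop_time, action, belief = run_sequential_test(true_h1=True)
    print(stop_time, action, round(belief, 4))
```

In this sketch an inverse learner would see only the terminal action from each run (and the true hypothesis, in a dataset of many runs), not the belief trajectory, which is one way to read the abstract's remark that optimality can at best be identified with respect to the observed actions.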

