This paper presents an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition for identifying whether these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality with respect to the observed actions. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function. To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences stemming from microeconomics. We illustrate the proposed IRL scheme on two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search, and also on a real-world YouTube dataset. Finally, for finite datasets, we propose an IRL detection algorithm and give finite-sample bounds on its error probabilities.