This paper presents an inverse reinforcement learning~(IRL) framework for Bayesian stopping time problems. By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify whether these actions are consistent with optimizing a cost function. In a Bayesian (partially observed) setting, the inverse learner can at best identify optimality with respect to the observed actions. Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function. To achieve this IRL objective, we use novel ideas from Bayesian revealed preferences in microeconomics. We illustrate the proposed IRL scheme using two important examples of stopping time problems, namely, sequential hypothesis testing and Bayesian search. Finally, for finite datasets, we propose an IRL detection algorithm and give finite-sample bounds on its error probabilities.