Bayesian approaches developed to solve the optimal design of sequential experiments are mathematically elegant but computationally challenging. Recently, techniques using amortization have been proposed to make these Bayesian approaches practical, by training a parameterized policy that proposes designs efficiently at deployment time. However, these methods may not sufficiently explore the design space, require access to a differentiable probabilistic model and can only optimize over continuous design spaces. Here, we address these limitations by showing that the problem of optimizing policies can be reduced to solving a Markov decision process (MDP). We solve the equivalent MDP with modern deep reinforcement learning techniques. Our experiments show that our approach is also computationally efficient at deployment time and exhibits state-of-the-art performance on both continuous and discrete design spaces, even when the probabilistic model is a black box.
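For concreteness, the sketch below illustrates (in assumed, simplified form, not the authors' implementation) how the reduction described above can look in code: the MDP state is the padded history of (design, outcome) pairs, the action is the next design, the terminal reward is a PCE-style Monte Carlo lower bound on the information gain of the whole experiment, and the policy is trained with plain REINFORCE, which treats the reward as a black box. The toy linear-Gaussian model, network sizes, and hyperparameters are illustrative assumptions only.

```python
# Minimal illustrative sketch: sequential experimental design as an MDP solved
# with a score-function policy gradient (REINFORCE). All modelling choices here
# (toy model y ~ N(theta * d, SIGMA^2), theta ~ N(0, 1), horizon, contrastive
# sample count, network sizes) are assumptions for illustration.
import math
import torch
import torch.nn as nn

SIGMA, T, N_CONTRAST = 0.5, 4, 64                 # noise scale, horizon, contrastive samples
LOG_NORM = -math.log(SIGMA) - 0.5 * math.log(2 * math.pi)

def simulate(theta, d):                            # black-box simulator of outcomes
    return theta * d + SIGMA * torch.randn_like(theta * d)

def log_lik(y, theta, d):                          # Gaussian log-likelihood of the toy model
    return -0.5 * ((y - theta * d) / SIGMA) ** 2 + LOG_NORM

class Policy(nn.Module):                           # maps a history summary to a design mean
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * T, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Tanh())
    def forward(self, hist):                       # hist: (batch, 2*T) padded history
        return self.net(hist)                      # design mean in [-1, 1]

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(2000):
    B = 128
    theta = torch.randn(B, 1)                      # "true" parameters drawn from the prior
    theta_c = torch.randn(N_CONTRAST, B, 1)        # contrastive parameter samples
    hist = torch.zeros(B, 2 * T)                   # MDP state: history of (design, outcome)
    logp = torch.zeros(B, 1)
    total_ll = torch.zeros(B, 1)
    contrast_ll = torch.zeros(N_CONTRAST, B, 1)
    for t in range(T):                             # roll out one episode per batch row
        mean = policy(hist)                        # action = next design
        dist = torch.distributions.Normal(mean, 0.2)
        d = dist.sample()                          # no gradient through the environment
        logp = logp + dist.log_prob(d)
        y = simulate(theta, d)
        hist[:, 2 * t] = d.squeeze(-1)
        hist[:, 2 * t + 1] = y.squeeze(-1)
        total_ll = total_ll + log_lik(y, theta, d)
        contrast_ll = contrast_ll + log_lik(y, theta_c, d)
    # Terminal reward: sequential PCE-style lower bound on expected information gain.
    denom = torch.logsumexp(torch.cat([total_ll.unsqueeze(0), contrast_ll], 0), dim=0) \
            - math.log(N_CONTRAST + 1.0)
    reward = total_ll - denom                      # (B, 1), treated as a black-box return
    advantage = reward - reward.mean()             # batch-mean baseline to reduce variance
    loss = -(advantage.detach() * logp).mean()     # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the gradient only flows through the policy's log-probabilities, this style of training does not require a differentiable probabilistic model, and swapping the Gaussian action distribution for a categorical one would handle discrete design spaces in the same way.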