Reinforcement Learning based Sequential Batch-sampling for Bayesian Optimal Experimental Design

Engineering problems that are modeled with sophisticated mathematical methods, or that are characterized by expensive tests or experiments, are constrained by limited budgets or finite computational resources. Moreover, practical scenarios in industry impose restrictions, based on logistics and preference, on the manner in which experiments can be conducted. For example, material supply may allow only a handful of experiments in a single shot, or, in the case of computational models, one may face significant wait times on shared computational resources. In such scenarios, one usually resorts to performing experiments in a manner that maximizes one's state of knowledge while satisfying the above-mentioned practical constraints. Sequential design of experiments (SDOE) is a popular suite of methods that has yielded promising results in recent years across different engineering and practical problems. A common strategy that leverages the Bayesian formalism is Bayesian SDOE, which usually works best in the one-step-ahead, or myopic, scenario of selecting a single experiment at each step of a sequence of experiments. In this work, we aim to extend the SDOE strategy to query the experiment or computer code at a batch of inputs. To this end, we leverage deep reinforcement learning (RL) based policy gradient methods to propose batches of queries that are selected taking into account the entire budget in hand. The algorithm retains the sequential nature inherent in SDOE while incorporating task-based reward elements from the domain of deep RL. A unique capability of the proposed methodology is that, once it is trained, it can be applied to multiple tasks, for example optimization of a function. We demonstrate the performance of the proposed algorithm on a synthetic problem and a challenging high-dimensional engineering problem.
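To make the idea of budget-aware batch selection concrete, the following is a minimal sketch of a REINFORCE-style policy gradient loop that spends a fixed budget in sequential batches. It is an illustration under strong simplifying assumptions, not the algorithm proposed in this work: the policy is a state-independent softmax over a discrete candidate grid rather than a deep network, no Bayesian surrogate is used, and the objective, the function names (`objective`, `run_episode`, `train`), and all parameter values are hypothetical.

```python
import numpy as np


def objective(x):
    # Hypothetical stand-in for an expensive experiment; maximum at x = 0.7.
    return -(x - 0.7) ** 2


def softmax(logits):
    # Numerically stable softmax over the candidate logits.
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()


def run_episode(logits, candidates, budget, batch_size, rng):
    """Spend the whole budget in sequential batches.

    Returns the episode reward (best objective value observed) and the
    gradient of the log-probability of all sampled queries w.r.t. the logits.
    """
    probs = softmax(logits)
    grad_logp = np.zeros_like(logits)
    best = -np.inf
    for _ in range(budget // batch_size):
        # Assumption: batch members are drawn independently, with replacement.
        batch = rng.choice(len(candidates), size=batch_size, p=probs)
        for i in batch:
            # Gradient of log softmax(logits)[i] is onehot(i) - probs.
            grad_logp -= probs
            grad_logp[i] += 1.0
        best = max(best, objective(candidates[batch]).max())
    return best, grad_logp


def train(n_iters=300, budget=8, batch_size=4, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 21)
    logits = np.zeros(len(candidates))
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(n_iters):
        reward, grad_logp = run_episode(logits, candidates, budget, batch_size, rng)
        # REINFORCE update: ascend (reward - baseline) * grad log-probability.
        logits += lr * (reward - baseline) * grad_logp
        baseline += 0.1 * (reward - baseline)
    return candidates, softmax(logits)


if __name__ == "__main__":
    candidates, probs = train()
    print("most probable query location:", candidates[np.argmax(probs)])
```

The reward here is assigned only at the end of an episode, so each batch is credited for the outcome of the entire budget rather than for its immediate gain, which is the non-myopic aspect this sketch is meant to convey. A state-dependent policy that conditions on past observations would be needed to capture the sequential adaptation described above.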
