A fundamental task in science is to design experiments that yield valuable insights about the system under study. Mathematically, these insights can be represented as a utility or risk function that shapes the value of conducting each experiment. We present PDBAL, a targeted active learning method that adaptively designs experiments to maximize scientific utility. PDBAL takes a user-specified risk function and combines it with a probabilistic model of the experimental outcomes to choose designs that rapidly converge on a high-utility model. We prove theoretical bounds on the label complexity of PDBAL and provide fast closed-form solutions for designing experiments with common exponential family likelihoods. In simulation studies, PDBAL consistently outperforms standard untargeted approaches that focus on maximizing expected information gain over the design space. Finally, we demonstrate the scientific potential of PDBAL through a study on a large cancer drug screen dataset where PDBAL quickly recovers the most efficacious drugs with a small fraction of the total number of experiments.