Selecting input variables or design points for statistical models has been of great interest in adaptive design and active learning. Motivated by two scientific examples, this paper presents a strategy for selecting design points for a regression model when the underlying regression function is discontinuous. The first example concerns accelerating imaging speed in high-resolution material imaging; the second uses sequential design to map a chemical phase diagram. In both examples, the underlying regression functions have discontinuities, so many existing design optimization approaches cannot be applied because they mostly assume a continuous regression function. Although some existing adaptive design strategies developed from treed regression models can handle discontinuities, these Bayesian approaches rely on computationally expensive Markov chain Monte Carlo techniques for posterior inference and subsequent design point selection, which is not appropriate for the first motivating example, where the computation must be faster than the original imaging speed. In addition, the treed models are based on domain partitioning, which is inefficient when the discontinuities occur over complex sub-domain boundaries. We propose a simple and effective adaptive design strategy for regression analysis with discontinuities: we first present some statistical properties under a fixed design and then use these properties to propose a new criterion for selecting design points. We demonstrate sequential design with the new criterion on comprehensive simulated examples and apply it to the two motivating examples.