Deep learning (DL) is revolutionizing the scientific computing community. To reduce the data gap, active learning has been identified as a promising solution for DL in this community. However, the deep active learning (DAL) literature is dominated by image classification problems and pool-based methods. Here we investigate the robustness of pool-based DAL methods for scientific computing problems (dominated by regression), where deep neural networks (DNNs) are increasingly used. We show that modern pool-based DAL methods all share an untunable hyperparameter, termed the pool ratio and denoted $\gamma$, which the literature often assumes to be known a priori. We evaluate the performance of five state-of-the-art DAL methods on six benchmark problems under the assumption that $\gamma$ is \textit{not} known, a more realistic assumption for scientific computing problems. Our results indicate that this reduces the performance of modern DAL methods, and that they can sometimes perform even worse than random sampling, which creates significant uncertainty when they are used in real-world settings. To overcome this limitation we propose what is, to our knowledge, the first query synthesis DAL method for regression, termed NA-QBC. NA-QBC removes the sensitive $\gamma$ hyperparameter, and we find that, on average, it outperforms the other DAL methods on our benchmark problems. Crucially, NA-QBC always outperforms random sampling, providing more robust performance benefits.
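To make the role of $\gamma$ concrete, the following is a minimal sketch of one pool-based query-by-committee (QBC) step for a 1-D regression problem. It is an illustration only, not the benchmarked methods and not NA-QBC. It assumes that the pool ratio is the candidate pool size divided by the number of points queried per step, a definition the abstract does not state. The committee of bootstrapped polynomial fits, the input range $[-1, 1]$, and all function names (`fit_committee`, `qbc_select`, `pool_based_step`) are hypothetical choices made for this example.

```python
import numpy as np


def fit_committee(x, y, n_members=5, degree=3, rng=None):
    """Fit a committee of polynomial regressors on bootstrap resamples.

    Polynomials stand in for DNNs here; this is an assumption of the sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(x), size=len(x))  # bootstrap indices
        members.append(np.polyfit(x[idx], y[idx], degree))
    return members


def qbc_select(members, pool, k):
    """Return the k pool points on which the committee disagrees most."""
    preds = np.stack([np.polyval(coef, pool) for coef in members])
    disagreement = preds.var(axis=0)  # variance across committee members
    top = np.argsort(disagreement)[-k:]
    return pool[top]


def pool_based_step(x, y, oracle, k, gamma, rng):
    """One active learning step with a freshly sampled candidate pool.

    Assumed definition: pool size = gamma * k.
    """
    pool = rng.uniform(-1.0, 1.0, size=int(gamma * k))
    members = fit_committee(x, y, rng=rng)
    new_x = qbc_select(members, pool, k)
    # Label the selected points with the oracle and grow the training set.
    return np.concatenate([x, new_x]), np.concatenate([y, oracle(new_x)])
```

In this sketch, setting `gamma = 1` makes the selection equivalent to random sampling, because every pool point is queried. A large `gamma` concentrates the queries on the highest-disagreement candidates. The selected batch therefore depends directly on a value that has to be chosen before any labels are collected.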