In the past decade, deep active learning (DAL) has focused heavily on classification problems, or problems that have some 'valid' data manifold, such as natural language or images. As a result, existing DAL methods are not applicable to a wide variety of important problems -- such as many scientific computing problems -- that involve regression on relatively unstructured input spaces. In this work we propose the first DAL query-synthesis approach for regression problems. We frame query synthesis as an inverse problem and use the recently proposed neural-adjoint (NA) solver to efficiently find points in the continuous input domain that optimize the query-by-committee (QBC) criterion. Crucially, the resulting NA-QBC approach removes the one sensitive hyperparameter of the classical QBC active learning approach -- the "pool size" -- making NA-QBC effectively hyperparameter-free. This is significant because DAL methods can be detrimental, even compared to random sampling, if the wrong hyperparameters are chosen. We evaluate Random, QBC, and NA-QBC sampling strategies on four regression problems, including two contemporary scientific computing problems. We find that NA-QBC achieves better average performance than random sampling on every benchmark problem, while QBC can be detrimental if the wrong hyperparameters are chosen.
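To make the "pool size" hyperparameter concrete, here is a minimal sketch of the *classical* pool-based QBC criterion for regression that NA-QBC replaces -- not the paper's NA-QBC method itself. It assumes a committee of scalar regressors whose disagreement is measured by the variance of their predictions over a finite candidate pool; the committee, pool, and all function names below are illustrative.

```python
import numpy as np

def qbc_disagreement(committee, candidates):
    """Committee disagreement: variance of member predictions at each candidate."""
    preds = np.stack([member(candidates) for member in committee])  # shape (M, N)
    return preds.var(axis=0)

def qbc_select(committee, candidates):
    """Classical pool-based QBC: query the candidate with maximal disagreement."""
    return candidates[np.argmax(qbc_disagreement(committee, candidates))]

# Toy committee of three 1-D regressors whose predictions diverge as |x| grows.
committee = [lambda x, a=a: a * x for a in (0.9, 1.0, 1.1)]

# The sensitive hyperparameter: the pool is a finite discretization of the
# continuous input domain (here 101 points on [0, 1]). NA-QBC instead searches
# the continuous domain directly with a gradient-based inverse solver.
pool = np.linspace(0.0, 1.0, 101)
x_next = qbc_select(committee, pool)  # disagreement grows with x, so x_next = 1.0
```

A too-small or poorly placed pool can miss the true maximizer of the disagreement criterion, which is why the pool size is the hyperparameter the abstract identifies as sensitive.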