This paper considers deep out-of-distribution active learning. In practice, fully trained neural networks interact unpredictably with out-of-distribution (OOD) inputs, mapping aberrant samples to effectively random locations in the model representation space. Since data representations are direct manifestations of the training distribution, the data selection process plays a crucial role in outlier robustness. For paradigms such as active learning, this is especially challenging, since protocols must not only improve performance on the training distribution most effectively but also yield a robust representation space. However, existing strategies base data selection directly on the data representation of the unlabeled samples, which is random for OOD samples by definition. To address this, we introduce forgetful active learning with switch events (FALSE) - a novel active learning protocol for out-of-distribution active learning. Instead of defining sample importance on the data representation directly, we formulate "informativeness" in terms of learning difficulty during training. Specifically, we approximate how often the network "forgets" unlabeled samples and query the most "forgotten" samples for annotation. We report up to 4.5\% accuracy improvements in over 270 experiments, covering four commonly used protocols, two OOD benchmarks, one in-distribution benchmark, and three different architectures.
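As an illustration of the query rule described above, the following Python sketch counts prediction "switch events" on the unlabeled pool across training epochs and queries the samples with the most switches for annotation. This is a minimal sketch under stated assumptions, not the authors' reference implementation; the function names (`predict_labels`, `update_switch_counts`, `select_query`) and the use of hard-prediction changes between epochs as the forgetting proxy are illustrative assumptions.

```python
import numpy as np
import torch


@torch.no_grad()
def predict_labels(model, unlabeled_loader, device="cpu"):
    """Return current hard predictions for every sample in the unlabeled pool."""
    model.eval()
    preds = []
    for x in unlabeled_loader:  # assumption: loader yields inputs only (no labels)
        logits = model(x.to(device))
        preds.append(logits.argmax(dim=1).cpu())
    return torch.cat(preds).numpy()


def update_switch_counts(switch_counts, prev_preds, curr_preds):
    """A sample 'switches' when its predicted label changes between epochs;
    the accumulated switch count approximates how often the network forgets it."""
    if prev_preds is not None:
        switch_counts += (prev_preds != curr_preds).astype(np.int64)
    return switch_counts


def select_query(switch_counts, budget):
    """Query the indices of the most 'forgotten' samples (most switch events)."""
    return np.argsort(-switch_counts)[:budget]
```

In a typical active learning round under these assumptions, `predict_labels` would be called once per training epoch, `update_switch_counts` would accumulate switch events over the epochs, and `select_query` would return the indices of the unlabeled samples sent for annotation.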