Open world classification is a task in natural language processing with key practical relevance and impact. Since data from the open or {\em unknown} category appear only at inference time, it is challenging to learn a model whose decision boundary both identifies the known classes and discriminates the open category. The performance of existing models is limited by the lack of effective open category data during training or by the lack of a good mechanism for learning appropriate decision boundaries. We propose an approach based on \underline{a}daptive \underline{n}egative \underline{s}amples (ANS), designed to generate effective synthetic open category samples during training without requiring any prior knowledge or external datasets. Empirically, we find a significant advantage in using auxiliary one-versus-rest binary classifiers, which effectively utilize the generated negative samples and avoid the complex threshold-seeking stage of previous works. Extensive experiments on three benchmark datasets show that ANS achieves significant improvements over state-of-the-art methods.
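To make the one-versus-rest decision rule concrete, the following minimal Python sketch illustrates one common realization (our illustration under assumed details, not the paper's implementation): each known class has a binary classifier, and an input is rejected as the open category when no classifier accepts it at its natural 0.5 cut-off, so no dataset-specific threshold search is needed. The function name, class labels, and scores below are hypothetical.
\begin{verbatim}
# Illustrative sketch only (assumed decision rule, not the paper's code):
# each known class k has a one-versus-rest binary classifier giving a
# probability p_k(x). The input is assigned to the best-scoring known
# class if that score clears the binary classifier's natural 0.5 cut-off;
# otherwise it is rejected as the open ("unknown") category.
from typing import Sequence

def open_world_predict(scores: Sequence[float],
                       labels: Sequence[str],
                       cutoff: float = 0.5) -> str:
    """Map one-vs-rest probabilities to a known label or 'unknown'."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] >= cutoff:
        return labels[best]
    return "unknown"  # every binary classifier rejected the input

# Example with made-up scores for three known intent classes.
labels = ["weather", "music", "alarm"]
print(open_world_predict([0.91, 0.04, 0.12], labels))  # -> weather
print(open_world_predict([0.22, 0.31, 0.18], labels))  # -> unknown
\end{verbatim}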