The vast majority of text transformation techniques in NLP are inherently limited in their ability to expand input space coverage due to an implicit constraint to preserve the original class label. In this work, we propose the notion of sibylvariance (SIB) to describe the broader set of transforms that relax the label-preserving constraint, knowably vary the expected class, and lead to significantly more diverse input distributions. We offer a unified framework to organize all data transformations, including two types of SIB: (1) Transmutations convert one discrete kind into another, (2) Mixture Mutations blend two or more classes together. To explore the role of sibylvariance within NLP, we implemented 41 text transformations, including several novel techniques like Concept2Sentence and SentMix. Sibylvariance also enables a unique form of adaptive training that generates new input mixtures for the most confused class pairs, challenging the learner to differentiate with greater nuance. Our experiments on six benchmark datasets strongly support the efficacy of sibylvariance for generalization performance, defect detection, and adversarial robustness.
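To make the mixture-mutation idea concrete, the sketch below interleaves the sentences of two labeled examples and assigns a soft label weighted by each source's contribution. This is a minimal, hypothetical rendering of the SentMix concept described above, not the authors' released implementation; the function name, the naive sentence splitting, and the proportional soft-labeling scheme are all assumptions made for illustration.

```python
import random

def sentmix(text_a, label_a, text_b, label_b, num_classes, seed=None):
    """Illustrative mixture mutation (SentMix-style sketch): shuffle together
    the sentences of two examples and return a soft label proportional to
    how many sentences each source class contributed."""
    rng = random.Random(seed)

    # Naive sentence split on periods; a real pipeline would use a proper
    # sentence tokenizer.
    sents_a = [s.strip() for s in text_a.split(".") if s.strip()]
    sents_b = [s.strip() for s in text_b.split(".") if s.strip()]

    mixed = sents_a + sents_b
    rng.shuffle(mixed)
    mixed_text = ". ".join(mixed) + "."

    # Soft label: fraction of sentences contributed by each class.
    total = len(sents_a) + len(sents_b)
    soft_label = [0.0] * num_classes
    soft_label[label_a] += len(sents_a) / total
    soft_label[label_b] += len(sents_b) / total
    return mixed_text, soft_label
```

For example, mixing a two-sentence positive review with a one-sentence negative review under this scheme would yield a shuffled three-sentence text with soft label [1/3, 2/3], giving the learner a knowably varied target rather than a preserved original label.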