The automatic detection of hypernymy relationships is a challenging problem in NLP. The successful application of state-of-the-art supervised approaches using distributed representations has generally been impeded by the limited availability of high-quality training data. We have developed two novel data augmentation techniques that generate new training examples from existing ones. First, we combine the linguistic principles of hypernym transitivity and intersective modifier-noun composition to generate additional pairs of vectors, such as "small dog - dog" or "small dog - animal", for which a hypernymy relationship can be assumed. Second, we use generative adversarial networks (GANs) to generate pairs of vectors for which the hypernymy relation can also be assumed. We furthermore present two complementary strategies for extending an existing dataset by leveraging linguistic resources such as WordNet. In an evaluation across three different datasets for hypernymy detection and two different vector spaces, we demonstrate that both the proposed automatic data augmentation techniques and the dataset extension strategies substantially improve classifier performance.
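The first augmentation idea, combining hypernym transitivity with modifier-noun composition, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`transitive_closure`, `compose`, `augment`), the toy two-dimensional vectors, and in particular the use of element-wise addition as the composition function are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]
Pair = Tuple[str, str]  # (hyponym, hypernym)


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def compose(modifier: Vector, noun: Vector) -> Vector:
    # Assumption: element-wise addition stands in for the composition
    # function; the abstract does not say which one is used.
    return [m + n for m, n in zip(modifier, noun)]


def augment(
    pairs: Set[Pair], modifiers: List[str], vectors: Dict[str, Vector]
) -> Dict[str, Tuple[Vector, Vector]]:
    """Map a label such as "small dog - animal" to (hyponym_vec, hypernym_vec)."""
    examples: Dict[str, Tuple[Vector, Vector]] = {}
    for hypo, hyper in sorted(transitive_closure(pairs)):
        for mod in modifiers:
            phrase_vec = compose(vectors[mod], vectors[hypo])
            # A modified noun is assumed to be a hyponym of the bare noun ...
            examples[f"{mod} {hypo} - {hypo}"] = (phrase_vec, vectors[hypo])
            # ... and, by transitivity, of every hypernym of that noun.
            examples[f"{mod} {hypo} - {hyper}"] = (phrase_vec, vectors[hyper])
    return examples


# Toy vectors, hypothetical values chosen only to keep the arithmetic readable.
vectors = {
    "small": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "mammal": [0.0, 2.0],
    "animal": [0.0, 3.0],
}
pairs = {("dog", "mammal"), ("mammal", "animal")}
examples = augment(pairs, ["small"], vectors)
for label in sorted(examples):
    print(label)
```

Starting from two seed pairs and one modifier, the sketch yields five positive vector pairs, including "small dog - animal", which is not derivable from either seed pair alone. In practice the modifier list would need to be restricted to modifiers for which the intersective reading actually holds.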