Span Identification (SpanID) is a family of NLP tasks that aims to detect and classify text spans. Unlike previous works that only leverage the Subordinate (\textsc{Sub}) relation, i.e., \textit{whether a span is an instance of a certain category}, to train SpanID models, we explore the Peer (\textsc{Pr}) relation, which indicates that \textit{two spans are different instances of the same category and share similar features}. We propose a novel \textbf{Peer} \textbf{D}ata \textbf{A}ugmentation (PeerDA) approach that treats span-span pairs with the \textsc{Pr} relation as augmented training data. PeerDA has two unique advantages: (1) a large number of span-span pairs with the \textsc{Pr} relation are available for augmenting the training data; (2) the augmented data can prevent over-fitting to the superficial span-category mapping by pushing SpanID models to rely more on span semantics. Experimental results on ten datasets covering four diverse SpanID tasks across seven domains demonstrate the effectiveness of PeerDA. Notably, PeerDA achieves state-of-the-art results on seven of these datasets.
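To make advantage (1) concrete, the following is a minimal sketch of how candidate \textsc{Pr} pairs could be enumerated from labeled spans. It assumes spans are given as hypothetical `(span_text, category)` tuples, and `build_peer_pairs` is a hypothetical helper, not the authors' implementation; it only enumerates pairs and does not show how PeerDA turns them into training instances. Each labeled span yields one \textsc{Sub} example, whereas a category with n distinct spans yields n(n-1) ordered \textsc{Pr} pairs, so the pool of peer pairs grows quadratically.

```python
from collections import defaultdict
from itertools import permutations


def build_peer_pairs(labeled_spans):
    """Enumerate candidate Peer (Pr) pairs.

    labeled_spans: list of (span_text, category) tuples (hypothetical format).
    Returns ordered (span_a, span_b, category) triples in which span_a and
    span_b are distinct spans labeled with the same category.
    """
    # Group distinct span texts by category, preserving first-seen order.
    by_category = defaultdict(list)
    for text, category in labeled_spans:
        if text not in by_category[category]:  # skip duplicate spans
            by_category[category].append(text)

    # Every ordered pair of distinct spans within a category is a Pr pair.
    pairs = []
    for category, spans in by_category.items():
        for span_a, span_b in permutations(spans, 2):
            pairs.append((span_a, span_b, category))
    return pairs


# Toy example: 5 labeled spans give 5 Sub examples.
example = [
    ("Paris", "LOC"),
    ("Berlin", "LOC"),
    ("Tokyo", "LOC"),
    ("Apple", "ORG"),
    ("Google", "ORG"),
]
peer_pairs = build_peer_pairs(example)
print(len(example), "Sub pairs vs", len(peer_pairs), "Pr pairs")  # 5 Sub pairs vs 8 Pr pairs
```

Ordered pairs are used here so that either span can serve as the reference; whether the original method uses ordered or unordered pairs is an assumption of this sketch.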