Existing approaches to disfluency detection treat the problem as a token-level classification task for identifying and removing disfluencies in text. Moreover, most work leverages only the contextual information captured by the linear sequence of tokens, ignoring the structured information in text that dependency trees capture efficiently. In this paper, building on the span classification paradigm of entity recognition, we propose a novel architecture for detecting disfluencies in transcripts of spoken utterances. The architecture incorporates both contextual information, through transformers, and long-distance structured information from dependency trees, through graph convolutional networks (GCNs). Experimental results show that our proposed model achieves state-of-the-art results on the widely used English Switchboard corpus for disfluency detection and outperforms prior work by a significant margin. Our code is publicly available on GitHub (https://github.com/Sreyan88/Disfluency-Detection-with-Span-Classification).
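To make the two ingredients named above concrete, the following is a minimal sketch and not the released implementation. It assumes spans are enumerated up to a fixed maximum length (as is common in span-based entity recognition) and that the dependency tree is given as a list of head indices. The function names `enumerate_spans`, `dependency_adjacency`, and `gcn_layer` are hypothetical, and the graph convolution shown is a parameter-free mean aggregation; an actual GCN layer would also apply a learned weight matrix and a nonlinearity.

```python
from typing import List, Tuple


def enumerate_spans(n_tokens: int, max_len: int) -> List[Tuple[int, int]]:
    """List candidate spans as (start, end) pairs with inclusive end indices."""
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start, min(start + max_len, n_tokens))
    ]


def dependency_adjacency(heads: List[int]) -> List[List[float]]:
    """Build a symmetric adjacency matrix with self-loops from head indices.

    heads[i] is the index of token i's syntactic head, or -1 for the root
    (an assumed encoding, chosen only for this illustration).
    """
    n = len(heads)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return adj


def gcn_layer(adj: List[List[float]], feats: List[List[float]]) -> List[List[float]]:
    """One simplified graph-convolution step: average each node's neighbours.

    Computes the row-normalised product of adj and feats. Learned weights
    and the activation function are deliberately left out of this sketch.
    """
    dim = len(feats[0])
    out = []
    for row in adj:
        degree = sum(row)
        out.append(
            [sum(row[j] * feats[j][d] for j in range(len(feats))) / degree
             for d in range(dim)]
        )
    return out


# Toy example: three tokens whose root is token 1.
heads = [1, -1, 1]
token_feats = [[1.0], [2.0], [3.0]]  # stand-in for transformer outputs
spans = enumerate_spans(len(heads), max_len=2)
smoothed = gcn_layer(dependency_adjacency(heads), token_feats)
```

In a full model, each candidate span would then be represented from the graph-enriched token features and classified as disfluent or fluent; how the paper builds span representations is not specified in this abstract.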