Disfluencies (i.e. interruptions in the regular flow of speech) are ubiquitous in spoken discourse. Fillers ("uh", "um") are the most frequently occurring kind of disfluency. Yet, to the best of our knowledge, no resource brings together the research perspectives on these speech events that influence Spoken Language Understanding (SLU). The aim of this article is to synthesise a breadth of perspectives in a holistic way: from the underlying (psycho)linguistic theory, to the annotation and treatment of fillers in Automatic Speech Recognition (ASR) and SLU systems, and finally to their study from a generation standpoint. The article aims to present these perspectives in a way that is approachable for the SLU and Conversational AI community, and to discuss what we believe are the trends and challenges in each area moving forward.