Disfluencies (i.e., interruptions in the regular flow of speech) are ubiquitous in spoken discourse. Fillers ("uh", "um") are the most frequent kind of disfluency. Yet, to the best of our knowledge, no resource brings together the research perspectives on these speech events that influence Spoken Language Understanding (SLU). The aim of this article is to survey a breadth of perspectives in a holistic way: from the underlying (psycho)linguistic theory, to the annotation and treatment of fillers in Automatic Speech Recognition (ASR) and SLU systems, and finally to their study from a generation standpoint. The article aims to present these perspectives in a way that is approachable to the SLU and Conversational AI community, and to discuss what we believe are the trends and challenges in each area moving forward.