Disfluencies (i.e. interruptions in the regular flow of speech) are ubiquitous in spoken discourse. Fillers ("uh", "um") are the most frequent kind of disfluency. Yet, to the best of our knowledge, there is no resource that brings together the research perspectives on these speech events that influence Spoken Language Understanding (SLU). The aim of this article is to survey a breadth of perspectives in a holistic way: from the underlying (psycho)linguistic theory, to their annotation and consideration in Automatic Speech Recognition (ASR) and SLU systems, to, lastly, their study from a generation standpoint. The article aims to present these perspectives in a way that is approachable for the SLU and Conversational AI community, and to discuss what we believe are the trends and challenges in each area going forward.