The speech signal is a consummate example of time-series data. The acoustics of the signal change over time, sometimes dramatically. Yet, the most common type of comparison we perform in phonetics is between instantaneous acoustic measurements, such as formant values. In the present paper, I discuss the concept of absement as a quantification of differences between two time-series. I then provide an experimental example of absement applied to phonetic analysis for human and/or computer speech recognition. The experiment is a template-based speech recognition task, using dynamic time warping to compare the acoustics between recordings of isolated words. A recognition accuracy of 57.9% was achieved. The results of the experiment are discussed in terms of using absement as a tool, as well as the implications of using acoustics-only models of spoken word recognition with the word as the smallest discrete linguistic unit.
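As a rough illustration of the idea, absement can be read as displacement accumulated over time, so the difference between two recordings can be approximated by summing frame-to-frame acoustic distances along a dynamic time warping alignment. The sketch below is a minimal one under stated assumptions, not the implementation used in the experiment: the function names (`dtw_cost`, `absement`, `recognize`), the Euclidean frame distance, the unconstrained step pattern, and the `frame_step` scaling are all hypothetical choices made for the example, and the feature vectors are toy values rather than real acoustic measurements.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_cost(seq_a, seq_b):
    """Accumulated distance along the cheapest DTW alignment of two sequences.

    Each sequence is a list of feature vectors (one per analysis frame).
    Assumption: the classic step pattern (match, insertion, deletion) with
    no slope or window constraint.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def absement(seq_a, seq_b, frame_step=1.0):
    """Approximate absement: accumulated distance times the frame step.

    Assumption: frame_step is the time between analysis frames, so the sum
    of distances acts as a discrete approximation of a time integral.
    """
    return dtw_cost(seq_a, seq_b) * frame_step


def recognize(token, templates):
    """Return the label of the template with the smallest absement to token."""
    return min(templates, key=lambda label: absement(token, templates[label]))


# Toy one-dimensional "recordings"; the labels are placeholders.
templates = {
    "word_a": [[0.0], [1.0], [2.0]],
    "word_b": [[5.0], [5.0], [5.0]],
}
token = [[0.0], [1.0], [1.0], [2.0]]  # a time-stretched version of word_a
best_label = recognize(token, templates)
```

In this toy case the token differs from `word_a` only in timing, so the warping absorbs the difference and the accumulated distance to that template is zero, whereas the distance to `word_b` stays large. A template-based recognizer of this kind simply picks the label with the smallest accumulated difference.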