Fundamental frequency (F0) has long been treated as the physical definition of "pitch" in phonetic analysis. But there have been many demonstrations that F0 is at best an approximation to pitch, both in production and in perception: pitch is not F0, and F0 is not pitch. Changes in pitch involve many articulatory and acoustic covariates; pitch perception often deviates from what F0 analysis predicts; and quasi-periodic signals from a single voice source are often incompletely characterized by any attempt to define a single time-varying F0. In this paper, we find strong support for the existence of pitch covariates in relatively coarse spectra, in which an overtone series is not available. In particular, linear regression can predict the pitch of simple vocalizations, produced by an articulatory synthesizer or by humans, from single frames of such coarse spectra. Across speakers, and in more complex vocalizations, our experiments indicate that the covariates are not quite so simple, though they apparently remain available to more sophisticated modeling. On this basis, we propose that the field needs a better way of thinking about speech pitch, just as celestial mechanics requires us to go beyond Newton's point-mass approximations to heavenly bodies.
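As a rough illustration of the single-frame regression idea described in the abstract, the following minimal sketch fits ordinary least squares from coarse band energies to a pitch target. It is not the authors' pipeline: the function names (`coarse_spectrum`, `fit_linear`, `predict_linear`, `toy_frame`), the band count, the frame length, the use of log2 F0 as the target, and above all the synthetic data, in which spectral tilt is simply made to covary with F0, are assumptions chosen for illustration.

```python
import numpy as np

def coarse_spectrum(frame, n_bands=12):
    """Log energy in n_bands wide, equal-width bands of one windowed frame.

    With a short frame and few bands, individual harmonics are not
    resolved within the feature vector (illustrative choice, not the
    paper's front end).
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    bands = np.array_split(power, n_bands)
    return np.log(np.array([b.sum() for b in bands]) + 1e-12)

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns weights + bias."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    """Apply the weights from fit_linear to a feature matrix."""
    A = np.column_stack([X, np.ones(len(X))])
    return A @ w

def toy_frame(f0, tilt, sr=16000, n=512):
    """Hypothetical harmonic frame whose k-th harmonic has amplitude k**-tilt."""
    t = np.arange(n) / sr
    k = np.arange(1, int(0.5 * sr / f0) + 1, dtype=float)
    amps = k ** (-tilt)
    return (amps[:, None] * np.sin(2 * np.pi * f0 * k[:, None] * t)).sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0s = rng.uniform(100.0, 300.0, 200)
    # ASSUMPTION: tilt covaries with F0; this is invented toy structure,
    # not a measured property of any voice.
    tilts = 2.0 - 0.004 * (f0s - 100.0) + rng.normal(0.0, 0.05, 200)
    X = np.stack([coarse_spectrum(toy_frame(f, s)) for f, s in zip(f0s, tilts)])
    y = np.log2(f0s)  # regression target: log2 F0 (assumed stand-in for pitch)
    w = fit_linear(X[:150], y[:150])
    pred = predict_linear(w, X[150:])
    print("held-out correlation:", np.corrcoef(pred, y[150:])[0, 1])
```

Running the script prints a held-out correlation for the toy data only; it says nothing about real speech, where the abstract notes that the covariates are less simple across speakers and in more complex vocalizations.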