Despite much research, traditional approaches to pitch prediction remain imperfect. With the emergence of neural networks (NNs), researchers hope to build an NN-based pitch predictor that outperforms traditional methods. This paper compares three pitch detection algorithms (PDAs): pYIN, YAAPT, and CREPE. pYIN and YAAPT are conventional approaches based on time-domain and frequency-domain processing. CREPE estimates pitch with a deep convolutional neural network trained on data. It uses six convolutional hidden layers followed by a densely connected output layer and produces pitch probabilities for a given input signal. The performance of CREPE, representing neural network pitch predictors, is compared with that of the more classical approaches represented by pYIN and YAAPT. The figure of merit (FOM) includes the number of unvoiced-to-voiced errors, voiced-to-unvoiced errors, gross pitch errors, and fine pitch errors.
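The four error types named in the figure of merit can be illustrated with a short sketch. This is only one plausible reading and not necessarily the definition used in the paper: it assumes unvoiced frames are encoded as 0 Hz, that a gross pitch error is a relative deviation above 20% on frames both tracks mark as voiced, and that the fine pitch error is the mean relative deviation over the remaining voiced frames. The function name `pitch_error_metrics` and the parameter `gross_threshold` are hypothetical.

```python
import numpy as np


def pitch_error_metrics(f0_ref, f0_est, gross_threshold=0.2):
    """Count voicing and pitch errors between two frame-aligned F0 tracks.

    Assumptions (not taken from the paper): 0 Hz marks an unvoiced frame,
    and a gross error is a relative deviation above `gross_threshold`.
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_est = np.asarray(f0_est, dtype=float)
    if f0_ref.shape != f0_est.shape:
        raise ValueError("reference and estimate must have the same length")

    ref_voiced = f0_ref > 0
    est_voiced = f0_est > 0

    # Voicing decision errors.
    uv_to_v = int(np.sum(~ref_voiced & est_voiced))  # unvoiced called voiced
    v_to_uv = int(np.sum(ref_voiced & ~est_voiced))  # voiced called unvoiced

    # Pitch errors are only defined where both tracks are voiced.
    both = ref_voiced & est_voiced
    rel_dev = np.abs(f0_est[both] - f0_ref[both]) / f0_ref[both]
    gross = rel_dev > gross_threshold
    fine = rel_dev[~gross]

    return {
        "uv_to_v": uv_to_v,
        "v_to_uv": v_to_uv,
        "gross": int(np.sum(gross)),
        # Mean relative deviation of the non-gross voiced frames.
        "fine_mean": float(fine.mean()) if fine.size else 0.0,
    }


if __name__ == "__main__":
    # Toy tracks in Hz; the values are made up for illustration.
    reference = [0, 100, 200, 0, 150, 100]
    estimate = [0, 102, 300, 120, 0, 100]
    print(pitch_error_metrics(reference, estimate))
```

Reporting counts rather than rates keeps the sketch close to the wording of the abstract; dividing each count by the number of relevant frames would give error rates instead.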