In this paper, we study the performance and generalizability of three approaches for AD detection from speech on the recent ADReSSo challenge dataset: 1) using conventional acoustic features, 2) using novel pre-trained acoustic embeddings, and 3) combining acoustic features and embeddings. We find that while feature-based approaches achieve higher precision, classification approaches that combine embeddings and features achieve higher and more balanced performance across multiple metrics. Our best model, which uses this combined approach, outperforms the challenge's acoustic baseline by 2.8\%.
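As a rough illustration of the third approach, combining acoustic features with embeddings can be sketched as early fusion: each block is standardized and the two are concatenated per recording before classification. The abstract does not state how the combination is actually performed, so the fusion strategy, the function names `standardize` and `fuse`, and the toy values below are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column of a list of equal-length rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns by falling back to a scale of 1.0.
    scales = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, scales)]
        for row in rows
    ]


def fuse(acoustic_features, embeddings):
    """Early fusion (assumed): standardize each block, then concatenate."""
    if len(acoustic_features) != len(embeddings):
        raise ValueError("both blocks must describe the same recordings")
    feats = standardize(acoustic_features)
    embs = standardize(embeddings)
    return [f + e for f, e in zip(feats, embs)]


# Toy, made-up values: 3 recordings, 2 acoustic features, 3 embedding dims.
toy_features = [[0.1, 120.0], [0.3, 95.0], [0.2, 110.0]]
toy_embeddings = [[0.5, -0.2, 0.0], [0.1, 0.4, 0.0], [-0.3, 0.1, 0.0]]
fused = fuse(toy_features, toy_embeddings)
```

Each row of `fused` could then be passed to any standard classifier. In a real experiment, the standardization statistics would be estimated on the training split only and reused for held-out data.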