The ComParE 2021 COVID-19 Speech Sub-challenge provides a test-bed for evaluating automatic detectors of COVID-19 from speech. Such models can be of value by providing triaging capabilities to health authorities, working alongside traditional testing methods. Herein, we leverage pre-trained, problem-agnostic speech representations and evaluate their use for this task. We compare the results obtained against a CNN architecture trained from scratch and against traditional frequency-domain representations. We also evaluate Self-Attention Pooling as an utterance-level information aggregation method. Experimental results demonstrate that models trained on features extracted from self-supervised models perform similarly to, or outperform, fully supervised models and models based on handcrafted features. Our best model improves the Unweighted Average Recall (UAR) from 69.0\% to 72.3\% on a development set comprised of only full-band examples, and achieves 64.4\% on the test set. Furthermore, we study where the network attends, attempting to draw conclusions regarding its explainability. In this relatively small dataset, we find that the network attends especially to vowels and aspirates.
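As a minimal sketch of the utterance-level aggregation mentioned above, self-attention pooling collapses a variable-length sequence of frame embeddings into a single fixed-size utterance vector via a learned scoring vector. The dimensions, the scoring vector `w`, and the function name below are illustrative, not the paper's exact implementation:

```python
import numpy as np

def self_attention_pooling(H, w):
    """Collapse a (T, d) sequence of frame embeddings H into a single
    (d,) utterance embedding using a learnable (d,) scoring vector w."""
    scores = H @ w                      # (T,): one relevance score per frame
    scores = scores - scores.max()      # numerical stability for softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum()                # attention weights sum to 1
    return alpha @ H                    # attention-weighted average of frames

rng = np.random.default_rng(0)
H = rng.standard_normal((50, 8))        # e.g. 50 frames of 8-dim features
w = rng.standard_normal(8)              # in practice learned jointly with the model
u = self_attention_pooling(H, w)        # u.shape == (8,)
```

Note that with `w` set to zero the attention weights become uniform and the pooling reduces to plain average pooling, which makes the learned case easy to interpret as a reweighted mean.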