Aiming to automatically detect COVID-19 from cough sounds, we propose a deep attentive multi-model fusion system evaluated on the Track-1 dataset of the DiCOVA 2021 challenge. Three kinds of representations are extracted: hand-crafted features, image-from-audio-based deep representations, and audio-based deep representations. The best-performing models for each of the three feature types are then fused at both the feature level and the decision level. The experimental results demonstrate that the proposed attention-based fusion at the feature level achieves the best performance (AUC: 77.96%) on the test set, an 8.05% improvement over the official baseline.
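To make the attention-based feature-level fusion concrete, the following is a minimal sketch of how three feature branches could be projected into a shared space, weighted by learned attention scores, and summed before a binary classifier. The layer sizes, branch dimensions, and class name (AttentiveFeatureFusion) are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class AttentiveFeatureFusion(nn.Module):
    """Sketch of attention-based feature-level fusion of three branches
    (hand-crafted, image-from-audio, audio deep representations).
    All dimensions below are assumptions for illustration only."""

    def __init__(self, dims=(384, 512, 256), hidden=128):
        super().__init__()
        # One projection per feature branch into a shared hidden space.
        self.proj = nn.ModuleList(nn.Linear(d, hidden) for d in dims)
        # Scalar attention score per projected branch.
        self.score = nn.Linear(hidden, 1)
        # Binary output (COVID-positive logit).
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, feats):
        # feats: list of three tensors, each of shape (batch, dims[i]).
        projected = [torch.tanh(p(f)) for p, f in zip(self.proj, feats)]
        stacked = torch.stack(projected, dim=1)           # (batch, 3, hidden)
        attn = torch.softmax(self.score(stacked), dim=1)  # (batch, 3, 1)
        fused = (attn * stacked).sum(dim=1)               # (batch, hidden)
        return self.classifier(fused)                     # (batch, 1) logits

# Usage example with dummy feature vectors for a batch of 4 recordings.
model = AttentiveFeatureFusion()
feats = [torch.randn(4, 384), torch.randn(4, 512), torch.randn(4, 256)]
logits = model(feats)
print(logits.shape)  # torch.Size([4, 1])
```

The attention weights let the model emphasize whichever representation is most informative for a given recording, in contrast to decision-level fusion, which combines only the per-model predictions.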