In this work, we propose a COVID-19 detection method based on a bi-directional long short-term memory (BiLSTM) network that uses breath, speech, and cough signals. We first train the network on each type of acoustic signal separately to obtain an individual model for each of the three tasks. The parameters of these models are then averaged, and the resulting average model is used to initialize BiLSTM training for each task. This initialization significantly improves performance on all three tasks and surpasses the official baseline results. In addition, we take the public pre-trained wav2vec2.0 model and further pre-train it on the official DiCOVA datasets. The resulting wav2vec2.0 model extracts high-level features of the sound, which serve as model input in place of conventional mel-frequency cepstral coefficient (MFCC) features. Experimental results show that using the high-level features together with MFCC features improves performance. To further improve performance, we also apply preprocessing techniques such as silent segment removal, amplitude normalization, and time-frequency masking. The proposed detection model is evaluated on the DiCOVA dataset, and the results show that our method achieves an area under the curve (AUC) score of 88.44% on the blind test set in the fusion track.
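The parameter-averaging initialization described above can be illustrated with a minimal sketch. It assumes that the three task-specific models share one architecture, so that their parameters can be averaged element-wise by name. The function `average_state_dicts`, the parameter names, and the toy weight values are all hypothetical and stand in for real BiLSTM weights; this is not the authors' implementation.

```python
from typing import Dict, List

# Parameter name -> flattened weights (an illustrative stand-in for real tensors).
StateDict = Dict[str, List[float]]


def average_state_dicts(state_dicts: List[StateDict]) -> StateDict:
    """Return the element-wise mean of parameters across same-architecture models."""
    if not state_dicts:
        raise ValueError("need at least one model")
    keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != keys:
            raise ValueError("models must share the same parameter names")
    averaged: StateDict = {}
    for name in state_dicts[0]:
        columns = [sd[name] for sd in state_dicts]
        if len({len(c) for c in columns}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*columns) groups the i-th weight of every model together.
        averaged[name] = [sum(vals) / len(vals) for vals in zip(*columns)]
    return averaged


# Hypothetical toy weights for the three separately trained task models.
breath = {"lstm.weight": [0.0, 3.0], "fc.bias": [1.0]}
speech = {"lstm.weight": [3.0, 3.0], "fc.bias": [2.0]}
cough = {"lstm.weight": [6.0, 0.0], "fc.bias": [3.0]}

init = average_state_dicts([breath, speech, cough])

# Each task starts its own training run from an independent copy of the average.
task_inits = {
    task: {name: list(values) for name, values in init.items()}
    for task in ("breath", "speech", "cough")
}
```

In a deep learning framework the same idea would apply to the models' parameter tensors rather than Python lists; the copies in `task_inits` are kept independent so that training one task does not alter the starting point of another.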